Proposal for a new HAF application main loop
Now that we have a few real HAF applications (Hivemind, HAF Block Explorer, Balance Tracker), we've received feedback from their developers regarding the complexity of the HAF apps API and the application main loop. Here's my personal, subjective list of issues:
- Hardly anyone reads the HAF documentation, and even when they do, they quickly forget about crucial details (such as the fact that the first block in the returned block range is the current block).
- Attaching and detaching contexts is complicated: when to do it, how to start/restart the application, how to save data while the application is detached, how auto-detach works, and what the current block is while a context is detached.
- Finding the right place in the app's main-loop code to issue a COMMIT is challenging.
- The function app_next_block alters the internal state of contexts; applications often overlook this in their loops, leading to incorrect synchronization.
- Breaking an app's synchronization cleanly is challenging.
Due to the issues above, applications' main loops incorporate custom, complex logic, resulting in overly lengthy code.
The application's main loop must be straightforward and must not require developers to deeply understand HAF internals. My postulates are:
- hide attaching/detaching contexts from app developers
- change hive.app_next_block from a function to a procedure, so that it can issue a COMMIT (in PostgreSQL, a function cannot perform transaction control, while a procedure invoked with CALL can)
- app_next_block becomes the one and only method that delivers a range of blocks to the application; no more separate iteration over blocks by detached apps
- only app_next_block manipulates the current block of contexts
Based on experience with the already implemented applications, it looks like each application divides the synchronization process into stages (e.g., Hivemind processes blocks massively with disabled indexes, then massively with enabled indexes, and then in live mode, where reversible blocks are processed one by one). IMO all stateful applications (those that have registered tables) use stages, so I suggest having applications express these stages explicitly and associate them with contexts. Each application will deliver a description of its stages, for example as an ARRAY of stages, where a stage contains a name, the minimum distance to the head block at which the stage is enabled, and the maximum number of blocks that can be processed in one turn. Here is a pseudocode example: [ (NO_INDEXES, 1000000, 1000), (WITH_INDEXES, 100, 1000), (LIVE, 1, 1) ]
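To make the stage descriptor concrete, here is a minimal Python sketch of such a structure (the Stage type and its field names are my own illustration, not a final API):

```python
from typing import NamedTuple

class Stage(NamedTuple):
    name: str                  # stage identifier, e.g. 'NO_INDEXES'
    min_distance_to_head: int  # stage is enabled only when the app is at least
                               # this many blocks behind the head block
    max_blocks_per_turn: int   # maximum number of blocks processed in one turn

# The Hivemind example from above expressed with this type:
HIVEMIND_STAGES = [
    Stage('NO_INDEXES', 1_000_000, 1000),
    Stage('WITH_INDEXES', 100, 1000),
    Stage('LIVE', 1, 1),
]
```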
hive.app_next_block will deliver the stage name and the range of blocks to process in this stage. Detaching and attaching the context will be executed by hive.app_next_block, the same as moving the context's current block. The next call to hive.app_next_block will issue a COMMIT to save the previous iteration's work to the database. If an app wants to break the loop, it must exit immediately after calling hive.app_next_block.
Here is an example of an application's main loop:
```
create_context( 'hivemind', ARRAY[ ('NO_INDEXES', 1000000, 1000), ('WITH_INDEXES', 100, 1000), ('LIVE', 1, 1) ] );

while True:
    range = hive.app_next_block( 'hivemind' )
    -- if sync must end, simply break the loop right after hive.app_next_block; do not commit
    if break_request:
        break;
    if range IS NULL:
        continue;
    switch (range.stage):
        case 'NO_INDEXES':
            disable_indexes();
            process_blocks_massively( range.first, range.last )
        case 'WITH_INDEXES':
            enable_indexes();
            process_blocks_massively( range.first, range.last )
        case 'LIVE':
            enable_indexes();
            process_one_block( range.first )
        default:
            ASSERT( FALSE, 'Unknown stage' )
```
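A note on the break path: the call to hive.app_next_block that directly precedes the break has already committed the previous iteration, while the range it just returned (and the context's current-block move that came with it) stays uncommitted, so it can simply be rolled back when the session ends. This is why the app must not commit after breaking.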
Here is a draft of the new hive.app_next_block algorithm.
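Below is a minimal Python sketch of the stage-selection core, under my reading of the rules above (the Stage tuple matches the earlier sketch; all names are illustrative, and COMMIT, attach/detach, and current-block bookkeeping are only hinted at in comments):

```python
from typing import List, NamedTuple, Optional, Tuple

class Stage(NamedTuple):
    name: str                  # e.g. 'NO_INDEXES'
    min_distance_to_head: int  # stage eligible only at least this far behind head
    max_blocks_per_turn: int   # cap on the size of one returned range

def choose_stage(stages: List[Stage], distance_to_head: int) -> Optional[Stage]:
    # Stages are assumed ordered from the most massive one down to LIVE,
    # i.e. by decreasing min_distance_to_head, as in the Hivemind example.
    for stage in stages:
        if distance_to_head >= stage.min_distance_to_head:
            return stage
    return None  # the app has caught up beyond every stage threshold

def next_block_range(
    stages: List[Stage], current_block: int, head_block: int
) -> Optional[Tuple[str, int, int]]:
    # The real procedure would additionally: COMMIT the previous iteration,
    # detach the context for massive stages / attach it for LIVE, and advance
    # the context's current block to the end of the returned range.
    stage = choose_stage(stages, head_block - current_block)
    if stage is None:
        return None  # nothing to do yet; the caller's loop just continues
    first = current_block + 1
    last = min(head_block, first + stage.max_blocks_per_turn - 1)
    return (stage.name, first, last)
```

With this reading, an application that is far behind the head block automatically lands in its most massive stage and gradually degrades to LIVE as it catches up, without the main loop knowing anything about attach/detach.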
Regarding grouped contexts, I think the lead context's stages should be used.