psql-index-threshold should be used only to decide whether to run massive sync (re-enabling indexes once the head block is reached), or, after a restart of HAF, to decide whether it is better to stay in live sync or to switch to massive sync
Dan Notestein changed title from "sql_serializer: do not brak massive sync unless the head block is reached" to "sql_serializer: do not break massive sync unless the head block is reached"
Here is a quotation from the chat about the problem:
wrona (6:39 PM): I've confirmed that psql_index-threshold causes massive sync to end before reaching the head block, which I think is a flawed design approach (you can see my suggestion for behavior in the previous thread here). Do you agree or am I missing something? If there's agreement on this point, I can take a look at the code and see how easy it is to change.

(7:12 PM): The intention of psql_index-threshold should be a block limit that determines when to drop indexes and constraints because sync should enter "massive" mode. Nothing more; in particular, it should not affect the synchronization block range.

Ok, so the problem is that it also tells the code when to exit massive sync.
There is no certainty whether a given block is irreversible or not, thus HAF has to work with a fully enabled micro-fork handling solution. Maybe we can find some heuristic, for example assume that all blocks older than 10 minutes are irreversible, but I'm not sure if it's a good idea.
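The age heuristic floated above could look something like the sketch below. This is purely illustrative (the comment itself doubts the idea, and none of these names exist in the actual codebase): treat any block whose timestamp is older than a cutoff as presumed irreversible.

```cpp
#include <chrono>

// Hypothetical sketch of the proposed heuristic, NOT actual HAF code:
// a block older than `cutoff` (e.g. 10 minutes) is presumed irreversible.
// As noted above, this may well be an unsafe assumption during long forks.
bool presumed_irreversible(
    std::chrono::system_clock::time_point block_time,
    std::chrono::system_clock::time_point now,
    std::chrono::minutes cutoff = std::chrono::minutes{10}) {
  return (now - block_time) > cutoff;
}
```

A 30-minute-old block would pass this test while a block produced 3 seconds ago would not; the open question is whether "old" really implies "irreversible" in all network conditions.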
Actually, the situation is even better than that: reindex/replay can only be initiated at startup, and at startup we only have irreversible blocks.
But at least previously, hived was ending reindex when this psql_index_threshold was hit. @Ickiewicz Did you already make a code change so that reindex continues until all blocks in the block_log have been processed? If not, we need to change that.
The point of this psql_index_threshold should be very simple: at launch of a hived replay, it should be used to determine whether or not to drop indexes and start a massive sync. The tradeoff is that massive sync processes blocks faster, but dropping/re-adding indexes also takes time, so the threshold should set the point at which it makes sense to do massive sync. But this value should only be used at launch time.
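The launch-time decision described above can be sketched as a single comparison. This is a hypothetical illustration, not the actual sql_serializer code; the function and parameter names are invented here:

```cpp
#include <cstdint>

// Hypothetical sketch of the intended semantics of psql_index_threshold:
// at launch, compare the backlog of blocks still to be processed against
// the threshold. A large backlog justifies dropping indexes and running
// massive sync; a small one does not, because the index rebuild cost
// would outweigh the faster block processing. The threshold plays no
// further role once sync is underway.
bool should_enter_massive_sync(uint64_t last_processed_block,
                               uint64_t target_head_block,
                               uint64_t psql_index_threshold) {
  const uint64_t blocks_to_process =
      target_head_block > last_processed_block
          ? target_head_block - last_processed_block
          : 0;
  return blocks_to_process >= psql_index_threshold;
}
```

For example, being millions of blocks behind with a threshold of one million would trigger massive sync, while being a few hundred blocks behind would keep live sync and its indexes intact.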
(11:02 AM): In general, 'the massive sync' ends when hived signals the end of reindex (reaching the end of the block log); then live sync starts, which means every next block (taken from p2p) is supposed to be reversible.

That would be fine, it's just not what I recall observing. I remember seeing it process blocks somewhat slowly, but faster than 1 per 3-second period (e.g. behaving like live sync in processing speed per block, but not at the head block yet).

(11:05 AM): In live sync, blocks can't arrive much faster than 1 block per 3s. So the result was that it seemed to take a long time to finally reach all the way to the head block.

(11:05 AM): Yes, it is expected behaviour, because new blocks are pushed one by one to hive fork manager; they are not collected and then massively inserted into the tables.

Maybe there has been some change in behavior, but I suppose there are still logs on steem-7 that record what I recall observing. Yes, I understand it would work this way for new p2p blocks. But I saw them arrive faster than one per 3s. How does that happen from p2p? Here is one theory: it uses the head block at the time hived starts up, so maybe sync blocks are also considered new blocks.

(11:09 AM): The node has to ask peers to get blocks which are not included in the block log.

(11:09 AM): I suppose your argument is that we must assume all such blocks are reversible and have to be analyzed one at a time? Maybe this is true, but I think we can do better. Ok, at least I understand the issue now: it is sync blocks that come in after hived starts.
Ok, I think I understand now: when we reach live sync, we start to push to HAF from the queue. In the happy path we can push them massively (because most of the blocks are irreversible); if not, then reversible blocks will be pushed one by one. Of course we can start to push them earlier, if we know that they are irreversible.

(8:41 AM): In practice, it will work like this: the node starts up and will need to catch up. We will therefore start in massive sync mode, assuming the threshold setting tells us to. It may or may not need to replay the block_log first. In any event, it will then enter p2p sync mode to catch up. We will stay in massive sync mode until p2p says it has exited p2p sync mode into live mode. During massive sync mode we push only known irreversible blocks from the block_log. When we drop out of p2p sync, we create indexes, and at this point we just push all blocks as we get them, because we can now push reversible ones. 99% of the time, forks will have no influence on this at all, because forks would only matter if we got them while we were still in massive sync mode (and therefore also in p2p sync mode). But in reality, forks almost always occur when the node is in live mode, and in that case we are pushing blocks as we receive them. Even if the surrounding network is forking, a node that is in p2p sync mode doesn't really care much: it is still processing old blocks, and it hasn't reached the state where it cares about fork blocks.
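The state transitions described in the last message can be summarized in a small sketch. All names here are illustrative, not actual sql_serializer symbols: the key property is that massive sync persists exactly as long as p2p sync does, and indexes are recreated only once, at the massive-to-live boundary.

```cpp
// Hypothetical sketch of the proposed sync lifecycle, NOT actual HAF code.
// The serializer starts in massive mode (indexes dropped, only irreversible
// blocks pushed in bulk) and switches to live mode only when hived's p2p
// layer reports leaving sync mode; at that single transition the indexes
// are recreated and blocks begin to be pushed one by one, reversible or not.
enum class sync_state { massive, live };

struct serializer_sketch {
  sync_state state = sync_state::massive;
  bool indexes_present = false;

  // Invoked when p2p signals it has exited sync mode into live mode.
  void on_p2p_sync_finished() {
    if (state == sync_state::massive) {
      indexes_present = true;   // recreate indexes once, at the boundary
      state = sync_state::live; // from now on, push every block as it arrives
    }
  }
};
```

Under this model the psql_index_threshold never appears after launch: nothing in the transition depends on how many blocks have been processed, only on the p2p layer's sync/live signal.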