Deadlock on pqxx::unique_violation error while doing replay
I encountered a deadlock in HAF when sql error occurred while doing replay.
I can reproduce it by doing successful replay which is followed by another replay. The second one will cause pqxx::unique_violation
on pk_hive_blocks
index.
Here's hived log just before deadlock (full log in attachment):
2024-06-05T08:39:16.969178 indexes_controler.cpp:137 operator() ] Attempting to execute query: `SELECT hive.restore_indexes( 'hive.transactions' );`...
2024-06-05T08:39:16.973810 data_processor.cpp:116 operator() ] Query processor: `SELECT hive.restore_indexes( 'hive.transactions_multisig' );' data processor errored...
2024-06-05T08:39:16.973881 data_processor.cpp:233 handle_exception ] Data processor Query processor: `SELECT hive.restore_indexes( 'hive.transactions_multisig' );' detected SQL statement execution failure. Failing statement: `SELECT hive.restore_indexes( 'hive.transactions_multisig' );'.
2024-06-05T08:39:16.973914 data_processor.cpp:16 kill_node ] An error occured and HAF is stopping synchronization...
2024-06-05T08:39:16.974158 transaction_controllers.cpp:128 do_reconnect ] Connected to database: `dbname=haf_block_log user=hived password=hived hostaddr=127.0.0.1 port=5432'.
2024-06-05T08:39:16.974233 transaction_controllers.cpp:133 do_reconnect ] SET synchronous_commit = OFF;
2024-06-05T08:39:16.975107 indexes_controler.cpp:137 operator() ] Attempting to execute query: `SELECT hive.restore_indexes( 'hive.account_operations' );`...
2024-06-05T08:39:16.976118 transaction_controllers.cpp:128 do_reconnect ] Connected to database: `dbname=haf_block_log user=hived password=hived hostaddr=127.0.0.1 port=5432'.
2024-06-05T08:39:16.976161 transaction_controllers.cpp:133 do_reconnect ] SET synchronous_commit = OFF;
2024-06-05T08:39:16.976433 indexes_controler.cpp:137 operator() ] Attempting to execute query: `SELECT hive.restore_indexes( 'hive.applied_hardforks' );`...
2024-06-05T08:39:16.979401 data_processor.cpp:116 operator() ] Query processor: `SELECT hive.restore_indexes( 'hive.applied_hardforks' );' data processor errored...
2024-06-05T08:39:16.979461 data_processor.cpp:233 handle_exception ] Data processor Query processor: `SELECT hive.restore_indexes( 'hive.applied_hardforks' );' detected SQL statement execution failure. Failing statement: `SELECT hive.restore_indexes( 'hive.applied_hardforks' );'.
2024-06-05T08:39:16.979486 data_processor.cpp:16 kill_node ] An error occured and HAF is stopping synchronization...
2024-06-05T08:39:17.017713 data_processor.cpp:116 operator() ] Query processor: `SELECT hive.restore_indexes( 'hive.accounts' );' data processor errored...
2024-06-05T08:39:17.017773 data_processor.cpp:233 handle_exception ] Data processor Query processor: `SELECT hive.restore_indexes( 'hive.accounts' );' detected SQL statement execution failure. Failing statement: `SELECT hive.restore_indexes( 'hive.accounts' );'.
2024-06-05T08:39:17.017793 data_processor.cpp:16 kill_node ] An error occured and HAF is stopping synchronization...
2024-06-05T08:39:17.193622 data_processor.cpp:116 operator() ] Query processor: `SELECT hive.restore_indexes( 'hive.transactions' );' data processor errored...
2024-06-05T08:39:17.193722 data_processor.cpp:233 handle_exception ] Data processor Query processor: `SELECT hive.restore_indexes( 'hive.transactions' );' detected SQL statement execution failure. Failing statement: `SELECT hive.restore_indexes( 'hive.transactions' );'.
2024-06-05T08:39:17.193763 data_processor.cpp:16 kill_node ] An error occured and HAF is stopping synchronization...
2024-06-05T08:39:19.453090 data_processor.cpp:116 operator() ] Query processor: `SELECT hive.restore_indexes( 'hive.blocks' );' data processor errored...
2024-06-05T08:39:19.453173 data_processor.cpp:233 handle_exception ] Data processor Query processor: `SELECT hive.restore_indexes( 'hive.blocks' );' detected SQL statement execution failure. Failing statement: `SELECT hive.restore_indexes( 'hive.blocks' );'.
2024-06-05T08:39:19.453197 data_processor.cpp:16 kill_node ] An error occured and HAF is stopping synchronization...
2024-06-05T08:39:19.453482 data_processor.cpp:215 join ] Caught unhandled exception Dynamic exception type: pqxx::unique_violation
std::exception::what: ERROR: could not create unique index "pk_hive_blocks"
DETAIL: Key (num)=(28595) is duplicated.
CONTEXT: parallel worker
SQL statement "ALTER TABLE hive.blocks ADD CONSTRAINT pk_hive_blocks PRIMARY KEY (num)"
PL/pgSQL function hive.restore_indexes(text) line 16 at EXECUTE
2024-06-05T08:39:19.455079 queries_commit_data_processor.cpp:19 ~queries_commit_data ] ~queries_commit_data_processor
2024-06-05T08:39:19.455798 data_processor.cpp:130 ~data_processor ] ~data_processor: Query processor: `SELECT hive.restore_indexes( 'hive.applied_hardforks' );'
2024-06-05T08:39:19.456973 queries_commit_data_processor.cpp:19 ~queries_commit_data ] ~queries_commit_data_processor
2024-06-05T08:39:19.457323 data_processor.cpp:130 ~data_processor ] ~data_processor: Query processor: `SELECT hive.restore_indexes( 'hive.account_operations' );'
2024-06-05T08:39:23.215773 data_processor.cpp:116 operator() ] Query processor: `SELECT hive.restore_indexes( 'hive.operations' );' data processor errored...
2024-06-05T08:39:23.215830 data_processor.cpp:233 handle_exception ] Data processor Query processor: `SELECT hive.restore_indexes( 'hive.operations' );' detected SQL statement execution failure. Failing statement: `SELECT hive.restore_indexes( 'hive.operations' );'.
2024-06-05T08:39:23.215851 data_processor.cpp:16 kill_node ] An error occured and HAF is stopping synchronization...
2024-06-05T08:39:33.207377 data_processor.cpp:116 operator() ] Query processor: `SELECT hive.restore_indexes( 'hive.account_operations' );' data processor errored...
2024-06-05T08:39:33.207429 data_processor.cpp:233 handle_exception ] Data processor Query processor: `SELECT hive.restore_indexes( 'hive.account_operations' );' detected SQL statement execution failure. Failing statement: `SELECT hive.restore_indexes( 'hive.account_operations' );'.
2024-06-05T08:39:33.207447 data_processor.cpp:16 kill_node ] An error occured and HAF is stopping synchronization...
2024-06-05T08:39:33.207614 queries_commit_data_processor.cpp:19 ~queries_commit_data ] ~queries_commit_data_processor
2024-06-05T08:39:33.207647 data_processor.cpp:130 ~data_processor ] ~data_processor: Query processor: `SELECT hive.restore_indexes( 'hive.accounts' );'
2024-06-05T08:39:33.207719 queries_commit_data_processor.cpp:19 ~queries_commit_data ] ~queries_commit_data_processor
2024-06-05T08:39:33.207738 data_processor.cpp:130 ~data_processor ] ~data_processor: Query processor: `SELECT hive.restore_indexes( 'hive.operations' );'
2024-06-05T08:39:33.207816 queries_commit_data_processor.cpp:19 ~queries_commit_data ] ~queries_commit_data_processor
2024-06-05T08:39:33.207836 data_processor.cpp:130 ~data_processor ] ~data_processor: Query processor: `SELECT hive.restore_indexes( 'hive.transactions_multisig' );'
2024-06-05T08:39:33.207957 queries_commit_data_processor.cpp:19 ~queries_commit_data ] ~queries_commit_data_processor
2024-06-05T08:39:33.207972 data_processor.cpp:130 ~data_processor ] ~data_processor: Query processor: `SELECT hive.restore_indexes( 'hive.transactions' );'
2024-06-05T08:39:33.208089 queries_commit_data_processor.cpp:19 ~queries_commit_data ] ~queries_commit_data_processor
2024-06-05T08:39:33.208111 data_processor.cpp:130 ~data_processor ] ~data_processor: Query processor: `SELECT hive.restore_indexes( 'hive.irreversible_data' );'
In gdb
I can see all threads are waiting:
(gdb) i thr
Id Target Id Frame
* 1 Thread 0x7fc96a3a3e00 (LWP 14729) "hived" futex_wait (private=<optimized out>, expected=12, futex_word=0x7fc94c10babc) at ../sysdeps/nptl/futex-internal.h:146
2 Thread 0x7fc96a3a2640 (LWP 14730) "logging_thread" __futex_abstimed_wait_common64 (private=1179670597, cancel=true, abstime=0x7fc96a3a1480, op=137, expected=0, futex_word=0x7fc964000bb4)
at ./nptl/futex-internal.c:57
3 Thread 0x7fc9699a0640 (LWP 14731) "hived" 0x00007fc96b1ebe2e in epoll_wait (epfd=4, events=0x7fc96999eec0, maxevents=128, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
4 Thread 0x7fc96919f640 (LWP 14732) "worker_1_of_8" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302965cc) at ./nptl/futex-internal.c:57
5 Thread 0x7fc96899e640 (LWP 14733) "worker_2_of_8" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302965cc) at ./nptl/futex-internal.c:57
6 Thread 0x7fc95bfff640 (LWP 14734) "worker_3_of_8" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302965cc) at ./nptl/futex-internal.c:57
7 Thread 0x7fc95b7fe640 (LWP 14735) "worker_4_of_8" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302965cc) at ./nptl/futex-internal.c:57
8 Thread 0x7fc95affd640 (LWP 14736) "worker_5_of_8" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302965cc) at ./nptl/futex-internal.c:57
9 Thread 0x7fc95a7fc640 (LWP 14737) "worker_6_of_8" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302965cc) at ./nptl/futex-internal.c:57
10 Thread 0x7fc959ffb640 (LWP 14738) "worker_7_of_8" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302965cc) at ./nptl/futex-internal.c:57
11 Thread 0x7fc9597fa640 (LWP 14739) "worker_8_of_8" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302965cc) at ./nptl/futex-internal.c:57
12 Thread 0x7fc958ff9640 (LWP 14740) "" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7fc934000bb0) at ./nptl/futex-internal.c:57
13 Thread 0x7fc93bfff640 (LWP 14741) "asio" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555630313b78) at ./nptl/futex-internal.c:57
14 Thread 0x7fc93b7fe640 (LWP 14742) "asio" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555630313b78) at ./nptl/futex-internal.c:57
15 Thread 0x7fc93affd640 (LWP 14743) "asio" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555630313b78) at ./nptl/futex-internal.c:57
16 Thread 0x7fc93a7fc640 (LWP 14744) "asio" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555630313b78) at ./nptl/futex-internal.c:57
17 Thread 0x7fc939ffb640 (LWP 14745) "asio" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555630313b78) at ./nptl/futex-internal.c:57
18 Thread 0x7fc9397fa640 (LWP 14746) "asio" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555630313b78) at ./nptl/futex-internal.c:57
19 Thread 0x7fc938ff9640 (LWP 14747) "asio" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555630313b78) at ./nptl/futex-internal.c:57
20 Thread 0x7fc91ffff640 (LWP 14748) "asio" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555630313b78) at ./nptl/futex-internal.c:57
21 Thread 0x7fc91f7fe640 (LWP 14749) "hived" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556303151ec) at ./nptl/futex-internal.c:57
22 Thread 0x7fc91effd640 (LWP 14759) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
23 Thread 0x7fc91e7fc640 (LWP 14760) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
24 Thread 0x7fc91dffb640 (LWP 14761) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
25 Thread 0x7fc91d7fa640 (LWP 14762) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
26 Thread 0x7fc91cff9640 (LWP 14763) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
27 Thread 0x7fc907fff640 (LWP 14764) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
28 Thread 0x7fc9077fe640 (LWP 14765) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
29 Thread 0x7fc906ffd640 (LWP 14766) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
30 Thread 0x7fc9067fc640 (LWP 14767) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
31 Thread 0x7fc905ffb640 (LWP 14768) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
32 Thread 0x7fc9057fa640 (LWP 14769) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
33 Thread 0x7fc904ff9640 (LWP 14770) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
34 Thread 0x7fc9047f8640 (LWP 14771) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
35 Thread 0x7fc903ff7640 (LWP 14772) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
36 Thread 0x7fc9037f6640 (LWP 14773) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
37 Thread 0x7fc902ff5640 (LWP 14774) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
38 Thread 0x7fc9027f4640 (LWP 14775) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
39 Thread 0x7fc901ff3640 (LWP 14776) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
40 Thread 0x7fc9017f2640 (LWP 14777) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
41 Thread 0x7fc900ff1640 (LWP 14778) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
42 Thread 0x7fc9007f0640 (LWP 14779) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
43 Thread 0x7fc8fffef640 (LWP 14780) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
44 Thread 0x7fc8ff7ee640 (LWP 14781) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
45 Thread 0x7fc8fefed640 (LWP 14782) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
46 Thread 0x7fc8fe7ec640 (LWP 14783) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
47 Thread 0x7fc8fdfeb640 (LWP 14784) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
48 Thread 0x7fc8fd7ea640 (LWP 14785) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
49 Thread 0x7fc8fcfe9640 (LWP 14786) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
50 Thread 0x7fc8fc7e8640 (LWP 14787) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
51 Thread 0x7fc8fbfe7640 (LWP 14788) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
52 Thread 0x7fc8fb7e6640 (LWP 14789) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
53 Thread 0x7fc8fafe5640 (LWP 14790) "api" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5556302808a8) at ./nptl/futex-internal.c:57
54 Thread 0x7fc2daffd640 (LWP 14995) "hived" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7fc94c10bac4) at ./nptl/futex-internal.c:57
Here's an initial fix that mitigates the issue 483c7cf4
Full log: replay.log.gz