Compression run of 63,000,000 blocks using the best settings for all three compression methods:
compress_block_log --jobs=28 --block-count=63000000 --deflate-level=9 --zstd-level=22 --brotli-quality=11
Compressed 536GiB to 297GiB, 44.6% space savings.
compressed 575437644643 to 318895551744
Total number of blocks by compression method:
uncompressed: 1395988
deflate: 3823572
brotli: 57385930
zstd: 390911
Total bytes if all blocks compressed by compression method:
zstd: 344200793983
brotli: 318950563477
deflate: 348664908683
Took a little over 13 hours using 28 worker threads on steem-8.
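Note for brotli: a previous run with only brotli and uncompressed as options came out to 318931202547 bytes (34MiB larger), and that test was run with the default compression level (3, I think, though the default-quality run in the size/speed tests below produces exactly the same byte count as --brotli-quality=11, which suggests the default is already 11) instead of an explicit max of 11. Conclusion: default compression level is good enough.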
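The savings and size figures for this run can be reproduced from the raw byte counts in the log line. A minimal sketch in Python; the helper names are illustrative and not part of compress_block_log:

```python
def space_savings(uncompressed: int, compressed: int) -> float:
    """Return the fraction of space saved by compression (0.0 to 1.0)."""
    return 1.0 - compressed / uncompressed


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to binary gigabytes (GiB, 2**30 bytes)."""
    return num_bytes / 2**30


# Byte counts taken from the "compressed ... to ..." log line above.
before, after = 575437644643, 318895551744
print(f"{to_gib(before):.1f} GiB -> {to_gib(after):.1f} GiB, "
      f"{space_savings(before, after):.1%} savings")

# Difference between the earlier brotli-only run and this run, in MiB.
brotli_only = 318931202547
print(f"brotli-only run is {(brotli_only - after) / 2**20:.1f} MiB larger")
```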
On steem-10 with:
time sudo /hafpool/hive/build/programs/util/compress_block_log --jobs=32 -i /storage2/datadir-haf/blockchain2 -o /storage2/datadir-haf/blockchain
Compressed 531G to 295G, 44.5% space savings.
compressed 569178362328 to 315680385136
Total number of blocks by compression method:
uncompressed: 1393100
deflate: 3849717
brotli: 57165118
zstd: 365622
Total bytes if all blocks compressed by compression method:
zstd: 350954189789
brotli: 315734867014
deflate: 345475220135
real 666m43.176s
user 21303m54.245s
sys 19m26.008s
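The ratio of CPU time to wall-clock time gives a rough idea of how many cores the run kept busy. A small sketch, assuming the bash time format shown above; parse_duration is a hypothetical helper written for this example:

```python
import re


def parse_duration(text: str) -> float:
    """Parse a bash `time` duration such as '666m43.176s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


real = parse_duration("666m43.176s")
user = parse_duration("21303m54.245s")
sys_time = parse_duration("19m26.008s")

# CPU seconds consumed per wall-clock second: roughly the number of busy cores.
busy_cores = (user + sys_time) / real
print(f"wall time: {real / 3600:.1f} h, average busy cores: {busy_cores:.1f}")
```

With --jobs=32, a value close to 32 would indicate that the worker threads were almost never idle.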
Brotli size/speed tests
Tests using:
./build/programs/util/compress_block_log -i /storage1/datadir-fc-resync/blockchain -o /storage1/datadir-compression-benchmark-output -s 60000001 -n 100000 -j 28 --enable-zstd=no --enable-deflate=no --brotli-quality=11
Results:
at block 60100000: total uncompressed 2728573097
brotli (default) brotli: 1419241468 bytes, total time: 5520650355μs, average time per block: 55207μs
--brotli-quality=0 brotli: 1698643789 bytes, total time: 10868064μs, average time per block: 108μs
--brotli-quality=1 brotli: 1644899310 bytes, total time: 15552699μs, average time per block: 155μs
decompression total time: 15793653μs, average time per block: 157μs
--brotli-quality=2 brotli: 1601941431 bytes, total time: 21084596μs, average time per block: 210μs
--brotli-quality=3 brotli: 1584324465 bytes, total time: 27585704μs, average time per block: 275μs
--brotli-quality=4 brotli: 1569639008 bytes, total time: 43070136μs, average time per block: 430μs
--brotli-quality=5 brotli: 1537438923 bytes, total time: 73112745μs, average time per block: 731μs
--brotli-quality=6 brotli: 1536290592 bytes, total time: 84488704μs, average time per block: 845μs
--brotli-quality=7 brotli: 1534922886 bytes, total time: 131463678μs, average time per block: 1315μs
--brotli-quality=8 brotli: 1534897982 bytes, total time: 144222054μs, average time per block: 1442μs
--brotli-quality=9 brotli: 1522597470 bytes, total time: 7160486299μs, average time per block: 71604μs
--brotli-quality=10 brotli: 1428760442 bytes, total time: 2609322623μs, average time per block: 26093μs
--brotli-quality=11 brotli: 1419241468 bytes, total time: 5513853881μs, average time per block: 55138μs
decompression total time: 17985308μs, average time per block: 179μs
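To compare quality levels, it helps to turn each result line into a space-savings figure and a throughput. The sketch below assumes the line format shown above, the 2728573097-byte uncompressed total for this block range, and that "total time" is the sum of per-block compression times (it equals the per-block average times 100,000 blocks). The regular expression and the summarize helper are illustrative only:

```python
import re

UNCOMPRESSED_BYTES = 2728573097  # "total uncompressed" for the 100,000-block sample

# Matches e.g. "brotli: 1584324465 bytes, total time: 27585704" (unit suffix ignored).
RESULT_RE = re.compile(
    r"(?P<method>\w+): (?P<size>\d+) bytes, total time: (?P<micros>\d+)"
)


def summarize(line: str, uncompressed: int = UNCOMPRESSED_BYTES) -> dict:
    """Extract size and time from one result line and derive savings and MB/s."""
    match = RESULT_RE.search(line)
    if match is None:
        raise ValueError(f"not a result line: {line!r}")
    size = int(match["size"])
    seconds = int(match["micros"]) / 1_000_000
    return {
        "method": match["method"],
        "savings": 1.0 - size / uncompressed,
        # Assumed to be summed per-block time, so this is per-thread throughput.
        "mb_per_s": uncompressed / seconds / 1_000_000,
    }


lines = [
    "--brotli-quality=3 brotli: 1584324465 bytes, total time: 27585704μs, average time per block: 275μs",
    "--brotli-quality=11 brotli: 1419241468 bytes, total time: 5513853881μs, average time per block: 55138μs",
]
for line in lines:
    stats = summarize(line)
    print(f"{stats['method']}: {stats['savings']:.1%} savings, {stats['mb_per_s']:.1f} MB/s")
```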
zstd size/speed tests
--zstd-level=0 zstd: 1575040084 bytes, total time: 17537878μs, average time per block: 175μs
decompression total time: 6425461μs, average time per block: 64μs
--zstd-level=22 zstd: 1523738272 bytes, total time: 791734948μs, average time per block: 7917μs
decompression total time: 6796739μs, average time per block: 67μs
deflate size/speed tests
--deflate-level=0 deflate: 2729673097 bytes, total time: 2346894μs, average time per block: 23μs
--deflate-level=1 deflate: 1624259301 bytes, total time: 40467886μs, average time per block: 404μs
decompression total time: 14384110μs, average time per block: 143μs
--deflate-level=9 deflate: 1550819167 bytes, total time: 69408282μs, average time per block: 694μs
decompression total time: 14061821μs, average time per block: 140μs
zstd tests with custom dictionary
Testing with 1M blocks (62,000,000 - 62,999,999)
Uncompressed total size: 27,373,465,451
Compressed total size, level 3, (no dictionary): 15,773,922,290 (42.4% savings)
Compressed total size, level 3, (custom dictionary): 14,681,782,146 (46.4% savings)
Compressed total size, level 22, (no dictionary): 15,229,832,829 (44.4% savings)
Compressed total size, level 22, (custom dictionary): 13,927,132,716 (49.1% savings)
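A dictionary helps because each block is compressed on its own, so structure that recurs from block to block has to be relearned every time unless the compressor is primed with it. The sketch below illustrates the idea only. It uses zlib's preset-dictionary support from the Python standard library as a stand-in for zstd, and the JSON-like sample block and hand-built dictionary are made up for the example; they are not the real block serialization or a trained dictionary:

```python
import zlib
from typing import Optional

# Hypothetical dictionary: byte strings expected to recur across many small blocks.
DICTIONARY = (
    b'{"type":"vote_operation","value":{"voter":"","author":"","permlink":"","weight":10000}}'
    b'{"type":"custom_json_operation","value":{"required_posting_auths":[],"id":"","json":""}}'
)


def compress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Deflate one block, optionally priming the compressor with a dictionary."""
    if zdict is None:
        compressor = zlib.compressobj(level=9)
    else:
        compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(data) + compressor.flush()


def decompress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Inflate one block; the same dictionary must be supplied if one was used."""
    if zdict is None:
        decompressor = zlib.decompressobj()
    else:
        decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(data) + decompressor.flush()


# Made-up sample block that shares most of its structure with the dictionary.
block = (
    b'{"type":"vote_operation","value":{"voter":"alice","author":"bob",'
    b'"permlink":"hello-world","weight":10000}}'
)

plain = compress(block)
primed = compress(block, DICTIONARY)
print(f"raw {len(block)} bytes, no dictionary {len(plain)}, with dictionary {len(primed)}")
```

The trade-off is that the decompressor needs exactly the same dictionary, so it has to ship with every reader of the compressed data.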
Dictionary universality
Compressing 1M blocks (30,000,000 - 30,999,999) using the optimal dictionary computed for blocks in the 62M range.
Uncompressed total size: 7,551,559,082
Compressed total size, level 3, (no dictionary): 5,073,676,910 (32.8% savings)
Compressed total size, level 3, (custom 110K dictionary for 62M): 4,852,694,799 (35.7% savings)
Compressed total size, level 3, (custom 1M dictionary for 62M): 4,765,501,019 (36.9% savings)
Compressed total size, level 3, (custom 10K dictionary for 30M): 4,596,330,928 (39.1% savings)
Compressed total size, level 3, (custom 20K dictionary for 30M): 4,534,512,220 (40% savings)
Compressed total size, level 3, (custom 110K dictionary for 30M): 4,372,035,983 (42.1% savings)
Compressed total size, level 3, (custom 1M dictionary for 30M): 4,244,177,005 (43.8% savings)
The 30M dictionaries themselves compress decently, so the actual impact on the size of the hived binary is less than the raw dictionary size:
- 10K -> 5.7K
- 20K -> 10.7K
- 110K -> 67.5K
Testing with 1M blocks (62,000,000 - 62,999,999)
Uncompressed total size: 27,373,465,451
Compressed total size, level 3, (no dictionary): 15,773,922,290 (42.4% savings)
Compressed total size, level 3, (custom 24K dictionary): 14,872,229,623 (45.7% savings)
Compressed total size, level 3, (custom 55K dictionary): 14,742,732,372 (46.1% savings)
Compressed total size, level 3, (custom 75K dictionary): 14,705,713,318 (46.3% savings)
Compressed total size, level 3, (custom 110K dictionary): 14,681,782,146 (46.4% savings)
Compressed total size, level 3, (custom 220K dictionary): 14,647,581,139 (46.5% savings)
Compressed total size, level 3, (custom 1M dictionary): 14,497,049,180 (47% savings)
Compressed size of the dictionaries themselves:
- 24K -> 14.8K
- 55K -> 33.2K
- 75K -> 45.7K
- 110K -> 69.1K
- 220K -> 140.5K
- 1M -> 644.7K
Small sample, zstd vs brotli:
15,694 random blocks in the 62,900,000 to 62,999,999 range
uncompressed total size: 451,527,653
zstd, level 22, no dictionary: 250,884,419 (44.4% savings)
zstd, level 22, custom dictionary: 229,851,902 (49.1% savings)
brotli, quality 11, dictionary disabled: 232,747,016 (48.5% savings)
brotli, quality 11, default dictionary: 232,120,637 (48.6% savings)
brotli, quality 11, using same zstd custom dictionary: 215,893,016 (52.2% savings)
Final version
Run on a 5950X, using our chosen default zstd compression level of 15 and custom dictionaries:
$ compress_block_log --input-block-log=/path/to/blockchain --output-block-log=/path/to/compressed-block-log-63M --block-count=63999999 -j28
270210ms compress_block_log.cpp:255 drain_completed_queu ] Writer thread done writing compressed blocks to /storage2/compressed-block-log-63M/block_log, compressed 603915566332 to 308546620002
270210ms compress_block_log.cpp:410 main ] Total number of blocks by compression method:
270211ms compress_block_log.cpp:414 main ] uncompressed: 6461
270211ms compress_block_log.cpp:414 main ] zstd: 63961921
270211ms compress_block_log.cpp:417 main ] Total bytes if all blocks compressed by compression method:
270211ms compress_block_log.cpp:420 main ] zstd: 308546652312 bytes, total time: 74367890759μs, average time per block: 1162μs
real 50m26.867s
user 1239m43.586s
sys 27m50.635s
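As a sanity check on the final run, the headline figures can be recomputed from the logged totals, and the summed zstd compression time can be compared with the user CPU time reported by time. A sketch using only numbers copied from the output above; the variable names are illustrative:

```python
# Values copied from the compress_block_log and `time` output above.
uncompressed_bytes = 603915566332
compressed_bytes = 308546620002
zstd_blocks = 63961921
zstd_total_micros = 74367890759
real_seconds = 50 * 60 + 26.867
user_seconds = 1239 * 60 + 43.586

savings = 1.0 - compressed_bytes / uncompressed_bytes
# Integer division, to match the truncated per-block average printed in the log.
avg_micros_per_block = zstd_total_micros // zstd_blocks
# Input bytes processed per wall-clock second across all worker threads.
wall_throughput_mb_s = uncompressed_bytes / real_seconds / 1_000_000
# Summed zstd compression time as a share of the user CPU time.
compress_share_of_user = (zstd_total_micros / 1_000_000) / user_seconds

print(f"savings: {savings:.1%}")
print(f"average per block: {avg_micros_per_block} μs")
print(f"wall-clock throughput: {wall_throughput_mb_s:.0f} MB/s")
print(f"zstd time / user time: {compress_share_of_user:.3f}")
```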