hive_post_data table could be more efficient (minor issue for now, but creating to track this)
Out of curiosity, I took a look at where most of the toast in a fully populated HAF database comes from. Here's what I found:
SELECT oid::regclass,
reltoastrelid::regclass,
pg_relation_size(reltoastrelid) AS toast_size
FROM pg_class
WHERE relkind = 'r'
AND reltoastrelid <> 0
ORDER BY 3 DESC;
hivemind_app.hive_post_data | pg_toast.pg_toast_508315 | 42563878912
hafd.operations | pg_toast.pg_toast_16661 | 6388858880
hafd.hafbe_app_metadata | pg_toast.pg_toast_508701 | 32555008
pg_statistic | pg_toast.pg_toast_2619 | 4857856
hivemind_app.hive_accounts | pg_toast.pg_toast_508297 | 2408448
pg_rewrite | pg_toast.pg_toast_2618 | 1687552
pg_proc | pg_toast.pg_toast_1255 | 466944
hivemind_app.hive_notifs | pg_toast.pg_toast_508451 | 245760
hafd.operations_reversible | pg_toast.pg_toast_16770 | 81920
hivemind_app.hive_communities | pg_toast.pg_toast_508401 | 16384
hafd.shadow_hafd_hafbe_app_metadata | pg_toast.pg_toast_508712 | 8192
Since hive_post_data is derived almost entirely from hafd.operations I was curious why it has so much more toast than hafd.operations.
haf_block_log=# \dS+ hivemind_app.hive_post_data
Table "hivemind_app.hive_post_data"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
------------+-------------------------+-----------+----------+-------------------------------------------------------------------+----------+-------------+--------------+-------------
hive_rowid | bigint | | not null | nextval('hivemind_app.seq_hivemind_app_hive_post_data'::regclass) | plain | | |
id | integer | | not null | | plain | | |
title | character varying(512) | | not null | ''::character varying | extended | | |
preview | character varying(1024) | | not null | ''::character varying | extended | | |
img_url | character varying(1024) | | not null | ''::character varying | extended | | |
body | text | | not null | ''::text | extended | | |
json | text | | not null | ''::text | extended | | |
Indexes:
"hive_post_data_pkey" PRIMARY KEY, btree (id)
Inherits: hivemind_app.hivemind_app
Access method: heap
Options: fillfactor=70
I suspect the main reason is the "doubling" that takes place because of the preview and the body, but in any case, this doubling of data seems excessive and a better structure should be employed. Note this is also the biggest table in hivemind (165GB at the moment).