Draft: Refactor get_ranked_posts.sql to add a CTE to ensure posts get filtered by...
Refactor get_ranked_posts.sql to add a CTE to ensure posts get filtered by community_id first (fix big slowdown).
Merge request reports
Activity
mentioned in issue #260 (closed)
added 1 commit
- 8d5b2ac9 - Add faster observer check for muted accounts and refactor CTE in get_account_posts_by_replies
added 4 commits
-
8d5b2ac9...55f55884 - 2 commits from branch
develop
- 4340b63c - Refactor get_ranked_posts.sql to add a CTE to ensure posts get filtered by...
- c6530c18 - Add faster observer check for muted accounts and refactor CTE in get_account_posts_by_replies
-
8d5b2ac9...55f55884 - 2 commits from branch
added 1 commit
- 8b72422c - Refactor get_trending_ranked_posts_for_communities to use MATERIALIZED CTE for...
added 5 commits
- 0bb99780 - Refactor get_ranked_posts.sql to add a CTE to ensure posts get filtered by...
- e420d4d0 - Add faster observer check for muted accounts and refactor CTE in get_account_posts_by_replies
- 1b0edcb7 - Retag docker images for regular builds as /hive/hivemind, removing
- 9c4446a6 - Postgrest - restore LIMIT in queres which uses CTE and limit, appears that...
- 25fbbf1f - Refactor get_trending_ranked_posts_for_communities to use MATERIALIZED CTE for...
Toggle commit listadded 7 commits
-
25fbbf1f...e02e4640 - 3 commits from branch
develop
- 3753f7dd - Refactor get_ranked_posts.sql to add a CTE to ensure posts get filtered by...
- 98ef8474 - Add faster observer check for muted accounts and refactor CTE in get_account_posts_by_replies
- 4732eacf - Refactor get_trending_ranked_posts_for_communities to use MATERIALIZED CTE for...
- 30015e74 - Merge branch 'speedup-ranked-posts' of gitlab.syncad.com:hive/hivemind into speedup-ranked-posts
Toggle commit list-
25fbbf1f...e02e4640 - 3 commits from branch
added 1 commit
- 36c5854e - Fix formatting in get_account_posts.sql by removing merge conflict markers
added 1 commit
- 462c63e0 - Revert check related to muting, it is the replies being muted, not the posts.
This index speeds up get_account_posts_by_replies:
DROP INDEX IF EXISTS hivemind_app.hive_posts_parent_id_id_idx; CREATE INDEX IF NOT EXISTS hive_posts_parent_id_id_idx ON hivemind_app.hive_posts USING btree (parent_id ASC NULLS LAST, id DESC NULLS FIRST) INCLUDE (author_id) TABLESPACE haf_tablespace WHERE counter_deleted = 0;
The trick is it avoids having to read the hive_posts table to fetch the author_ids. There can be a lot of posts by some accounts (e.g. rentmoney had 30K+ posts).
added 1 commit
- ca020dc0 - Fix error in get_trending_ranked_posts_for_communities
One remaining known slow one is get_created_ranked_posts_for_observer_communities when the observer is not subscribed to any recent communities (or worse, not subscribed to any at all). We have an idea of how to fix it, but the first attempt didn't work, although it still seems like it should, so it needs more investigation as to why it is still slow (in the fix attempt it seems like it is still reading hive_posts when it should just use data from an index only scan). Here was fix attempt (not necessarily the last version but suggests the idea):
ROP FUNCTION IF EXISTS hivemind_postgrest_utilities.get_created_ranked_posts_for_observer_communities; CREATE FUNCTION hivemind_postgrest_utilities.get_created_ranked_posts_for_observer_communities(IN _post_id INT, IN _observer_id INT, IN _limit INT) RETURNS JSONB LANGUAGE 'plpgsql' STABLE AS $$ DECLARE _result JSONB; BEGIN _result = ( SELECT jsonb_agg ( hivemind_postgrest_utilities.create_bridge_post_object(row, 0, NULL, row.is_pinned, True) ) FROM ( -- get_created_ranked_posts_for_observer_communities WITH observer_subscriptions AS MATERIALIZED ( SELECT community_id FROM hivemind_app.hive_subscriptions WHERE account_id = _observer_id ), max_post_id as MATERIALIZED ( --find the max post id for each subscribed community SELECT COALESCE(MAX(hp.id), 0) as max_id FROM hivemind_app.live_posts_view hp JOIN observer_subscriptions os ON hp.community_id = os.community_id where hp.id < _post_id OR _post_id = 0 ), observer_posts as ( SELECT hp.id, blacklist.source FROM hivemind_app.live_posts_view hp JOIN hivemind_app.hive_accounts_view ha ON hp.author_id = ha.id JOIN observer_subscriptions hs ON hs.community_id = hp.community_id LEFT OUTER JOIN hivemind_app.blacklisted_by_observer_view blacklist ON (blacklist.observer_id = _observer_id AND blacklist.blacklisted_id = hp.author_id) WHERE NOT ha.is_grayed AND hp.id <= (SELECT max_id from max_post_id) AND NOT (_observer_id <> 0 AND EXISTS (SELECT 1 FROM hivemind_app.muted_accounts_by_id_view WHERE observer_id = _observer_id AND muted_id = hp.author_id)) ORDER BY hp.id DESC LIMIT _limit ) SELECT hp.id, hp.author, hp.parent_author, hp.author_rep, hp.root_title, hp.beneficiaries, hp.max_accepted_payout, hp.percent_hbd, hp.url, hp.permlink, hp.parent_permlink_or_category, hp.title, hp.body, hp.category, hp.depth, hp.promoted, hp.payout, hp.pending_payout, hp.payout_at, hp.is_paidout, hp.children, hp.votes, hp.created_at, hp.updated_at, hp.rshares, hp.abs_rshares, hp.json, hp.is_hidden, hp.is_grayed, hp.total_votes, hp.sc_trend, hp.role_title, hp.community_title, hp.role_id, hp.is_pinned, hp.curator_payout_value, hp.is_muted, observer_posts.source AS blacklists, hp.muted_reasons FROM observer_posts, LATERAL hivemind_app.get_post_view_by_id(observer_posts.id) hp ORDER BY hp.id DESC LIMIT _limit ) row ); RETURN COALESCE(_result, '[]'::jsonb); END $$ ;
Edited by Dan Notestein
I wanna ask about using
MATERIALIZED
. I learnt that I shouldn't use it. I cannot paste you any example because it was some time ago, but when I had a case which I examined I found out that MATERIALIZED slightly increases execution time. In other cases which I checked this didn't have influence on execution time, so I decided to removeMATERIALIZED
. I see you put them back, but now I also try to analize execution time with and withoutMATERIALIZED
and I still cannot see significant differs in exectution time. Can you explain my why you use that?Edited by Andrzej SuchSure, you use MATERIALIZED if you want to enforce your CTE (it will not bring stuff from outside the CTE in). So when you use materialize, you are saying you are smarter than the planner about the order to do things. That's frequently a dangerous thing to do, especially as things can change over time, unless you've checked every possibility and really thought it through.
@dan I guess we should merge some improvements into develop. Will you merge this MR into develop or should I merge mine ( !783 (merged) ) after applying your changes into my MR?
BTW you have an error in pipeline and I think you wanted to do something like that in
get_trending_ranked_posts_for_communities
:community_posts as MATERIALIZED ( SELECT id, author_id, sc_trend, is_pinned FROM hivemind_app.live_posts_view WHERE community_id = (SELECT id FROM hivemind_app.hive_communities WHERE name = _tag) AND NOT is_paidout --use index hive_posts_community_id_not_is_paidout_idx ),
!783 (merged) has been merged. I checked final version of every query in this MR and if I saw better efficiency I put it into my MR. I pasted examples of queries there.
As a side note, we should check all the hive_post indexes for possible include of author_id (which is always needed for mute/blacklist checking) whenever that would allow a top sub query to do an "index ONLY scan".
Edited by Dan Notesteinmentioned in merge request !783 (merged)