Change "errors" for missing data to "warning" or "info" to avoid postgres logging them

changed milestone to %JulyHivemind

This should be a simple issue, so I'd like to get it done quickly and deploy it to production to lower log noise levels.

changed title from Consider changing "errors" for missing data to "warning" or "info" to avoid logging them to Change "errors" for missing data to "warning" or "info" to avoid postgres logging them

Probably fix this similar issue at same time:

2024-03-28 18:42:27 UTC [595]: app=hive_server STATEMENT:  SELECT * FROM hivemind_app.condenser_get_content('demo.com', 'hytyopzgem')
2024-03-28 18:42:27 UTC [522]: app=hive_server ERROR:  Post deepu7/travel-to-nobabgonj-or-or-meeting-up-with-my-school-friends-after-two-decades was deleted 2 time(s)
2024-03-28 18:42:27 UTC [522]: app=hive_server CONTEXT:  PL/pgSQL function hivemind_app.find_comment_id(character varying,character varying,boolean) line 24 at RAISE
        PL/pgSQL function hivemind_app.find_votes(character varying,character varying,integer) line 4 at assignment

dan
12:04 PM
For logging, I was referring to errors thrown when someone requests a comment that hasn't been seen yet by haf
From the logs, I think these are thrown as ERROR level
But considering as error level for this case seems a bit excessive
Here's an example: 2024-03-28 17:40:19 UTC [2291]: app=hive_server STATEMENT: SELECT * FROM hivemind_app.condenser_get_content('steemychicken1', 'pglwzhbavs')
2024-03-28 17:40:19 UTC [593]: app=hive_server ERROR: Post travoved/ynsbhlpfgx does not exist
Logs are full of these
I mean, makes sense to raise an exception, but probably can be done with a lower severity level?
These logs are shown in postgres.log, btw, I'm not talking about hivemind server log, which is separate issue
I want to keep logging level at error for postgres, otherwise we could miss important problems
wrona
wrona
12:19 PM
hmm the problem is that we are using RAISE EXCEPTION to break processing in such cases...
wrona
wrona
12:38 PM
this construct is used for last few years... Each our custom error has associated custom error code, but I don't if it allows to eliminate them from logs
dan
dan
1:17 PM
Yeah, I was hoping just the logging level of the RAISE could be changed, but I looked it up just now and only the "EXCEPTION" level does an actual "throw" type behavior.
I'll dig into logging options, to see if there's some way to handle it, but I'm a bit pessimistic
I think we're using default for this (which is ERROR): log_min_error_statement (enum)
Controls which SQL statements that cause an error condition are recorded in the server log. The current SQL statement is included in the log entry for any message of the specified severity or higher. Valid values are DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, INFO, NOTICE, WARNING, ERROR, LOG, FATAL, and PANIC. The default is ERROR, which means statements causing errors, log messages, fatal errors, or panics will be logged. To effectively turn off logging of failing statements, set this parameter to PANIC. Only superusers and users with the appropriate SET privilege can change this setting.
So at first glance, it seems if we want to throw exceptions for what is effectively "normal" API calls, we're going to be spamming postgres logs a lot.

wrona
2:38 PM
maybe we should specify log_client_min_messages to cover all errors and on server log only fatal ? Hard to say how it will behave in python env. (generaly it's important for sync)

dan
2:39 PM
So you're saying we can (and probably should) log at different levels for the two types of sessions?
If that is feasible, seems like a workable solution.

wrona
2:42 PM
yes - otherwise we need some higher level logging system (i.e. grafana based developed by gmirek ) where it could be possible log out such errors by using their custom error codes

dan
2:44 PM
Ok, so I guess main problem with above idea (log_client_min_messages+log only fatal on server) is we would have to manually log these errors for each client. And also consider potential ways to aggregate such errors.
I also wonder a bit about how much overhead we may be introducing simply by logging all these call "errors". But hard to say on that point without some analysis.
And for now, probably more obvious slowdowns in python processing would obscure it's relative impact.

More info on this: hivemind currently throws exceptions and lets them leak out to the python code, which catches the exception and rethrows it as a python exception. This is then used to construct a json "error" response.

A quick test in SQL revealed that these exceptions only get logged if they aren't caught. But currently we don't catch them: we let them leak out to python, and that's why a log message is generated.

For the most problematic case (get_content), we could catch the error and return a NULL response since it's not a possible reply to the a call to get_content that finds a post, then the python code could construct the error (we only get one type of error for the most part, so even if other errors are thrown, logging them still would not be a problem). It is likely this solution wouldn't work for calls like get_content_replies which probably returns NULL when their are no comments on a post, but in practice we're not getting spammed by such calls, so it might be an acceptable bandaid for the issue.

A related issue is how we're handling this type of issue with postgrest. Are we catching the error at the postgres level and constructing an error msg, or are we leaking it out to haskell which then constructs the error message? This needs to be investigated.

A bigger problem is actually that this also causes even more spam in hivemind server logs (where whole stacks are getting dumped). And setting to warning level doesn't prevent it as these are errors, not warnings. So I think we need to implement the workaround at the very least.

After done research I found that raised exceptions does not appear in the server log when:

server log_min_messages is set to ERROR (that's our requirement)
set log_min_messages=FATAL to the function at which scope given errors should be ignored.

I did simple test and changed:

CREATE OR REPLACE FUNCTION hivemind_app.find_account_id(
	_account character varying,
	_check boolean)
    RETURNS integer
    LANGUAGE 'plpgsql'
    COST 100
    VOLATILE PARALLEL UNSAFE
    SET log_min_messages='fatal'
AS $BODY$

Then did a direct call:

select hivemind_app.find_account_id('xxxxxxxxhhhhhsdsddd', TRUE)

and indirect one:

select * from hivemind_app.account_notifications('xxxxxxxxhhhhhsdsddd'::text,0::smallint,0::bigint,0::smallint)

Even I can see errors on local console, they are not logged in postgres log file.

mentioned in merge request !688 (merged)

mentioned in merge request !690 (merged)

AFAIK this task is completed, so closing.

closed

Change "errors" for missing data to "warning" or "info" to avoid postgres logging them

Designs

Child items ...

Activity