profile_pid (and profile_queries) GUCs are reset to defaults when a parallel query is run #85
I think I can reproduce it well enough using this dump: https://github.com/credativ/omdb-postgresql/releases/download/2022-10-18/omdb.dump and these pgbench scripts: https://github.com/credativ/omdb-postgresql/tree/master/pgbench

Preparation (on Debian, using the

Now if I run
@mbanck-ntap Thank you for the report! I've reproduced the bug using your instructions.

It's not related to pg_wait_sampling_reset_profile(). It has to do with parallel workers and pg_wait_sampling storing its GUCs in shared memory. Storing GUC variables in shared memory looked odd, and thanks to your report we now know that it is completely broken. During initialization, a parallel worker temporarily resets GUCs to their defaults and then sets them to the values received from the leader (see RestoreGUCState). So there is a window during which the pg_wait_sampling GUCs in shared memory are reset by a parallel worker to their defaults. That is already a problem. Further, if during this window another session launches a parallel worker, that worker will receive the wrong current values (the defaults) from its leader and will set them permanently.

If I set max_parallel_workers to 0, I no longer observe the bug. Could you confirm this?

To check my reasoning, I added a delay in RestoreGUCState and reproduced the bug with just two sessions concurrently launching parallel workers:

```diff
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 0831d45aba3..1f9bf94ef5f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -6303,6 +6303,8 @@ RestoreGUCState(void *gucstate)
 		InitializeOneGUCOption(gconf);
 	}
 
+	pg_usleep(3000000L);
+
 	/* First item is the length of the subsequent data */
 	memcpy(&len, gucstate, sizeof(len));
```
repro.sh:

```shell
psql -c 'show pg_wait_sampling.profile_pid'
psql -c 'set debug_parallel_query = on; select count(*) from pg_class' &
sleep 1
psql -c 'set debug_parallel_query = on; select count(*) from pg_class'
wait
psql -c 'show pg_wait_sampling.profile_pid'
```

Output:
It's an ancient bug, possibly present since the very first version. We'll try to fix it, and I think we should simply stop using shared memory for the GUCs. Thanks again!
Ah yes, with

Also, setting

Regarding
We have discussed this issue, and the only way we found to remove this behaviour with parallel workers is to move the GUC variables from shared memory to local process memory, as they are everywhere else. This will bring the following changes in behaviour:

Having GUCs in shared memory is fun, but we prefer robust solutions that do not deviate from the standard usage rules of GUCs. If you have any objections (e.g. you actively use SETting pg_wait_sampling GUCs in your scripts), feel free to join the discussion.
Sounds good; at least we don't need to
Right now, even using SET, you change the collector's behaviour for ALL sessions: for the views and for the collected statistics. I think that even SET LOCAL currently affects everything. With the proposed change, the GUC value will change globally only in the "global" cases (re-reading the config) and not in the most "local" ones (using SET [LOCAL]).
Well in that case, changing it to
For more context and reasoning see issue #85
@mbanck-ntap Thank you for testing! I've tagged a release: https://github.com/postgrespro/pg_wait_sampling/releases/tag/v1.1.7
One of our customers complained that POWA's web interface was getting very slow to respond. They tracked it down to slow queries being run on the POWA database, which in turn were correlated with very large pg_wait_sampling data. POWA recommends turning off `pg_wait_sampling.profile_pid` (see https://powa.readthedocs.io/en/latest/components/stats_extensions/pg_wait_sampling.html#configuration), which they did.

What they noticed, though, was that even after turning off `profile_pid`, it was set back to `on` later on. They also mentioned that they ran the `pg_wait_sampling_reset_profile()` function multiple times to keep the POWA web interface responsive.

We could reproduce the problem that `profile_pid` is reset to `on` (the default) directly after calling `pg_wait_sampling_reset_profile()`, but only once: after reloading the PostgreSQL configuration, the GUC was back to `off`, but would then again be set to `on` when `pg_wait_sampling_reset_profile()` is called. However, we have not managed to find a way to reliably reproduce this from an empty instance or database.

We had a look at the code but so far did not see an obvious issue with how the configuration parameters are set, so we are opening this issue in case it rings a bell somewhere, or somebody knows what is going on and can fix it. We will continue to diagnose the problem next week.