errors when reading from the mq, possibly blocking '-m fast' restart #2
Hi Tomas, thank you very much for reporting!
Hi Tomas, I'm sorry for getting back to this so late.
This seems to be a concurrency issue, which should be resolved by 4fdf032. Could you please recheck?
setup_gucs() is a kind of black magic used to place the GUCs into shared memory and make them work correctly there (normally PostgreSQL doesn't allow this). In the future that magic should be removed, but it works for now.
The collector process calls proc_exit(0) only on SIGTERM, i.e. when the worker is shut down by an external request. proc_exit(0) seems to be the correct behavior in that case.
Will check. I'm running some other tests on the machine now, but hopefully I'll be able to do some testing next week.
I repeated the stress test and can't reproduce the original issue anymore, even after running it for 8 hours.
Hi Alexander,
I've done a review and a bit of testing of the extension today, and I've run into some strange issues in high-concurrency environments. Essentially, I have two pgbench tests running at the same time:
- a regular pgbench with 72 clients, using the standard workload (so "pgbench -c 72 ...")
- a pgbench reading the collected wait data, running this custom SQL script with 16 clients:

    select count(*) from pg_wait_sampling_current;
    select count(*) from pg_wait_sampling_history;
    select count(*) from pg_wait_sampling_profile;
After a short while, I get these errors in the second pgbench:
What's worse, running "pg_ctl restart" on the cluster times out. There's no CPU or I/O activity, so the cluster should restart without any issue; I suspect some locking problem caused by the mq read failures.
Regarding the code: I'm not sure what the purpose of setup_gucs() is. Why not simply define the GUC variables? Moreover, get_guc_variables() is only meant to be used in help_config.c (per the comment in guc.c).
Also, shouldn't the bgworker main function call proc_exit(1) instead of proc_exit(0)? At least that's what the other workers I've seen do.