
Operator doesn't wait for successful queue synchronization on shutdown #1595

Closed
ViliusS opened this issue Mar 19, 2024 · 7 comments · Fixed by #1611
Comments

@ViliusS
Contributor

ViliusS commented Mar 19, 2024

Describe the bug

On graceful shutdown, the operator currently calls `rabbitmq-upgrade await_online_quorum_plus_one -t X; rabbitmq-upgrade await_online_synchronized_mirror -t X; rabbitmq-upgrade drain -t X` (where X is the graceful shutdown timeout). It was probably meant to be `rabbitmq-upgrade await_online_quorum_plus_one -t X && rabbitmq-upgrade await_online_synchronized_mirror -t X && rabbitmq-upgrade drain -t X`, so that the drain is executed only after the synchronization checks return success (0) and the whole preStop hook is evaluated as one true/false condition.
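
For illustration only, the chained form would look like this (a sketch; the timeout value below is a placeholder, not whatever value the operator actually computes):

```shell
# Sketch of the proposed preStop chain: each step runs only if the previous
# one exited 0, so the hook's exit code reflects the whole sequence.
# The -t value is a placeholder timeout.
rabbitmq-upgrade await_online_quorum_plus_one -t 604800 \
  && rabbitmq-upgrade await_online_synchronized_mirror -t 604800 \
  && rabbitmq-upgrade drain -t 604800
```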

@ViliusS added the bug label Mar 19, 2024
@lukebakken
Contributor

Note: do NOT delete or convert this issue to a discussion. I instructed @ViliusS to open this issue, based on this rabbitmq-users thread - https://groups.google.com/g/rabbitmq-users/c/YgP9aTBaRVs

@lukebakken self-assigned this Mar 20, 2024
@Zerpet
Member

Zerpet commented Mar 26, 2024

I don't think it will make a difference. See, for example, the hands-on exercise from the Kubernetes docs: https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/

There they set a pre-stop hook for NGINX using multiple commands separated by `;`. The pre-stop hook must complete, meaning that all commands must succeed. If any command within the pre-stop hook fails, it will be retried until all commands succeed.

Do you have evidence that our pre-stop hook is not working as intended?

@ViliusS
Contributor Author

ViliusS commented Mar 26, 2024

I cannot verify it with these exact commands, but it should make a difference. Here is a simple test with one command returning success (0) and one fake command returning a non-zero exit code.

```console
[]$ fake_command;
-bash: fake_command: command not found
[]$ echo $?
127
[]$ touch /tmp
[]$ echo $?
0
[]$ fake_command; touch /tmp
-bash: fake_command: command not found
[]$ echo $?
0
[]$ fake_command && touch /tmp
-bash: fake_command: command not found
[]$ echo $?
127
```

See how `fake_command; touch /tmp` still returns 0 even though the first command failed.

The difference in the Kubernetes docs is that they use `while killall` together with `;` to 1) check for exit code 0, and 2) form a loop. They use a loop because `nginx -s stop` is probably asynchronous.
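
For reference, the hook on that docs page is roughly the following (paraphrased; the docs actually use `nginx -s quit`, NGINX's graceful-shutdown signal):

```shell
# Paraphrase of the NGINX pre-stop hook from the Kubernetes docs page.
# `nginx -s quit` returns immediately, so the loop polls with `killall -0`
# (signal 0 = existence check) until the process is gone. The script's exit
# status comes from the loop, which ends cleanly, so `;` is safe here in a
# way it is not for the RabbitMQ hook.
nginx -s quit; while killall -0 nginx; do sleep 1; done
```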

@Zerpet
Member

Zerpet commented Apr 8, 2024

My understanding is that all commands in the preStop hook must succeed, otherwise the hook is considered failed. I don't have a strong opinion on this, since the proposal is either equivalent, or it fixes the logic in case my understanding is incorrect.

@ViliusS
Contributor Author

ViliusS commented Apr 8, 2024

The preStop hook has no knowledge of how many commands there are. All it does is verify the exit code of the full script, be it on Linux or Windows; otherwise Kubernetes would need to implement full shell parsing for every OS it can run on.
How this works on Linux we already know: you have to use `&&` if you want the final exit code to reflect success of every part of the script.
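
Assuming the hook runs its command string through a single `bash -c` invocation, the runtime only sees the script's overall exit code, and the difference is easy to demonstrate (illustrative commands only):

```shell
# With ';' the script's exit code is only that of the last command:
bash -c 'false; true'; echo $?    # prints 0 even though `false` failed
# With '&&' a failure anywhere fails the whole script:
bash -c 'false && true'; echo $?  # prints 1
```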

@lukebakken
Contributor

I thought I commented the other day but I guess I didn't hit "Comment"...

Anyway, @ViliusS please feel free to submit a PR to change the preStop hook to use `&&`. Thanks.

@ViliusS
Contributor Author

ViliusS commented Apr 8, 2024

Done. I didn't want to open a PR initially because I don't really have Go knowledge or a way to fully test this. Hopefully the PR is OK.
