feat: adding graceful shutdown for druid process #23

sydefz · 2024-05-22T09:52:47Z

Description of changes:

Sometimes, after the historical process is stopped, the host is terminated immediately. This causes in-flight requests to fail and results in 500s.

This PR adds a 30s sleep after the historical/query/master process is stopped, so the process has enough time to shut down gracefully before the host is terminated. It also updates the supervisord config to wait 30s before sending the KILL signal.

I chose 30s because the default druid.server.http.gracefulShutdownTimeout is PT30S (https://druid.apache.org/docs/latest/configuration/#historical-query-configs).

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

dougtoppin · 2024-05-26T16:24:10Z

@sydefz Thank you for your PR. We will take a look at it and get back to you.

sydefz · 2024-06-01T07:04:16Z

Hi @dougtoppin, any updates on this?

dougtoppin · 2024-06-03T00:17:20Z

@sydefz Sorry about the delay. We have been working on some other tasks. We will get to this soon.
Thanks for your submission.

msalman-atl · 2024-06-06T03:09:23Z

source/lib/uploads/scripts/druid/terminate_druid_node.sh

@@ -109,6 +109,8 @@ waitForProcess() {
        else
            echo "The new node is up. Stopping old node..."
            $SUPERVISORCTL_CMD stop $process_name
+            # wait gracefulShutdownTimeout for 30 seconds
+            sleep 30


Can we pass in stopwaitsecs from node config into this script instead of hardcoding it here?

sydefz · 2024-06-06

Yes, that's a good idea. Eventually we should avoid hardcoding the value and make it configurable, so the script can handle shorter or longer graceful shutdown periods.

The priority of this PR is to ensure we don't throw 500s during deployment. I'm happy to extend it later to make the wait configurable.
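
For illustration only, here is a minimal sketch of how the wait could be made configurable. It assumes a hypothetical environment variable, GRACEFUL_SHUTDOWN_WAIT_SECS, and hypothetical helper functions resolve_wait_secs and stop_and_wait. None of these exist in this PR or the repository. Only SUPERVISORCTL_CMD comes from terminate_druid_node.sh.

```shell
#!/usr/bin/env bash
# Sketch only: GRACEFUL_SHUTDOWN_WAIT_SECS, resolve_wait_secs and
# stop_and_wait are hypothetical names, not part of this PR.

# Print the wait in whole seconds; fall back to the PR's 30s default
# when the value is empty or not a non-negative integer.
resolve_wait_secs() {
    local value="${1:-30}"
    case "$value" in
        *[!0-9]*) echo 30 ;;
        *) echo "$value" ;;
    esac
}

# Stop a supervisord-managed process, then wait before the caller
# goes on to terminate the host.
stop_and_wait() {
    local process_name="$1"
    local wait_secs
    wait_secs="$(resolve_wait_secs "${GRACEFUL_SHUTDOWN_WAIT_SECS:-}")"
    # SUPERVISORCTL_CMD is the variable already used by the script.
    $SUPERVISORCTL_CMD stop "$process_name"
    echo "Waiting ${wait_secs}s for graceful shutdown..."
    sleep "$wait_secs"
}
```

With this shape, the hunk above would call `stop_and_wait "$process_name"` instead of the hardcoded `sleep 30`. The value would need to be kept in line with stopwaitsecs and druid.server.http.gracefulShutdownTimeout by whoever sets it.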

msalman-atl · 2024-06-06T03:11:41Z

source/lib/config/user_data/data_user_data

@@ -58,6 +58,7 @@ user=${USER_NAME}
 autorestart=true
 redirect_stderr=true
 stdout_logfile=/var/log/supervisor/historical.log
+stopwaitsecs=30


Can we put this in common_user_data so we don't have to repeat it for each individual service?

sydefz · 2024-06-06

The supervisord config for each Druid component is generated in its own user_data file. The configs differ slightly, so bringing these customisations into common_user_data would defeat the purpose of having a common file.

I'd keep it like this for now.

van-vothanh · 2024-06-06T08:46:35Z

Thanks for the submission, @sydefz.
I've verified it with our solution pipeline and everything is working 👍

sydefz added 2 commits May 22, 2024 19:46

feat: adding graceful shutdown for historical

255562a

Add up to 30s wait on supervisord stop

5581ad0

sydefz changed the title ~~feat: adding graceful shutdown for historical~~ feat: adding graceful shutdown for druid process May 22, 2024

msalman-atl reviewed Jun 6, 2024

van-vothanh self-requested a review June 6, 2024 08:45

van-vothanh approved these changes Jun 6, 2024

dougtoppin mentioned this pull request Jul 3, 2024

Update to version v1.0.1 #25

Merged

fhoueto-amz mentioned this pull request Jul 3, 2024

Update to version v1.0.1 #26

Merged

fhoueto-amz closed this Jul 10, 2024
