
cardano-db-sync container doesn't gracefully shut down #1945


Open
TrevorBenson opened this issue Feb 19, 2025 · 5 comments
Labels
bug Something isn't working

Comments

@TrevorBenson
Contributor

OS
Your OS: Ubuntu & Rocky Linux

Versions
The db-sync version (eg cardano-db-sync --version): 13.6.0.4
PostgreSQL version: 15.10

Build/Install Method
The method you use to build or install cardano-db-sync: ghcr.io/intersectmbo/cardano-db-sync:13.6.0.4 container image.

Run method
The method you used to run cardano-db-sync (eg Nix/Docker/systemd/none): [docker|podman] run or by creating Quadlet (podman-systemd) based unit files.

Problem Report
The cardano-db-sync Docker container doesn't shut down gracefully when docker stop is used. Instead of responding to SIGTERM, it is eventually killed with SIGKILL once the 10-second grace period expires, which may lead to unclean shutdowns.

Expected behavior

The container should shut down gracefully when it receives SIGTERM. cardano-db-sync itself shuts down gracefully on SIGINT, so the stop request needs to reach the application as a SIGINT.

Current behavior

The cardano-db-sync process inside the container doesn't respond to SIGTERM. The scripts that launch cardano-db-sync don't forward signals or handle shutdown requests, so the main process never receives a SIGINT, and the container is forcibly killed.

Additional context

The TL;DR breakdown:

# podman top preview-db-sync 
USER        PID         PPID        %CPU        ELAPSED              TTY         TIME        COMMAND
root        1           0           0.000       20h12m56.722117998s  ?           0s          /nix/store/izpf49b74i15pcr9708s3xdwyqs4jxwl-bash-5.2p32/bin/bash /nix/store/4k0flcssdaway1pzs39valv7278r585s-cardano-db-sync-preview/bin/cardano-db-sync-preview 
root        6           1           4.659       20h12m56.722279625s  ?           56m31s      /nix/store/8yrmmrdl0zgpzish0pfcji3630lgqrmx-cardano-db-sync-exe-cardano-db-sync-13.6.0.4/bin/cardano-db-sync --config /nix/store/f3mggncdz0284z9cykzv8nd1ccq31n7i-db-sync-config.json --socket-path /node-ipc/node.socket --schema-dir /nix/store/npsidz34y67jp7sc07b2iw7s2n3fp9lj-schema --state-dir /var/lib/cexplorer 
  1. The container entrypoint uses exec when calling the cardano-db-sync-${network} bash script.
    elif [[ "$NETWORK" == "${env}" ]]; then
    echo "Connecting to network: ${env}"
    exec ${dbSyncScript}/bin/${dbSyncScript.name}
    echo "Cleaning up"
  2. The cardano-db-sync-${network} bash script:
    • Does not use exec (which would let cardano-db-sync replace the script as the container's primary process), and does not use trap to forward signals to the cardano-db-sync process it starts.
      #!${runtimeShell}
      set -euo pipefail
      ${service.script} $@
  3. The cardano-db-sync binary does not appear to handle SIGTERM.
  4. Even setting the stop signal with --stop-signal=SIGINT when creating the container would not change the behavior, as the signal would only reach the bash wrapper script for the given network/environment, not the binary.
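
To make points 1 and 2 concrete, here is a minimal bash model of the launch chain. It is an illustration only: the entrypoint, wrapper, and binary scripts and the make_chain/run_chain helpers are hypothetical stand-ins, not the Nix-generated scripts, and each stage merely prints its PID. The scripts are started with bash explicitly, so the temporary directory does not need exec permission.

```shell
#!/usr/bin/env bash
# Hypothetical model of the launch chain: "entrypoint" execs "wrapper",
# and "wrapper" either execs "binary" or starts it as a child process.
# Each stage prints "<name> <pid>" so distinct processes are easy to spot.

# make_chain DIR MODE: write the three stand-in scripts into DIR.
# MODE is "exec" (wrapper execs binary) or "child" (no exec, as today).
make_chain() {
  local dir=$1 mode=$2
  echo 'echo "binary $$"' > "$dir/binary"
  if [[ $mode == exec ]]; then
    # exec: the binary replaces the wrapper and inherits its PID.
    printf '%s\n' 'echo "wrapper $$"' \
      'exec bash "$(dirname "$0")/binary" "$@"' > "$dir/wrapper"
  else
    # child: no exec, so the binary gets a PID of its own.
    printf '%s\n' 'echo "wrapper $$"' \
      'bash "$(dirname "$0")/binary" "$@"' 'exit $?' > "$dir/wrapper"
  fi
  # Like the current entrypoint, this stand-in always uses exec.
  printf '%s\n' 'echo "entrypoint $$"' \
    'exec bash "$(dirname "$0")/wrapper" "$@"' > "$dir/entrypoint"
}

# run_chain MODE: build the chain in a temp dir, run it, then clean up.
run_chain() {
  local dir
  dir=$(mktemp -d)
  make_chain "$dir" "$1"
  bash "$dir/entrypoint"
  rm -rf "$dir"
}
```

run_chain child prints the same PID for entrypoint and wrapper but a different one for binary, mirroring the podman top output above (the wrapper is PID 1, cardano-db-sync is PID 6). run_chain exec prints one PID for all three stages; inside a container that PID would be 1, so a stop signal would be delivered to the binary itself.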

To Reproduce

  1. Run the cardano-db-sync Docker container.
  2. Issue [docker|podman] stop <container_id>.
  3. Observe that:
    • With docker, the stop takes 10 seconds because it reaches the stop timeout, at which point SIGKILL is sent.
    • With podman, the stop prints a warning after 10 seconds stating that it is resorting to SIGKILL:
      WARN[0010] StopSignal SIGTERM failed to stop container cardano-db-sync in 10 seconds, resorting to SIGKILL.
      
@TrevorBenson TrevorBenson added the bug Something isn't working label Feb 19, 2025
@TrevorBenson
Contributor Author

TrevorBenson commented Feb 19, 2025

This might relate to #1010, which references Docker containers quite often; stopping those containers would likewise end with a SIGKILL. That said, I'm unsure whether the volatile data handling (or anything else) differs significantly between SIGINT and SIGKILL.


The Docker image is built using Nix's dockerTools.buildImage, which currently doesn't appear to support setting the STOPSIGNAL instruction (unless this is just an undocumented feature).

If setting STOPSIGNAL were possible, I would normally suggest that the image use STOPSIGNAL SIGINT and that line 96 simply use exec ${service.script} $@ (similar to the entrypoint, which calls exec ${dbSyncScript}/bin/${dbSyncScript.name}). For reference, the current wrapper script is:

#!${runtimeShell}
set -euo pipefail
${service.script} $@

The next thing that comes to mind would be signal handling in the wrapper script (i.e. ${dbSyncScript.name}). Maybe something like:

https://github.com/TrevorBenson/cardano-db-sync/blob/9fb8a6b9c470fca73ee5dab9938b71b5468c87aa/nix/docker.nix#L94-L117

Since the permalink isn't embedding, here is a copy:

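          #!${runtimeShell}
          set -euo pipefail

          process_id=""

          handle_sigterm() {
            echo "SIGTERM received. Sending SIGINT to process..."
            if [[ -n "$process_id" ]]; then
              kill -INT "$process_id" 2>/dev/null || true
              # Wait for the child to finish its graceful shutdown; if PID 1
              # exited right away, the kernel would kill the child with it.
              wait "$process_id" || true
              echo "Process exited. Exiting."
            else
              echo "No process id found. Exiting."
              exit 1
            fi
            exit 0
          }

          trap handle_sigterm TERM

          # Without job control, bash starts background jobs with SIGINT
          # ignored, so this relies on the program installing its own handler.
          ${service.script} "$@" &
          process_id="$!"

          # Block on the child instead of a sleep loop: the TERM trap runs as
          # soon as the signal arrives, and the script ends when the child does.
          exit_code=0
          wait "$process_id" || exit_code=$?
          exit "$exit_code"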
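
On the design of a signal-forwarding wrapper like this: bash defers a trap while a foreground command such as sleep is running, but when the script is blocked in the wait builtin, wait returns as soon as a trapped signal arrives and the trap runs immediately. The standalone sketch below measures that difference; the reaction_time function and the 3-second sleep workload are hypothetical, and it assumes a sleep that accepts fractional seconds (as GNU coreutils sleep does).

```shell
#!/usr/bin/env bash
# Sketch: how quickly does a script with a TERM trap react to SIGTERM when it
# is blocked in `wait` versus running a foreground command?

# reaction_time MODE: print whole seconds between sending SIGTERM to a child
# shell and that shell exiting. MODE is "wait" or "foreground".
reaction_time() {
  local mode=$1 pid start
  bash -c '
    if [[ $1 == wait ]]; then
      sleep 3 >/dev/null & child=$!
      # wait returns as soon as the trapped signal arrives; the trap then
      # stops the stand-in workload and exits.
      trap "kill \$child; exit 0" TERM
      wait "$child"
    else
      # The trap only runs after the foreground sleep has finished.
      trap "exit 0" TERM
      sleep 3
    fi
    exit 0
  ' _ "$mode" &
  pid=$!
  sleep 0.5   # give the child shell time to install its trap
  start=$SECONDS
  kill -TERM "$pid"
  wait "$pid"
  echo $((SECONDS - start))
}
```

Here reaction_time wait prints 0 or 1, while reaction_time foreground prints 2 or 3 (the rest of the 3-second sleep). Blocking on wait with the child's PID also means the wrapper notices when the child exits on its own, which a bare while true; do sleep 1; done loop never does.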

If this approach is welcome, I can open a PR based on this branch.

@TrevorBenson
Contributor Author

TrevorBenson commented Feb 20, 2025

I took a closer look at the code for dockerTools.buildImage and realized I had overlooked config as a way to set image configuration such as the stop signal. Unless keeping ${dbSyncScript.name} as PID 1 is preferred for some reason, I believe this part of the base image definition:

  baseImage = dockerTools.buildImage {
    name = "cardano-db-sync-base-env";
    config.Env = [ "NIX_SSL_CERT_FILE=${cacert}/etc/ssl/certs/ca-bundle.crt" ];

could have line 27 modified to:

  config = {
    Env = [ "NIX_SSL_CERT_FILE=${cacert}/etc/ssl/certs/ca-bundle.crt" ];
    StopSignal = "SIGINT";
  };

to set the STOPSIGNAL instruction. Then, if line 96 also used exec, PID 1 inside the container would be the cardano-db-sync binary itself, which shuts down gracefully when it receives an interrupt signal.

If I got the syntax right, I'd happily adjust my branch and open a PR so the containers shut down gracefully instead of reaching the stop timeout (the default, or the value set by --stop-timeout) and having the process killed.

@sgillespie
Contributor


This looks good to me. I'll do a bit of testing in the meantime. PS: It looks like you need to set up commit signing.

@TrevorBenson
Contributor Author


New laptop; I must not have migrated my global git config. I'll sign and force-push shortly.

@TrevorBenson
Contributor Author

@sgillespie Commit is now signed.
