cardano-db-sync container doesn't gracefully shut down #1945

TrevorBenson · 2025-02-19T20:59:47Z

OS
Your OS: Ubuntu & RockyLinux

Versions
The db-sync version (eg cardano-db-sync --version): 13.6.0.4
PostgreSQL version: 15.10

Build/Install Method
The method you use to build or install cardano-db-sync: ghcr.io/intersectmbo/cardano-db-sync:13.6.0.4 container image.

Run method
The method you used to run cardano-db-sync (eg Nix/Docker/systemd/none): [docker|podman] run or by creating Quadlet (podman-systemd) based unit files.

Problem Report
The cardano-db-sync Docker container doesn't shut down gracefully when docker stop is used. Instead of responding to SIGTERM, it eventually gets killed by SIGKILL after the 10-second grace period. This may lead to unclean shutdowns.

Expected behavior

The container should shut down gracefully upon receiving SIGTERM. The cardano-db-sync application itself typically shuts down gracefully in response to a SIGINT signal.

Current behavior

The cardano-db-sync process inside the container doesn't respond to SIGTERM. The scripts that launch cardano-db-sync don't forward signals or handle shutdown requests, so the main process never receives a SIGINT, and the container is forcibly killed.

Additional context

The TL;DR breakdown:

# podman top preview-db-sync 
USER        PID         PPID        %CPU        ELAPSED              TTY         TIME        COMMAND
root        1           0           0.000       20h12m56.722117998s  ?           0s          /nix/store/izpf49b74i15pcr9708s3xdwyqs4jxwl-bash-5.2p32/bin/bash /nix/store/4k0flcssdaway1pzs39valv7278r585s-cardano-db-sync-preview/bin/cardano-db-sync-preview 
root        6           1           4.659       20h12m56.722279625s  ?           56m31s      /nix/store/8yrmmrdl0zgpzish0pfcji3630lgqrmx-cardano-db-sync-exe-cardano-db-sync-13.6.0.4/bin/cardano-db-sync --config /nix/store/f3mggncdz0284z9cykzv8nd1ccq31n7i-db-sync-config.json --socket-path /node-ipc/node.socket --schema-dir /nix/store/npsidz34y67jp7sc07b2iw7s2n3fp9lj-schema --state-dir /var/lib/cexplorer

The container entrypoint uses exec when calling the cardano-db-sync-${network} bash script.

cardano-db-sync/nix/docker.nix

Lines 83 to 86 in 0f1d93f

    
    elif [[ "$NETWORK" == "${env}" ]]; then
      echo "Connecting to network: ${env}"
      exec ${dbSyncScript}/bin/${dbSyncScript.name}
      echo "Cleaning up"

The cardano-db-sync-${network} bash script:

It does not use exec, so the bash wrapper remains the primary process (PID 1) of the container, and it does not use trap to pass signals along to the cardano-db-sync process it starts.

cardano-db-sync/nix/docker.nix

Lines 94 to 96 in 0f1d93f

    
    #!${runtimeShell}
    set -euo pipefail
    ${service.script} $@

The cardano-db-sync binary does not appear to handle SIGTERM.
Even correcting the stop signal via --stop-signal=SIGINT when creating the container would not change the behavior, as the signal would only reach the bash wrapper script for the given network/environment, not the binary.

To Reproduce

1. Run the cardano-db-sync Docker container.
2. Issue [docker|podman] stop <container_id>.
3. Observe that:
- With docker the container stop takes 10 seconds, reaching its stop timeout, so SIGKILL would be sent.
- With podman the container stop returns a warning after 10 seconds, notifying the user it resorts to SIGKILL.
```
WARN[0010] StopSignal SIGTERM failed to stop container cardano-db-sync in 10 seconds, resorting to SIGKILL.
```
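To put a number on the reproduction steps above, the stop can be timed. This is only a rough sketch: `elapsed_seconds` is a hypothetical helper, and the container name in the usage comment is a placeholder, not something taken from the repository.

```shell
# Hypothetical helper: run a command and print how many whole seconds it took.
elapsed_seconds() {
  start=$(date +%s)
  # Discard the command's own output and ignore its exit status;
  # only the duration matters here.
  "$@" >/dev/null 2>&1 || true
  end=$(date +%s)
  echo $((end - start))
}

# Usage (container name is a placeholder):
#   elapsed_seconds docker stop cardano-db-sync
# A result close to the stop timeout (10 seconds by default) suggests the
# runtime had to fall back to SIGKILL instead of the process exiting on its own.
```

With a working graceful shutdown the reported duration should be well below the stop timeout.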


TrevorBenson · 2025-02-19T21:22:40Z

This might relate to #1010, which references Docker containers quite often; stopping those containers would likewise end with a SIGKILL. That said, I'm unsure whether the volatile data handling (or anything else) differs significantly between receiving SIGINT and SIGKILL.

The Docker image is built using Nix's dockerTools.buildImage, which currently doesn't appear to support setting the STOPSIGNAL instruction (unless this is just an undocumented feature).

If setting STOPSIGNAL were possible, I would normally suggest that the image use STOPSIGNAL SIGINT and that line 96 simply use exec ${service.script} $@ (similar to the entrypoint, which calls exec ${dbSyncScript}/bin/${dbSyncScript.name}):

cardano-db-sync/nix/docker.nix

Lines 94 to 96 in 0f1d93f

    
    #!${runtimeShell}
    set -euo pipefail
    ${service.script} $@
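The effect of adding exec can be seen outside of any container. The sketch below uses two hypothetical helper functions (not part of the repository) that print the PID of a wrapper shell followed by the PID seen by the command it launches. With exec the launched command replaces the wrapper and keeps its PID, which is why it would end up as the process that receives the container's stop signal.

```shell
# Wrapper launches its command with exec: the command replaces the wrapper,
# so both lines show the same PID.
pids_with_exec() {
  bash -c 'echo $$; exec bash -c "echo \$\$"'
}

# Wrapper launches its command as a child: the two PIDs differ.
# The trailing "true" keeps bash from implicitly exec-ing the last command.
pids_without_exec() {
  bash -c 'echo $$; bash -c "echo \$\$"; true'
}
```

Calling `pids_with_exec` prints the same PID twice, while `pids_without_exec` prints two different PIDs.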

The next thing that comes to mind would be signal handling in the wrapper script (i.e. ${dbSyncScript.name}). Maybe something like:

https://github.com/TrevorBenson/cardano-db-sync/blob/9fb8a6b9c470fca73ee5dab9938b71b5468c87aa/nix/docker.nix#L94-L117

Since the permalink isn't embedding, here is a copy:

          #!${runtimeShell}
          set -euo pipefail

          handle_sigterm() {
            echo "SIGTERM received. Sending SIGINT to process..."
            if [[ -n "$process_id" ]]; then
              kill -INT "$process_id" 2>/dev/null
              echo "SIGINT sent. Exiting."
            else
              echo "No process id found. Exiting."
              exit 1
            fi
            exit 0
          }

          trap handle_sigterm TERM

          ${service.script} $@ &

          process_id="$!"

          while true; do
            sleep 1  # Small sleep to avoid busy-waiting
          done
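As a possible variation on the script above, here is a minimal sketch that waits on the child instead of sleeping in a loop, so the wrapper exits only after the child has exited and returns the child's exit status. The function name `forward_term_as_int` is hypothetical, bash is assumed, and none of this has been tested against cardano-db-sync itself. It also enables job control, because bash starts background commands with SIGINT ignored when job control is off, which may prevent some children from reacting to the forwarded signal.

```shell
# Hypothetical helper, intended as the last step of a wrapper script:
# run a command as a background child, translate SIGTERM into SIGINT for it,
# and return the child's exit status once it has exited.
forward_term_as_int() {
  # With job control on, the background child does not start with SIGINT ignored.
  set -m

  "$@" &
  child_pid=$!

  # On SIGTERM, ask the child to shut down via SIGINT and remember that we did.
  term_received=0
  trap 'term_received=1; kill -INT "$child_pid" 2>/dev/null' TERM

  # wait returns early when a trapped signal arrives, so wait again in that
  # case to pick up the child's real exit status.
  while :; do
    term_received=0
    child_status=0
    wait "$child_pid" || child_status=$?
    [ "$term_received" -eq 1 ] || break
  done

  trap - TERM
  return "$child_status"
}
```

Compared with exec plus a SIGINT stop signal, this keeps the wrapper as PID 1, at the cost of more moving parts.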

If this is a welcome approach I can open a PR based off this branch.

TrevorBenson · 2025-02-20T21:37:21Z

I took a closer look at the code for dockerTools.buildImage and realized I had overlooked config as a way to add any instruction to the build. Unless keeping ${dbSyncScript.name} as PID 1 is preferred for some reason, I believe

cardano-db-sync/nix/docker.nix

Lines 25 to 28 in 0f1d93f

    
    baseImage = dockerTools.buildImage {
      name = "cardano-db-sync-base-env";
      config.Env = [ "NIX_SSL_CERT_FILE=${cacert}/etc/ssl/certs/ca-bundle.crt" ];

could have line 27 modified to:

  config = {
    Env = [ "NIX_SSL_CERT_FILE=${cacert}/etc/ssl/certs/ca-bundle.crt" ];
    StopSignal = "SIGINT";
  };

to set the STOPSIGNAL instruction, and then if line 96 included exec the PID 1 inside the container would be the cardano-db-sync binary process which does a graceful shutdown when receiving an interrupt signal.

If I didn't get the syntax wrong I would happily adjust my branch and open a PR so the containers now get gracefully shutdown and no longer reach the default (or set by --stop-timeout) and kill the process.

sgillespie · 2025-02-21T14:17:53Z

> If I didn't get the syntax wrong I would happily adjust my branch and open a PR so the containers now get gracefully shutdown and no longer reach the default (or set by --stop-timeout) and kill the process.

This looks good to me. I'll do a bit of testing in the meantime. PS: It looks like you need to set up commit signing.

TrevorBenson · 2025-02-21T16:04:09Z

> This looks good to me. I'll do a bit of testing in the meantime. PS: It looks like you need to set up commit signing.

New laptop, I must not have migrated my global git config. I'll sign and force push shortly.

TrevorBenson · 2025-02-21T23:46:19Z

@sgillespie Commit is now signed.

TrevorBenson added the "bug" (Something isn't working) label Feb 19, 2025

TrevorBenson mentioned this issue Feb 21, 2025

Implement a graceful container shutdown #1946

Open

9 tasks
