graceful-shutdown.yaml
Overview
This Ansible playbook orchestrates the graceful shutdown of the Docker Compose container node{{ NODE_ID }}, running every command from the directory {{ BK_DIR }}. It executes three shell tasks in sequence: it signals the node process inside the container to terminate, polls until either the container has stopped or the node log reports that shutdown has finished, and then runs docker compose stop with a 5-second timeout so that no container is left lingering.
The playbook is used to stop the node container cleanly: internal shutdown routines get a chance to complete before the container is halted, which reduces the risk of data corruption or inconsistent state.
Playbook Tasks
1. Send graceful shutdown request
- name: Send graceful shutdown request
  ansible.builtin.shell:
    chdir: "{{ BK_DIR }}"
    cmd: docker compose exec "node{{ NODE_ID }}" pkill node
  ignore_errors: true
Purpose: Runs pkill node inside the target container. pkill sends SIGTERM by default, which asks the node process to shut down gracefully.
Parameters:
  chdir: "{{ BK_DIR }}" sets the working directory to the base directory that holds the Docker Compose files.
  cmd: docker compose exec "node{{ NODE_ID }}" pkill node executes pkill inside the container to signal the node process.
Behavior: ignore_errors: true lets the play continue when the command fails, for example because no matching process was found or the container is already stopped.
Usage Example: With NODE_ID set to 5, the task signals the node process in the node5 container.
Implementation Detail: Uses docker compose exec to run the command inside the running container; the name node{{ NODE_ID }} is templated from NODE_ID.
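To illustrate what the SIGTERM from pkill gives a process the chance to do, here is a minimal shell sketch. fake_node is a hypothetical stand-in, not the real node service; only the log message is borrowed from the log check used later in this playbook.

```shell
#!/bin/sh
# fake_node: hypothetical stand-in for the node process (not the real service).
# It idles until it receives SIGTERM, then runs its "shutdown routine".
fake_node() {
    log_file=$1
    # pkill sends SIGTERM by default; a process can trap it to clean up first.
    trap 'echo "monit: Shutdown finished" >> "$log_file"; exit 0' TERM
    while :; do
        sleep 1 &
        # wait is interrupted by signals, so the trap runs without delay.
        wait "$!" || true
    done
}
```

Started in the background (fake_node /tmp/node.log &), the function appends the marker and exits with status 0 once kill -TERM is sent to its PID. A process that ignores SIGTERM would never log the marker, which is the case the final docker compose stop task covers.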
2. Wait for container to stop or shutdown to finish
- name: Wait for container to stop or shutdown to finish
  ansible.builtin.shell:
    chdir: "{{ BK_DIR }}"
    cmd: |
      (docker compose ps "node{{ NODE_ID }}" -q | wc -l | grep -q "0") || (tail -n "{{ NODE_STOP_TEST_OUTER_TAIL }}" "{{ BK_LOGS_DIR }}/node.log" | grep -v -e TRACE -e pub_sub | tail -n "{{ NODE_STOP_TEST_INNER_TAIL }}" | grep -q "monit: Shutdown finished")
  register: container_check
  until: container_check is success
  retries: "{{ GS_WAIT | default(WAIT_FOR_NODE_STOP_SECS) }}"
  delay: 1
  when: GS_WAIT is not defined or GS_WAIT
Purpose: Polls once per second until the node container has stopped or the node log contains the shutdown completion message.
Parameters:
  cmd: a shell command with two alternatives joined by ||:
    The first alternative succeeds when docker compose ps "node{{ NODE_ID }}" -q returns zero container IDs, meaning the container is no longer running.
    Otherwise, the second alternative takes the last NODE_STOP_TEST_OUTER_TAIL lines of {{ BK_LOGS_DIR }}/node.log, drops lines containing TRACE or pub_sub, keeps the last NODE_STOP_TEST_INNER_TAIL of the remaining lines, and searches them for "monit: Shutdown finished".
  register: container_check captures the command result so that until can test it.
  until: container_check is success repeats the task until the command exits with status 0.
  retries and delay bound the wait: the retry count comes from GS_WAIT, or from WAIT_FOR_NODE_STOP_SECS when GS_WAIT is undefined, and attempts are one second apart, so the maximum wait is roughly that many seconds plus the time the command itself takes.
  when: GS_WAIT is not defined or GS_WAIT skips the wait entirely when GS_WAIT is defined but evaluates to false.
Return Value: Succeeds as soon as the container is stopped or the shutdown message appears. If every attempt fails, the task fails.
Implementation Detail: Uses only tail, grep, and wc in a shell pipeline, so shutdown progress is monitored without additional tooling.
Usage Example: With GS_WAIT set to 30, the task polls for roughly 30 seconds before giving up.
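The log check can be tried in isolation with a small shell function. This is a sketch: the function name shutdown_logged and its positional arguments are hypothetical, with the second and third arguments standing in for NODE_STOP_TEST_OUTER_TAIL and NODE_STOP_TEST_INNER_TAIL.

```shell
#!/bin/sh
# shutdown_logged LOG_FILE OUTER_TAIL INNER_TAIL
# Returns 0 if the shutdown marker appears among the last INNER_TAIL
# non-noisy lines taken from the last OUTER_TAIL lines of LOG_FILE.
shutdown_logged() {
    log_file=$1
    outer_tail=$2
    inner_tail=$3
    tail -n "$outer_tail" "$log_file" \
        | grep -v -e TRACE -e pub_sub \
        | tail -n "$inner_tail" \
        | grep -q "monit: Shutdown finished"
}
```

The two-stage tail is what makes the check robust against noise. If the marker is followed by a burst of TRACE or pub_sub lines, a plain tail -n 2 would see only noise, whereas filtering a larger outer window first still leaves the marker among the last few relevant lines. If the outer window itself contains only noise, the check fails and the task is retried.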
3. Stop node container if the process still lingering
- name: Stop node container if the process still lingering
  ansible.builtin.shell:
    chdir: "{{ BK_DIR }}"
    cmd: docker compose stop "node{{ NODE_ID }}" -t 5
Purpose: Stops the node container with docker compose stop, in case the node process is still lingering after the shutdown request and the waiting period.
Parameters:
  cmd: docker compose stop "node{{ NODE_ID }}" -t 5 stops the container, allowing it 5 seconds to exit before Docker kills it.
Behavior: The task has no when condition, so it runs whenever the play reaches it. It ensures that no lingering container keeps holding resources.
Usage Example: With NODE_ID set to 3, the task stops the node3 container.
Interaction: This task is the final step in the shutdown workflow.
Implementation Details and Algorithms
The shutdown process combines signaling (pkill node), log file monitoring, and container state checking.
It uses docker compose exec to interact with running containers and docker compose ps to check container status.
Log checking skips noisy entries (lines containing TRACE or pub_sub) to focus on shutdown-related messages.
Retry logic with a delay and conditional execution implements a wait mechanism for asynchronous shutdown completion.
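One subtlety in the container state check: grep -q "0" matches any count that contains the digit 0, so a count of 10 would also pass. This is unlikely to matter when the service runs a single container. The stricter variant below is only a suggestion and not part of the playbook; both function names are hypothetical.

```shell
#!/bin/sh
# Both functions read a list of container IDs (one per line) on stdin,
# standing in for the output of: docker compose ps "node{{ NODE_ID }}" -q

# The playbook's check: passes when the line count contains the digit 0.
loose_is_stopped() {
    wc -l | grep -q "0"
}

# Hypothetical stricter check: passes only when the ID list is empty.
strict_is_stopped() {
    [ -z "$(cat)" ]
}
```

With an empty ID list both checks pass, and with a single ID both fail. They differ only for counts such as 10 or 20, where the loose check passes although containers are still listed.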
Interaction with Other System Components
This playbook interacts with Docker containers managed by docker compose within the directory {{ BK_DIR }}.
It reads the node log file located at {{ BK_LOGS_DIR }}/node.log to detect shutdown completion.
Variables such as NODE_ID, GS_WAIT, and timeout constants (WAIT_FOR_NODE_STOP_SECS) are expected to be defined in the playbook or inventory, integrating with the broader configuration and orchestration framework.
The shutdown procedure impacts the node service lifecycle and is typically triggered during deployment updates, maintenance, or scaling operations.
Visual Diagram
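flowchart TD
A[Send graceful shutdown request] --> B[Wait for container to stop or shutdown to finish]
A -->|Wait skipped because GS_WAIT is false| D[Stop node container]
B -->|Stopped or shutdown logged| D
B -->|Retries exhausted| F[Wait task fails]
D --> C[End]
Description: This flowchart shows the task sequence, based on the tasks as listed above:
Send the graceful shutdown request.
Poll for shutdown completion or container stop, unless GS_WAIT is defined and false.
Stop the container with docker compose stop and a 5-second timeout, which covers a process that is still lingering.
If the wait exhausts its retries, the wait task fails. It sets no ignore_errors, so the stop task is reached on that path only if the calling play handles the failure.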
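Read as a single script, the three tasks as listed could be sketched as follows. This is an approximation under stated assumptions: the function name and its arguments are hypothetical, the log check is simplified to a plain tail and grep, the retry count is treated as a number of attempts, and the early return models Ansible marking the wait task as failed when retries run out, since that task sets no ignore_errors.

```shell
#!/bin/sh
# graceful_shutdown NODE_ID RETRIES LOG_FILE   (hypothetical helper)
graceful_shutdown() {
    node="node$1"
    retries=$2
    log_file=$3

    # Task 1: ask the node process to exit; failures are ignored,
    # mirroring ignore_errors: true.
    docker compose exec "$node" pkill node || true

    # Task 2: poll once per second until the container is gone or the
    # log reports completion, giving up after $retries attempts.
    attempt=0
    stopped=1
    while [ "$attempt" -lt "$retries" ]; do
        if [ -z "$(docker compose ps "$node" -q)" ] ||
           tail -n 100 "$log_file" | grep -q "monit: Shutdown finished"; then
            stopped=0
            break
        fi
        attempt=$((attempt + 1))
        sleep 1
    done

    # Assumption: a wait that never succeeds aborts the sequence, as a
    # failed Ansible task would without ignore_errors.
    [ "$stopped" -eq 0 ] || return 1

    # Task 3: stop the container, allowing 5 seconds before it is killed.
    docker compose stop "$node" -t 5
}
```

For example, graceful_shutdown 5 30 /path/to/node.log would signal node5, wait up to about 30 seconds, and then stop the container. The log path here is a placeholder for {{ BK_LOGS_DIR }}/node.log.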