liveness.sh
Overview
`liveness.sh` is a Bash script designed to serve as a Kubernetes **liveness probe** for the `op-node` (Optimism Node) service. Its main purpose is to monitor whether the node is actively processing blockchain data by verifying that both Layer 1 (L1) and Layer 2 (L2) block numbers are progressing over time. If the node appears stalled (i.e., block numbers are not increasing), the script will exit with a failure status, prompting Kubernetes to restart the container.
The script also supports a mechanism to **disable the liveness check** by the presence of a specific file, allowing manual override when needed.
Detailed Explanation
Script Workflow
Disable Check
The script first checks whether the file `/data/disable_liveness` exists. If it does, the script prints `"liveness probe disabled"` and exits successfully (`exit 0`), skipping the liveness check.
Fetch Sync Status
It sends a JSON-RPC request to the local Optimism node endpoint (`http://localhost:9545`) to retrieve the current synchronization status:
Calls the `optimism_syncStatus` method.
Expects a JSON response containing the L1 and L2 block numbers.
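The request described above can be sketched as follows. The endpoint and method name come from this document; the function names, the `OP_NODE_RPC` override, the request `id`, and the 5-second timeout are illustrative assumptions rather than the script's actual contents.

```bash
# Sketch only: endpoint and method are taken from the text above; the
# function names, OP_NODE_RPC override, and timeout are assumptions.
OP_NODE_RPC="${OP_NODE_RPC:-http://localhost:9545}"

# Print the JSON-RPC 2.0 request body for optimism_syncStatus.
sync_status_request() {
  printf '%s' '{"jsonrpc":"2.0","method":"optimism_syncStatus","params":[],"id":1}'
}

# POST the request and print the raw response on stdout.
# Returns non-zero on connection errors, HTTP errors, or timeout.
fetch_sync_status() {
  curl --silent --show-error --fail --max-time 5 \
    -X POST \
    -H 'Content-Type: application/json' \
    --data "$(sync_status_request)" \
    "$OP_NODE_RPC"
}
```

Bounding the call with `--max-time` keeps a hung node from holding the probe open; Kubernetes also enforces its own `timeoutSeconds` on the probe.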
Parse Block Numbers
Using `jq`, the script extracts:
`current_l1.number`: the current Layer 1 block number.
`unsafe_l2.number`: the current Layer 2 block number.
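A minimal sketch of this extraction, assuming the two fields sit under the standard JSON-RPC `result` object. The sample response is made up and trimmed to the fields of interest, and the function name is hypothetical.

```bash
# Sketch only: field paths come from the text above; the sample response
# is invented for illustration and omits every other field.
sample_response='{"jsonrpc":"2.0","id":1,"result":{"current_l1":{"number":100},"unsafe_l2":{"number":2500}}}'

# Read a sync-status response on stdin and print "<l1> <l2>".
# select() yields no output when either field is missing, and jq -e then
# exits non-zero, so error responses are reported as failures.
parse_block_numbers() {
  jq -er '.result
    | select(.current_l1.number != null and .unsafe_l2.number != null)
    | "\(.current_l1.number) \(.unsafe_l2.number)"'
}

# Example: printf '%s' "$sample_response" | parse_block_numbers
```

Treating a missing field as an error matters for a liveness probe: a response without block numbers should count as a failure, not be compared as if it were zero.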
Store and Compare Block Numbers
Stores the current block numbers in `/data/.block_number` as a JSON object.
If the file does not exist (first run), it creates the file and exits with code `1` to signal a failure; there is nothing to compare against yet, so the real check starts on the next run.
On subsequent runs, it reads the previous block numbers and compares them with the current ones:
Both the L1 and L2 block numbers must be strictly greater than the previous values.
If they are, it prints `"op-node is running"` and exits `0` (success).
Otherwise, it prints `"op-node is stalled"` and exits `1` (failure).
Key Implementation Details
Disabling the probe: Presence of `/data/disable_liveness` provides a manual override.
Persistence of state: Uses a hidden file, `/data/.block_number`, to store the last known block numbers across probe invocations.
JSON-RPC call: Uses `curl` to call the Optimism node's JSON-RPC endpoint and `jq` to parse the JSON output.
Strict block progress check: Both L1 and L2 blocks must have increased since the last probe; if either has not, the node is considered stalled.
Exit codes:
`0` indicates a healthy node (liveness probe success).
`1` indicates failure (node stalled or unable to get status), which leads Kubernetes to restart the container once the probe's failure threshold is reached.
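The state persistence, strict progress check, and exit codes described above could look roughly like this. The state path and messages come from this document; the function names, the JSON key names (`l1`, `l2`), and rewriting the state file on every run are assumptions made for illustration.

```bash
# Sketch only: key names "l1"/"l2" and the STATE_FILE override are
# assumptions; the real script's file layout may differ.
STATE_FILE="${STATE_FILE:-/data/.block_number}"

# Persist the given L1 and L2 block numbers as a small JSON object.
write_state() {
  printf '{"l1": %s, "l2": %s}\n' "$1" "$2" > "$STATE_FILE"
}

# check_progress <current_l1> <current_l2>
# Returns 0 only if both numbers are strictly greater than the stored ones.
check_progress() {
  local cur_l1=$1 cur_l2=$2 prev_l1 prev_l2

  if [ ! -f "$STATE_FILE" ]; then
    # First run: record a baseline and report failure.
    write_state "$cur_l1" "$cur_l2"
    return 1
  fi

  prev_l1=$(jq -r '.l1' "$STATE_FILE")
  prev_l2=$(jq -r '.l2' "$STATE_FILE")
  write_state "$cur_l1" "$cur_l2"

  if [ "$cur_l1" -gt "$prev_l1" ] && [ "$cur_l2" -gt "$prev_l2" ]; then
    echo "op-node is running"
    return 0
  fi
  echo "op-node is stalled"
  return 1
}
```

A probe script built on this sketch would end with something like `check_progress "$l1" "$l2" || exit 1`. Because both numbers must advance between two consecutive runs, the probe period needs to be long enough for at least one new L1 and one new L2 block to arrive.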
Usage Examples
Typical Usage in Kubernetes
This script is intended to be run periodically by Kubernetes as a liveness probe:
    livenessProbe:
      exec:
        command:
          - /bin/bash
          - /path/to/liveness.sh
      initialDelaySeconds: 30
      periodSeconds: 10
This setup invokes the script every 10 seconds after an initial delay of 30 seconds. If the script exits with a non-zero code on `failureThreshold` consecutive runs (3 by default), Kubernetes restarts the container.
Manual Disable
To temporarily disable the liveness probe without removing or modifying Kubernetes settings, create the disable file inside the container’s `/data` directory:
    touch /data/disable_liveness
The next probe run will exit successfully without checking sync status.
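On the script side, the override amounts to a file-existence test at the very top of the probe. A minimal sketch, assuming a hypothetical `probe_disabled` helper and a `DISABLE_FILE` variable that are not part of the original script:

```bash
# Sketch only: the path and message come from the text above; the
# function name and DISABLE_FILE override are illustrative assumptions.
DISABLE_FILE="${DISABLE_FILE:-/data/disable_liveness}"

# Return 0 and announce it when the override file is present, 1 otherwise.
probe_disabled() {
  if [ -f "$DISABLE_FILE" ]; then
    echo "liveness probe disabled"
    return 0
  fi
  return 1
}

# At the top of the probe script this would be used as:
#   probe_disabled && exit 0
```

Since the file is checked on every run, removing it (`rm /data/disable_liveness`) re-enables the check from the next probe onward. From outside the container, the same commands can be run through `kubectl exec <pod> -- touch /data/disable_liveness`, where `<pod>` is a placeholder for the actual pod name.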
Interaction with Other System Components
Optimism Node (`op-node`): The script interacts directly with the node's HTTP JSON-RPC interface on port 9545 to check block synchronization status.
Kubernetes: The script is designed to be used as a liveness probe command within a Kubernetes Pod spec, influencing pod lifecycle management.
Persistent Storage: Uses the `/data` directory to store state files (`disable_liveness` and `.block_number`). This directory should be backed by a persistent volume or `emptyDir` mount to maintain state between probe runs.
Visual Diagram
The following flowchart illustrates the decision flow and key functional steps within `liveness.sh`:
    flowchart TD
        Start([Start]) --> CheckDisable{"File /data/disable_liveness exists?"}
        CheckDisable -- Yes --> Disabled["Print 'liveness probe disabled'<br/>Exit 0"]
        CheckDisable -- No --> FetchStatus["Fetch sync status via JSON-RPC"]
        FetchStatus --> ParseBlocks["Parse current L1 and L2 block numbers"]
        ParseBlocks --> CheckFile{"File /data/.block_number exists?"}
        CheckFile -- No --> CreateFile["Write current block numbers to file<br/>Exit 1"]
        CheckFile -- Yes --> ReadPrev["Read previous block numbers from file"]
        ReadPrev --> CompareBlocks{"Current L1 > Previous L1<br/>AND Current L2 > Previous L2?"}
        CompareBlocks -- Yes --> Healthy["Print 'op-node is running'<br/>Exit 0"]
        CompareBlocks -- No --> Stalled["Print 'op-node is stalled'<br/>Exit 1"]
        CreateFile --> End([End])
        Disabled --> End
        Healthy --> End
        Stalled --> End
Summary
`liveness.sh` is a simple yet effective monitoring script that:
Determines if an Optimism node is actively syncing blockchain blocks.
Uses persistent storage to detect stalled progress.
Is designed to integrate seamlessly with Kubernetes liveness probes.
Supports manual disabling for maintenance or troubleshooting.
This script plays a critical role in maintaining the health and availability of the `op-node` service within a containerized environment.