liveness.sh
Overview
`liveness.sh` is a lightweight Bash script designed to serve as a **liveness probe** for an Ethereum node running inside a Kubernetes pod. Its primary function is to verify that the Ethereum node daemon is actively progressing by confirming that the node’s current block number increases over time. This ensures Kubernetes can detect if the node has stalled and take corrective action such as restarting the container.
The script interacts with the Ethereum JSON-RPC interface (`eth_blockNumber` method) exposed locally (typically on `http://localhost:8545`), compares the current block number to a previously recorded value stored on disk, and determines if the node is alive (block height advancing) or stalled (no block progress).
It also supports a manual override mechanism via a disable flag file that allows temporarily bypassing the liveness check, useful during maintenance or debugging.
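The override can be toggled with ordinary file operations. The sketch below assumes hypothetical helper names (`disable_liveness`, `enable_liveness`, `liveness_disabled`) and a `DATA_DIR` variable that are not part of the script; only the flag path `/data/disable_liveness` comes from the script itself.

```shell
# Hypothetical helpers for toggling the disable flag (not part of liveness.sh).
# DATA_DIR is an assumption for illustration; the script hard-codes /data.
DATA_DIR="${DATA_DIR:-/data}"

# Create the flag file so the probe short-circuits with exit 0.
disable_liveness() { touch "$DATA_DIR/disable_liveness"; }

# Remove the flag file so normal block-progress checks resume.
enable_liveness() { rm -f "$DATA_DIR/disable_liveness"; }

# Succeeds (status 0) while the flag file is present.
liveness_disabled() { [ -f "$DATA_DIR/disable_liveness" ]; }
```

From outside the pod, the same effect could be achieved with something like `kubectl exec <pod> -- touch /data/disable_liveness`, where `<pod>` is a placeholder for the pod name.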
Detailed Explanation
Script Behavior Summary
1. Disable Flag Check
   - Checks if the file `/data/disable_liveness` exists. If yes, the script prints `"liveness probe disabled"` and exits successfully (`exit 0`), skipping further checks.
2. Fetch Current Block Number
   - Queries the Ethereum node JSON-RPC endpoint at `http://localhost:8545` for the current block number using the `eth_blockNumber` RPC method.
3. Block Number Persistence and Comparison
   - Reads the last recorded block number from `/data/.block_number`.
   - If the file does not exist, writes the current block number to it and exits with failure (`exit 1`). This triggers Kubernetes to consider the node not yet alive on the first probe.
   - If the file exists, compares the current block number with the previous one.
     - If the current block number is greater, the node is considered alive (`exit 0`).
     - Otherwise, the node is considered stalled (`exit 1`).
File-level Variables
| Variable Name | Description |
|---|---|
| `DISABLE_LIVENESS_PROBE` | Path to the disable flag file `/data/disable_liveness`. If this file exists, liveness checking is bypassed. |
| `FILE` | Path to the file `/data/.block_number` storing the last observed Ethereum block number. |
Key Commands and Logic
Disable Check

    if [[ -f "$DISABLE_LIVENESS_PROBE" ]]; then
      echo "liveness probe disabled"
      exit 0
    fi

This allows manual disabling of the probe by creating the `/data/disable_liveness` file.

Fetch Current Block Number with curl and jq

    ETH_BLOCK_NUMBER=$(curl -sf -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
      -H 'Content-Type: application/json' http://localhost:8545) || exit 1
    CURRENT_BLOCK_NUMBER_HEX=$(echo $ETH_BLOCK_NUMBER | jq -r '.result')
    CURRENT_BLOCK_NUMBER=$(($CURRENT_BLOCK_NUMBER_HEX))

- Uses `curl` with `-sf` (`-s` suppresses progress output, `-f` makes curl exit non-zero on HTTP errors) to POST a JSON-RPC request. If curl fails, the script exits with `1`.
- Parses the JSON response with `jq` to extract the hex string representing the block number.
- Converts the hex string to a decimal integer for comparison.
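One edge case worth noting: if the response carries no `result` field (for example, a JSON-RPC error object), `jq -r '.result'` prints `null`, and Bash arithmetic treats `null` as an unset variable and evaluates it to `0` rather than failing. A minimal sketch of a stricter conversion follows; the helper name `hex_to_dec` is hypothetical and not part of the script.

```shell
# hex_to_dec HEX
# Hypothetical helper: prints the decimal value of a 0x-prefixed hex string.
# Returns 1 for anything else, such as "null" or an empty string.
hex_to_dec() {
  hex_value="$1"

  # Require the 0x prefix used by Ethereum JSON-RPC quantities.
  case "$hex_value" in
    0x*) hex_digits="${hex_value#0x}" ;;
    *) return 1 ;;
  esac

  # Reject an empty digit string or any non-hex character.
  case "$hex_digits" in
    ''|*[!0-9a-fA-F]*) return 1 ;;
  esac

  # Arithmetic expansion understands the 0x prefix.
  echo "$((0x$hex_digits))"
}
```

With such a helper, the conversion step could read `CURRENT_BLOCK_NUMBER=$(hex_to_dec "$CURRENT_BLOCK_NUMBER_HEX") || exit 1`, so a malformed response fails the probe explicitly.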
Initial File Creation and Exit

    if [[ ! -f "$FILE" ]]; then
      echo $CURRENT_BLOCK_NUMBER > $FILE
      exit 1
    fi

On first run (or if the file is missing), stores the current block number and exits with failure so Kubernetes does not treat the node as alive immediately.

Block Number Comparison

    PREVIOUS_BLOCK_NUMBER=$(cat $FILE)
    echo $CURRENT_BLOCK_NUMBER > $FILE
    if (( $CURRENT_BLOCK_NUMBER > $PREVIOUS_BLOCK_NUMBER )); then
      echo "daemon is running"
      exit 0
    fi
    echo "daemon is stalled"
    exit 1

If the block number advanced, the daemon is alive. Otherwise, it's stalled.
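To exercise the first-run and comparison logic without a running node, the two steps can be wrapped in a function that takes the block number and state file as arguments. This is a sketch only: the function name `check_progress` and its parameters are assumptions for illustration, and the script itself runs these steps inline.

```shell
# check_progress CURRENT STATE_FILE
# Hypothetical wrapper around the script's persistence and comparison steps.
# Returns 0 when CURRENT exceeds the value stored in STATE_FILE, 1 otherwise.
# As in the script, the stored value is always replaced with CURRENT.
check_progress() {
  current_block="$1"
  state_file="$2"

  # First run: record the block number and report failure.
  if [ ! -f "$state_file" ]; then
    echo "$current_block" > "$state_file"
    return 1
  fi

  previous_block="$(cat "$state_file")"
  echo "$current_block" > "$state_file"

  # The status of this test becomes the function's return status.
  [ "$current_block" -gt "$previous_block" ]
}
```

Calling it three times with `100`, `101`, `101` against a fresh file fails, succeeds, and fails again: the first call only seeds the file, the second sees progress, and the third sees none.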
Usage Example
This script is typically referenced in the Kubernetes pod spec for the Ethereum node container as the liveness probe command:

    livenessProbe:
      exec:
        command:
          - /bin/bash
          - /path/to/liveness.sh
      initialDelaySeconds: 30
      periodSeconds: 15
      failureThreshold: 3

- Kubernetes executes the script periodically.
- If the script exits with code `0`, Kubernetes considers the pod healthy.
- If the script exits with `1`, Kubernetes marks the pod as unhealthy and may restart it.
- If `/data/disable_liveness` exists, the probe always succeeds.
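The probe settings determine roughly how long a node can go without a new block before a restart is triggered. The sketch below is an idealized estimate: it assumes probes fire exactly every `periodSeconds` and that the container is restarted after `failureThreshold` consecutive failures, ignoring kubelet timing jitter and shutdown time. The function names are hypothetical.

```shell
# Hypothetical estimate of the time between the last new block and the probe
# failure that reaches the threshold. Arguments: periodSeconds failureThreshold.

# Lower bound: the last block arrived just before a probe ran, so that probe
# passes and the next failureThreshold probes fail.
min_stall_seconds() { echo "$(( $1 * $2 ))"; }

# Upper bound: the last block arrived just after a probe ran, so one extra
# probe passes before the failures start counting.
max_stall_seconds() { echo "$(( $1 * ($2 + 1) ))"; }
```

With the values above (`periodSeconds: 15`, `failureThreshold: 3`), this gives a window of about 45 to 60 seconds without a new block.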
Implementation Details and Algorithms
- **Use of Ethereum JSON-RPC**: The script relies on `eth_blockNumber`, a standard Ethereum JSON-RPC call that returns the latest block number in hexadecimal format. This is a low-overhead call suitable for quick liveness checks.
- **Persistent State via File**: To detect progression, the script writes the last observed block number to the file `/data/.block_number`. This allows comparing across probe invocations.
- **Exit Codes for Kubernetes Integration**: The script uses standard exit codes (`0` for success, `1` for failure) which Kubernetes interprets as pod health state.
- **Disable Flag File**: This is a simple but effective way to bypass probe failures during maintenance, avoiding unnecessary pod restarts.
Interaction with Other Parts of the System
- **Ethereum Node Daemon**: The script queries the locally running Ethereum node's RPC endpoint (`localhost:8545`). The node must expose this endpoint inside the container.
- **Kubernetes**: Used as the liveness probe command in pod specs to monitor Ethereum daemon health.
- **Persistent Volume (Data Directory)**: The `/data` directory where the disable flag and `.block_number` file reside is typically backed by a persistent volume or emptyDir volume shared across probe executions.
- **`jq` Utility**: The script uses `jq` for JSON parsing, so the container image must include `jq`.
- **Curl**: Used to perform HTTP POST requests to the Ethereum node RPC.
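Because the probe depends on `curl` and `jq` being present in the image, a missing binary would surface only as repeated probe failures. A minimal sketch of an explicit dependency check follows; the helper name `require_cmds` is hypothetical and not part of the script.

```shell
# require_cmds CMD...
# Hypothetical helper: returns 0 if every named command is on PATH,
# otherwise reports each missing command on stderr and returns 1.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A line such as `require_cmds curl jq || exit 1` near the top of the script would make the cause visible in the probe output.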
Mermaid Diagram: Flowchart of Script Logic

    flowchart TD
        Start[Start Script]
        CheckDisable{"Disable flag file<br/>(/data/disable_liveness) exists?"}
        DisableExit["Print 'liveness probe disabled'<br/>Exit 0"]
        FetchBlock["Fetch current block number<br/>via eth_blockNumber RPC"]
        ReadFile{"Does /data/.block_number exist?"}
        WriteFirst["Write current block number to /data/.block_number"]
        ReadPrev[Read previous block number from file]
        WriteUpdate["Write current block number to /data/.block_number"]
        CompareBlocks{"Is current > previous?"}
        ExitSuccess["Print 'daemon is running'<br/>Exit 0 (healthy)"]
        ExitFailFirstRun["Exit 1 (first run - not ready)"]
        ExitFailStalled["Print 'daemon is stalled'<br/>Exit 1 (stalled)"]

        Start --> CheckDisable
        CheckDisable -- Yes --> DisableExit
        CheckDisable -- No --> FetchBlock
        FetchBlock --> ReadFile
        ReadFile -- No --> WriteFirst --> ExitFailFirstRun
        ReadFile -- Yes --> ReadPrev
        ReadPrev --> WriteUpdate --> CompareBlocks
        CompareBlocks -- Yes --> ExitSuccess
        CompareBlocks -- No --> ExitFailStalled
Summary
The `liveness.sh` script is a concise liveness probe for Ethereum nodes running in Kubernetes. By tracking block number progression through the Ethereum JSON-RPC interface, it indicates whether the node is still advancing. Used as a Kubernetes liveness probe, it lets the cluster detect a stalled node and restart its container automatically.