Health and Readiness Probes
Overview
The Health and Readiness Probes module comprises a collection of shell scripts designed to monitor the health status and synchronization state of blockchain node daemons and indexer services within their respective coinstacks. These probes are essential for Kubernetes to determine whether a containerized service is ready to receive traffic (readiness) or needs to be restarted due to failure or unresponsiveness (liveness).
This module solves critical operational problems by:
Ensuring that blockchain nodes are fully synchronized with the network before routing traffic to their API services.
Detecting network connectivity issues such as lack of peers, which may impact node reliability.
Providing a standardized, automated way to check service health across multiple distinct blockchain implementations.
Preventing premature traffic routing to newly started or lagging nodes, thereby enhancing system stability.
Core Concepts
Liveness Probe: Confirms the service is running and responsive. If it fails, Kubernetes restarts the container.
Readiness Probe: Confirms the service is fully operational and ready to accept requests. If it fails, Kubernetes temporarily removes the service from the load balancer.
Startup Probe (optional): Used to verify that an application has started successfully before readiness or liveness probes are activated.
Synchronization Check: Ensures the blockchain node's local chain height is within a configured tolerance of the network's latest block height.
Peer Connectivity Check: Confirms the node is connected to a minimum number of peers to maintain network health.
Reference Node Validation: For EVM-based chains, the node's reported block height is cross-validated against multiple external reference nodes to prevent false positives on synchronization status.
How the Module Works
Each supported blockchain coinstack includes one or more shell scripts under the `daemon/` or `indexer/` directories, such as `readiness.sh`, `liveness.sh`, and optionally `startup.sh`. These scripts are executed by Kubernetes probes at configured intervals.
Readiness Probe Workflow
Disable Check: The probe first checks for the presence of a file (e.g., `/data/disable_readiness`). If the file exists, the remaining checks are skipped and the probe reports ready, which lets operators pause readiness checking for maintenance or debugging.
Node Status Query: The script queries the local node daemon via RPC or REST endpoints to retrieve:
Current block height.
Network's latest block height.
Peer count.
Syncing status.
Synchronization Validation: The node's block height is compared to the network's latest block height, allowing a configurable tolerance (e.g., 1 to 25 blocks, depending on the chain).
Peer Connectivity Validation: The node must have at least one peer connected to ensure network participation.
Reference Node Cross-Check (EVM Chains): For Ethereum and EVM-compatible chains, the probe fetches block heights from multiple external reference nodes to validate the local node's sync status.
Exit Codes: The script exits with `0` if the node is ready and with `1` otherwise.
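Once the node's status has been fetched, the workflow above reduces to a single decision. The following minimal bash sketch is not one of the coinstack scripts: the function name `readiness_decision` and its argument order are hypothetical, and the heights and peer count are passed in as integers so the logic can be read and exercised without a running node. The EVM reference cross-check is left out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the readiness decision (not the actual probe script).
# Arguments, all assumed for illustration:
#   $1 path of the disable file (e.g. /data/disable_readiness)
#   $2 local block height (integer)
#   $3 network's latest block height (integer)
#   $4 connected peer count (integer)
#   $5 allowed block height tolerance (integer)
# Returns 0 when ready and 1 otherwise, mirroring the probe's exit codes.
readiness_decision() {
  local disable_file=$1 local_height=$2 network_height=$3 peers=$4 tolerance=$5

  # Disable check: report ready without checking anything while the file exists.
  if [[ -f $disable_file ]]; then
    echo "readiness check disabled"
    return 0
  fi

  # Synchronization validation: the local node may lag by at most $tolerance blocks.
  local lag=$(( network_height - local_height ))
  if (( lag > tolerance )); then
    echo "node is ${lag} blocks behind (tolerance ${tolerance})"
    return 1
  fi

  # Peer connectivity validation: at least one peer is required.
  if (( peers < 1 )); then
    echo "node has no peers"
    return 1
  fi

  echo "node is ready"
  return 0
}
```

In a probe script, the function's return status would become the script's exit code, for example `readiness_decision /data/disable_readiness "$local_height" "$network_height" "$peers" 5; exit $?` (variable names again illustrative).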
For example, the Bitcoin readiness probe queries RPC methods `getblockchaininfo` and `getconnectioncount` to check sync status and peer count:
CONNECTION_COUNT=$(curl -sf -H 'content-type: application/json' -u user:password -d '{ "jsonrpc": "2.0", "id": "probe", "method": "getconnectioncount", "params": [] }' http://localhost:8332)
BLOCKCHAIN_INFO=$(curl -sf -H 'content-type: application/json' -u user:password -d '{ "jsonrpc": "2.0", "id": "probe", "method": "getblockchaininfo", "params": [] }' http://localhost:8332)
These responses determine whether the node is synced and has enough peers to be considered ready.
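The two responses above still have to be parsed. A minimal sketch of that step, assuming the standard Bitcoin Core response shapes (`getconnectioncount` returns an integer in `result`, and `getblockchaininfo` returns an object with `blocks` and `headers`), might look like this. The function name `btc_ready` and the default tolerance of one block are hypothetical; the real probe's thresholds are not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide Bitcoin readiness from the raw JSON-RPC responses
# (such as CONNECTION_COUNT and BLOCKCHAIN_INFO above). Requires jq.
# $1 getconnectioncount response, $2 getblockchaininfo response,
# $3 tolerance in blocks (assumed default: 1).
btc_ready() {
  local connection_json=$1 blockchain_json=$2 tolerance=${3:-1}

  # Declared separately so a jq failure is not masked by `local`.
  local peers blocks headers
  peers=$(jq -r '.result' <<< "$connection_json") || return 1
  blocks=$(jq -r '.result.blocks' <<< "$blockchain_json") || return 1
  headers=$(jq -r '.result.headers' <<< "$blockchain_json") || return 1

  # Reject null or malformed fields instead of treating them as 0 in arithmetic.
  if ! [[ $peers =~ ^[0-9]+$ && $blocks =~ ^[0-9]+$ && $headers =~ ^[0-9]+$ ]]; then
    echo "unexpected RPC response"
    return 1
  fi

  # Blocks fully validated locally vs. the best header chain the node knows of.
  if (( headers - blocks > tolerance )); then
    echo "daemon is still syncing (${blocks}/${headers})"
    return 1
  fi

  if (( peers < 1 )); then
    echo "daemon has no peers"
    return 1
  fi

  echo "daemon is synced with ${peers} peers"
  return 0
}
```

In a probe it would be called as `btc_ready "$CONNECTION_COUNT" "$BLOCKCHAIN_INFO" || exit 1`. In this sketch, comparing `blocks` with `headers` uses the node's own view of the best known header chain as the network height, so no external service is needed.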
Handling Different Blockchains
UTXO Chains (Bitcoin, Litecoin): Use RPC commands to check block height, headers, and connection count.
EVM Chains (Ethereum, Optimism, Polygon, Arbitrum): Use JSON-RPC calls (`eth_syncing`, `eth_blockNumber`, `net_peerCount`) and external reference nodes to verify sync.
Cosmos-based Chains (Thorchain): Use HTTP endpoints of the Cosmos SDK REST API and the Tendermint RPC (`/cosmos/base/tendermint/v1beta1/syncing`, `/net_info`, `/status`) to determine syncing and peer status.
Each script adapts these checks to the blockchain node's available RPC or REST API interface and chain-specific characteristics such as block time and typical sync tolerance.
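As an illustration of the Cosmos-style variant, the sketch below combines the syncing flag and the peer count. It assumes the usual response shapes: the Cosmos SDK `/cosmos/base/tendermint/v1beta1/syncing` endpoint returns `{"syncing": <bool>}`, and Tendermint's `/net_info` reports the peer count as a string in `result.n_peers`. The function name `cosmos_ready` is hypothetical and is not taken from `/tendermint.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper for a Cosmos/Tendermint-style readiness check. Requires jq.
# $1 response of /cosmos/base/tendermint/v1beta1/syncing
# $2 response of Tendermint /net_info
cosmos_ready() {
  local syncing_json=$1 net_info_json=$2

  local syncing peers
  syncing=$(jq -r '.syncing' <<< "$syncing_json") || return 1
  # n_peers is a string in the Tendermint response; default to "0" if absent.
  peers=$(jq -r '.result.n_peers // "0"' <<< "$net_info_json") || return 1

  # Anything other than an explicit false (true, null, missing) counts as syncing.
  if [[ $syncing != false ]]; then
    echo "node is syncing"
    return 1
  fi

  if ! [[ $peers =~ ^[0-9]+$ ]] || (( peers < 1 )); then
    echo "node has no peers"
    return 1
  fi

  echo "node is synced with ${peers} peers"
  return 0
}
```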
Interaction with Other System Components
Kubernetes: These scripts are configured as Kubernetes health probes (readiness and liveness) in the pod spec for daemon and indexer containers. Kubernetes calls these scripts periodically to determine pod health and readiness.
Daemon Containers: The probes run inside the blockchain node daemon containers, directly accessing the node's RPC endpoints.
Indexer Services: Similar health check scripts exist for Blockbook or other indexer services to ensure their readiness.
API Servers: API services depend on the readiness of daemons and indexers; requests are not routed to nodes that fail their readiness checks, which keeps the service reliable.
Reference Nodes: For EVM chains, the probes reach out to external, trusted RPC nodes to validate the local node’s synchronization, adding a layer of verification.
Design Patterns and Unique Approaches
Fail-Fast with Exit Codes: The scripts exit with `0` on success and non-zero on failure, matching Kubernetes exec-probe semantics, in which exit code `0` counts as a passed check.
Configurable Tolerance: Each blockchain coinstack defines a block height tolerance reflecting its network's block production rate and acceptable lag.
Cross-Validation: EVM chain probes cross-validate local node block height with multiple external RPC endpoints to avoid false positives.
Disable File Mechanism: The presence of a disable file allows operators to bypass readiness checking for manual interventions without redeploying.
Use of Standard Tools: The scripts use `curl` and `jq` for JSON-RPC calls and parsing, ensuring minimal dependencies and easy debugging.
Sourceable Utility Scripts: Some probes source common utility scripts (e.g., `/evm.sh` or `/tendermint.sh`) for shared functions like fetching reference block heights.
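The cross-validation pattern can be sketched in pure bash. The helpers that the EVM probes actually source from `/evm.sh` are not shown in this document, so `best_height` and `validate_against_reference` below are hypothetical stand-ins, and fetching each reference node's height is omitted so that only the comparison logic remains.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of reference-node cross-validation; the real helpers in
# /evm.sh may behave differently.

# Print the highest numeric height among the arguments. Empty or non-numeric
# entries (e.g. from an unreachable reference node) are skipped.
# Returns 1 if no usable height was given.
best_height() {
  local best="" h
  for h in "$@"; do
    [[ $h =~ ^[0-9]+$ ]] || continue
    # 10# forces base-10 so values with leading zeros are not read as octal.
    if [[ -z $best ]] || (( 10#$h > 10#$best )); then
      best=$h
    fi
  done
  [[ -n $best ]] || return 1
  echo "$best"
}

# Succeed if the local height is at most $3 blocks behind the reference height.
# Being ahead of the reference is accepted.
validate_against_reference() {
  local local_height=$1 reference_height=$2 tolerance=$3
  if (( reference_height - local_height > tolerance )); then
    echo "local height ${local_height} lags reference ${reference_height} by more than ${tolerance}"
    return 1
  fi
  echo "local height ${local_height} is within ${tolerance} of reference ${reference_height}"
  return 0
}
```

Taking the highest reference height means a single lagging reference node cannot make an out-of-date local node look synced, while unreachable references simply drop out. Failing when no reference height is available at all is a design choice of this sketch, not necessarily what `/evm.sh` does.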
Example Probe Script Snippet: Ethereum Daemon Readiness Check
#!/bin/bash
# Shared EVM helpers: get_best_reference_block_number, reference_validation
source /evm.sh

BLOCK_HEIGHT_TOLERANCE=5

# eth_syncing returns false once the node has finished syncing
ETH_SYNCING=$(curl -sf -d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' -H 'Content-Type: application/json' http://localhost:8545) || exit 1
SYNCING=$(echo "$ETH_SYNCING" | jq -r '.result')

if [[ $SYNCING == false ]]; then
  eth_blockNumber=$(curl -sf -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' -H 'Content-Type: application/json' http://localhost:8545) || exit 1
  current_block_number_hex=$(echo "$eth_blockNumber" | jq -r '.result')
  # Bash arithmetic converts the 0x-prefixed hex string to decimal
  current_block_number=$(($current_block_number_hex))
  best_reference_block_number=$(get_best_reference_block_number https://ethereum.publicnode.com https://eth-mainnet.g.alchemy.com/v2/demo https://rpc.ankr.com/eth)
  # reference_validation (defined in /evm.sh, not shown) is assumed to exit 1
  # when the local height falls outside the tolerance
  reference_validation daemon "$current_block_number" "$best_reference_block_number" "$BLOCK_HEIGHT_TOLERANCE"
  echo "daemon is synced"
  exit 0
fi

echo "daemon is still syncing"
exit 1
This snippet illustrates the process of checking sync state, querying block height, obtaining the best reference block height, and performing validation before signaling readiness.
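The script above treats any `eth_syncing` result other than `false` as still syncing. Per the Ethereum JSON-RPC specification, a syncing node instead returns an object whose `currentBlock` and `highestBlock` fields are hex-encoded, so the remaining gap can be computed for more informative probe output. `eth_sync_gap` is a hypothetical helper, not part of `/evm.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many blocks an EVM node is behind, given the
# raw eth_syncing response. Prints 0 when the result is false (not syncing).
# Requires jq.
eth_sync_gap() {
  local response=$1 current_hex highest_hex

  if [[ $(jq -r '.result' <<< "$response") == false ]]; then
    echo 0
    return 0
  fi

  current_hex=$(jq -r '.result.currentBlock' <<< "$response") || return 1
  highest_hex=$(jq -r '.result.highestBlock' <<< "$response") || return 1

  # Both fields must be 0x-prefixed hex quantities.
  [[ $current_hex =~ ^0x[0-9a-fA-F]+$ && $highest_hex =~ ^0x[0-9a-fA-F]+$ ]] || return 1

  # Bash arithmetic understands 0x-prefixed hex literals.
  echo $(( $highest_hex - $current_hex ))
}
```

For example, a response with `"currentBlock":"0x10"` and `"highestBlock":"0x1a"` yields a gap of 10 blocks.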
Module Structure and Relevant Files
Daemon Probes: Located under `node/coinstacks/{coin}/daemon/readiness.sh` for each blockchain, these scripts are tailored to the specific RPC interfaces and sync characteristics of each node.
Indexer Probes: Similar probe scripts exist for indexer services to ensure the indexers are in sync and responsive; they are not covered in detail here.
Common Utility Scripts: Scripts such as `/evm.sh` and `/tendermint.sh` provide shared functions for EVM and Cosmos-based chains, respectively. These are sourced by individual readiness scripts.
Mermaid Diagram - Probe Workflow in Kubernetes
flowchart TD
StartProbe[Kubernetes Probe Triggered] --> CheckDisableFile{Disable File Present?}
CheckDisableFile -- Yes --> ProbeDisabled[Exit 0: Probe Disabled]
CheckDisableFile -- No --> QueryNodeStatus[Query Node RPC/REST APIs]
QueryNodeStatus --> ParseSyncStatus[Parse Sync & Peer Info]
ParseSyncStatus --> CheckSync{Is Node Synced Within Tolerance?}
CheckSync -- No --> NotReady[Exit 1: Node Syncing]
CheckSync -- Yes --> CheckPeers{Has Minimum Peers Connected?}
CheckPeers -- No --> NotReadyPeers[Exit 1: No Peers]
CheckPeers -- Yes --> ReferenceCheck{EVM Chain?}
ReferenceCheck -- Yes --> QueryReferenceNodes[Fetch Reference Block Heights]
QueryReferenceNodes --> ValidateReference[Validate Local vs Reference Heights]
ValidateReference -- Fail --> NotReadyRef[Exit 1: Reference Validation Failed]
ValidateReference -- Pass --> Ready[Exit 0: Node Ready]
ReferenceCheck -- No --> Ready
This flowchart visualizes the decision-making process of a readiness probe, including optional reference node validation for EVM chains, peer connectivity checks, and synchronization status verification.
This documentation details the purpose, design, and operation of the Health and Readiness Probes module, emphasizing its critical role in maintaining the reliability and stability of blockchain node services within the system.