Health and Readiness Probes

Overview

The Health and Readiness Probes module comprises a collection of shell scripts designed to monitor the health status and synchronization state of blockchain node daemons and indexer services within their respective coinstacks. These probes are essential for Kubernetes to determine whether a containerized service is ready to receive traffic (readiness) or needs to be restarted due to failure or unresponsiveness (liveness).

This module solves critical operational problems by:

Core Concepts

How the Module Works

Each supported blockchain coinstack includes one or more shell scripts under the `daemon/` or `indexer/` directories, such as `readiness.sh`, `liveness.sh`, and optionally `startup.sh`. These scripts are executed by Kubernetes probes at configured intervals.

Readiness Probe Workflow

  1. Disable Check: The probe first checks for a file (e.g., `/data/disable_readiness`) whose presence temporarily disables readiness checks during maintenance or debugging.

  2. Node Status Query: The script queries the local node daemon via RPC or REST endpoints to retrieve:

    • Current block height.

    • Network's latest block height.

    • Peer count.

    • Syncing status.

  3. Synchronization Validation: The node's block height is compared to the network's latest block height, allowing a configurable tolerance (e.g., 1 to 25 blocks, depending on the chain).

  4. Peer Connectivity Validation: The node must have at least one peer connected to ensure network participation.

  5. Reference Node Cross-Check (EVM Chains): For Ethereum and EVM-compatible chains, the probe fetches block heights from multiple external reference nodes to validate the local node's sync status.

  6. Exit Codes: The script exits with 0 if the node is ready, otherwise exits with 1 to indicate the node is not ready.
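The decision logic in steps 1, 3, 4, and 6 can be sketched as a small shell function. This is a minimal illustration rather than the scripts' actual code: the function name `check_readiness`, its argument order, and the messages are assumptions, and the heights and peer count are expected to have been fetched from the node beforehand.

```shell
# Hypothetical helper (not part of the module):
# check_readiness LOCAL_HEIGHT NETWORK_HEIGHT PEER_COUNT TOLERANCE [DISABLE_FILE]
# Returns 0 when the node is ready (or the probe is disabled), 1 otherwise.
check_readiness() {
  local_height=$1
  network_height=$2
  peer_count=$3
  tolerance=$4
  disable_file=${5:-/data/disable_readiness}

  # Step 1: a disable file short-circuits the probe and reports success.
  if [ -f "$disable_file" ]; then
    echo "readiness probe disabled"
    return 0
  fi

  # Step 3: the node may trail the network by at most TOLERANCE blocks.
  if [ $((network_height - local_height)) -gt "$tolerance" ]; then
    echo "node is still syncing"
    return 1
  fi

  # Step 4: at least one peer must be connected.
  if [ "$peer_count" -lt 1 ]; then
    echo "node has no peers"
    return 1
  fi

  # Step 6: every check passed.
  echo "node is ready"
  return 0
}
```

A probe script could then end with something like `check_readiness "$height" "$network_height" "$peers" 5 || exit 1`, so that the function's return status becomes the exit code Kubernetes sees.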

For example, the Bitcoin readiness probe queries RPC methods `getblockchaininfo` and `getconnectioncount` to check sync status and peer count:

```shell
# Number of peers currently connected to the node
CONNECTION_COUNT=$(curl -sf -H 'content-type: application/json' -u user:password -d '{ "jsonrpc": "2.0", "id": "probe", "method": "getconnectioncount", "params": [] }' http://localhost:8332)
# Chain state, including the node's block and header heights
BLOCKCHAIN_INFO=$(curl -sf -H 'content-type: application/json' -u user:password -d '{ "jsonrpc": "2.0", "id": "probe", "method": "getblockchaininfo", "params": [] }' http://localhost:8332)
```

The node is considered ready only if these responses show that it is synced and has at least one connected peer.
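One way the two responses might be evaluated is sketched below. The function name `bitcoin_is_ready`, the default tolerance of 1 block, and the choice to compare the `blocks` and `headers` fields of `getblockchaininfo` are assumptions made for illustration; the actual script may use different criteria. The sketch relies on `jq` being installed.

```shell
# Hypothetical helper (not part of the module):
# bitcoin_is_ready BLOCKCHAIN_INFO_JSON CONNECTION_COUNT_JSON [TOLERANCE]
# Returns 0 when the daemon looks synced and has peers, 1 otherwise.
bitcoin_is_ready() {
  blockchain_info=$1
  connection_count=$2
  tolerance=${3:-1}

  # `blocks` is the height of the validated chain; `headers` is the height
  # of the best header the node has seen so far.
  blocks=$(printf '%s' "$blockchain_info" | jq -r '.result.blocks') || return 1
  headers=$(printf '%s' "$blockchain_info" | jq -r '.result.headers') || return 1
  peers=$(printf '%s' "$connection_count" | jq -r '.result') || return 1

  # Reject missing or non-numeric fields (jq prints "null" for absent keys).
  for value in "$blocks" "$headers" "$peers"; do
    case "$value" in
      ''|*[!0-9]*) echo "unexpected RPC response"; return 1 ;;
    esac
  done

  if [ $((headers - blocks)) -gt "$tolerance" ]; then
    echo "daemon is still syncing"
    return 1
  fi

  if [ "$peers" -lt 1 ]; then
    echo "daemon has no peers"
    return 1
  fi

  echo "daemon is synced"
  return 0
}
```

With the variables from the snippet above, the probe could finish with `bitcoin_is_ready "$BLOCKCHAIN_INFO" "$CONNECTION_COUNT" || exit 1`.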

Handling Different Blockchains

Each script adapts these checks to the blockchain node's available RPC or REST API interface and chain-specific characteristics such as block time and typical sync tolerance.

Interaction with Other System Components

Design Patterns and Unique Approaches

Example Probe Script Snippet: Ethereum Daemon Readiness Check

```shell
# Load shared EVM helper functions
source /evm.sh

# Maximum number of blocks the daemon may trail the best reference node
BLOCK_HEIGHT_TOLERANCE=5

# eth_syncing returns false once the node is synced, or a status object while syncing
ETH_SYNCING=$(curl -sf -d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' -H 'Content-Type: application/json' http://localhost:8545) || exit 1

SYNCING=$(echo "$ETH_SYNCING" | jq -r '.result')

if [[ "$SYNCING" == "false" ]]; then
  eth_blockNumber=$(curl -sf -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' -H 'Content-Type: application/json' http://localhost:8545) || exit 1
  current_block_number_hex=$(echo "$eth_blockNumber" | jq -r '.result')
  # Convert the 0x-prefixed hex height to decimal
  current_block_number=$(($current_block_number_hex))

  # Highest block height reported by the external reference nodes
  best_reference_block_number=$(get_best_reference_block_number https://ethereum.publicnode.com https://eth-mainnet.g.alchemy.com/v2/demo https://rpc.ankr.com/eth)

  # Compare the local height against the reference height within the tolerance
  reference_validation daemon "$current_block_number" "$best_reference_block_number" "$BLOCK_HEIGHT_TOLERANCE"

  echo "daemon is synced"
  exit 0
fi

echo "daemon is still syncing"
exit 1
```

This snippet illustrates the process of checking sync state, querying block height, obtaining the best reference block height, and performing validation before signaling readiness.
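The helpers `get_best_reference_block_number` and `reference_validation` are not shown in this document. The sketch below only illustrates the kind of logic such helpers could contain: converting hex heights, picking the highest reference height, and checking the local height against it. The names `hex_to_dec`, `highest_block`, and `validate_against_reference` are hypothetical, and the sketch takes already-fetched hex heights as input instead of querying reference URLs.

```shell
# Hypothetical helpers (not the contents of /evm.sh).

# hex_to_dec HEX: convert a 0x-prefixed quantity, as returned by
# eth_blockNumber, to decimal.
hex_to_dec() {
  echo $(($1))
}

# highest_block HEX...: print the largest of the given heights in decimal.
highest_block() {
  best=0
  for hex in "$@"; do
    # Skip values that cannot be converted (e.g. a failed reference query).
    height=$(hex_to_dec "$hex") || continue
    if [ "$height" -gt "$best" ]; then
      best=$height
    fi
  done
  echo "$best"
}

# validate_against_reference LOCAL REFERENCE TOLERANCE
# Returns 1 when the local height trails the reference by more than TOLERANCE.
validate_against_reference() {
  behind=$(($2 - $1))
  if [ "$behind" -gt "$3" ]; then
    echo "node is behind reference by $behind blocks"
    return 1
  fi
  return 0
}
```

Taking the maximum across several reference nodes means a single lagging or unreachable reference does not make a healthy local node look further ahead or behind than it is.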

Module Structure and Relevant Files

Mermaid Diagram - Probe Workflow in Kubernetes

```mermaid
flowchart TD
  StartProbe[Kubernetes Probe Triggered] --> CheckDisableFile{Disable File Present?}
  CheckDisableFile -- Yes --> ProbeDisabled[Exit 0: Probe Disabled]
  CheckDisableFile -- No --> QueryNodeStatus[Query Node RPC/REST APIs]
  QueryNodeStatus --> ParseSyncStatus[Parse Sync & Peer Info]
  ParseSyncStatus --> CheckSync{Is Node Synced Within Tolerance?}
  CheckSync -- No --> NotReady[Exit 1: Node Syncing]
  CheckSync -- Yes --> CheckPeers{Has Minimum Peers Connected?}
  CheckPeers -- No --> NotReadyPeers[Exit 1: No Peers]
  CheckPeers -- Yes --> ReferenceCheck{EVM Chain?}
  ReferenceCheck -- Yes --> QueryReferenceNodes[Fetch Reference Block Heights]
  QueryReferenceNodes --> ValidateReference[Validate Local vs Reference Heights]
  ValidateReference -- Fail --> NotReadyRef[Exit 1: Reference Validation Failed]
  ValidateReference -- Pass --> Ready[Exit 0: Node Ready]
  ReferenceCheck -- No --> Ready
```

This flowchart visualizes the decision-making process of a readiness probe, including optional reference node validation for EVM chains, peer connectivity checks, and synchronization status verification.

