liveness.sh

Overview

`liveness.sh` is a Bash script designed to serve as a Kubernetes **liveness probe** for the `op-node` (Optimism Node) service. Its main purpose is to monitor whether the node is actively processing blockchain data by verifying that both Layer 1 (L1) and Layer 2 (L2) block numbers are progressing over time. If the node appears stalled (i.e., block numbers are not increasing), the script will exit with a failure status, prompting Kubernetes to restart the container.

The script also supports a mechanism to **disable the liveness check** by the presence of a specific file, allowing manual override when needed.


Detailed Explanation

Script Workflow

  1. Disable Check
    The script first checks if the file /data/disable_liveness exists. If present, it outputs "liveness probe disabled" and exits successfully (exit 0), effectively skipping the liveness check.

  2. Fetch Sync Status
    It sends a JSON-RPC request to the local Optimism node endpoint (http://localhost:9545) to retrieve the current synchronization status:

    • Calls the optimism_syncStatus method.

    • Expects a JSON response containing the L1 and L2 block numbers.

  3. Parse Block Numbers
    Using jq, the script extracts:

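    • current_l1.number: the number of the L1 block that the node's derivation pipeline has currently reached.

    • unsafe_l2.number: the number of the latest "unsafe" L2 block, i.e. the head of the L2 chain as seen by the node, including blocks not yet confirmed by data posted to L1.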

  4. Store and Compare Block Numbers

    • Stores the current block numbers in /data/.block_number as a JSON object.

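    • If the file does not exist (first run), it creates the file and exits with code 1. With no earlier sample there is nothing to compare against, so this run is reported as a failure and the actual comparison happens on the next probe run.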

    • On subsequent runs, it reads the previous block numbers and compares:

      • Both the L1 and the L2 block numbers must be strictly greater than the previously stored values.

      • If yes, it prints "op-node is running" and exits 0 (success).

      • Otherwise, prints "op-node is stalled" and exits 1 (failure).
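The field names in step 3 are relative to the `result` member of the reply. JSON-RPC 2.0 wraps a method's return value in `result`, so the raw response is queried with paths such as `.result.current_l1.number`. The snippet below runs that extraction on a trimmed, illustrative reply. A real `optimism_syncStatus` reply carries many more fields, and the block numbers here are made up.

```bash
# Trimmed, illustrative optimism_syncStatus reply: a real reply has many
# more fields, and these block numbers are made up.
response='{"jsonrpc":"2.0","id":1,"result":{"current_l1":{"number":19000000},"unsafe_l2":{"number":117000000}}}'

# JSON-RPC 2.0 places the method's return value under "result".
l1="$(printf '%s' "$response" | jq -r '.result.current_l1.number')"
l2="$(printf '%s' "$response" | jq -r '.result.unsafe_l2.number')"
echo "L1=$l1 L2=$l2"   # prints: L1=19000000 L2=117000000
```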


Key Implementation Details
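The workflow described above can be condensed into a single Bash function. This is a minimal sketch, not the actual script. Several parts of it are assumptions:

  • the `DATA_DIR` and `OP_NODE_RPC` overrides;

  • the helper names;

  • the `{"l1": n, "l2": n}` layout of the state file;

  • rewriting the state file on every run;

  • failing with a message when the node cannot be reached.

It also assumes `curl` and `jq` are available in the image.

```bash
#!/usr/bin/env bash
# Sketch of the op-node liveness check; not the original script.
# ASSUMPTIONS: the DATA_DIR and OP_NODE_RPC overrides, the helper names,
# and the {"l1": n, "l2": n} layout of the state file are hypothetical.
DATA_DIR="${DATA_DIR:-/data}"
OP_NODE_RPC="${OP_NODE_RPC:-http://localhost:9545}"

# Succeeds only for a non-empty string of decimal digits.
is_uint() {
  case "$1" in
    '' | *[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Step 2: POST the JSON-RPC request; -f makes HTTP errors fail the call.
fetch_sync_status() {
  curl -sf --max-time 5 -H 'Content-Type: application/json' \
    --data '{"jsonrpc":"2.0","method":"optimism_syncStatus","params":[],"id":1}' \
    "$OP_NODE_RPC"
}

# Write the given L1 and L2 block numbers to the state file as JSON.
save_block_numbers() {
  jq -n --argjson l1 "$1" --argjson l2 "$2" '{l1: $l1, l2: $l2}' \
    > "$DATA_DIR/.block_number"
}

# Returns 0 if the probe should pass and 1 if it should fail.
check_liveness() {
  local state="$DATA_DIR/.block_number"
  local status l1 l2 prev_l1 prev_l2

  # Step 1: manual override.
  if [ -f "$DATA_DIR/disable_liveness" ]; then
    echo "liveness probe disabled"
    return 0
  fi

  # Steps 2-3: fetch the sync status and extract both block numbers.
  # Failing on an unreachable node or unreadable reply is an assumption.
  status="$(fetch_sync_status)" || { echo "sync status unavailable"; return 1; }
  l1="$(printf '%s' "$status" | jq -r '.result.current_l1.number' 2>/dev/null)"
  l2="$(printf '%s' "$status" | jq -r '.result.unsafe_l2.number' 2>/dev/null)"
  if ! is_uint "$l1" || ! is_uint "$l2"; then
    echo "sync status unavailable"
    return 1
  fi

  # Step 4, first run: record a baseline and fail.
  if [ ! -f "$state" ]; then
    save_block_numbers "$l1" "$l2"
    return 1
  fi

  # Step 4, later runs: read the previous numbers, then overwrite them
  # with the current ones (refreshing on every run is an assumption).
  prev_l1="$(jq -r '.l1' "$state" 2>/dev/null)"
  prev_l2="$(jq -r '.l2' "$state" 2>/dev/null)"
  save_block_numbers "$l1" "$l2"

  # Both numbers must be valid and strictly greater than before.
  if is_uint "$prev_l1" && is_uint "$prev_l2" &&
     [ "$l1" -gt "$prev_l1" ] && [ "$l2" -gt "$prev_l2" ]; then
    echo "op-node is running"
    return 0
  fi
  echo "op-node is stalled"
  return 1
}
```

A probe script built on this sketch would end with a bare `check_liveness` call, so that the function's return status becomes the script's exit code. Refreshing the state file on every run makes each probe compare against the previous probe rather than the first one. Without that refresh, a node that advanced past the initial baseline and then stalled would keep passing.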


Usage Examples

Typical Usage in Kubernetes

This script is intended to be run periodically by Kubernetes as a liveness probe:

```yaml
livenessProbe:
  exec:
    command:
    - /bin/bash
    - /path/to/liveness.sh
  initialDelaySeconds: 30
  periodSeconds: 10
```

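With this configuration, the kubelet runs the script every 10 seconds, starting 30 seconds after the container starts. A non-zero exit code counts as a failed probe. Kubernetes restarts the container (not the whole pod) once `failureThreshold` consecutive probes have failed; the default threshold is 3. The threshold also absorbs isolated failures, such as the baseline-only first run. It likewise covers runs where the L1 number has not yet moved: Ethereum L1 produces a block roughly every 12 seconds, so two probes 10 seconds apart can see the same L1 block number on a healthy node.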

Manual Disable

To temporarily disable the liveness probe without removing or modifying Kubernetes settings, create the disable file inside the container’s `/data` directory:

```bash
touch /data/disable_liveness
```

The next probe run will exit successfully without checking sync status.
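Re-enabling the check is the reverse operation: delete the file, and the next run goes through the full sync-status check again. The two helpers below wrap both steps. The function names and the `DATA_DIR` override are hypothetical; `DATA_DIR` defaults to the `/data` directory used throughout this document.

```bash
# Hypothetical helpers for toggling the probe from a shell inside the
# container. DATA_DIR is an assumption that defaults to /data.
DATA_DIR="${DATA_DIR:-/data}"

# Create the marker file: later probe runs exit 0 immediately.
disable_liveness() {
  touch "$DATA_DIR/disable_liveness"
}

# Remove the marker file: later probe runs resume the block-number check.
enable_liveness() {
  rm -f "$DATA_DIR/disable_liveness"
}
```

From outside the pod, the same file can be created or removed with `kubectl exec <pod> -- touch /data/disable_liveness` and `kubectl exec <pod> -- rm -f /data/disable_liveness`, where `<pod>` stands for the name of the `op-node` pod.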


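Interaction with Other System Components

  • `op-node`: the script calls `optimism_syncStatus` on the node's JSON-RPC endpoint at http://localhost:9545. An exec probe runs inside the probed container, so that port must be reachable on localhost from there.

  • Filesystem: the script reads `/data/disable_liveness` and reads and writes `/data/.block_number`, so `/data` must exist and be writable by the probe.

  • Kubernetes: the kubelet executes the script as an exec liveness probe and restarts the container when the probe keeps failing.

  • Tools: the container image must provide `bash`, `jq`, and an HTTP client for the JSON-RPC call.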


Visual Diagram

The following flowchart illustrates the decision flow and key functional steps within `liveness.sh`:

```mermaid
flowchart TD
    Start([Start]) --> CheckDisable{"File /data/disable_liveness exists?"}
    CheckDisable -- Yes --> Disabled["Print 'liveness probe disabled'<br/>Exit 0"]
    CheckDisable -- No --> FetchStatus[Fetch sync status via JSON-RPC]
    FetchStatus --> ParseBlocks[Parse current L1 and L2 block numbers]
    ParseBlocks --> CheckFile{"File /data/.block_number exists?"}
    CheckFile -- No --> CreateFile["Write current block numbers to file<br/>Exit 1"]
    CheckFile -- Yes --> ReadPrev[Read previous block numbers from file]
    ReadPrev --> CompareBlocks{"Current L1 > Previous L1<br/>AND Current L2 > Previous L2?"}
    CompareBlocks -- Yes --> Healthy["Print 'op-node is running'<br/>Exit 0"]
    CompareBlocks -- No --> Stalled["Print 'op-node is stalled'<br/>Exit 1"]
    CreateFile --> End([End])
    Disabled --> End
    Healthy --> End
    Stalled --> End
```

Summary

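`liveness.sh` is a simple monitoring script that:

  • treats `op-node` as healthy only while both its L1 and L2 block numbers keep increasing between probe runs;

  • stores the observed block numbers in `/data/.block_number` for comparison on later runs;

  • can be bypassed by creating `/data/disable_liveness`.

By failing the probe when synchronization stops advancing, it lets Kubernetes restart a stuck `op-node` container automatically.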