metrics.rs
Overview
This file defines the metrics the node collects for block production and combines them with networking, routing, and asynchronous task (Tokio) metrics defined in other modules. Its centerpiece is the BlockProductionMetrics struct, which encapsulates performance and operational metrics for block production. The metrics are built on the OpenTelemetry API and use counters, gauges, histograms, and up-down counters to capture statistics such as timings, queue sizes, event counts, and error rates.
The file also declares several constants representing channel names and Aerospike object types used elsewhere in the system for telemetry tagging and metrics correlation. Additionally, it defines the top-level Metrics struct aggregating metrics from multiple subsystems (NetMetrics, BlockProductionMetrics, RoutingMetrics, and TokioMetrics), providing a centralized entry point for metrics instrumentation across the node.
Structs and Their Functionality
BlockProductionMetrics
A cloneable wrapper around an Arc<BlockProductionMetricsInner>, the internal struct that holds the actual OpenTelemetry metric instruments. Because cloning only increments the Arc's reference count, every clone reports to the same instruments. The struct provides methods for reporting block production metrics.
Internal Structure: BlockProductionMetricsInner
Contains fields for all the metric instruments related to block production, including:
Gauges for thread load, queue sizes, sequence numbers, etc.
Histograms for measuring durations and latency of block production, application, finalization, and other internal processes.
Counters for events such as finalized blocks, aborted transactions, detected forks, and errors.
UpDownCounters for tracking thread counts and channel lengths.
These metric instruments are created and configured in the BlockProductionMetrics::new constructor using the OpenTelemetry Meter.
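The wrapper-over-Arc layout can be sketched with the standard library alone. This is a minimal model, assuming atomics stand in for the OpenTelemetry instruments; it does not use the real `opentelemetry` API, and the accessor methods are hypothetical test helpers. It shows why cloning `BlockProductionMetrics` is cheap: every clone shares one inner set of instruments.

```rust
use std::sync::atomic::{AtomicI64, AtomicU64, Ordering};
use std::sync::Arc;

// Hypothetical stand-in for the inner struct: atomics play the role of
// OpenTelemetry instruments so the sketch needs no external crates.
#[derive(Default)]
pub struct BlockProductionMetricsInner {
    tx_aborted: AtomicU64,   // models a Counter
    thread_count: AtomicI64, // models an UpDownCounter
}

// Cloning only bumps the Arc reference count; all clones share one Inner.
#[derive(Clone, Default)]
pub struct BlockProductionMetrics(Arc<BlockProductionMetricsInner>);

impl BlockProductionMetrics {
    pub fn report_tx_aborted(&self) {
        self.0.tx_aborted.fetch_add(1, Ordering::Relaxed);
    }

    pub fn report_thread_count(&self) {
        self.0.thread_count.fetch_add(1, Ordering::Relaxed);
    }

    // Hypothetical read-back helpers, only for inspecting the model.
    pub fn tx_aborted(&self) -> u64 {
        self.0.tx_aborted.load(Ordering::Relaxed)
    }

    pub fn thread_count(&self) -> i64 {
        self.0.thread_count.load(Ordering::Relaxed)
    }
}
```

A value reported through any clone is visible through all of them, which is what lets the node hand copies of the metrics handle to many tasks and threads.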
Key Methods
new(meter: &Meter) -> Self - Constructs a new BlockProductionMetrics instance, initializing all metrics with appropriate boundaries and labels.
report_block_production_time_and_correction(production_time: u128, correction_time: i64, thread_id: &ThreadIdentifier) - Records the block production time and the correction time, tagged with the thread identifier. It guards against abnormally large production times, logging a warning if the value is out of bounds.
report_block_apply_time(value: u64, thread_id: &ThreadIdentifier) - Records the time taken to apply a block.
report_generate_merkle_update_time(value: u64, thread_id: &ThreadIdentifier) - Records the duration for generating Merkle tree updates.
report_finalization(seq_no: u32, tx_count: usize, thread_id: &ThreadIdentifier) - Updates metrics upon block finalization, including the count of finalized blocks and transactions.
report_finalization_time(duration_ms: u64, thread_id: &ThreadIdentifier) - Records the time spent in block finalization.
report_tx_aborted(thread_id: &ThreadIdentifier) - Increments the count of aborted transactions.
report_ext_tx_aborted(thread_id: &ThreadIdentifier) - Increments the count of externally aborted transactions.
report_ext_msg_queue_size(value: usize, thread_id: &ThreadIdentifier) - Records the size of the external message queue.
report_int_msg_queue_size(value: usize, thread_id: &ThreadIdentifier) - Records the size of the internal message queue.
report_thread_count() - Increments the thread count metric.
report_thread_load(value: usize, thread_id: &ThreadIdentifier) - Records the load on a particular thread.
report_finalization_gap(value: u32, thread_id: &ThreadIdentifier) - Records the gap in finalization sequence numbers.
report_memento_duration(value: u128, thread_id: &ThreadIdentifier) - Records the duration of memento operations, guarding against out-of-bounds values.
report_load_from_archive_invoke(thread_id: &ThreadIdentifier) - Increments the count of load-from-archive invocations.
report_load_from_archive_apply(thread_id: &ThreadIdentifier) - Increments the count of load-from-archive applications.
report_block_received_attestation_sent(value: u64, thread_id: &ThreadIdentifier) - Records, as a histogram, the time between receiving a block and sending the attestation.
report_child_parent_attestation(value: u64, thread_id: &ThreadIdentifier) - Records timing data between child and parent attestations.
report_forks_count(thread_id: &ThreadIdentifier) - Increments the fork count metric.
report_parent_first_attestation_none(thread_id: &ThreadIdentifier) - Increments the count of cases where the parent's first attestation is missing.
report_resend(thread_id: &ThreadIdentifier) - Increments the resend count metric.
report_query_gaps(thread_id: &ThreadIdentifier) - Increments the count of detected query gaps.
report_blocks_requested(value: u64, thread_id: &ThreadIdentifier) - Adds value to the total number of blocks requested.
report_unfinalized_blocks_queue(value: u64, thread_id: &ThreadIdentifier) - Records the current size of the unfinalized blocks queue.
report_finalized_block_attestations_cnt(value: u64, thread_id: &ThreadIdentifier) - Records the count of finalized block attestations.
report_bk_set_size(value: u64, thread_id: &ThreadIdentifier) - Records the size of the block keeper (BK) set.
report_bk_set(bk_set: usize, future_bk_set: usize, thread_id: &ThreadIdentifier) - Records the sizes of the current and future BK sets.
report_store_block_on_disk(value: u64, thread_id: &ThreadIdentifier) - Records, as a histogram, the time taken to store a block on disk.
report_verify_all_block_signatures(value: u64, thread_id: &ThreadIdentifier) - Records the duration of block signature verification.
report_calc_consencus_params(value: u64, thread_id: &ThreadIdentifier) - Records the time taken to calculate consensus parameters.
report_check_cross_thread_ref_data(check_ms: u64, wait_ms: u64, thread_id: &ThreadIdentifier) - Records histogram values for checking, and for waiting on, cross-thread reference data.
report_apply_block_total(value: u64, thread_id: &ThreadIdentifier) - Records the total time spent applying a block.
report_common_block_checks(value: u64, thread_id: &ThreadIdentifier) - Records the time spent on common block verification checks.
report_processing_delay(value: u64, thread_id: &ThreadIdentifier) - Records processing delays.
report_attestation_after_apply_delay(value: u64, thread_id: &ThreadIdentifier) - Records the delay between block application and attestation.
report_attn_target_descendant_generations(value: usize, thread_id: &ThreadIdentifier) - Records, as a histogram, the number of descendant generations of attestation targets.
report_aerospike_messages_write_busy(value: u64) - Increments the counter of busy occurrences when writing messages to Aerospike.
report_internal_message_queue_length(value: u64) - Records the length of the internal message queue.
report_aerospike_write(value: f64, object_type: &'static str) - Records Aerospike write durations, tagged by object type.
report_aerospike_read(value: f64, object_type: &'static str) - Records Aerospike read durations, tagged by object type.
report_aerospike_write_err(object_type: &'static str) - Increments the Aerospike write error count for the given object type.
report_aerospike_read_err(object_type: &'static str) - Increments the Aerospike read error count for the given object type.
report_outbound_accounts(value: u64, thread_id: &ThreadIdentifier) - Adds value to the count of outbound accounts.
report_saved_state(thread_id: &ThreadIdentifier) - Increments the saved states counter.
report_broadcast_join(thread_id: &ThreadIdentifier) - Increments the broadcast join count.
report_sync_time_spent(value: u64, thread_id: &ThreadIdentifier) - Adds value to the time spent on synchronization.
report_sync_error(thread_id: &ThreadIdentifier) - Increments the synchronization error count.
report_last_prefinalized_seqno(value: u64, thread_id: &ThreadIdentifier) - Records the last pre-finalized sequence number.
report_next_round_block_height(value: u64, thread_id: &ThreadIdentifier) - Records the block height of the next round.
report_authority_switch_direct_resent(thread_id: &ThreadIdentifier) - Increments the count of authority switches that were resent directly.
Increments count of state requests.
report_error(kind: &'static str) - Increments the error counter, labeled with kind for categorization.
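Many of the reporters above attach a label (a thread identifier, an Aerospike object type, or an error kind) to each data point. The std-only sketch below models how `report_finalization` and `report_error` might accumulate per-label values, assuming a `HashMap` keyed by label in place of OpenTelemetry attributes. The struct, its fields, and the use of plain `&str` thread labels are hypothetical, and treating `seq_no` as a gauge is an assumption.

```rust
use std::collections::HashMap;

// Hypothetical model: each map plays the role of one instrument, keyed by
// the attribute value OpenTelemetry would attach as a KeyValue.
#[derive(Default)]
pub struct LabeledMetrics {
    pub blocks_finalized: HashMap<String, u64>,      // Counter per thread
    pub tx_finalized: HashMap<String, u64>,          // Counter per thread
    pub last_finalized_seq_no: HashMap<String, u32>, // Gauge per thread (assumption)
    pub errors: HashMap<&'static str, u64>,          // Counter per error kind
}

impl LabeledMetrics {
    pub fn report_finalization(&mut self, seq_no: u32, tx_count: usize, thread: &str) {
        // Counters accumulate across calls.
        *self.blocks_finalized.entry(thread.to_string()).or_insert(0) += 1;
        *self.tx_finalized.entry(thread.to_string()).or_insert(0) += tx_count as u64;
        // A gauge keeps only the latest value.
        self.last_finalized_seq_no.insert(thread.to_string(), seq_no);
    }

    pub fn report_error(&mut self, kind: &'static str) {
        *self.errors.entry(kind).or_insert(0) += 1;
    }
}
```

The split between accumulating counters and last-value gauges mirrors the instrument types listed for BlockProductionMetricsInner; the backend, not the reporter, is then free to aggregate across labels.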
Trait Implementations
BlockProductionMetrics implements the InstrumentedChannelMetrics trait, whose report_channel method adjusts the length metric of a given channel by a delta.
It also implements the XInstrumentedChannelMetrics trait, whose report_channel method additionally accepts a string label that is attached along with the channel name.
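The two channel traits can be modelled as follows. This is a sketch under assumptions: only the trait and method names come from the text, while the exact signatures (`&'static str` channel names, an `i64` delta, a `&str` tag) are hypothetical, and a map of running totals stands in for the OpenTelemetry `UpDownCounter`. An up-down counter suits channel length because each send adds +1 and each receive adds -1, so the running sum equals the current queue depth.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical signatures; the real traits are defined elsewhere in the codebase.
pub trait InstrumentedChannelMetrics {
    fn report_channel(&self, channel: &'static str, delta: i64);
}

pub trait XInstrumentedChannelMetrics {
    fn report_channel(&self, channel: &'static str, label: &str, delta: i64);
}

// Running sums stand in for an UpDownCounter keyed by (channel, label).
#[derive(Default)]
pub struct ChannelLengths(Mutex<HashMap<(&'static str, String), i64>>);

impl ChannelLengths {
    pub fn get(&self, channel: &'static str, label: &str) -> i64 {
        let map = self.0.lock().unwrap();
        map.get(&(channel, label.to_string())).copied().unwrap_or(0)
    }
}

impl InstrumentedChannelMetrics for ChannelLengths {
    fn report_channel(&self, channel: &'static str, delta: i64) {
        // Untagged reports are recorded under an empty label.
        XInstrumentedChannelMetrics::report_channel(self, channel, "", delta);
    }
}

impl XInstrumentedChannelMetrics for ChannelLengths {
    fn report_channel(&self, channel: &'static str, label: &str, delta: i64) {
        let mut map = self.0.lock().unwrap();
        *map.entry((channel, label.to_string())).or_insert(0) += delta;
    }
}
```

Because both traits declare a method named `report_channel`, callers that have both in scope must use fully qualified syntax such as `InstrumentedChannelMetrics::report_channel(&m, "example_channel", 1)`, where `"example_channel"` is a placeholder name.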
Metrics
This struct aggregates metrics from different subsystems:
net: NetMetrics - Network-related metrics imported from the network::metrics module.
node: BlockProductionMetrics - Block production metrics defined in this file.
routing: RoutingMetrics - Routing metrics from the http_server::metrics module.
tokio: TokioMetrics - Metrics related to the Tokio runtime from telemetry_utils.
Methods
new(meter: &Meter) -> Self - Initializes all sub-metrics structs, passing the shared Meter instance to each.
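The aggregation pattern amounts to one constructor fanning a shared meter out to every subsystem. In the sketch below `Meter` is a hypothetical stand-in (a named handle), not `opentelemetry::metrics::Meter`, and the four sub-structs are placeholders for the real `NetMetrics`, `BlockProductionMetrics`, `RoutingMetrics`, and `TokioMetrics`, which would create their instruments inside `new`.

```rust
// Hypothetical stand-in for opentelemetry::metrics::Meter.
#[derive(Clone, Debug, PartialEq)]
pub struct Meter {
    pub name: &'static str,
}

// Placeholder subsystem structs; each just keeps a copy of the meter.
pub struct NetMetrics { pub meter: Meter }
pub struct BlockProductionMetrics { pub meter: Meter }
pub struct RoutingMetrics { pub meter: Meter }
pub struct TokioMetrics { pub meter: Meter }

impl NetMetrics {
    pub fn new(meter: &Meter) -> Self { Self { meter: meter.clone() } }
}
impl BlockProductionMetrics {
    pub fn new(meter: &Meter) -> Self { Self { meter: meter.clone() } }
}
impl RoutingMetrics {
    pub fn new(meter: &Meter) -> Self { Self { meter: meter.clone() } }
}
impl TokioMetrics {
    pub fn new(meter: &Meter) -> Self { Self { meter: meter.clone() } }
}

/// Top-level aggregate: one field per subsystem, all built from the same meter.
pub struct Metrics {
    pub net: NetMetrics,
    pub node: BlockProductionMetrics,
    pub routing: RoutingMetrics,
    pub tokio: TokioMetrics,
}

impl Metrics {
    pub fn new(meter: &Meter) -> Self {
        Self {
            net: NetMetrics::new(meter),
            node: BlockProductionMetrics::new(meter),
            routing: RoutingMetrics::new(meter),
            tokio: TokioMetrics::new(meter),
        }
    }
}
```

Building every subsystem from one meter means all their instruments share a single OpenTelemetry instrumentation scope, so the exporter sees the node's metrics as one coherent source.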
Constants
Channel Names
Used as identifiers/tags for telemetry related to inter-component communication channels:
Aerospike Object Types
Used to tag Aerospike-related metrics with specific object types:
Important Implementation Details and Algorithms
The file leverages the opentelemetry crate for metrics instrumentation, using instrument types such as Counter, Gauge, Histogram, and UpDownCounter.
Histogram instruments are initialized with specific boundaries tailored to the expected ranges of the measured values. This enables fine-grained latency and duration tracking.
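Explicit bucket boundaries decide which bucket each recorded value increments. The sketch below reimplements that selection with the standard library, assuming the OpenTelemetry explicit-bucket convention: buckets are (-inf, b[0]], (b[0], b[1]], ..., with a final overflow bucket above the last boundary. The boundary values in the usage are illustrative, not the ones this file configures.

```rust
/// Minimal explicit-bucket histogram (std-only model, not the OpenTelemetry SDK).
pub struct BucketHistogram {
    boundaries: Vec<f64>, // sorted ascending upper bounds
    counts: Vec<u64>,     // boundaries.len() + 1 buckets; the last is overflow
    sum: f64,
}

impl BucketHistogram {
    pub fn new(boundaries: Vec<f64>) -> Self {
        let n = boundaries.len();
        Self { boundaries, counts: vec![0; n + 1], sum: 0.0 }
    }

    /// Index of the first boundary >= value, i.e. the bucket (b[i-1], b[i]].
    pub fn bucket_index(&self, value: f64) -> usize {
        self.boundaries.partition_point(|&b| b < value)
    }

    pub fn record(&mut self, value: f64) {
        let i = self.bucket_index(value);
        self.counts[i] += 1;
        self.sum += value;
    }

    pub fn counts(&self) -> &[u64] {
        &self.counts
    }

    pub fn sum(&self) -> f64 {
        self.sum
    }
}
```

With boundaries `[10.0, 50.0, 100.0, 500.0]`, a value of exactly 10.0 lands in the first bucket because upper bounds are inclusive, while 10 000.0 falls into the overflow bucket. Choosing boundaries near the expected range keeps most samples out of that overflow bucket, which is what makes the resolution useful.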
To prevent skewing metrics with unrealistic values, the file uses the out_of_bounds_guard! macro to filter out-of-range data points before recording them.
Thread identification is consistently attached to metrics via the thread_id_attr helper function, which creates an OpenTelemetry KeyValue with the thread's string label. This supports per-thread telemetry aggregation and analysis.
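The guard-then-record step for out-of-range values can be sketched as a small macro. This is a hypothetical re-implementation for illustration, not the actual `out_of_bounds_guard!` from `telemetry_utils`, whose parameters and behaviour may differ. Here a value above a caller-supplied bound is logged to stderr and dropped, and an in-range value is narrowed from `u128` to `u64` before it is recorded; the one-hour bound is an arbitrary assumption.

```rust
// Hypothetical guard macro: evaluates to Some(u64) when the value is in range,
// otherwise prints a warning and evaluates to None.
macro_rules! out_of_bounds_guard_sketch {
    ($value:expr, $max:expr, $metric:expr) => {{
        let value: u128 = $value;
        let max: u64 = $max;
        if value > max as u128 {
            eprintln!("WARN: {} = {} exceeds bound {}, not recorded", $metric, value, max);
            None
        } else {
            Some(value as u64)
        }
    }};
}

// Assumed bound: one hour in milliseconds (the real bound is not stated).
const MAX_PRODUCTION_TIME_MS: u64 = 60 * 60 * 1000;

/// Pushes `production_time` into `samples` only if it passes the guard.
/// Returns whether the value was recorded.
pub fn report_block_production_time(samples: &mut Vec<u64>, production_time: u128) -> bool {
    match out_of_bounds_guard_sketch!(production_time, MAX_PRODUCTION_TIME_MS, "block_production_time") {
        Some(v) => {
            samples.push(v);
            true
        }
        None => false,
    }
}
```

Dropping the point instead of clamping it keeps a single bogus measurement (for example, a clock jump producing a huge `u128` duration) from landing in the top histogram bucket and distorting percentiles.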
The file supports feature-gated metrics, such as accounts_number, which is only recorded if the monitor-accounts-number feature is enabled.
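A feature-gated metric can be written with `#[cfg(feature = ...)]`, so that without the feature the recording code is not compiled at all. This is a minimal sketch; the struct and method names are hypothetical, and only the feature name `monitor-accounts-number` comes from the text.

```rust
pub struct AccountsMetrics {
    // Last recorded value; stays None unless the feature is enabled.
    pub accounts_number: Option<u64>,
}

impl AccountsMetrics {
    pub fn report_accounts_number(&mut self, value: u64) {
        // Compiled only when built with `--features monitor-accounts-number`.
        #[cfg(feature = "monitor-accounts-number")]
        {
            self.accounts_number = Some(value);
        }
        // Without the feature the call is a no-op; this keeps `value` used.
        #[cfg(not(feature = "monitor-accounts-number"))]
        let _ = value;
    }
}
```

Callers can invoke the reporter unconditionally; whether anything is recorded is decided at compile time, so a build without the feature pays nothing for the metric.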
BlockProductionMetrics implements the InstrumentedChannelMetrics and XInstrumentedChannelMetrics traits to enable channel length monitoring, with optional tagging for extended metadata.
Interactions with Other Parts of the System
The BlockProductionMetrics struct interacts with thread identifiers (the ThreadIdentifier type) defined elsewhere in the crate, allowing metrics to be tagged by thread.
The Metrics struct aggregates networking (NetMetrics), routing (RoutingMetrics), and Tokio runtime (TokioMetrics) metrics, enabling unified metrics reporting across the node's subsystems.
The constants for channel names and Aerospike object types serve as standardized labels for metrics emitted by other modules that handle inter-thread communication and Aerospike database interactions.
The out_of_bounds_guard! macro and metrics types are imported from the telemetry_utils crate, indicating integration with common telemetry utilities shared across the project.
The file depends on OpenTelemetry's metrics API for metric instrument creation and recording.
Usage Examples
use opentelemetry::metrics::Meter;
use crate::metrics::Metrics;
use crate::types::ThreadIdentifier;

fn example_usage(meter: &Meter, thread_id: ThreadIdentifier) {
    // Initialize metrics
    let metrics = Metrics::new(meter);
    // Report block production time with correction
    metrics.node.report_block_production_time_and_correction(250, -5, &thread_id);
    // Report a finalized block with transaction count
    metrics.node.report_finalization(1000, 15, &thread_id);
    // Increment aborted transaction count
    metrics.node.report_tx_aborted(&thread_id);
    // Report Aerospike write duration for internal messages
    metrics.node.report_aerospike_write(1200.5, "int_messages");
    // Record internal message queue length
    metrics.node.report_internal_message_queue_length(42);
}
Visual Diagram
classDiagram
class Metrics {
+net: NetMetrics
+node: BlockProductionMetrics
+routing: RoutingMetrics
+tokio: TokioMetrics
+new(meter)
}
class BlockProductionMetrics {
+new(meter)
+report_block_production_time_and_correction()
+report_block_apply_time()
+report_finalization()
+report_thread_count()
+report_error()
...
}
class BlockProductionMetricsInner {
-thread_load: Gauge
-block_production_time: Histogram
-block_apply_time: Histogram
-finalization_time: Histogram
-block_finalized: Counter
-tx_finalized: Counter
-tx_aborted: Counter
-ext_tx_aborted: Counter
-thread_count: UpDownCounter
-... (many more metrics fields)
}
Metrics "1" *-- "1" BlockProductionMetrics
BlockProductionMetrics "1" *-- "1" BlockProductionMetricsInner
Helper Functions
thread_id_attr(thread_id: &ThreadIdentifier) -> KeyValue - Constructs a telemetry key-value attribute representing the thread identifier, used for tagging metrics with thread context.
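A std-only model of this helper is shown below. It is a sketch, not the crate's code: `ThreadIdentifier` is stubbed as a hypothetical newtype over four bytes rendered as hex, the attribute key `"thread"` is an assumption, and a `(key, value)` tuple stands in for `opentelemetry::KeyValue`.

```rust
use std::fmt;

// Hypothetical stand-in for the crate's ThreadIdentifier.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct ThreadIdentifier(pub [u8; 4]);

impl fmt::Display for ThreadIdentifier {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // Render the id bytes as lowercase hex, two digits per byte.
        for b in &self.0 {
            write!(f, "{:02x}", b)?;
        }
        Ok(())
    }
}

/// Models `thread_id_attr`: a (key, value) pair in place of `opentelemetry::KeyValue`.
pub fn thread_id_attr(thread_id: &ThreadIdentifier) -> (&'static str, String) {
    ("thread", thread_id.to_string())
}
```

Centralizing the conversion in one helper guarantees that every per-thread metric uses the same key and the same string form of the identifier, so dashboards can group series from different instruments by thread.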
This file is central to detailed operational telemetry for the block production subsystem and integrates with the system-wide metrics collection framework, facilitating performance monitoring, troubleshooting, and analytics. It aligns with instrumentation practices described in Telemetry and Monitoring and interacts closely with network and routing metrics as in Network Metrics and Routing Metrics.