Reconnect Strategy

Purpose

The Reconnect Strategy addresses Kafka consumer polling errors by attempting to recover from transient issues through reconnection. Unlike simpler error handling approaches that may discard messages or stop the consumer immediately, this strategy enables the consumer to re-establish connection and retry consuming messages after recoverable failures. However, it explicitly detects non-recoverable authentication errors and halts reconnection attempts in those cases to avoid futile retries and resource waste.

This strategy fits scenarios where temporary Kafka broker or network problems occur, and a consumer reconnect can restore normal operation without manual intervention.

Functionality

At its core, the Reconnect Strategy monitors exceptions thrown during Kafka consumer poll operations and decides whether to continue consuming or to stop. Its key behaviors include:

Critical Code Snippet

@Override
public void handle(long partitionLastOffset, Exception exception) {
    if (exception instanceof AuthenticationException) {
        LOG.warn("Kafka reported a non-recoverable authentication error. The client will not reconnect");
        recordFetcher.setReconnect(false);
        recordFetcher.setConnected(false);
    } else {
        LOG.warn("Requesting the consumer to re-connect on the next run based on polling exception strategy");
        recordFetcher.setReconnect(true);
        recordFetcher.setConnected(false);
    }
    retry = false;
}

This snippet highlights how the strategy distinguishes authentication failures and instructs the consumer to reconnect or not accordingly.

Integration

The Reconnect Strategy plugs into the Kafka consumer’s error handling mechanism by implementing the `PollExceptionStrategy` interface. It works closely with the `KafkaFetchRecords` class, which manages Kafka consumer polling threads and lifecycle. Upon poll exceptions, `KafkaFetchRecords` delegates handling to the configured `PollExceptionStrategy` instance (in this case, the Reconnect Strategy).

Because it cleanly distinguishes non-recoverable authentication errors, it prevents wasted retries that would never succeed, enhancing the robustness of the consumer.

Diagram

sequenceDiagram
    participant Consumer as KafkaFetchRecords
    participant Strategy as ReconnectErrorStrategy
    participant Kafka as Kafka Broker

    Consumer->>Kafka: poll()
    Kafka-->>Consumer: throws Exception
    Consumer->>Strategy: handle(exception)
    alt AuthenticationException (Non-recoverable)
        Strategy->>Consumer: setReconnect(false)
        Strategy->>Consumer: setConnected(false)
        Strategy->>Consumer: set retry=false
    else Other Exceptions (Recoverable)
        Strategy->>Consumer: setReconnect(true)
        Strategy->>Consumer: setConnected(false)
        Strategy->>Consumer: set retry=false
    end
    Consumer->>Consumer: close consumer
    Consumer->>Consumer: on next poll cycle, reconnect if enabled

This sequence diagram illustrates how the Reconnect Strategy reacts to polling exceptions by toggling reconnection and consumer state flags, influencing the consumer’s polling lifecycle.


By selectively triggering consumer reconnections, the Reconnect Strategy enhances Kafka consumer resilience in the face of transient errors while avoiding futile retries on authentication failures. It integrates seamlessly with `KafkaFetchRecords` to manage consumer lifecycle transitions, providing a robust error recovery mechanism within the larger Kafka consumer error handling framework.