Project Overview
Project Purpose and Objectives
This project is the Apache Camel Kafka component, designed to integrate Apache Kafka messaging into the Apache Camel integration framework. Its primary goal is to provide a robust, configurable, and performant Kafka producer and consumer support within Camel routes, enabling seamless event-driven and streaming architectures.
Key objectives include:
Reliable Kafka Message Production and Consumption: Support for asynchronous and synchronous Kafka producers with transactional capabilities, and consumers with flexible offset management including manual and automatic commits.
Advanced Offset Management: Implement multiple commit managers (sync, async, noop, offset repository backed) and manual commit mechanisms for fine-grained control of message acknowledgment and fault tolerance.
Idempotent Consumption: Provide Kafka topic-backed idempotent repositories to support deduplication and exactly-once processing semantics in distributed environments.
Resume Support: Integrate resume strategies for Kafka consumers that maintain offset state in Kafka topics, enabling fault-tolerant and resumable consumption with caching and asynchronous offset publishing.
Flexible Topic Subscription: Support both static topic subscriptions and dynamic topic resolution, including regex pattern-based subscriptions and pluggable subscribe adapters.
Header Serialization and Deserialization: Provide customizable Kafka header serializers and deserializers, with default implementations and support for custom logic, e.g., JMS interoperability and string conversions.
Health Checks and Dev Console Integration: Integrate health checks for Kafka producers and consumers, and provide metrics collection and monitoring via a Camel development console.
Transformation Utilities: Offer processors for JSON field manipulation (insert, drop, replace, mask, hoist, extract), and Kafka message routing based on timestamps or regex.
Extensive Testing: Comprehensive unit and integration tests covering producer, consumer, batching, transactions, authentication, error handling, resume strategies, and idempotent consumption.
Implementation Details:
The core Kafka component classes (
KafkaComponent,KafkaEndpoint) manage configuration, client factory injection, SSL parameters, and endpoint creation.Kafka producers (
KafkaProducer) support async and sync sending, transactional sending, and header propagation.Kafka consumers (
KafkaConsumer) manage fetch tasks (KafkaFetchRecords) for polling messages, error handling, pause/resume, commit managers, and offset repositories.Commit managers (
CommitManagerinterface and implementations) abstract offset commit strategies including asynchronous, synchronous, noop, and offset repository backed commits.Manual commit support (
KafkaManualCommitinterface and factories) allows explicit offset commit control within Camel routes.Resume strategies (e.g.,
SingleNodeKafkaResumeStrategy) publish and consume offset state in Kafka topics, with cache and adapter layers for offset management.Idempotent repository (
KafkaIdempotentRepository) uses a Kafka topic to store message IDs for deduplication with a local cache synchronized via Kafka.Header serializers/deserializers enable conversion between Kafka header bytes and Java types, with default and custom implementations.
Consumer rebalance listeners implement partition assignment logic, offset seeking, and integration with resume strategies.
Error handling strategies (
PollExceptionStrategyimplementations) handle Kafka poll exceptions with options to discard, retry, reconnect, stop, or bridge to Camel error handler.Transform processors modify JSON message bodies for routing and masking use cases.
Example Workflows and Use Cases
1. Kafka Message Production
A Camel route sends messages to a Kafka endpoint.
The
KafkaProducercreates KafkaProducerRecords from message bodies and headers.Messages are sent asynchronously or synchronously based on configuration.
Header filtering and serialization occurs to propagate Camel headers as Kafka headers.
Transactional sending can be enabled, with Kafka transactions started and committed around message sends.
Metadata from Kafka (
RecordMetadata) can be propagated back to Camel messages for routing decisions or auditing.
2. Kafka Message Consumption with Manual Commit
A Kafka consumer endpoint is configured with manual commit enabled.
The
KafkaConsumerstarts one or moreKafkaFetchRecordsthreads which poll Kafka for messages.Fetched messages are processed in batches or streaming mode.
For each message, a manual commit object (
KafkaManualCommit) is attached to the Camel exchange.The route processes messages and explicitly invokes the
ManualCommitprocessor to commit offsets.Commit managers handle offset caching and persistence to Kafka or offset repositories.
Error handling strategies govern behavior on polling exceptions, including retries or stopping.
3. Kafka Consumer with Resume Strategy
A resume strategy like
SingleNodeKafkaResumeStrategysubscribes to a Kafka topic holding offset data.Upon startup, offsets are loaded asynchronously into a cache.
The Kafka consumer uses a rebalance listener (
ResumeRebalanceListener) to seek partitions to the cached offsets.Offsets are updated asynchronously by publishing offset records to the resume topic.
This enables reliable recovery and resumption of message consumption after failures.
4. Idempotent Consumer
The
KafkaIdempotentRepositoryuses a Kafka topic to store message IDs.The repository maintains a local LRU cache synchronized via Kafka topic consumption.
A Camel route applies an idempotent consumer with this repository to filter duplicate messages.
Incoming messages with IDs already present in the repository are filtered out.
The repository supports eager and non-eager modes and integrates with Camel transactions.
5. Dynamic Topic Routing with Timestamp or Regex
Processors like TimestampRouter or RegexRouter modify Kafka topic headers dynamically based on message timestamps or regex patterns.
This allows routing messages to topics named or formatted according to message content or metadata.
Stack and Technologies
Java: Core language for all component development.
Apache Camel: Integration framework, providing routing and messaging abstractions.
Apache Kafka: Distributed streaming platform for message publishing and consumption.
Jackson: JSON processing library used in data transformations.
SLF4J & Log4j2: Logging framework for observability.
JUnit 5 & Mockito: Testing framework and mocking for unit and integration tests.
Resilience4j: Used in some tests for circuit breaker pattern implementation.
Maven: Build and dependency management.
Kafka Clients: Official Kafka client libraries for producer and consumer.
Spring JDBC & H2: Used in integration tests for database transaction scenarios.
Awaitility: Utility for asynchronous testing and waiting.
Testcontainers: Used for containerized Kafka instances in integration tests.
Why These Technologies:
Apache Kafka is the messaging backbone, and Apache Camel provides the integration and routing framework, making them natural choices.
Jackson is widely used for JSON processing, essential for transformations.
SLF4J with Log4j2 ensures flexible and performant logging.
JUnit 5 and Mockito provide modern testing capabilities ensuring code quality.
Resilience4j offers robust circuit breaker support for consumer resiliency.
Maven standardizes builds and dependency management.
The Kafka client APIs are required to interact with the Kafka brokers.
Spring JDBC and H2 facilitate embedded database testing for transactional scenarios.
Awaitility and Testcontainers support reliable and isolated integration tests.
High-Level Architecture
The project is structured into several key modules:
Component and Endpoint Layer (
KafkaComponent,KafkaEndpoint): Manage Kafka client configuration, endpoint creation, SSL context, and client factories.Producer Layer (
KafkaProducer): Handles message production with support for transactions, header propagation, and asynchronous sending.Consumer Layer (
KafkaConsumer,KafkaFetchRecords): Manages message fetching threads, consumer lifecycle, health checks, pause/resume, and processing via configured processors.Commit Management (commit managers and manual commit classes): Abstracts offset commit logic, supporting synchronous, asynchronous, no-op, and repository-backed commits.
Resume Support (
SingleNodeKafkaResumeStrategy,KafkaResumeAdapter): Handles offset persistence and resumption via Kafka topics with caching and synchronization.Idempotent Repository (
KafkaIdempotentRepository): Provides deduplication by caching message IDs synchronized via Kafka topic consumption.Error Handling (
PollExceptionStrategyimplementations): Defines strategies for handling poll exceptions including discard, retry, reconnect, stop, and bridging to Camel error handler.Transformations (various JSON field processors and routers): Process and route messages based on content, timestamps, or regex patterns.
Serialization/Deserialization (header serializers/deserializers): Handle Kafka header conversion to/from byte arrays with custom and default implementations.
Dev Console and Health Checks: Provide metrics, monitoring, and health reporting for Kafka consumers and producers.
Component Interaction Diagram
graph TB
subgraph Camel Integration
FE[Frontend Routes] --> PE[KafkaProducer]
CE[KafkaConsumer] --> BE[Backend Processing]
PE -->|Produce Records| KafkaBroker[(Kafka Broker)]
KafkaBroker -->|Consume Records| CE
CE --> CommitMgr[CommitManager]
CommitMgr --> OffsetRepo[State Repository]
CE --> ResumeStrat[Resume Strategy]
ResumeStrat --> KafkaBroker
CE --> IdempotentRepo[KafkaIdempotentRepository]
FE --> Transform[Transform Processors]
Transform --> PE
Transform --> CE
end
Frontend routes produce messages via
KafkaProducerto Kafka brokers.KafkaConsumerconsumes messages from Kafka brokers, processes them, and manages offsets via commit managers.Commit managers may persist offsets in external state repositories.
Resume strategies interact with Kafka brokers to store and retrieve offsets enabling consumer resumption.
Idempotent repository ensures message deduplication using Kafka topic.
Transform processors are used to manipulate messages in routes before producing or after consuming.
Developer Navigation
Frontend Developers Start Here
Explore
KafkaProducer.javafor producing messages.Inspect
producer/supportpackage for callback handling, header propagation, and serialization utilities.Review transformation classes in
transformpackage for message manipulation (e.g.,InsertField,DropField,MaskField).Use integration tests under
src/test/java/org/apache/camel/component/kafka/integrationto understand route examples.
Backend Developers Focus on These Commands
KafkaConsumer.javaandKafkaFetchRecords.javamanage consumer lifecycle and polling loop.Commit managers in
consumerpackage (SyncCommitManager,AsyncCommitManager,NoopCommitManager) control offset commits.Manual commit support via
KafkaManualCommitinterface and its factories.Resume logic in
processor/resume/kafkapackage (SingleNodeKafkaResumeStrategy,KafkaResumeAdapter).Error handling strategies under
consumer/errorhandlerpackage (DiscardErrorStrategy,RetryErrorStrategy,ReconnectErrorStrategy).Idempotent repository in
processor/idempotent/kafka/KafkaIdempotentRepository.java.
Testing and Validation
Unit tests in
src/test/java/org/apache/camel/component/kafka.Integration tests in
src/test/java/org/apache/camel/component/kafka/integrationcover scenarios like batching, transactions, authentication, pause/resume, and idempotency.Mock interceptors (
MockProducerInterceptor,MockConsumerInterceptor) assist in testing message flows.
Configuration and Build
Component configuration managed in
KafkaConfiguration.java.Kafka client factories in
KafkaClientFactoryand default implementation inDefaultKafkaClientFactory.Component and endpoint lifecycle in
KafkaComponent.javaandKafkaEndpoint.java.
Visual Diagrams
Kafka Component High-Level Architecture
graph TB
FE[Apache Camel Routes] --> KP[KafkaProducer]
KP --> Kafka[Kakfa Broker]
Kafka --> KC[KafkaConsumer]
KC --> Processor[Message Processor]
KC --> CommitMgr[Commit Manager]
CommitMgr --> OffsetRepo[State Repository]
KC --> ResumeStrategy[Resume Strategy]
ResumeStrategy --> Kafka
KC --> IdempotentRepo[Idempotent Repository]
Transform[Transform Processors] --> FE
Transform --> KP
Transform --> KC
Kafka Consumer Message Processing Workflow
flowchart TD
Start[Start Kafka Consumer] --> Poll[Poll Kafka Broker]
Poll --> Records[Receive ConsumerRecords]
Records --> Process[Process Records]
Process -->|Success| Commit[Commit Offsets]
Process -->|Error| ErrorHandler[Handle Error]
Commit --> Poll
ErrorHandler --> Decision{Break on Error?}
Decision -- Yes --> Stop[Stop Consumer]
Decision -- No --> Poll
This overview provides a concise yet thorough roadmap to the Apache Camel Kafka component, its architecture, workflows, and developer entry points. It enables contributors to quickly grasp the system's purpose, core modules, and how to effectively navigate and extend the project.