This guide explores Kafka consumer group offsets, crucial for tracking message consumption progress. Each consumer group maintains an offset for each partition it consumes, indicating the last processed record. This ensures that consumers resume from the correct position after restarts.
A consumer group offset is a simple numerical identifier that tracks the position of a consumer within a Kafka topic's partition. Each partition has a sequential offset for every record. The consumer group uses these offsets to remember where it left off. For instance, a consumer group reading from a two-partition topic (P1 and P2) will have separate offsets for each, representing the last read record in P1 and P2 respectively.
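To make the per-partition bookkeeping concrete, here is a minimal in-memory sketch. It is a toy model, not Kafka's implementation: the class `GroupOffsets` and its method names are hypothetical. It follows Kafka's convention that the stored value is the offset of the next record to read, which is one past the last processed record.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory model of a group's per-partition offsets; not Kafka's implementation. */
final class GroupOffsets {
    // Key: partition name. Value: offset of the next record the group should read.
    private final Map<String, Long> committed = new HashMap<>();

    /** Records that everything up to and including lastProcessedOffset is done. */
    void commit(String partition, long lastProcessedOffset) {
        committed.put(partition, lastProcessedOffset + 1);
    }

    /** Position to resume from after a restart; fallback applies when nothing was committed. */
    long resumePosition(String partition, long fallback) {
        return committed.getOrDefault(partition, fallback);
    }
}
```

For a group that has processed up to offset 41 in P1 and offset 6 in P2, `resumePosition` would return 42 and 7 respectively, and each partition advances independently of the other.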
Offset storage can be handled in two ways: within Kafka itself or in an external system (database or file). This article focuses on Kafka's internal offset storage mechanism.
Kafka stores offsets in a special internal topic named __consumer_offsets. The Kafka client library handles offset storage and retrieval, enabling consumers to seamlessly resume from their last known position after a restart.
If no committed offset is found for a consumer group, the auto.offset.reset configuration determines the consumer's behavior:
latest (default): The consumer starts from the end of each assigned partition, ignoring existing messages.
earliest: The consumer starts from the beginning of each assigned partition, processing all available messages.
none: An exception is thrown if no offset is found.
Auto-commit simplifies offset management by periodically committing offsets to Kafka. It is turned on by enable.auto.commit (true by default) and runs every 5 seconds by default (controlled by auto.commit.interval.ms). While convenient, it risks data loss.
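As a rough illustration, the settings discussed above could be assembled in Java like this. The property keys are standard Kafka consumer configuration names. The broker address, the group name, and the `ConsumerSettings` class are placeholders chosen for this sketch.

```java
import java.util.Properties;

/** Sketch of consumer settings; the class name and the placeholder values are assumptions. */
final class ConsumerSettings {
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.setProperty("group.id", "example-group");           // hypothetical group name
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Where to start when the group has no committed offset: latest, earliest, or none.
        props.setProperty("auto.offset.reset", "earliest");
        // Auto-commit switch and its interval, written out with their default values.
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", "5000");
        return props;
    }
}
```

The resulting `Properties` object is what would typically be handed to the consumer's constructor. Writing the defaults out explicitly makes the commit behavior visible to whoever reads the configuration later.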
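Auto-commit is driven by a time interval rather than by processing progress, so it doesn't track in-flight records. If a consumer polls multiple records and their offsets are auto-committed before processing is complete, a failure at that point loses the unprocessed records: after the restart, the consumer resumes past them.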
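The failure scenario can be worked through with a small simulation. No Kafka is involved here: `AutoCommitLossDemo` and its parameters are hypothetical, and the model assumes the commit covers the entire polled batch before processing finishes.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy simulation (no Kafka involved) of a commit that runs ahead of processing. */
final class AutoCommitLossDemo {
    /**
     * A batch covering offsets [0, batchSize) is polled and the commit fires for the
     * whole batch, but the consumer crashes after handling only processedBeforeCrash
     * records. Returns the offsets that are never processed after the restart.
     */
    static List<Long> lostOffsets(int batchSize, int processedBeforeCrash) {
        long committed = batchSize; // the commit covers the whole polled batch
        List<Long> lost = new ArrayList<>();
        // After the restart the consumer resumes at the committed offset, so every
        // record between the crash point and the committed offset is skipped.
        for (long offset = processedBeforeCrash; offset < committed; offset++) {
            lost.add(offset);
        }
        return lost;
    }
}
```

With a batch of five records and a crash after the second one, offsets 2, 3, and 4 are committed but never processed. If the whole batch is processed before the crash, nothing is lost.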
Manual commit offers precise control. By disabling auto-commit (enable.auto.commit=false), you explicitly commit offsets using commitSync() or commitAsync() after successfully processing records. Because an offset is committed only after its records have been processed, a failure leads to reprocessing rather than data loss.
<code class="language-java">while (true) {
    records = consumer.poll(timeout);
    // process records
    consumer.commitSync(); // or consumer.commitAsync()
}</code>
Auto-commit is suitable if your application can tolerate occasionally losing records after a failure. Otherwise, manual commit is recommended.
Manual commit offers synchronous (commitSync()) and asynchronous (commitAsync()) options. commitSync() blocks until the commit is confirmed, retrying on recoverable errors, which ensures persistence but reduces throughput. commitAsync() is non-blocking and does not retry, so failures must be handled in its callback.
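One common way to combine the two styles is to commit asynchronously during normal operation and synchronously once before shutting down. The sketch below shows that pattern against a deliberately simplified stand-in. `OffsetCommitter` and `CommitStrategy` are hypothetical and are not the Kafka client API, whose commit callback has a different signature.

```java
import java.util.function.BiConsumer;

/** Hypothetical stand-in for the two commit styles; NOT the Kafka client API. */
interface OffsetCommitter {
    void commitSync(long offset);                                       // blocks until stored
    void commitAsync(long offset, BiConsumer<Long, Exception> callback); // result via callback
}

final class CommitStrategy {
    /**
     * Processes offsets [0, recordCount), committing asynchronously after each record
     * and synchronously once at the end so the final position is persisted.
     */
    static void run(OffsetCommitter committer, int recordCount) {
        long position = 0; // offset of the next record to process
        try {
            while (position < recordCount) {
                // ... process the record at `position` here ...
                position++;
                committer.commitAsync(position, (committed, error) -> {
                    if (error != null) {
                        // Not retried: a later commit supersedes this one anyway.
                        System.err.println("Async commit of offset " + committed
                                + " failed: " + error.getMessage());
                    }
                });
            }
        } finally {
            // Blocking commit of whatever was actually processed, even on failure.
            committer.commitSync(position);
        }
    }
}
```

The design choice here is that a failed asynchronous commit is only logged, since the next commit carries a newer position. The final synchronous commit is the one that must succeed, which is why it is the blocking variant.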
Consumer group offsets are fundamental for reliable Kafka consumption. While auto-commit simplifies things, manual commit provides greater control and data safety. The choice between synchronous and asynchronous commits depends on your application's needs, balancing performance and reliability. Understanding these mechanisms is key to building robust and fault-tolerant Kafka applications.