Prevent Fetcher from wrongfully discarding PartitionRecords in compacted topics #33
When a topic is compacted, consecutive messages might not have consecutive offsets. `Fetcher._append` would discard a `PartitionRecords` whenever the offset of the first message of the part was not equal to the offset of the last message of the previous part + 1. For compacted topics this condition almost never holds (at least when fetching from the 'earliest' offset).

By using `part.fetch_offset` instead, we ensure the whole `PartitionRecords` is not discarded the first time offsets are not consecutive, which avoids sending useless new FetchRequests.
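
To make the failure mode concrete, here is a minimal, runnable sketch of the idea. This is not kafka-python's actual code: `PartitionRecords`, `take`, `Message`, and the `append` function below are simplified stand-ins for the real `Fetcher._append` machinery, and the buggy check is paraphrased from the description above.

```python
from collections import namedtuple

# Hypothetical record type for the sketch; the real consumer returns
# richer objects.
Message = namedtuple('Message', ['offset', 'value'])

class PartitionRecords:
    """Simplified stand-in for the fetcher's buffered partition records."""
    def __init__(self, fetch_offset, messages):
        self.fetch_offset = fetch_offset
        self.messages = messages

    def take(self, n):
        taken, self.messages = self.messages[:n], self.messages[n:]
        if taken:
            # Advance to last-consumed-offset + 1. With compaction the
            # *next* buffered message may have a higher offset; that gap
            # is legitimate and must not cause a discard.
            self.fetch_offset = taken[-1].offset + 1
        return taken

def append(drained, part, position, max_records):
    """Drain up to max_records from part; return the new position."""
    # Buggy behavior (paraphrased): discard the part unless its next
    # message's offset equals `position`, i.e. offsets are consecutive:
    #     if part.messages[0].offset != position: discard
    # Fixed behavior: compare `position` against part.fetch_offset, which
    # take() keeps at last-consumed-offset + 1, so offset gaps survive.
    if part.fetch_offset != position:
        return position  # records from an obsolete request: skip them
    drained.extend(part.take(max_records))
    return part.fetch_offset

# Offsets with gaps, as log compaction leaves behind:
msgs = [Message(o, b'v') for o in (0, 1, 4, 9, 10, 12)]
part = PartitionRecords(fetch_offset=0, messages=msgs)
drained, position = [], 0
while part.messages:
    position = append(drained, part, position, max_records=2)
print([m.offset for m in drained])  # [0, 1, 4, 9, 10, 12] - nothing lost
```

With the consecutive-offset check, the second `append` call would see offset 4 where it expected 2 and drop the remaining buffer; comparing against `part.fetch_offset` lets everything drain from a single fetch.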
In our case, the first FetchResponse returned ~13,000 records. Consuming them with `consumer.poll(max_records=50)`, ~12,950 were discarded: the offset of the 51st message was not equal to the offset of the 50th message + 1, so a new FetchRequest was sent, and so on. With this change, all ~13,000 messages were correctly consumed and only one FetchRequest had to be sent. (The topic was `__consumer_offsets`, which is compacted.)
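
For reference, a hypothetical reproduction of the scenario above. The broker address, timeout, and consumer settings are assumptions, not part of this PR; `exclude_internal_topics=False` is set so records from the internal topic are exposed.

```python
from kafka import KafkaConsumer

# Assumed local broker; __consumer_offsets is compacted by default.
consumer = KafkaConsumer(
    '__consumer_offsets',
    bootstrap_servers='localhost:9092',
    auto_offset_reset='earliest',
    exclude_internal_topics=False,
)

total = 0
while True:
    # poll() returns a dict of {TopicPartition: [records]}.
    batches = consumer.poll(timeout_ms=1000, max_records=50)
    if not batches:
        break
    total += sum(len(records) for records in batches.values())
print('consumed %d records' % total)
```

Before the fix, each 50-record take could invalidate the rest of the buffered FetchResponse and trigger a fresh FetchRequest; with it, the whole response is drained.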