Aggregation format¶
The KPL aggregation format is the only wire format that matters: KCL
consumers and the AWS-published deaggregation libraries all expect it.
aiokpl produces it byte-exact so existing consumers deaggregate
transparently.
The canonical specification lives in the C++ KPL repository at
aggregation-format.md.
Everything below is a summary of how aiokpl implements it.
Byte layout¶
0 4 N N+15
+---+---+---+---+==================+---+...+---+
| MAGIC NUMBER | PROTOBUF MESSAGE | MD5 |
+---+---+---+---+==================+---+...+---+
- Magic number (4 bytes):
0xF3 0x89 0x9A 0xC2. Lets a KCL consumer detect an aggregated record on sight. - Protobuf message: the serialized
AggregatedRecord(see schema below). Uses proto2. - MD5 footer (16 bytes): MD5 of the serialized protobuf bytes. Used for corruption detection only — not for security.
Schema¶
message AggregatedRecord {
repeated string partition_key_table = 1;
repeated string explicit_hash_key_table = 2;
repeated Record records = 3;
}
message Record {
required uint64 partition_key_index = 1;
optional uint64 explicit_hash_key_index = 2;
required bytes data = 3;
repeated Tag tags = 4;
}
message Tag {
required string key = 1;
optional string value = 2;
}
The schema is frozen by AWS. aiokpl does not generate code from a
.proto file — it hand-rolls a ~150-line encoder / decoder for these
seven fields in pure Python, with no protobuf runtime dependency. See
aiokpl.aggregation.
What we depend on¶
- Single-record short-circuit. A batch of exactly one record is sent
un-aggregated — raw user bytes with the original partition key. This
matches
KinesisRecord::serializein the C++ KPL. KCL consumers handle both cases transparently. - Dedup tables. When two records in the same aggregate share a
partition key (or explicit hash key),
aiokplemits the key once in the table and references its index from both records. This is a packing optimization the format allows but does not require. - MD5 footer is corruption detection, not security. We construct the
MD5 with
usedforsecurity=Falseso FIPS-restricted runtimes do not refuse to compute it.
What we do not depend on¶
- Tags. The schema reserves field 4 of
Recordfor tags. KPL and KCL have never implemented them.aiokplcan encode and decode tags losslessly, but no production path emits them. - Wire ordering of the tables. The protobuf format does not constrain
field ordering between repeated entries.
aiokplemits allpartition_key_tableentries first, then allexplicit_hash_key_tableentries, then all records — matching the C++ KPL byte-for-byte for conformance testing.
When aggregated¶
When a batch is aggregated, the API-level partition key submitted to
PutRecords is "a" and ExplicitHashKey is set to a value inside the
predicted shard's hash range (aiokpl uses the first record's hash key).
This is how the producer steers an aggregated blob at a specific shard
even though its API-level partition key is meaningless.
See aiokpl.aggregation for the
encoder, decoder, and UserRecord / DecodedRecord dataclasses.