split.rs
Overview
This file implements a storage wrapper called SplitValueStore that transparently splits large data values (bins) into smaller chunks before storage, and merges those chunks back into a single logical record on retrieval. This is useful for working with storage backends that impose size limits on individual records or bins.
The splitting and merging logic ensures that large blobs stored as individual bins do not exceed a configurable maximum size (max_size). The file also provides utility functions and an internal helper struct to manage the splitting and chunking process efficiently and correctly.
Key Components
Constant
BIN_SPLIT_COUNT: &str = "__split_count"
A special bin name used to store the number of chunks a split record is divided into.
Struct: SplitValueStore
A generic wrapper around an inner KeyValueStore implementation that automatically splits large bin values into chunks before putting, and merges them back after getting.
Type Parameter:
Inner: KeyValueStore + Clone - the underlying key-value store being wrapped.
Fields:
inner: Inner - the wrapped key-value store instance.
max_size: usize - the maximum size (in bytes) allowed for each chunk.
Methods:
new(inner: Inner, max_size: usize) -> Self
Constructs a new SplitValueStore from an inner store and a maximum chunk size.
Implements the KeyValueStore trait with the following methods:
get(&self, key: &Key, values: &Bins, label: &'static str) -> anyhow::Result<Option<ValueMap>>
Retrieves a record from the store. If the record was split into chunks, fetches all chunks, merges them, and returns the merged record.
put(&self, key: &Key, bins: &[Bin], until_success: bool, label: &'static str) -> anyhow::Result<()>
Splits the bins if their total size exceeds max_size, then stores each chunk as a separate record under chunk keys. Otherwise, stores the bins directly.
batch_get(&self, gets: Vec<(Key, Bins)>, label: &'static str) -> anyhow::Result<Vec<Option<ValueMap>>>
Performs batch retrieval of multiple keys, merging chunked records as needed.
(Debug only) db_reads() and db_writes() proxy to the inner store for monitoring database operations.
Usage Example:
let inner_store = ...; // Some KeyValueStore implementation
let split_store = SplitValueStore::new(inner_store, 1000); // max chunk size 1000 bytes
// Put large bins
split_store.put(&key, &bins, true, "label")?;
// Get the record; chunks are merged automatically
let record = split_store.get(&key, &Bins::Some(vec!["bin1".to_string()]), "label")?;
Internal Helper: Split<'k, 'b>
Manages the process of splitting bins into chunks, grouping chunks into multiple records, and generating the keys/bins for each chunk.
Fields:
max_size: usize - maximum chunk size.
key: &Key - base key for the records.
records: Vec<(Key, Vec<Bin<'b>>)> - accumulated chunked records ready for storage.
bins: Vec<Bin<'b>> - bins currently being collected for one chunk record.
size: usize - current size of collected bins.
Methods:
new(key: &Key, max_size: usize) -> anyhow::Result<Self>
Creates a new Split instance for the given key and max chunk size.
add_bin(&mut self, bin_size: usize, bin: &Bin<'b>) -> anyhow::Result<()>
Adds a bin to the current split. If the bin does not fit in the space remaining in the current record, splits it into smaller chunks.
flush_record(&mut self) -> anyhow::Result<()>
Writes the currently collected bins into a record and resets the buffer.
into_records(self) -> anyhow::Result<Vec<(Key, Vec<Bin<'b>>)>>
Finalizes the splitting process: flushes the remaining bins, adds a split count bin to the first record, and returns all chunked records.
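The accumulate-and-flush behaviour described above can be sketched roughly as follows. This is not the code from split.rs: Splitter and SimpleBin are hypothetical stand-ins, record keys are left out, a bin's size is taken to be its byte length rather than an Aerospike size estimate, and the split count is encoded as big-endian bytes only because the simplified bin type holds nothing but bytes.

```rust
/// Simplified bin: (name, blob bytes). Its size is the byte length, which is
/// an assumption; the real helper receives a precomputed `bin_size`.
type SimpleBin = (String, Vec<u8>);

/// Hypothetical, reduced counterpart of the Split helper (record keys omitted).
struct Splitter {
    max_size: usize,
    records: Vec<Vec<SimpleBin>>, // finished records
    bins: Vec<SimpleBin>,         // bins collected for the record in progress
    size: usize,                  // bytes currently held in `bins`
}

impl Splitter {
    fn new(max_size: usize) -> Result<Self, String> {
        if max_size == 0 {
            return Err("max_size must be greater than zero".to_string());
        }
        Ok(Splitter { max_size, records: Vec::new(), bins: Vec::new(), size: 0 })
    }

    /// Add a bin; whatever does not fit in the current record spills into the next.
    fn add_bin(&mut self, bin: &SimpleBin) {
        let (name, bytes) = bin;
        if bytes.is_empty() {
            self.bins.push(bin.clone()); // nothing to split
            return;
        }
        let mut offset = 0;
        while offset < bytes.len() {
            if self.size == self.max_size {
                self.flush_record();
            }
            let take = usize::min(self.max_size - self.size, bytes.len() - offset);
            self.bins.push((name.clone(), bytes[offset..offset + take].to_vec()));
            self.size += take;
            offset += take;
        }
    }

    /// Move the collected bins into a finished record and reset the buffer.
    fn flush_record(&mut self) {
        if !self.bins.is_empty() {
            self.records.push(std::mem::take(&mut self.bins));
        }
        self.size = 0;
    }

    /// Flush what is left and note the number of records in the first one.
    fn into_records(mut self) -> Vec<Vec<SimpleBin>> {
        self.flush_record();
        let count = self.records.len() as u64;
        if let Some(first) = self.records.first_mut() {
            first.push(("__split_count".to_string(), count.to_be_bytes().to_vec()));
        }
        self.records
    }
}
```

With max_size set to 4, adding a 2-byte bin followed by a 5-byte bin produces two records in this sketch: the first holds the small bin, the first 2 bytes of the large bin, and the split count; the second holds the remaining 3 bytes.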
Functions
split_bins(key: &Key, bins: &[Bin], max_size: usize) -> anyhow::Result<Option<Vec<(Key, Vec<Bin>)>>>
Purpose:
Splits a set of bins into multiple smaller records if their combined size exceeds max_size.
Parameters:
key - the base key for the record.
bins - slice of bins to potentially split.
max_size - maximum allowed size per chunk.
Returns:
Ok(None) if splitting is not required (total size < max_size).
Ok(Some(records)) where records is a vector of chunked (Key, Vec<Bin>) pairs.
Functionality:
Measures bin sizes, sorts bins by size in ascending order, and uses the Split struct to perform chunking.
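The measure, decide, and sort steps might look roughly like the sketch below. The names bins_to_split and SimpleBin are hypothetical, size is approximated by byte length, and treating a total exactly equal to max_size as "fits" is an assumption of this sketch rather than something the description settles.

```rust
/// Simplified bin: (name, blob bytes); not the Aerospike `Bin` type.
type SimpleBin = (String, Vec<u8>);

/// Returns None when the record fits in one piece, otherwise the bins paired
/// with their sizes and ordered smallest-first, ready to be fed to a splitter.
fn bins_to_split(bins: &[SimpleBin], max_size: usize) -> Option<Vec<(usize, &SimpleBin)>> {
    // Measure every bin once (byte length stands in for a size estimate).
    let mut sized: Vec<(usize, &SimpleBin)> =
        bins.iter().map(|bin| (bin.1.len(), bin)).collect();

    // Assumption: a total equal to max_size still counts as fitting.
    let total: usize = sized.iter().map(|(size, _)| *size).sum();
    if total <= max_size {
        return None;
    }

    // Ascending order lets small bins share a record before large ones are cut.
    sized.sort_by_key(|(size, _)| *size);
    Some(sized)
}
```

Returning None for the common small-record case lets the caller store the bins directly without building any chunk records.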
merge_bins(store: &(impl KeyValueStore + ?Sized), key: &Key, values: &Bins, record: &mut ValueMap, label: &'static str) -> anyhow::Result<()>
Purpose:
Merges chunked records back into a single logical record after retrieval.
Parameters:
store - the underlying key-value store to fetch chunk records from.
key - the base key of the original record.
values - bins requested by the caller (used to filter retrieved bins).
record - mutable reference to the initially fetched record to merge additional chunks into.
label - logging/tracing label.
Returns:
Ok(()) on success, with record containing the merged bins.
Details:
Checks for the presence of the __split_count bin to determine how many chunks exist. Retrieves each chunk record by generating chunk keys, then concatenates their blob bin values into the main record.
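A minimal sketch of this merge step, under several assumptions: the store is a plain HashMap keyed by user-key strings, SimpleValue and Record are hypothetical stand-ins for the Aerospike value and ValueMap types, the split count is taken to be the total number of records including the first, and the values filter is left out. Removing the __split_count bin from the merged record is a choice made here for illustration; the real function may keep it.

```rust
use std::collections::HashMap;

const BIN_SPLIT_COUNT: &str = "__split_count";

/// Simplified bin value; not the Aerospike `Value` type.
#[derive(Debug, Clone, PartialEq)]
enum SimpleValue {
    Int(i64),
    Blob(Vec<u8>),
}

/// Simplified record: bin name -> value.
type Record = HashMap<String, SimpleValue>;

/// Append the blob bins of chunks 1..count to the matching bins of `record`.
fn merge_chunks(
    store: &HashMap<String, Record>,
    key: &str,
    record: &mut Record,
) -> Result<(), String> {
    let count = match record.remove(BIN_SPLIT_COUNT) {
        None => return Ok(()), // not a split record, nothing to merge
        Some(SimpleValue::Int(n)) if n >= 1 => n as usize,
        Some(other) => return Err(format!("invalid split count: {:?}", other)),
    };
    // The first record is already in hand; fetch the remaining chunks in order.
    for i in 1..count {
        let chunk = store
            .get(&format!("__{}+{}", key, i))
            .ok_or_else(|| format!("missing chunk {}", i))?;
        for (name, value) in chunk {
            let tail = match value {
                SimpleValue::Blob(bytes) => bytes,
                other => return Err(format!("chunk bin {} is not a blob: {:?}", name, other)),
            };
            let head = record
                .entry(name.clone())
                .or_insert_with(|| SimpleValue::Blob(Vec::new()));
            match head {
                SimpleValue::Blob(bytes) => bytes.extend_from_slice(tail),
                other => return Err(format!("bin {} is not a blob: {:?}", name, other)),
            }
        }
    }
    Ok(())
}
```

Fetching chunks in ascending index order matters: each chunk holds a contiguous slice of the original blob, so appending them out of order would corrupt the merged value.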
chunk_key(key: &Key, i: usize) -> anyhow::Result<Key>
Purpose:
Generates a chunk key for the i-th chunk of the original key.
Parameters:
key - the original record key.
i - chunk index (starting at 1).
Returns:
A new Key with a modified user key in the format "__{original}+{i}".
Errors:
If the original key lacks a user key.
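The key derivation can be illustrated with a simplified key type. SimpleKey below is a hypothetical stand-in with an optional string user key, not the Aerospike Key, and the error is a plain String instead of anyhow::Error; only the "__{original}+{i}" format is taken from the description.

```rust
/// Simplified record key; `user_key` may be absent, which is the error case.
#[derive(Debug, Clone, PartialEq)]
struct SimpleKey {
    namespace: String,
    set_name: String,
    user_key: Option<String>,
}

/// Derive the key of the i-th chunk using the "__{original}+{i}" format.
fn chunk_key(key: &SimpleKey, i: usize) -> Result<SimpleKey, String> {
    let original = key
        .user_key
        .as_ref()
        .ok_or_else(|| "key has no user key".to_string())?;
    Ok(SimpleKey {
        namespace: key.namespace.clone(),
        set_name: key.set_name.clone(),
        user_key: Some(format!("__{}+{}", original, i)),
    })
}
```

For a key whose user key is user42, chunk 1 would be stored under __user42+1 in the same namespace and set.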
next_chunk(bin: &Bin, offset: usize, max_size: usize) -> anyhow::Result<(usize, Bin)>
Purpose:
Extracts a chunk of the blob value from a bin starting at offset with length up to max_size.
Parameters:
bin - the bin to split (must be a blob).
offset - offset in the blob to start chunking.
max_size - maximum length of the chunk.
Returns:
Tuple (size, Bin) where size is the chunk length and Bin contains the chunked blob.
Errors:
If the bin's value is not a blob.
If offset is invalid.
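The chunk extraction can be sketched as a bounds-checked slice. SimpleValue is a hypothetical stand-in for the bin value, and treating any offset at or past the end of the blob as invalid is an assumption of this sketch; the description does not say exactly which offsets are rejected.

```rust
/// Simplified bin value; only `Blob` can be chunked.
#[derive(Debug, Clone, PartialEq)]
enum SimpleValue {
    Int(i64),
    Blob(Vec<u8>),
}

/// Return up to `max_size` bytes of the blob starting at `offset`,
/// together with the number of bytes actually taken.
fn next_chunk(
    value: &SimpleValue,
    offset: usize,
    max_size: usize,
) -> Result<(usize, SimpleValue), String> {
    let bytes = match value {
        SimpleValue::Blob(bytes) => bytes,
        other => return Err(format!("cannot split non-blob value: {:?}", other)),
    };
    // Assumption: an offset at or beyond the end of the blob is an error.
    if offset >= bytes.len() {
        return Err(format!("offset {} out of range for {} bytes", offset, bytes.len()));
    }
    // The last chunk may be shorter than max_size.
    let end = usize::min(offset.saturating_add(max_size), bytes.len());
    Ok((end - offset, SimpleValue::Blob(bytes[offset..end].to_vec())))
}
```

A caller would advance offset by the returned size and repeat until the whole blob has been consumed.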
Implementation Details & Algorithms
Size Estimation:
Bin size estimation uses Aerospike's estimate_size() method on bin values to decide if splitting is necessary.
Splitting Strategy:
Bins are sorted by size ascending to pack smaller bins efficiently into chunks. Large bins that exceed the remaining space in a chunk are split using next_chunk() into multiple smaller blob bins.
Chunk Key Naming:
Chunk keys are derived by rewriting the original key's user key as "__{user_key}+{chunk_index}", ensuring unique keys per chunk.
Merging Chunks:
The primary record stores a special bin __split_count indicating the total number of chunks. Merge fetches all chunks except the first and concatenates their blob bins onto the first record's bins.
Error Handling:
Uses anyhow::Result for error propagation. Invalid cases such as a missing user key or an invalid split count bin type raise errors.
Interaction with Other System Components
Underlying Storage:
SplitValueStore wraps any KeyValueStore implementation. It delegates actual data persistence and retrieval to the wrapped store while adding chunking logic.
Aerospike Integration:
Uses Aerospike types like Key, Bin, Value, and Bins extensively for key and bin representation.
Storage Module:
Interacts with the KeyValueStore trait and ValueMap from the storage module, which define the interface and data structures for key-value operations.
Testing:
Contains unit tests using an in-memory store (MemStore) from the storage::mem module to validate splitting and merging correctness.
Diagram: Structure of split.rs
classDiagram
class SplitValueStore {
-inner: Inner
-max_size: usize
+new(inner, max_size)
+get(key, values, label)
+put(key, bins, until_success, label)
+batch_get(gets, label)
+db_reads()
+db_writes()
}
class Split {
-max_size: usize
-key: &Key
-records: Vec<(Key, Vec<Bin>)>
-bins: Vec<Bin>
-size: usize
+new(key, max_size)
+add_bin(bin_size, bin)
+flush_record()
+into_records()
}
SplitValueStore --> "1" Split : uses
SplitValueStore ..> KeyValueStore : implements
Split --> Bin : manages bins
Split --> Key : manages keys
Additional Notes
The file relies on the Aerospike client library for data types and operations related to keys and bins.
The splitting logic is designed specifically for blob-type bins; other bin types are expected to be small enough or unsupported for splitting.
The batch retrieval method preserves the chunk merging behavior for multiple keys simultaneously.
Debug methods db_reads and db_writes allow tracking of underlying database operations when compiled in debug mode.
The tests validate both small records (no split) and large records (split into multiple chunks), ensuring correctness of chunking and merging logic.
For further details on key concepts like KeyValueStore trait, Aerospike Key and Bin types, and error handling patterns, see the relevant topics on KeyValueStore Interface, Aerospike Data Types, and Error Handling.