cache.rs
Overview
This file implements a specialized caching mechanism designed to optimize JSON deserialization performance by efficiently reusing Python string objects (`PyStr`) representing JSON keys. Given that JSON keys often repeat within objects or nested structures, this cache reduces overhead caused by frequent Python string allocation and reference counting.
The core of the file defines:
- A wrapper struct, `CachedKey`, that manages a Python string reference (`PyStr`), incrementing and decrementing the Python reference count as needed.
- A fixed-size associative cache, `KeyMap`, that maps 64-bit hash keys to `CachedKey` instances using direct-mapped hashing and a round-robin replacement policy.
- A globally accessible, lazily initialized cache instance, `KEY_MAP`, wrapped in a `OnceCell` for single-threaded mutable access.
This caching layer is integral to the deserialization subsystem, providing a fast lookup and reuse mechanism for Python strings representing JSON keys, substantially improving throughput and memory efficiency.
Detailed Explanation of Entities
CachedKey Struct
#[repr(transparent)]
pub(crate) struct CachedKey {
    ptr: PyStr,
}
Purpose
`CachedKey` acts as a safe wrapper around a `PyStr` Python string pointer, managing its Python reference count explicitly to ensure that cached keys remain valid and properly owned.
Safety Traits
`CachedKey` is marked `Send` and `Sync` via `unsafe impl` to allow sharing across threads. The compiler does not verify these impls; they assume that the underlying Python string pointer and its runtime environment are correctly managed elsewhere.
Methods
`new(ptr: PyStr) -> CachedKey`
Creates a new `CachedKey` wrapping the given `PyStr`.
Parameters:
- `ptr`: a `PyStr` representing a Python string object.
Returns: a new instance of `CachedKey`.
**Example:**
let key = CachedKey::new(py_str_obj);
`get(&mut self) -> PyStr`
Retrieves the stored `PyStr` and increments its Python reference count (`Py_INCREF`), so that the caller owns a valid reference to the Python string.
Parameters:
- `&mut self`: mutable reference to the `CachedKey`.
Returns: the cached `PyStr` with an incremented reference count.
Behavior:
- Asserts that the reference count is at least 1 (debug builds only).
- Calls the Python C API function `Py_INCREF` on the underlying pointer.
**Example:**
let cached_pystr = cached_key.get(); // cached_pystr now has an increased reference count
Drop Implementation
When a `CachedKey` instance is dropped, it decrements the Python reference count (`Py_DECREF`) on the underlying Python string pointer to avoid memory leaks.
impl Drop for CachedKey {
    fn drop(&mut self) {
        ffi!(Py_DECREF(self.ptr.as_ptr().cast::<pyo3_ffi::PyObject>()));
    }
}
This ensures that cached Python strings are properly released when evicted from the cache or when the cache is destroyed.
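The ownership rules described above can be illustrated without a Python interpreter. The following is a minimal sketch, not the file's actual code: `MockObject`, `incref`, `decref`, `MockCachedKey`, and `refcount_lifecycle` are hypothetical stand-ins for a Python object, `Py_INCREF`, `Py_DECREF`, and `CachedKey`. It assumes that `new` takes over a reference the caller already holds, which is consistent with `new` performing no increment while `Drop` performs a decrement.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a Python object; only the refcount is modelled.
struct MockObject {
    refcnt: Cell<isize>,
}

/// Stand-in for `Py_INCREF`.
fn incref(obj: &MockObject) {
    obj.refcnt.set(obj.refcnt.get() + 1);
}

/// Stand-in for `Py_DECREF` (deallocation at zero is not modelled).
fn decref(obj: &MockObject) {
    obj.refcnt.set(obj.refcnt.get() - 1);
}

/// Mirrors the shape of `CachedKey`: it owns one reference while it lives.
#[repr(transparent)]
struct MockCachedKey {
    ptr: *mut MockObject,
}

impl MockCachedKey {
    /// Assumption: takes over a reference the caller already holds.
    fn new(ptr: *mut MockObject) -> Self {
        MockCachedKey { ptr }
    }

    /// Hands out a new owned reference: increment, then return the pointer.
    fn get(&mut self) -> *mut MockObject {
        // SAFETY: the wrapper holds a reference, so the object is alive.
        let obj = unsafe { &*self.ptr };
        debug_assert!(obj.refcnt.get() >= 1);
        incref(obj);
        self.ptr
    }
}

impl Drop for MockCachedKey {
    fn drop(&mut self) {
        // SAFETY: the wrapper still holds its reference at this point.
        decref(unsafe { &*self.ptr });
    }
}

/// Returns the refcount after `new`, after `get`, and after dropping the key.
fn refcount_lifecycle() -> (isize, isize, isize) {
    let obj = Box::into_raw(Box::new(MockObject { refcnt: Cell::new(1) }));
    let read = |p: *mut MockObject| unsafe { (*p).refcnt.get() };

    let mut key = MockCachedKey::new(obj);
    let after_new = read(obj);
    let handed_out = key.get();
    let after_get = read(handed_out);
    drop(key);
    let after_drop = read(obj);

    // Free the mock object; a real interpreter would do this at refcount zero.
    unsafe { drop(Box::from_raw(obj)) };
    (after_new, after_get, after_drop)
}
```

In this sketch the reference handed out by `get` outlives the wrapper, which is why the count is still 1 after the key is dropped.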
KeyMap Type Alias
pub(crate) type KeyMap =
    AssociativeCache<u64, CachedKey, Capacity2048, HashDirectMapped, RoundRobinReplacement>;
Purpose
`KeyMap` is a fixed-capacity associative cache mapping 64-bit hash keys (`u64`) to `CachedKey` instances.
Implementation Details
Uses the `AssociativeCache` generic with the following parameters:
- Key type: `u64`, usually a hash of the JSON key string.
- Value type: `CachedKey`, the cached Python string wrapper.
- Capacity: fixed at 2048 entries (`Capacity2048`).
- Hashing: `HashDirectMapped`, a direct-mapped hash strategy for fast lookups.
- Replacement policy: `RoundRobinReplacement`, which evicts entries in a round-robin fashion to evenly distribute cache replacement.
This design offers a balance between lookup speed and memory footprint, suitable for high-frequency key reuse during deserialization.
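To make the direct-mapped idea concrete, here is a small sketch under stated assumptions. `DirectMapped` and `direct_mapped_demo` are hypothetical names, and the modulo slot selection is an assumption made for illustration; it is not taken from the `associative_cache` crate. The point it shows is that each key has exactly one candidate slot, so a lookup inspects a single entry and a colliding insert evicts whatever was there.

```rust
/// Hypothetical miniature of a direct-mapped cache: one slot per index.
struct DirectMapped<V> {
    slots: Vec<Option<(u64, V)>>,
}

impl<V> DirectMapped<V> {
    fn with_capacity(capacity: usize) -> Self {
        assert!(capacity > 0);
        let mut slots = Vec::with_capacity(capacity);
        slots.resize_with(capacity, || None);
        DirectMapped { slots }
    }

    /// Assumption for this sketch: the slot is the key modulo the capacity.
    fn index(&self, key: u64) -> usize {
        (key % self.slots.len() as u64) as usize
    }

    /// Stores `value` and returns the entry that previously occupied the slot.
    fn insert(&mut self, key: u64, value: V) -> Option<(u64, V)> {
        let i = self.index(key);
        self.slots[i].replace((key, value))
    }

    /// A hit requires the stored key to match, not just the slot.
    fn get(&self, key: u64) -> Option<&V> {
        match &self.slots[self.index(key)] {
            Some((k, v)) if *k == key => Some(v),
            _ => None,
        }
    }
}

/// Inserts two keys that share a slot and reports what each lookup finds.
fn direct_mapped_demo() -> (Option<&'static str>, Option<&'static str>) {
    let mut cache = DirectMapped::with_capacity(4);
    cache.insert(1, "name");
    cache.insert(5, "id"); // 5 % 4 == 1, so this evicts the entry for key 1
    (cache.get(1).copied(), cache.get(5).copied())
}
```

With `CachedKey` as the value type, the evicted entry would be dropped at that point, which is where its `Py_DECREF` would run.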
KEY_MAP Static Instance
pub(crate) static mut KEY_MAP: OnceCell<KeyMap> = OnceCell::new();
Purpose
A global, lazily initialized, mutable cache instance.
- Uses `once_cell::unsync::OnceCell` to allow one-time initialization without synchronization primitives.
- Declared as a `static mut`, so every access requires an `unsafe` block. This permits mutation in controlled single-threaded contexts (or where synchronization is externally guaranteed).
Usage
Before usage, `KEY_MAP` should be initialized exactly once. Subsequent accesses can retrieve the cache for lookups or insertions.
Example (hypothetical usage pattern):
unsafe {
    // Initialize once. AssociativeCache is constructed through Default.
    if KEY_MAP.get().is_none() {
        let _ = KEY_MAP.set(KeyMap::default());
    }
    let cache = KEY_MAP.get_mut().unwrap();
    // Use cache for lookups and insertions
}
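The same initialize-once pattern can be tried with only the standard library. This is a sketch rather than the project's code: it uses `std::cell::OnceCell` as a stand-in for `once_cell::unsync::OnceCell`, a `Vec<u64>` as a stand-in for `KeyMap`, and the names `MOCK_KEY_MAP`, `key_map`, and `lazy_init_demo` are hypothetical. It goes through a raw pointer so that no reference to the `static mut` is taken directly.

```rust
use std::cell::OnceCell;
use std::ptr::addr_of_mut;

/// Hypothetical stand-in for `KeyMap`.
type MockKeyMap = Vec<u64>;

/// Mirrors the shape of `KEY_MAP`: an unsynchronized cell in a `static mut`.
static mut MOCK_KEY_MAP: OnceCell<MockKeyMap> = OnceCell::new();

/// Returns the global map, creating it on first use.
///
/// # Safety
/// The caller must guarantee exclusive access (for example, a single thread)
/// and must not keep an earlier returned reference alive across calls.
unsafe fn key_map() -> &'static mut MockKeyMap {
    let cell: &'static mut OnceCell<MockKeyMap> =
        unsafe { &mut *addr_of_mut!(MOCK_KEY_MAP) };
    // First call initializes; later calls find the value already present.
    let _ = cell.get_or_init(MockKeyMap::new);
    cell.get_mut().expect("initialized on the line above")
}

/// Pushes two entries through separate accesses and returns how many were added.
fn lazy_init_demo() -> usize {
    let before = unsafe { key_map().len() };
    unsafe { key_map().push(1) };
    unsafe { key_map().push(2) };
    let after = unsafe { key_map().len() };
    after - before
}
```

Every access sits inside `unsafe`, which reflects the trade-off described above: no synchronization cost, but the single-threaded guarantee has to come from the caller.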
Important Implementation Details & Algorithms
- Reference counting management: The `CachedKey` struct explicitly manages Python reference counts using the Python C API calls `Py_INCREF` and `Py_DECREF` on the underlying Python string pointer (`PyStr`). This is critical to prevent use-after-free errors or leaks in the Python interpreter's memory.
- `AssociativeCache` usage: The cache is implemented using the generic `AssociativeCache` from the `associative_cache` crate, configured with a direct-mapped hash and round-robin replacement to provide O(1) average lookup and a balanced eviction policy.
- Cache capacity: Fixed at 2048 entries as a compromise between memory usage and cache hit rate, which is suitable for typical JSON deserialization workloads.
- Unsafe global mutability: The global cache is declared as `static mut` and wrapped in an `unsync::OnceCell`. This approach avoids synchronization overhead but requires strict discipline from users to ensure single-threaded or externally synchronized access.
Interaction with Other Components
- JSON deserialization (`src/deserialize`): This cache is used internally by the deserialization logic to reuse Python string keys encountered repeatedly in JSON objects. This significantly reduces overhead from creating new Python string objects and managing their lifecycles.
- Python memory management: Works in tandem with the project's custom Python memory allocator to maintain consistent memory allocation and deallocation practices aligned with Python's runtime.
- Python FFI layer: Relies on Python C API calls (via the `ffi!` macro) to correctly manage reference counts on Python string objects.
- Associative cache infrastructure: Built upon the `associative_cache` crate, sharing cache design and replacement strategies with other caching systems in the project.
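The reuse pattern on the deserialization side can be sketched as a hash-then-lookup step. Everything here is an illustrative assumption rather than the project's code: `KeyInterner`, `hash_key`, `get_or_create`, and `interner_demo` are hypothetical names, `Rc<str>` stands in for a reference-counted Python string (`Rc::clone` plays the role of `Py_INCREF`), a `HashMap` stands in for `KeyMap`, and `DefaultHasher` stands in for whatever hash function the project uses. The equality check on a hit is a defensive choice of this sketch.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

/// Hypothetical key cache: maps a 64-bit hash to a shared string.
struct KeyInterner {
    map: HashMap<u64, Rc<str>>,
    allocations: usize,
}

impl KeyInterner {
    fn new() -> Self {
        KeyInterner { map: HashMap::new(), allocations: 0 }
    }

    /// Assumption: any 64-bit hash of the key bytes would do here.
    fn hash_key(key: &str) -> u64 {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        hasher.finish()
    }

    /// Returns a shared handle, allocating only when the key is not cached.
    fn get_or_create(&mut self, key: &str) -> Rc<str> {
        let hash = Self::hash_key(key);
        if let Some(existing) = self.map.get(&hash) {
            if &**existing == key {
                // Hit: hand out another reference to the same string.
                return Rc::clone(existing);
            }
        }
        // Miss: allocate once, keep one reference in the cache.
        self.allocations += 1;
        let created: Rc<str> = Rc::from(key);
        self.map.insert(hash, Rc::clone(&created));
        created
    }
}

/// Feeds repeated keys through the interner and reports the allocation count
/// and whether the first and third "id" handles share one allocation.
fn interner_demo() -> (usize, bool) {
    let mut interner = KeyInterner::new();
    let keys = ["id", "name", "id", "name", "id"];
    let handles: Vec<Rc<str>> =
        keys.iter().map(|k| interner.get_or_create(k)).collect();
    (interner.allocations, Rc::ptr_eq(&handles[0], &handles[2]))
}
```

Five keys produce two allocations in this sketch, which is the saving the cache is meant to provide for objects that repeat the same keys.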
Usage Example
// Assuming a PyStr 'py_key' representing a Python string key
// Create a CachedKey wrapper
let mut cached_key = CachedKey::new(py_key);
// Retrieve the cached PyStr with incremented reference count
let pystr_for_use = cached_key.get();
// Use pystr_for_use safely in Python FFI calls or Rust-Python interop
// When cached_key goes out of scope, Py_DECREF is called automatically
Mermaid Class Diagram
classDiagram
class CachedKey {
-ptr: PyStr
+new(ptr: PyStr) CachedKey
+get() PyStr
<<Drop>>
}
class AssociativeCache~K, V, Capacity, Hashing, Replacement~ {
+insert(k: K, v: V)
+get_mut(k: &K) Option<&mut V>
+default() Self
}
CachedKey ..> PyStr : contains
KeyMap "type alias" <|-- AssociativeCache~u64, CachedKey, Capacity2048, HashDirectMapped, RoundRobinReplacement~
Summary
- Purpose: Provides a performant caching layer to reuse Python string keys during JSON deserialization.
- Core types: `CachedKey` manages Python string references. `KeyMap` is a fixed-size associative cache storing keys.
- Memory safety: Carefully manages Python reference counts to ensure safe sharing and lifetime management.
- Cache strategy: Uses direct-mapped hashing with round-robin eviction for fast lookups and balanced replacement.
- Global access: `KEY_MAP` provides a lazily initialized global cache accessible throughout the deserialization subsystem.
- System integration: Enhances deserialization speed and memory efficiency by reducing redundant Python string allocations, working closely with Python's FFI and memory allocator.
This file is a critical performance optimization component for JSON deserialization in Python-Rust integration contexts.