Memory and Resource Management
This module is dedicated to efficient memory management and caching strategies that support the high-performance JSON serialization and deserialization functionality in the project. It primarily addresses two critical concerns:
Providing a custom global memory allocator that integrates seamlessly with Python's memory management APIs.
Implementing a caching mechanism for keys during JSON deserialization to optimize repeated string lookups and reduce overhead.
Python Memory Allocator
Purpose and Rationale
The project replaces Rust's default global allocator with a custom allocator that delegates all memory operations to Python's memory APIs (`PyMem_Malloc`, `PyMem_Realloc`, and `PyMem_Free`). Memory allocated internally by Rust components therefore goes through the same allocator the interpreter uses, avoiding potential conflicts or fragmentation issues between two separate allocators.
This design is crucial because the library integrates tightly with Python objects and the Python interpreter, which expect memory to be allocated and freed using Python's allocator for consistency, safety, and proper tracking.
How It Works
The allocator is implemented as a Rust struct `PyMemAllocator` that implements the [GlobalAlloc](/projects/287/67784) trait, which allows it to act as the global allocator for the Rust code in this project.
Key functions overridden include:
`alloc`: Allocates memory using `PyMem_Malloc`.
`dealloc`: Frees memory using `PyMem_Free`.
`alloc_zeroed`: Allocates zero-initialized memory by calling `PyMem_Malloc` and manually zeroing the memory.
`realloc`: Reallocates memory using `PyMem_Realloc`.
Example snippet demonstrating the allocator forwarding allocation calls to Python:
```rust
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
    // Forward the requested size to Python's allocator and return a byte pointer.
    pyo3_ffi::PyMem_Malloc(layout.size()).cast::<u8>()
}
```
This global allocator is registered via the `#[global_allocator]` attribute on a static instance of `PyMemAllocator`, ensuring all Rust-side allocations go through Python's memory APIs.
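The overall shape of such an allocator can be sketched without a Python interpreter. The block below is a minimal illustration, not the project's code: `ForwardingAllocator`, `ALLOC_CALLS`, and `alloc_calls` are hypothetical names, and `std::alloc::System` stands in for the `PyMem_*` functions so the example compiles on its own. It also adds a null check before zeroing, which is an assumption about what a careful implementation would want rather than something the source states.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts calls to `alloc`, only so the example can show the allocator is in use.
static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `PyMemAllocator`: every operation is forwarded
/// to another allocator (here `System`, in place of the `PyMem_*` functions).
struct ForwardingAllocator;

unsafe impl GlobalAlloc for ForwardingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {
        // Allocate first, then zero the bytes by hand, mirroring the
        // "malloc and manually zero" approach described above.
        let ptr = unsafe { self.alloc(layout) };
        if !ptr.is_null() {
            unsafe { std::ptr::write_bytes(ptr, 0, layout.size()) };
        }
        ptr
    }

    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        unsafe { System.realloc(ptr, layout, new_size) }
    }
}

// Registering the static makes every heap allocation in the program
// (Vec, String, Box, ...) go through `ForwardingAllocator`.
#[global_allocator]
static GLOBAL: ForwardingAllocator = ForwardingAllocator;

/// Number of `alloc` calls observed so far.
fn alloc_calls() -> usize {
    ALLOC_CALLS.load(Ordering::Relaxed)
}
```

Once the static is registered, ordinary code such as `Vec::with_capacity(128)` increases `alloc_calls()` without mentioning the allocator at all. That is the property the project relies on to route Rust-side allocations through Python.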
Interaction with the System
Ensures memory consistency between Rust and Python environments.
Prevents dual allocators from causing memory corruption or leaks.
Supports safe memory sharing when Rust code creates or manipulates Python objects or buffers.
Deserialization Key Caching
Purpose and Rationale
During JSON deserialization, string keys often repeat, especially in JSON objects with many nested or repeated field names. Repeatedly creating Python string objects for these keys is expensive due to allocation, reference counting, and hash computation costs.
To optimize this, the module implements a specialized caching mechanism for deserialization keys. It stores and reuses Python string objects (`PyStr`) representing keys to minimize overhead and improve parsing speed.
How It Works
The key caching is implemented using a fixed-capacity associative cache (`AssociativeCache`) with direct-mapped hashing and round-robin replacement policies to balance speed and memory footprint.
`CachedKey` wraps a `PyStr` Python string pointer, managing reference counts safely.
The cache (`KeyMap`) maps from a 64-bit hash to `CachedKey` objects.
When a key is encountered during deserialization, the cache is checked:
  If present, the cached `PyStr` is reused, increasing its reference count.
  If absent, a new `PyStr` is created and inserted into the cache.
The cache capacity is 2048 entries, balancing memory use and hit rate.
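The lookup flow above can be sketched in plain Rust. This is an illustrative model under stated assumptions, not the project's `KeyMap`: `KeyCache` and `get_or_insert` are hypothetical names, `Rc<str>` stands in for a reference-counted `PyStr`, the hash comes from the standard library's `DefaultHasher`, and each hash maps to exactly one slot. The sketch also compares the stored string on a hit, a safeguard added here that the source does not describe.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

/// Same capacity as the cache described above.
const CAPACITY: usize = 2048;

/// Hypothetical direct-mapped key cache: one slot per hash bucket.
struct KeyCache {
    slots: Vec<Option<(u64, Rc<str>)>>,
}

impl KeyCache {
    fn new() -> Self {
        KeyCache { slots: vec![None; CAPACITY] }
    }

    /// Returns a shared handle to `key`, reusing the cached one when possible.
    fn get_or_insert(&mut self, key: &str) -> Rc<str> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let hash = hasher.finish();
        let index = (hash % CAPACITY as u64) as usize;

        // Hit: same hash and same contents, so hand out another reference.
        if let Some((cached_hash, cached)) = &self.slots[index] {
            if *cached_hash == hash && &**cached == key {
                return Rc::clone(cached);
            }
        }

        // Miss: create the key once and overwrite whatever occupied the slot.
        let fresh: Rc<str> = Rc::from(key);
        self.slots[index] = Some((hash, Rc::clone(&fresh)));
        fresh
    }
}
```

Looking up `"id"` twice returns two handles to the same allocation, with the cache holding a third reference. This is the analogue of reusing one Python string object and incrementing its reference count.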
Example illustrating safe reference counting in `CachedKey`:
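```rust
impl CachedKey {
    pub fn get(&mut self) -> PyStr {
        let ptr = self.ptr.as_ptr();
        // The cache owns one reference, so the count must already be at least 1.
        debug_assert!(ffi!(Py_REFCNT(ptr)) >= 1);
        // Hand out a new reference to the caller.
        ffi!(Py_INCREF(ptr));
        self.ptr
    }
}

impl Drop for CachedKey {
    fn drop(&mut self) {
        // Release the cache's own reference when the entry is dropped or evicted.
        ffi!(Py_DECREF(self.ptr.as_ptr().cast::<pyo3_ffi::PyObject>()));
    }
}
```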
The cache itself is stored in a `OnceCell`, allowing lazy, safe initialization and global mutable access without synchronization overhead.
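Lazy, one-time initialization of a global cache can be illustrated with the standard library alone. The sketch below deliberately swaps in a different technique: it uses `std::sync::OnceLock` around a `Mutex`, because a standalone Rust program has no interpreter-level guarantee that would make unsynchronized global mutation safe, and the source does not spell out what the project relies on. `KEY_CACHE`, `key_cache`, and `remember` are hypothetical names, and a `HashMap` stands in for the fixed-capacity cache.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Global cache slot; empty until the first access.
static KEY_CACHE: OnceLock<Mutex<HashMap<u64, String>>> = OnceLock::new();

/// Returns the global cache, creating it on first use.
fn key_cache() -> &'static Mutex<HashMap<u64, String>> {
    KEY_CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Stores `key` under `hash` if the slot is empty, then returns the cached value.
fn remember(hash: u64, key: &str) -> String {
    let mut cache = key_cache().lock().unwrap();
    cache.entry(hash).or_insert_with(|| key.to_owned()).clone()
}
```

The first call to `key_cache()` builds the map and every later call returns the same instance. The `Mutex` is the cost this sketch pays for safety; the design described above avoids that cost.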
Interaction with the System
Used internally by deserialization logic in `src/deserialize` to optimize key string handling.
Reduces Python object creation overhead during parsing.
Improves throughput and memory efficiency for large or deeply nested JSON objects.
Works closely with Python's reference counting semantics to maintain memory safety.
Module Interaction Overview
This module acts as a foundational layer underpinning efficient memory use and speed:
The Custom Allocator (`src/alloc.rs`) ensures all Rust memory management cooperates with Python's allocator, critical for all Rust components including serialization and deserialization.
The Deserialization Key Cache (`src/deserialize/cache.rs`) is specifically leveraged by the deserializer to reuse Python string objects, improving performance on repeated keys.
Both components are integral to the Rust core's performance and correctness when called from Python via the FFI layer (`src/ffi`).
They indirectly support benchmark and test modules by providing a stable, efficient memory and caching foundation.
Mermaid Diagram: Memory and Resource Management Flow
This flowchart illustrates how memory allocation and key caching integrate with the deserialization process:
```mermaid
flowchart TD
    Start[Start Deserialization]
    AllocateMem[Allocate Memory via PyMemAllocator]
    ParseJSON[Parse JSON Tokens]
    KeyDetected{Is Token a Key?}
    CheckCache["Check Key in Cache (KeyMap)"]
    CacheHit["Cache Hit: Reuse PyStr Key"]
    CacheMiss["Cache Miss: Create PyStr Key & Insert"]
    UseKey[Use Cached or New PyStr Key]
    ContinueParsing[Continue Parsing]
    Finish[Finish Deserialization]
    Start --> AllocateMem --> ParseJSON --> KeyDetected
    KeyDetected -->|Yes| CheckCache
    CheckCache -->|Hit| CacheHit --> UseKey
    CheckCache -->|Miss| CacheMiss --> UseKey
    UseKey --> ContinueParsing --> KeyDetected
    KeyDetected -->|No| ContinueParsing
    ContinueParsing --> Finish
```
This documentation clarifies how the Memory and Resource Management module ensures efficient and safe memory usage aligned with Python's runtime while accelerating deserialization through strategic caching of string keys.