Memory and Resource Management

This module covers the memory management and caching that support the project's high-performance JSON serialization and deserialization. It addresses two concerns:

Providing a custom global memory allocator that delegates to Python's memory management APIs.
Caching keys during JSON deserialization so that repeated keys reuse an existing string object instead of creating a new one.

Python Memory Allocator

Purpose and Rationale

The project replaces Rust's default global allocator with a custom allocator that delegates every memory operation to Python's memory APIs (`PyMem_Malloc`, `PyMem_Realloc`, `PyMem_Free`). Allocations made internally by Rust components therefore go through the same allocator the interpreter uses, rather than through a second, independent one.

This matters because the library works directly with Python objects and the Python interpreter. Routing allocations through Python's allocator keeps them consistent with the rest of the process and lets Python's own memory tracking account for them.

How It Works

The allocator is implemented as a Rust struct `PyMemAllocator` that implements the `GlobalAlloc` trait, which allows it to act as the global allocator for the Rust code in this project.

The trait methods it implements are:

`alloc`: Allocates memory using `PyMem_Malloc`.
`dealloc`: Frees memory using `PyMem_Free`.
`alloc_zeroed`: Allocates memory with `PyMem_Malloc`, then zeroes it manually.
`realloc`: Reallocates memory using `PyMem_Realloc`.

Example snippet demonstrating the allocator forwarding allocation calls to Python:

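// Inside `unsafe impl GlobalAlloc for PyMemAllocator`:
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
    // Forward the request to Python; only the requested size is passed on.
    pyo3_ffi::PyMem_Malloc(layout.size()).cast::<u8>()
}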

This global allocator is registered via the `#[global_allocator]` attribute on a static instance of `PyMemAllocator`, ensuring all Rust-side allocations go through Python's memory APIs.
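
The forwarding pattern described above can be sketched without a Python toolchain. This is a minimal illustration, not the project's code: `ForwardingAllocator` and `ALLOC_CALLS` are hypothetical names, and `std::alloc::System` stands in for the `PyMem_*` functions so that the example compiles on its own.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for `PyMemAllocator`: it forwards to `System`
/// where the real allocator would call the `PyMem_*` functions.
pub struct ForwardingAllocator;

/// Number of allocation requests seen (for illustration only).
pub static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for ForwardingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {
        // Mirror the "allocate, then zero manually" approach from the text.
        let ptr = unsafe { self.alloc(layout) };
        if !ptr.is_null() {
            unsafe { std::ptr::write_bytes(ptr, 0, layout.size()) };
        }
        ptr
    }

    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.realloc(ptr, layout, new_size) }
    }
}

// Registering the static routes every Rust allocation in the program
// through the implementation above.
#[global_allocator]
pub static GLOBAL: ForwardingAllocator = ForwardingAllocator;
```

With the static registered, ordinary collections such as `Vec` and `String` allocate through `ForwardingAllocator`. The counter exists only to make that observable in the sketch.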

Interaction with the System

Keeps Rust-side allocations consistent with the Python runtime's allocator.
Avoids mixing two allocators in one process, where freeing a block with a different allocator than the one that created it would corrupt memory.
Supports safe memory sharing when Rust code creates or manipulates Python objects or buffers.

Deserialization Key Caching

Purpose and Rationale

During JSON deserialization, string keys often repeat, especially in documents containing many objects that share the same field names. Creating a new Python string object for every occurrence is expensive because each one incurs allocation, reference counting, and hash computation costs.

To optimize this, the module implements a specialized caching mechanism for deserialization keys. It stores and reuses Python string objects (`PyStr`) representing keys to minimize overhead and improve parsing speed.

How It Works

The key caching is implemented using a fixed-capacity associative cache (`AssociativeCache`) with direct-mapped hashing and round-robin replacement policies to balance speed and memory footprint.

`CachedKey` wraps a `PyStr` Python string pointer, managing reference counts safely.
The cache (`KeyMap`) maps from a 64-bit hash to `CachedKey` objects.
When a key is encountered during deserialization, the cache is checked:
- If present, the cached PyStr is reused, increasing its reference count.
- If absent, a new PyStr is created and inserted into the cache.
The cache capacity is 2048 entries, balancing memory use and hit rate.
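
The lookup steps above can be sketched with the standard library alone. This is an illustrative approximation under stated assumptions, not the project's implementation: `KeyCache`, `hash_key`, and `get_or_insert` are hypothetical names, `Rc<str>` stands in for a reference-counted `PyStr`, and `DefaultHasher` is an arbitrary choice of 64-bit hash. Because the sketch is direct-mapped, each hash has exactly one candidate slot, so a miss simply overwrites that slot.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

/// Capacity quoted in the text above.
pub const CAPACITY: usize = 2048;

/// Hypothetical stand-in for the key cache: one slot per index, each
/// holding the full 64-bit hash and a shared, reference-counted string.
pub struct KeyCache {
    slots: Vec<Option<(u64, Rc<str>)>>,
}

impl KeyCache {
    pub fn new() -> Self {
        KeyCache { slots: vec![None; CAPACITY] }
    }

    /// Hash the key to 64 bits (the hasher is an assumption of this sketch).
    pub fn hash_key(key: &str) -> u64 {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        hasher.finish()
    }

    /// Return a shared handle for `key`, reusing the cached one on a hit.
    pub fn get_or_insert(&mut self, key: &str) -> Rc<str> {
        let hash = Self::hash_key(key);
        // Direct-mapped: the hash selects exactly one candidate slot.
        let index = (hash % CAPACITY as u64) as usize;
        if let Some((stored_hash, value)) = &self.slots[index] {
            if *stored_hash == hash {
                // Hit: hand out another reference to the same string.
                return Rc::clone(value);
            }
        }
        // Miss (empty slot or a different hash): create and store a new value.
        let value: Rc<str> = Rc::from(key);
        self.slots[index] = Some((hash, Rc::clone(&value)));
        value
    }
}
```

Like the cache described above, the sketch identifies an entry by its 64-bit hash only, so two different keys with the same hash would be treated as one. Calling `get_or_insert("name")` twice returns two handles to the same allocation, which `Rc::ptr_eq` can confirm.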

Example illustrating safe reference counting in `CachedKey`:

impl CachedKey {
    // Hand out the cached string, taking a new reference for the caller.
    pub fn get(&mut self) -> PyStr {
        let ptr = self.ptr.as_ptr();
        debug_assert!(ffi!(Py_REFCNT(ptr)) >= 1);
        ffi!(Py_INCREF(ptr));
        self.ptr
    }
}

impl Drop for CachedKey {
    // Release the cache's own reference when the entry is dropped.
    fn drop(&mut self) {
        ffi!(Py_DECREF(self.ptr.as_ptr().cast::<pyo3_ffi::PyObject>()));
    }
}

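The cache itself is stored in a `OnceCell`, so it is initialized lazily on first use. Global mutable access then needs no lock, which presumes that only one thread touches the cache at a time (for example, because callers hold Python's global interpreter lock).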
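
Lazy, lock-free initialization can be illustrated with the standard library's `OnceCell`. This is a sketch under assumptions rather than the project's code: it uses a thread-local cell with a `RefCell` as a safe substitute for global mutable access, a plain `HashMap<u64, String>` in place of `KeyMap`, and the hypothetical names `with_key_map` and `INIT_CALLS`.

```rust
use std::cell::{OnceCell, RefCell};
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how often the initializer runs (for illustration only).
pub static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // Hypothetical per-thread cache; the cell stays empty until first use.
    static KEY_MAP: OnceCell<RefCell<HashMap<u64, String>>> = OnceCell::new();
}

/// Run `f` with mutable access to the lazily created map.
pub fn with_key_map<R>(f: impl FnOnce(&mut HashMap<u64, String>) -> R) -> R {
    KEY_MAP.with(|cell| {
        // The closure passed to `get_or_init` runs only on the first call.
        let map = cell.get_or_init(|| {
            INIT_CALLS.fetch_add(1, Ordering::Relaxed);
            RefCell::new(HashMap::with_capacity(2048))
        });
        let mut guard = map.borrow_mut();
        f(&mut *guard)
    })
}
```

The first call to `with_key_map` builds the map, and later calls on the same thread reuse it. No mutex is involved because the value never leaves its thread.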

Interaction with the System

Used internally by the deserialization logic in `src/deserialize` to optimize key string handling.
Reduces Python object creation overhead during parsing.
Improves throughput and memory efficiency for large or deeply nested JSON objects.
Works closely with Python's reference counting semantics to maintain memory safety.

Module Interaction Overview

This module acts as a foundational layer underpinning efficient memory use and speed:

The Custom Allocator (`src/alloc.rs`) ensures all Rust memory management cooperates with Python's allocator, which every Rust component relies on, including serialization and deserialization.
The Deserialization Key Cache (`src/deserialize/cache.rs`) is used specifically by the deserializer to reuse Python string objects, improving performance on repeated keys.
Both components are integral to the Rust core's performance and correctness when called from Python via the FFI layer (`src/ffi`).
They indirectly support benchmark and test modules by providing a stable, efficient memory and caching foundation.

Mermaid Diagram: Memory and Resource Management Flow

This flowchart illustrates how memory allocation and key caching integrate with the deserialization process:

flowchart TD
    Start[Start Deserialization]
    AllocateMem[Allocate Memory via PyMemAllocator]
    ParseJSON[Parse JSON Tokens]
    KeyDetected{Is Token a Key?}
    CheckCache["Check Key in Cache (KeyMap)"]
    CacheHit[Cache Hit: Reuse PyStr Key]
    CacheMiss[Cache Miss: Create PyStr Key & Insert]
    UseKey[Use Cached or New PyStr Key]
    ContinueParsing[Continue Parsing]
    Finish[Finish Deserialization]

    Start --> AllocateMem --> ParseJSON --> KeyDetected
    KeyDetected -->|Yes| CheckCache
    CheckCache -->|Hit| CacheHit --> UseKey
    CheckCache -->|Miss| CacheMiss --> UseKey
    UseKey --> ContinueParsing --> KeyDetected
    KeyDetected -->|No| ContinueParsing
    ContinueParsing --> Finish

This documentation clarifies how the Memory and Resource Management module ensures efficient and safe memory usage aligned with Python's runtime while accelerating deserialization through strategic caching of string keys.