Memory and Resource Management

This module covers the memory management and caching strategies behind the project's high-performance JSON serialization and deserialization. It addresses two concerns: routing Rust-side allocations through Python's memory allocator, and caching the string keys created during deserialization.


Python Memory Allocator

Purpose and Rationale

The project replaces Rust's default global allocator with a custom allocator that delegates every memory operation to Python's memory APIs (`PyMem_Malloc`, `PyMem_Free`, etc.). Memory allocated internally by the Rust components is therefore handled by the same allocator the interpreter uses, instead of by a second, independent allocator running alongside it.

This design is crucial because the library integrates tightly with Python objects and the Python interpreter, which expect memory to be allocated and freed using Python's allocator for consistency, safety, and proper tracking.

How It Works

The allocator is implemented as a Rust struct `PyMemAllocator` that implements the [GlobalAlloc](/projects/287/67784) trait, which allows it to act as the global allocator for the Rust code in this project.

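The functions overridden are methods of the `GlobalAlloc` trait: `alloc` and `dealloc`, which the trait requires, and possibly the optional `alloc_zeroed` and `realloc`, which otherwise default to implementations built on `alloc`.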

Example snippet demonstrating the allocator forwarding allocation calls to Python:

// Forward the request to CPython's allocator; only the size is passed on.
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
    pyo3_ffi::PyMem_Malloc(layout.size()).cast::<u8>()
}

This global allocator is registered via the `#[global_allocator]` attribute on a static instance of `PyMemAllocator`, ensuring all Rust-side allocations go through Python's memory APIs.
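
The registration pattern can be sketched without a Python runtime. In the sketch below, `ForwardingAllocator`, `ALLOCATIONS`, and `count_vec_allocations` are hypothetical names introduced for illustration, and `std::alloc::System` stands in for the `PyMem_*` functions so that the example compiles on its own. It is not the project's allocator, only the same forwarding shape.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for `PyMemAllocator`. It forwards to `System`
/// where the real allocator would call `PyMem_Malloc` / `PyMem_Free`.
pub struct ForwardingAllocator;

/// Counts allocation requests so that the forwarding is observable.
pub static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for ForwardingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // Assumption: `System` replaces the Python allocator in this sketch.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Every heap allocation in the crate now goes through `ForwardingAllocator`.
#[global_allocator]
static GLOBAL: ForwardingAllocator = ForwardingAllocator;

/// Returns how many allocation requests were observed while creating
/// a `Vec<u8>` with the given capacity.
pub fn count_vec_allocations(capacity: usize) -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let buffer: Vec<u8> = Vec::with_capacity(capacity);
    // Keep the optimizer from eliding the unused allocation.
    std::hint::black_box(buffer.as_ptr());
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    drop(buffer);
    after - before
}
```

Calling `count_vec_allocations(64)` reports at least one request, which shows that an ordinary `Vec` allocation is routed through the registered allocator. A real Python-backed allocator additionally has to respect the rules of the `PyMem_*` API, such as only calling it while the interpreter permits.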

Interaction with the System


Deserialization Key Caching

Purpose and Rationale

During JSON deserialization, the same string keys tend to recur, for instance when an array holds many objects with identical field names. Repeatedly creating Python string objects for these keys is expensive due to allocation, reference counting, and hash computation costs.

To optimize this, the module implements a specialized caching mechanism for deserialization keys. It stores and reuses Python string objects (`PyStr`) representing keys to minimize overhead and improve parsing speed.

How It Works

The key caching is implemented using a fixed-capacity associative cache (`AssociativeCache`) with direct-mapped hashing and round-robin replacement policies to balance speed and memory footprint.
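
A direct-mapped cache of this kind can be sketched in plain Rust. Everything in the block is an assumption made for illustration: `KeyCache`, the capacity of 8, the use of `DefaultHasher`, and `Rc<str>` standing in for a reference-counted Python string. It does not reproduce the `AssociativeCache` type or its round-robin policy; it only shows how each key maps to exactly one slot and how a colliding key replaces the previous occupant.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

/// Hypothetical capacity, chosen only for the example.
const CAPACITY: usize = 8;

/// Direct-mapped cache: a key's hash selects exactly one slot.
pub struct KeyCache {
    slots: Vec<Option<(u64, Rc<str>)>>,
    pub hits: usize,
    pub misses: usize,
}

impl KeyCache {
    pub fn new() -> Self {
        KeyCache { slots: vec![None; CAPACITY], hits: 0, misses: 0 }
    }

    /// Returns a shared handle to `key`, reusing the cached one when present.
    pub fn get_or_insert(&mut self, key: &str) -> Rc<str> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let hash = hasher.finish();
        let index = (hash % CAPACITY as u64) as usize;

        if let Some((stored_hash, stored)) = &self.slots[index] {
            if *stored_hash == hash && &**stored == key {
                // Hit: hand out another reference to the existing string.
                self.hits += 1;
                return Rc::clone(stored);
            }
        }

        // Miss: build the string once and overwrite whatever was in the slot.
        self.misses += 1;
        let fresh: Rc<str> = Rc::from(key);
        self.slots[index] = Some((hash, Rc::clone(&fresh)));
        fresh
    }
}
```

Looking up the same key twice returns two handles to one allocation, with one miss followed by one hit. The fixed slot count bounds memory use, at the price of evicting an entry whenever two distinct keys land in the same slot.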

Example illustrating safe reference counting in `CachedKey`:

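impl CachedKey {
    pub fn get(&mut self) -> PyStr {
        let ptr = self.ptr.as_ptr();
        // The cache must still hold its own reference at this point.
        debug_assert!(ffi!(Py_REFCNT(ptr)) >= 1);
        // Give the caller a new strong reference; the cache keeps its own.
        ffi!(Py_INCREF(ptr));
        self.ptr
    }
}

impl Drop for CachedKey {
    fn drop(&mut self) {
        // Release the cache's reference when the entry is dropped.
        ffi!(Py_DECREF(self.ptr.as_ptr().cast::<pyo3_ffi::PyObject>()));
    }
}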
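
The ownership rule in the snippet above (the cache holds one reference, and every `get` hands out one more) can be observed with a small stand-in. `RcCachedKey` is a hypothetical type, and `Rc<str>` replaces the Python string so that the count can be inspected without an interpreter; `Rc::clone` plays the role of `Py_INCREF`, and dropping an `Rc` plays the role of `Py_DECREF`.

```rust
use std::rc::Rc;

/// Hypothetical analogue of `CachedKey`, backed by `Rc<str>`.
pub struct RcCachedKey {
    value: Rc<str>,
}

impl RcCachedKey {
    pub fn new(text: &str) -> Self {
        RcCachedKey { value: Rc::from(text) }
    }

    /// Hands out a new strong reference, as `Py_INCREF` does for a PyStr.
    pub fn get(&self) -> Rc<str> {
        Rc::clone(&self.value)
    }

    /// Number of strong references, including the one held by the cache entry.
    pub fn ref_count(&self) -> usize {
        Rc::strong_count(&self.value)
    }
}
```

A fresh entry has a count of 1, the count rises to 2 while a caller holds the handle returned by `get`, and it falls back to 1 once that handle is dropped. The string is freed only after the entry itself is dropped as well.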

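The cache itself is stored in a `OnceCell`, which allows lazy, one-time initialization and global mutable access without synchronization overhead. Skipping synchronization presumes that something else already serializes access to the cache, which in a Python extension is typically the interpreter's global lock.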
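
Lazy one-time initialization of a global table can be sketched with the standard library. This sketch swaps in `std::sync::OnceLock`, the thread-safe counterpart of a once-cell, so that it is sound without any outside locking; `KEY_TABLE`, `INIT_CALLS`, `key_table`, and the capacity of 2048 are hypothetical and used only for illustration. It shows the initialization behaviour, not the mutable access pattern.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the initializer runs, to make laziness observable.
pub static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical global table, empty until first use.
static KEY_TABLE: OnceLock<Vec<Option<String>>> = OnceLock::new();

/// Returns the global table, building it on the first call only.
pub fn key_table() -> &'static Vec<Option<String>> {
    KEY_TABLE.get_or_init(|| {
        INIT_CALLS.fetch_add(1, Ordering::Relaxed);
        // Assumed capacity, picked for the example.
        vec![None; 2048]
    })
}
```

Repeated calls to `key_table()` return the same table and run the initializer once. An unsynchronized cell avoids the atomic check that `OnceLock` performs, which is the overhead the paragraph above refers to.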

Interaction with the System


Module Interaction Overview

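This module acts as a foundational layer for efficient memory use and speed: the allocator governs every allocation made on the Rust side, and the key cache reduces the cost of building Python strings during deserialization.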


Mermaid Diagram: Memory and Resource Management Flow

This flowchart illustrates how memory allocation and key caching integrate with the deserialization process:

flowchart TD
    Start[Start Deserialization]
    AllocateMem[Allocate Memory via PyMemAllocator]
    ParseJSON[Parse JSON Tokens]
    KeyDetected{Is Token a Key?}
    CheckCache["Check Key in Cache (KeyMap)"]
    CacheHit[Cache Hit: Reuse PyStr Key]
    CacheMiss[Cache Miss: Create PyStr Key & Insert]
    UseKey[Use Cached or New PyStr Key]
    ContinueParsing[Continue Parsing]
    Finish[Finish Deserialization]

    Start --> AllocateMem --> ParseJSON --> KeyDetected
    KeyDetected -->|Yes| CheckCache
    CheckCache -->|Hit| CacheHit --> UseKey
    CheckCache -->|Miss| CacheMiss --> UseKey
    UseKey --> ContinueParsing --> KeyDetected
    KeyDetected -->|No| ContinueParsing
    ContinueParsing --> Finish

In summary, the Memory and Resource Management module keeps Rust-side memory usage aligned with Python's runtime and accelerates deserialization by caching string keys.