Embedded yyjson Integration

Purpose

This subtopic addresses the integration and customization of the **yyjson** C library within the core JSON parsing and writing system. While the parent topic covers the overall JSON parsing and serialization implemented in Rust, this submodule specifically embeds and adapts the **yyjson** library to achieve ultra-fast, memory-safe JSON parsing. It solves critical problems such as efficient memory allocation, recursion depth limits, UTF-8 validation, and handling non-standard JSON features, which are essential for robust and high-performance parsing.

Functionality

The embedded **yyjson** integration provides:

Core JSON Parsing & Writing: Implements JSON reading and writing using a finely optimized finite state machine (FSM) with goto statements for state transitions, ensuring minimal overhead.
Memory Allocation Layers: Supports multiple allocator strategies, including default libc-based, pool allocator for pre-allocated fixed-size memory, and dynamic allocator which manages memory chunks with freelists for reuse.
Recursion Limits: Protects against stack overflow by limiting container (array/object) recursion depth (YYJSON_READER_CONTAINER_RECURSION_LIMIT), preventing malformed or deeply nested JSON from crashing the parser.
Fast Floating-Point Conversion: Implements efficient IEEE-754 double reading and writing with fast paths and fallback to exact bigint-based methods to maintain correct rounding, improving performance on numeric parsing and serialization.
UTF-8 and Escape Handling: Validates UTF-8 sequences and properly decodes and encodes escaped characters within JSON strings, including surrogate pairs and control characters.
Support for Non-Standard JSON: Offers compile-time and run-time options to enable or disable handling of non-standard JSON features such as comments, trailing commas, and Inf/NaN literals.
Utilities: Includes JSON Pointer and Patch APIs for advanced JSON document manipulation.

Key Workflow

When parsing JSON, the embedded **yyjson** parser:

Allocates or receives an allocator context to manage memory for JSON values and strings.
Copies input data into a padded buffer for safe reading and in-situ modification if enabled.
Skips initial whitespace or comments if allowed.
Uses an FSM to parse JSON tokens (objects, arrays, strings, numbers, literals) recursively, respecting recursion depth limits.
For numbers, performs efficient parsing using fast IEEE-754 paths or raw string fallback as configured.
For strings, validates UTF-8 and properly handles escape sequences.
Builds an internal immutable or mutable JSON document representation with memory managed by the allocator.
On serialization, writes JSON back to UTF-8 with configurable escaping, indentation, and options for non-standard features.

Critical Data Flow Example: Number Parsing

The number reader first attempts a fast path for common integers and decimals.
If the number is too large or precise, it falls back to bigint arithmetic to ensure correct rounding.
Special literals like Infinity and NaN are parsed if enabled.
Returns a tagged union representing integer or floating-point number with subtype information.

static_inline bool read_number(u8 **ptr,
                               yyjson_val *val,
                               const char **msg) {
    // Parses number efficiently with fast and slow paths.
    // Sets val->tag with subtype (UINT, SINT, REAL).
}

Relationship to Parent Topic and Other Subtopics

The Embedded yyjson Integration is the foundational C library integration upon which the Rust core JSON parsing and writing logic builds. It provides the low-level parsing engine that the Rust deserialization backend and serialization modules wrap and extend.
It complements the Rust Deserialization Backend by providing the ultra-fast, memory-safe parser implementation that the Rust code calls via FFI.
The subtopic introduces advanced memory allocator strategies (pool and dynamic allocators) that are not directly covered in the parent overview.
It adds nuanced handling of recursion limits, UTF-8 validation, and floating-point rounding correctness, extending beyond what the parent topic broadly describes.
The integration ensures that parsing performance and correctness are maximized, enabling the higher-level Python integration and benchmarking modules to rely on a robust and efficient core.

Mermaid Diagram: Core Parsing Flow in Embedded yyjson

flowchart TD
    Start[Start Parsing] --> Allocator[Setup Allocator]
    Allocator --> CopyInput[Copy & Pad Input Buffer]
    CopyInput --> SkipSpace[Skip Spaces & Comments (if allowed)]
    SkipSpace --> TokenParse{Next Token}
    TokenParse -->|Object Start '{'| ObjBegin[Begin Object]
    TokenParse -->|Array Start '['| ArrBegin[Begin Array]
    TokenParse -->|String '"'| ReadString[Read String (UTF-8 & Escapes)]
    TokenParse -->|Number| ReadNumber[Read Number (Fast/Bigint)]
    TokenParse -->|Literal| ReadLiteral[Read true/false/null/NaN/Inf]
    TokenParse -->|End Document| End[Parsing Complete]
    ObjBegin --> ObjKey[Parse Object Key]
    ObjKey --> ObjVal[Parse Object Value]
    ObjVal --> TokenParse
    ArrBegin --> ArrVal[Parse Array Value]
    ArrVal --> TokenParse
    ReadString --> TokenParse
    ReadNumber --> TokenParse
    ReadLiteral --> TokenParse
    TokenParse -->|Error| Error[Error Handling & Cleanup]
    Error --> End

This diagram highlights the core FSM-driven process of JSON parsing within the embedded **yyjson** integration, showing allocator setup, input preparation, token parsing, recursive container handling, and error management.

The **Embedded yyjson Integration** serves as the high-performance, low-level JSON engine enabling the broader project to achieve its goals of speed, correctness, and robustness. Its careful memory management and parsing optimizations are central to the overall system's efficiency and reliability.