Embedded yyjson Integration

Purpose

This subtopic addresses the integration and customization of the **yyjson** C library within the core JSON parsing and writing system. While the parent topic covers the overall JSON parsing and serialization implemented in Rust, this submodule specifically embeds and adapts the **yyjson** library to achieve ultra-fast, memory-safe JSON parsing. It solves critical problems such as efficient memory allocation, recursion depth limits, UTF-8 validation, and handling non-standard JSON features, which are essential for robust and high-performance parsing.

Functionality

The embedded **yyjson** integration provides:

Key Workflow

When parsing JSON, the embedded **yyjson** parser:

  1. Allocates or receives an allocator context to manage memory for JSON values and strings.

  2. Copies input data into a padded buffer for safe reading and in-situ modification if enabled.

  3. Skips initial whitespace or comments if allowed.

  4. Uses an FSM to parse JSON tokens (objects, arrays, strings, numbers, literals) recursively, respecting recursion depth limits.

  5. For numbers, performs efficient parsing using fast IEEE-754 paths or raw string fallback as configured.

  6. For strings, validates UTF-8 and properly handles escape sequences.

  7. Builds an internal immutable or mutable JSON document representation with memory managed by the allocator.

  8. On serialization, writes JSON back to UTF-8 with configurable escaping, indentation, and options for non-standard features.

Critical Data Flow Example: Number Parsing

static_inline bool read_number(u8 **ptr,
                               yyjson_val *val,
                               const char **msg) {
    // Parses number efficiently with fast and slow paths.
    // Sets val->tag with subtype (UINT, SINT, REAL).
}

Relationship to Parent Topic and Other Subtopics

Mermaid Diagram: Core Parsing Flow in Embedded yyjson

flowchart TD
    Start[Start Parsing] --> Allocator[Setup Allocator]
    Allocator --> CopyInput[Copy & Pad Input Buffer]
    CopyInput --> SkipSpace[Skip Spaces & Comments (if allowed)]
    SkipSpace --> TokenParse{Next Token}
    TokenParse -->|Object Start '{'| ObjBegin[Begin Object]
    TokenParse -->|Array Start '['| ArrBegin[Begin Array]
    TokenParse -->|String '"'| ReadString[Read String (UTF-8 & Escapes)]
    TokenParse -->|Number| ReadNumber[Read Number (Fast/Bigint)]
    TokenParse -->|Literal| ReadLiteral[Read true/false/null/NaN/Inf]
    TokenParse -->|End Document| End[Parsing Complete]
    ObjBegin --> ObjKey[Parse Object Key]
    ObjKey --> ObjVal[Parse Object Value]
    ObjVal --> TokenParse
    ArrBegin --> ArrVal[Parse Array Value]
    ArrVal --> TokenParse
    ReadString --> TokenParse
    ReadNumber --> TokenParse
    ReadLiteral --> TokenParse
    TokenParse -->|Error| Error[Error Handling & Cleanup]
    Error --> End

This diagram highlights the core FSM-driven process of JSON parsing within the embedded **yyjson** integration, showing allocator setup, input preparation, token parsing, recursive container handling, and error management.


The **Embedded yyjson Integration** serves as the high-performance, low-level JSON engine enabling the broader project to achieve its goals of speed, correctness, and robustness. Its careful memory management and parsing optimizations are central to the overall system's efficiency and reliability.