High-Performance JSON Parsing

Overview

This module provides the core JSON parsing and writing functionality. It is implemented primarily in Rust and tightly integrated with an embedded C library, **yyjson**. The goal is fast, memory-safe, standards-compliant JSON parsing and serialization: converting JSON byte streams into structured representations and back, including edge cases.

Purpose and Problem Addressed

JSON parsing is a fundamental operation for many applications, but traditional JSON parsers often suffer from performance bottlenecks and insufficient handling of edge cases. This module addresses both needs: fast parsing and robust edge-case handling.

Core Components and Workflows

Embedded yyjson C Library

At the heart of parsing and writing is the **yyjson** C library (`include/yyjson/yyjson.c` and `yyjson.h`).

The `yyjson_read_opts()` function is the main entry point for parsing JSON data from a byte buffer with a configurable allocator and options.

Rust Deserialization Backend

The Rust backend (`src/deserialize/backend/mod.rs`) conditionally selects between Rust-native JSON parsing and the embedded `yyjson` parser based on features enabled during compilation. When the `yyjson` feature is enabled, it exposes the `deserialize` functionality implemented on top of the embedded `yyjson` C library.

#[cfg(feature = "yyjson")]
mod yyjson;

#[cfg(feature = "yyjson")]
pub(crate) use yyjson::deserialize;

This design allows the Rust layer to leverage the high-performance C parsing engine while seamlessly integrating with Rust's memory safety and ownership models.

Key Functionalities and Workflows

JSON Parsing Process

  1. Input Data Preparation: JSON data is copied into a buffer padded with extra zeroed bytes, so that lookahead near the end of the input cannot overrun the buffer.

  2. Parsing Entry Point: The `yyjson_read_opts()` function receives the data buffer, length, allocator, and read options.

  3. Skipping Whitespace and Comments: The parser skips whitespace and, when enabled, C-style comments.

  4. Finite State Machine Parsing: Using a `goto`-based finite state machine (FSM), the parser recognizes JSON tokens, including literals (`true`, `false`, `null`), numbers, strings, arrays, and objects.

  5. Number Parsing: The parser reads numbers with a custom algorithm that supports 64-bit integers and IEEE-754 doubles and handles rounding, large numbers, and special floating-point values accurately.

  6. String Parsing: Strings are parsed with UTF-8 validation, including escape sequences and surrogate pairs. Invalid Unicode can optionally be accepted.

  7. Memory Management: Parsed JSON values and strings are stored in a contiguous memory pool managed by custom allocators, which reduces fragmentation and allocation overhead.

  8. Building JSON Document: The parser constructs an immutable JSON document structure (`yyjson_doc`) with references to parsed values (`yyjson_val`).
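Steps 5 and 6 can be illustrated with a minimal Rust sketch. It is not yyjson's C implementation: `Number`, `classify_number`, and `combine_surrogates` are hypothetical names, the number token is assumed to have already been validated against the JSON grammar, and the standard library's parsers stand in for yyjson's custom algorithm. Falling back to a double when an integer does not fit in 64 bits is an assumption of this sketch.

```rust
/// Hypothetical result type: a 64-bit integer when the token fits, else a double.
#[derive(Debug, PartialEq)]
enum Number {
    Uint(u64),
    Sint(i64),
    Real(f64),
}

/// Classify a number token that is assumed to be grammatically valid JSON.
fn classify_number(token: &str) -> Option<Number> {
    let has_fraction_or_exponent = token.bytes().any(|b| matches!(b, b'.' | b'e' | b'E'));
    if !has_fraction_or_exponent {
        if token.starts_with('-') {
            if let Ok(v) = token.parse::<i64>() {
                return Some(Number::Sint(v));
            }
        } else if let Ok(v) = token.parse::<u64>() {
            return Some(Number::Uint(v));
        }
    }
    // Fractions, exponents, and integers outside the 64-bit range land here.
    token.parse::<f64>().ok().map(Number::Real)
}

/// Combine a UTF-16 surrogate pair (as found in "\uD83D\uDE00") into one char.
fn combine_surrogates(hi: u16, lo: u16) -> Option<char> {
    let hi_ok = (0xD800..=0xDBFF).contains(&hi);
    let lo_ok = (0xDC00..=0xDFFF).contains(&lo);
    if !hi_ok || !lo_ok {
        return None; // not a high surrogate followed by a low surrogate
    }
    let code_point = 0x10000 + ((hi as u32 - 0xD800) << 10) + (lo as u32 - 0xDC00);
    char::from_u32(code_point)
}
```

For example, `classify_number("42")` yields `Number::Uint(42)`, and `combine_surrogates(0xD83D, 0xDE00)` yields the character U+1F600.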

JSON Writing Process

The module also supports serialization of JSON values into UTF-8 encoded JSON strings with options for pretty-printing, Unicode escaping, and handling special floating-point values.
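As a rough illustration of the Unicode-escaping option, the sketch below writes a single JSON string in Rust. It is not the module's writer: `write_json_string` and its `escape_unicode` parameter are hypothetical, and the escaping rules shown are only those required by the JSON grammar plus optional `\uXXXX` escaping of non-ASCII characters.

```rust
/// Hypothetical helper: serialize `s` as a JSON string literal.
/// When `escape_unicode` is true, non-ASCII characters become \uXXXX escapes
/// (as UTF-16 surrogate pairs where needed).
fn write_json_string(s: &str, escape_unicode: bool) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must always be escaped in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c if escape_unicode && !c.is_ascii() => {
                let mut units = [0u16; 2];
                for unit in c.encode_utf16(&mut units).iter() {
                    out.push_str(&format!("\\u{:04x}", unit));
                }
            }
            c => out.push(c),
        }
    }
    out.push('"');
    out
}
```

With `escape_unicode` set to `false`, non-ASCII text is emitted as UTF-8 unchanged. With it set to `true`, the output is pure ASCII.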

Interaction with Other System Parts

Important Concepts and Design Patterns

Code Illustrations

Conditional Backend Selection in Rust

#[cfg(not(feature = "yyjson"))]
mod json;

#[cfg(feature = "yyjson")]
mod yyjson;

#[cfg(feature = "yyjson")]
pub(crate) use yyjson::deserialize;

#[cfg(not(feature = "yyjson"))]
pub(crate) use json::deserialize;

This snippet shows how the project selects a deserialization backend at compile time: the embedded `yyjson` parser when the `yyjson` feature is enabled, and the `json` module otherwise.

JSON Parsing Main Entry Function (C)

Excerpt from `yyjson_read_opts()` in `yyjson.c`:

yyjson_doc *yyjson_read_opts(char *dat,
                             usize len,
                             const yyjson_alc *alc_ptr,
                             yyjson_read_err *err) {
    // ...
    // Copy the input into a newly allocated buffer with room for padding.
    hdr = (u8 *)alc.malloc(alc.ctx, len + YYJSON_PADDING_SIZE);
    memcpy(hdr, dat, len);
    // Zero-fill the padding bytes.
    memset(end, 0, YYJSON_PADDING_SIZE);

    // Container roots pick a reader based on the two bytes after the opening bracket.
    if (likely(char_is_container(*cur))) {
        if (char_is_space(cur[1]) && char_is_space(cur[2])) {
            doc = read_root_pretty(hdr, cur, end, alc, err);
        } else {
            doc = read_root_minify(hdr, cur, end, alc, err);
        }
    } else {
        // Any other root is read as a single value.
        doc = read_root_single(hdr, cur, end, alc, err);
    }
    // ...
    return doc;
}

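This illustrates how the parser chooses a parsing strategy from the input's content and formatting: a container root goes to the pretty reader when the two bytes after the opening bracket are both whitespace and to the minify reader otherwise, while any other root goes to the single-value reader.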
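The same dispatch can be sketched in Rust to make the role of the padding concrete. This is an illustrative model, not the library's code: `PADDING`, `RootReader`, `pad_input`, and `choose_reader` are hypothetical names, the padding size is an assumed value standing in for `YYJSON_PADDING_SIZE`, and comment and BOM handling are left out.

```rust
/// Assumed padding size for this sketch; stands in for YYJSON_PADDING_SIZE.
const PADDING: usize = 4;

/// Hypothetical label for the three root readers named in the C excerpt.
#[derive(Debug, PartialEq)]
enum RootReader {
    Pretty,
    Minify,
    Single,
}

fn is_space(b: u8) -> bool {
    matches!(b, b' ' | b'\t' | b'\n' | b'\r')
}

/// Copy the input and append zeroed padding bytes.
fn pad_input(input: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(input.len() + PADDING);
    buf.extend_from_slice(input);
    buf.resize(input.len() + PADDING, 0);
    buf
}

/// Pick a reader for a buffer produced by `pad_input`.
fn choose_reader(padded: &[u8]) -> RootReader {
    let mut cur = 0;
    // The zero padding is not whitespace, so this loop stops inside the buffer.
    while is_space(padded[cur]) {
        cur += 1;
    }
    let is_container = matches!(padded[cur], b'{' | b'[');
    if !is_container {
        RootReader::Single
    } else if is_space(padded[cur + 1]) && is_space(padded[cur + 2]) {
        // Two whitespace bytes after the bracket suggest pretty-printed input.
        RootReader::Pretty
    } else {
        RootReader::Minify
    }
}
```

Because the buffer always ends in zeroed padding, the two-byte lookahead after the opening bracket stays inside the buffer even when the bracket is the last byte of the input.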

Finite State Machine Parsing (Excerpt)

arr_val_begin:
    if (*cur == '{') {
        cur++;
        goto obj_begin;
    }
    if (*cur == '[') {
        cur++;
        goto arr_begin;
    }
    if (char_is_number(*cur)) {
        val_incr();
        ctn_len++;
        if (likely(read_number(&cur, val, &msg))) goto arr_val_end;
        goto fail_number;
    }
    // ... other branches for strings, literals, whitespace, errors

This highlights the FSM approach with explicit `goto` targets for parsing arrays and objects.
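Rust has no `goto`, but the same idea can be modeled with an explicit state variable and a loop. The sketch below is a simplified stand-in, not a port of the C code: `State` and `count_leaves` are hypothetical names, it accepts only nested arrays of unsigned integers with no whitespace, and it counts values instead of building a document. The two states loosely mirror the `arr_val_begin` and `arr_val_end` labels.

```rust
/// Hypothetical states, loosely mirroring the arr_val_begin / arr_val_end labels.
#[derive(Clone, Copy)]
enum State {
    ValueBegin,
    ValueEnd,
}

/// Count integer leaves in input such as "[1,[2,3],[]]" without recursion.
/// Returns None on malformed input.
fn count_leaves(input: &[u8]) -> Option<usize> {
    let mut pos = 0;
    let mut depth = 0usize;
    let mut leaves = 0usize;
    let mut state = State::ValueBegin;
    loop {
        match state {
            State::ValueBegin => match *input.get(pos)? {
                b'[' => {
                    pos += 1;
                    depth += 1;
                    if input.get(pos) == Some(&b']') {
                        // Empty array: close it immediately.
                        pos += 1;
                        depth -= 1;
                        state = State::ValueEnd;
                    }
                    // Otherwise stay in ValueBegin to read the first element.
                }
                b'0'..=b'9' => {
                    while input.get(pos).map_or(false, |b| b.is_ascii_digit()) {
                        pos += 1;
                    }
                    leaves += 1;
                    state = State::ValueEnd;
                }
                _ => return None,
            },
            State::ValueEnd => {
                if depth == 0 {
                    // Root value finished: succeed only if all input was consumed.
                    return if pos == input.len() { Some(leaves) } else { None };
                }
                match *input.get(pos)? {
                    b',' => {
                        pos += 1;
                        state = State::ValueBegin;
                    }
                    b']' => {
                        pos += 1;
                        depth -= 1; // stay in ValueEnd for the enclosing array
                    }
                    _ => return None,
                }
            }
        }
    }
}
```

Tracking nesting with a depth counter instead of recursion keeps stack usage constant regardless of how deeply the input is nested.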

Mermaid Diagram: JSON Parsing Flow

flowchart TD
    Start[Start Parsing]
    Allocate[Allocate Padded Buffer]
    SkipWS[Skip Whitespace & Comments]
    DetectRoot{Root Token}
    ParseObj[Parse Object]
    ParseArr[Parse Array]
    ParseVal[Parse Single Value]
    BuildDoc[Build JSON Document]
    Finish[Return Document]

    Start --> Allocate --> SkipWS --> DetectRoot
    DetectRoot -->|"{ Object"| ParseObj
    DetectRoot -->|"[ Array"| ParseArr
    DetectRoot -->|Other| ParseVal
    ParseObj --> BuildDoc
    ParseArr --> BuildDoc
    ParseVal --> BuildDoc
    BuildDoc --> Finish

This diagram visualizes the high-level flow of JSON parsing in the embedded `yyjson` library, demonstrating the initial allocation, token detection, parsing branches, and final document construction.


This page covers the core concepts and workflows of the High-Performance JSON Parsing module, with emphasis on the embedded `yyjson` C library and the Rust backend that uses it for fast, memory-safe JSON parsing and serialization.