yyjson.c
Overview
`yyjson.c` is the primary implementation file of the **yyjson** library, a high-performance, memory-efficient JSON parser and writer implemented in C. This file contains the core functionalities for reading (parsing), writing (serializing), and manipulating JSON data with support for:
Fast and safe parsing of JSON text into an internal immutable or mutable representation.
Precise and efficient number parsing and formatting compliant with IEEE-754 double precision.
UTF-8 string reading and writing with full escape handling and Unicode validation.
Memory management via customizable allocators, including default libc-based, fixed-size pool allocators, and dynamic chunk allocators.
Advanced JSON utilities such as JSON Pointer (RFC 6901), JSON Patch (RFC 6902), and JSON Merge Patch (RFC 7386).
Extensive compile-time options for enabling/disabling features, including non-standard JSON extensions, fast floating-point conversion, and UTF-8 validation.
The implementation emphasizes speed and correctness, using techniques like finite state machines (FSM) with `goto` statements, branch prediction hints, manual memory alignment, and precomputed lookup tables for numeric computations.
Detailed Explanations
Memory Allocators
Default Allocator (YYJSON_DEFAULT_ALC)
A wrapper over the standard `malloc`, `realloc`, and `free` functions, used by default when no custom allocator is provided.
Functions:
`default_malloc(void *ctx, usize size)`
`default_realloc(void *ctx, void *ptr, usize old_size, usize size)`
`default_free(void *ctx, void *ptr)`
Null Allocator (YYJSON_NULL_ALC)
A placeholder allocator where all allocation functions return `NULL` or do nothing, ensuring no actual memory operations occur.
Useful for internal checks or disabled memory operations.
Pool Allocator
Manages a fixed-size buffer split into chunks (`pool_chunk`) linked in a free list.
Key structures:
`pool_chunk`: Header with size and next pointer.
`pool_ctx`: Holds total size and free list head.
Functions:
`pool_malloc`: Allocates an aligned chunk from the free list.
`pool_free`: Returns a chunk to the free list, merging adjacent chunks.
`pool_realloc`: Resizes the chunk in place if possible; otherwise allocates a new chunk and copies the data.
Initialization:
`yyjson_alc_pool_init(yyjson_alc *alc, void *buf, usize size)`: Sets up a pool allocator on top of the caller-provided buffer.
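The free-list scheme described above can be sketched in a few dozen lines. This is a simplified illustration, not yyjson's code: the names `chunk`, `pool`, `pool_init`, `pool_alloc`, and `pool_release` are hypothetical, and details such as `realloc` support and the exact alignment policy are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; not yyjson's pool_chunk/pool_ctx. */
typedef struct chunk {
    size_t size;         /* chunk size in bytes, header included */
    struct chunk *next;  /* next free chunk, in ascending address order */
} chunk;

typedef struct { chunk *free_list; } pool;

/* Allocation granularity; assumed to be a power of two (8 or 16 bytes). */
#define ALIGN (sizeof(chunk))

/* Turns a caller-provided buffer into a single large free chunk. */
static int pool_init(pool *p, void *buf, size_t size) {
    uintptr_t start = ((uintptr_t)buf + ALIGN - 1) & ~(uintptr_t)(ALIGN - 1);
    size_t skip = (size_t)(start - (uintptr_t)buf);
    if (size < skip + 2 * sizeof(chunk)) return 0;
    p->free_list = (chunk *)start;
    p->free_list->size = (size - skip) & ~(size_t)(ALIGN - 1);
    p->free_list->next = NULL;
    return 1;
}

/* First-fit allocation; splits a chunk when the remainder is still usable. */
static void *pool_alloc(pool *p, size_t size) {
    chunk **link = &p->free_list;
    if (size == 0 || size > SIZE_MAX - 2 * sizeof(chunk)) return NULL;
    size = ((size + ALIGN - 1) & ~(size_t)(ALIGN - 1)) + sizeof(chunk);
    for (; *link; link = &(*link)->next) {
        chunk *c = *link;
        if (c->size < size) continue;
        if (c->size >= size + 2 * sizeof(chunk)) {
            chunk *rest = (chunk *)((char *)c + size);  /* tail stays free */
            rest->size = c->size - size;
            rest->next = c->next;
            c->size = size;
            *link = rest;
        } else {
            *link = c->next;  /* hand out the whole chunk */
        }
        return (void *)(c + 1);
    }
    return NULL;  /* pool exhausted */
}

/* Re-inserts a chunk in address order and merges it with its neighbours. */
static void pool_release(pool *p, void *ptr) {
    chunk *c, *prev = NULL, *next;
    if (!ptr) return;
    c = (chunk *)ptr - 1;
    next = p->free_list;
    while (next && next < c) { prev = next; next = next->next; }
    c->next = next;
    if (prev) prev->next = c; else p->free_list = c;
    if (next && (char *)c + c->size == (char *)next) {      /* merge forward */
        c->size += next->size;
        c->next = next->next;
    }
    if (prev && (char *)prev + prev->size == (char *)c) {   /* merge backward */
        prev->size += c->size;
        prev->next = c->next;
    }
}
```

Keeping the free list sorted by address is what makes merging cheap: a freed chunk only has to be compared with its immediate predecessor and successor.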
Dynamic Allocator
Allocates variable-sized chunks on demand and maintains a free list for reuse.
Key structures:
`dyn_chunk`: Header with size and next pointer.
`dyn_ctx`: Holds the dummy head nodes of the free list and the used list.
Functions:
`dyn_malloc`: Finds a suitable free chunk or allocates a new one via the default allocator.
`dyn_realloc`: Resizes the chunk or allocates a new one, copying the old data.
`dyn_free`: Moves the chunk from the used list to the free list.
Creation/Destruction:
`yyjson_alc_dyn_new()`: Creates a new dynamic allocator.
`yyjson_alc_dyn_free(yyjson_alc *alc)`: Frees all chunks and the allocator itself.
JSON Document and Value Management
Supports immutable (`yyjson_doc`, `yyjson_val`) and mutable (`yyjson_mut_doc`, `yyjson_mut_val`) JSON representations.
Uses string pools and value pools to manage memory efficiently.
Supports copying between immutable and mutable forms:
`yyjson_doc_mut_copy()`: Creates a mutable copy of an immutable document.
`yyjson_mut_doc_imut_copy()`: Creates an immutable copy of a mutable document.
Implements recursive and contiguous memory copying for arrays and objects.
Provides equality checks for JSON values (mutable and immutable).
JSON Reader (Parser)
Entry Points
`yyjson_read_opts(char *dat, usize len, yyjson_read_flag flg, const yyjson_alc *alc_ptr, yyjson_read_err *err)`: Reads JSON from a buffer with options.
`yyjson_read_file(const char *path, yyjson_read_flag flg, const yyjson_alc *alc_ptr, yyjson_read_err *err)`: Reads JSON from a file path.
`yyjson_read_fp(FILE *file, yyjson_read_flag flg, const yyjson_alc *alc_ptr, yyjson_read_err *err)`: Reads JSON from a file pointer.
Parsing Strategy
Uses a finite state machine (FSM) implemented with `goto` for performance.
Supports:
Single value JSON.
Minified JSON.
Pretty-formatted JSON.
Handles:
Objects `{}` parsing with key-value pairs.
Arrays `[]` parsing with values.
Literals: `true`, `false`, `null`.
Numbers with advanced IEEE-754 parsing.
Strings with UTF-8 validation and escape sequences.
Supports optional skipping of comments and trailing commas if enabled.
Tracks container nesting in the parser's own state rather than through recursive calls, so deeply nested input does not grow the C call stack.
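To make the `goto`-driven state machine concrete, here is a minimal sketch on a toy grammar (nested arrays of unsigned integers, no whitespace). The function name `validate_nested_arrays` and the label names are made up for this illustration and the real parser is far more elaborate; the point is only that each state is a label and that nesting is tracked with a counter instead of recursive calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Validates input such as "[1,[22,3],[]]". Hypothetical helper, not part of
   yyjson. Returns true if the whole string is exactly one valid array. */
static bool validate_nested_arrays(const char *s) {
    size_t depth = 0;  /* number of open arrays; stands in for a call stack */

    assert(s != NULL);
    if (*s != '[') return false;

arr_begin:                      /* state: just saw '[' */
    s++;
    depth++;
    if (*s == ']') goto arr_end;   /* empty array */

arr_val:                        /* state: expecting a value */
    if (*s == '[') goto arr_begin;
    if (*s >= '0' && *s <= '9') {
        do { s++; } while (*s >= '0' && *s <= '9');
        goto arr_val_end;
    }
    return false;

arr_val_end:                    /* state: a value just ended */
    if (*s == ',') { s++; goto arr_val; }
    if (*s == ']') goto arr_end;
    return false;

arr_end:                        /* state: just saw ']' */
    s++;
    depth--;
    if (depth > 0) goto arr_val_end;  /* the closed array was itself a value */
    return *s == '\0';
}
```

Because closing a nested array jumps straight back to the "value ended" state of its parent, the control flow never returns through a chain of function calls, which is the property the FSM design relies on.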
Number Parsing
Implements a highly optimized number parser supporting:
Unsigned and signed 64-bit integers.
IEEE-754 doubles with correct rounding.
Special literals: `Infinity`, `NaN` (if enabled).
Uses multiple fast paths and a fallback bigint algorithm for large or precise numbers.
Supports a raw string fallback mode to read numbers as strings.
Uses precomputed pow10 tables and bit manipulations for fast conversion.
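One well-known fast path of this kind (often attributed to Clinger) applies when the decimal significand fits in 53 bits and the power of ten is itself exactly representable as a double: a single multiplication or division then yields the correctly rounded result. The sketch below illustrates that idea with a small table; `exact_pow10` and `fast_path_to_double` are hypothetical names, and this is not a copy of yyjson's routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Powers of ten that are exactly representable as IEEE-754 doubles. */
static const double exact_pow10[23] = {
    1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
    1e8,  1e9,  1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
    1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22
};

/* Computes sig * 10^exp10 when both operands are exact doubles, so that the
   single IEEE operation rounds correctly. Returns false when the input lies
   outside this fast path and a slower exact algorithm would be required.
   Assumes double arithmetic is not evaluated in extended precision. */
static bool fast_path_to_double(uint64_t sig, int exp10, double *out) {
    assert(out != 0);
    if (sig > ((uint64_t)1 << 53)) return false;  /* would lose bits */
    if (exp10 < -22 || exp10 > 22) return false;  /* power not exact */
    if (exp10 >= 0) {
        *out = (double)sig * exact_pow10[exp10];
    } else {
        *out = (double)sig / exact_pow10[-exp10];
    }
    return true;
}
```

For example, the text `3.14159` would be split into the significand `314159` and the exponent `-5`; anything that falls outside the two range checks has to go through a slower path such as the bigint fallback mentioned above.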
String Parsing
Reads UTF-8 encoded strings.
Handles escape sequences (`\n`, `\t`, `\uXXXX`, surrogate pairs).
Validates UTF-8 sequences with bitmasking and patterns.
Supports optional acceptance of invalid Unicode.
Uses loop unrolling and branch prediction for speed.
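The `\uXXXX` handling is the least obvious part of string parsing, because a code point above U+FFFF arrives as two escapes (a high and a low surrogate) that must be combined before being written as UTF-8. The following is a plain, unoptimized sketch of that step; `hex4` and `decode_u_escape` are hypothetical helpers, and the input is assumed to be NUL-terminated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Parses exactly four hex digits; stops at the first invalid character. */
static bool hex4(const char *s, uint32_t *out) {
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        char c = s[i];
        uint32_t d;
        if (c >= '0' && c <= '9') d = (uint32_t)(c - '0');
        else if (c >= 'a' && c <= 'f') d = (uint32_t)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') d = (uint32_t)(c - 'A' + 10);
        else return false;
        v = (v << 4) | d;
    }
    *out = v;
    return true;
}

/* Decodes one escape starting at "\u" into UTF-8. dst needs 4 bytes.
   Returns the number of bytes written, or 0 on error. *consumed receives
   the number of source characters used (6, or 12 for a surrogate pair). */
static size_t decode_u_escape(const char *src, unsigned char *dst,
                              size_t *consumed) {
    uint32_t hi, lo, cp;
    assert(src && dst && consumed);
    if (src[0] != '\\' || src[1] != 'u' || !hex4(src + 2, &hi)) return 0;
    cp = hi;
    *consumed = 6;
    if (hi >= 0xD800 && hi <= 0xDBFF) {          /* high surrogate */
        if (src[6] != '\\' || src[7] != 'u' || !hex4(src + 8, &lo)) return 0;
        if (lo < 0xDC00 || lo > 0xDFFF) return 0;
        cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
        *consumed = 12;
    } else if (hi >= 0xDC00 && hi <= 0xDFFF) {   /* lone low surrogate */
        return 0;
    }
    if (cp < 0x80) {
        dst[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        dst[0] = (unsigned char)(0xC0 | (cp >> 6));
        dst[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        dst[0] = (unsigned char)(0xE0 | (cp >> 12));
        dst[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        dst[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    dst[0] = (unsigned char)(0xF0 | (cp >> 18));
    dst[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    dst[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    dst[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

A production parser would fold these checks into table lookups and unrolled loops; the sketch keeps them explicit so that the surrogate arithmetic is easy to follow.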
JSON Writer (Serializer)
Writes JSON values back to UTF-8 encoded strings.
Supports configurable options:
Pretty-printing.
Unicode escaping.
Special float handling (`NaN`, `Infinity`).
Implements:
Fast integer formatting using digit lookup tables and division by constants.
Floating-point formatting using the Schubfach algorithm for shortest decimal representation with correct rounding.
String escaping with multiple encode tables depending on flags (escape slashes, unicode).
Buffer size estimations and padding ensure safe writes.
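The digit-lookup idea for integer formatting can be shown in isolation. In the sketch below, a 200-byte table holds the text of every value from 00 to 99, so each division by the constant 100 (which compilers typically lower to a multiply and shift) produces two output characters. `digit_pairs` and `write_u64` are hypothetical names, and the layout is simplified compared with a tuned implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "00".."99" laid out back to back: entry n starts at offset 2 * n. */
static const char digit_pairs[] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

/* Writes v in decimal to buf (up to 20 bytes, no NUL terminator) and
   returns a pointer one past the last character written. */
static char *write_u64(uint64_t v, char *buf) {
    char tmp[20];                    /* UINT64_MAX has 20 digits */
    char *p = tmp + sizeof(tmp);     /* fill from the end backwards */
    size_t n;

    assert(buf != NULL);
    while (v >= 100) {
        unsigned r = (unsigned)(v % 100);
        v /= 100;
        p -= 2;
        memcpy(p, digit_pairs + 2 * r, 2);
    }
    if (v >= 10) {
        p -= 2;
        memcpy(p, digit_pairs + 2 * v, 2);
    } else {
        *--p = (char)('0' + v);
    }
    n = (size_t)(tmp + sizeof(tmp) - p);
    memcpy(buf, p, n);
    return buf + n;
}
```

Returning the end pointer rather than a length mirrors the cursor style common in serializers: the caller can keep appending at the returned position, provided the buffer was sized with enough padding up front.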
JSON Pointer API (RFC 6901)
Functions to parse JSON Pointer strings.
Supports retrieving values from JSON documents or mutable documents by pointer.
Handles token escaping (`~0` -> `~`, `~1` -> `/`).
Supports insertion, replacement, and removal of values via JSON Pointer.
Provides error reporting with detailed error codes and messages.
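The token escaping rule from RFC 6901 is small enough to show in full. The sketch below unescapes a single reference token (the text between two `/` separators); `unescape_token` is a hypothetical helper written for this illustration, not a function exported by the library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Unescapes one JSON Pointer reference token: "~0" becomes '~' and "~1"
   becomes '/'. out must have room for len bytes. Returns false if a '~'
   is not followed by '0' or '1'. */
static bool unescape_token(const char *tok, size_t len,
                           char *out, size_t *out_len) {
    size_t n = 0;
    assert(tok && out && out_len);
    for (size_t i = 0; i < len; i++) {
        char c = tok[i];
        if (c == '~') {
            if (i + 1 >= len) return false;   /* dangling '~' */
            i++;
            if (tok[i] == '0') c = '~';
            else if (tok[i] == '1') c = '/';
            else return false;                /* unknown escape */
        }
        out[n++] = c;
    }
    *out_len = n;
    return true;
}
```

Processing the input in a single left-to-right pass matters: the token `~01` must decode to `~1`, not to `/`, which is what a naive two-step replace (first `~0`, then `~1`) would produce.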
JSON Patch API (RFC 6902)
Applies a JSON Patch (array of operations) to a JSON document or mutable document.
Supports operations:
`add`
`remove`
`replace`
`move`
`copy`
`test`
Validates patch format and operation members.
Uses JSON Pointer functions for path resolution.
Returns the updated mutable JSON value, or `NULL` on error with detailed error information.
JSON Merge Patch API (RFC 7386)
Implements merge patch semantics to merge two JSON objects.
If the patch is not an object, returns a copy of the patch.
Recursively merges objects, removing keys if values are null in the patch.
Supports both immutable and mutable JSON representations.
Utilities and Constants
Endianness detection and handling.
Unaligned memory access macros and safe copying.
Bit scanning utilities for leading/trailing zero counts.
128-bit multiplication helpers with fallback implementations.
Lookup tables for powers of ten (normalized 128-bit significands) for floating-point conversions.
Character classification tables for JSON syntax (whitespace, digits, hex, escape characters).
Macros for loop unrolling and branch prediction hints to optimize performance.
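As an illustration of what a portable fallback for the 128-bit multiplication helper can look like when no native 128-bit integer type or compiler intrinsic is available, the sketch below builds the product from four 32-bit partial products. `mul_u64_to_u128` is a hypothetical name chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Computes the full 128-bit product a * b as (hi, lo) using only 64-bit
   arithmetic. Each operand is split into 32-bit halves. */
static void mul_u64_to_u128(uint64_t a, uint64_t b,
                            uint64_t *hi, uint64_t *lo) {
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;
    uint64_t p00 = a0 * b0;   /* contributes to bits   0..63  */
    uint64_t p01 = a0 * b1;   /* contributes to bits  32..95  */
    uint64_t p10 = a1 * b0;   /* contributes to bits  32..95  */
    uint64_t p11 = a1 * b1;   /* contributes to bits  64..127 */
    /* Sum of three values below 2^32 each: cannot overflow 64 bits. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    assert(hi != 0 && lo != 0);
    *lo = (mid << 32) | (uint32_t)p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```

Where the compiler offers a 128-bit integer type or a dedicated intrinsic, that path would normally be preferred, with code like the above selected only by a compile-time check.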
Important Implementation Details and Algorithms
Finite State Machine Parsing via `goto`: The parser uses explicit `goto` labels for each parsing state (object start, array value, string, number, etc.) to minimize call overhead and maximize branch prediction efficiency.
BigInt Arithmetic for Floating Point Parsing: To ensure correct rounding for numbers that cannot fit precisely in 64-bit integers, a custom bigint implementation is used to represent and compare large numbers during number parsing.
Schubfach Algorithm for Floating Point Writing: Produces the shortest decimal representation of floating-point numbers with correct rounding, based on recent research by Raffaello Giulietti.
Memory Pool Allocators: Pool allocator uses a linked list of chunks, which can be split or merged to minimize fragmentation and provide fast allocation/deallocation for JSON strings and values.
Character Lookup Tables: Large static tables are generated for character classification and escaping to speed up string processing by avoiding conditional branches.
Compile-Time Options: Features can be enabled or disabled at compile time to tailor the parser for size, speed, or compatibility, such as disabling non-standard JSON features or fast floating-point conversion.
Interaction with Other Parts of the System or Application
This file provides the core JSON parsing and writing engine used by higher-level Rust modules in the project (e.g., the Rust deserialization backend) via FFI.
The Rust code conditionally includes this C implementation when the `yyjson` feature is enabled, allowing seamless switching between native Rust parsing and the embedded `yyjson` C parser.
Python integration layers call into this functionality via Rust bindings to achieve high-performance JSON parsing in Python.
Custom allocators defined here can be integrated with Rust or Python memory management to optimize memory usage across language boundaries.
JSON Pointer, Patch, and Merge Patch APIs facilitate advanced JSON data manipulation in the application.
Usage Examples
Parsing JSON from a buffer
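```c
yyjson_read_err err = {0};
/* json_data is a char * of json_len bytes; it is not modified unless the
   in-situ flag is used. YYJSON_READ_NOFLAG selects standard parsing. */
yyjson_doc *doc = yyjson_read_opts(json_data, json_len, YYJSON_READ_NOFLAG, NULL, &err);
if (!doc) {
    printf("Failed to parse JSON at position %zu: %s\n", err.pos, err.msg);
} else {
    yyjson_val *root = yyjson_doc_get_root(doc);
    // Use the root value...
    yyjson_doc_free(doc);
}
```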
Writing a JSON number
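```c
/* write_number() is a static helper inside yyjson.c, so this snippet only
   compiles within that translation unit (u8 is its internal byte type). */
u8 buffer[32];
yyjson_val val;
val.tag = YYJSON_TYPE_NUM | YYJSON_SUBTYPE_REAL;
val.uni.f64 = 3.14159;
u8 *end = write_number(buffer, &val, YYJSON_WRITE_NOFLAG);
if (end) printf("%.*s\n", (int)(end - buffer), buffer);
```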
Using JSON Pointer to get a value
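```c
/* The unsafe_ variant performs no argument checks: root must be non-NULL
   and the pointer string must be non-empty and start with '/'. */
yyjson_val *val = unsafe_yyjson_ptr_getx(root, "/foo/bar", strlen("/foo/bar"), NULL);
if (val) {
    // Use val
}
```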
Applying a JSON Patch
```c
yyjson_patch_err err = {0};
yyjson_mut_val *patched = yyjson_patch(doc, orig, patch, &err);
if (!patched) {
    printf("Patch failed: %s\n", err.msg);
}
```
Visual Diagram: Core Function and Module Flow in yyjson.c
```mermaid
flowchart TD
    subgraph Allocators
        DefaultAlloc["default_malloc/realloc/free"]
        PoolAlloc["pool_malloc/realloc/free"]
        DynAlloc["dyn_malloc/realloc/free"]
    end
    subgraph JSON_Parsing
        ReadOpts["yyjson_read_opts()"]
        ReadRootMinify["read_root_minify()"]
        ReadRootPretty["read_root_pretty()"]
        ReadRootSingle["read_root_single()"]
        ReadNumber["read_number()"]
        ReadString["read_string()"]
        SkipSpaces["skip_spaces_and_comments()"]
    end
    subgraph JSON_Writing
        WriteNumber["write_number()"]
        WriteF64Raw["write_f64_raw()"]
        WriteString["write_string()"]
    end
    subgraph JSON_Utilities
        JsonPtrGet["unsafe_yyjson_ptr_getx()"]
        JsonPtrMutGet["unsafe_yyjson_mut_ptr_getx()"]
        JsonPatch["yyjson_patch()"]
        JsonMergePatch["yyjson_merge_patch()"]
        ValCopy["yyjson_val_mut_copy()"]
        MutValCopy["yyjson_mut_val_mut_copy()"]
        Equals["unsafe_yyjson_equals() / unsafe_yyjson_mut_equals()"]
    end
    DefaultAlloc --> ReadOpts
    PoolAlloc --> ReadOpts
    DynAlloc --> ReadOpts
    ReadOpts --> SkipSpaces
    SkipSpaces -->|Determines format| ReadRootMinify
    SkipSpaces --> ReadRootPretty
    SkipSpaces --> ReadRootSingle
    ReadRootMinify --> ReadNumber
    ReadRootPretty --> ReadNumber
    ReadRootSingle --> ReadNumber
    ReadRootMinify --> ReadString
    ReadRootPretty --> ReadString
    ReadRootSingle --> ReadString
    ReadNumber --- ValCopy
    ReadString --- ValCopy
    JSON_Parsing --> JSON_Utilities
    JSON_Utilities --> JsonPtrGet
    JSON_Utilities --> JsonPtrMutGet
    JSON_Utilities --> JsonPatch
    JSON_Utilities --> JsonMergePatch
    JSON_Utilities --> Equals
    JSON_Writing --> WriteNumber
    WriteNumber --> WriteF64Raw
    JSON_Writing --> WriteString
```
Summary
The `yyjson.c` file contains the core implementation of the **yyjson** library: a highly optimized JSON parsing and writing engine written in C. It manages memory with flexible allocators, parses JSON efficiently using an FSM and advanced numeric algorithms, and supports extended JSON utilities such as JSON Pointer, JSON Patch, and JSON Merge Patch. It is the component to rely on when an application needs fast, standards-compliant JSON processing with custom memory management.