yyjson.rs
Overview
This file provides a Rust-based JSON deserialization implementation using the [yyjson](https://github.com/ibireme/yyjson) library, a high-performance JSON parsing C library. The primary purpose of this module is to parse JSON text into Python objects efficiently by leveraging low-level memory operations and custom memory allocation strategies.
The module integrates with Python’s C API (via FFI) to create native Python data structures (`PyList`, `PyDict`, Python primitives) from JSON data, facilitating fast JSON deserialization in a Python environment (likely part of a Python extension module). It handles JSON arrays, objects, strings, numbers, booleans, and null values.
Detailed Documentation
Constants and Macros
YYJSON_TAG_BIT: Bit offset used to encode the length in the tag field of `yyjson_val`.
YYJSON_VAL_SIZE: Size of the `yyjson_val` struct in bytes.
TAG_*: Bitmask constants representing different JSON value types (array, string, number types, null, booleans, object).
is_yyjson_tag!($elem, $tag): Macro that checks if a JSON value’s tag matches a specified tag.
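The tag scheme described here can be sketched in a few lines. This is a minimal illustration, assuming the low 8 bits of a 64-bit tag hold the type and the remaining bits hold the length. The `Val` struct, `make_tag`, `get_len`, `is_tag!`, and the concrete constant values are stand-ins chosen for the example, not the definitions in the source file.

```rust
// Assumed layout for illustration: low 8 bits = type tag, upper bits = length.
const YYJSON_TAG_BIT: u32 = 8;

// Hypothetical tag bytes; the real constants live in the source file.
const TAG_STRING: u8 = 0b0000_0101;
const TAG_ARRAY: u8 = 0b0000_0110;
const TAG_OBJECT: u8 = 0b0000_0111;

/// Stand-in for the `tag` field of `yyjson_val`.
#[derive(Clone, Copy)]
struct Val {
    tag: u64,
}

/// Mirrors the idea of `is_yyjson_tag!`: compare the low tag byte with a constant.
macro_rules! is_tag {
    ($elem:expr, $tag:expr) => {
        ($elem.tag as u8 == $tag)
    };
}

/// Packs a type tag and a length into one 64-bit word.
fn make_tag(tag: u8, len: usize) -> u64 {
    ((len as u64) << YYJSON_TAG_BIT) | u64::from(tag)
}

/// Recovers the length by shifting the type bits away.
fn get_len(val: Val) -> usize {
    (val.tag >> YYJSON_TAG_BIT) as usize
}
```

With these definitions, a value built as `Val { tag: make_tag(TAG_ARRAY, 3) }` satisfies `is_tag!(v, TAG_ARRAY)`, and `get_len(v)` returns 3. Keeping type and length in one word means a single load is enough to dispatch on the type and size the resulting container.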
Utility Functions
yyjson_doc_get_root(doc: *mut yyjson_doc) -> *mut yyjson_val
Retrieves the root JSON value from a parsed document.
unsafe_yyjson_get_len(val: *mut yyjson_val) -> usize
Extracts the length of an array or object container from the tag field.
unsafe_yyjson_get_first(ctn: *mut yyjson_val) -> *mut yyjson_val
Returns a pointer to the first child element of a container (array/object).
buffer_capacity_to_allocate(len: usize) -> usize
Calculates the size of the memory buffer to allocate for parsing, based on the input JSON length and including alignment and padding.
unsafe_yyjson_is_ctn(val: *mut yyjson_val) -> bool
Tests whether a value is a container type (array or object).
unsafe_yyjson_get_next_container(val: *mut yyjson_val) -> *mut yyjson_val
Given a container element, returns a pointer to the next element after that container.
unsafe_yyjson_get_next_non_container(val: *mut yyjson_val) -> *mut yyjson_val
Given a non-container element, returns a pointer to the next element, assuming the element occupies a fixed size.
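The traversal helpers can be illustrated with a safe model that uses slice indices instead of raw pointers. This is only a sketch under stated assumptions: values are stored contiguously, a container is immediately followed by its children, and a container's payload records the byte offset to its next sibling. `Val`, `uint`, `array`, `next_index`, `children`, and all constant values are hypothetical names and numbers introduced for the example.

```rust
// All values below are assumptions made for this sketch.
const VAL_SIZE: usize = 16; // assumed size of one value slot, in bytes
const TAG_BIT: u32 = 8; // assumed bit offset of the length within the tag
const TAG_UINT64: u8 = 0b0000_0100; // hypothetical tag bytes
const TAG_ARRAY: u8 = 0b0000_0110;
const TAG_OBJECT: u8 = 0b0000_0111;

/// Stand-in for `yyjson_val`: values sit next to each other in one slice.
#[derive(Clone, Copy)]
struct Val {
    tag: u64, // low byte: type, upper bits: length
    uni: u64, // payload; for containers, byte offset to the next sibling
}

fn uint(n: u64) -> Val {
    Val { tag: u64::from(TAG_UINT64), uni: n }
}

/// Builds an array header with `len` children spanning `slots` slots in total.
fn array(len: usize, slots: usize) -> Val {
    Val {
        tag: ((len as u64) << TAG_BIT) | u64::from(TAG_ARRAY),
        uni: (slots * VAL_SIZE) as u64,
    }
}

fn is_ctn(val: &Val) -> bool {
    matches!(val.tag as u8, TAG_ARRAY | TAG_OBJECT)
}

fn get_len(val: &Val) -> usize {
    (val.tag >> TAG_BIT) as usize
}

/// Index of the element that follows `idx` at the same nesting level.
fn next_index(vals: &[Val], idx: usize) -> usize {
    if is_ctn(&vals[idx]) {
        // Containers record how far to jump to skip all of their children.
        idx + vals[idx].uni as usize / VAL_SIZE
    } else {
        // Non-containers occupy exactly one slot.
        idx + 1
    }
}

/// Indices of the direct children of the container at `ctn`.
fn children(vals: &[Val], ctn: usize) -> Vec<usize> {
    let len = get_len(&vals[ctn]);
    let mut out = Vec::with_capacity(len);
    let mut idx = ctn + 1; // the first child sits right after its container
    for _ in 0..len {
        out.push(idx);
        idx = next_index(vals, idx);
    }
    out
}
```

For the layout of `[1, [2, 3], 4]`, written as `[array(3, 6), uint(1), array(2, 3), uint(2), uint(3), uint(4)]`, `children(&vals, 0)` returns `[1, 2, 5]`: the nested array at index 2 is skipped in one step.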
Enum: ElementType
Represents the JSON element types supported by yyjson.
| Variant | Description |
|---|---|
| String | JSON string |
| Uint64 | Unsigned 64-bit int |
| Int64 | Signed 64-bit int |
| Double | Floating point number |
| Null | JSON null |
| True | JSON true boolean |
| False | JSON false boolean |
| Array | JSON array container |
| Object | JSON object container |
Method: from_tag
fn from_tag(elem: *mut yyjson_val) -> Self
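Reads the tag byte of the `yyjson_val` that `elem` points to and maps it to an `ElementType`. It dereferences a raw pointer and assumes the tag is one of the known values, so callers must pass a valid element.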
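A safe variant of this mapping might look as follows. It is a sketch, not the source's implementation: it takes the tag byte directly instead of a pointer, returns `None` for unknown tags instead of assuming validity, and the tag values are hypothetical.

```rust
// Hypothetical tag bytes, chosen for illustration only.
const TAG_NULL: u8 = 0b0000_0010;
const TAG_FALSE: u8 = 0b0000_0011;
const TAG_UINT64: u8 = 0b0000_0100;
const TAG_STRING: u8 = 0b0000_0101;
const TAG_ARRAY: u8 = 0b0000_0110;
const TAG_OBJECT: u8 = 0b0000_0111;
const TAG_TRUE: u8 = 0b0000_1011;
const TAG_INT64: u8 = 0b0000_1100;
const TAG_DOUBLE: u8 = 0b0001_0100;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ElementType {
    String,
    Uint64,
    Int64,
    Double,
    Null,
    True,
    False,
    Array,
    Object,
}

impl ElementType {
    /// Maps a tag byte to its element type, or `None` if the byte is unknown.
    fn from_tag(tag: u8) -> Option<Self> {
        match tag {
            TAG_STRING => Some(Self::String),
            TAG_UINT64 => Some(Self::Uint64),
            TAG_INT64 => Some(Self::Int64),
            TAG_DOUBLE => Some(Self::Double),
            TAG_NULL => Some(Self::Null),
            TAG_TRUE => Some(Self::True),
            TAG_FALSE => Some(Self::False),
            TAG_ARRAY => Some(Self::Array),
            TAG_OBJECT => Some(Self::Object),
            _ => None,
        }
    }
}
```

Returning `Option` trades a branch for safety. A version that assumes a valid tag, as the documented method does, can skip the fallback arm at the cost of undefined behavior on bad input.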
Main Function: deserialize
pub(crate) fn deserialize(
    data: &'static str,
) -> Result<NonNull<pyo3_ffi::PyObject>, DeserializeError<'static>>
**Purpose:** Parses a JSON string into a Python object (`PyObject`) using `yyjson` and Python's C API.
**Parameters:**
data: A static string slice representing the JSON input text.
**Returns:**
Ok(NonNull<pyo3_ffi::PyObject>): Pointer to the newly created Python object representing the JSON data.
Err(DeserializeError): Error describing failure to parse or allocate memory.
**Usage:** This function is the entry point to convert JSON text into Python native objects. It can be called by higher-level Python extension code to produce Python-compatible results.
**Implementation Highlights:**
Allocates a memory buffer sized by `buffer_capacity_to_allocate`.
Initializes a `yyjson` allocator to parse JSON using this buffer.
Calls `yyjson_read_opts` to parse the JSON string.
If parsing fails, frees the buffer and returns a parse error with a descriptive message.
On success, retrieves the root JSON value.
Recursively converts the JSON value tree to Python objects:
Primitive JSON types are converted using helper functions (`parse_yy_string`, `parse_yy_u64`, etc.).
Arrays are converted to Python lists by `populate_yy_array`.
Objects are converted to Python dictionaries by `populate_yy_object`.
Frees the allocated buffer after parsing.
Returns the constructed Python object.
Parsing Helper Functions
These functions convert leaf JSON types from `yyjson_val` pointers to Python objects.
| Function | Description |
|---|---|
`parse_yy_string` | Parses a JSON string into a Python `str` object. |
`parse_yy_u64` | Parses an unsigned 64-bit integer. |
`parse_yy_i64` | Parses a signed 64-bit integer. |
`parse_yy_f64` | Parses a floating-point number. |
Each returns a non-null pointer to a Python object.
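The leaf conversions amount to reading a value's payload with the interpretation its tag selects. The sketch below models that with a Rust `union` and an enum standing in for Python objects. `Payload`, `PyValue`, and the function bodies are assumptions for illustration; the actual helpers call into the Python C API.

```rust
/// Stand-in for a Python object produced by a leaf conversion.
#[derive(Debug, PartialEq)]
enum PyValue {
    Str(String),
    Int(i128), // wide enough for both u64 and i64 payloads
    Float(f64),
}

/// Stand-in for the 8-byte payload of a value: one set of bits, three readings.
#[derive(Clone, Copy)]
union Payload {
    u64_: u64,
    i64_: i64,
    f64_: f64,
}

fn parse_u64(p: Payload) -> PyValue {
    // Sound here: every 8-byte pattern is a valid u64.
    PyValue::Int(i128::from(unsafe { p.u64_ }))
}

fn parse_i64(p: Payload) -> PyValue {
    // Sound here: every 8-byte pattern is a valid i64.
    PyValue::Int(i128::from(unsafe { p.i64_ }))
}

fn parse_f64(p: Payload) -> PyValue {
    // Sound here: every 8-byte pattern is a valid f64 (possibly NaN).
    PyValue::Float(unsafe { p.f64_ })
}

fn parse_string(bytes: &[u8]) -> PyValue {
    // The sketch replaces invalid UTF-8 rather than trusting the input.
    PyValue::Str(String::from_utf8_lossy(bytes).into_owned())
}
```

The tag decides which helper runs, so the same bits are never read under two interpretations for one value. That is why the source needs one helper per numeric type.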
Container Population Functions
These functions recursively traverse JSON arrays and objects, populating Python lists and dictionaries.
populate_yy_array
fn populate_yy_array(list: *mut pyo3_ffi::PyObject, elem: *mut yyjson_val)
Parameters:
list: Pointer to a Python list object.
elem: Pointer to a JSON array `yyjson_val`.
Behavior:
Iterates over array elements. For each element:
If container (array or object), recursively creates sublists or subdicts.
If primitive, converts using helper parsing functions.
Inserts converted Python objects into the list.
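The array traversal can be modelled safely with a slice of values and a `Vec` standing in for the Python list. This sketch assumes a contiguous layout in which a container is followed by its children and stores the byte offset to its next sibling. `Val`, `PyValue`, the constructor helpers, and the constants are hypothetical, and only a few element types are handled.

```rust
// All constants are assumptions made for this sketch.
const VAL_SIZE: usize = 16;
const TAG_BIT: u32 = 8;
const TAG_NULL: u8 = 0b0000_0010;
const TAG_UINT64: u8 = 0b0000_0100;
const TAG_ARRAY: u8 = 0b0000_0110;

#[derive(Clone, Copy)]
struct Val {
    tag: u64, // low byte: type, upper bits: length
    uni: u64, // payload; for arrays, byte offset to the next sibling
}

/// Stand-in for Python objects.
#[derive(Debug, PartialEq)]
enum PyValue {
    None,
    Int(u64),
    List(Vec<PyValue>),
}

fn null() -> Val {
    Val { tag: u64::from(TAG_NULL), uni: 0 }
}

fn uint(n: u64) -> Val {
    Val { tag: u64::from(TAG_UINT64), uni: n }
}

fn array(len: usize, slots: usize) -> Val {
    Val {
        tag: ((len as u64) << TAG_BIT) | u64::from(TAG_ARRAY),
        uni: (slots * VAL_SIZE) as u64,
    }
}

/// Fills `list` with the converted children of the array at index `ctn`.
fn populate_array(list: &mut Vec<PyValue>, vals: &[Val], ctn: usize) {
    let len = (vals[ctn].tag >> TAG_BIT) as usize;
    let mut idx = ctn + 1; // first child follows the array header
    for _ in 0..len {
        let val = vals[idx];
        match val.tag as u8 {
            TAG_ARRAY => {
                // Recurse into the nested array, then jump past all its children.
                let mut sub = Vec::with_capacity((val.tag >> TAG_BIT) as usize);
                populate_array(&mut sub, vals, idx);
                list.push(PyValue::List(sub));
                idx += val.uni as usize / VAL_SIZE;
            }
            TAG_UINT64 => {
                list.push(PyValue::Int(val.uni));
                idx += 1;
            }
            TAG_NULL => {
                list.push(PyValue::None);
                idx += 1;
            }
            other => unreachable!("tag {} is not handled in this sketch", other),
        }
    }
}
```

Because the length is known from the tag before iteration starts, the output list can be sized up front, which avoids repeated reallocation while appending.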
populate_yy_object
fn populate_yy_object(dict: *mut pyo3_ffi::PyObject, elem: *mut yyjson_val)
Parameters:
dict: Pointer to a Python dictionary object.
elem: Pointer to a JSON object `yyjson_val`.
Behavior:
Iterates over key-value pairs. For each pair:
Extracts the key string and converts to Python Unicode key.
If value is container, recursively creates sublists or subdicts.
Otherwise, converts to Python primitive.
Inserts key-value pair into the dictionary.
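Object traversal follows the same pattern, except that elements come in key/value pairs. The sketch below is a safe model under assumptions: keys and values alternate in a contiguous slice, string payloads index into a side table of string slices, and an object stores the byte offset to its next sibling. `Val`, `PyValue`, the helper constructors, and the constants are hypothetical.

```rust
use std::collections::HashMap;

// All constants are assumptions made for this sketch.
const VAL_SIZE: usize = 16;
const TAG_BIT: u32 = 8;
const TAG_UINT64: u8 = 0b0000_0100;
const TAG_STRING: u8 = 0b0000_0101;
const TAG_OBJECT: u8 = 0b0000_0111;

#[derive(Clone, Copy)]
struct Val {
    tag: u64, // low byte: type, upper bits: length (pairs, for objects)
    uni: u64, // payload: integer, string-table index, or sibling offset
}

/// Stand-in for Python objects.
#[derive(Debug, PartialEq)]
enum PyValue {
    Int(u64),
    Str(String),
    Dict(HashMap<String, PyValue>),
}

fn uint(n: u64) -> Val {
    Val { tag: u64::from(TAG_UINT64), uni: n }
}

fn string(index: usize) -> Val {
    Val { tag: u64::from(TAG_STRING), uni: index as u64 }
}

fn object(pairs: usize, slots: usize) -> Val {
    Val {
        tag: ((pairs as u64) << TAG_BIT) | u64::from(TAG_OBJECT),
        uni: (slots * VAL_SIZE) as u64,
    }
}

/// Fills `dict` with the converted key/value pairs of the object at `ctn`.
fn populate_object(
    dict: &mut HashMap<String, PyValue>,
    vals: &[Val],
    strings: &[&str],
    ctn: usize,
) {
    let pairs = (vals[ctn].tag >> TAG_BIT) as usize;
    let mut key_idx = ctn + 1; // first key follows the object header
    for _ in 0..pairs {
        let key = strings[vals[key_idx].uni as usize].to_string();
        let val_idx = key_idx + 1; // the value sits right after its key
        let val = vals[val_idx];
        let (value, next_key) = match val.tag as u8 {
            TAG_OBJECT => {
                // Recurse, then jump past the nested object's children.
                let mut sub = HashMap::new();
                populate_object(&mut sub, vals, strings, val_idx);
                (PyValue::Dict(sub), val_idx + val.uni as usize / VAL_SIZE)
            }
            TAG_STRING => (
                PyValue::Str(strings[val.uni as usize].to_string()),
                val_idx + 1,
            ),
            TAG_UINT64 => (PyValue::Int(val.uni), val_idx + 1),
            other => unreachable!("tag {} is not handled in this sketch", other),
        };
        dict.insert(key, value);
        key_idx = next_key;
    }
}
```

The position of the next key depends on the current value: a primitive value advances by one slot, while a nested container advances by its recorded offset.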
Important Implementation Details
Unsafe Code Usage:
The module extensively uses unsafe Rust code for performance, directly manipulating raw pointers to C structs (`yyjson_val`) and Python objects (`PyObject`).
Memory Management:
Uses a custom allocator (`yyjson_alc_pool_init`) with a pre-allocated buffer for parsing, avoiding dynamic heap allocations during parsing.
Integration with Python C API:
Constructs Python objects (`PyList`, `PyDict`, primitives) through FFI calls, making the module suitable as a backend for Python JSON deserialization.
Tag-based Type Discrimination:
JSON element types are distinguished by inspecting a tag byte in `yyjson_val`, enabling quick type-based dispatch.
Recursive Descent Parsing:
Arrays and objects are parsed recursively, matching JSON nested structures.
Error Handling:
Returns a detailed `DeserializeError` with position and message on parse failure.
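The memory-management point can be made concrete with a small model: size a buffer once from the input length, then serve every allocation from it. The sizing formula, `PADDING`, `ALIGNMENT`, `buffer_capacity_sketch`, and the `Pool` type are all assumptions for illustration. They do not reproduce `buffer_capacity_to_allocate` or the yyjson pool allocator.

```rust
// Hypothetical constants chosen for illustration only.
const PADDING: usize = 256;
const ALIGNMENT: usize = 64; // must be a power of two for the mask below

/// Estimates a worst-case buffer size for `len` input bytes, rounded up to
/// a multiple of `ALIGNMENT`.
fn buffer_capacity_sketch(len: usize) -> usize {
    // Assumed worst case: one 16-byte value per 2 input bytes, plus 50% headroom.
    let worst_case = (len / 2) * 24 + PADDING;
    (worst_case + ALIGNMENT - 1) & !(ALIGNMENT - 1)
}

/// A bump-style pool: hands out consecutive slices of one pre-allocated buffer.
struct Pool {
    buf: Vec<u8>,
    used: usize,
}

impl Pool {
    fn new(capacity: usize) -> Self {
        Pool { buf: vec![0; capacity], used: 0 }
    }

    /// Returns `size` bytes from the buffer, or `None` once it is exhausted.
    /// No further heap allocation happens after `new`.
    fn alloc(&mut self, size: usize) -> Option<&mut [u8]> {
        let end = self.used.checked_add(size)?;
        if end > self.buf.len() {
            return None;
        }
        let start = self.used;
        self.used = end;
        Some(&mut self.buf[start..end])
    }
}
```

Sizing for the worst case up front is what lets the parser avoid growing the buffer mid-parse. The cost is over-allocation for inputs that contain few values relative to their length, such as long strings.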
Interaction with Other Components
`crate::deserialize::pyobject` Module:
Provides utility functions for parsing primitive JSON types (`parse_f64`, `parse_i64`, etc.) and Unicode key handling.
`crate::ffi::yyjson` Module:
Contains bindings and definitions for the `yyjson` C library used for JSON parsing.
Python Interpreter (via `pyo3_ffi`):
The Python C API is invoked to create Python objects and manage memory, allowing seamless integration with Python runtime.
`crate::str::PyStr`:
Used to safely create Python string objects from Rust string slices.
This file serves as the core JSON deserialization engine that transforms raw JSON input into Python-native representations for further use by the application.
Visual Diagram: Module Structure and Workflow
flowchart TD
    A["deserialize(data: &str)"] --> B[Allocate buffer]
    B --> C[Initialize yyjson allocator]
    C --> D["yyjson_read_opts: parse JSON"]
    D -->|Success| E[Get root JSON value]
    E --> F{Is container?}
    F -->|No| G[Parse primitive types]
    F -->|Yes| H{Array or Object?}
    H -->|Array| I[Create Python list]
    I --> J[populate_yy_array]
    H -->|Object| K[Create Python dict]
    K --> L[populate_yy_object]
    G --> M[Python object constructed]
    J --> M
    L --> M
    D -->|Fail| N[Free buffer and build parse error]
    M --> O[Free buffer]
    O --> P[Return Python object or error]
    N --> P
Usage Example
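use std::ptr::NonNull;

// Assumes this code lives inside the crate, with the `yyjson` module and
// `DeserializeError` already in scope (`deserialize` is `pub(crate)`).
fn example_usage() -> Result<(), DeserializeError<'static>> {
    let json_str = r#"{"name": "Alice", "age": 30, "is_student": false}"#;
    // Parsing creates Python objects, so a Python interpreter must be initialized.
    let py_object: NonNull<pyo3_ffi::PyObject> = yyjson::deserialize(json_str)?;
    // py_object now points to a Python dictionary equivalent to the JSON input.
    // Further interaction would involve the Python C API to use this object.
    Ok(())
}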
Summary
The `yyjson.rs` file implements a high-performance JSON deserializer that parses JSON strings into native Python objects using the `yyjson` C library and Python's C API. It relies on unsafe Rust for direct memory manipulation and traverses JSON structures recursively to build Python lists and dictionaries. Parse failures are reported through `DeserializeError`, which carries a position and a message.