JSON Parsing and Representation

Overview

This module provides the foundational capability to parse JSON-formatted text into a structured, in-memory representation that encompasses all standard JSON data types: null, boolean, number, string, array, and object. By transforming a textual JSON string into a tree of json_value objects, it enables safe, efficient access and manipulation of JSON data within the application.

The primary objective is to enable downstream components to work with JSON data programmatically without repeatedly parsing or dealing with raw text. The parsed representation supports querying, modification, and serialization back to JSON text, forming a core layer for the entire JSON processing library.

Core Concepts

In-Memory JSON Structure

The in-memory representation centers on the json_value data structure (declared in json.h), which uses a tagged union to represent any JSON type: a type field records which kind of value the node holds, and the union stores the corresponding data.

The json_object struct represents one key-value pair in an object. It stores a pointer and length for the key, both referring into the original JSON string, and a pointer to the value.

This design allows efficient traversal and manipulation of JSON trees, supports deep nesting, and preserves the original JSON text for any value.
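The layout described above can be sketched in C as follows. This is a minimal illustration reconstructed from the class diagram later in this document, not the actual contents of json.h: the enum constant names, the exact definition of the reference type (assumed here to be a pointer/length view into the source text), the union member name u, and the object_get helper are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type tags; the real enum in json.h may use other names. */
typedef enum {
    JSON_NULL, JSON_BOOL, JSON_NUMBER, JSON_STRING, JSON_ARRAY, JSON_OBJECT
} json_type;

/* Assumed shape of "reference": a view into the original JSON text. */
typedef struct {
    const char *ptr;
    size_t len;
} reference;

typedef struct json_value json_value;

/* One key-value pair; the key is a pointer/length view, not a copy. */
typedef struct {
    const char *ptr;
    size_t len;
    json_value *value;
} json_object;

/* Tagged union: `type` says which member of `u` is valid. */
struct json_value {
    json_type type;
    union {
        reference string;
        reference boolean;
        reference number;
        struct { json_value *items; size_t count; size_t capacity; } array;
        struct { json_object *items; size_t count; size_t capacity; } object;
    } u;
};

/* Hypothetical helper: linear key lookup in an object node.
   Returns NULL if obj is not an object or the key is absent. */
static const json_value *object_get(const json_value *obj, const char *key) {
    assert(key != NULL);
    if (obj == NULL || obj->type != JSON_OBJECT) return NULL;
    size_t klen = strlen(key);
    for (size_t i = 0; i < obj->u.object.count; i++) {
        const json_object *pair = &obj->u.object.items[i];
        if (pair->len == klen && memcmp(pair->ptr, key, klen) == 0)
            return pair->value;
    }
    return NULL;
}
```

Because keys are stored as pointer/length pairs rather than NUL-terminated copies, the lookup compares lengths first and then uses memcmp instead of strcmp.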

Recursive JSON Parsing

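Parsing is implemented as a recursive descent parser that operates directly on the input string: each parsing function consumes the characters of one value and calls back into the value parser for nested arrays and objects.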

The parser builds an in-memory tree of json_value objects, allocating new nodes dynamically and linking nested structures appropriately.

If the parser encounters invalid syntax or an unexpected character at any point, it aborts, frees the memory allocated so far, and returns NULL, so callers never receive a partial or corrupt tree.
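The abort-and-free behavior can be illustrated with a simplified node type. The sketch below is not the library's code: node, node_clear, node_push, and build_or_abort are hypothetical names, and the "syntax error" is simulated by a fail_at index. It only shows the pattern of releasing every child built so far before returning NULL.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { NODE_SCALAR, NODE_ARRAY } node_type;

/* Simplified stand-in for json_value: an array stores its children inline. */
typedef struct node {
    node_type type;
    struct node *items;   /* used only when type == NODE_ARRAY */
    size_t count;
    size_t capacity;
} node;

/* Release everything owned by n, but not n itself. */
static void node_clear(node *n) {
    if (n == NULL) return;
    if (n->type == NODE_ARRAY) {
        for (size_t i = 0; i < n->count; i++)
            node_clear(&n->items[i]);
        free(n->items);
    }
    n->type = NODE_SCALAR;
    n->items = NULL;
    n->count = 0;
    n->capacity = 0;
}

/* Append a child, doubling capacity as needed. Returns 0 on success,
   -1 if the allocation fails (the array is left unchanged). */
static int node_push(node *arr, node child) {
    assert(arr != NULL && arr->type == NODE_ARRAY);
    if (arr->count == arr->capacity) {
        size_t cap = arr->capacity ? arr->capacity * 2 : 4;
        node *grown = realloc(arr->items, cap * sizeof *grown);
        if (grown == NULL) return -1;
        arr->items = grown;
        arr->capacity = cap;
    }
    arr->items[arr->count++] = child;
    return 0;
}

/* Build an array of n scalar children. An error is simulated at index
   fail_at; on any failure the partial tree is freed and NULL returned. */
static node *build_or_abort(size_t n, size_t fail_at) {
    node *root = calloc(1, sizeof *root);
    if (root == NULL) return NULL;
    root->type = NODE_ARRAY;
    for (size_t i = 0; i < n; i++) {
        node child = { NODE_SCALAR, NULL, 0, 0 };
        if (i == fail_at || node_push(root, child) != 0) {
            node_clear(root);   /* free all children built so far */
            free(root);
            return NULL;
        }
    }
    return root;
}
```

Keeping cleanup in a single recursive routine means every error path can call it once on the partially built root instead of tracking individual allocations.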

Memory Management and Ownership

Key Functionalities and Workflow

Parsing Workflow

  1. Entry Point: json_parse(const char *json) is called with the JSON text input.

  2. Whitespace Skipping: Leading whitespace is skipped to find the first significant token.

  3. Value Parsing: parse_value_build recursively parses the root JSON value based on the leading character.

  4. Type-Specific Parsing:

    • Strings: parse_string_value handles quoted strings with escape sequences.

    • Numbers: parse_number_value parses numeric literals using strtod.

    • Arrays: parse_array_value parses comma-separated values within square brackets.

    • Objects: parse_object_value parses key-value pairs within curly braces.

    • Literals: match_literal_build matches null, true, and false.

  5. Validation: After parsing, trailing whitespace is skipped, and the parser ensures no trailing data exists.

  6. Return: The root json_value tree is returned, or NULL on failure.
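The workflow above can be sketched as a small validator. This is an illustrative reduction, not the module's implementation: it checks syntax without building a json_value tree, it omits strings and objects, and the names json_is_valid, skip_ws, parse_value, parse_array, and match_literal are hypothetical stand-ins for json_parse, parse_value_build, and the type-specific functions. The nesting limit of 64 is likewise an arbitrary choice for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Step 2: skip JSON whitespace. */
static const char *skip_ws(const char *p) {
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') p++;
    return p;
}

/* Match a literal such as "null"; returns the position after it or NULL. */
static const char *match_literal(const char *p, const char *lit) {
    size_t n = strlen(lit);
    return strncmp(p, lit, n) == 0 ? p + n : NULL;
}

static const char *parse_value(const char *p, int depth);

/* Comma-separated values inside square brackets; p points at '['. */
static const char *parse_array(const char *p, int depth) {
    p = skip_ws(p + 1);
    if (*p == ']') return p + 1;          /* empty array */
    for (;;) {
        p = parse_value(p, depth + 1);    /* recursion for nested values */
        if (p == NULL) return NULL;
        p = skip_ws(p);
        if (*p == ']') return p + 1;
        if (*p != ',') return NULL;
        p = skip_ws(p + 1);
    }
}

/* Steps 3 and 4: dispatch on the leading character. Returns the
   position just past the value, or NULL on a syntax error. */
static const char *parse_value(const char *p, int depth) {
    if (depth > 64) return NULL;          /* arbitrary limit for the sketch */
    switch (*p) {
    case 'n': return match_literal(p, "null");
    case 't': return match_literal(p, "true");
    case 'f': return match_literal(p, "false");
    case '[': return parse_array(p, depth);
    default:
        if (*p == '-' || isdigit((unsigned char)*p)) {
            /* strtod accepts some forms JSON forbids (for example hex);
               a stricter parser would check the grammar first. */
            char *end;
            (void)strtod(p, &end);
            return end != p ? end : NULL;
        }
        return NULL;                      /* strings and objects omitted */
    }
}

/* Steps 1, 5 and 6: entry point plus the trailing-data check. */
static bool json_is_valid(const char *json) {
    assert(json != NULL);
    const char *p = parse_value(skip_ws(json), 0);
    if (p == NULL) return false;
    return *skip_ws(p) == '\0';
}
```

Each function returns the position just past what it consumed, which lets the caller continue scanning without a separate cursor object and makes the final end-of-input check a single comparison.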

Access and Manipulation

Serialization

While serialization is primarily covered in the JSON Serialization and Testing topic, this module provides functions to print JSON values back to text, preserving formatting and escaping.
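As an illustration of the escaping that printing a string value involves, here is a minimal sketch. It is not the module's printing code: escape_json_string is a hypothetical helper, and it assumes the input is an unescaped, NUL-terminated C string, which may differ from how the library stores string values.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write s as a quoted JSON string into out (capacity cap bytes).
   Returns the number of bytes written, excluding the terminating NUL,
   or -1 if the result does not fit. */
static int escape_json_string(const char *s, char *out, size_t cap) {
    assert(s != NULL && out != NULL);
    size_t n = 0;
    if (cap < 3) return -1;               /* room for two quotes and NUL */
    out[n++] = '"';
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        char buf[16];
        switch (c) {
        case '"':  strcpy(buf, "\\\""); break;
        case '\\': strcpy(buf, "\\\\"); break;
        case '\n': strcpy(buf, "\\n");  break;
        case '\r': strcpy(buf, "\\r");  break;
        case '\t': strcpy(buf, "\\t");  break;
        default:
            if (c < 0x20) {
                /* remaining control characters use the \uXXXX form */
                snprintf(buf, sizeof buf, "\\u%04x", (unsigned)c);
            } else {
                buf[0] = (char)c;
                buf[1] = '\0';
            }
        }
        size_t len = strlen(buf);
        if (n + len + 2 > cap) return -1; /* keep room for quote and NUL */
        memcpy(out + n, buf, len);
        n += len;
    }
    out[n++] = '"';
    out[n] = '\0';
    return (int)n;
}
```

Bytes at or above 0x20 other than the quote and backslash are copied through unchanged, so UTF-8 input passes through as-is.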

Interaction with Other Modules

Illustration of the JSON Parsing Process

flowchart TD
A[JSON Text Input] --> B[Skip Whitespace]
B --> C{Next Token?}
C -->|"String: opening quote"| D["Parse String (parse_string_value)"]
C -->|"Number: digit"| E["Parse Number (parse_number_value)"]
C -->|"Array: ["| F["Parse Array (parse_array_value)"]
C -->|"Object: {"| G["Parse Object (parse_object_value)"]
C -->|"Literal: null, true, false"| H["Match Literal (match_literal_build)"]
D --> I[Return json_value]
E --> I
F --> I
G --> I
H --> I
I --> J[Skip Trailing Whitespace]
J --> K{End of Input?}
K -->|Yes| L[Return Parsed Tree]
K -->|No| M[Parsing Error: Unexpected Data]
M --> N[Free Allocated Memory]

Overview of In-Memory JSON Data Structures

classDiagram
class json_value {
+json_type type
+union {
reference string
reference boolean
reference number
array {
json_value* items
size_t count
size_t capacity
}
object {
json_object* items
size_t count
size_t capacity
}
}
}
class json_object {
+const char* ptr
+size_t len
+json_value* value
}
json_value "1" o-- "*" json_object : contains
json_value "1" o-- "*" json_value : array items

Important Concepts

Summary of Main Functions (in src/json.c)

This module forms the essential first step in the full JSON processing pipeline, enabling all subsequent manipulation, comparison, and serialization operations by providing a robust, memory-efficient in-memory JSON representation. For details on how to manipulate and compare these structures, see the JSON Manipulation and Comparison topic; for serialization, see the JSON Serialization and Testing topic.