Recursive JSON Parsing
Purpose
Within the broader scope of JSON Parsing and Representation, Recursive JSON Parsing addresses the challenge of efficiently converting JSON text into a structured, in-memory tree representation by leveraging recursion. JSON’s nested nature—with arrays containing objects, objects containing arrays, and so forth—requires a parsing approach capable of handling arbitrarily deep structures while maintaining correctness and performance.
This subtopic focuses specifically on implementing recursive descent parsing techniques that systematically process JSON strings, numbers, arrays, and objects. It ensures proper state management during parsing, particularly for complex constructs like escaped characters in strings and nested containers. The approach enables the core parser to build accurate data structures while detecting syntax errors and handling whitespace effectively.
Functionality
The recursive parsing workflow is initiated by the public API function json_parse(), which prepares the input and calls the core recursive parser function parse_value_build(). This function dispatches parsing to specialized routines depending on the JSON token encountered:
Strings: Handled by
parse_string_value(), which processes characters inside double quotes, managing escape sequences and Unicode escapes using a finite state machine.Numbers: Parsed by
parse_number_value()using standard C library conversion (strtod) to validate numeric formats.Arrays: Parsed by
parse_array_value(), which recursively invokesparse_value_build()for each element, managing commas and termination brackets.Objects: Parsed by
parse_object_value(), which recursively parses string keys and corresponding values, managing colon separators, commas, and closing braces.
Each parsing function maintains its own state and position pointer within the input string. They invoke each other recursively as nested structures are encountered. Whitespace skipping is performed at appropriate points to ensure robustness against formatted JSON input.
Key aspects of the implementation include:
State Machine for Strings: The parsing of strings uses explicit states (
STATE_INITIAL,STATE_ESCAPE_START,STATE_ESCAPE_UNICODE_BYTE1, etc.) to correctly interpret escape sequences and validate Unicode hex digits.Memory Management: Each parsed element allocates a
json_valuestruct, often usingcalloc(). Ownership of parsed keys and values is carefully managed to avoid leaks or double-frees, especially in objects.Error Handling: If any parsing step fails (e.g., invalid escape, unexpected token, memory allocation failure), the recursive calls unwind by freeing allocated memory and returning
NULLto signal failure.Recursion Depth Tracking: The
idparameter passed through recursive calls can be used for debugging or limiting recursion depth, though it is incremented without explicit limits in this code.
This recursive approach contrasts with iterative or lexer-driven parsers by directly mirroring the JSON grammar in function calls, which simplifies the logic and fits naturally with JSON’s nested structure.
Code Snippet Illustrating Recursive Dispatch
static json_value *parse_value_build(const char **s, int id) {
skip_ws(s);
if (**s == '"')
return parse_string_value(s);
if (**s == '{')
return parse_object_value(s, ++id);
if (**s == '[')
return parse_array_value(s, ++id);
if (**s == 'n') {
if (match_literal_build(s, "null"))
return json_new_null();
return NULL;
}
if (**s == 't') {
const char *ptr = *s;
if (match_literal_build(s, "true"))
return json_new_boolean(ptr, 4);
return NULL;
}
if (**s == 'f') {
const char *ptr = *s;
if (match_literal_build(s, "false"))
return json_new_boolean(ptr, 5);
return NULL;
}
if (**s == '-' || isdigit((unsigned char)**s))
return parse_number_value(s);
return NULL;
}
Integration
Recursive JSON Parsing is a foundational subtopic within JSON Parsing and Representation, providing the low-level mechanics behind the construction of the in-memory JSON value trees. It complements the In-Memory JSON Structure subtopic by producing the structured data that those data structures represent.
The parsed json_value tree generated by this recursive parsing process is subsequently used by other modules:
JSON Manipulation and Comparison (80082): Manipulates and compares these parsed trees.
JSON Serialization and Testing (80083): Serializes the trees back into JSON text and validates parser behavior.
Thus, recursive parsing acts as the entry point converting raw JSON text into the internal format that all other functionalities operate on safely and efficiently.
Diagram
flowchart TD
Start[Start Parsing] --> SkipWS[Skip Whitespace]
SkipWS --> CheckToken{Next Char}
CheckToken -->|'s'| ParseString[Parse String]
CheckToken -->|'{'| ParseObject[Parse Object]
CheckToken -->|'\['| ParseArray[Parse Array]
CheckToken -->|'n'| MatchNull[Match "null"]
CheckToken -->|'t'| MatchTrue[Match "true"]
CheckToken -->|'f'| MatchFalse[Match "false"]
CheckToken -->|'-' or digit| ParseNumber[Parse Number]
CheckToken -->|Other| Fail[Parse Fail]
ParseString --> ReturnValue[Return json_value]
ParseObject --> ParseObjectLoop[Parse Key-Value Pairs Recursively]
ParseObjectLoop --> ReturnValue
ParseArray --> ParseArrayLoop[Parse Elements Recursively]
ParseArrayLoop --> ReturnValue
MatchNull --> ReturnNull[Return Null Value]
MatchTrue --> ReturnTrue[Return Boolean True]
MatchFalse --> ReturnFalse[Return Boolean False]
ParseNumber --> ReturnNumber[Return Number Value]
ReturnNull --> End[End Parsing]
ReturnTrue --> End
ReturnFalse --> End
ReturnNumber --> End
ReturnValue --> End
Fail --> End
This flowchart illustrates the recursive descent decision process and recursive calls that build the JSON value tree from the input string.