pycorrectness
Overview
The **`pycorrectness`** script is a comprehensive JSON parser correctness testing utility. It systematically verifies the ability of multiple JSON parsing libraries—specifically `orjson` and Python's built-in `json`—to correctly accept or reject a diverse collection of JSON fixtures. These fixtures include both valid and invalid JSON documents sourced from standardized test sets.
The script reads test fixtures from compressed and uncompressed files, applies each library's JSON loading function, and records whether the library correctly parses or rejects each document. It distinguishes between expected passes and failures based on filename conventions and whitelists, tabulates results, and reports mistaken acceptance or rejection counts for each library.
The results show whether each parser conforms to the JSON standard and behaves consistently on edge cases, which makes the script useful for catching parsers that are too lenient or too strict.
Detailed Explanation of Components
Constants and Global Variables
dirname
Base directory pointing to the `data` folder containing JSON fixtures.
LIBRARIES
List of JSON libraries under test: `["orjson", "json"]`.
LIBRARY_FUNC_MAP
Maps library names to their respective JSON loading functions for deserialization: `{"orjson": orjson.loads, "json": json.loads}`.
PARSING and JSONCHECKER
Dictionaries mapping filenames to raw bytes of JSON fixtures loaded from the `data/parsing` and `data/jsonchecker` directories respectively. Files ending in `.xz` are decompressed using `lzma`.
RESULTS
A nested dictionary (`defaultdict(dict)`) storing pass/fail test results keyed by fixture filename and library.
MISTAKEN_PASSES, MISTAKEN_FAILS
Dictionaries counting false positives and false negatives per library:
Mistaken passes: valid JSON documents rejected by a library.
Mistaken fails: invalid JSON documents accepted by a library.
PASS_WHITELIST
A tuple of filenames that are exceptions to general pass/fail naming conventions.
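As a rough sketch, the globals described above might be set up along the following lines. This is an illustration only: the fallback to the standard library when `orjson` is not installed, the zero-initialized counters, and the fixture name `y_example.json` are assumptions made so the snippet runs on its own.

```python
import json
from collections import defaultdict

LIBRARIES = ["orjson", "json"]

try:
    import orjson  # third-party; optional in this sketch

    LIBRARY_FUNC_MAP = {"orjson": orjson.loads, "json": json.loads}
except ImportError:
    # Assumption: fall back to the standard library so the sketch still runs.
    LIBRARIES = ["json"]
    LIBRARY_FUNC_MAP = {"json": json.loads}

# RESULTS[filename][libname] holds a bool: did the library behave as expected?
RESULTS = defaultdict(dict)

# One counter per library, assumed to start at zero.
MISTAKEN_PASSES = {name: 0 for name in LIBRARIES}
MISTAKEN_FAILS = {name: 0 for name in LIBRARIES}

# Recording a result needs no explicit creation of the inner dict.
RESULTS["y_example.json"]["json"] = True  # hypothetical fixture name
```

Using `defaultdict(dict)` means the first write for a new filename creates the per-library dictionary automatically.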
Functions
read_fixture_bytes(filename: str, subdir: Optional[str] = None) -> bytes
Reads a fixture file as raw bytes.
Parameters:
filename: Name of the fixture file.
subdir: Optional subdirectory under the main data directory.
Returns:
Raw bytes of the fixture content; decompresses `.xz` files automatically.
Usage:
fixture_bytes = read_fixture_bytes("pass01.json", "parsing")
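A minimal sketch of how such a reader could work is shown below. It is not the script's actual code: the `base` parameter is a hypothetical stand-in for the module-level `dirname`, and the demonstration writes throwaway files to a temporary directory instead of touching real fixtures.

```python
import lzma
import tempfile
from pathlib import Path
from typing import Optional


def read_fixture_bytes(
    filename: str, subdir: Optional[str] = None, base: Path = Path("data")
) -> bytes:
    """Return the fixture contents as bytes, decompressing .xz files."""
    # `base` is a hypothetical parameter replacing the global `dirname`.
    path = base / subdir / filename if subdir else base / filename
    raw = path.read_bytes()
    if path.suffix == ".xz":
        return lzma.decompress(raw)
    return raw


# Demonstration against a temporary directory with made-up fixture names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "parsing").mkdir()
    (root / "parsing" / "plain.json").write_bytes(b"[1, 2]")
    (root / "parsing" / "packed.json.xz").write_bytes(lzma.compress(b'{"a": 1}'))

    plain = read_fixture_bytes("plain.json", "parsing", base=root)
    packed = read_fixture_bytes("packed.json.xz", "parsing", base=root)
```

Callers receive the same kind of value (raw JSON bytes) whether or not the file on disk was compressed.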
test_passed(libname: str, fixture: bytes) -> bool
Checks if the given JSON fixture passes parsing tests with the specified library.
Parameters:
libname: Library name (`"orjson"` or `"json"`).
fixture: Raw bytes of JSON data.
Returns:
`True` if both:
Loading the fixture bytes matches the `orjson.loads` result.
Loading the UTF-8 decoded string matches the `orjson.loads` result.
Otherwise, `False`.
Behavior:
Catches exceptions during parsing and treats them as test failures.
Example:
result = test_passed("json", fixture_bytes)
if result:
    print("Test passed for json library")
test_failed(libname: str, fixture: bytes) -> bool
Checks if the given JSON fixture fails parsing (i.e., is rejected) by the specified library as expected.
Parameters:
libname: Library name (`"orjson"` or `"json"`).
fixture: Raw bytes of JSON data.
Returns:
`True` if parsing fails for both:
The raw bytes input.
The UTF-8 decoded string input.
Otherwise, `False`.
Behavior:
Catches exceptions and treats them as expected failure signals.
Example:
result = test_failed("orjson", invalid_fixture_bytes)
if result:
    print("Invalid JSON correctly rejected by orjson")
should_pass(filename: str) -> bool
Determines if a fixture file is expected to be valid JSON and should parse successfully.
Parameters:
filename: Name of the fixture file.
Returns:
`True` if the filename matches either pass condition:
Starts with `"y_"` or `"pass"`.
Is in the `PASS_WHITELIST`.
Example:
if should_pass("pass01.json"):
    print("This fixture should pass parsing")
should_fail(filename: str) -> bool
Determines if a fixture file is expected to be invalid JSON and should fail parsing.
Parameters:
filename: Name of the fixture file.
Returns:
`True` if the filename matches both fail conditions:
Starts with `"n_"`, `"i_string"`, `"i_object"`, or `"fail"`.
Is NOT in the `PASS_WHITELIST`.
Example:
if should_fail("fail02.json"):
    print("This fixture should fail parsing")
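The two classifiers can be sketched with `str.startswith`, which accepts a tuple of prefixes. The whitelist entry below is a placeholder chosen for illustration; the real contents of `PASS_WHITELIST` are not listed in this document.

```python
# Placeholder entry: the actual whitelist contents are not given here.
PASS_WHITELIST = ("fail_but_valid.json",)


def should_pass(filename: str) -> bool:
    """Valid-by-convention fixtures, plus whitelisted exceptions."""
    return filename.startswith(("y_", "pass")) or filename in PASS_WHITELIST


def should_fail(filename: str) -> bool:
    """Invalid-by-convention fixtures, unless whitelisted."""
    return (
        filename.startswith(("n_", "i_string", "i_object", "fail"))
        and filename not in PASS_WHITELIST
    )
```

Under these rules a whitelisted file such as the placeholder above is treated as valid despite its `fail` prefix, and a file like `i_number_example.json` (a hypothetical name) matches neither function.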
Main Execution Flow
Fixture Loading
Loads all test fixtures from the `data/parsing` and `data/jsonchecker` directories into `PARSING` and `JSONCHECKER`.
Iterate Over Libraries and Fixtures
For each library and each fixture set (`PARSING` and `JSONCHECKER`), iterates through all files:
Valid fixtures (`should_pass`):
Runs `test_passed()` and records the result.
Increments `MISTAKEN_PASSES` if the library unexpectedly rejects the document.
Invalid fixtures (`should_fail`):
Runs `test_failed()` and records the result.
Increments `MISTAKEN_FAILS` if the library unexpectedly accepts the document.
Implementation-defined fixtures (remaining filenames starting with `"i_"`):
Skips testing.
Unknown fixtures:
Raises `NotImplementedError`.
Result Tabulation
Compiles a markdown-formatted table showing each fixture's pass/fail status per library, printed to stdout.
Mistaken Acceptance/Rejection Summary
Prints a summary table showing, per library:
Number of invalid documents erroneously accepted.
Number of valid documents erroneously rejected.
Total Tests Count
Prints the total number of documents tested.
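The flow above can be sketched end to end with a few in-memory fixtures. Everything here is simplified for illustration: only the standard-library `json` module is tested, the reference comparison and whitelist are left out, the fixture names and contents are made up, and counting tested documents as `len(RESULTS)` is an assumption.

```python
import json
from collections import defaultdict

LIBRARY_FUNC_MAP = {"json": json.loads}  # stdlib-only sketch

# Hypothetical fixtures standing in for PARSING and JSONCHECKER.
FIXTURES = {
    "y_array.json": b"[1, 2]",
    "n_trailing_comma.json": b"[1,]",
    "n_nan.json": b"[NaN]",  # invalid JSON that json.loads accepts
    "i_number_huge.json": b"[1e400]",  # implementation-defined: skipped
}


def outcomes(libname, fixture):
    """Return [parsed_as_bytes, parsed_as_str] for one fixture."""
    loads = LIBRARY_FUNC_MAP[libname]
    results = []
    for convert in (bytes, lambda raw: raw.decode("utf-8")):
        try:
            loads(convert(fixture))
            results.append(True)
        except Exception:
            results.append(False)
    return results


RESULTS = defaultdict(dict)
MISTAKEN_PASSES = {name: 0 for name in LIBRARY_FUNC_MAP}
MISTAKEN_FAILS = {name: 0 for name in LIBRARY_FUNC_MAP}

for libname in LIBRARY_FUNC_MAP:
    for filename, fixture in FIXTURES.items():
        if filename.startswith(("y_", "pass")):
            ok = all(outcomes(libname, fixture))
            RESULTS[filename][libname] = ok
            if not ok:
                MISTAKEN_PASSES[libname] += 1  # valid document rejected
        elif filename.startswith(("n_", "i_string", "i_object", "fail")):
            ok = not any(outcomes(libname, fixture))
            RESULTS[filename][libname] = ok
            if not ok:
                MISTAKEN_FAILS[libname] += 1  # invalid document accepted
        elif filename.startswith("i_"):
            continue
        else:
            raise NotImplementedError(filename)

print(f"{len(RESULTS)} documents tested")
```

In this sketch the `NaN` fixture is the only misclassification: `json.loads` accepts it, so `MISTAKEN_FAILS["json"]` ends at 1 while `MISTAKEN_PASSES["json"]` stays at 0.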
Important Implementation Details
Fixture Decoding and Comparison
The correctness tests compare the output of each library's `loads` function to that of `orjson.loads` for the same input. This cross-library comparison ensures a consistent reference point.
Dual Input Forms Tested
Each fixture is tested twice per library:
As raw bytes.
As UTF-8 decoded string.
Filename-Based Test Classification
The script relies heavily on naming conventions (prefixes such as `y_`, `n_`, `pass`, `fail`, `i_`) and a whitelist to determine expected test outcomes.
Handling Compressed Fixtures
Supports `.xz` compressed fixtures transparently using `lzma`.
Results Storage
Uses nested dictionaries for efficient result lookup and aggregation.
Tabular Output
Uses the `tabulate` library to produce human-readable, GitHub-flavored Markdown tables for easy inspection or integration into reports.
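To illustrate how the nested results could be turned into table rows, the sketch below renders a GitHub-style table by hand. The `markdown_table` helper is a standard-library stand-in written for this example and is not the `tabulate` call the script uses; the results, fixture names, and the `Fixture` column header are likewise made up.

```python
def markdown_table(headers, rows):
    """Render rows as a GitHub-style Markdown table (stand-in for tabulate)."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


LIBRARIES = ["orjson", "json"]

# Hypothetical results keyed by fixture filename, then by library.
RESULTS = {
    "n_example.json": {"orjson": True, "json": False},
    "y_example.json": {"orjson": True, "json": True},
}

# One row per fixture: the filename followed by "ok" or "fail" per library.
rows = [
    [filename] + ["ok" if RESULTS[filename][lib] else "fail" for lib in LIBRARIES]
    for filename in sorted(RESULTS)
]

table = markdown_table(["Fixture"] + LIBRARIES, rows)
print(table)
```

Because the rows are built by looking up `RESULTS[filename][lib]` for each library in a fixed order, every row lines up with the header columns.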
Usage Example
Assuming the `data/parsing` and `data/jsonchecker` directories contain JSON fixtures:
$ python pycorrectness
The script will print two markdown tables:
Detailed test results per fixture and library (with "ok" or "fail").
Summary of mistaken accepts/fails per library.
Finally, it prints how many documents were tested in total.
Interaction with Other System Components
Data Fixtures:
Relies on JSON test documents stored under the `data/parsing/` and `data/jsonchecker/` directories.
JSON Libraries Under Test:
Integrates with `orjson` (Rust-backed JSON library) and Python’s built-in `json` module.
Benchmarking Suite:
Complements the benchmarking framework by focusing on correctness validation rather than performance metrics.
Test Automation:
Can be integrated into continuous integration pipelines to detect regressions or inconsistencies in JSON parsing correctness.
Mermaid Class Diagram
classDiagram
class pycorrectness {
- dirname: str
- LIBRARIES: list
- LIBRARY_FUNC_MAP: dict
- PARSING: dict
- JSONCHECKER: dict
- RESULTS: defaultdict(dict)
- MISTAKEN_PASSES: dict
- MISTAKEN_FAILS: dict
- PASS_WHITELIST: tuple
+ read_fixture_bytes(filename, subdir=None) bytes
+ test_passed(libname, fixture) bool
+ test_failed(libname, fixture) bool
+ should_pass(filename) bool
+ should_fail(filename) bool
+ main() void
}
Summary
The **`pycorrectness`** script is a specialized correctness validation tool for JSON parsers. It automates the process of verifying that JSON libraries correctly accept valid JSON and reject invalid JSON across extensive standardized test suites. By systematically comparing results, tabulating outcomes, and highlighting mistaken acceptances or rejections, it serves as a critical quality assurance component within the broader JSON benchmarking and testing ecosystem.