Deserialization Benchmarks

Purpose

Deserialization Benchmarks measure the speed and correctness of parsing JSON into Python objects. They quantify how efficiently different libraries convert raw JSON text or bytes into in-memory Python structures. They also check parser robustness against a wide range of inputs, including edge cases and malformed documents, to confirm that each parser accepts valid JSON, rejects invalid JSON, and produces the expected result.

This subtopic complements the main benchmarking topic by concentrating exclusively on the "load" (parsing) operation rather than on serialization. It covers performance metrics, namely throughput and memory usage, together with correctness validation. Both matter for applications that consume large or complex JSON payloads.

Functionality

The deserialization benchmarking workflow involves:

Fixture Loading
Reading JSON test fixtures, often stored as xz-compressed (.xz) files, to provide realistic input data for parsing.
Library Parameterization
Running benchmarks across multiple JSON libraries (e.g., orjson, Python’s built-in json) to compare parsing speed and correctness.
Correctness Verification
Ensuring that the parsed output matches the expected Python object structure by comparing against a canonical parser (usually json.loads).
Performance Measurement
Using the benchmark fixture from the pytest-benchmark plugin to record timing across repeated parsing runs. Memory usage is measured separately by the run_mem script.
Failure Detection
Running correctness tests on both valid and deliberately malformed JSON documents to check that each is accepted or rejected as appropriate, and recording mistaken acceptances and mistaken rejections.
Reporting
Summarizing results in a table that shows which libraries passed or failed on each test fixture, making differences in parsing robustness easy to compare.
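Outside pytest, the Library Parameterization and Performance Measurement steps above can be approximated with the standard library's `timeit` module. This is a sketch, not the project's harness. `LOADERS` and `time_loaders` are hypothetical names. Only the stdlib `json` parser is registered so the example runs without extra packages; an installed parser such as `orjson.loads` could be added as another entry.

```python
import json
import timeit
from typing import Callable, Dict

# Hypothetical registry: library name -> function that parses JSON text/bytes.
# Add third-party parsers here (e.g. "orjson": orjson.loads) if installed.
LOADERS: Dict[str, Callable] = {
    "json": json.loads,
}


def time_loaders(data: bytes, number: int = 100, repeat: int = 5) -> Dict[str, float]:
    """Return the best per-call time in seconds for each registered loader."""
    results = {}
    for name, loader in LOADERS.items():
        # timeit.repeat runs `number` calls per trial, `repeat` trials in total.
        trials = timeit.repeat(lambda: loader(data), number=number, repeat=repeat)
        # Use the fastest trial, divided by the calls per trial.
        results[name] = min(trials) / number
    return results


if __name__ == "__main__":
    doc = json.dumps({"items": list(range(1000)), "name": "example"}).encode()
    for name, seconds in time_loaders(doc).items():
        print(f"{name}: {seconds * 1e6:.1f} µs per parse")
```

Taking the minimum over several trials is a common choice. Slower trials mostly reflect interference from other processes rather than the cost of the parser itself.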

Key Methods and Data Flows

benchmark_loads.py uses pytest.mark.parametrize to iterate over every combination of JSON fixture and library. It reads each fixture with read_fixture() and benchmarks the library's loader function on it.
The run_func script repeatedly deserializes a specified file to measure throughput, optionally pinning CPU affinity for more consistent results.
The run_mem script compares memory consumption before and after many deserialization calls to detect memory overhead.
The pycorrectness script loads an extensive set of JSON fixtures from the data/ directories. For each of several libraries, it tests both that valid JSON is accepted and that invalid JSON is rejected. It then prints a detailed pass/fail table, including mistaken rejections and mistaken acceptances.
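The repeated-run and memory checks described for run_func and run_mem can be sketched with the standard library. This is a minimal sketch under stated assumptions. It uses `tracemalloc` to compare traced Python heap usage before and after many parses, which may differ from what the actual scripts measure (for example, process RSS). `pin_to_cpu` and `memory_growth` are hypothetical helpers. `os.sched_setaffinity` exists only on some platforms (notably Linux), so pinning is skipped elsewhere.

```python
import gc
import json
import os
import tracemalloc
from typing import Callable, Optional


def pin_to_cpu(cpu: Optional[int] = None) -> bool:
    """Pin this process to a single CPU where supported (e.g. Linux)."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # affinity control unavailable on this platform
    if cpu is None:
        cpu = min(os.sched_getaffinity(0))  # lowest CPU this process may use
    try:
        os.sched_setaffinity(0, {cpu})  # pid 0 means "the calling process"
    except OSError:
        return False  # e.g. the CPU is not permitted in this container
    return True


def memory_growth(loader: Callable, data: bytes, iterations: int = 1000) -> int:
    """Bytes of traced memory still held after `iterations` parses."""
    gc.collect()
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        loader(data)  # the parsed result is discarded immediately
    gc.collect()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # A large positive value suggests the parser retains memory (a possible leak).
    return after - before


if __name__ == "__main__":
    pin_to_cpu()
    doc = json.dumps([{"id": i, "tags": ["a", "b"]} for i in range(100)]).encode()
    print("growth in bytes:", memory_growth(json.loads, doc))
```

`tracemalloc` only sees allocations made through Python's memory allocators. Memory that a C extension obtains in other ways is not counted, so a process-level measure such as RSS would be needed for that.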

Example snippet from [benchmark_loads.py](/projects/287/67674) illustrating core benchmark setup:

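```python
from json import loads as json_loads  # canonical parser used as the reference

import pytest

# `fixtures`, `libraries` and `read_fixture` come from the suite's shared
# benchmark utilities; `libraries` maps a name to a (dumper, loader) pair.


@pytest.mark.parametrize("fixture", fixtures)
@pytest.mark.parametrize("library", libraries)
def test_loads(benchmark, fixture, library):
    dumper, loader = libraries[library]
    data = read_fixture(f"{fixture}.xz")  # decompressed fixture contents
    # Round trip: parse, re-serialize, and compare against the stdlib parse.
    correct = json_loads(dumper(loader(data))) == json_loads(data)
    benchmark.extra_info["correct"] = correct  # stored in the benchmark report
    benchmark(loader, data)  # time repeated calls to loader(data)
```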

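This snippet shows how each loader is benchmarked on a decompressed fixture. Correctness is checked by a round trip. The library parses the data and re-serializes it, and the re-serialized output, parsed with json_loads, must equal json_loads applied to the original input. The outcome is stored in benchmark.extra_info["correct"] rather than failing the test, so timing is recorded either way.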
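The snippet relies on `libraries` mapping each library name to a `(dumper, loader)` pair. A minimal standalone sketch of that structure and of the round-trip check follows. The `libraries` contents and the `round_trip_correct` helper are assumptions for illustration. Only stdlib `json` is registered so the example runs without extra packages; an installed `orjson` could be added as `"orjson": (orjson.dumps, orjson.loads)`.

```python
import json
from json import loads as json_loads  # canonical reference parser
from typing import Callable, Dict, Tuple

# Hypothetical mapping: library name -> (dumper, loader).
libraries: Dict[str, Tuple[Callable, Callable]] = {
    "json": (json.dumps, json.loads),
}


def round_trip_correct(library: str, data: bytes) -> bool:
    """Parse with the library, re-serialize, and compare with the stdlib parse."""
    dumper, loader = libraries[library]
    # Comparing parsed objects (not strings) ignores whitespace and key order.
    return json_loads(dumper(loader(data))) == json_loads(data)


if __name__ == "__main__":
    sample = b'{"name": "caf\\u00e9", "values": [1, 2.5, null, true]}'
    for name in libraries:
        print(name, round_trip_correct(name, sample))
```

The comparison is made on Python objects, where `1` and `1.0` compare equal. A loader that returned a float where an int was expected would therefore not be flagged by this check.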

Relationship to Parent Topic and Other Subtopics

Deserialization Benchmarks complement the Serialization Benchmarks within the overarching Benchmarking and Performance Testing topic. The serialization benchmarks measure JSON encoding speed, while this subtopic measures JSON parsing (deserialization).

They reuse the fixture-management and benchmarking infrastructure from the Benchmark Utilities subtopic, in particular fixture reading (`read_fixture`) and benchmark parameterization.
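A fixture reader of the kind described might decompress `.xz` files and cache the result, so that repeated parametrized tests do not pay the decompression cost. The sketch below shows one such reader. The signature, the `fixture_dir` parameter, and the caching are assumptions; the project's actual `read_fixture` may differ.

```python
import lzma
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def read_fixture(filename: str, fixture_dir: str = "data") -> bytes:
    """Return the raw bytes of a fixture, transparently decompressing .xz files.

    Hypothetical sketch: `fixture_dir` and the caching policy are assumptions.
    """
    path = Path(fixture_dir) / filename
    if path.suffix == ".xz":
        # lzma.open reads the .xz container and yields decompressed bytes.
        with lzma.open(path, "rb") as f:
            return f.read()
    return path.read_bytes()
```

Returning `bytes` gives every loader identical input. Both `json.loads` and `orjson.loads` accept `bytes` directly.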

Together with Serialization Benchmarks, they provide a comprehensive view of JSON library performance in both directions, enabling informed decisions on library suitability for different workloads.

The correctness tests in this subtopic also validate behavior on malformed inputs, which the serialization benchmarks do not cover. This adds a robustness check to the project's overall quality assurance.
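The acceptance and rejection checks can be illustrated with a small result table. In this sketch the documents are made-up examples rather than the project's data files. `strict_loads`, `outcome` and `report` are hypothetical helpers. The strict variant uses the real `parse_constant` hook of `json.loads` to reject the non-standard literals `NaN`, `Infinity` and `-Infinity`.

```python
import json
from typing import Callable, Dict, List, Tuple


def _reject_constant(name: str):
    # json.loads calls this for the literals NaN, Infinity and -Infinity.
    raise ValueError(f"non-standard constant: {name}")


def strict_loads(text: str):
    """Hypothetical strict variant of json.loads that rejects NaN/Infinity."""
    return json.loads(text, parse_constant=_reject_constant)


LOADERS: Dict[str, Callable] = {"json": json.loads, "json-strict": strict_loads}

# Illustrative documents: (text, should a spec-compliant parser accept it?)
DOCUMENTS: List[Tuple[str, bool]] = [
    ('{"a": [1, 2, 3]}', True),
    ("-0.5e10", True),
    ("[1, 2,]", False),   # trailing comma
    ("{'a': 1}", False),  # single-quoted key
    ("01", False),        # leading zero
    ("[NaN]", False),     # NaN is not a JSON literal
]


def outcome(loader: Callable, text: str, should_accept: bool) -> str:
    """Classify one parse as ok, a mistaken accept, or a mistaken reject."""
    try:
        loader(text)
        accepted = True
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        accepted = False
    if accepted == should_accept:
        return "ok"
    return "mistaken accept" if accepted else "mistaken reject"


def report() -> str:
    """Plain-text table: one row per document, one column per library."""
    names = list(LOADERS)
    lines = ["document".ljust(20) + "".join(n.ljust(18) for n in names)]
    for text, should_accept in DOCUMENTS:
        cells = [outcome(LOADERS[n], text, should_accept).ljust(18) for n in names]
        lines.append(repr(text).ljust(20) + "".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(report())
```

With the stdlib parser, the `[NaN]` row shows a mistaken accept because `json.loads` accepts `NaN`, `Infinity` and `-Infinity` by default. The orjson documentation states that orjson rejects them.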

Diagram

```mermaid
flowchart TD
    A[Load Compressed JSON Fixture] --> B[Select JSON Library]
    B --> C[Deserialize JSON Data]
    C --> D{Is Output Correct?}
    D -->|Yes| E[Record Success & Measure Time/Memory]
    D -->|No| F[Record Failure]
    E --> G[Repeat for All Fixtures & Libraries]
    F --> G
    G --> H[Generate Benchmark & Correctness Report]
```

This flowchart illustrates the core deserialization benchmark process: loading test data, selecting libraries, parsing, verifying correctness, measuring performance, and reporting results.