Serialization Benchmarks

Purpose

Serialization Benchmarks evaluate and compare the performance of JSON encoding (serialization) across multiple JSON libraries, including the project’s core Rust-backed implementation. They measure serialization speed and correctness when converting Python objects into JSON byte streams. The results expose performance advantages, bottlenecks, and behavior under different data and option scenarios, which guides optimization of the project’s JSON serialization component.

Functionality

The serialization benchmark suite comprises a collection of scripts and tests. Key workflows include:

  1. Benchmark Invocation: Using pytest with parameterized inputs to iterate over JSON fixtures and libraries systematically.

  2. Fixture Loading: Reading compressed JSON files (e.g., .xz) into in-memory Python objects using utility functions like read_fixture_obj.

  3. Serialization Timing: Using Python’s timeit or pytest-benchmark to measure serialization latency over many iterations, so that the reported statistics are stable rather than dominated by one-off noise.

  4. Correctness Verification: Ensuring that serialized outputs, when parsed back, match the original data, thus confirming functional integrity alongside speed.

  5. Option Variants: Testing serialization with different options such as sorted keys, indentation, or special flags (e.g., OPT_SERIALIZE_DATACLASS, OPT_NON_STR_KEYS) to evaluate overheads or gains.
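The workflows above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: `load_xz_json` is a hypothetical stand-in for `read_fixture_obj`, the fixture is generated on the fly, and `json.dumps` keyword arguments stand in for library-specific option flags.

```python
import json
import lzma
import tempfile
import timeit
from pathlib import Path


def load_xz_json(path):
    """Decompress an .xz file and parse its contents as JSON (hypothetical helper)."""
    with lzma.open(path, "rb") as f:
        return json.loads(f.read())


# Build a tiny throwaway fixture so the sketch is self-contained.
sample = {"users": [{"id": i, "name": f"user{i}", "tags": ["a", "b"]} for i in range(200)]}
fixture_path = Path(tempfile.mkdtemp()) / "sample.json.xz"
with lzma.open(fixture_path, "wb") as f:
    f.write(json.dumps(sample).encode("utf-8"))

data = load_xz_json(fixture_path)

# Option variants, expressed here as keyword arguments to json.dumps.
variants = {
    "default": {},
    "sorted_keys": {"sort_keys": True},
    "indent_2": {"indent": 2},
}

results = {}
for name, kwargs in variants.items():
    encoded = json.dumps(data, **kwargs).encode("utf-8")
    correct = json.loads(encoded) == data  # round-trip correctness check
    # Best of 3 repeats of 50 calls each, reported as seconds per call.
    best = min(timeit.repeat(lambda: json.dumps(data, **kwargs), number=50, repeat=3)) / 50
    results[name] = {"correct": correct, "seconds_per_call": best}

for name, row in results.items():
    print(f"{name:12s} correct={row['correct']} {row['seconds_per_call'] * 1e6:.1f} µs/call")
```

Taking the minimum over several repeats is one common way to reduce the influence of background noise on a single machine; the real suite may summarize timings differently.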

An example snippet from the core benchmark test illustrates the integration:

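from json import loads as json_loads  # stdlib parser used as the correctness reference

import pytest

# `libraries`, `fixtures`, and `read_fixture_obj` are provided by the benchmark suite's helper modules.
@pytest.mark.parametrize("library", libraries)
@pytest.mark.parametrize("fixture", fixtures)
def test_dumps(benchmark, fixture, library):
    dumper, loader = libraries[library]
    data = read_fixture_obj(f"{fixture}.xz")
    benchmark.extra_info["correct"] = json_loads(dumper(data)) == data  # correctness check
    benchmark(dumper, data)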

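Here, the `benchmark` fixture from pytest-benchmark calls the serialization function `dumper` repeatedly on the loaded data and records timing statistics. The round-trip comparison runs once, outside the timed calls, and its result is stored in `benchmark.extra_info` so that it is saved with the benchmark's results.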
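To make "runs repeatedly and records statistics" concrete, the following is a simplified stand-in for what a benchmark fixture does. It is an assumption-laden sketch: `run_benchmark` is a hypothetical helper, the round and iteration counts are fixed by hand, and pytest-benchmark itself calibrates these values and reports more fields.

```python
import json
import statistics
import time


def run_benchmark(func, *args, rounds=20, iterations=10):
    """Call func(*args) repeatedly and summarize per-call latency in seconds."""
    per_call = []
    for _ in range(rounds):
        start = time.perf_counter()
        for _ in range(iterations):
            func(*args)
        elapsed = time.perf_counter() - start
        per_call.append(elapsed / iterations)  # average latency within this round
    return {
        "rounds": rounds,
        "min": min(per_call),
        "mean": statistics.mean(per_call),
        "median": statistics.median(per_call),
        "stddev": statistics.stdev(per_call),
    }


data = {"values": list(range(1000)), "label": "example"}
stats = run_benchmark(json.dumps, data)
print({k: (f"{v:.2e}" if isinstance(v, float) else v) for k, v in stats.items()})
```

Grouping calls into rounds averages out timer overhead for fast functions, while the spread across rounds gives a rough sense of measurement noise.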

Integration

Serialization Benchmarks complement the broader benchmarking framework by providing focused insights on JSON encoding efficiency, distinct from deserialization or utility benchmarks.

These benchmarks act as a feedback loop for optimization decisions in the serialization core, helping ensure that changes maintain or improve throughput without sacrificing correctness or flexibility.

Diagram

flowchart TD
    A["Load JSON Fixture (.xz)"] --> B[Deserialize to Python Object]
    B --> C{Select Library}
    C -->|orjson| D[orjson.dumps with Options]
    C -->|json| E[json.dumps with Options]
    C -->|Other| F[Other Library's dumps]
    D --> G[Verify Correctness by json.loads]
    E --> G
    F --> G
    G --> H[Measure Time & Collect Metrics]
    H --> I[Aggregate Results & Compare]
    I --> J[Output Benchmark Report]

This flowchart captures the core process of serialization benchmarking: loading data, serializing with various libraries, verifying correctness, timing the operations, and compiling comparative results.
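The final steps of the flowchart (measure, aggregate, compare, report) might look like the sketch below. It assumes a hypothetical `dumpers` registry and uses two standard-library variants as stand-ins for distinct libraries, so the numbers it prints say nothing about any real library's performance.

```python
import json
import timeit


def compact_dumps(obj):
    # Stand-in for a second library: stdlib json with compact separators.
    return json.dumps(obj, separators=(",", ":"))


# Hypothetical registry mapping a library name to its dumps function.
dumpers = {"json": json.dumps, "json_compact": compact_dumps}


def compare(dumpers, data, number=100, repeat=3):
    """Return rows sorted fastest-first, each with a ratio to the fastest."""
    rows = []
    for name, dumper in dumpers.items():
        correct = json.loads(dumper(data)) == data  # verify before timing
        best = min(timeit.repeat(lambda: dumper(data), number=number, repeat=repeat)) / number
        rows.append({"library": name, "correct": correct, "seconds": best})
    fastest = min(row["seconds"] for row in rows)
    for row in rows:
        row["vs_fastest"] = row["seconds"] / fastest
    return sorted(rows, key=lambda row: row["seconds"])


data = [{"id": i, "score": i * 0.5, "active": i % 2 == 0} for i in range(500)]
report = compare(dumpers, data)
for row in report:
    print(f"{row['library']:14s} {row['seconds'] * 1e6:9.1f} µs  x{row['vs_fastest']:.2f}  correct={row['correct']}")
```

Reporting each entry as a ratio to the fastest one keeps the comparison readable across fixtures of very different sizes.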