Benchmarking and Performance Testing

Overview

This module provides a set of scripts for measuring the performance and correctness of JSON serialization and deserialization across multiple JSON libraries. The goal is to benchmark speed, memory usage, and correctness in order to validate the efficiency gains of orjson, the Rust-backed JSON library, over standard Python JSON libraries.

The benchmarking framework focuses on realistic scenarios using large JSON fixtures, varying data complexities, and multiple libraries to establish performance baselines and relative comparisons.


Core Concepts and Purpose

This framework verifies that orjson is not only faster but also remains correct and keeps memory usage reasonable under realistic loads.


Benchmarking Components and Workflows

The benchmarking module is organized into several key scripts and utilities, each targeting specific aspects of performance testing:

1. Serialization Benchmarks (bench/benchmark_dumps.py)

Example snippet illustrating the test structure:

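# `libraries` and `fixtures` come from bench/data.py; `read_fixture_obj` from bench/util.py.
# `json_loads` is assumed to be the standard library's json.loads, used as the reference parser.
@pytest.mark.parametrize("library", libraries)
@pytest.mark.parametrize("fixture", fixtures)
def test_dumps(benchmark, fixture, library):
    dumper, loader = libraries[library]
    data = read_fixture_obj(f"{fixture}.xz")
    # Record whether the serialized output parses back to the original object.
    benchmark.extra_info["correct"] = json_loads(dumper(data)) == data
    # Time only the serialization call.
    benchmark(dumper, data)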
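
To make the structure concrete without requiring pytest-benchmark, the sketch below times a serializer with the standard library's `timeit` as a stand-in for the `benchmark` fixture. The `LIBRARIES` mapping and the `time_dumps` helper are hypothetical names chosen for illustration. Only the standard `json` module is registered, and the project's actual table in bench/data.py may be shaped differently.

```python
import json
import timeit

# Hypothetical registry: name -> (dumper, loader), mirroring how the
# test above unpacks `libraries[library]`. Only stdlib json is listed.
LIBRARIES = {
    "json": (lambda obj: json.dumps(obj).encode("utf-8"), json.loads),
}


def time_dumps(library: str, data, number: int = 100, repeat: int = 5) -> dict:
    """Time `dumper(data)` and record whether its output round-trips."""
    dumper, _loader = LIBRARIES[library]
    # Correctness is checked once, outside the timed region.
    correct = json.loads(dumper(data)) == data
    # Taking the best of several runs reduces noise from other processes.
    best = min(timeit.repeat(lambda: dumper(data), number=number, repeat=repeat))
    return {"lib": library, "correct": correct, "seconds_per_call": best / number}


sample = {"id": 1, "tags": ["a", "b"], "nested": {"ok": True, "value": 1.5}}
result = time_dumps("json", sample)
print(result)
```

Keeping the correctness check outside the timed call means the reported time reflects serialization alone.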

2. Deserialization Benchmarks (bench/benchmark_loads.py)

Key workflow snippet:

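@pytest.mark.parametrize("fixture", fixtures)
@pytest.mark.parametrize("library", libraries)
def test_loads(benchmark, fixture, library):
    dumper, loader = libraries[library]
    # Raw fixture contents, decompressed from the .xz file.
    data = read_fixture(f"{fixture}.xz")
    # Round-trip through the library under test, then compare against the reference parser.
    correct = json_loads(dumper(loader(data))) == json_loads(data)
    benchmark.extra_info["correct"] = correct
    # Time only the deserialization call.
    benchmark(loader, data)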
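
The round-trip comparison in this test can be isolated as a small helper. The following is a minimal sketch, assuming the standard `json` module as the reference parser. `loads_is_correct` and `lossy_loader` are hypothetical names, and the faulty loader exists only to show the check failing.

```python
import json


def loads_is_correct(dumper, loader, raw: bytes) -> bool:
    """Compare a loader's result against the stdlib parser via a round trip."""
    # Parse with the library under test, re-serialize, then parse the
    # result with the reference parser so both sides are plain objects.
    round_tripped = json.loads(dumper(loader(raw)))
    return round_tripped == json.loads(raw)


def lossy_loader(raw: bytes):
    """Hypothetical faulty loader that drops every top-level key named 'b'."""
    obj = json.loads(raw)
    return {k: v for k, v in obj.items() if k != "b"}


raw = b'{"a": [1, 2, 3], "b": {"c": null}}'
print(loads_is_correct(json.dumps, json.loads, raw))    # True
print(loads_is_correct(json.dumps, lossy_loader, raw))  # False
```

Comparing parsed objects rather than raw bytes keeps the check independent of whitespace and key-ordering differences between serializers.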

3. Empty JSON Benchmark (bench/benchmark_empty.py)


4. Memory Usage Profiling (bench/run_mem)

Illustrative excerpt:

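# `proc` is a process handle (assumed here to be psutil.Process());
# `loads`, `fixture`, and `correct` are defined elsewhere in the script.
mem_before = proc.memory_info().rss

# Deserialize the same fixture repeatedly to expose growth in RSS.
for _ in range(100):
    val = loads(fixture)

mem_after = proc.memory_info().rss
mem_diff = mem_after - mem_before
# Emit baseline RSS, growth, and the correctness flag as one CSV row.
print(f"{mem_before},{mem_diff},{correct}")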
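
The same before/after pattern can be tried with the standard library alone. The sketch below swaps in `tracemalloc` for the process-level RSS reading, so it measures memory allocated through Python's allocator rather than resident set size, and the numbers are not comparable to those printed by the script. `measure_loads_growth` is a hypothetical helper name.

```python
import gc
import json
import tracemalloc


def measure_loads_growth(loads, fixture: bytes, iterations: int = 100):
    """Return (bytes_before, bytes_growth) of traced memory around repeated loads."""
    gc.collect()
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        val = None
        for _ in range(iterations):
            # Rebinding `val` releases the previous result, so growth that
            # scales with the iteration count would point at a leak.
            val = loads(fixture)
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return before, after - before


fixture = json.dumps([{"id": i, "name": f"item-{i}"} for i in range(1000)]).encode()
before, growth = measure_loads_growth(json.loads, fixture)
correct = int(json.loads(fixture)[0] == {"id": 0, "name": "item-0"})
print(f"{before},{growth},{correct}")
```

The final result is still referenced when the second reading is taken, so some growth is expected even without a leak.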

5. Custom Serialization Benchmark (bench/run_default)


6. Utility Functions (bench/util.py)

Example:

from functools import cache
from typing import Any
import orjson

@cache
def read_fixture_obj(filename: str) -> Any:
    return orjson.loads(read_fixture(filename))
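
For readers who want to try the caching pattern in isolation, here is a minimal, self-contained sketch. It assumes fixtures are xz-compressed files (as the `.xz` suffix in the tests suggests), uses the standard `json` module in place of orjson, and writes a tiny fixture into a temporary directory. `FIXTURE_DIR` and the body of `read_fixture` are assumptions, not the project's code.

```python
import json
import lzma
import tempfile
from functools import cache
from pathlib import Path
from typing import Any

# Hypothetical fixture directory; a temporary one keeps the sketch runnable.
FIXTURE_DIR = Path(tempfile.mkdtemp())


@cache
def read_fixture(filename: str) -> bytes:
    """Decompress an .xz fixture once and keep the bytes in memory."""
    with lzma.open(FIXTURE_DIR / filename, "rb") as fileh:
        return fileh.read()


@cache
def read_fixture_obj(filename: str) -> Any:
    """Parse the fixture once; stdlib json stands in for orjson here."""
    return json.loads(read_fixture(filename))


# Create a tiny compressed fixture to exercise the helpers.
with lzma.open(FIXTURE_DIR / "sample.json.xz", "wb") as fileh:
    fileh.write(b'{"users": [{"id": 1}, {"id": 2}]}')

first = read_fixture_obj("sample.json.xz")
second = read_fixture_obj("sample.json.xz")
print(first is second)  # True: the second call is served from the cache
```

Because the cache hands back the same object on every call, benchmarks that share it should treat the parsed fixture as read-only.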

7. Library and Fixture Definitions (bench/data.py)


8. Additional Analytical and Benchmarking Scripts (script/ directory)

For example, `script/pydataclass` benchmarks serialization of dataclass-based objects versus dictionaries and reports timing comparisons between libraries.
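
A comparison of this kind might look like the sketch below. It is not the script itself: the `Member` record type is hypothetical, the standard `json` module is the only library timed, and dataclass instances are converted through a `default` hook because stdlib `json` does not serialize them directly.

```python
import dataclasses
import json
import timeit


@dataclasses.dataclass
class Member:  # hypothetical record type for illustration
    id: int
    active: bool
    name: str


members = [Member(i, i % 2 == 0, f"user-{i}") for i in range(1000)]
as_dicts = [dataclasses.asdict(m) for m in members]


def default(obj):
    """Fallback hook: convert dataclass instances for stdlib json."""
    if dataclasses.is_dataclass(obj):
        return dataclasses.asdict(obj)
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def dump_dataclasses() -> str:
    return json.dumps(members, default=default)


def dump_dicts() -> str:
    return json.dumps(as_dicts)


# Best-of-three timings, ten calls per run.
t_dataclass = min(timeit.repeat(dump_dataclasses, number=10, repeat=3))
t_dict = min(timeit.repeat(dump_dicts, number=10, repeat=3))
print(f"dataclass: {t_dataclass:.4f}s  dict: {t_dict:.4f}s")
```

Both paths produce the same JSON text, so any timing difference comes from the conversion step rather than from the output.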


Module Interactions and Relationships

The benchmarking module exercises the core serialization and deserialization functions exposed through the Python API, which call into the Rust implementation of the JSON operations.


Design Patterns and Approaches


Mermaid Diagram: Benchmarking Workflow Sequence

sequenceDiagram
    participant User as Developer/User
    participant Benchmarks as Benchmark Scripts
    participant Fixtures as JSON Fixture Data
    participant Libraries as JSON Libraries (orjson, json)
    participant Profiler as Memory & CPU Profiling

    User->>Benchmarks: Initiate benchmark run (e.g., pytest, run_func)
    Benchmarks->>Fixtures: Load compressed JSON fixture
    Fixtures-->>Benchmarks: Provide JSON data (bytes or object)
    Benchmarks->>Libraries: Call library dump/load functions
    Libraries-->>Benchmarks: Return serialized or deserialized data
    Benchmarks->>Profiler: Measure time & memory usage
    Profiler-->>Benchmarks: Return profiling results
    Benchmarks-->>User: Report benchmark metrics and correctness

Together, these scripts provide a systematic, repeatable way to validate orjson’s performance and correctness gains relative to other JSON libraries.