Benchmarking and Performance Testing

Overview

This module provides tools and scripts that measure the speed, memory usage, and correctness of JSON serialization and deserialization across multiple JSON libraries. Its primary goal is to validate the efficiency improvements of orjson, the Rust-backed JSON library, over standard Python JSON libraries.

The benchmarking framework focuses on realistic scenarios using large JSON fixtures, varying data complexities, and multiple libraries to establish performance baselines and relative comparisons.

Core Concepts and Purpose

Performance Measurement: Quantify the speed of encoding (serialization) and decoding (deserialization) JSON data with different libraries.
Memory Profiling: Track memory consumption during repeated serialization/deserialization to evaluate resource efficiency.
Correctness Validation: Ensure that serialized and deserialized data remains consistent and compliant with JSON standards.
Comparative Analysis: Provide benchmarks that directly compare orjson's Rust-backed performance against Python's built-in json and potentially other JSON libraries.
Repeatable and Automated Testing: Use parametrized tests and scripted runs to execute benchmarks reliably on various JSON fixtures.

This framework addresses the problem of verifying that orjson not only performs faster but also maintains correctness and reasonable memory usage under different realistic loads.

Benchmarking Components and Workflows

The benchmarking module is organized into several key scripts and utilities, each targeting specific aspects of performance testing:

1. Serialization Benchmarks (`bench/benchmark_dumps.py`)

Uses pytest and the pytest-benchmark plugin to measure serialization speed.
Iterates over the predefined JSON fixtures (`fixtures`) and libraries (`libraries`).
For each fixture-library pair, it loads the JSON object from a compressed .xz file, serializes it using the library's dumper function, and benchmarks the operation.
Validates serialization correctness by deserializing the output and comparing it to the original data.

Example snippet illustrating the test structure:

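from json import loads as json_loads  # built-in json is the correctness baseline

import pytest

# fixtures and libraries are defined in bench/data.py; read_fixture_obj in bench/util.py

@pytest.mark.parametrize("library", libraries)
@pytest.mark.parametrize("fixture", fixtures)
def test_dumps(benchmark, fixture, library):
    dumper, loader = libraries[library]
    data = read_fixture_obj(f"{fixture}.xz")
    # Record correctness: the serialized output must parse back to the original object
    benchmark.extra_info["correct"] = json_loads(dumper(data)) == data
    benchmark(dumper, data)  # only the dumper call is timed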

2. Deserialization Benchmarks (`bench/benchmark_loads.py`)

Similar to serialization benchmarks but focused on deserialization performance.
Reads compressed JSON fixture files as bytes.
Deserializes using the library’s loader function while benchmarking.
Validates correctness by serializing the loaded object and comparing to the original JSON.

Key workflow snippet:

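# json_loads is the built-in json.loads, used as the correctness baseline
@pytest.mark.parametrize("fixture", fixtures)
@pytest.mark.parametrize("library", libraries)
def test_loads(benchmark, fixture, library):
    dumper, loader = libraries[library]
    data = read_fixture(f"{fixture}.xz")  # fixture contents as bytes
    # Round-trip through the library and compare against the baseline parse
    correct = json_loads(dumper(loader(data))) == json_loads(data)
    benchmark.extra_info["correct"] = correct
    benchmark(loader, data)  # only the loader call is timed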

3. Empty JSON Benchmark (`bench/benchmark_empty.py`)

Benchmarks serialization and deserialization of minimal JSON values like empty arrays, objects, and strings.
Ensures that edge cases are also measured for performance and correctness.
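
As a rough illustration of what measuring minimal values involves, the sketch below times round trips of an empty array, object, and string. It is not the contents of `bench/benchmark_empty.py`: it swaps pytest-benchmark for the standard-library timeit module and uses the built-in json library so that it runs without extra dependencies. The names `EMPTY_VALUES` and `time_round_trip` are hypothetical.

```python
import json
import timeit

# Minimal values of the kind an empty-value benchmark targets (illustrative set).
EMPTY_VALUES = {"array": [], "object": {}, "string": ""}


def time_round_trip(value, number=10_000):
    """Return (seconds per dumps call, seconds per loads call, round-trip ok)."""
    encoded = json.dumps(value)
    dump_time = timeit.timeit(lambda: json.dumps(value), number=number) / number
    load_time = timeit.timeit(lambda: json.loads(encoded), number=number) / number
    return dump_time, load_time, json.loads(encoded) == value


results = {name: time_round_trip(value) for name, value in EMPTY_VALUES.items()}
for name, (dump_time, load_time, ok) in results.items():
    print(f"{name}: dumps {dump_time * 1e9:.0f} ns, loads {load_time * 1e9:.0f} ns, correct={ok}")
```

With inputs this small, the measured time is dominated by per-call overhead rather than by parsing or encoding work, which is why such cases are worth timing separately from the large fixtures.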

4. Memory Usage Profiling (`bench/run_mem`)

Measures Resident Set Size (RSS) memory usage before and after repeated deserialization of a JSON fixture.
Supports both orjson and standard json libraries.
Prints memory consumption delta and correctness verification.

Illustrative excerpt:

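import psutil

proc = psutil.Process()  # handle for the current process

mem_before = proc.memory_info().rss  # RSS in bytes before the loop

for _ in range(100):
    val = loads(fixture)  # loads is the selected library's loader

mem_after = proc.memory_info().rss
mem_diff = mem_after - mem_before
# correct is computed elsewhere in the script (not shown in this excerpt)
print(f"{mem_before},{mem_diff},{correct}")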

5. Custom Serialization Benchmark (`bench/run_default`)

Tests serialization speed of complex Python objects, including those requiring custom fallback serialization handlers.
Demonstrates the use of orjson options such as OPT_SERIALIZE_NUMPY for handling numpy arrays.
Measures the performance impact of fallback mechanisms.
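
The cost of a fallback handler can be sketched by serializing the same data twice: once in a form the encoder handles natively and once in a form that forces a call to `default` for every record. This is a minimal sketch, not the contents of `bench/run_default`. It uses the built-in json library, whose `dumps` accepts a `default` callable just as `orjson.dumps` does, and the choice of `Decimal` as the unsupported type is an assumption made for illustration.

```python
import json
import timeit
from decimal import Decimal


def default(obj):
    """Fallback handler: called only for objects the encoder cannot serialize natively."""
    if isinstance(obj, Decimal):
        return str(obj)
    raise TypeError(f"Type is not JSON serializable: {type(obj).__name__}")


# Same logical data twice: pre-converted strings versus Decimal objects
# that force one call to default() per record.
native = [{"id": i, "price": str(Decimal(i) / 100)} for i in range(1_000)]
fallback = [{"id": i, "price": Decimal(i) / 100} for i in range(1_000)]

number = 20
t_native = timeit.timeit(lambda: json.dumps(native), number=number) / number
t_fallback = timeit.timeit(lambda: json.dumps(fallback, default=default), number=number) / number
print(f"native: {t_native * 1e3:.3f} ms, with fallback: {t_fallback * 1e3:.3f} ms")
```

Because both inputs produce identical output, any difference between the two timings can be attributed to the fallback calls themselves.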

6. Utility Functions (`bench/util.py`)

Provides cached functions to read fixtures efficiently from the data/ directory.
Supports reading compressed .xz files and returning raw bytes or deserialized Python objects.
Sets CPU affinity to specific cores for consistent benchmarking results.

Example:

@cache
def read_fixture_obj(filename: str) -> Any:
    return orjson.loads(read_fixture(filename))
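
The example above depends on a `read_fixture` helper that is not shown. One plausible shape for it, assuming fixtures live under a `data/` directory relative to the working directory and are decompressed with the standard-library lzma module, is sketched below. The `DATA_DIR` constant and the suffix check are assumptions, and the built-in json library stands in for orjson so the sketch runs without it.

```python
import json
import lzma
from functools import cache
from pathlib import Path
from typing import Any

# Assumed location of the fixture directory; the real layout may differ.
DATA_DIR = Path("data")


@cache
def read_fixture(filename: str) -> bytes:
    """Return the raw bytes of a fixture, decompressing .xz files."""
    path = DATA_DIR / filename
    raw = path.read_bytes()
    return lzma.decompress(raw) if path.suffix == ".xz" else raw


@cache
def read_fixture_obj(filename: str) -> Any:
    """Return the fixture parsed into Python objects (json stands in for orjson here)."""
    return json.loads(read_fixture(filename))
```

Caching both functions means each fixture is decompressed and parsed once per process, so repeated parametrized test cases do not pay that cost again. The cached object is shared between callers, so benchmarks should treat it as read-only.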

7. Library and Fixture Definitions (`bench/data.py`)

Defines the JSON fixtures used for benchmarking, such as "canada.json", "github.json", etc.
Maps library names to their respective dump/load functions for easy parametrization.
Supports at least two libraries: orjson and Python’s built-in json.
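
A mapping of this kind might look like the sketch below. It is an assumption about the structure, not a copy of `bench/data.py`: only the two fixture names cited above are listed, orjson is registered only if it can be imported, and the json dumper is wrapped to return bytes so that both libraries produce the same output type.

```python
import json

# Fixture names cited in the text; the full list is not reproduced here.
fixtures = ("canada.json", "github.json")

# Each entry maps a library name to a (dumper, loader) pair.
libraries = {
    "json": (lambda obj: json.dumps(obj).encode("utf-8"), json.loads),
}

try:
    import orjson
except ImportError:
    pass  # orjson not installed; only the remaining libraries are compared
else:
    libraries["orjson"] = (orjson.dumps, orjson.loads)

# Quick round-trip check across every registered library.
sample = {"name": "fixture", "values": [1, 2.5, None, True]}
for name, (dumper, loader) in libraries.items():
    print(name, loader(dumper(sample)) == sample)
```

Keeping the pairs in one dictionary lets a parametrized test iterate over the keys and look up the functions by name, as the benchmark snippets do with `libraries[library]`.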

8. Additional Analytical and Benchmarking Scripts (`script/` directory)

Includes scripts like script/pydataclass, script/pysort, script/pynonstr, and script/pyindent.
These scripts run targeted benchmarks on specific JSON data shapes or serialization options.
They use timeit to time multiple iterations and compute per-iteration latency.
They report results with tabulate and graph them with visualization libraries such as matplotlib and seaborn.

For example, `script/pydataclass` benchmarks serialization of dataclass-based objects versus dictionaries and reports timing comparisons between libraries.
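
The timing approach these scripts take can be sketched with timeit alone. The example below compares serializing a plain dictionary against converting a dataclass with `dataclasses.asdict` first, using the built-in json library. The `Member` and `Team` classes, the iteration count, and the plain-text output are all assumptions for illustration; the real script reports through tabulate and compares libraries rather than input shapes.

```python
import dataclasses
import json
import timeit


@dataclasses.dataclass
class Member:  # hypothetical record type used only for this illustration
    id: int
    active: bool


@dataclasses.dataclass
class Team:  # hypothetical container type
    name: str
    members: list


team = Team("core", [Member(i, i % 2 == 0) for i in range(100)])
as_dict = dataclasses.asdict(team)


def per_iteration_ms(func, number=200):
    """Run func `number` times and return the mean latency in milliseconds."""
    return timeit.timeit(func, number=number) / number * 1e3


rows = [
    ("dict", per_iteration_ms(lambda: json.dumps(as_dict))),
    ("dataclass via asdict", per_iteration_ms(lambda: json.dumps(dataclasses.asdict(team)))),
]
for label, ms in rows:
    print(f"{label:<22}{ms:>10.4f} ms")
```

Dividing the total by the iteration count gives a mean per-call latency, which is easier to compare across libraries than a raw total.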

Module Interactions and Relationships

Benchmark Scripts & Data Fixtures: Benchmark scripts read compressed JSON fixtures via utility functions in bench/util.py. The fixtures represent real-world or edge-case JSON data stored in data/.
Library Abstraction: bench/data.py defines the libraries and their corresponding dump/load functions, allowing benchmarks to run uniformly across different JSON implementations.
Benchmarking Framework (pytest-benchmark): Utilized in test scripts (bench/benchmark_dumps.py, bench/benchmark_loads.py, etc.) to standardize timing, grouping, and reporting.
Memory Profiling Integration: Scripts like bench/run_mem use the psutil library to measure memory usage, complementing timing benchmarks.
CPU Affinity Setting: bench/util.py and several scripts set CPU affinity to cores {0, 1} to reduce variability in benchmark timings.
Correctness Checks: All benchmarks include correctness validation by comparing serialized or deserialized outputs against known-good results using Python’s built-in JSON module as a baseline.

The benchmarking module integrates tightly with the rest of the project by exercising the core serialization and deserialization functionalities exposed through the Python API, which internally calls the Rust-implemented JSON operations.

Design Patterns and Approaches

Parametrized Testing: Uses pytest.mark.parametrize extensively to run benchmarks over combinations of fixtures and libraries, maximizing coverage and automation.
Caching: Employs Python’s functools.cache to avoid redundant decompression or loading of fixture data, improving test efficiency.
Isolation and Repeatability: Benchmarks are designed to be reproducible by fixing CPU affinity and disabling garbage collection during critical sections.
Separation of Concerns: Clear separation between reading fixtures (bench/util.py), defining benchmark scenarios (bench/benchmark_*.py), and executing ad-hoc performance runs (bench/run_*).
Cross-Library Comparison: Abstracts over different JSON libraries by mapping their dump/load interfaces, facilitating fair and consistent benchmark comparisons.
Memory and Speed Focus: Alongside raw speed, memory consumption is explicitly measured, which is crucial for high-performance JSON processing libraries.
Result Visualization: Scripts under script/graph process benchmark output data to generate visual summaries, helping interpret benchmarking outcomes.
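
The isolation measures listed above can be sketched as two small helpers: one that pins the process to a fixed set of cores where the platform allows it, and one that disables the garbage collector around a timed section. The helper names are hypothetical and this is not the project's own code. `os.sched_setaffinity` exists only on some platforms (notably Linux), so the sketch degrades to a no-op elsewhere.

```python
import gc
import os
import timeit
from contextlib import contextmanager


def pin_to_cores(cores=frozenset({0, 1})):
    """Pin this process to `cores` where supported; return True on success."""
    if not hasattr(os, "sched_setaffinity"):  # e.g. macOS and Windows
        return False
    try:
        # Only request cores this process is already allowed to use.
        target = set(cores) & os.sched_getaffinity(0)
        if not target:
            return False
        os.sched_setaffinity(0, target)
    except OSError:
        return False
    return True


@contextmanager
def gc_disabled():
    """Disable the cyclic garbage collector for the duration of the block."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


pinned = pin_to_cores()
with gc_disabled():
    elapsed = timeit.timeit(lambda: sorted(range(1_000)), number=100)
print(f"pinned={pinned}, elapsed={elapsed:.4f}s")
```

Pinning reduces noise from the scheduler migrating the process between cores, and suspending collection keeps collector pauses out of the measured interval.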

Mermaid Diagram: Benchmarking Workflow Sequence

sequenceDiagram
    participant User as Developer/User
    participant Benchmarks as Benchmark Scripts
    participant Fixtures as JSON Fixture Data
    participant Libraries as JSON Libraries (orjson, json)
    participant Profiler as Memory & CPU Profiling

    User->>Benchmarks: Initiate benchmark run (e.g., pytest, run_func)
    Benchmarks->>Fixtures: Load compressed JSON fixture
    Fixtures-->>Benchmarks: Provide JSON data (bytes or object)
    Benchmarks->>Libraries: Call library dump/load functions
    Libraries-->>Benchmarks: Return serialized or deserialized data
    Benchmarks->>Profiler: Measure time & memory usage
    Profiler-->>Benchmarks: Return profiling results
    Benchmarks-->>User: Report benchmark metrics and correctness

In summary, the Benchmarking and Performance Testing module pairs timing benchmarks with memory profiling and correctness checks, giving a systematic way to validate orjson's performance and correctness relative to other JSON libraries.
