benchmark_loads.py
Overview
The `benchmark_loads.py` file is a benchmarking test script that measures the **deserialization** (JSON parsing) performance and correctness of multiple JSON libraries against a set of real-world JSON test fixtures. It leverages the `pytest` framework along with the `pytest-benchmark` plugin to run repeatable and automated performance tests.
This file focuses exclusively on the "load" operation, that is, converting raw JSON bytes into Python objects. It reads compressed fixture files, deserializes them with each library, checks the result against the standard library parser, and times the parsing call. It helps to evaluate how fast and accurate various JSON parsers are when handling complex and large JSON inputs.
Detailed Explanation
Imports
`json.loads` (aliased as `json_loads`): The standard Python JSON parser, used here only for correctness verification.
`pytest`: The testing framework used for parametrized tests and integration with benchmarking.
Local imports from the same package:
`fixtures` and `libraries` from `.data`: The JSON fixture names and the mapping of JSON libraries to test.
`read_fixture` from `.util`: A utility function that reads a compressed fixture file into bytes.
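The source of `read_fixture` is not shown in this document. As a rough illustration only, a helper with the same role could be written with the standard `lzma` module. The `DATA_DIR` location and the `data_dir` parameter below are assumptions made for the sketch, not details of the real `.util` module.

```python
import lzma
from pathlib import Path

# Hypothetical location of the compressed fixtures; the real path is not given here.
DATA_DIR = Path("data")


def read_fixture(filename: str, data_dir: Path = DATA_DIR) -> bytes:
    """Return the decompressed contents of an .xz fixture file as bytes."""
    compressed = (data_dir / filename).read_bytes()
    return lzma.decompress(compressed)
```

Returning bytes rather than a decoded string matches how the test passes `data` straight to each library's loader.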
Main Function: test_loads
@pytest.mark.parametrize("fixture", fixtures)
@pytest.mark.parametrize("library", libraries)
def test_loads(benchmark, fixture, library):
    dumper, loader = libraries[library]
    benchmark.group = f"{fixture} deserialization"
    benchmark.extra_info["lib"] = library
    data = read_fixture(f"{fixture}.xz")
    correct = json_loads(dumper(loader(data))) == json_loads(data)  # type: ignore
    benchmark.extra_info["correct"] = correct
    benchmark(loader, data)
Purpose
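Benchmarks the deserialization speed of the given `library` on the specified JSON `fixture`.
Checks correctness with a "round trip": the data is deserialized, serialized again, and compared with the original JSON data. The outcome is recorded rather than asserted, so a mismatch does not fail the test.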
Records benchmark metadata such as library name, fixture name, and correctness status.
Parameters
`benchmark`: Provided by `pytest-benchmark`; used to measure time and record benchmark metadata.
`fixture` (str): The name of a JSON fixture file, without the `.xz` suffix that the test appends before reading it.
`library` (str): The key/name of the JSON library to benchmark (e.g., `"orjson"`, `"json"`).
Local Variables
`dumper`, `loader`: The serialization (`dumper`) and deserialization (`loader`) functions of the selected JSON library, unpacked from `libraries[library]`.
`benchmark.group`: Groups benchmarks by fixture name for easier result organization.
`benchmark.extra_info`: A dict storing contextual info such as library name and correctness, included in the benchmark report.
`data`: Raw bytes read from the compressed fixture file (e.g., `"canada.json.xz"`).
`correct`: Boolean indicating whether the round-tripped output and the original data compare equal after both are parsed with `json.loads`.
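The shape of `libraries` can be inferred from how the test uses it: iterating it yields library names, and indexing it yields a `(dumper, loader)` pair. The sketch below shows one mapping with that shape. It registers only the standard library, and the wrapper around `json.dumps` is an assumption for illustration; the real `.data` module is not reproduced here.

```python
import json

# Hypothetical registry: library name -> (dumper, loader).
# Only the standard library is registered in this sketch.
libraries = {
    "json": (
        lambda obj: json.dumps(obj).encode("utf-8"),  # dumper: object -> bytes
        json.loads,  # loader: bytes or str -> object
    ),
}

# Same unpacking as in test_loads.
dumper, loader = libraries["json"]
obj = loader(b'{"id": 1, "tags": ["a", "b"]}')
round_tripped = dumper(obj)
```

Keeping both functions in one tuple lets the serialization and deserialization benchmarks share a single registry.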
Workflow
Retrieve the pair of functions `(dumper, loader)` for the selected JSON library.
Set `benchmark.group` and `benchmark.extra_info["lib"]` so that results are grouped by fixture and labeled with the library name.
Load the compressed JSON test fixture from disk using `read_fixture`.
Verify correctness by deserializing the raw data (`loader(data)`) into a Python object, serializing that object back to JSON (`dumper(...)`), parsing both the original and the round-tripped JSON with Python's standard `json.loads`, and comparing the resulting Python objects for equality.
Store the correctness result in `benchmark.extra_info["correct"]`.
Run the benchmark timing for the `loader` function on the input `data`.
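The round-trip check from the workflow above can be tried in isolation. The following is a minimal sketch that uses the standard library as the pair under test; `round_trip_ok` and `lossy_loader` are hypothetical names introduced here to show a passing and a failing case.

```python
import json


def round_trip_ok(dumper, loader, data: bytes) -> bool:
    """Parse both the round-tripped and the original document with json.loads and compare."""
    round_tripped = dumper(loader(data))
    return json.loads(round_tripped) == json.loads(data)


def lossy_loader(raw: bytes):
    """A deliberately broken loader that silently drops the "values" key."""
    obj = json.loads(raw)
    obj.pop("values", None)
    return obj


data = b'{"name": "x", "values": [1, 2.5, null]}'

good = round_trip_ok(json.dumps, json.loads, data)
bad = round_trip_ok(json.dumps, lossy_loader, data)
```

Comparing parsed objects instead of raw bytes means differences in whitespace or key order between libraries do not count as errors.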
Return Value
None (this is a test function invoked by `pytest`).
Benchmark results and correctness info are recorded and reported by `pytest-benchmark`.
Usage Example
To run the deserialization benchmarks manually, assuming a pytest environment and installed `pytest-benchmark` plugin, execute:
pytest benchmark_loads.py --benchmark-only
This will run all combinations of fixtures and libraries, measuring deserialization speed and correctness.
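The "all combinations" behavior of the two stacked `parametrize` decorators can be pictured with `itertools.product`. This is only an illustration of the resulting test matrix: the names below are placeholders borrowed from examples elsewhere in this document, and the real values live in the `.data` module.

```python
from itertools import product

# Placeholder values for illustration; the real ones come from .data.
fixtures = ["canada", "github"]
libraries = {"json": None, "orjson": None}

# Iterating a dict yields its keys, so each case is a (library, fixture) pair.
cases = list(product(libraries, fixtures))
case_count = len(cases)
```

With two fixtures and two libraries this gives four test cases; adding a library adds one case per fixture.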
Important Implementation Details
Parametrization: The use of `pytest.mark.parametrize` over both `fixtures` and `libraries` generates a Cartesian product of tests, ensuring every fixture is tested with every JSON library.
Correctness Check: The correctness validation uses a "round-trip" approach by first deserializing, then serializing, and comparing results through Python's standard `json.loads`. This guards against silent parsing errors or data corruption.
Benchmarking Integration: The `benchmark` fixture from `pytest-benchmark` is used to measure execution time precisely and to attach contextual metadata for reporting. Only the call `benchmark(loader, data)` is timed; reading the fixture and running the correctness check happen before it and are not part of the measurement.
Data Loading: The function `read_fixture` reads fixture files that are stored xz-compressed, allowing the benchmarks to work with large realistic datasets without storing them uncompressed.
Type Ignore Comment: The `# type: ignore` comment on the correctness check line suppresses type errors, likely due to the dynamic nature of the dumper and loader functions or the complexity of the comparison.
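To make the timing step concrete, here is a rough stand-in for what calling `benchmark(loader, data)` measures: repeated calls to one function on fixed input. It is a simplified sketch built on `time.perf_counter`, not a description of how `pytest-benchmark` works internally, and `time_call` is a hypothetical helper.

```python
import json
import time


def time_call(func, *args, rounds: int = 5, iterations: int = 100) -> float:
    """Return the best mean time per call, in seconds, over several rounds."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        for _ in range(iterations):
            func(*args)
        elapsed = (time.perf_counter() - start) / iterations
        best = min(best, elapsed)
    return best


# Input is prepared once, outside the timed region, as in test_loads.
data = json.dumps({"values": list(range(1000))}).encode("utf-8")
seconds_per_call = time_call(json.loads, data)
```

Preparing `data` before timing mirrors the test, where decompression cost is kept out of the measured call.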
Interaction With Other System Components
Fixtures and Libraries (`.data` module): Supplies the list of JSON fixtures (e.g., `"canada"`, `"github"`) and the mapping of library names to their `(dumper, loader)` functions.
Utility Function (`.util` module): `read_fixture` reads compressed fixture files from the data directory.
Benchmarking Framework: Relies on `pytest` and `pytest-benchmark` to execute, time, and report on the tests.
Serialization Benchmarks (`benchmark_dumps.py`): Complements this deserialization benchmark by measuring serialization speed.
Memory Profiling and Other Benchmarks: Memory usage and edge-case benchmarks are handled in separate scripts but share fixtures and libraries.
Benchmark Reports: Results feed into performance reports and visualizations to compare JSON libraries.
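As one possible way to consume the results, the sketch below groups entries of a saved report by `group` and sorts them by mean time, carrying along the `lib` and `correct` values that `test_loads` stores in `extra_info`. It assumes the report layout written by the `--benchmark-json` option of `pytest-benchmark`. The sample dictionary is a placeholder with made-up numbers, not measured results, and `summarize` is a hypothetical helper.

```python
from collections import defaultdict

# Placeholder report in the assumed --benchmark-json layout.
# Library names and timings are invented for illustration, not measurements.
sample_report = {
    "benchmarks": [
        {
            "group": "canada deserialization",
            "extra_info": {"lib": "lib_a", "correct": True},
            "stats": {"mean": 2.0},
        },
        {
            "group": "canada deserialization",
            "extra_info": {"lib": "lib_b", "correct": True},
            "stats": {"mean": 1.0},
        },
    ]
}


def summarize(report: dict) -> dict:
    """Map each benchmark group to (lib, mean, correct) rows, fastest first."""
    by_group = defaultdict(list)
    for bench in report["benchmarks"]:
        info = bench["extra_info"]
        row = (info["lib"], bench["stats"]["mean"], info["correct"])
        by_group[bench["group"]].append(row)
    return {group: sorted(rows, key=lambda r: r[1]) for group, rows in by_group.items()}


summary = summarize(sample_report)
```

Because correctness is stored as metadata rather than asserted, a report consumer like this is where incorrect results would need to be filtered or flagged.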
Mermaid Diagram: Flowchart of benchmark_loads.py Workflow
flowchart TD
A[Start: pytest runs test_loads] --> B[Select JSON library dumper & loader]
B --> C[Load JSON fixture bytes]
C --> D[Deserialize JSON bytes using loader]
D --> E[Serialize Python object back to JSON bytes using dumper]
E --> F[Parse original and round-trip JSON with standard json.loads]
F --> G{Are parsed objects equal?}
G -->|Yes| H["Set benchmark.extra_info['correct'] = True"]
G -->|No| I["Set benchmark.extra_info['correct'] = False"]
H --> J[Run benchmark timing on loader function]
I --> J
J --> K[Report benchmark metrics & correctness]
Summary
The `benchmark_loads.py` file is a concise yet critical component of the JSON benchmarking suite focused on deserialization:
Systematically tests multiple JSON fixtures and libraries.
Measures deserialization speed under controlled conditions.
Validates correctness via round-trip serialization checks.
Shares fixtures, library definitions, and utilities with the other benchmark scripts through the `.data` and `.util` modules.
Uses `pytest` and `pytest-benchmark` for structured, repeatable performance testing.
This file helps verify the parsing efficiency and reliability of JSON libraries, informing users and developers about their suitability for various real-world JSON workloads.