pyindent
Overview
The `pyindent` script is a standalone Python benchmarking utility designed to measure and compare the serialization performance of JSON encoding libraries, specifically `orjson` and Python's built-in `json` module. It operates on compressed JSON fixture files, decompressing and loading them into Python objects, then timing how long each library takes to serialize the data in both compact and pretty-printed (indented) formats.
This script outputs a formatted comparison table showing the time taken (in milliseconds) per iteration for each library and option, along with relative performance ratios compared to `orjson`.
**Key features:**
Reads JSON fixture files from a predefined data directory.
Supports loading compressed `.xz` JSON files.
Benchmarks serialization with both compact and pretty-print options.
Validates serialization correctness by deserializing the output and comparing it with the original data.
Uses CPU affinity to restrict execution to specific cores for consistent benchmarking.
Outputs benchmark results in a human-readable GitHub-flavored markdown table.
Detailed Explanation of Components
Global Variables
dirname
Absolute path to the `data` directory containing JSON fixture files (one directory level above the script directory).
filename
The JSON fixture filename passed as the first command-line argument (without extension). Defaults to empty string if not provided.
LIBRARIES
Tuple of library names to benchmark: `("orjson", "json")`.
headers
Tuple defining table header columns for output display: `("Library", "compact (ms)", "pretty (ms)", "vs. orjson")`.
ITERATIONS
Number of iterations to run each serialization test, calculated to ensure a minimum total runtime of approximately 2 seconds for `orjson` compact serialization for statistical reliability.
Functions
read_fixture_obj(filename) -> object
Reads a JSON fixture file (possibly compressed) from the `data` directory and returns the deserialized Python object.
Parameters:
`filename` (str): The name of the fixture file (e.g., `"canada.json.xz"`).
Returns:
Deserialized Python object representing the JSON content.
Details:
If the file extension is `.xz`, decompresses using `lzma` before loading.
Uses `orjson.loads` for fast JSON parsing.
Example Usage:
data = read_fixture_obj("canada.json.xz")
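The loading behavior described above can be sketched roughly as follows. This is a minimal illustration, not the script's code: the name `load_fixture`, its `directory` parameter, and the temporary fixture are hypothetical, and the standard-library `json.loads` stands in for `orjson.loads` so the example runs without third-party packages.

```python
import json
import lzma
import tempfile
from pathlib import Path


def load_fixture(directory: Path, filename: str) -> object:
    """Load a JSON fixture, decompressing it first if the name ends in .xz."""
    path = Path(directory, filename)
    raw = path.read_bytes()
    if path.suffix == ".xz":
        raw = lzma.decompress(raw)
    # Stand-in for orjson.loads; json.loads also accepts UTF-8 bytes.
    return json.loads(raw)


# Build a small throwaway fixture so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "example.json.xz").write_bytes(lzma.compress(b'{"a": [1, 2, 3]}'))
    loaded = load_fixture(Path(tmp), "example.json.xz")
```

Passing the directory explicitly is a choice made for this sketch only; the documented function takes just `filename` and resolves it against the global `dirname`.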
per_iter_latency(val) -> float | None
Converts a total elapsed time value into a per-iteration latency in milliseconds.
Parameters:
`val` (float or None): Total time in seconds for all iterations.
Returns:
Per-iteration latency in milliseconds (`(val * 1000) / ITERATIONS`).
Returns `None` if input is `None`.
Example Usage:
total_time = 1.5  # seconds for all iterations
latency_ms = per_iter_latency(total_time)  # e.g., 1.5 * 1000 / ITERATIONS
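A minimal sketch of this conversion, following the formula given above. The fixed `ITERATIONS = 100` is an assumption made so the example is reproducible; the script derives the real value at runtime.

```python
from typing import Optional

ITERATIONS = 100  # assumed value for illustration; computed dynamically in the script


def per_iter_latency(val: Optional[float]) -> Optional[float]:
    """Convert total seconds across ITERATIONS runs into milliseconds per run."""
    if val is None:
        return None
    return (val * 1000) / ITERATIONS


latency_ms = per_iter_latency(1.5)  # 1.5 s over 100 iterations is 15.0 ms each
```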
test_correctness(serialized: bytes) -> bool
Checks if a serialized JSON byte string correctly represents the original `data` object by deserializing and comparing.
Parameters:
`serialized` (bytes): Serialized JSON bytes.
Returns:
`True` if deserialized output equals the original `data` object, else `False`.
Example Usage:
serialized = orjson.dumps(data)
is_correct = test_correctness(serialized)
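The round-trip check can be illustrated with the sketch below. It assumes a small in-memory `data` object and uses the standard-library `json.loads` in place of `orjson.loads`; the function name `round_trips` is hypothetical and chosen only for this example.

```python
import json

# Small stand-in for the fixture object the script keeps in a global `data`.
data = {"name": "example", "values": [1, 2.5, None, True]}


def round_trips(serialized: bytes) -> bool:
    """Return True if the serialized bytes parse back to the original object."""
    return json.loads(serialized) == data


compact = json.dumps(data).encode("utf-8")
pretty = json.dumps(data, indent=2).encode("utf-8")
```

Both the compact and the indented output should parse back to the same object, since indentation only adds insignificant whitespace.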
Main Execution Flow
Set CPU affinity:
Limits the process to run on CPU cores 0 and 1 for more consistent timing.
Load JSON fixture:
Reads the JSON data object from the specified fixture file (expects `filename.json.xz`).
Calculate output sizes:
Measures the size (in KiB) of `orjson` serialized output in both compact and pretty formats.
Determine iterations:
Runs a quick timeit benchmark on `orjson.dumps(data)` to compute how many iterations to perform to achieve roughly 2 seconds total runtime.
Benchmark loop:
For each library in `LIBRARIES`:
Measures serialization time for compact and pretty-printed JSON.
Validates correctness by deserializing the pretty output.
Calculates per-iteration latency.
Computes a ratio comparing the library's pretty serialization time to `orjson`'s.
Output results:
Prints a GitHub markdown table with benchmark results including time in milliseconds and relative performance.
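The flow above (probe, scale the iteration count, time each variant, check correctness, compute a ratio) might look roughly like the sketch below. Everything here is an assumption for illustration: it uses only the standard-library `json` module, with a hypothetical second entry `json+sort_keys` standing in for a second library; the target runtime is shortened from about 2 seconds to 0.05 seconds; and `PROBE_RUNS`, `TARGET_SECONDS`, and the sample `data` are invented names and values.

```python
import json
from timeit import timeit

data = {"items": [{"id": i, "tags": ["a", "b"], "score": i / 7} for i in range(200)]}

# name -> (compact serializer, pretty serializer); the first entry is the baseline.
libraries = {
    "json": (
        lambda: json.dumps(data).encode("utf-8"),
        lambda: json.dumps(data, indent=2).encode("utf-8"),
    ),
    "json+sort_keys": (
        lambda: json.dumps(data, sort_keys=True).encode("utf-8"),
        lambda: json.dumps(data, indent=2, sort_keys=True).encode("utf-8"),
    ),
}

TARGET_SECONDS = 0.05  # the text describes roughly 2 s; shortened for this sketch
PROBE_RUNS = 20  # assumed probe size

# Estimate the cost of one baseline compact call, then scale to the target runtime.
probe = timeit(libraries["json"][0], number=PROBE_RUNS) / PROBE_RUNS
iterations = max(1, int(TARGET_SECONDS / max(probe, 1e-9)))

rows = []
baseline_pretty_ms = None
for name, (compact_fn, pretty_fn) in libraries.items():
    compact_ms = timeit(compact_fn, number=iterations) * 1000 / iterations
    pretty_ms = timeit(pretty_fn, number=iterations) * 1000 / iterations
    correct = json.loads(pretty_fn()) == data  # round-trip check on pretty output
    if baseline_pretty_ms is None:
        baseline_pretty_ms = pretty_ms
    rows.append((name, compact_ms, pretty_ms, pretty_ms / baseline_pretty_ms, correct))

for name, compact_ms, pretty_ms, ratio, correct in rows:
    print(
        f"{name}: compact {compact_ms:.3f} ms, pretty {pretty_ms:.3f} ms, "
        f"{ratio:.1f}x baseline, correct={correct}"
    )
```

Scaling the iteration count from a short probe keeps the total runtime roughly constant across fixtures of very different sizes, which is the stated purpose of `ITERATIONS`.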
Usage Example
Assuming the script is named `pyindent` and a JSON fixture file `example.json.xz` exists in the data directory, run:
./pyindent example
Sample output:
150KiB compact, 180KiB pretty, 100 iterations
orjson...
json...
| Library | compact (ms) | pretty (ms) | vs. orjson |
|---------|--------------|-------------|------------|
| orjson | 15.23 | 18.45 | 1.0 |
| json | 120.55 | 150.32 | 8.1 |
Important Implementation Details
CPU Affinity:
The script sets CPU affinity (`os.sched_setaffinity`) to cores `{0,1}` to reduce noise from OS scheduling and improve reproducibility.
Time Measurement:
Uses Python's `timeit` function with a lambda that performs repeated serialization calls to average out timing variability.
Dynamic Iteration Count:
The number of iterations is dynamically calculated based on a short initial test to ensure the benchmark runs long enough (minimum ~2 seconds for `orjson` compact serialization).
Correctness Validation:
After serialization, re-parsing the output bytes ensures no data corruption or serialization bugs.
Compression Support:
Supports `.xz` compressed JSON fixtures transparently via `lzma`.
Output Formatting:
Uses `tabulate` to produce clean GitHub-style markdown tables for easy reading and integration into reports or markdown documents.
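As a hedged illustration of the CPU affinity point: `os.sched_setaffinity` is only available on some platforms (notably Linux), and the requested cores may not exist in a restricted environment. The helper below, whose name `pin_to_cores` and fallback behavior are assumptions for this sketch and not part of the script, guards against both cases.

```python
import os


def pin_to_cores(cores: set) -> set:
    """Pin the current process to `cores` where supported; return the resulting set."""
    if not hasattr(os, "sched_setaffinity"):
        return set()  # platform without affinity support: nothing to do
    available = os.sched_getaffinity(0)
    # Fall back to whatever is currently allowed if none of the requested cores exist.
    wanted = (cores & available) or available
    os.sched_setaffinity(0, wanted)
    return os.sched_getaffinity(0)


pinned = pin_to_cores({0, 1})
```

Per the description above, the script itself sets the affinity to `{0,1}` directly, so it presumably expects a Linux host where those cores are available.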
Interaction with Other System Parts
Data Directory:
Reads input JSON fixture files from the `data` directory located one level up relative to the script. These fixtures are shared with other benchmarking scripts and utilities in the project.
orjson Library:
Uses `orjson` for both parsing fixtures and benchmarking serialization, highlighting its performance advantages over the standard `json` module.
tabulate Library:
Used for formatting output tables.
Benchmarking Ecosystem:
The script complements other benchmark scripts in the project (e.g., `bench/benchmark_dumps.py`) by providing a quick, focused serialization benchmark with detailed timing and correctness checks.
Visual Diagram: Flowchart of pyindent Script Execution
flowchart TD
A[Start: Parse command-line filename] --> B[Read compressed JSON fixture]
B --> C[Decompress if .xz, then parse JSON]
C --> D[Calculate output sizes using orjson.dumps]
D --> E[Determine ITERATIONS via timeit on orjson.dumps]
E --> F{"For each library in (orjson, json)"}
F --> G[Time compact serialization]
F --> H[Time pretty serialization]
G --> I[Test correctness by deserializing pretty output]
H --> I
I --> J[Calculate per-iteration latencies]
J --> K[Compute relative performance vs. orjson]
K --> L[Collect results in table]
L --> M[Print formatted markdown table]
M --> N[End]
Summary
The `pyindent` script is a targeted benchmark tool for comparing the JSON serialization speed of `orjson` and Python's built-in `json` module on compressed JSON fixtures, with a correctness check on the output. Dynamic iteration scaling, CPU pinning, and round-trip verification are intended to make its measurements reproducible, which makes it a useful utility within the project's broader benchmarking suite.