pyindent


Overview

The `pyindent` script is a standalone Python benchmark that compares the serialization performance of `orjson` with Python's built-in `json` module. It loads a compressed JSON fixture file into a Python object, then times how long each library takes to serialize that object in both compact and pretty-printed (indented) form.

The script prints a comparison table showing the per-iteration time in milliseconds for each library and format, along with each library's pretty-printing time relative to `orjson`.

**Key features:**


Detailed Explanation of Components

Global Variables


Functions

read_fixture_obj(filename) -> object

Reads a JSON fixture file (possibly compressed) from the `data` directory and returns the deserialized Python object.
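
A minimal sketch of such a loader follows. It uses only the standard library (`json` and `lzma`), so the parser, the `data_dir` parameter, and the `DATA_DIR` constant are assumptions made for illustration rather than the script's actual code.

```python
import json
import lzma
from pathlib import Path

# Assumption: fixtures live in a "data" directory relative to the caller.
DATA_DIR = Path("data")


def read_fixture_obj(filename, data_dir=DATA_DIR):
    """Load a JSON fixture, decompressing it first if it ends in .xz."""
    path = Path(data_dir, filename)
    raw = path.read_bytes()
    if path.suffix == ".xz":
        # Only the final suffix is inspected, so "x.json.xz" is decompressed.
        raw = lzma.decompress(raw)
    # json.loads accepts bytes directly and detects the UTF encoding.
    return json.loads(raw)
```

Checking only the final suffix keeps uncompressed `.json` fixtures usable through the same function.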


per_iter_latency(val) -> float | None

Converts a total elapsed time value into a per-iteration latency in milliseconds.
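
The conversion can be sketched as below, assuming the input is the total number of seconds returned by `timeit` and that `None` stands for a missing measurement. The `iterations` parameter and the placeholder value of `ITERATIONS` are hypothetical; the script derives its own iteration count at startup.

```python
ITERATIONS = 100  # hypothetical placeholder; the script computes this value


def per_iter_latency(val, iterations=ITERATIONS):
    """Convert total elapsed seconds into milliseconds per iteration."""
    if val is None:
        # Assumption: None means no measurement is available.
        return None
    return (val * 1000) / iterations
```

For example, a total of 2.0 seconds over 100 iterations works out to 20.0 ms per iteration.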


test_correctness(serialized: bytes) -> bool

Checks if a serialized JSON byte string correctly represents the original `data` object by deserializing and comparing.
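
The round-trip check might look like the following sketch. The helper name `check_round_trip` is hypothetical: it takes the original object as an explicit argument instead of reading a global `data`, and it parses with the standard `json` module, which may differ from the parser the script uses.

```python
import json


def check_round_trip(serialized: bytes, original) -> bool:
    """Return True if the bytes parse back to an object equal to original."""
    return json.loads(serialized) == original


# Hypothetical stand-in for the fixture object.
data = {"name": "example", "values": [1, 2.5, None, True]}

pretty_bytes = json.dumps(data, indent=2).encode("utf-8")
print(check_round_trip(pretty_bytes, data))
```

Because whitespace is not significant in JSON, compact and indented output should both pass the same check.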


Main Execution Flow

  1. Set CPU affinity:
    Limits the process to run on CPU cores 0 and 1 for more consistent timing.

  2. Load JSON fixture:
    Reads the JSON data object from the fixture named on the command line; the script expects a file called <filename>.json.xz in the data directory.

  3. Calculate output sizes:
    Measures the size (in KiB) of orjson serialized output in both compact and pretty formats.

  4. Determine iterations:
    Runs a quick timeit benchmark on orjson.dumps(data) and uses the measured time per call to choose an iteration count that makes the timed orjson.dumps(data) loop last roughly 2 seconds.

  5. Benchmark loop:
    For each library in LIBRARIES:

    • Measures serialization time for compact and pretty-printed JSON.

    • Validates correctness by deserializing the pretty output and comparing it with the original data.

    • Calculates per-iteration latency.

    • Computes a ratio comparing the library's pretty serialization time to orjson's.

  6. Output results:
    Prints a GitHub markdown table with benchmark results including time in milliseconds and relative performance.
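
The steps above can be sketched for a single library using only the standard `json` module. The fixture, the probe count, and the shortened target runtime are assumptions chosen so the example finishes quickly; the affinity call exists only on platforms that provide `os.sched_setaffinity` (such as Linux), so it is guarded.

```python
import json
import os
from timeit import timeit

# Pin to cores 0 and 1 only where supported and where both are available.
if hasattr(os, "sched_setaffinity") and {0, 1} <= os.sched_getaffinity(0):
    os.sched_setaffinity(0, {0, 1})

# Hypothetical stand-in for a loaded fixture.
data = {"items": [{"id": i, "tags": ["a", "b"]} for i in range(200)]}

TARGET_SECONDS = 0.05  # shortened here; the description targets about 2 s
PROBE_RUNS = 20        # assumed number of probe calls


def compact():
    return json.dumps(data).encode("utf-8")


def pretty():
    return json.dumps(data, indent=2).encode("utf-8")


# Estimate seconds per call from a short probe, then scale the count.
per_call = max(timeit(compact, number=PROBE_RUNS) / PROBE_RUNS, 1e-9)
iterations = max(1, int(TARGET_SECONDS / per_call))

# Time both formats with the same iteration count.
total_compact = timeit(compact, number=iterations)
total_pretty = timeit(pretty, number=iterations)

# Per-iteration latency in milliseconds.
compact_ms = total_compact * 1000 / iterations
pretty_ms = total_pretty * 1000 / iterations

# Validate the pretty output by parsing it back.
correct = json.loads(pretty()) == data

print(f"{iterations} iterations, compact {compact_ms:.3f} ms, "
      f"pretty {pretty_ms:.3f} ms, correct={correct}")
```

Scaling the iteration count from a short probe keeps the total runtime roughly constant across fixtures of very different sizes.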


Usage Example

Assuming the script is named `pyindent` and a JSON fixture file `example.json.xz` exists in the data directory, run:

./pyindent example

Sample output:

150KiB compact, 180KiB pretty, 100 iterations
orjson...
json...

| Library | compact (ms) | pretty (ms) | vs. orjson |
|---------|--------------|-------------|------------|
| orjson  | 15.23        | 18.45       | 1.0        |
| json    | 120.55       | 150.32      | 8.1        |
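
As an illustration of how the last column relates to the others, the sketch below recomputes the ratios from the sample latencies and renders the table. The `markdown_table` helper is hypothetical and dependency-free; the script itself may rely on a table-formatting library instead.

```python
def markdown_table(headers, rows):
    """Render rows as a GitHub-style markdown table with padded columns."""
    cells = [[str(c) for c in row] for row in [headers, *rows]]
    widths = [max(len(r[i]) for r in cells) for i in range(len(headers))]

    def fmt(row):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |"

    sep = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([fmt(cells[0]), sep, *(fmt(r) for r in cells[1:])])


# Per-iteration latencies (ms) copied from the sample output above.
results = {"orjson": (15.23, 18.45), "json": (120.55, 150.32)}
baseline_pretty = results["orjson"][1]

# The ratio divides each library's pretty time by orjson's pretty time.
rows = [
    (name, f"{c:.2f}", f"{p:.2f}", f"{p / baseline_pretty:.1f}")
    for name, (c, p) in results.items()
]
table = markdown_table(
    ("Library", "compact (ms)", "pretty (ms)", "vs. orjson"), rows
)
print(table)
```

With the sample numbers, 150.32 / 18.45 rounds to 8.1, which matches the `json` row.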

Important Implementation Details


Interaction with Other System Parts


Visual Diagram: Flowchart of pyindent Script Execution

flowchart TD
    A[Start: Parse command-line filename] --> B[Read compressed JSON fixture]
    B --> C[Decompress if .xz, then parse JSON]
    C --> D[Calculate output sizes using orjson.dumps]
    D --> E[Determine ITERATIONS via timeit on orjson.dumps]
    E --> F{"For each library in (orjson, json)"}
    F --> G[Time compact serialization]
    F --> H[Time pretty serialization]
    G --> I[Test correctness by deserializing pretty output]
    H --> I
    I --> J[Calculate per-iteration latencies]
    J --> K[Compute relative performance vs. orjson]
    K --> L[Collect results in table]
    L --> M[Print formatted markdown table]
    M --> N[End]

Summary

The `pyindent` script is a targeted benchmark that compares the JSON serialization speed of `orjson` and Python's built-in `json` library on compressed JSON fixtures, and verifies that each library's pretty-printed output round-trips to the original data. CPU pinning and dynamic iteration scaling are intended to make the timings more consistent from run to run, which makes the script a useful part of the project's broader benchmarking suite.