pysort


Overview

The file **[pysort](/projects/287/67683)** is a Python script that benchmarks JSON serialization (encoding) with two libraries: Python's built-in `json` module and the Rust-backed `orjson` library. It times the serialization of a large JSON dataset with and without key sorting, reports the latency per iteration, and expresses each library's sorted-key time relative to `orjson`.

The script is used for performance testing and reporting. It prints a table of serialization latencies in milliseconds so that developers can see how the standard library compares with `orjson`.


Detailed Explanation

Global Constants and Variables


Functions

read_fixture_obj(filename)

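Reads and decompresses the named JSON fixture and returns it as a Python object.

Signature: `def read_fixture_obj(filename) -> dict`

Example: `data = read_fixture_obj("twitter.json.xz")`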
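The body of `read_fixture_obj` is not shown here, so the following is only a minimal sketch of how such a helper could work. It assumes the fixture is an xz-compressed JSON file, and it uses the standard-library `lzma` and `json` modules. The `FIXTURE_DIR` constant and the `base_dir` parameter are hypothetical and exist only to make the sketch self-contained.

```python
import json
import lzma
from pathlib import Path

# Hypothetical fixture location; the real script's directory is not documented here.
FIXTURE_DIR = Path("data")


def read_fixture_obj(filename, base_dir=FIXTURE_DIR):
    """Read a JSON fixture and return the decoded Python object.

    Files ending in .xz are decompressed first (an assumption based on
    the "twitter.json.xz" fixture name).
    """
    path = Path(base_dir) / filename
    raw = path.read_bytes()
    if path.suffix == ".xz":
        raw = lzma.decompress(raw)
    # json.loads accepts UTF-8 encoded bytes directly.
    return json.loads(raw)
```

Decoding happens once, before any timing starts, so the cost of decompression and parsing does not affect the serialization measurements.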

per_iter_latency(val)

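Converts a total time in seconds into a per-iteration latency in milliseconds.

Signature: `def per_iter_latency(val: float) -> float | None`

Example:

total_time = 0.5  # seconds for 500 iterations
latency_ms = per_iter_latency(total_time)  # returns 1.0 ms per iteration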
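A minimal sketch of the conversion follows. It assumes `ITERATIONS` is 500, which is inferred from the example (0.5 s over 500 iterations gives 1.0 ms), and it assumes the `None` return value is a pass-through for a missing measurement. Neither detail is confirmed by the text above.

```python
from typing import Optional

ITERATIONS = 500  # assumed from the example: 0.5 s total over 500 iterations


def per_iter_latency(val: Optional[float]) -> Optional[float]:
    """Convert a total time in seconds to milliseconds per iteration."""
    if val is None:
        # Assumption: a missing measurement is passed through unchanged.
        return None
    return (val * 1000) / ITERATIONS
```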

Main Benchmarking Logic

The script performs the following steps:

  1. Set CPU Affinity:
    Restricts the process to CPU cores 0 and 1 using `os.sched_setaffinity` to reduce variability in timing results due to CPU scheduling.

  2. Load Benchmark Data:
    Reads and decompresses the `"twitter.json.xz"` JSON fixture into a Python object, `data`.

  3. Benchmark Loop Over Libraries:
    Iterates over each library in `LIBRARIES` and performs two benchmarks per library:

    • Unsorted Serialization: Serialize `data` without sorting keys.

    • Sorted Serialization: Serialize `data` with keys sorted.

  4. Timing Using `timeit.timeit`:
    Measures the total time taken to serialize the JSON data `ITERATIONS` times.

  5. Latency Calculation:
    Converts total times into per-iteration latencies (in milliseconds).

  6. Performance Comparison:
    Computes the ratio of each library's sorted serialization time relative to `orjson`'s sorted serialization time.

  7. Tabulated Output:
    Uses `tabulate` to format and print the results as a Markdown-style table.
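The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: the helper names `pin_to_cores`, `make_encoders`, and `run_benchmark` are hypothetical, `ITERATIONS` is set to a small value to keep the sketch quick, and `orjson` is skipped when it is not installed.

```python
import json
import os
import timeit

ITERATIONS = 20  # deliberately small for this sketch; the script defines its own value


def pin_to_cores(cores=frozenset({0, 1})):
    """Pin this process to the given cores where the platform allows it."""
    if hasattr(os, "sched_setaffinity"):  # Linux-only API
        try:
            os.sched_setaffinity(0, cores)
        except OSError:
            pass  # the requested cores are not available to this process


def make_encoders(data):
    """Map library name -> (unsorted callable, sorted callable)."""
    encoders = {
        "json": (
            lambda: json.dumps(data),
            lambda: json.dumps(data, sort_keys=True),
        )
    }
    try:
        import orjson  # third-party; skipped when not installed
    except ImportError:
        return encoders
    encoders["orjson"] = (
        lambda: orjson.dumps(data),
        lambda: orjson.dumps(data, option=orjson.OPT_SORT_KEYS),
    )
    return encoders


def run_benchmark(data, iterations=ITERATIONS):
    """Return {library: (unsorted_ms, sorted_ms)} as per-iteration latencies."""
    results = {}
    for name, (unsorted_fn, sorted_fn) in make_encoders(data).items():
        # timeit.timeit returns the total seconds for `number` calls.
        unsorted_s = timeit.timeit(unsorted_fn, number=iterations)
        sorted_s = timeit.timeit(sorted_fn, number=iterations)
        results[name] = (
            unsorted_s * 1000 / iterations,
            sorted_s * 1000 / iterations,
        )
    return results
```

A typical run would call `pin_to_cores()` once, load the fixture, and then pass the decoded object to `run_benchmark`. Wrapping each encoder in a zero-argument callable keeps the timed statement identical across libraries, so only the serialization call itself is measured.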


Library-Specific Notes


Example Output

| Library | unsorted (ms) | sorted (ms) | vs. orjson |
|---------|---------------|-------------|------------|
| orjson  | 1.23          | 1.35        | 1.0        |
| json    | 15.67         | 17.89       | 13.3       |
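The "vs. orjson" column in the table above is each library's sorted latency divided by `orjson`'s sorted latency. The sketch below reproduces that arithmetic from the example figures. The helper names `build_rows` and `render` are hypothetical, rounding to one decimal place is an assumption based on the example values, and `render` relies on the third-party `tabulate` package with its `github` table format.

```python
HEADERS = ["Library", "unsorted (ms)", "sorted (ms)", "vs. orjson"]


def build_rows(latencies, baseline="orjson"):
    """Turn {library: (unsorted_ms, sorted_ms)} into table rows with a ratio column."""
    baseline_sorted = latencies[baseline][1]
    rows = []
    for name, (unsorted_ms, sorted_ms) in latencies.items():
        # Ratio of this library's sorted time to the baseline's sorted time.
        ratio = round(sorted_ms / baseline_sorted, 1)
        rows.append([name, unsorted_ms, sorted_ms, ratio])
    return rows


def render(rows):
    """Format rows as a Markdown-style table (requires the tabulate package)."""
    from tabulate import tabulate

    return tabulate(rows, headers=HEADERS, tablefmt="github")


example = {"orjson": (1.23, 1.35), "json": (15.67, 17.89)}
rows = build_rows(example)
```

With the example figures, 17.89 / 1.35 is about 13.25, which rounds to the 13.3 shown for `json`, and the `orjson` row is 1.0 by definition.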

Important Implementation Details


Interactions with Other Parts of the System


Mermaid Diagram: Flowchart of Main Functions and Workflow

flowchart TD
    A["Start Script"] --> B["Set CPU Affinity to cores 0 and 1"]
    B --> C["Load JSON Fixture twitter.json.xz"]
    C --> D{"For each library in LIBRARIES"}
    D --> E1["If library is json"]
    D --> E2["If library is orjson"]
    E1 --> F1["Time json.dumps, unsorted and sorted"]
    E2 --> F2["Time orjson.dumps, unsorted and sorted with OPT_SORT_KEYS"]
    F1 --> G["Calculate per-iteration latency"]
    F2 --> G
    G --> H["Calculate performance ratio vs orjson"]
    H --> I["Append results to table"]
    I --> D
    D -->|"all libraries done"| J["Format table with tabulate"]
    J --> K["Print results"]
    K --> L["End Script"]

Summary

The [pysort](/projects/287/67683) script is a small benchmarking tool that compares the JSON serialization speed of Python's built-in `json` module with that of the `orjson` library. It keeps measurements consistent by timing many iterations and pinning the process to fixed CPU cores, and it reports the results as a table that makes the two libraries easy to compare when choosing a JSON serialization strategy.