pysort
Overview
The file **[pysort](/projects/287/67683)** is a Python script designed to benchmark and compare the performance of JSON serialization (encoding) using two popular JSON libraries: Python's built-in `json` module and the high-performance Rust-backed `orjson` library. It focuses on timing the serialization process of a large JSON dataset in both unsorted and sorted key forms, measuring the time taken per iteration and comparing each library's performance relative to `orjson`.
This script is primarily used for performance testing and reporting, generating a tabulated summary of serialization latencies in milliseconds, helping developers understand the speed differences and efficiency gains offered by `orjson` compared to the standard library.
Detailed Explanation
Global Constants and Variables
dirname
Type: `str`
Description: Directory path pointing to the `data` folder, located one level up from the script's directory. This folder contains the JSON fixture files used for benchmarking.
headers
Type: `Tuple[str, str, str, str]`
Description: Column headers for the summary table output. The columns are:
Library name
Time taken to serialize unsorted JSON (in ms)
Time taken to serialize sorted JSON (in ms)
Performance ratio vs. `orjson` sorted serialization time
LIBRARIES
Type: `Tuple[str, ...]`
Description: Tuple of library names to benchmark. Currently supports `"orjson"` and `"json"`.
ITERATIONS
Type: `int`
Description: Number of times each serialization operation is repeated to obtain stable timing results.
data
Type: `dict` (loaded JSON object)
Description: The JSON data loaded from the compressed fixture file `"twitter.json.xz"`, used as the benchmark input.
Functions
read_fixture_obj(filename)
def read_fixture_obj(filename) -> dict:
Purpose:
Load and parse a JSON fixture file into a Python dictionary. Supports transparent decompression if the file is `.xz`-compressed.
Parameters:
`filename` (`str`): Name of the fixture file to read (e.g., `"twitter.json.xz"`).
Returns:
`dict`: A Python object parsed from the JSON contents of the file.
Implementation Details:
Constructs the full path to the fixture file inside the `data` directory.
If the file has the `.xz` suffix, decompresses it using `lzma.decompress`; otherwise reads the raw bytes as-is.
Parses the JSON content using `orjson.loads` for high-performance deserialization.
Usage Example:
data = read_fixture_obj("twitter.json.xz")
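The loading steps above can be sketched roughly as follows. This is an illustration, not the script's exact code: the `fixture_dir` parameter is a hypothetical stand-in for the module-level `dirname`, and the fallback to the standard `json` module is an assumption added so the sketch runs without `orjson` installed.

```python
import lzma
from pathlib import Path

try:
    import orjson
    _loads = orjson.loads
except ImportError:
    # Assumption: fall back to the standard library so the sketch still runs.
    import json
    _loads = json.loads


def read_fixture_obj(filename, fixture_dir="data"):
    """Load a JSON fixture, decompressing it first if it ends in .xz."""
    path = Path(fixture_dir, filename)
    raw = path.read_bytes()
    if path.suffix == ".xz":
        raw = lzma.decompress(raw)
    return _loads(raw)
```

Both loaders accept raw bytes, so no explicit decoding step is needed before parsing.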
per_iter_latency(val)
def per_iter_latency(val: float | None) -> float | None:
Purpose:
Converts a total elapsed time for multiple iterations into a per-iteration latency in milliseconds.
Parameters:
`val` (`float` or `None`): The total elapsed time in seconds for all iterations.
Returns:
`float` or `None`: Time per iteration in milliseconds (ms), or `None` if the input is `None`.
Implementation Details:
Multiplies the total time by 1000 to convert seconds to milliseconds.
Divides by the global `ITERATIONS` constant to get the per-iteration time.
Usage Example:
total_time = 0.5 # seconds for 500 iterations
latency_ms = per_iter_latency(total_time) # returns 1.0 ms per iteration
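A minimal sketch of a function body consistent with the description above. The arithmetic follows the documented behavior, but the exact layout of the original function is an assumption.

```python
ITERATIONS = 500  # value stated in this document


def per_iter_latency(val):
    """Convert total seconds over ITERATIONS runs into milliseconds per run."""
    if val is None:
        return None
    return (val * 1000) / ITERATIONS
```

With `ITERATIONS = 500`, a total of 0.5 seconds gives 0.5 * 1000 / 500 = 1.0 ms per iteration.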
Main Benchmarking Logic
The script performs the following steps:
Set CPU Affinity:
Restricts the process to CPU cores 0 and 1 using `os.sched_setaffinity` to reduce variability in timing results due to CPU scheduling.
Load Benchmark Data:
Reads and decompresses the `"twitter.json.xz"` JSON fixture into a Python object `data`.
Benchmark Loop Over Libraries:
Iterates over each library in `LIBRARIES` and performs two benchmarks per library:
Unsorted Serialization: Serialize `data` without sorting keys.
Sorted Serialization: Serialize `data` with keys sorted.
Timing Using `timeit.timeit`:
Measures the total time taken to serialize the JSON data `ITERATIONS` times.
Latency Calculation:
Converts total times into per-iteration latencies (in milliseconds).
Performance Comparison:
Computes the ratio of each library's sorted serialization time relative to `orjson`'s sorted serialization time.
Tabulated Output:
Uses `tabulate` to format and print the results as a Markdown-style table.
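The timing and ratio steps above can be sketched as follows. The helper names `time_pair` and `build_rows` are hypothetical, the payload is a tiny stand-in for the `twitter.json.xz` fixture, and only the standard `json` module is timed so the sketch runs without third-party packages.

```python
import json
from timeit import timeit


def time_pair(unsorted_fn, sorted_fn, iterations):
    """Return (unsorted_ms, sorted_ms) per-iteration latencies."""
    unsorted_total = timeit(unsorted_fn, number=iterations)
    sorted_total = timeit(sorted_fn, number=iterations)
    return (
        unsorted_total * 1000 / iterations,
        sorted_total * 1000 / iterations,
    )


def build_rows(timings, baseline):
    """Build table rows from a mapping of name -> (unsorted_ms, sorted_ms)."""
    base_sorted = timings[baseline][1]
    return [
        (name, unsorted_ms, sorted_ms, sorted_ms / base_sorted)
        for name, (unsorted_ms, sorted_ms) in timings.items()
    ]


# Assumption: a small stand-in payload instead of the real fixture.
data = {"b": [1, 2, 3], "a": {"z": None, "y": "text"}}
timings = {
    "json": time_pair(
        lambda: json.dumps(data).encode("utf-8"),
        lambda: json.dumps(data, sort_keys=True).encode("utf-8"),
        iterations=100,
    )
}
# The script uses orjson as the baseline; json is used here only for the demo.
rows = build_rows(timings, baseline="json")
```

The baseline row always has a ratio of exactly 1.0, since its sorted latency is divided by itself.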
Library-Specific Notes
`json` library:
Uses `json.dumps` followed by UTF-8 encoding (`encode("utf-8")`) to produce bytes, matching the `orjson.dumps` output type.
`orjson` library:
Uses `orjson.dumps`. For sorted keys, passes the `OPT_SORT_KEYS` option.
Error Handling:
Raises `NotImplementedError` if an unsupported library name is specified.
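One way to picture the per-library dispatch described above is the sketch below. The function name `make_serializers` is hypothetical, and the lazy `orjson` import is an assumption made so the `json` path works without the third-party package.

```python
import json


def make_serializers(lib_name, data):
    """Return (unsorted, sorted) zero-argument callables that produce bytes."""
    if lib_name == "json":
        return (
            lambda: json.dumps(data).encode("utf-8"),
            lambda: json.dumps(data, sort_keys=True).encode("utf-8"),
        )
    if lib_name == "orjson":
        import orjson  # third-party; imported lazily for this sketch

        return (
            lambda: orjson.dumps(data),
            lambda: orjson.dumps(data, option=orjson.OPT_SORT_KEYS),
        )
    raise NotImplementedError(lib_name)
```

Returning callables keeps the timed region limited to the serialization call itself when each one is passed to `timeit`.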
Example Output
| Library | unsorted (ms) | sorted (ms) | vs. orjson |
|---------|---------------|-------------|------------|
| orjson | 1.23 | 1.35 | 1.0 |
| json | 15.67 | 17.89 | 13.3 |
Important Implementation Details
CPU Affinity:
By pinning the process to specific CPUs, the script reduces noise in timing results caused by CPU migration or interference from other system processes.
Compressed Fixture Loading:
Storing the JSON fixture with `.xz` compression reduces its size on disk. The standard-library `lzma` module handles decompression and `orjson.loads` handles deserialization; both happen once, before any timing starts, so they do not affect the measured latencies.
Multiple Iterations for Timing:
Running each serialization multiple times (`ITERATIONS = 500`) averages out transient system load fluctuations, so the reported per-iteration latency is more stable than a single measurement.
Latency Normalization:
Reporting per-iteration latency in milliseconds makes the results independent of the iteration count and easier to interpret. Absolute latencies still depend on the hardware; the `vs. orjson` ratio is the figure best suited to comparison across machines.
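The CPU pinning mentioned above relies on `os.sched_setaffinity`, which is only available on some platforms (notably Linux). The sketch below is a defensive variant: the `pin_to_cpus` helper, the platform check, and the intersection with the currently allowed cores are all assumptions added for portability, whereas the script is described as pinning to cores 0 and 1 directly.

```python
import os


def pin_to_cpus(cpus=(0, 1)):
    """Try to pin this process to the given cores.

    Returns the resulting affinity set, or None if pinning was not possible.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. macOS or Windows
    # Only request cores this process is currently allowed to run on.
    wanted = set(cpus) & os.sched_getaffinity(0)
    if not wanted:
        return None
    try:
        os.sched_setaffinity(0, wanted)  # pid 0 means the calling process
    except OSError:
        return None
    return os.sched_getaffinity(0)
```

Calling `pin_to_cpus()` once at startup, before any timing, matches the order of steps described in this document.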
Interactions with Other Parts of the System
Data Fixtures:
The script depends on the presence of compressed JSON fixtures in the `data` directory relative to the project root. These fixtures are shared resources used across various benchmarking scripts.
Benchmarking Framework:
Although this script is standalone, it complements other benchmarking scripts in the project (such as `bench/benchmark_dumps.py`) that perform more detailed and automated benchmarking using `pytest` and `pytest-benchmark`.
JSON Libraries:
The script compares `orjson` (the project's Rust-backed JSON library) against Python's standard `json` module, providing direct performance comparisons that validate orjson's efficiency claims.
Result Reporting:
Uses the third-party `tabulate` package to produce human-readable tables, which can be further consumed by other reporting or visualization tools.
Mermaid Diagram: Flowchart of Main Functions and Workflow
flowchart TD
A[Start Script] --> B["Set CPU Affinity to cores {0,1}"]
B --> C["Load JSON Fixture 'twitter.json.xz'"]
C --> D{For each library in LIBRARIES}
D --> E1["If library == 'json'"]
D --> E2["If library == 'orjson'"]
E1 --> F1["Time json.dumps (unsorted & sorted)"]
E2 --> F2["Time orjson.dumps (unsorted & sorted with OPT_SORT_KEYS)"]
F1 --> G[Calculate per-iteration latency]
F2 --> G
G --> H[Calculate performance ratio vs orjson]
H --> I[Append results to table]
I --> D
D -->|All libraries done| J[Format table with tabulate]
J --> K[Print results]
K --> L[End Script]
Summary
The [pysort](/projects/287/67683) script provides a concise and practical benchmarking tool focused on evaluating the serialization speed of JSON data between Python's built-in `json` library and the `orjson` library. It emphasizes consistent measurement through multiple iterations and CPU affinity management, producing clear tabular results that facilitate performance comparison and decision-making regarding JSON serialization strategies in Python applications.