pynonstr


Overview

The `pynonstr` script is a standalone Python benchmark that compares JSON serialization performance across two libraries: the standard library `json` module and the high-performance `orjson` library. It specifically tests serialization of large datasets whose dictionaries mix integer and string keys. JSON object keys must be strings, so a serializer has to either convert non-string keys or reject them.
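To make the mixed-key problem concrete, here is a minimal sketch using only the standard library `json` module. The `mixed` dictionary is a hypothetical two-entry sample, not data from the script. It shows that integer keys are coerced to strings on output, and that asking `json` to sort mixed integer and string keys fails because Python 3 cannot order an `int` against a `str`.

```python
import json

# Hypothetical sample: one integer (timestamp-like) key and one string key.
mixed = {1577836800: 1, "other": 0}

# json.dumps coerces the integer key to a string in the output.
encoded = json.dumps(mixed)

# Sorting requires comparing the keys, and int vs. str is not orderable.
try:
    json.dumps(mixed, sort_keys=True)
    sort_error = None
except TypeError as exc:
    sort_error = exc
```

After this runs, `encoded` holds a JSON object whose keys are all strings, and `sort_error` holds the `TypeError` raised by the sorted call.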

The script generates synthetic date-indexed data spanning 100 years (1920–2019), each year represented by a dictionary mapping timestamps (integer keys) and a string key `"other"` to integer values. It benchmarks serialization latency for three cases, matching the columns of the output table: string keys, integer keys, and integer keys with sorting.

The benchmarking results are then formatted into a human-readable table using the `tabulate` library.

This file is designed to run as a standalone program and does not define reusable classes or functions outside its main benchmarking flow.


Detailed Explanation of Key Components

Global Constants and Variables


Affinity Setting

os.sched_setaffinity(os.getpid(), {0, 1})

Sets CPU affinity for the current process to cores 0 and 1 to reduce variability in timing measurements due to OS scheduling.
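`os.sched_setaffinity` exists only on some Unix platforms such as Linux, and the call fails if none of the requested cores is available to the process. The following is a defensive sketch, not the script's own code: the helper name `pin_to_available_cpus` is hypothetical, and it restricts the request to cores the process is already allowed to use.

```python
import os


def pin_to_available_cpus(preferred=(0, 1)):
    """Pin this process to whichever preferred CPUs it is allowed to use.

    Returns the resulting affinity set, or None where affinity is unsupported.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. macOS or Windows
    allowed = os.sched_getaffinity(0)  # 0 means the calling process
    target = set(preferred) & allowed
    if target:
        try:
            os.sched_setaffinity(0, target)
        except OSError:
            pass  # keep the existing affinity if the request is refused
    return os.sched_getaffinity(0)


pinned = pin_to_available_cpus()
```

Passing `0` as the PID is equivalent to `os.getpid()` for these calls. If neither preferred core is usable, the sketch leaves the affinity unchanged rather than raising.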


Data Generation Logic

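import datetime
import random
from time import mktime

data_as_obj = []  # one dictionary per year
for year in range(1920, 2020):
    start = datetime.date(year, 1, 1)
    # 365 pairs of (local-midnight Unix timestamp, day number)
    array = [
        (int(mktime((start + datetime.timedelta(days=i)).timetuple())), i + 1)
        for i in range(365)
    ]
    array.append(("other", 0))  # the single string key among the integer keys
    random.shuffle(array)  # randomize insertion order
    data_as_obj.append(dict(array))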

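This produces 100 dictionaries of 366 entries each (365 integer timestamp keys plus `"other"`), simulating large JSON-like data structures with mixed key types and shuffled insertion order.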


Serialization and Timing Utilities

per_iter_latency(val)

def per_iter_latency(val):
    if val is None:
        return None
    return (val * 1000) / ITERATIONS

test_correctness(serialized)

def test_correctness(serialized):
    return orjson.loads(serialized) == data_as_str
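The check compares against the string-keyed data because deserialized JSON always has string keys, whatever key types went in. A minimal sketch of the same idea follows, using the standard library `json` module as a stand-in for `orjson` so it runs without third-party packages. The names `int_keyed`, `str_keyed`, and `check_roundtrip` are hypothetical and the sample is a single small dictionary.

```python
import json

# Hypothetical miniature of the benchmark data: integer keys plus "other".
int_keyed = [{1577836800: 1, 1577923200: 2, "other": 0}]

# The same data with every key converted to a string.
str_keyed = [{str(k): v for k, v in d.items()} for d in int_keyed]


def check_roundtrip(serialized, expected):
    """Deserialize and compare against the expected structure."""
    return json.loads(serialized) == expected


serialized = json.dumps(int_keyed)
matches_str = check_roundtrip(serialized, str_keyed)
matches_int = check_roundtrip(serialized, int_keyed)
```

Here `matches_str` is true and `matches_int` is false, which is why the reference for a round-trip check has to be the string-keyed form.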

Benchmark Execution Loop

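The script iterates over each library in `LIBRARIES`. For `json`, it times `json.dumps` on the string-keyed and integer-keyed data. For `orjson`, it times `orjson.dumps`, using `OPT_NON_STR_KEYS` for integer keys and adding `OPT_SORT_KEYS` for the sorted case.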

Each timing is measured with `timeit` over `ITERATIONS` (500) runs.

The latencies are converted to milliseconds per iteration and stored in a results table.
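The measurement and conversion steps can be sketched as follows. This is an illustration under stated assumptions, not the script's loop: it times only the standard library `json` module, and `payload` is a small hypothetical stand-in for the benchmark data. `timeit.timeit` returns the total seconds for all runs, so the helper multiplies by 1000 and divides by the run count to get milliseconds per iteration.

```python
import json
import timeit

ITERATIONS = 500  # run count stated in this document

# Hypothetical small payload; the real script uses 100 years of data.
payload = [{str(i): i for i in range(366)} for _ in range(10)]


def per_iter_latency(val):
    """Convert total seconds over ITERATIONS runs to milliseconds per run."""
    if val is None:
        return None
    return (val * 1000) / ITERATIONS


# Total wall-clock seconds for ITERATIONS serializations.
total_seconds = timeit.timeit(lambda: json.dumps(payload), number=ITERATIONS)
latency_ms = per_iter_latency(total_seconds)
```

For example, a total of 0.75 seconds over 500 runs gives 1.5 ms per iteration. The `None` branch lets a benchmark that was skipped for a library pass through as an empty table cell.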


Output

| Library | str keys (ms) | int keys (ms) | int keys sorted (ms) |
|---------|---------------|---------------|----------------------|
| orjson  | ...           | ...           | ...                  |
| json    | ...           | ...           |                      |

Usage Example

Run the script directly from the command line:

./pynonstr

Expected output is the benchmark table described under Output, with measured latencies in place of the placeholders.


Important Implementation Details and Algorithms


Interaction with Other Parts of the System


Mermaid Class/Component Diagram

Since this file is a standalone utility script without defined classes or components, a **flowchart** representing the main functions and their relationships is most appropriate:

flowchart TD
    A["Generate Data (data_as_obj)"] --> B["Shuffle Keys and Create Dict"]
    B --> C["Convert to JSON Object with String Keys (data_as_str)"]
    C --> D["Set CPU Affinity to cores 0 & 1"]
    D --> E["Measure Serialized Size (orjson)"]
    E --> F["Benchmark Loop Over Libraries"]
    F --> G{"Library == json?"}
    G -- Yes --> H["Time json.dumps on str keys and int keys"]
    G -- No --> I["Time orjson.dumps with OPT_NON_STR_KEYS and OPT_SORT_KEYS"]
    H & I --> J["Test Correctness (orjson only)"]
    J --> K["Calculate Per-Iteration Latency"]
    K --> L["Store Results in Table"]
    L --> M["Print Tabulated Benchmark Report"]

Summary

`pynonstr` is a focused Python benchmark script that evaluates JSON serialization speed and correctness for dictionaries with mixed key types, with and without key sorting, comparing the high-performance `orjson` library against the standard Python `json` module. It uses synthetic date-keyed data, CPU affinity settings, and repeated timing measurements to reduce noise in its latency metrics, and prints a formatted comparison table. The script complements a larger benchmarking project aimed at validating and showcasing orjson’s performance advantages in realistic serialization scenarios.