pydataclass


Overview

The `pydataclass` script is a Python benchmarking utility designed to measure and compare the serialization performance of Python dataclasses versus dictionaries using two JSON libraries: the standard `json` module and the high-performance Rust-backed `orjson` library. It benchmarks the time taken to serialize a large collection of complex nested objects, represented both as Python dataclasses and as dictionaries, repeatedly over multiple iterations.

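The key goals of this script are to compare the serialization latency of dataclasses against equivalent dictionaries, and to compare the `json` and `orjson` libraries on the same data.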

The script outputs the serialized data size in KiB and presents a table summarizing the serialization latency (in milliseconds) for each library and data format, along with relative performance comparisons against `orjson`.


Detailed Explanation

Imports and Setup

os.sched_setaffinity(os.getpid(), {0, 1})

*This restricts the process to CPU cores 0 and 1 to reduce variability in timing results.*
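`os.sched_setaffinity` is only available on some Unix platforms such as Linux, so the call above fails elsewhere. A minimal sketch of a guarded version follows; the helper name `pin_to_cpus` is hypothetical and not part of the script.

```python
import os


def pin_to_cpus(cpus):
    """Pin the current process to `cpus` where the platform supports it.

    Returns the set of CPUs actually applied, or None if pinning was skipped.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # the call is not exposed on this platform
    available = os.sched_getaffinity(0)
    wanted = set(cpus) & available
    if not wanted:
        return None  # none of the requested cores are usable here
    os.sched_setaffinity(0, wanted)  # pid 0 means "the calling process"
    return wanted


applied = pin_to_cpus({0, 1})
print(applied)
```

Intersecting with `os.sched_getaffinity(0)` avoids an error when the process is already confined to other cores, for example inside a container.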


Classes

Member

@dataclasses.dataclass
class Member:
    id: int
    active: bool

Object

@dataclasses.dataclass
class Object:
    id: int
    name: str
    members: list[Member]

Data Preparation

objects_as_dataclass = [
    Object(i, str(i) * 3, [Member(j, True) for j in range(10)])
    for i in range(100000, 102000)
]
objects_as_dict = [dataclasses.asdict(each) for each in objects_as_dataclass]
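To make the shape of the benchmark data concrete, the sketch below builds a single `Object` and converts it with `dataclasses.asdict`, which recurses into the nested `Member` list. It uses two members rather than ten purely to keep the output short.

```python
import dataclasses


@dataclasses.dataclass
class Member:
    id: int
    active: bool


@dataclasses.dataclass
class Object:
    id: int
    name: str
    members: list[Member]


# Two members instead of ten, to keep the output readable.
sample = Object(100000, str(100000) * 3, [Member(j, True) for j in range(2)])
sample_as_dict = dataclasses.asdict(sample)
print(sample_as_dict)
```

Each `name` is the six-digit id repeated three times, so every object carries an 18-character string alongside its nested members.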

Serialization Size Output

The script first outputs the size of the serialized data (in KiB) when serialized using `orjson` on the dictionary representation:

output_in_kib = len(orjson.dumps(objects_as_dict)) / 1024
print(f"{output_in_kib:,.0f}KiB output (orjson)")

This gives a rough sense of the serialized JSON payload size for the benchmarked data.
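The payload size can be cross-checked without `orjson`. The sketch below is an approximation under one assumption: that `json.dumps` with compact separators produces the same bytes as `orjson.dumps` for this data, which holds only because the data contains nothing but ASCII strings, integers, and booleans.

```python
import json

# Same values as the benchmark data, built directly as dictionaries.
members = [{"id": j, "active": True} for j in range(10)]
objects_as_dict = [
    {"id": i, "name": str(i) * 3, "members": members}
    for i in range(100000, 102000)
]

# separators=(",", ":") drops the spaces that json.dumps inserts by default.
payload = json.dumps(objects_as_dict, separators=(",", ":")).encode("utf-8")
output_in_kib = len(payload) / 1024
size_line = f"{output_in_kib:,.0f}KiB output (json, compact)"
print(size_line)
```

Each object serializes to 283 bytes, so the 2,000-object list comes to 568,001 bytes, or about 555 KiB.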


Custom Default Function for json.dumps

def default(__obj):
    if dataclasses.is_dataclass(__obj):
        return dataclasses.asdict(__obj)
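`json.dumps` calls its `default` hook for any object it cannot serialize itself. As written, the function above falls through and returns `None` for anything that is not a dataclass, which `json` would then encode as `null`. That is harmless for this benchmark's data. A stricter variant might raise `TypeError` instead; the name `strict_default` below is hypothetical and not part of the script.

```python
import dataclasses
import json


@dataclasses.dataclass
class Member:
    id: int
    active: bool


def strict_default(obj):
    # Hypothetical stricter variant of the script's `default` hook.
    # is_dataclass() is also true for dataclass types, so exclude classes.
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return dataclasses.asdict(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


encoded = json.dumps([Member(1, True)], default=strict_default)
print(encoded)  # [{"id": 1, "active": true}]

try:
    json.dumps({1, 2}, default=strict_default)
except TypeError as exc:
    rejected = str(exc)
    print(rejected)
```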

Benchmark Setup

headers = ("Library", "dict (ms)", "dataclass (ms)", "vs. orjson")
LIBRARIES = ("orjson", "json")
ITERATIONS = 100

Utility Function: per_iter_latency

def per_iter_latency(val):
    if val is None:
        return None
    return (val * 1000) / ITERATIONS
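As a worked example, `per_iter_latency` takes the total number of seconds for all iterations and returns milliseconds per iteration. The 1.5-second total below is an assumed figure for illustration, not a measured result.

```python
ITERATIONS = 100


def per_iter_latency(val):
    if val is None:
        return None
    return (val * 1000) / ITERATIONS


total_seconds = 1.5  # assumed total for 100 runs, for illustration only
print(per_iter_latency(total_seconds))  # 15.0
print(per_iter_latency(None))  # None
```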

Benchmarking Loop

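For each library in `LIBRARIES`, the script measures serialization latency for both representations of the data: the list of dictionaries and the list of dataclass instances.

For `json`, dataclass instances are converted through the custom `default` function described above.

For `orjson`, dataclass instances are serialized through the library's native dataclass support.

Each measurement is taken with `timeit.timeit` over `ITERATIONS` runs. The resulting total time is converted to milliseconds per iteration by `per_iter_latency`.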
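The measurement pattern can be sketched with the standard library alone. This is a scaled-down illustration, not the script's loop: it times only `json`, uses 200 objects and 5 iterations so that it finishes quickly, and assumes an `orjson` timing would follow the same shape with `orjson.dumps` in place of `json.dumps`.

```python
import dataclasses
import json
from timeit import timeit

ITERATIONS = 5  # far fewer than the script's 100, to keep the sketch quick


@dataclasses.dataclass
class Member:
    id: int
    active: bool


@dataclasses.dataclass
class Object:
    id: int
    name: str
    members: list[Member]


def default(obj):
    if dataclasses.is_dataclass(obj):
        return dataclasses.asdict(obj)


objects_as_dataclass = [
    Object(i, str(i) * 3, [Member(j, True) for j in range(10)])
    for i in range(100000, 100200)  # 200 objects instead of 2,000
]
objects_as_dict = [dataclasses.asdict(each) for each in objects_as_dataclass]

# timeit returns the total seconds spent across all `number` calls.
as_dict = timeit(lambda: json.dumps(objects_as_dict), number=ITERATIONS)
as_dataclass = timeit(
    lambda: json.dumps(objects_as_dataclass, default=default),
    number=ITERATIONS,
)

for label, total in (("dict", as_dict), ("dataclass", as_dataclass)):
    print(f"json {label}: {total * 1000 / ITERATIONS:.2f} ms per iteration")
```

Both calls produce the same JSON text, so any gap between the two timings reflects the cost of converting dataclasses through the `default` hook.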


Relative Performance Calculation
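The example figures in this document are consistent with the "vs. orjson" column being each library's dataclass latency divided by `orjson`'s dataclass latency. This is an inference from the sample numbers, not a confirmed description of the script. A minimal sketch under that assumption:

```python
# Per-iteration dataclass latencies in ms, copied from the example table.
dataclass_ms = {"orjson": 15.67, "json": 345.67}

baseline = dataclass_ms["orjson"]
# Rounding to a whole number is an assumption; the script may truncate instead.
vs_orjson = {lib: round(ms / baseline) for lib, ms in dataclass_ms.items()}
print(vs_orjson)  # {'orjson': 1, 'json': 22}
```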


Results Table Formatting and Output

Example output table:

| Library | dict (ms) | dataclass (ms) | vs. orjson |
|---------|-----------|----------------|------------|
| orjson  | 12.34     | 15.67          | 1          |
| json    | 234.56    | 345.67         | 22         |
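A pipe-delimited table like the one this script prints can be produced with plain string padding. The sketch below is one way to do it, not necessarily how the script formats its output. The helper `format_table` is hypothetical, and the rows reuse the example figures from this document.

```python
headers = ("Library", "dict (ms)", "dataclass (ms)", "vs. orjson")
rows = [
    ("orjson", "12.34", "15.67", "1"),
    ("json", "234.56", "345.67", "22"),
]


def format_table(headers, rows):
    # Hypothetical helper: pad every cell to the widest entry in its column.
    widths = [max(len(str(cell)) for cell in col) for col in zip(headers, *rows)]

    def line(cells):
        padded = (str(cell).ljust(width) for cell, width in zip(cells, widths))
        return "| " + " | ".join(padded) + " |"

    separator = "|" + "|".join("-" * (width + 2) for width in widths) + "|"
    return "\n".join([line(headers), separator, *(line(row) for row in rows)])


table = format_table(headers, rows)
print(table)
```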


Usage Example

Run the script directly:

./pydataclass

Expected output (example):

555KiB output (orjson)
| Library | dict (ms) | dataclass (ms) | vs. orjson |
|---------|-----------|----------------|------------|
| orjson  | 12.34     | 15.67          | 1          |
| json    | 234.56    | 345.67         | 22         |

Important Implementation Details


Interaction with Other System Components


Mermaid Diagram: Class Structure and Serialization Flow

classDiagram
    class Member {
        +int id
        +bool active
    }
    class Object {
        +int id
        +str name
        +list~Member~ members
    }
    Object *-- Member : contains
    class pydataclass_script {
        +objects_as_dataclass: list~Object~
        +objects_as_dict: list~dict~
        +default(obj): dict
        +per_iter_latency(val): float
        +main()
    }

Summary

The `pydataclass` script benchmarks JSON serialization performance between dataclasses and dictionaries across two popular JSON libraries (`orjson` and `json`). It highlights the performance benefits of using `orjson`'s native dataclass serialization support and provides a clear, tabulated comparison of serialization latencies. The script is a practical tool to understand serialization overheads when using Python dataclasses and to demonstrate the efficiency of Rust-backed JSON libraries in real-world, nested data scenarios.