run_default
Overview
The `run_default` file is a small Python benchmark script that measures how quickly the **orjson** library serializes a structure made of objects it cannot serialize natively. It exercises the custom fallback function (`default`) together with the option to serialize NumPy data types (`OPT_SERIALIZE_NUMPY`). The input is a nested list of references to an instance of a user-defined class, so every element has to be handled by the fallback.
The script sets CPU affinity for more consistent timing, builds the nested list, and repeatedly serializes it with orjson's `dumps`, passing both the fallback function and `OPT_SERIALIZE_NUMPY`. The resulting workload can be used to measure the overhead of fallback serialization in orjson.
File Purpose and Functionality
Purpose: Benchmark the serialization speed of a large nested list whose elements are of an unsupported type, using orjson’s fallback serialization mechanism.
Functionality:
Sets CPU affinity to cores 0 and 1 to reduce variability in timing measurements.
Defines a trivial `Custom` class whose instances are not natively serializable.
Implements a `default` function returning `None` to convert unsupported objects into JSON `null`.
Constructs a nested list `obj` holding 10,000 references to a `Custom` instance.
Executes the serialization `n` times (default 10,000, or a value supplied as a command-line argument).
Uses orjson’s `dumps` with the fallback and the `OPT_SERIALIZE_NUMPY` option to serialize the object.
Detailed Explanation of Components
Imports and CPU Affinity Setup
import sys
import os
os.sched_setaffinity(os.getpid(), {0, 1})
`sys`: Used to read the command-line argument.
`os`: Used to set CPU affinity to cores 0 and 1 with `os.sched_setaffinity`, which confines the script to those CPU cores, reducing timing noise and improving benchmark consistency.
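`os.sched_setaffinity` is not available on every platform (it is absent on macOS and Windows, for example), and cores 0 and 1 may not be usable in every environment. The following is a minimal sketch of a guarded variant, not part of the script itself; the helper name `pin_to_cpus` is hypothetical.

```python
import os


def pin_to_cpus(wanted):
    """Try to restrict the current process to the CPUs in `wanted`.

    Returns the resulting affinity set, or None when the platform
    does not provide os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # nothing to do on platforms without this call
    allowed = os.sched_getaffinity(0)  # 0 means "the calling process"
    # Only request CPUs the process may actually use; keep the current
    # set if none of the wanted CPUs is available.
    target = (set(wanted) & allowed) or allowed
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print(pin_to_cpus({0, 1}))
```

The script pins unconditionally, which is reasonable for a benchmark that always runs on a known Linux machine; the guard only matters when the same file is run elsewhere.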
Class: Custom
class Custom:
    pass
A minimal user-defined class with no attributes or methods.
Instances of `Custom` are not natively serializable by orjson or standard JSON serializers.
Acts as the "unsupported" type in the serialization benchmark.
Function: default
def default(_):
    return None
A fallback function passed to orjson's `dumps` to handle unsupported objects.
Takes a single parameter (the unsupported object) but ignores it.
Returns `None`, which orjson serializes as JSON `null`.
Prevents serialization errors by substituting unsupported objects with `null` in the output JSON.
**Usage Example:**
from orjson import dumps, OPT_SERIALIZE_NUMPY

class Custom:
    pass

def default(_):
    return None

obj = [[Custom()] * 1000] * 10
json_bytes = dumps(obj, default, OPT_SERIALIZE_NUMPY)
print(json_bytes)  # Output contains 'null' in place of each Custom reference
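Because the benchmark's `default` returns `None` for everything, it silently turns any unsupported object into `null`. The sketch below contrasts that with a fallback that raises `TypeError` for types it does not recognise. The names `to_json`, `permissive_default`, and `strict_default` are hypothetical, and the standard library's `json` module is used as a stand-in when orjson is not installed so that the example runs anywhere.

```python
import json

try:
    import orjson

    def to_json(obj, default):
        return orjson.dumps(obj, default)
except ImportError:  # stand-in so the sketch runs without orjson
    def to_json(obj, default):
        return json.dumps(obj, default=default, separators=(",", ":")).encode()


class Custom:
    pass


def permissive_default(_):
    # Same behaviour as the benchmark's `default`: everything becomes null.
    return None


def strict_default(value):
    # Hypothetical alternative: reject anything that is not recognised.
    raise TypeError(f"cannot serialize {type(value).__name__}")


print(to_json([1, Custom()], permissive_default))  # b'[1,null]'

try:
    to_json([1, Custom()], strict_default)
except TypeError as exc:
    print("rejected:", exc)
```

For a benchmark the permissive form is a sensible choice, since it keeps the fallback as cheap as possible and isolates the cost of calling it.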
Main Benchmark Logic
n = int(sys.argv[1]) if len(sys.argv) >= 2 else 10000
obj = [[Custom()] * 1000] * 10
for _ in range(n):
    dumps(obj, default, OPT_SERIALIZE_NUMPY)
Reads the number of iterations `n` from the first command-line argument; defaults to 10,000 if not provided.
Constructs `obj`, an outer list of 10 entries that all refer to one inner list, which in turn holds 1000 references to a single `Custom` instance.
Note: List multiplication copies references, not objects. `Custom()` is called once and the inner list is built once, so `obj` has 10,000 element slots but contains only one `Custom` object.
Runs a tight loop serializing `obj` `n` times using:
`dumps` from orjson.
The `default` fallback function to handle the unsupported `Custom` instance.
The option `OPT_SERIALIZE_NUMPY` (no NumPy arrays are present, so the flag presumably contributes only whatever overhead enabling it carries).
The loop itself records no timings; throughput is expected to be measured externally, for example with `time` or a profiler.
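The effect of list multiplication on `obj` can be checked by counting distinct objects by identity. This is a small illustrative sketch; the variable names other than `obj` and `Custom` are not from the script.

```python
class Custom:
    pass


obj = [[Custom()] * 1000] * 10

# Count distinct objects by identity rather than by position.
distinct_inner = {id(inner) for inner in obj}
distinct_custom = {id(item) for inner in obj for item in inner}
total_slots = sum(len(inner) for inner in obj)

print(len(distinct_inner))   # 1
print(len(distinct_custom))  # 1
print(total_slots)           # 10000

# A variant with one fresh instance per slot, for comparison.
fresh = [[Custom() for _ in range(1000)] for _ in range(10)]
print(len({id(item) for inner in fresh for item in inner}))  # 10000
```

The serializer still has to visit all 10,000 slots, so the shared references keep construction cheap without shrinking the amount of output.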
Important Implementation Details
CPU Affinity: Setting CPU affinity to cores 0 and 1 is a technique to improve benchmarking reliability by minimizing OS scheduling variability.
Fallback Function: Returning `None` from `default` converts unsupported Python objects into JSON `null`, preventing exceptions during serialization.
Use of `OPT_SERIALIZE_NUMPY`: Although the object contains no NumPy arrays, this flag is passed, presumably to capture any overhead it adds or any interaction with the fallback mechanism.
Object List Construction: The nested list contains repeated references to the same `Custom` object, so the input is cheap to build while every element is still of an unsupported type.
No Output or Timing in Script: The script does not print or return timing results; it is expected to be run with external timing tools or profilers.
Interaction with Other System Components
orjson Library: This script directly uses orjson's `dumps` serialization function with advanced options to test fallback serialization and NumPy support.
Benchmarking Framework: This file complements the broader benchmarking suite by providing a focused test case for fallback serialization performance.
CPU Affinity Control: The affinity setting aligns with other benchmark scripts that fix CPU cores for consistent results.
Custom Serialization Support: This script exemplifies the use of a fallback serialization function, a key feature tested and documented in the project.
Potential Integration: Timing and profiling results from running this script can be combined with other benchmark outputs to analyze orjson's performance characteristics under fallback serialization scenarios.
Usage
Run the script from the command line, optionally specifying the number of iterations:
python3 run_default 5000
This runs the serialization benchmark 5000 times. Without arguments, it defaults to 10,000 iterations.
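The argument handling can be isolated into a function to make its behaviour easy to check. This is a sketch that mirrors the script's one-line rule; the name `iterations_from_argv` is hypothetical.

```python
def iterations_from_argv(argv, fallback=10000):
    """Mirror the script's rule: argv[1] if present, else the default count."""
    return int(argv[1]) if len(argv) >= 2 else fallback


print(iterations_from_argv(["run_default", "5000"]))  # 5000
print(iterations_from_argv(["run_default"]))          # 10000

# A non-numeric argument is not handled specially and raises ValueError.
try:
    iterations_from_argv(["run_default", "many"])
except ValueError as exc:
    print("invalid iteration count:", exc)
```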
Example Output (Conceptual)
The script itself produces no output, but when run under a timing tool such as `time` or a profiler, you might see:
real 0m1.234s
user 0m1.200s
sys 0m0.020s
These illustrative figures show the kind of timing reported for 10,000 serializations with the fallback; they are not measured results.
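As an alternative to an external timer, the loop can be timed in-process with `time.perf_counter`. The sketch below is an assumption about how one might do this, not something the script does; `serialize` and `time_serialization` are hypothetical names, and the standard library's `json` module stands in for orjson when it is not installed.

```python
import json
import time

try:
    from orjson import dumps, OPT_SERIALIZE_NUMPY

    def serialize(obj, default):
        return dumps(obj, default, OPT_SERIALIZE_NUMPY)
except ImportError:  # stand-in so the sketch runs without orjson
    def serialize(obj, default):
        return json.dumps(obj, default=default).encode()


class Custom:
    pass


def default(_):
    return None


def time_serialization(n):
    """Return the elapsed seconds for n serializations of the benchmark object."""
    obj = [[Custom()] * 1000] * 10
    start = time.perf_counter()
    for _ in range(n):
        serialize(obj, default)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 100  # kept small here; the script's own default is 10000
    elapsed = time_serialization(n)
    print(f"{n} iterations in {elapsed:.4f}s ({elapsed / n * 1e6:.1f} us per call)")
```

In-process timing excludes interpreter start-up and import time, which an external `time` measurement includes.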
Mermaid Diagram: Structure of run_default
flowchart TD
    A["Set CPU affinity to cores 0,1"]
    B["Define class Custom (unsupported type)"]
    C["Define fallback function default, which returns None"]
    D["Parse iteration count n from argv (default 10000)"]
    E["Create nested list obj of Custom references"]
    F["Loop n times"]
    G["Call orjson.dumps(obj, default, OPT_SERIALIZE_NUMPY)"]
    A --> B --> C --> D --> E --> F --> G
This flowchart illustrates the sequential setup and execution steps in the script.
Summary
The `run_default` script benchmarks orjson’s serialization performance on a nested list of objects of an unsupported type, all of which are handled by a fallback serialization function. By setting CPU affinity and repeatedly serializing the same structure, it supports repeatable measurement of the overhead of orjson’s fallback mechanism with NumPy serialization support enabled. Within the benchmarking suite, this file is the test case focused on user-defined Python types.