run_func
Overview
`run_func` is a lightweight Python benchmarking utility script designed to repeatedly serialize or deserialize JSON data using the `orjson` library. It operates on a compressed JSON fixture file (in `.xz` format) and performs either serialization (`dumps`) or deserialization (`loads`) operations multiple times, as specified by the user. The script is optimized for consistent benchmarking by disabling garbage collection and setting CPU affinity to specific cores.
This tool is typically used as part of a broader benchmarking framework to measure throughput and performance of JSON serialization and deserialization, providing a quick, repeatable mechanism to gather timing data under controlled CPU conditions.
File Purpose and Functionality
- Purpose: Measure the speed of repeated serialization or deserialization operations on a JSON fixture using `orjson`.
- Input: Command line arguments specifying the JSON fixture file path, operation mode (`dumps` or `loads`), and optional iteration count.
- Output: The script itself does not print output but executes the requested operation `n` times to facilitate external timing or profiling.
- Optimization: Disables Python garbage collection and pins the process to CPU cores 0 and 1 to reduce variability in benchmarking results.
Detailed Explanation
Script Entry and Parameters
The script expects command line arguments:
./run_func <filename> <operation> [iterations]
- `filename` (str): Path to the `.xz` compressed JSON fixture file.
- `operation` (str): Either `"dumps"` to benchmark serialization or `"loads"` to benchmark deserialization.
- `iterations` (int, optional): Number of times to repeat the operation. Defaults to 1000 if omitted.
Example:
./run_func data/github.json.xz dumps 5000
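The argument handling described above could look roughly like the following. This is a minimal sketch, not the script's actual code: `parse_args` and `DEFAULT_ITERATIONS` are hypothetical names, and the validation of `operation` is an addition for illustration that the documentation does not describe.

```python
DEFAULT_ITERATIONS = 1000  # default iteration count stated in the documentation


def parse_args(argv):
    """Return (filename, operation, iterations) from an argv-style list.

    Hypothetical helper: the real script reads sys.argv at module level.
    """
    if len(argv) < 3:
        raise SystemExit("usage: run_func <filename> <operation> [iterations]")
    filename, operation = argv[1], argv[2]
    # Assumption: rejecting unknown operations is not documented behavior.
    if operation not in ("dumps", "loads"):
        raise SystemExit(f"unknown operation: {operation!r}")
    # The third argument is optional and falls back to the default.
    iterations = int(argv[3]) if len(argv) > 3 else DEFAULT_ITERATIONS
    return filename, operation, iterations


print(parse_args(["run_func", "data/github.json.xz", "dumps", "5000"]))
# → ('data/github.json.xz', 'dumps', 5000)
```

Passing an explicit list instead of reading `sys.argv` directly keeps the parsing logic easy to test in isolation.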
Internal Workflow
1. Disable garbage collection: `gc.disable()`. This avoids GC overhead affecting timing measurements.
2. Set CPU affinity: `os.sched_setaffinity(os.getpid(), {0, 1})`. This restricts the process to CPU cores 0 and 1 for consistent performance.
3. Load and decompress the fixture: `with lzma.open(filename, "r") as fileh: file_bytes = fileh.read()`. This reads the entire file, decompressed, into memory as bytes.
4. Branch by operation:
   - If the operation is `"dumps"`: deserialize the bytes into a Python object once with `file_obj = loads(file_bytes)`, then run repeated serialization of the in-memory object with `for _ in range(n): dumps(file_obj)`.
   - If the operation is `"loads"`: run repeated deserialization of the raw bytes with `for _ in range(n): loads(file_bytes)`.
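The load-and-branch steps above can be sketched as a single function. This is an illustrative reconstruction under stated assumptions, not the script itself: the real script runs at module level, `run` is a hypothetical name, and the fallback to the standard `json` module exists only so the sketch runs where `orjson` is not installed. The GC and affinity steps are left out here to keep the focus on the workload.

```python
import json
import lzma
import os
import tempfile

try:
    from orjson import dumps, loads  # the library the script is documented to use
except ImportError:
    # Assumption for illustration only: stand in with the standard library.
    loads = json.loads

    def dumps(obj):
        return json.dumps(obj).encode("utf-8")


def run(filename, operation, n=1000):
    """Repeat one JSON operation n times on a .xz fixture (hypothetical helper)."""
    # Decompress the whole fixture into memory once, before any looping.
    with lzma.open(filename, "r") as fileh:
        file_bytes = fileh.read()

    if operation == "dumps":
        file_obj = loads(file_bytes)  # parse once, outside the repeated loop
        for _ in range(n):
            dumps(file_obj)
    elif operation == "loads":
        for _ in range(n):
            loads(file_bytes)
    else:
        raise ValueError(f"unknown operation: {operation!r}")
    return len(file_bytes)  # returned only so callers can sanity-check the read


# Usage on a tiny throwaway fixture (not one of the suite's real fixtures).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny.json.xz")
    with lzma.open(path, "w") as fileh:
        fileh.write(b'{"a": [1, 2, 3]}')
    size = run(path, "dumps", n=10)
    run(path, "loads", n=10)
```

In the `dumps` branch the single `loads` call sits outside the loop, so only serialization is repeated.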
Functions and Methods
This script defines no classes or functions; all of its work is done by top-level module code.
Implementation Details and Algorithms
- Compression Handling: Uses `lzma.open` to transparently decompress `.xz` files, a common format for large JSON fixtures in the benchmarking suite.
- CPU Affinity: The use of `os.sched_setaffinity` ensures that the process runs on a fixed subset of CPU cores, reducing noise from OS scheduling and improving reproducibility.
- Garbage Collection Disabled: Prevents Python’s GC from running during the benchmark loop, eliminating unpredictable pauses.
- Repeated Calls for Benchmarking: Runs the target operation in a tight loop `n` times with no intermediate output, designed to be timed externally (e.g., with the `time` shell command or Python profiling tools).
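The two environment controls listed above can be combined into one setup step. The sketch below is a more defensive variant than the unconditional calls the script is described as making: `pin_and_quiet` is a hypothetical name, and the platform check and the intersection with the currently allowed cores are assumptions added so the example runs on systems where `os.sched_setaffinity` is unavailable or cores 0 and 1 are not permitted.

```python
import gc
import os


def pin_and_quiet(cores=(0, 1)):
    """Disable the cyclic GC and pin the process to `cores` where possible.

    Returns the resulting affinity set, or None if the platform has no
    affinity API. Hypothetical helper, not part of the script.
    """
    gc.disable()  # stops the cyclic collector; reference counting still runs
    if not hasattr(os, "sched_setaffinity"):
        return None  # the call is only available on some platforms, e.g. Linux
    allowed = os.sched_getaffinity(0)  # pid 0 means the calling process
    wanted = set(cores) & allowed
    if wanted:  # skip pinning if none of the requested cores are allowed
        os.sched_setaffinity(0, wanted)
    return os.sched_getaffinity(0)


affinity = pin_and_quiet()
print(affinity)
```

Pinning to a fixed pair of cores trades portability for repeatability, which is why the guard matters mainly when the benchmark is run outside its usual machine.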
Usage Example
Suppose you have a compressed fixture file `data/github.json.xz` and want to benchmark serialization performance over 2000 iterations:
./run_func data/github.json.xz dumps 2000
To benchmark deserialization 1000 times (default):
./run_func data/github.json.xz loads
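Because the script prints nothing, timing has to come from outside the process. One way to do that from Python is sketched below; `time_command` is a hypothetical helper, and the command it times here is a stand-in so the example runs anywhere.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv as a child process and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # raises if the child exits with an error
    return time.perf_counter() - start


# Stand-in command; in practice argv would be something like
# ["./run_func", "data/github.json.xz", "dumps", "2000"].
elapsed = time_command([sys.executable, "-c", "pass"])
print(f"{elapsed:.3f} s")
```

A wall-clock figure obtained this way also includes interpreter startup and fixture decompression, so comparing runs with different iteration counts is one way to estimate the per-iteration cost.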
Interaction with Other System Components
- JSON Fixtures: Reads JSON fixture files compressed in `.xz` format, which are shared across the benchmarking framework.
- orjson Library: Relies exclusively on `orjson` for serialization (`dumps`) and deserialization (`loads`).
- Benchmarking Framework: Acts as a utility to perform raw repeated JSON operations, often called by higher-level scripts or shell commands to gather timing or memory usage statistics.
- CPU Affinity and GC Settings: Consistent with other benchmarking scripts in the suite that fix CPU affinity and disable GC for comparable results.
- Complement to Other Benchmarks: Provides a simple, isolated benchmarking tool for serialization or deserialization throughput, complementing more complex pytest-based benchmarks and memory profiling scripts.
Mermaid Diagram: Flowchart of run_func Workflow
flowchart TD
A[Start] --> B[Parse Command Line Arguments]
B --> C[Disable Garbage Collection]
C --> D[Set CPU Affinity to cores 0 and 1]
D --> E[Open and Read .xz Compressed JSON Fixture]
E --> F{Operation?}
F -->|dumps| G[Deserialize bytes into Python object]
G --> H[Repeat n times: Serialize object with orjson.dumps]
F -->|loads| I[Repeat n times: Deserialize bytes with orjson.loads]
H --> J[End]
I --> J
Summary
`run_func` is a minimal, specialized benchmarking script designed to repeatedly serialize or deserialize JSON data from a compressed fixture file using `orjson`. It is optimized for consistent performance measurement through CPU affinity settings and garbage collection control. This script integrates with the larger benchmarking framework by providing a simple, repeatable workload that can be externally timed or profiled to assess orjson’s throughput characteristics.
Appendix: Key Modules Used
| Module | Purpose |
|---|---|
| `sys` | Access command line arguments |
| `lzma` | Read `.xz` compressed fixture files |
| `os` | Set CPU affinity for process |
| `gc` | Disable garbage collection |
| `orjson` | High-performance JSON serialization library |