run_func

Overview

`run_func` is a small Python benchmarking script that repeatedly serializes or deserializes JSON data with the `orjson` library. It reads a compressed JSON fixture file (`.xz` format) and runs either serialization (`dumps`) or deserialization (`loads`) a user-specified number of times. To keep measurements consistent, it disables garbage collection and pins the process to specific CPU cores.

This tool is typically used as part of a broader benchmarking framework to measure throughput and performance of JSON serialization and deserialization, providing a quick, repeatable mechanism to gather timing data under controlled CPU conditions.

File Purpose and Functionality

Purpose: Measure the speed of repeated serialization or deserialization operations on a JSON fixture using orjson.
Input: Command line arguments specifying the JSON fixture file path, operation mode (dumps or loads), and optional iteration count.
Output: The script prints nothing. It runs the requested operation n times so that an external tool can time or profile the process.
Optimization: Disables Python garbage collection and pins the process to CPU cores 0 and 1 to reduce variability in benchmarking results.

Detailed Explanation

Script Entry and Parameters

The script expects command line arguments:

./run_func <filename> <operation> [iterations]

filename (str): Path to the .xz compressed JSON fixture file.
operation (str): Either "dumps" to benchmark serialization or "loads" to benchmark deserialization.
iterations (int, optional): Number of times to repeat the operation. Defaults to 1000 if omitted.

Example:

./run_func data/github.json.xz dumps 5000
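
The argument handling described above can be sketched as follows. This is a minimal illustration, not the script's actual source: `parse_args` and `DEFAULT_ITERATIONS` are hypothetical names (the script itself works at top level), and the validation of the operation name is an assumption added for clarity.

```python
DEFAULT_ITERATIONS = 1000  # default stated in the parameter list above


def parse_args(argv):
    """Return (filename, operation, iterations) from an argv-style list.

    Hypothetical helper: the real script reads its arguments at top level.
    """
    if len(argv) < 3:
        raise SystemExit("usage: run_func <filename> <operation> [iterations]")
    filename, operation = argv[1], argv[2]
    # Assumption: reject anything other than the two documented modes.
    if operation not in ("dumps", "loads"):
        raise SystemExit(f"unknown operation: {operation!r}")
    # The iteration count is optional and falls back to the default.
    iterations = int(argv[3]) if len(argv) > 3 else DEFAULT_ITERATIONS
    return filename, operation, iterations


print(parse_args(["run_func", "data/github.json.xz", "dumps", "5000"]))
print(parse_args(["run_func", "data/github.json.xz", "loads"]))
```

In a real script the list passed in would be `sys.argv`.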

Internal Workflow

Disable Garbage Collection
```
gc.disable()
```
This keeps GC overhead from affecting timing measurements.
Set CPU Affinity
```
os.sched_setaffinity(os.getpid(), {0, 1})
```
Restricts the process to CPU cores 0 and 1 for consistent performance.
Load and Decompress Fixture
```
with lzma.open(filename, "r") as fileh:
    file_bytes = fileh.read()
```
Reads the entire compressed file content into memory as bytes.
Branch by Operation
- If operation is "dumps":
  - Deserialize bytes into a Python object once:
```
file_obj = loads(file_bytes)
```
  - Run repeated serialization (dump) of the in-memory object:
```
for _ in range(n):
    dumps(file_obj)
```
- If operation is "loads":
  - Run repeated deserialization (load) of the raw bytes:
```
for _ in range(n):
    loads(file_bytes)
```
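
The steps above can be sketched as one function. This is a minimal sketch under stated assumptions, not the script's actual source: `run` and `pin_to_cores` are hypothetical names, the stdlib `json` fallback exists only so the example runs where `orjson` is not installed, and the affinity guard and the error on an unknown operation are additions for portability.

```python
import gc
import lzma
import os
import tempfile

try:
    from orjson import dumps, loads
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without orjson.
    from json import dumps, loads


def pin_to_cores(cores):
    """Pin this process to `cores` where possible (hypothetical helper)."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # call is not available on every platform
    try:
        # Only request cores this process is already allowed to use.
        allowed = os.sched_getaffinity(0) & set(cores)
        if not allowed:
            return False
        os.sched_setaffinity(os.getpid(), allowed)
    except OSError:
        return False  # e.g. a sandbox that forbids changing affinity
    return True


def run(filename, operation, n=1000):
    """Repeat one operation n times and return the iteration count."""
    gc.disable()           # step 1: no cyclic GC pauses during the loop
    pin_to_cores({0, 1})   # step 2: cores 0 and 1, as in the script
    with lzma.open(filename, "r") as fileh:  # step 3: decompress fixture
        file_bytes = fileh.read()
    if operation == "dumps":
        file_obj = loads(file_bytes)  # parse once, outside the loop
        for _ in range(n):
            dumps(file_obj)
    elif operation == "loads":
        for _ in range(n):
            loads(file_bytes)
    else:
        # Assumption: the original's behavior for other values is unknown.
        raise ValueError(f"unknown operation: {operation!r}")
    return n


# Tiny made-up fixture so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny.json.xz")
    with lzma.open(path, "w") as fh:
        fh.write(b'{"id": 1, "tags": ["a", "b"]}')
    completed = run(path, "loads", n=10) + run(path, "dumps", n=10)

print(completed)
```

Parsing once before the `dumps` loop keeps deserialization cost out of the repeated work, so the loop exercises serialization only.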

Functions and Methods

This script does not define any classes or functions but performs all operations at the top-level procedural code scope.

Implementation Details and Algorithms

Compression Handling: Uses lzma.open to transparently decompress .xz files, a common format for large JSON fixtures in the benchmarking suite.
CPU Affinity: os.sched_setaffinity pins the process to a fixed subset of CPU cores, which reduces noise from OS scheduling and improves reproducibility. The call is only available on platforms that support it, such as Linux.
Garbage Collection Disabled: gc.disable() turns off Python's cyclic garbage collector, so collection passes cannot cause unpredictable pauses during the benchmark loop. In CPython, reference counting still frees objects as usual.
Repeated Calls for Benchmarking: Runs the target operation in a tight loop n times with no intermediate output, designed to be timed externally (e.g., with the shell `time` command or Python profiling tools).
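
Because the script is timed from outside, one possible driver is a short Python harness like the one below. It is a sketch only: `time_command` and `per_iteration_cost` are hypothetical helpers, and the command being timed is a stand-in (`python -c pass`) so the example runs anywhere. A wall-clock measurement of the whole process also includes interpreter startup and fixture decompression, so comparing two runs with different iteration counts is one way to estimate the per-iteration cost.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd to completion and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def per_iteration_cost(t_small, n_small, t_large, n_large):
    """Estimate seconds per iteration from two runs.

    Fixed costs (startup, decompression) cancel in the subtraction.
    """
    return (t_large - t_small) / (n_large - n_small)


# Stand-in command; for a real measurement substitute something like
# ["./run_func", "data/github.json.xz", "dumps", "2000"].
elapsed = time_command([sys.executable, "-c", "pass"])
print(f"{elapsed:.3f} s")

# Worked example with made-up timings: 1.0 s for 1000 iterations and
# 3.0 s for 5000 iterations gives (3.0 - 1.0) / 4000 seconds each.
print(per_iteration_cost(1.0, 1000, 3.0, 5000))
```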

Usage Example

Suppose you have a compressed fixture file `data/github.json.xz` and want to benchmark serialization performance over 2000 iterations:

./run_func data/github.json.xz dumps 2000

To benchmark deserialization with the default 1000 iterations:

./run_func data/github.json.xz loads

Interaction with Other System Components

JSON Fixtures: Reads JSON fixture files compressed in .xz format, which are shared across the benchmarking framework.
orjson Library: Relies exclusively on orjson for serialization (dumps) and deserialization (loads).
Benchmarking Framework: Acts as a utility to perform raw repeated JSON operations, often called by higher-level scripts or shell commands to gather timing or memory usage statistics.
CPU Affinity and GC Settings: Consistent with other benchmarking scripts in the suite that fix CPU affinity and disable GC for comparable results.
Complement to Other Benchmarks: Provides a simple, isolated benchmarking tool for serialization or deserialization throughput, complementing more complex pytest-based benchmarks and memory profiling scripts.
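
For readers who want a fixture of their own, the sketch below writes a small `.xz` compressed JSON file and reads it back the same way the script does. The payload and the file name `example.json.xz` are made up for illustration, and the stdlib `json` module is used here only to keep the example free of third-party dependencies.

```python
import json
import lzma
import os
import tempfile

# Made-up data; real fixtures in the suite are much larger.
payload = {"repo": "example", "stars": 3, "topics": ["json", "bench"]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.json.xz")  # hypothetical fixture name
    # Write UTF-8 encoded JSON through an xz compressor.
    with lzma.open(path, "w") as fh:
        fh.write(json.dumps(payload).encode("utf-8"))
    # Read it back as raw bytes, as the script does, then parse.
    with lzma.open(path, "r") as fh:
        restored = json.loads(fh.read())

print(restored == payload)
```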

Mermaid Diagram: Flowchart of run_func Workflow

flowchart TD
    A[Start] --> B[Parse Command Line Arguments]
    B --> C[Disable Garbage Collection]
    C --> D[Set CPU Affinity to cores 0 and 1]
    D --> E[Open and Read .xz Compressed JSON Fixture]
    E --> F{Operation?}
    F -->|dumps| G[Deserialize bytes into Python object]
    G --> H[Repeat n times: Serialize object with orjson.dumps]
    F -->|loads| I[Repeat n times: Deserialize bytes with orjson.loads]
    H --> J[End]
    I --> J

Summary

`run_func` is a minimal, specialized benchmarking script designed to repeatedly serialize or deserialize JSON data from a compressed fixture file using `orjson`. It is optimized for consistent performance measurement through CPU affinity settings and garbage collection control. This script integrates with the larger benchmarking framework by providing a simple, repeatable workload that can be externally timed or profiled to assess orjson’s throughput characteristics.

Appendix: Key Modules Used

| Module | Purpose |
| --- | --- |
| `sys` | Access command line arguments |
| `lzma` | Read `.xz` compressed fixture files |
| `os` | Set CPU affinity for process |
| `gc` | Disable garbage collection |
| `orjson` | High-performance JSON serialization library |
