util.py
Overview
`util.py` is a utility module designed to support efficient loading and caching of JSON fixture data for benchmarking purposes within the project. It provides functions to read raw fixture files—supporting both plain and `.xz` compressed formats—and to parse these fixtures into Python objects using the high-performance `orjson` library. Additionally, the module attempts to optimize benchmarking consistency by setting the CPU affinity of the running process to a fixed set of cores (if supported by the operating system), reducing runtime variability caused by CPU scheduling.
This file is critical in the benchmarking framework for:
Minimizing redundant I/O and decompression overhead via caching.
Providing fast access to fixtures in bytes or deserialized form.
Reducing noise in benchmark timings through CPU affinity.
Implementation Details
CPU Affinity Setting: Upon import, if the `os` module supports `sched_setaffinity`, the process's CPU affinity is restricted to CPU cores 0 and 1. This limits the operating system scheduler to these cores, aiming for consistent performance measurements during benchmarking runs.
Fixture Directory: The `dirname` variable points to the `../data` directory, resolved relative to the directory that contains this file, where JSON fixture files are stored.
Caching with `functools.cache`: Both fixture-reading functions are decorated with `@cache` to memoize results. This avoids repeated file reads and decompressions when the same fixture is requested more than once within a process.
File Format Handling: Fixtures ending with `.xz` are decompressed transparently using `lzma.decompress`. Other files are read as raw bytes.
JSON Parsing: `read_fixture_obj` loads JSON objects from raw bytes using `orjson.loads`, leveraging orjson's performance advantage over standard JSON libraries.
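The affinity step can be illustrated with a slightly more defensive sketch. The helper name `pin_to_cpus` is hypothetical and not part of `util.py`. Unlike the module, which passes `{0, 1}` directly, this version intersects the requested cores with those the process is currently allowed to use, on the assumption that a benchmark might run inside a restricted CPU set.

```python
import os


def pin_to_cpus(wanted=frozenset({0, 1})):
    """Best-effort pinning (hypothetical helper, not part of util.py).

    Returns the CPU set in effect afterwards, or None if the platform
    does not expose the affinity calls.
    """
    if not (hasattr(os, "sched_setaffinity") and hasattr(os, "sched_getaffinity")):
        return None  # platform without these calls: silently skip, as util.py does

    allowed = os.sched_getaffinity(0)  # 0 refers to the calling process
    # Keep only requested cores that are actually available; if none are,
    # leave the current affinity unchanged rather than raising.
    target = (set(wanted) & allowed) or allowed
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


effective = pin_to_cpus()
```

On platforms that support the call, `effective` holds the cores the process may now run on; elsewhere it is `None`, mirroring the module's silent skip.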
Functions
read_fixture(filename: str) -> bytes
Reads the fixture file with the given filename from the data directory, decompressing it if it is an `.xz` compressed file, and returns the raw bytes.
**Parameters:**
`filename` (`str`): The name of the fixture file to read. May be a plain file or an `.xz`-compressed file.
**Returns:**
bytes: Raw bytes content of the fixture file, decompressed if necessary.
**Usage Example:**
raw_data = read_fixture("example.json.xz")
# raw_data is bytes representing the JSON content, decompressed if .xz
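To see the plain and `.xz` branches side by side without touching the real `../data` directory, here is a minimal sketch that writes two throwaway fixtures into a temporary directory. The helper `read_bytes_maybe_xz` and the sample payload are illustrative assumptions. The helper copies only the suffix check from `read_fixture` and omits the cache and the fixed directory.

```python
import lzma
import tempfile
from pathlib import Path


def read_bytes_maybe_xz(path: Path) -> bytes:
    """Hypothetical stand-in for read_fixture's branch on the file suffix."""
    raw = path.read_bytes()
    # Only the final suffix is inspected, so "example.json.xz" is decompressed.
    return lzma.decompress(raw) if path.suffix == ".xz" else raw


payload = b'{"id": 1, "tags": ["a", "b"]}'  # made-up fixture content

with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp, "example.json")
    packed = Path(tmp, "example.json.xz")
    plain.write_bytes(payload)
    packed.write_bytes(lzma.compress(payload))  # default container is .xz

    from_plain = read_bytes_maybe_xz(plain)
    from_packed = read_bytes_maybe_xz(packed)
```

Both reads yield the same bytes, which is why callers never need to know how a fixture is stored.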
read_fixture_obj(filename: str) -> Any
Reads the fixture file, decompresses if necessary, and parses it into a Python object using `orjson.loads`.
**Parameters:**
`filename` (`str`): The fixture file name to read and parse.
**Returns:**
Any: The Python representation of the JSON content (e.g., dict, list).
**Usage Example:**
obj = read_fixture_obj("example.json.xz")
# obj is a deserialized Python object (dict, list, etc.) from the JSON fixture
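One consequence of caching a parsed object is that every caller receives the same instance. The sketch below illustrates this with a hypothetical `load_obj` function, using the standard library's `json.loads` as a stand-in for `orjson.loads` so that it runs without third-party packages. The sharing behaviour comes from `functools.cache` and does not depend on the parser.

```python
import copy
import json
from functools import cache


@cache
def load_obj(text: str):
    # Stand-in for read_fixture_obj: parses text instead of a fixture file,
    # and uses json.loads where util.py uses orjson.loads.
    return json.loads(text)


SOURCE = '{"items": [1, 2, 3]}'

first = load_obj(SOURCE)
second = load_obj(SOURCE)
same_object = first is second      # the cache returns the identical dict

first["items"].append(4)           # a mutation through one reference...
leaked = list(second["items"])     # ...is visible through the other

# A benchmark that needs to mutate its input can work on a private copy.
private = copy.deepcopy(load_obj(SOURCE))
private["items"].clear()
still_cached = list(load_obj(SOURCE)["items"])
```

If a benchmark modifies the returned object, copying it first keeps the cached fixture intact for later callers.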
How This File Interacts with Other Parts of the System
Benchmark Scripts: Various benchmark test scripts use `read_fixture` and `read_fixture_obj` to load large JSON fixture files efficiently. The caching ensures these fixtures are loaded only once per process run, which is crucial for fast repeated benchmarking.
Data Directory (`../data`): This module depends on the presence of the `data` directory containing JSON fixture files, some of which are compressed in `.xz` format.
`orjson` Library: Utilizes `orjson` for high-performance JSON deserialization.
CPU Affinity: Helps normalize benchmark results by reducing CPU scheduling variability, which is important for fair performance comparisons.
Other Benchmark Utilities: Complements other utility functions that aggregate and analyze benchmark results by focusing solely on fixture I/O and caching.
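As a rough illustration of how a benchmark script might rely on this module, the sketch below loads a fixture once, outside the timed region, and then times only the parsing step. Everything here is an assumption made for illustration: `load_fixture_bytes` stands in for `read_fixture` and builds its data in memory, `best_of` is a hypothetical timing helper, and `json.loads` replaces whichever library is actually being measured.

```python
import json
import time
from functools import cache


@cache
def load_fixture_bytes(name: str) -> bytes:
    # Hypothetical stand-in for read_fixture: generates data in memory
    # instead of reading from the ../data directory.
    return json.dumps({"name": name, "values": list(range(1000))}).encode()


def best_of(func, arg, repeats: int = 5) -> float:
    """Return the fastest wall-clock time, in seconds, over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        timings.append(time.perf_counter() - start)
    return min(timings)


data = load_fixture_bytes("example.json")  # loading happens before timing starts
elapsed = best_of(json.loads, data)        # only deserialization is measured
```

Keeping fixture loading out of the timed loop means file I/O and decompression do not contribute to the measurement, which is the separation the cached readers make convenient.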
Summary
`util.py` is a lightweight yet important utility module that streamlines fixture file handling for the benchmarking framework. Its key strengths lie in transparent compression support, caching for performance, and system-level CPU affinity tuning for consistent benchmarking outcomes.
Mermaid Diagram: Utility Functions Flowchart
flowchart TD
A[Start: Request to Load Fixture] --> B{Is file .xz compressed?}
B -- Yes --> C[Read file bytes from ../data]
C --> D[Decompress with lzma]
D --> E[Cache fixture bytes]
B -- No --> F[Read file bytes from ../data]
F --> E
E --> G[Return raw bytes]
G --> H{Request to load JSON object?}
H -- Yes --> I[Parse bytes with orjson.loads]
I --> J[Cache parsed object]
J --> K[Return Python object]
H -- No --> L[Done: caller keeps the raw bytes]
Summary Table of Entities
| Entity | Type | Description |
|---|---|---|
| `dirname` | `str` | Path to the JSON fixture directory (`../data`) |
| `read_fixture` | function | Reads and caches raw bytes of fixtures, decompressing `.xz` files |
| `read_fixture_obj` | function | Reads and caches deserialized Python objects from fixtures |
| CPU Affinity Setting | system | Sets process CPU affinity to cores 0 and 1 for consistent timing |
Additional Notes
The use of `functools.cache` means that once a fixture is loaded during a process run, subsequent calls with the same filename return the cached result without re-reading the disk or decompressing.
CPU affinity setting is a best-effort optimization and only applied if the platform supports it (`os.sched_setaffinity`).
This module does not expose any classes; it focuses solely on utility functions and environmental setup.
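The once-per-process behaviour can be observed through the `cache_info()` and `cache_clear()` methods that `functools.cache` attaches to a decorated function. In this sketch, the `load` function and the `reads` list are hypothetical, and a real file read is replaced by recording the requested name.

```python
from functools import cache

reads = []  # records every time the function body actually runs


@cache
def load(name: str) -> bytes:
    reads.append(name)       # stands in for the disk read and decompression
    return name.encode()


load("a.json")
load("a.json")               # served from the cache; the body is skipped
load("b.json")
info = load.cache_info()     # counts hits and misses so far

load.cache_clear()           # forget everything, e.g. after fixtures change
load("a.json")               # the body runs again
```

Here `info` reports one hit and two misses, and `reads` ends up with three entries because the final call follows `cache_clear()`.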
Code Snippet Recap
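import lzma
import os
from functools import cache
from pathlib import Path
from typing import Any

import orjson

# Fixtures live in ../data, relative to the directory containing this file.
dirname = os.path.join(os.path.dirname(__file__), "../data")

# Pin the process to cores 0 and 1 where the platform supports it.
if hasattr(os, "sched_setaffinity"):
    os.sched_setaffinity(os.getpid(), {0, 1})

@cache
def read_fixture(filename: str) -> bytes:
    path = Path(dirname, filename)
    if path.suffix == ".xz":
        contents = lzma.decompress(path.read_bytes())
    else:
        contents = path.read_bytes()
    return contents

@cache
def read_fixture_obj(filename: str) -> Any:
    return orjson.loads(read_fixture(filename))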
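The `.xz` detection above depends on how `pathlib.Path.suffix` treats multi-part names. A short sketch of that behaviour follows; the file names are made up for illustration.

```python
from pathlib import Path

# suffix returns only the final extension, so a doubly-suffixed fixture matches.
compressed = Path("example.json.xz")
last_suffix = compressed.suffix          # ".xz"
all_suffixes = compressed.suffixes       # [".json", ".xz"]

# A plain fixture takes the raw-bytes branch.
plain_suffix = Path("example.json").suffix   # ".json"

# The comparison is case-sensitive, so an upper-case extension
# would NOT be decompressed by a check against ".xz".
upper_matches = Path("example.json.XZ").suffix == ".xz"
```

The check looks only at the last suffix, so any name ending in `.xz` is decompressed regardless of what precedes it.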