run_func


Overview

`run_func` is a lightweight Python benchmarking utility script designed to repeatedly serialize or deserialize JSON data using the `orjson` library. It operates on a compressed JSON fixture file (in `.xz` format) and performs either serialization (`dumps`) or deserialization (`loads`) operations multiple times, as specified by the user. The script is optimized for consistent benchmarking by disabling garbage collection and setting CPU affinity to specific cores.

This tool is typically used as part of a broader benchmarking framework to measure throughput and performance of JSON serialization and deserialization, providing a quick, repeatable mechanism to gather timing data under controlled CPU conditions.


File Purpose and Functionality


Detailed Explanation

Script Entry and Parameters

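The script takes two required positional arguments and an optional iteration count:

./run_func <filename> <operation> [iterations]

  • `filename`: path to the `.xz`-compressed JSON fixture.
  • `operation`: `dumps` to benchmark serialization, or `loads` to benchmark deserialization.
  • `iterations`: number of repetitions (default: 1000).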

Example:

./run_func data/github.json.xz dumps 5000
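
The argument handling can be sketched as follows. This sketch assumes plain positional parsing of `sys.argv`-style lists. The helper `parse_args`, the constant `DEFAULT_ITERATIONS`, and the usage message are hypothetical, because this document does not show the script's actual parsing code.

```python
# Hypothetical default, taken from the documented "loads" usage example.
DEFAULT_ITERATIONS = 1000


def parse_args(argv: list[str]) -> tuple[str, str, int]:
    """Parse `<filename> <operation> [iterations]`; argv[0] is the program name."""
    if len(argv) not in (3, 4):
        # SystemExit with a string prints it to stderr and exits with status 1.
        raise SystemExit("usage: run_func <filename> <operation> [iterations]")
    filename, operation = argv[1], argv[2]
    # Fall back to the default when the optional count is omitted.
    iterations = int(argv[3]) if len(argv) == 4 else DEFAULT_ITERATIONS
    return filename, operation, iterations


# Demo with the example invocation from this section.
print(parse_args(["./run_func", "data/github.json.xz", "dumps", "5000"]))
# → ('data/github.json.xz', 'dumps', 5000)
```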

Internal Workflow

  1. Disable Garbage Collection

    gc.disable()
    

    Prevents garbage-collection pauses from adding noise to the timing measurements.

  2. Set CPU Affinity

    os.sched_setaffinity(os.getpid(), {0, 1})
    

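    Restricts the process to CPU cores 0 and 1, which reduces run-to-run variance caused by the scheduler moving it between cores. `os.sched_setaffinity` is only available on some Unix platforms, such as Linux.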

  3. Load and Decompress Fixture

    with lzma.open(filename, "r") as fileh:
        file_bytes = fileh.read()
    

    Decompresses the fixture and reads the entire JSON document into memory as bytes. `lzma.open` decompresses transparently, and its "r" mode is binary.

  4. Branch by Operation

    • If operation is "dumps":

      • Deserialize bytes into a Python object once:

        file_obj = loads(file_bytes)
        
      • Run repeated serialization (dump) of the in-memory object:

        for _ in range(n):
            dumps(file_obj)
        
    • If operation is "loads":

      • Run repeated deserialization (load) of the raw bytes:

        for _ in range(n):
            loads(file_bytes)
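
Taken together, the four steps above can be sketched as below. The script itself runs them as top-level code. Here they are factored into hypothetical helpers (`pin_to_cores`, `read_fixture`, `run_workload`, `main`) so each step can be exercised on its own. The `loads`/`dumps` callables are passed in, so the sketch also runs with the standard-library `json` module when `orjson` is not installed.

Two behaviours are assumptions rather than documented behaviour, and the code comments mark them as such:

  • The affinity request is intersected with the CPUs the process may already use.
  • An unknown operation raises `ValueError`.

`gc.disable()` only turns off the cyclic garbage collector. Reference counting still frees most objects as soon as they become unreachable.

```python
import gc
import lzma
import os
from typing import Any, Callable


def pin_to_cores(cores: set[int]) -> None:
    """Step 2: restrict this process to `cores` where the platform allows it."""
    # os.sched_setaffinity is only available on some Unix platforms (e.g. Linux).
    if not hasattr(os, "sched_setaffinity"):
        return
    # Assumption: only request cores the process may already run on, so the
    # call does not fail on hosts or containers without CPUs 0 and 1.
    target = cores & os.sched_getaffinity(0)  # pid 0 means "this process"
    if target:
        os.sched_setaffinity(0, target)


def read_fixture(path: str) -> bytes:
    """Step 3: return the decompressed JSON bytes of an .xz fixture."""
    with lzma.open(path, "r") as fileh:  # "r" is binary mode for lzma.open
        return fileh.read()


def run_workload(
    file_bytes: bytes,
    operation: str,
    n: int,
    loads: Callable[[bytes], Any],
    dumps: Callable[[Any], Any],
) -> None:
    """Step 4: repeat the selected operation n times."""
    if operation == "dumps":
        file_obj = loads(file_bytes)  # parse once; only serialization repeats
        for _ in range(n):
            dumps(file_obj)
    elif operation == "loads":
        for _ in range(n):
            loads(file_bytes)  # re-parse the same raw bytes each time
    else:
        # Assumption: the original script's handling of other values is not documented.
        raise ValueError(f"unknown operation: {operation!r}")


def main(
    path: str,
    operation: str,
    n: int,
    loads: Callable[[bytes], Any],
    dumps: Callable[[Any], Any],
) -> None:
    gc.disable()  # Step 1: no cyclic-GC pauses while the loop runs
    pin_to_cores({0, 1})
    run_workload(read_fixture(path), operation, n, loads, dumps)
```

With `orjson` installed, `main("data/github.json.xz", "dumps", 5000, orjson.loads, orjson.dumps)` mirrors the example invocation. Passing `json.loads` and `json.dumps` instead exercises the same control flow with the standard library.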
        

Functions and Methods

The script defines no classes or functions; every step runs as top-level module code.


Implementation Details and Algorithms


Usage Example

Suppose you have a compressed fixture file `data/github.json.xz` and want to benchmark serialization performance over 2000 iterations:

./run_func data/github.json.xz dumps 2000

To benchmark deserialization with the default 1000 iterations:

./run_func data/github.json.xz loads
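
The script is meant to be timed or profiled externally, so a small wrapper can run it several times and report wall-clock durations. This is a minimal sketch that assumes only the standard library. `time_command` is a hypothetical helper. The stand-in command runs an empty Python process so the sketch works anywhere; replace it with a real invocation such as `./run_func data/github.json.xz loads`.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv: list[str], repeats: int = 5) -> list[float]:
    """Run `argv` `repeats` times; return each run's wall-clock time in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in command; swap in e.g. ["./run_func", "data/github.json.xz", "loads", "1000"].
samples = time_command([sys.executable, "-c", "pass"], repeats=3)
print(f"median {statistics.median(samples):.4f}s over {len(samples)} runs")
```

Each measurement also includes interpreter startup, fixture decompression and, for `dumps`, the single initial parse. Comparing runs with different iteration counts isolates the per-iteration cost. The median is reported because it is less sensitive to an occasional slow run than the mean.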

Interaction with Other System Components


Mermaid Diagram: Flowchart of run_func Workflow

flowchart TD
    A[Start] --> B[Parse Command Line Arguments]
    B --> C[Disable Garbage Collection]
    C --> D[Set CPU Affinity to cores 0 and 1]
    D --> E[Open and Read .xz Compressed JSON Fixture]
    E --> F{Operation?}
    F -->|dumps| G[Deserialize bytes into Python object]
    G --> H[Repeat n times: Serialize object with orjson.dumps]
    F -->|loads| I[Repeat n times: Deserialize bytes with orjson.loads]
    H --> J[End]
    I --> J

Summary

`run_func` is a minimal, specialized benchmarking script designed to repeatedly serialize or deserialize JSON data from a compressed fixture file using `orjson`. It is optimized for consistent performance measurement through CPU affinity settings and garbage collection control. This script integrates with the larger benchmarking framework by providing a simple, repeatable workload that can be externally timed or profiled to assess orjson’s throughput characteristics.


Appendix: Key Modules Used

Module     Purpose
`sys`      Access command-line arguments
`lzma`     Read `.xz`-compressed fixture files
`os`       Set the process's CPU affinity
`gc`       Disable garbage collection
`orjson`   High-performance JSON serialization library

