run_mem


Overview

The `run_mem` script measures the **memory consumption** of repeated JSON deserialization with a chosen JSON library. It decompresses a JSON fixture from an `.xz` file, deserializes it 100 times, and reads the process's Resident Set Size (RSS) immediately before and after that loop. It also verifies **correctness** by checking that a deserialize-then-serialize round trip through the library reproduces the original data.

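This script supports benchmarking memory usage for two JSON libraries:

  • `json`: the Python standard-library module.

  • `orjson`: the third-party orjson library.

Any other library name raises `NotImplementedError`.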

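By quantifying how much RSS grows across the repeated loads, the script helps assess each library's memory efficiency and expose leaks or retained overhead in its deserialization path. Each result is assigned to the same variable, so only the last parsed object is still referenced at the second measurement. Growth well beyond that object's size points to leaked or cached memory, or to memory the allocator has not returned to the operating system.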
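Rebinding the result on each iteration, rather than keeping every result alive, is what makes the delta meaningful. The sketch below contrasts the two cases. It is not how `run_mem` measures memory: it substitutes the standard library's `tracemalloc` for psutil's RSS reading, because `tracemalloc` counts only live Python allocations and ignores memory the allocator holds on to. `DOC` and `traced_growth` are hypothetical names.

```python
import json
import tracemalloc

# Hypothetical sample document standing in for a real .xz fixture.
DOC = json.dumps({"items": [{"id": i, "name": f"item-{i}"} for i in range(500)]})


def traced_growth(keep_all: bool, iterations: int = 100) -> int:
    """Bytes still allocated (per tracemalloc) after `iterations` json.loads calls."""
    kept = []
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        val = json.loads(DOC)  # rebinding drops the previous result's last reference
        if keep_all:
            kept.append(val)  # simulate a leak: every result stays reachable
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


rebound = traced_growth(keep_all=False)
retained = traced_growth(keep_all=True)
print(f"rebound only: {rebound} bytes; all results kept: {retained} bytes")
```

In the rebinding case, the remaining growth is roughly one parsed document. When every result is kept, growth scales with the iteration count, and that pattern is the signature a leak would leave.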


Detailed Explanation

Script Flow

  1. Input arguments:

    • filename: Path to a compressed .xz JSON fixture file.

    • lib_name: The JSON library to benchmark ("json" or "orjson").

  2. Load fixture data:

    • The JSON fixture is decompressed and read entirely into memory using the lzma module.

  3. Select JSON library:

    • Imports dumps and loads from the library named by lib_name; any other name raises NotImplementedError.

  4. Garbage collection:

    • Calls gc.collect() to minimize noise from leftover garbage before memory measurement.

  5. Memory measurement:

    • Uses psutil.Process() to get the current process memory info (RSS).

    • Records RSS memory before starting deserialization.

  6. Repeated deserialization:

    • Calls loads on the fixture 100 times, rebinding the result to val on each iteration so only the latest result stays referenced.

  7. Measure memory again:

    • Records RSS memory after deserialization loop.

  8. Calculate difference:

    • Computes the delta in memory usage (mem_diff).

  9. Correctness check:

    • Validates that deserializing and reserializing the fixture results in equivalent JSON data:

      • Loads the fixture with the standard json library.

      • Loads and dumps it with the tested library.

      • Parses that output with the standard json library and checks that it equals the result of the first step.

  10. Output results:

    • Prints three comma-separated values:

      • Memory before deserialization (in bytes)

      • Memory difference after deserialization loop (in bytes)

      • Correctness flag (1 if correct, 0 if incorrect)
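The correctness check in step 9 can be isolated as a small function, which makes it easy to exercise on its own. This is a sketch that mirrors the script's one-line check. The names `round_trip_correct` and `lossy_dumps` are hypothetical, and the lossy serializer exists only to show the failing case.

```python
import json
from typing import Any, Callable, Union

JSONText = Union[str, bytes]


def round_trip_correct(
    fixture: bytes,
    dumps: Callable[[Any], JSONText],
    loads: Callable[[JSONText], Any],
) -> int:
    """Return 1 if the library's loads/dumps round trip matches the stdlib parse."""
    reference = json.loads(fixture)  # standard json is the reference parser
    round_tripped = json.loads(dumps(loads(fixture)))  # library under test, re-parsed
    return 1 if reference == round_tripped else 0


sample = b'{"name": "orjson", "tags": ["fast", "correct"], "n": 3}'
print(round_trip_correct(sample, json.dumps, json.loads))  # 1


def lossy_dumps(obj: Any) -> str:
    # Hypothetical broken serializer that keeps only the "name" key.
    return json.dumps({"name": obj["name"]})


print(round_trip_correct(sample, lossy_dumps, json.loads))  # 0
```

The comparison re-parses the library's output with the standard library rather than comparing serialized text. Differences in whitespace or key formatting between libraries therefore do not count as failures.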


Code Breakdown

```python
#!/usr/bin/env python3
# SPDX-License-Identifier: (Apache-2.0 OR MIT)

import sys
import lzma
import gc
import psutil

filename = sys.argv[1]

# Decompress the .xz fixture fully into memory
with lzma.open(filename, "r") as fileh:
    fixture = fileh.read()  # fixture is bytes

proc = psutil.Process()

lib_name = sys.argv[2]
if lib_name == "json":
    from json import dumps, loads
elif lib_name == "orjson":
    from orjson import dumps, loads
else:
    raise NotImplementedError(f"unsupported library: {lib_name}")

gc.collect()

# Measure memory before deserialization
mem_before = proc.memory_info().rss

# Deserialize fixture 100 times; rebinding val drops the previous result
for _ in range(100):
    val = loads(fixture)

# Measure memory after deserialization (the last val is still alive)
mem_after = proc.memory_info().rss

mem_diff = mem_after - mem_before

# Correctness check: standard json serves as the reference parser
from json import loads as json_loads

correct = 1 if (json_loads(fixture) == json_loads(dumps(loads(fixture)))) else 0

# Print results
print(f"{mem_before},{mem_diff},{correct}")
```
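To try the script without the project's own fixtures, you can generate a small `.xz` fixture with the standard `lzma` module. The helper names and the output path below are hypothetical. `read_xz_fixture` opens the file the same way `run_mem` does: mode `"r"`, which `lzma.open` treats as binary.

```python
import json
import lzma
import tempfile
from pathlib import Path


def write_xz_fixture(path: Path, document: object) -> None:
    """Serialize `document` as UTF-8 JSON and xz-compress it to `path`."""
    with lzma.open(path, "wb") as fh:
        fh.write(json.dumps(document).encode("utf-8"))


def read_xz_fixture(path: Path) -> bytes:
    """Decompress the fixture to raw bytes, as run_mem does with mode "r"."""
    with lzma.open(path, "r") as fh:
        return fh.read()


if __name__ == "__main__":
    # Hypothetical output location; any writable path works.
    out = Path(tempfile.gettempdir()) / "sample.json.xz"
    write_xz_fixture(out, {"repos": [{"id": i, "stars": i * 10} for i in range(100)]})
    print(out, len(read_xz_fixture(out)), "bytes after decompression")
```

Pass the printed path as the first argument, for example `./run_mem <printed path> orjson`.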

Parameters

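| Parameter | Description | Type | Example |
|---|---|---|---|
| `filename` | Path to the `.xz`-compressed JSON fixture file (first positional argument) | `str` | `data/github.json.xz` |
| `lib_name` | The JSON library to use for deserialization (second positional argument) | `str` | `"json"` or `"orjson"` |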
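The script reads both parameters directly from `sys.argv` and rejects unknown library names with `NotImplementedError`. If you want a usage message and earlier validation, a hypothetical `argparse` front end (not part of `run_mem`) could look like the sketch below. `parse_args` is an illustrative name.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Hypothetical argparse front end for run_mem's two positional arguments."""
    parser = argparse.ArgumentParser(
        description="Measure RSS growth from repeated JSON deserialization."
    )
    parser.add_argument("filename", help="path to an .xz-compressed JSON fixture")
    parser.add_argument(
        "lib_name",
        choices=["json", "orjson"],  # the two library names run_mem accepts
        help="JSON library to benchmark",
    )
    return parser.parse_args(argv)


args = parse_args(["data/github.json.xz", "orjson"])
print(args.filename, args.lib_name)  # data/github.json.xz orjson
```

With `choices`, an unsupported name such as `ujson` makes `argparse` print a usage error and exit with status 2 before any fixture is loaded.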


Output

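Printed to standard output as a single comma-separated line:

```text
<mem_before>,<mem_diff>,<correct>
```

Both memory values are in bytes. RSS can also shrink, so `mem_diff` may be negative.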
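A caller that collects these lines can convert each one into typed fields. The sketch below does this. `MemResult` and `parse_run_mem_line` are hypothetical names, and the parser accepts a negative `mem_diff`.

```python
from typing import NamedTuple


class MemResult(NamedTuple):
    """One line of run_mem output (field names are illustrative)."""

    mem_before: int  # RSS in bytes before the loop
    mem_diff: int  # RSS change in bytes across 100 loads (may be negative)
    correct: bool  # True when the round-trip check passed


def parse_run_mem_line(line: str) -> MemResult:
    before, diff, correct = line.strip().split(",")
    if correct not in ("0", "1"):
        raise ValueError(f"unexpected correctness flag: {correct!r}")
    return MemResult(int(before), int(diff), correct == "1")


print(parse_run_mem_line("12345678,102400,1"))
# MemResult(mem_before=12345678, mem_diff=102400, correct=True)
```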

Usage Example

Assuming the script is executable and named `run_mem`:

```shell
./run_mem data/github.json.xz json
```

Example output (the numbers are illustrative and vary by machine and fixture):

```text
12345678,102400,1
```

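This means:

  • The process's RSS before the loop was 12,345,678 bytes (about 11.8 MiB).

  • RSS grew by 102,400 bytes (100 KiB) across the 100 deserializations.

  • The round-trip correctness check passed.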


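Important Implementation Details

  • The fixture is decompressed, the `psutil.Process` handle is created, and the library under test is imported before the baseline is taken, so none of that memory counts toward `mem_diff`. It is included in `mem_before`, however, which reports the whole process's RSS rather than the parser's cost alone.

  • `gc.collect()` runs immediately before the first measurement, so unreachable reference cycles left over from startup and imports are reclaimed first.

  • Each iteration rebinds `val`, so earlier results are freed by reference counting and only the final parsed object is still alive at the second measurement. `mem_diff` therefore covers three things: that object, any memory the allocator keeps instead of returning to the operating system, and any genuine leaks.

  • The correctness check runs after `mem_after` is recorded, so its allocations do not affect the reported delta.

  • The fixture is passed to `loads` as bytes; both `json.loads` and `orjson.loads` accept bytes input.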


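Interaction with Other Parts of the System

`run_mem` does not import any other project module. It depends only on the standard library, `psutil`, and `orjson` when that library is selected. It takes its inputs as two positional command-line arguments and reports a single CSV line on standard output, which a calling harness can parse.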
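A harness can drive the script through its command-line interface: two arguments in, one CSV line out. The sketch below shows one possible driver. It is not taken from the project; the function names are hypothetical, and the default library tuple simply lists the two names `run_mem` accepts. Each measurement runs in a fresh interpreter so that one library's imports and allocations cannot inflate another's RSS baseline.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, Iterable, Tuple


def run_mem(script: Path, fixture: Path, lib_name: str) -> Tuple[int, int, int]:
    """Run the script in a fresh interpreter and parse its single CSV line."""
    proc = subprocess.run(
        [sys.executable, str(script), str(fixture), lib_name],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script fails
    )
    before, diff, correct = proc.stdout.strip().split(",")
    return int(before), int(diff), int(correct)


def compare_libraries(
    script: Path, fixture: Path, libs: Iterable[str] = ("json", "orjson")
) -> Dict[str, Tuple[int, int, int]]:
    # One subprocess per library so each baseline starts from a clean process.
    return {lib: run_mem(script, fixture, lib) for lib in libs}
```

For example, `compare_libraries(Path("run_mem"), Path("data/github.json.xz"))` would return a dictionary mapping each library name to its `(mem_before, mem_diff, correct)` triple.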


Visual Diagram: Flowchart of Memory Benchmarking Workflow

```mermaid
flowchart TD
    A["Start: receive filename and lib_name"] --> B["Load compressed JSON fixture (.xz)"]
    B --> C["Import dumps, loads from chosen library"]
    C --> D["Run garbage collection"]
    D --> E["Measure initial RSS memory (mem_before)"]
    E --> F["Loop 100 times: deserialize JSON fixture"]
    F --> G["Measure final RSS memory (mem_after)"]
    G --> H["Calculate memory difference (mem_diff = mem_after - mem_before)"]
    H --> I["Check correctness by round-trip serialization"]
    I --> J{"Is JSON data equivalent?"}
    J -- Yes --> K["Set correct = 1"]
    J -- No --> L["Set correct = 0"]
    K --> M["Print mem_before, mem_diff, correct"]
    L --> M
```

Summary

The `run_mem` script is a focused benchmarking tool. It measures how much a process's RSS grows when a JSON library deserializes the same fixture 100 times, and pairs that measurement with a round-trip correctness check. Its single-line output makes it easy to compare libraries on the same fixtures. The script integrates with a larger benchmarking ecosystem that measures speed, memory, and correctness.

