graph
Overview
The `graph` file is a Python utility script designed to **aggregate**, **tabulate**, and **visualize** benchmark results from JSON serialization and deserialization performance tests. It processes JSON-formatted benchmark output files generated by prior runs, organizes metrics by benchmark group and JSON library, and produces:
- Markdown-formatted tables summarizing median latencies, throughput (operations per second), and relative performance.
- Bar plots comparing the libraries' relative throughput for serialization and deserialization tasks, saved as image files.

This file primarily supports the **benchmark reporting and visualization** phase within the benchmarking suite for JSON libraries like `orjson` and the standard Python `json` module. It transforms raw benchmark data into readable reports and comparative graphs, aiding performance analysis and decision-making.
Detailed Description of Functions
aggregate()
def aggregate():
Purpose
Reads benchmark JSON result files from the `.benchmarks` directory, extracts relevant metrics, and aggregates data by benchmark group and library.
Behavior
- Locates the first subdirectory inside `.benchmarks/`, which contains the benchmark JSON files.
- For each benchmark result file:
  - Loads the JSON data using `orjson`.
  - Iterates over the individual benchmark entries.
  - Extracts the latency data and converts it from seconds to milliseconds.
  - Collects the median latency, operations per second (`ops`), and correctness flag.
- Organizes the results into a nested dictionary:
  `res[group][library] = { data, median, ops, correct }`

Returns
`res`: a `defaultdict(dict)` mapping benchmark group names to dictionaries keyed by JSON library name. Each inner dictionary holds that library's benchmark statistics.
Usage Example
results = aggregate()
# results might look like:
# {
# "github deserialization": {
# "orjson": {"median": 0.125, "ops": 8000, "correct": True, ...},
# "json": {"median": 1.11, "ops": 900, "correct": True, ...},
# },
# ...
# }
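The aggregation step can be sketched with the standard library. This sketch makes several assumptions:

- The result files use pytest-benchmark's JSON layout. That is a top-level `benchmarks` list whose entries carry `group` and `stats` (`median`, `ops`, and `data` only when per-round timings were saved).
- Each entry also has an `extra_info` dict with hypothetical `lib` and `correct` keys.
- The standard-library `json` module stands in for `orjson` so the sketch runs without extra dependencies.
- `add_results` is a hypothetical helper, split out so the parsing can be exercised without touching the filesystem.

```python
import json
import os
from collections import defaultdict


def add_results(res, data):
    """Fold one parsed result document into res[group][library]."""
    for entry in data["benchmarks"]:
        stats = entry["stats"]
        extra = entry["extra_info"]
        res[entry["group"]][extra["lib"]] = {  # "lib" is a hypothetical key
            # Convert per-round timings and the median from seconds to ms.
            "data": [val * 1000 for val in stats.get("data", [])],
            "median": stats["median"] * 1000,
            "ops": stats["ops"],
            "correct": extra["correct"],  # hypothetical key
        }
    return res


def aggregate(benchmarks_root=".benchmarks"):
    # Use the first subdirectory; sorted here so the choice is deterministic.
    subdirs = sorted(
        d for d in os.listdir(benchmarks_root)
        if os.path.isdir(os.path.join(benchmarks_root, d))
    )
    results_dir = os.path.join(benchmarks_root, subdirs[0])
    res = defaultdict(dict)
    for filename in sorted(os.listdir(results_dir)):
        if filename.endswith(".json"):
            path = os.path.join(results_dir, filename)
            with open(path, encoding="utf-8") as fh:
                add_results(res, json.load(fh))  # the script uses orjson here
    return res
```

`os.listdir` returns entries in arbitrary order, which is why the sketch sorts before picking "the first" subdirectory.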
tab(obj)
def tab(obj):
Purpose
Generates formatted tables and comparative bar plots from aggregated benchmark data.
Parameters
`obj`: a nested dictionary structured as returned by `aggregate()`. For each benchmark group and library, it holds latency, throughput, and correctness data.
Behavior
- Initializes a string buffer to accumulate Markdown-formatted output.
- Defines the table headers.
- Configures the seaborn and matplotlib plotting styles.
- For each benchmark group (sorted in reverse order):
  - Writes a Markdown section header with the group name.
  - Builds a table of results for the libraries `orjson` and `json`:
    - Outputs the median latency (ms), operations per second, and correctness flag.
    - Calculates latency relative to the `orjson` baseline.
    - Adds entries to a list used for plotting.
  - Formats latency values with appropriate decimal precision.
- For each operation type (`serialization` and `deserialization`):
  - Filters the relevant data.
  - Computes throughput relative to the standard-library `json` baseline.
  - Uses seaborn to generate bar plots with error bars representing the standard deviation.
  - Customizes the plot axes, labels, title, and legend.
  - Saves the plots as PNG images under `doc/serialization` and `doc/deserialization`.
- Prints the accumulated Markdown tables to standard output.

Returns
None (prints tables and saves plots as side effects).
Usage Example
results = aggregate()
tab(results)
# Outputs tables to stdout and saves plots to disk.
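The table-building step can be sketched without `tabulate` by emitting a Markdown pipe table directly. In this sketch, several details are assumptions rather than the script's actual formatting:

- `render_group_table` is a hypothetical helper.
- The `###` header level and the column order are guesses.
- The precision rule (three decimals below 1 ms, two otherwise) is invented for illustration.

```python
def render_group_table(group, stats, libraries=("orjson", "json")):
    """Render one benchmark group as a Markdown table (hypothetical helper).

    `stats` maps library name -> {"median": ms, "ops": float, "correct": bool},
    the per-library shape produced by aggregate().
    """
    headers = ["Library", "Median latency (ms)", "Operations per second",
               "Relative (latency)", "Correct"]
    baseline = stats["orjson"]["median"]  # orjson is the latency baseline
    lines = [
        f"### {group}",
        "",
        "| " + " | ".join(headers) + " |",
        "|" + "|".join("---" for _ in headers) + "|",
    ]
    for lib in libraries:
        entry = stats.get(lib)
        if entry is None:
            continue  # library absent from this group
        median = entry["median"]
        # Assumed precision rule: more decimals for sub-millisecond medians.
        median_str = f"{median:.3f}" if median < 1 else f"{median:.2f}"
        relative = median / baseline
        lines.append(
            f"| {lib} | {median_str} | {entry['ops']:.0f} "
            f"| {relative:.1f} | {entry['correct']} |"
        )
    return "\n".join(lines)
```

For example, with `orjson` at 0.125 ms and `json` at 1.1 ms, the `json` row reports a relative latency of 1.1 / 0.125 = 8.8.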
Important Implementation Details and Algorithms
Data Conversion:
- Latency data from the benchmarks is recorded in seconds. The code converts it to milliseconds (`val * 1000`) for human-friendly reporting.

Relative Performance Calculation:
- Latency relative to `orjson` is computed as `latency / orjson_baseline`, so `orjson` itself scores 1.0 and slower libraries score higher.
- Throughput relative to Python's `json` library is computed as `ops / json_baseline`, so `json` itself scores 1.0 and faster libraries score higher.

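The throughput normalization that feeds the plots can be sketched as follows. The sketch assumes:

- `obj` has the shape returned by `aggregate()`.
- Group names end in the operation name (for example `twitter.json serialization`).
- The name `relative_throughput` is hypothetical.

```python
def relative_throughput(obj, operation, baseline_lib="json"):
    """Return (group, library, ops / baseline_ops) rows for one operation."""
    rows = []
    for group in sorted(obj):
        # Compare the last word exactly: a substring test for "serialization"
        # would also match "deserialization".
        if group.rsplit(" ", 1)[-1] != operation:
            continue
        libs = obj[group]
        baseline = libs[baseline_lib]["ops"]
        for lib, entry in libs.items():
            rows.append((group, lib, entry["ops"] / baseline))
    return rows
```

The exact-word match matters because `"serialization" in "twitter.json deserialization"` is true, so a naive filter would mix the two operations in one plot.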
Plotting:
- Bar plots show throughput relative to the standard `json` library for each document group and library.
- Y-axis ticks are placed at integer multiples, starting at 1x and ending at the largest relative value rounded up.
- A dashed horizontal line at y=1 marks the baseline performance of the standard `json` library.

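The tick rule amounts to taking the ceiling of the largest ratio. The hypothetical helper below only computes tick positions and labels.

In matplotlib, such ticks would typically be applied with `Axes.set_yticks`, and the baseline drawn with `Axes.axhline(1, linestyle="--")`. The script's exact calls are not shown in this document.

```python
import math


def y_ticks(max_relative):
    """Integer y-axis ticks from 1x up to max_relative rounded up."""
    top = max(1, math.ceil(max_relative))  # never fewer than the 1x tick
    ticks = list(range(1, top + 1))
    labels = [f"{t}x" for t in ticks]
    return ticks, labels
```

Rounding up guarantees the tallest bar sits below the top tick.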
Data Cleaning:
- Corrects a known typo in a group name, changing `"witter.json"` to `"twitter.json"`.

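A sketch of the group-name repair, assuming the typo appears at the start of the name. `fix_group_name` is a hypothetical helper.

```python
def fix_group_name(group):
    """Repair the known 'witter.json' typo in a benchmark group name."""
    # A plain str.replace("witter.json", "twitter.json") would turn an
    # already-correct "twitter.json ..." into "ttwitter.json ...", so only
    # rewrite names that actually start with the typo.
    if group.startswith("witter.json"):
        return "t" + group
    return group
```

Guarding on the prefix keeps the fix idempotent: running it on an already-correct name leaves that name unchanged.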
Styling:
- Uses seaborn's `darkgrid` style and a transparent figure face color.
- Bar plots include error bars representing the standard deviation (`errorbar="sd"`).

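With `errorbar="sd"`, seaborn's `barplot` draws each bar at its default estimator (the mean). It then extends the error bar one standard deviation above and below that point.

The sketch below reproduces that arithmetic for a single bar. It assumes each library contributes several per-round values to the plot. It uses the sample standard deviation, which is assumed (not confirmed here) to match seaborn's computation. `bar_with_sd` is a hypothetical helper.

```python
import statistics


def bar_with_sd(values):
    """Return (bar height, error-bar bottom, error-bar top) for one bar."""
    mean = statistics.fmean(values)
    # Sample standard deviation; a single value has no spread.
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, mean - sd, mean + sd
```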
Interaction with Other System Components
Input Source:
- Reads raw benchmark JSON output files stored in the `.benchmarks/` directory.
- These files are produced by serialization and deserialization benchmark scripts that measure the performance of different JSON libraries on various test fixtures.

Libraries Used:
- `orjson` for fast JSON loading.
- `pandas` and `seaborn` for data manipulation and plotting.
- `matplotlib` for saving plots.
- `tabulate` for formatting tables in Markdown.

Output:
- Markdown tables are printed to standard output, suitable for direct inclusion in documentation or reports.
- Bar plot images are saved to the `doc/` directory for use in reports or on websites.

Role in Benchmarking Suite:
- Serves as the post-processing, visualization, and reporting tool for the benchmarking framework.
- Complements the benchmark execution scripts by providing human-readable summaries and visual comparisons of relative performance.

Visual Diagram: Class/Function Structure
Since this file is a utility script with two main functions and no classes, a flowchart illustrating the main function relationships and workflow is most appropriate.
flowchart TD
A[Start] --> B["aggregate()"]
B --> C[Parse benchmark JSON files]
C --> D["Build nested dict: group -> library -> stats"]
D --> E[Return aggregated data]
E --> F["tab(obj)"]
F --> G[Format Markdown tables]
F --> H[Calculate relative performance]
F --> I[Generate bar plots for serialization and deserialization]
G & I --> J[Output tables and save plots]
J --> K[End]
Summary
The `graph` file is a **benchmark result aggregator and reporter** that:
- Parses benchmark JSON results.
- Organizes latency and throughput metrics by benchmark group and JSON library.
- Prints formatted Markdown tables with performance summaries.
- Generates comparative bar plots showing throughput relative to the standard Python `json` library.
- Saves the plots for inclusion in documentation.

Together, these steps turn raw benchmark output into human-readable reports and visual comparisons of JSON serialization and deserialization performance across libraries.