Custom Serialization Support
Overview
The **Custom Serialization Support** module addresses the challenge of serializing Python objects that the JSON encoder does not support natively. This matters wherever user-defined or complex types must be converted into JSON-compatible representations. The module lets callers supply a fallback serialization function (`default`) that transforms unsupported objects into serializable forms, so encoding can proceed instead of failing on the first unknown type.
Additionally, this module extends serialization capabilities to specialized data types such as NumPy arrays, enabling their efficient conversion to JSON. This broadens the library’s applicability in scientific and data-intensive Python applications where NumPy is prevalent.
Core Concepts and Purpose
Why Custom Serialization?
Standard JSON serializers handle basic Python types (e.g., `dict`, `list`, `str`, `int`) but fail on custom classes and other complex objects: without customization, the serialization attempt raises an error. The Custom Serialization Support module addresses this by:
Allowing users to define a fallback function that receives unsupported objects and returns a JSON-serializable substitute.
Enabling transparent serialization of otherwise unsupported Python objects.
Providing an opt-in serialization path for NumPy arrays that improves performance and compatibility.
Problems Addressed
Unsupported object types: Custom Python classes or third-party types without native JSON representation.
Extensibility: Users can tailor serialization for application-specific types.
Performance with specialized types: Efficient handling of NumPy arrays, a common need in data science contexts.
How the Module Works
Fallback Serialization Function
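The module accepts a user-defined fallback function, commonly named `default`, which the serializer calls whenever it encounters an object that is not directly serializable. This function should return a JSON-compatible value (e.g., a string, number, list, or dictionary). It may also return `None`, which is itself a valid value and is serialized as `null`; to signal that a type cannot be handled at all, the function should raise an exception such as `TypeError` instead.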
When the `default` function returns `None`, the object is serialized as JSON `null`. This mechanism prevents serialization errors by gracefully handling unknown types.
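A minimal sketch of such a fallback follows. To stay runnable without third-party packages it uses the standard library's `json.dumps`, which accepts a `default` callable in the same spirit; the `Point` class and the choice to encode `Decimal` as a string are assumptions made for illustration, not taken from the source.

```python
import json
from decimal import Decimal


class Point:
    """Hypothetical user-defined type with no native JSON form."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


def default(obj):
    """Fallback: map known unsupported types to JSON-compatible values."""
    if isinstance(obj, Decimal):
        return str(obj)  # keep full precision by encoding as a string
    if isinstance(obj, Point):
        return {"x": obj.x, "y": obj.y}
    # Raise rather than fall off the end: an implicit None would become null.
    raise TypeError(f"Type is not JSON serializable: {type(obj).__name__}")


payload = {"price": Decimal("19.99"), "origin": Point(0, 0)}
encoded = json.dumps(payload, default=default)
```

The same callable could presumably be passed to `orjson.dumps` through its `default` parameter. Ending the function with an explicit `raise` keeps unknown types from silently turning into `null`.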
Example from `bench/run_default`:
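from orjson import dumps, OPT_SERIALIZE_NUMPY

class Custom:
    pass

def default(_):
    # Every unsupported object is replaced by None (JSON null)
    return None

obj = [[Custom()] * 1000] * 10
dumps(obj, default, OPT_SERIALIZE_NUMPY)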
Here, `Custom` objects are replaced by `null` in the JSON output, preventing errors during serialization.
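The effect is easier to see on a smaller input. The sketch below shrinks the benchmark's dimensions from 1000 and 10 to 3 and 2 and swaps in the standard library's `json.dumps` so it runs without orjson installed; both changes are assumptions for illustration only.

```python
import json


class Custom:
    pass


def default(_):
    return None  # every unsupported object becomes JSON null


# List multiplication repeats references, so this structure holds a single
# Custom instance that is referenced 2 * 3 = 6 times.
obj = [[Custom()] * 3] * 2
shared = obj[0][0] is obj[1][2]  # True: same object in every slot

encoded = json.dumps(obj, default=default)
# encoded == "[[null, null, null], [null, null, null]]"
```

Because every slot refers to the same object, the benchmark measures how often the fallback is invoked rather than the cost of building many distinct instances.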
NumPy Serialization Option
To efficiently serialize NumPy arrays, the module offers an option flag (`OPT_SERIALIZE_NUMPY`) that activates specialized serialization paths for NumPy data types. When enabled:
NumPy arrays are converted into JSON arrays without requiring manual conversion.
This reduces overhead and increases performance compared to converting NumPy arrays to lists beforehand.
The fallback `default` function can still be used alongside this option for other types.
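For comparison, the manual route that the option makes unnecessary can be sketched as a fallback that converts array-like objects through their `tolist()` method. The example uses the standard library's `array.array` as a stand-in, since it exposes `tolist()` just as `numpy.ndarray` does, and `json.dumps` in place of orjson; both substitutions are assumptions that keep the sketch self-contained.

```python
import json
from array import array


def default(obj):
    """Manual conversion path: turn array-like objects into plain lists."""
    tolist = getattr(obj, "tolist", None)
    if callable(tolist):
        return tolist()  # numpy.ndarray and array.array both provide tolist()
    raise TypeError(f"Type is not JSON serializable: {type(obj).__name__}")


samples = array("d", [1.0, 2.5, 4.0])  # stdlib stand-in for a NumPy array
encoded = json.dumps({"samples": samples}, default=default)
# encoded == '{"samples": [1.0, 2.5, 4.0]}'
```

With orjson and the flag enabled, the equivalent call would be along the lines of `dumps(arr, option=OPT_SERIALIZE_NUMPY)`, with no intermediate Python list being built.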
Workflow Summary
Serialization starts with the top-level Python object.
Type inspection is performed for each element.
If an element is unsupported, the `default` fallback is invoked.
The fallback either returns a JSON-serializable object or `None`.
Serialization continues, applying optimized paths for types like NumPy arrays if enabled.
The final JSON bytes result is returned.
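The steps above can be modeled in a few lines of pure Python. This is a toy sketch, not the library's implementation: the names `to_jsonable`, `dumps_sketch`, and `MAX_DEFAULT_CALLS` are hypothetical, the cap on chained fallback calls is an illustrative value, and the NumPy fast path is omitted.

```python
import json
from typing import Any, Callable, Optional

# Illustrative cap on chained fallback calls; not taken from the source.
MAX_DEFAULT_CALLS = 8


def to_jsonable(obj: Any, default: Optional[Callable[[Any], Any]], calls: int = 0) -> Any:
    # Inspect the type of the current element.
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(item, default, calls) for item in obj]
    if isinstance(obj, dict):
        return {key: to_jsonable(value, default, calls) for key, value in obj.items()}
    # Unsupported element: without a usable fallback, serialization fails.
    if default is None or calls >= MAX_DEFAULT_CALLS:
        raise TypeError(f"Type is not JSON serializable: {type(obj).__name__}")
    # The fallback's result (possibly None) is processed like any other value.
    return to_jsonable(default(obj), default, calls + 1)


def dumps_sketch(obj: Any, default: Optional[Callable[[Any], Any]] = None) -> bytes:
    # Return the encoded document as bytes.
    text = json.dumps(to_jsonable(obj, default), separators=(",", ":"))
    return text.encode("utf-8")


class Custom:
    pass


result = dumps_sketch([1, {"a": Custom()}], default=lambda _: None)
# result == b'[1,{"a":null}]'
```

Feeding the fallback's return value back through the same type inspection is what lets a fallback return containers that themselves hold unsupported objects; the call counter guards against a fallback that keeps returning unsupported values.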
Interaction with Other System Components
Serialization Core (`src/serialize`): Implements the main serialization logic, invoking fallback functions when needed.
Python Integration (`src/ffi` and `pysrc/orjson`): Exposes the fallback function interface to Python users, allowing them to pass a Python callable.
Benchmarking and Testing (`bench/run_default` and related scripts): Demonstrate usage patterns and measure performance impacts of fallback serialization and NumPy support.
Error Handling (`src/error`): Ensures that serialization errors related to unsupported types are handled gracefully or delegated to the fallback.
The fallback mechanism acts as a bridge between Python's rich object ecosystem and the strict JSON format, coordinated between the Rust core serialization logic and the Python API layer.
Important Concepts and Design Patterns
Callback Pattern: The fallback function is a user-supplied callback invoked on unsupported objects, enabling extensibility without modifying core serialization code.
Option Flags: Serialization behaviors such as NumPy support are toggled via option flags, allowing users to customize performance and compatibility.
Graceful Degradation: Returning `None` from the fallback leads to JSON `null` output, avoiding exceptions and allowing partial serialization.
Batch Processing: Large nested structures with repeated custom objects are processed efficiently, as illustrated in the benchmark example, where the nested lists reference a `Custom` instance 10,000 times.
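The option-flag pattern can be sketched with integer bit flags that are combined with bitwise OR and tested with bitwise AND. The `Opt` class, its member names, and their values below are hypothetical and chosen only for illustration; they are not the library's actual constants.

```python
from enum import IntFlag


class Opt(IntFlag):
    """Hypothetical flags; names and values are illustrative only."""

    SERIALIZE_NUMPY = 1
    SORT_KEYS = 2
    INDENT = 4


def enabled(options: int, flag: Opt) -> bool:
    """Return True when `flag` is set in the combined `options` integer."""
    return bool(options & flag)


# Flags are independent bits, so several behaviors fit in one integer.
options = Opt.SERIALIZE_NUMPY | Opt.SORT_KEYS
numpy_on = enabled(options, Opt.SERIALIZE_NUMPY)  # True
indent_on = enabled(options, Opt.INDENT)  # False
```

Packing options into a single integer keeps the call signature stable as new behaviors are added, and each check costs only one bitwise operation.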
Code Snippet Illustrating Key Interaction
from orjson import dumps, OPT_SERIALIZE_NUMPY

class Custom:
    pass

def default(obj):
    # Replace unsupported objects with null
    return None

obj = [[Custom()] * 1000] * 10

# Serialize with fallback and NumPy support enabled
json_bytes = dumps(obj, default, OPT_SERIALIZE_NUMPY)
This snippet exemplifies how the fallback function integrates with the core serialization call and how option flags influence behavior.
Visualization: Serialization Workflow with Fallback and NumPy Support
flowchart TD
A[Start Serialization] --> B{Is object natively serializable?}
B -- Yes --> C[Serialize object]
B -- No --> D{Is fallback function provided?}
D -- No --> E[Raise Serialization Error]
D -- Yes --> F[Invoke fallback function]
F --> G{Fallback returns JSON-compatible value?}
G -- Yes --> C
G -- No --> H[Serialize as JSON null]
C --> I{Is object a NumPy array?}
I -- Yes & OPT_SERIALIZE_NUMPY enabled --> J[Use optimized NumPy serialization]
I -- No or option disabled --> K[Standard serialization process]
J --> L[Continue serialization]
K --> L
H --> L
L --> M[Complete Serialization]
This flowchart clarifies the decision points during serialization when encountering unsupported objects and the role of the fallback function and NumPy serialization option.
Summary
The **Custom Serialization Support** module enhances the JSON serialization process by providing extensibility through a user-defined fallback function and optimized handling of specialized types like NumPy arrays. It ensures robustness and flexibility in serializing diverse Python objects, integrating seamlessly with the core serialization engine and Python API. The design leverages callback patterns and option flags to balance performance and usability in complex serialization scenarios.