dict.rs
Overview
The `dict.rs` file is a core serialization module focused on efficiently converting Python dictionary objects (`dict`) and related structures into a serialized format (typically JSON or similar) using Rust. It is part of a larger serialization framework that integrates tightly with Python's C API (via `pyo3_ffi`) and Rust's `serde` serialization traits.
The file provides multiple serializer implementations tailored for different dictionary scenarios:
Empty dictionaries,
Dictionaries with string keys,
Dictionaries that require sorted keys,
Dictionaries with non-string keys (converted to strings during serialization).
It handles a wide variety of Python object types as dictionary values, applying specialized serializers for each type to optimize performance and correctness.
Key Entities and Their Responsibilities
1. ZeroDictSerializer
Purpose: Serializes an empty Python dictionary as the fixed byte slice `{}`.
Implementation:
Implements `serde::Serialize`.
Returns the serialized byte representation of an empty dictionary.
Usage: Used when the dictionary size is zero to avoid overhead.
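use serde::ser::{Serialize, Serializer};

// Stateless marker type: an empty dict always serializes to the same bytes.
pub(crate) struct ZeroDictSerializer;
impl ZeroDictSerializer {
    pub const fn new() -> Self {
        Self {}
    }
}
impl Serialize for ZeroDictSerializer {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        // Hand the literal `{}` to the serializer as raw bytes.
        serializer.serialize_bytes(b"{}")
    }
}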
2. DictGenericSerializer
Purpose: A generic dictionary serializer that dynamically dispatches to specialized serializers based on runtime options and dictionary key types.
Fields:
`ptr: *mut pyo3_ffi::PyObject`: Raw pointer to the Python dictionary object.
`state: SerializerState`: Holds serialization options and state (e.g., recursion depth).
`default: Option<NonNull<pyo3_ffi::PyObject>>`: Optional pointer to the Python `default` object, a fallback used for values that have no built-in serializer.
Behavior:
Checks recursion limits.
Short-circuits to `ZeroDictSerializer` when the dictionary is empty.
Otherwise dispatches to one of three serializers based on options:
`Dict` for string keys with no sorting,
`DictNonStrKey` when non-string keys are allowed,
`DictSortedKey` when keys need sorting.
Usage: Entry point for dictionary serialization.
pub(crate) struct DictGenericSerializer {
    ptr: *mut pyo3_ffi::PyObject,
    state: SerializerState,
    default: Option<NonNull<pyo3_ffi::PyObject>>,
}
impl DictGenericSerializer {
    pub fn new(
        ptr: *mut pyo3_ffi::PyObject,
        state: SerializerState,
        default: Option<NonNull<pyo3_ffi::PyObject>>,
    ) -> Self { ... }
}
impl Serialize for DictGenericSerializer {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where S: Serializer,
    { ... }
}
3. Dict
Purpose: Serializes Python dictionaries with string keys without sorting.
Fields:
Same as `DictGenericSerializer`.
Serialization Algorithm:
Iterates over the dictionary items using the `pydict_next!` macro.
Validates that keys are strings; returns an error otherwise.
Converts keys to Rust string slices.
Serializes each key and value using specialized serializers based on the value's Python type.
Error Handling:
Returns an error if a key is not a string or is not valid UTF-8.
Usage: Default dictionary serializer for standard Python dicts with string keys.
pub(crate) struct Dict {
    ptr: *mut pyo3_ffi::PyObject,
    state: SerializerState,
    default: Option<NonNull<pyo3_ffi::PyObject>>,
}
impl Serialize for Dict {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where S: Serializer,
    { ... }
}
4. DictSortedKey
Purpose: Serializes dictionaries with string keys, sorting the keys before serialization. The order follows Rust string comparison (lexicographic by byte value), not locale-aware alphabetical order.
Fields:
Same as `Dict`.
Serialization Algorithm:
Collects all items into a `SmallVec` buffer.
Sorts the items by key.
Serializes the sorted items.
Usage: Used when sorting keys is enabled in options.
pub(crate) struct DictSortedKey {
    ptr: *mut pyo3_ffi::PyObject,
    state: SerializerState,
    default: Option<NonNull<pyo3_ffi::PyObject>>,
}
impl Serialize for DictSortedKey {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where S: Serializer,
    { ... }
}
5. DictNonStrKey
Purpose: Serializes Python dictionaries with non-string keys by converting keys to strings.
Fields:
Same as other dict serializers.
Key Conversion:
Converts various Python types (int, float, datetime, UUID, bool, enum, etc.) to string representations.
Returns errors for unsupported key types (e.g., lists, tuples, dicts).
Sorting:
Optionally sorts keys if enabled.
Serialization Algorithm:
Converts all keys to strings.
Serializes key-value pairs using `PyObjectSerializer` for values.
Usage: When dictionary keys are not strings or when non-string keys are explicitly allowed.
pub(crate) struct DictNonStrKey {
    ptr: *mut pyo3_ffi::PyObject,
    state: SerializerState,
    default: Option<NonNull<pyo3_ffi::PyObject>>,
}
impl DictNonStrKey {
    fn pyobject_to_string(
        key: *mut pyo3_ffi::PyObject,
        opts: crate::opt::Opt,
    ) -> Result<String, SerializeError> { ... }
}
impl Serialize for DictNonStrKey {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where S: Serializer,
    { ... }
}
Utility Macro: impl_serialize_entry!
Purpose: Matches the Python value type and serializes it using the appropriate specialized serializer.
Input:
`$map`: The serializer map reference.
`$self`: Reference to the current serializer struct.
`$key`: The dictionary key (string).
`$value`: The Python value object pointer.
Value Type Handling:
Covers strings, integers, floats, booleans, datetime-like objects, UUIDs, lists, tuples, dataclasses, enums, numpy arrays/scalars, fragments, unknown types.
Usage: Used internally during dictionary serialization to handle the diversity of Python types.
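A declarative macro of this general shape can be sketched as follows. The macro `write_entry!`, the `Value` enum, and the `render` helper are hypothetical and far simpler than `impl_serialize_entry!`: they write to a `String` instead of a serde map, cover only a few value kinds, and skip string escaping for brevity.

```rust
// Simplified stand-in for the Python value kinds (an assumption).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Int(i64),
    Float(f64),
    Bool(bool),
    None,
}

/// Writes one `"key":value` pair, choosing the output form from the
/// value's variant. Escaping is intentionally omitted in this sketch.
macro_rules! write_entry {
    ($out:expr, $key:expr, $value:expr) => {{
        $out.push('"');
        $out.push_str($key);
        $out.push_str("\":");
        match $value {
            Value::Str(s) => {
                $out.push('"');
                $out.push_str(s);
                $out.push('"');
            }
            Value::Int(i) => $out.push_str(&i.to_string()),
            Value::Float(f) => $out.push_str(&f.to_string()),
            Value::Bool(b) => $out.push_str(if *b { "true" } else { "false" }),
            Value::None => $out.push_str("null"),
        }
    }};
}

/// Expands the macro once per entry, so the type dispatch is inlined
/// into the loop body rather than hidden behind a function call.
fn render(entries: &[(&str, Value)]) -> String {
    let mut out = String::from("{");
    for (i, (k, v)) in entries.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        write_entry!(out, k, v);
    }
    out.push('}');
    out
}
```

Using a macro rather than a function lets each serializer struct reuse the same match without sharing a common trait or paying for an extra call, which is one plausible reason for the design described above.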
Important Implementation Details
Raw Python Object Pointers: The serializers work directly with raw `*mut pyo3_ffi::PyObject` pointers for performance and low-level control.
Recursion Limit Checks: Prevents stack overflow by checking recursion depth (`self.state.recursion_limit()`).
Use of Unsafe Rust: Unsafe code is used to dereference pointers and cast types because of interaction with Python's C API.
Sorting: Sorting of keys is done via Rust's standard `sort_unstable_by` on small vectors of key-value tuples.
String Conversion: Non-string keys are converted to strings using specialized functions that handle Python datetime, UUID, floats, ints, enums, etc.
Error Handling: Serialization errors propagate via `SerializeError`, including invalid keys, unsupported types, or recursion issues.
Performance Considerations:
Use of `SmallVec` to avoid heap allocations for small dictionaries.
Inline attribute hints for hot paths.
Separate serializers for empty dictionaries to optimize common cases.
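The recursion limit check can be modeled in isolation. In the sketch below, `State`, `Node`, `RecursionError`, and the free function `serialize` are hypothetical; only the idea of a `recursion_limit()` predicate consulted before descending into a nested dictionary comes from the description above.

```rust
/// Tracks nesting depth; copied and incremented for each nested container.
#[derive(Debug, Clone, Copy)]
struct State {
    depth: u8,
    limit: u8, // hypothetical configurable limit
}

impl State {
    fn new(limit: u8) -> Self {
        State { depth: 0, limit }
    }
    /// True once the maximum nesting depth has been reached.
    fn recursion_limit(&self) -> bool {
        self.depth >= self.limit
    }
    /// State for the next nesting level.
    fn descend(&self) -> Self {
        State { depth: self.depth.saturating_add(1), limit: self.limit }
    }
}

#[derive(Debug, PartialEq, Eq)]
struct RecursionError;

/// Minimal nested structure: integers or dictionaries with string keys.
enum Node {
    Leaf(i64),
    Dict(Vec<(String, Node)>),
}

/// Serializes `node`, failing instead of recursing past the limit.
fn serialize(node: &Node, state: State, out: &mut String) -> Result<(), RecursionError> {
    match node {
        Node::Leaf(n) => {
            out.push_str(&n.to_string());
            Ok(())
        }
        Node::Dict(items) => {
            if state.recursion_limit() {
                return Err(RecursionError);
            }
            let child = state.descend();
            out.push('{');
            for (i, (k, v)) in items.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                out.push('"');
                out.push_str(k); // keys are assumed to need no escaping here
                out.push_str("\":");
                serialize(v, child, out)?;
            }
            out.push('}');
            Ok(())
        }
    }
}
```

Checking the limit before each descent turns unbounded nesting into an ordinary error value rather than a stack overflow, which is the purpose of the check listed above.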
Interaction with Other System Components
`SerializerState`: Holds the current state and options for serialization, such as recursion depth and enabled flags.
`PyObjectSerializer`: Generic serializer for Python objects used when value types don't have specialized serializers.
Specialized Serializers (e.g., `IntSerializer`, `FloatSerializer`, `DateTime`, `UUID`): Used to serialize corresponding Python types with optimized logic.
Macros and FFI:
Uses macros like `pydict_next!` to iterate over Python dict items.
Uses Python C API functions (`Py_SIZE`, `PyFloat_AS_DOUBLE`, etc.) for inspecting Python objects.
`serde` Framework: Implements the `serde::Serialize` trait to integrate with the Rust serialization ecosystem.
Usage Example
Assuming you have a Python dictionary object pointer `py_dict_ptr` and a `SerializerState` `state`, you can serialize the dictionary as follows:
let dict_serializer = DictGenericSerializer::new(py_dict_ptr, state, None);
let serialized = serde_json::to_string(&dict_serializer)?;
println!("{}", serialized);
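Under those assumptions, this serializes the Python dictionary to a JSON string, handling the configured key types, sorting options, and nested objects. Note that `ZeroDictSerializer` passes the literal `{}` through `serialize_bytes`, which `serde_json` encodes as an array of integers, so the empty-dictionary path only yields `{}` with a serializer that writes byte slices verbatim.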
Visual Diagram: Class Structure and Relationships
classDiagram
class DictGenericSerializer {
-ptr: *mut PyObject
-state: SerializerState
-default: Option<NonNull<PyObject>>
+new(ptr, state, default)
+serialize(serializer)
}
class ZeroDictSerializer {
+new()
+serialize(serializer)
}
class Dict {
-ptr: *mut PyObject
-state: SerializerState
-default: Option<NonNull<PyObject>>
+serialize(serializer)
}
class DictSortedKey {
-ptr: *mut PyObject
-state: SerializerState
-default: Option<NonNull<PyObject>>
+serialize(serializer)
}
class DictNonStrKey {
-ptr: *mut PyObject
-state: SerializerState
-default: Option<NonNull<PyObject>>
+pyobject_to_string(key, opts)
+serialize(serializer)
}
DictGenericSerializer --> ZeroDictSerializer : uses when dict empty
DictGenericSerializer --> Dict : delegates serialize (string keys, no sort)
DictGenericSerializer --> DictNonStrKey : delegates serialize (non-str keys)
DictGenericSerializer --> DictSortedKey : delegates serialize (sorted keys)
Summary
The `dict.rs` module is the component of the serialization framework that handles Python dictionaries. It selects a serialization strategy from the dictionary's size and the caller's options, supports a wide variety of Python key and value types, and integrates with both Python's C API and Rust's `serde` traits. The code balances performance optimizations (e.g., a fixed byte literal for empty dicts, use of `SmallVec`) with correctness (type checking, error handling), which makes it suitable for dictionary serialization in mixed Python-Rust environments.