unicode.rs
Overview
The `unicode.rs` file provides serialization support for Python string objects in Rust code that talks to Python's C API (via `pyo3_ffi`). It defines two transparent wrapper structs, `StrSerializer` and `StrSubclassSerializer`, which wrap raw Python string pointers (`PyObject*`). Both implement the `serde::Serialize` trait, so Python strings and instances of `str` subclasses can be passed to any serde serializer, such as a JSON serializer.
The file connects Python's Unicode string type to Rust's serde ecosystem: both exact `str` objects and instances of user-defined `str` subclasses can be serialized, and strings that cannot be converted to a Rust string slice are reported as errors instead of being silently altered.
Detailed Documentation
Structs
StrSerializer
A transparent wrapper around a raw Python string pointer (`*mut pyo3_ffi::PyObject`) representing a Python `str` object.
Definition:
```rust
#[repr(transparent)]
pub(crate) struct StrSerializer {
    ptr: *mut pyo3_ffi::PyObject,
}
```
Purpose:
Encapsulates a Python `str` pointer and makes it serializable through Rust's `serde::Serialize` trait.
Constructor:
```rust
pub fn new(ptr: *mut pyo3_ffi::PyObject) -> Self
```
Parameters:
- `ptr`: raw pointer to a Python `str` object.
Returns: a new `StrSerializer` wrapping the given pointer.
Usage Example:
```rust
let py_str_ptr: *mut pyo3_ffi::PyObject = /* obtained from the Python C API */;
let serializer = StrSerializer::new(py_str_ptr);
```
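`#[repr(transparent)]` guarantees that a struct with a single non-zero-sized field has the same layout and ABI as that field, so wrapping the pointer adds no run-time cost. The sketch below checks this with hypothetical stand-ins: `PyObject` here is a local opaque type, not the real `pyo3_ffi::PyObject`, and `MockStrSerializer` only mirrors the documented field layout.

```rust
use std::mem::{align_of, size_of};

// Hypothetical opaque stand-in for pyo3_ffi::PyObject (the real type is an FFI struct).
#[repr(C)]
pub struct PyObject {
    _private: [u8; 0],
}

// Mirrors the documented layout of StrSerializer: a single raw-pointer field.
#[repr(transparent)]
pub struct MockStrSerializer {
    ptr: *mut PyObject,
}

impl MockStrSerializer {
    pub fn new(ptr: *mut PyObject) -> Self {
        MockStrSerializer { ptr }
    }

    pub fn as_ptr(&self) -> *mut PyObject {
        self.ptr
    }
}

fn main() {
    // Guaranteed by #[repr(transparent)]: same size and alignment as the pointer.
    assert_eq!(size_of::<MockStrSerializer>(), size_of::<*mut PyObject>());
    assert_eq!(align_of::<MockStrSerializer>(), align_of::<*mut PyObject>());

    // Wrapping and unwrapping preserves the pointer value (null is used only for illustration).
    let raw: *mut PyObject = std::ptr::null_mut();
    let wrapped = MockStrSerializer::new(raw);
    assert_eq!(wrapped.as_ptr(), raw);
    println!("wrapper size = {} bytes", size_of::<MockStrSerializer>());
}
```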
Trait Implementation:
Implements the `serde::Serialize` trait:
```rust
impl Serialize for StrSerializer {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        // body summarized under "Functionality" below
    }
}
```
Functionality:
Converts the raw pointer into a Rust string slice and serializes it:
1. Unsafely creates a `PyStr` wrapper via `PyStr::from_ptr_unchecked(self.ptr)`.
2. Calls `.to_str()` on it, which returns an `Option<&str>`.
3. If that yields `Some(uni)`, calls `serializer.serialize_str(uni)`.
4. If it yields `None`, returns a `SerializeError::InvalidStr` error.
Return:
`Result<S::Ok, S::Error>`: the serializer's success value, or the serialization error.
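The conversion steps above can be sketched without Python or serde. Everything here is a hypothetical stand-in: `MockPyStr` models a Python `str` as a sequence of code points and precomputes its UTF-8 form (absent if any code point is a surrogate), `StrSink` is a cut-down substitute for serde's `Serializer`, and `InvalidStr` plays the role of `SerializeError::InvalidStr`.

```rust
// Hypothetical stand-in for crate::str::PyStr: a Python str is a sequence of
// code points, and its UTF-8 view is unavailable if any of them is a surrogate.
struct MockPyStr {
    utf8: Option<String>,
}

impl MockPyStr {
    fn from_code_points(cps: &[u32]) -> Self {
        // char::from_u32 rejects surrogates (U+D800..=U+DFFF), so collect() yields None.
        let utf8 = cps.iter().map(|&cp| char::from_u32(cp)).collect::<Option<String>>();
        MockPyStr { utf8 }
    }

    // Mirrors the documented PyStr::to_str() -> Option<&str>.
    fn to_str(&self) -> Option<&str> {
        self.utf8.as_deref()
    }
}

// Hypothetical error playing the role of SerializeError::InvalidStr.
#[derive(Debug, PartialEq)]
struct InvalidStr;

// Cut-down substitute for serde::Serializer with only the method used here.
trait StrSink {
    type Ok;
    fn serialize_str(self, v: &str) -> Result<Self::Ok, InvalidStr>;
}

// Toy sink that wraps the string in quotes (no escaping, for brevity).
struct QuotedSink;

impl StrSink for QuotedSink {
    type Ok = String;
    fn serialize_str(self, v: &str) -> Result<Self::Ok, InvalidStr> {
        Ok(format!("\"{}\"", v))
    }
}

// Same control flow as StrSerializer::serialize: convert, then serialize or fail.
fn serialize_mock<S: StrSink>(ob: &MockPyStr, serializer: S) -> Result<S::Ok, InvalidStr> {
    match ob.to_str() {
        Some(uni) => serializer.serialize_str(uni),
        None => Err(InvalidStr),
    }
}

fn main() {
    let ok = MockPyStr::from_code_points(&[0x68, 0x69]); // "hi"
    let bad = MockPyStr::from_code_points(&[0x68, 0xD800]); // "h" plus a lone surrogate
    println!("{:?}", serialize_mock(&ok, QuotedSink));
    println!("{:?}", serialize_mock(&bad, QuotedSink));
}
```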
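Important:
- The `serialize` method is marked `#[inline(always)]` so that it can be inlined into hot serialization paths.
- The unsafe conversion assumes the pointer is valid and points to a live Python `str` object. Nothing in the type system checks this, so every caller of `new` must uphold it.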
StrSubclassSerializer
A transparent wrapper similar to `StrSerializer`, but for Python string subclasses (`str` subclass instances).
Definition:
```rust
#[repr(transparent)]
pub(crate) struct StrSubclassSerializer {
    ptr: *mut pyo3_ffi::PyObject,
}
```
Purpose:
Serializes Python objects whose type inherits from `str`, even when the subclass adds extra behavior or data.
Constructor:
```rust
pub fn new(ptr: *mut pyo3_ffi::PyObject) -> Self
```
Parameters:
- `ptr`: raw pointer to an instance of a Python `str` subclass.
Returns: a new `StrSubclassSerializer` wrapping the given pointer.
Usage Example:
```rust
let py_str_subclass_ptr: *mut pyo3_ffi::PyObject = /* obtained from the Python C API */;
let serializer = StrSubclassSerializer::new(py_str_subclass_ptr);
```
Trait Implementation:
Implements the `serde::Serialize` trait:
```rust
impl Serialize for StrSubclassSerializer {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        // body summarized under "Functionality" below
    }
}
```
Functionality:
Follows the same steps as `StrSerializer`, except that the wrapper is created with `PyStrSubclass::from_ptr_unchecked`. It then attempts to extract a string slice and serialize it, returning a serialization error if the extraction fails.
Return:
`Result<S::Ok, S::Error>`: the outcome of serialization.
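The document does not show how a caller chooses between the two wrappers. One plausible scheme, assumed here rather than taken from the source, is to test for the exact `str` type first and fall back to a subclass test; in CPython these checks correspond to `PyUnicode_CheckExact` and `PyUnicode_Check`. The sketch models type objects with a hypothetical `MockType` that records its base type, and every name in it is illustrative.

```rust
// Hypothetical stand-in for a Python type object: a name plus an optional base type.
struct MockType {
    name: &'static str,
    base: Option<&'static MockType>,
}

static STR_TYPE: MockType = MockType { name: "str", base: None };
static MY_STR_TYPE: MockType = MockType { name: "MyStr", base: Some(&STR_TYPE) };
static INT_TYPE: MockType = MockType { name: "int", base: None };

#[derive(Debug, PartialEq)]
enum Wrapper {
    Str,         // would wrap the pointer in StrSerializer
    StrSubclass, // would wrap the pointer in StrSubclassSerializer
    Neither,
}

fn choose_wrapper(ty: &MockType) -> Wrapper {
    // Exact check by pointer identity, analogous to PyUnicode_CheckExact.
    if std::ptr::eq(ty, &STR_TYPE) {
        return Wrapper::Str;
    }
    // Subclass check by walking the base chain (CPython tests a type flag instead).
    let mut cur = ty.base;
    while let Some(t) = cur {
        if std::ptr::eq(t, &STR_TYPE) {
            return Wrapper::StrSubclass;
        }
        cur = t.base;
    }
    Wrapper::Neither
}

fn main() {
    for ty in [&STR_TYPE, &MY_STR_TYPE, &INT_TYPE] {
        println!("{} -> {:?}", ty.name, choose_wrapper(ty));
    }
}
```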
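Notes:
- The `serialize` method is marked `#[inline(never)]`. The source does not say why; a likely reason is that `str` subclass instances are uncommon, so keeping this path out of line avoids bloating the hot code that handles plain `str`.
- Like `StrSerializer`, it dereferences the pointer in unsafe code and relies on the caller to pass a valid pointer to a `str` subclass instance.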
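Both attributes are hints to the compiler rather than guarantees, and neither changes behavior. The sketch below shows the hot/cold split that the two serializers appear to follow; `quote_hot`, `quote_cold`, and `quote` are hypothetical functions, and the assumption that exact `str` objects are the common case is ours, not the source's.

```rust
// Hypothetical hot path: exact `str` objects are assumed to be the common case.
#[inline(always)]
fn quote_hot(s: &str) -> String {
    format!("\"{}\"", s)
}

// Hypothetical cold path: subclass instances are assumed rare, so keep it out of line.
#[inline(never)]
fn quote_cold(s: &str) -> String {
    format!("\"{}\"", s)
}

// Both paths produce identical output; only code generation differs.
fn quote(s: &str, is_exact_str: bool) -> String {
    if is_exact_str {
        quote_hot(s)
    } else {
        quote_cold(s)
    }
}

fn main() {
    println!("{}", quote("plain", true));
    println!("{}", quote("subclass", false));
}
```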
Important Implementation Details
- Safety and unsafe code: both serializers use unsafe blocks to convert raw Python object pointers into the wrapper types `PyStr` and `PyStrSubclass`. This is sound only if the pointers are valid and point to Python `str` objects or instances of `str` subclasses.
- Error handling: if the conversion to a Rust `&str` fails, serialization fails with the custom error `SerializeError::InvalidStr` instead of panicking or emitting altered text. The typical cause is a Python string containing unpaired surrogate code points (for example `'\ud800'`), which have no UTF-8 encoding, whereas a Rust `&str` must be valid UTF-8.
- Performance: `StrSerializer`'s `serialize` method is marked `#[inline(always)]` to reduce call overhead in hot serialization paths, while `StrSubclassSerializer` opts out of inlining with `#[inline(never)]`.
- Dependencies on other modules:
  - `PyStr` and `PyStrSubclass` from the `crate::str` module, which presumably provide wrappers around Python Unicode objects.
  - `SerializeError` from `crate::serialize::error`, for error signaling.
  - serde's `Serialize` and `Serializer` traits, for the serialization abstraction.
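The main failure mode can be reproduced in plain Rust. A Python `str` is a sequence of code points and may hold an unpaired surrogate such as `'\ud800'`, while a Rust `char`, and therefore a `&str`, cannot. The sketch below uses UTF-16 code units as input because the standard library offers a strict decoder for them; `strict_utf8` is a hypothetical helper that, like the documented `to_str()`, signals failure with `None` instead of substituting replacement characters.

```rust
// Hypothetical helper: strict conversion that reports failure as None,
// mirroring the Option<&str> contract described for to_str().
fn strict_utf8(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}

fn main() {
    // Rust refuses to build a char from a surrogate code point.
    assert!(char::from_u32(0xD800).is_none());

    // "h" followed by an unpaired high surrogate: strict decoding fails...
    let lone: [u16; 2] = [0x0068, 0xD800];
    assert_eq!(strict_utf8(&lone), None);
    // ...whereas lossy decoding would silently substitute U+FFFD.
    assert_eq!(String::from_utf16_lossy(&lone), "h\u{FFFD}");

    // A correctly paired surrogate (U+1F600) converts without error.
    assert_eq!(strict_utf8(&[0xD83D, 0xDE00]), Some("\u{1F600}".to_string()));
    println!("all checks passed");
}
```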
Interaction With Other Parts of the System
- Python interoperability: the code works with raw Python C API pointers (`pyo3_ffi::PyObject`, from the low-level CPython bindings maintained by the PyO3 project) and relies on the string wrappers from `crate::str`. It is one piece of a larger Rust-Python interoperability layer.
- Serialization pipeline: these serializers connect Python Unicode objects to serde's generic serialization framework. When a Python string has to be serialized (for example, to JSON), it is wrapped in one of these types, converted to a Rust `&str`, and passed to the serializer's `serialize_str`.
- Error propagation: failures to convert a Python string surface as `SerializeError::InvalidStr` through the serializer's error type, so calling code can handle them like any other serialization error.
- Visibility: both structs are declared `pub(crate)`, so they are usable only inside this crate, likely by a module dedicated to serializing Python objects.
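In serde, composite serializers call element serializers that return `Result`, so an error raised for one string propagates to the top-level call through `?`. The sketch below models that with hypothetical names (`MockSerializeError`, `element`, `array`) and no serde dependency; the real code goes through each serializer's own `S::Error` type instead.

```rust
// Hypothetical error type standing in for SerializeError::InvalidStr.
#[derive(Debug, PartialEq)]
enum MockSerializeError {
    InvalidStr,
}

// Serializes one string element given as UTF-16 code units.
fn element(units: &[u16]) -> Result<String, MockSerializeError> {
    let s = String::from_utf16(units).map_err(|_| MockSerializeError::InvalidStr)?;
    Ok(format!("\"{}\"", s))
}

// Outer (array) serializer: `?` returns the first element error unchanged.
fn array(items: &[&[u16]]) -> Result<String, MockSerializeError> {
    let mut parts = Vec::with_capacity(items.len());
    for item in items {
        parts.push(element(item)?);
    }
    Ok(format!("[{}]", parts.join(",")))
}

fn main() {
    let good: [&[u16]; 2] = [&[0x61], &[0x62]];
    let bad: [&[u16]; 2] = [&[0x61], &[0xD800]];
    println!("{:?}", array(&good));
    println!("{:?}", array(&bad));
}
```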
Usage Example
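```rust
// Illustrative path: the structs are pub(crate), so this compiles only inside the crate.
use crate::unicode::StrSerializer;

/// # Safety
/// `py_str_ptr` must point to a live Python `str`, and the GIL must be held.
unsafe fn serialize_python_str(py_str_ptr: *mut pyo3_ffi::PyObject) -> Result<String, serde_json::Error> {
    let serializer = StrSerializer::new(py_str_ptr);
    // A string that cannot be converted to UTF-8 surfaces as a serde_json::Error.
    serde_json::to_string(&serializer)
}
```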
This example shows how to wrap a raw Python string pointer with `StrSerializer` and then serialize it to a JSON string.
Mermaid Diagram
```mermaid
flowchart TD
    A[StrSerializer] -->|wraps| B(PyObject pointer)
    A -->|implements| C{Serialize Trait}
    C --> D[serialize method]
    D --> E[Unsafe PyStr conversion]
    E -->|"Some(&str)"| F[serializer.serialize_str]
    E -->|None| G["Return SerializeError::InvalidStr"]
    H[StrSubclassSerializer] -->|wraps| B
    H -->|implements| I{Serialize Trait}
    I --> J[serialize method]
    J --> K[Unsafe PyStrSubclass conversion]
    K -->|"Some(&str)"| F
    K -->|None| G
```
Summary
The `unicode.rs` file is a small module in a Rust-Python binding project that lets Python Unicode strings, and instances of `str` subclasses, be serialized through Rust's serde framework. It wraps raw Python object pointers in zero-cost `#[repr(transparent)]` types, relies on callers to guarantee that those pointers are valid, and reports strings that cannot be converted to UTF-8 as `SerializeError::InvalidStr` rather than producing altered output.
*End of `unicode.rs` documentation.*