numpy.rs
Overview
The [numpy.rs](/projects/287/67683) file wraps NumPy arrays and scalars passed from Python in Rust data structures that implement the Serde `Serialize` trait. This lets NumPy data be serialized directly, typically to JSON or another Serde-supported format, for further processing or transmission.
This file bridges Python's NumPy data structures and Rust, handling low-level Python FFI interactions, NumPy's C array interface, and the various NumPy data types including multi-dimensional arrays, scalars, and datetime64 types. It enforces constraints such as C-contiguous memory layout and native endianness to guarantee correctness and performance.
Detailed Description of Key Components
Structs and Enums
NumpySerializer<'a>
Purpose:
Wraps a `PyObjectSerializer` and attempts to serialize the underlying Python object as a NumPy array, falling back to a default serializer on failure.
Fields:
`previous: &'a PyObjectSerializer`: reference to the previous serializer with Python object context.
Methods:
new(previous: &'a PyObjectSerializer) -> Self
Creates a new `NumpySerializer` wrapping the previous serializer.
Trait Implementations:
Serialize:
The core serialization logic attempts to create a `NumpyArray` from the Python object pointer.
On success, serializes the `NumpyArray`.
On failure, returns detailed errors or falls back to a default serializer if configured.
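The success, error, and fallback paths described above can be sketched with plain enums. This is a minimal illustration, not the file's actual code: `Outcome` and `dispatch` are hypothetical names, and the choice to report `Malformed` even when a default serializer is configured is an assumption made for the sketch.

```rust
/// Error variants named in this document, redeclared here so the sketch is self-contained.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum PyArrayError {
    Malformed,
    NotContiguous,
    NotNativeEndian,
    UnsupportedDataType,
}

/// Hypothetical summary of what the serializer does next.
#[derive(Debug, PartialEq)]
enum Outcome {
    SerializeArray,      // the array was accepted and is serialized directly
    Fallback,            // hand the object to the configured default serializer
    Error(PyArrayError), // report the failure to the caller
}

/// Decides the next step from the result of constructing the array wrapper.
/// `has_default` stands in for "a default serializer is configured".
fn dispatch(result: Result<(), PyArrayError>, has_default: bool) -> Outcome {
    match result {
        Ok(()) => Outcome::SerializeArray,
        // Assumption: a malformed array is always an error, never a fallback.
        Err(PyArrayError::Malformed) => Outcome::Error(PyArrayError::Malformed),
        Err(_) if has_default => Outcome::Fallback,
        Err(e) => Outcome::Error(e),
    }
}
```

With this shape, `dispatch(Err(PyArrayError::NotContiguous), true)` yields `Outcome::Fallback`, while the same error without a default serializer is returned to the caller.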
Usage Example:
let numpy_serializer = NumpySerializer::new(&py_object_serializer);
let serialized = serde_json::to_string(&numpy_serializer)?;
NumpyArray
Purpose:
Represents a multi-dimensional NumPy array accessed via the NumPy C array interface. Supports recursively building nested representations for multi-dimensional arrays and serializing them efficiently.
Fields:

| Field | Type | Description |
|---|---|---|
| `array` | `*mut PyArrayInterface` | Raw pointer to the NumPy C array struct. |
| `position` | `Vec<isize>` | Current index position along each dimension. |
| `children` | `Vec<NumpyArray>` | Child arrays for nested dimensions. |
| `depth` | `usize` | Current depth in the multi-dimensional hierarchy. |
| `capsule` | `*mut PyCapsule` | Python capsule containing array metadata. |
| `kind` | `ItemType` | Enum representing the element data type. |
| `opts` | `Opt` | Serialization options. |

Constructors:
new(ptr: *mut PyObject, opts: Opt) -> Result<Self, PyArrayError>
Creates a new `NumpyArray` from a Python object pointer and serialization options.
Validates array flags (C-contiguous, native endian), dimensionality, and data type support.
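The validation step can be sketched as a chain of checks over the array interface header. The two flag values are the ones defined in NumPy's C headers; everything else here is an assumption for illustration: `ArrayHeader` is a simplified stand-in for `PyArrayInterface`, and the mapping of each failed check to an error variant is a guess at a reasonable ordering rather than the file's confirmed logic.

```rust
use std::os::raw::c_int;

// Flag bits as defined in NumPy's C headers.
const NPY_ARRAY_C_CONTIGUOUS: c_int = 0x1;
const NPY_ARRAY_NOTSWAPPED: c_int = 0x200;

/// Simplified, hypothetical subset of the array interface struct.
struct ArrayHeader {
    two: c_int,   // sanity marker, set to 2 by NumPy
    nd: c_int,    // number of dimensions
    typekind: u8, // type kind character, e.g. b'f' for float
    flags: c_int, // bitwise OR of NPY_ARRAY_* flags
}

#[derive(Debug, PartialEq)]
enum PyArrayError {
    Malformed,
    NotContiguous,
    NotNativeEndian,
    UnsupportedDataType,
}

/// Checks the constraints the constructor is described as enforcing.
fn validate(h: &ArrayHeader) -> Result<(), PyArrayError> {
    if h.two != 2 {
        return Err(PyArrayError::Malformed);
    }
    if (h.flags & NPY_ARRAY_C_CONTIGUOUS) == 0 {
        return Err(PyArrayError::NotContiguous);
    }
    if (h.flags & NPY_ARRAY_NOTSWAPPED) == 0 {
        return Err(PyArrayError::NotNativeEndian);
    }
    // Assumption: zero-dimensional arrays and unknown type kinds are both
    // reported as unsupported data types.
    if h.nd == 0 || !matches!(h.typekind, b'b' | b'i' | b'u' | b'f' | b'M') {
        return Err(PyArrayError::UnsupportedDataType);
    }
    Ok(())
}
```

Checking the flags before touching the data pointer means later code can assume a dense, native-endian buffer when it computes element offsets.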
Key Methods:
dimensions() -> usize
Returns the number of dimensions (ndim).
shape() -> &[isize]
Returns the shape of the array as a slice.
strides() -> &[isize]
Returns the strides as a slice.
data() -> *const c_void
Computes the pointer to the current data offset based on `position` and `strides`.
num_items() -> usize
Number of items along the last dimension.
build()
Recursively builds child `NumpyArray` instances for each dimension beyond the first.
child_from_parent(position: Vec<isize>, num_children: usize) -> Self
Creates a child `NumpyArray` with updated position and capacity for children.
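The offset arithmetic behind `data()` and `num_items()` can be illustrated on a safe slice instead of a raw pointer. In this sketch `byte_offset` and `row` are hypothetical helpers, and the element type is fixed to `f64` for simplicity; strides are in bytes, as in NumPy.

```rust
/// Byte offset of the element at `position`, given per-dimension byte `strides`.
fn byte_offset(position: &[isize], strides: &[isize]) -> isize {
    position.iter().zip(strides).map(|(p, s)| p * s).sum()
}

/// Number of items along the last dimension (0 if there are no dimensions).
fn num_items(shape: &[isize]) -> usize {
    shape.last().copied().unwrap_or(0) as usize
}

/// Returns one row of a two-dimensional, C-contiguous `f64` buffer by
/// converting the byte offset into an element index.
fn row<'a>(data: &'a [f64], shape: &[isize], strides: &[isize], row_index: isize) -> &'a [f64] {
    let bytes = byte_offset(&[row_index, 0], strides) as usize;
    let start = bytes / std::mem::size_of::<f64>();
    &data[start..start + num_items(shape)]
}
```

For a 2x3 `f64` array the strides are `[24, 8]`, so position `[1, 0]` sits 24 bytes into the buffer, which is element index 3.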
Trait Implementations:
Serialize:
Serializes the array recursively.
For multi-dimensional arrays: serializes children as sequences.
For single-dimensional arrays: serializes elements according to their `ItemType`.
Serializes zero-length arrays with a specialized `ZeroListSerializer`.
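The recursive structure can be shown without Serde by rendering a flat, C-contiguous buffer as nested list text. This is only a sketch of the recursion: `render` is a hypothetical function, the element type is fixed to `i64`, and the real implementation emits Serde sequences rather than building strings.

```rust
/// Renders a C-contiguous buffer as nested JSON-style lists.
/// `shape` gives the length of each dimension, outermost first.
fn render(data: &[i64], shape: &[usize]) -> String {
    match shape {
        // No dimensions: not handled by this sketch.
        [] => String::new(),
        // Last dimension: emit the elements themselves.
        [n] => {
            let items: Vec<String> = data[..*n].iter().map(|v| v.to_string()).collect();
            format!("[{}]", items.join(","))
        }
        // Outer dimension: split the buffer into equal chunks and recurse.
        [n, rest @ ..] => {
            let chunk: usize = rest.iter().product();
            let children: Vec<String> = (0..*n)
                .map(|i| render(&data[i * chunk..(i + 1) * chunk], rest))
                .collect();
            format!("[{}]", children.join(","))
        }
    }
}
```

A zero-length dimension falls out of the same recursion: shape `[2, 0]` produces two empty inner lists, and shape `[0]` produces a single empty list.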
Error Enum:
`PyArrayError` with variants: `Malformed`, `NotContiguous`, `NotNativeEndian`, `UnsupportedDataType`.
Usage Example:
let numpy_array = NumpyArray::new(py_obj_ptr, opts)?;
let json = serde_json::to_string(&numpy_array)?;
ItemType
Purpose:
Enumerates supported NumPy data types for array elements.
Variants:
`BOOL`
`DATETIME64(NumpyDatetimeUnit)`
`F16`, `F32`, `F64` (floating point types)
`I8`, `I16`, `I32`, `I64` (signed integers)
`U8`, `U16`, `U32`, `U64` (unsigned integers)
Methods:
find(array: *mut PyArrayInterface, ptr: *mut PyObject) -> Option<ItemType>
Determines the `ItemType` based on the `typekind` and `itemsize` fields of the array interface.
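A lookup keyed on `typekind` and `itemsize` might look like the sketch below. The type kind characters (`b`, `i`, `u`, `f`, `M`) are NumPy's standard codes; the function takes the two fields directly instead of raw pointers, and `DATETIME64` omits its unit payload, both simplifications made for this example.

```rust
/// Element types listed in this document (datetime unit omitted in this sketch).
#[derive(Debug, PartialEq)]
enum ItemType {
    BOOL,
    DATETIME64,
    F16,
    F32,
    F64,
    I8,
    I16,
    I32,
    I64,
    U8,
    U16,
    U32,
    U64,
}

/// Maps a NumPy type kind character and item size in bytes to an `ItemType`.
/// Returns `None` for any combination this sketch does not recognize.
fn find(typekind: u8, itemsize: usize) -> Option<ItemType> {
    match (typekind, itemsize) {
        (b'b', 1) => Some(ItemType::BOOL),
        (b'M', 8) => Some(ItemType::DATETIME64),
        (b'f', 2) => Some(ItemType::F16),
        (b'f', 4) => Some(ItemType::F32),
        (b'f', 8) => Some(ItemType::F64),
        (b'i', 1) => Some(ItemType::I8),
        (b'i', 2) => Some(ItemType::I16),
        (b'i', 4) => Some(ItemType::I32),
        (b'i', 8) => Some(ItemType::I64),
        (b'u', 1) => Some(ItemType::U8),
        (b'u', 2) => Some(ItemType::U16),
        (b'u', 4) => Some(ItemType::U32),
        (b'u', 8) => Some(ItemType::U64),
        _ => None,
    }
}
```

Returning `Option` lets the caller turn an unrecognized dtype, such as a complex or string array, into an `UnsupportedDataType` error.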
NumpyScalar
Purpose:
Wraps a Python object representing a NumPy scalar and serializes it according to its specific scalar type.
Fields:
`ptr: *mut PyObject`: raw pointer to the Python scalar object.
`opts: Opt`: serialization options.
Methods:
new(ptr: *mut PyObject, opts: Opt) -> Self
Constructor.
Trait Implementations:
Serialize:
Dispatches serialization based on the scalar's Python type pointer to the appropriate Rust struct (`NumpyInt32`, `NumpyFloat64`, etc.).
NumPy Scalar Structs
These structs represent individual NumPy scalar types and implement `Serialize` to convert them into Rust primitives.
Examples include:
`NumpyInt8`, `NumpyInt16`, `NumpyInt32`, `NumpyInt64`
`NumpyUint8`, `NumpyUint16`, `NumpyUint32`, `NumpyUint64`
`NumpyFloat16`, `NumpyFloat32`, `NumpyFloat64`
`NumpyBool`
Each has a `value` field of the corresponding Rust primitive type and implements Serde serialization accordingly.
NumPy Datetime Handling
`NumpyDatetimeUnit` Enum:
Represents all units supported by NumPy's `datetime64` type, such as Years, Months, Weeks, Days, Hours, Minutes, Seconds, Milliseconds, Microseconds, Nanoseconds, Picoseconds, Femtoseconds, Attoseconds, and a Generic placeholder.
`NumpyDatetimeUnit::from_pyobject(ptr: *mut PyObject) -> Self`:
Extracts the datetime unit from the Python object's dtype descriptor, handling the special case of datetime64 arrays where the usual C interface `descr` field is not populated.
`NumpyDatetimeUnit::datetime(val: i64, opts: Opt) -> Result<NumpyDatetime64Repr, NumpyDateTimeError>`:
Converts a raw integer datetime value to a `NumpyDatetime64Repr` (which implements `DateTimeLike` and `Serialize`), or returns an error if the value is unsupported or unrepresentable.
`NumpyDatetime64Array` and `NumpyDatetime64Repr`:
Wrappers around datetime64 array data and individual datetime64 values respectively, supporting serialization with formatting.
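The two datetime steps, finding the unit and converting the raw integer, can be approximated in plain Rust. This sketch assumes the unit is available as a dtype string such as `<M8[ns]` (the form NumPy reports for `datetime64[ns]`); `Unit`, `unit_from_dtype_str`, and `to_epoch` are hypothetical names, multiples such as `[10ns]` are not handled, and only fixed-length units from weeks down to nanoseconds are converted.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Unit {
    Generic,
    Years,
    Months,
    Weeks,
    Days,
    Hours,
    Minutes,
    Seconds,
    Milliseconds,
    Microseconds,
    Nanoseconds,
    Picoseconds,
    Femtoseconds,
    Attoseconds,
}

/// Parses the unit code between brackets, e.g. "<M8[ns]" or "datetime64[D]".
/// A string without brackets is treated as the generic unit.
fn unit_from_dtype_str(s: &str) -> Option<Unit> {
    let open = match s.find('[') {
        Some(i) => i,
        None => return Some(Unit::Generic),
    };
    let close = s.rfind(']')?;
    if close <= open {
        return None;
    }
    match &s[open + 1..close] {
        "Y" => Some(Unit::Years),
        "M" => Some(Unit::Months),
        "W" => Some(Unit::Weeks),
        "D" => Some(Unit::Days),
        "h" => Some(Unit::Hours),
        "m" => Some(Unit::Minutes),
        "s" => Some(Unit::Seconds),
        "ms" => Some(Unit::Milliseconds),
        "us" => Some(Unit::Microseconds),
        "ns" => Some(Unit::Nanoseconds),
        "ps" => Some(Unit::Picoseconds),
        "fs" => Some(Unit::Femtoseconds),
        "as" => Some(Unit::Attoseconds),
        _ => None,
    }
}

/// Converts a raw value to (seconds since the Unix epoch, nanoseconds).
/// Returns `None` for units this sketch does not convert or on overflow.
fn to_epoch(unit: Unit, val: i64) -> Option<(i64, u32)> {
    match unit {
        Unit::Weeks => Some((val.checked_mul(604_800)?, 0)),
        Unit::Days => Some((val.checked_mul(86_400)?, 0)),
        Unit::Hours => Some((val.checked_mul(3_600)?, 0)),
        Unit::Minutes => Some((val.checked_mul(60)?, 0)),
        Unit::Seconds => Some((val, 0)),
        // Euclidean division keeps the nanosecond part non-negative for pre-1970 values.
        Unit::Milliseconds => Some((val.div_euclid(1_000), (val.rem_euclid(1_000) * 1_000_000) as u32)),
        Unit::Microseconds => Some((val.div_euclid(1_000_000), (val.rem_euclid(1_000_000) * 1_000) as u32)),
        Unit::Nanoseconds => Some((val.div_euclid(1_000_000_000), val.rem_euclid(1_000_000_000) as u32)),
        _ => None,
    }
}
```

Years and months need calendar arithmetic, and the sub-nanosecond units need more precision than a nanosecond timestamp carries, which is why the sketch leaves them unconverted.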
Utility Functions
`is_numpy_scalar(ob_type: *mut PyTypeObject) -> bool`:
Returns true if the given Python type pointer corresponds to a known NumPy scalar type.
`is_numpy_array(ob_type: *mut PyTypeObject) -> bool`:
Returns true if the given Python type pointer corresponds to a NumPy array type.
`slice!` macro:
Creates a Rust slice from a raw pointer and size.
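A macro of this kind is typically a thin wrapper over `std::slice::from_raw_parts`. The definition below is a plausible reconstruction for illustration, not the file's confirmed source, and `sum_from_raw` is a hypothetical caller added to show usage.

```rust
/// Builds a slice from a raw pointer and an element count.
macro_rules! slice {
    ($ptr:expr, $size:expr) => {
        // SAFETY: the caller must guarantee that `$ptr` is non-null, properly
        // aligned, and valid for reads of `$size` elements while the slice is used.
        unsafe { std::slice::from_raw_parts($ptr, $size) }
    };
}

/// Sums `len` integers starting at `ptr`, going through the macro.
fn sum_from_raw(ptr: *const i32, len: usize) -> i32 {
    slice!(ptr, len).iter().sum()
}
```

Wrapping the `unsafe` call in a macro keeps call sites short, but it also hides the safety obligation, so each use still needs a valid pointer and length from the array header.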
Important Implementation Details
Direct FFI Access to Python and NumPy Structures:
The file directly manipulates Python objects and NumPy arrays using unsafe pointers, interfacing with Python's C API and NumPy's `__array_struct__` interface.
Memory Safety:
Reference counts on Python objects are managed with `Py_DECREF` in `Drop` implementations to prevent memory leaks.
Serialization Dispatch:
Serialization is dispatched based on type information extracted from NumPy's C array struct and Python's type pointers, allowing optimized serialization paths for each data type.
Handling Multi-Dimensional Arrays:
The `NumpyArray` struct recursively builds nested arrays representing each dimension, enabling natural serialization to nested sequences.
Support for NumPy's `datetime64` Type:
Special parsing of dtype descriptors to extract datetime units and conversion of integer timestamps to proper datetime representations.
Fallback Behavior:
If an array is not C-contiguous or has unsupported data types, the serializer can fall back to a default serializer if configured.
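The `Drop`-based reference release mentioned under Memory Safety follows the usual RAII pattern. The sketch below models it with a counter in safe Rust rather than calling the Python C API: `FakeObject` and `OwnedRef` are hypothetical stand-ins, and the increment and decrement only mimic what `Py_INCREF` and `Py_DECREF` would do to a real object.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a reference-counted Python object.
struct FakeObject {
    refcnt: Cell<isize>,
}

/// Guard that holds one reference and gives it back when dropped.
struct OwnedRef<'a> {
    obj: &'a FakeObject,
}

impl<'a> OwnedRef<'a> {
    /// Takes a new reference (analogous to receiving an owned pointer).
    fn new(obj: &'a FakeObject) -> Self {
        obj.refcnt.set(obj.refcnt.get() + 1);
        OwnedRef { obj }
    }
}

impl Drop for OwnedRef<'_> {
    /// Releases the reference (analogous to calling `Py_DECREF`).
    fn drop(&mut self) {
        self.obj.refcnt.set(self.obj.refcnt.get() - 1);
    }
}
```

Because the release lives in `Drop`, it runs on every exit path, including early returns from error branches, which is what prevents the leak.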
Interaction with Other Parts of the System
`PyObjectSerializer`:
`NumpySerializer` wraps `PyObjectSerializer`, enhancing it with NumPy-specific serialization.
`serialize::buffer::SmallFixedBuffer`:
Used in formatting datetime strings for serialization.
`serialize::error::SerializeError`:
Defines specific serialization error types used for reporting NumPy-related serialization problems.
`opt::Opt`:
Serialization options influencing behavior such as datetime formatting.
`typeref`:
Loads and caches NumPy type references for efficient type checking.
`serde::ser::{Serialize, Serializer}`:
Provides the serialization traits that the types in this file implement.
`jiff::civil::DateTime` and `jiff::Timestamp`:
Used for datetime calculations and conversions relevant to NumPy datetime64 serialization.
Visual Diagram
classDiagram
class NumpySerializer {
-previous: &PyObjectSerializer
+new(previous: &PyObjectSerializer)
+serialize<S: Serializer>(serializer: S) -> Result<S::Ok, S::Error>
}
class NumpyArray {
-array: *mut PyArrayInterface
-position: Vec<isize>
-children: Vec<NumpyArray>
-depth: usize
-capsule: *mut PyCapsule
-kind: ItemType
-opts: Opt
+new(ptr: *mut PyObject, opts: Opt) -> Result<Self, PyArrayError>
+serialize<S: Serializer>(serializer: S) -> Result<S::Ok, S::Error>
-build()
-child_from_parent(position: Vec<isize>, num_children: usize) -> Self
-data() -> *const c_void
-num_items() -> usize
-shape() -> &[isize]
-strides() -> &[isize]
-dimensions() -> usize
}
class ItemType {
<<enumeration>>
+BOOL
+DATETIME64(NumpyDatetimeUnit)
+F16
+F32
+F64
+I8
+I16
+I32
+I64
+U8
+U16
+U32
+U64
+find(array: *mut PyArrayInterface, ptr: *mut PyObject) -> Option<ItemType>
}
class NumpyScalar {
-ptr: *mut PyObject
-opts: Opt
+new(ptr: *mut PyObject, opts: Opt)
+serialize<S: Serializer>(serializer: S) -> Result<S::Ok, S::Error>
}
NumpySerializer --> PyObjectSerializer : wraps
NumpySerializer --> NumpyArray : creates and serializes
NumpyArray --> NumpyArray : contains children (nested arrays)
NumpyArray --> ItemType : identifies element type
NumpyScalar --> NumpyScalarTypes : dispatches based on type
Summary
The [numpy.rs](/projects/287/67683) file is a core serialization module converting Python NumPy arrays and scalars into Rust serializable types. It handles low-level interfacing with Python and NumPy C APIs, enforces memory layout constraints, supports a wide range of NumPy data types including datetime64, and provides recursive serialization for multi-dimensional arrays. This functionality integrates closely with the overall Python object serialization system, ensuring that NumPy data can be serialized efficiently and accurately for Rust-based processing or external output formats.