pystr.rs
Overview
The [pystr.rs](/projects/287/67734) file provides a Rust abstraction layer for working with Python string objects (`str`) at the FFI (Foreign Function Interface) boundary using the `pyo3` crate's low-level Python C API bindings (`pyo3_ffi`). It defines safe and efficient wrappers around raw Python string pointers (`PyObject`), enabling Rust code to create, manipulate, and convert Python strings while handling Python's internal string representations and optimizations.
Key functionalities include:
Creating Python string objects from Rust `&str`.
Accessing the underlying Rust string slice (`&str`) from Python string objects.
Computing and caching the hash of Python string objects.
Supporting Python string subclasses with appropriate validations.
Optimizing string creation using CPU feature detection (e.g., AVX512).
Handling endian-specific and Python-version-specific internal string state details.
This file is integral to bridging Python string handling in Rust code that interoperates with Python via the PyO3 project, especially for performance-sensitive or low-level string operations.
Detailed Explanation of Components
Imports and Conditional Compilation
Imports various Python C API structs (`PyObject`, `PyASCIIObject`, `PyCompactUnicodeObject`) from `pyo3_ffi`.
Uses conditional compilation to adapt to platform endianness (`target_endian = "little"`), Python version features (`Py_3_14`), and the optional `avx512` feature flag.
Uses internal macros and utility functions (e.g., `ffi!`, `nonnull!`, `is_class_by_type!`) presumably defined elsewhere in the crate.
Free Function: to_str_via_ffi
fn to_str_via_ffi(op: *mut PyObject) -> Option<&'static str>
Purpose: Converts a raw Python string object pointer to a Rust string slice (`&'static str`) using the Python C API function `PyUnicode_AsUTF8AndSize`.
Parameters:
`op`: Raw pointer to a Python object assumed to be a Python string.
Returns:
`Option<&'static str>`: A Rust string slice if conversion succeeds; `None` if the pointer is null or conversion fails.
Usage: Used internally as a fallback to extract string data when direct memory access optimization is not possible.
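The call pattern described above (ask a C-style function for a data pointer and a byte length, check the pointer for null, then build a `&str`) can be sketched without linking against Python. In this minimal sketch, `mock_as_utf8_and_size` and `to_str_via_mock` are hypothetical names, and the mock only imitates the pointer-plus-out-parameter shape of `PyUnicode_AsUTF8AndSize`; it is not the real binding.

```rust
use std::os::raw::c_char;
use std::ptr;

// Hypothetical stand-in for PyUnicode_AsUTF8AndSize: returns a pointer to
// UTF-8 bytes and writes the byte length through `size`, or returns null.
fn mock_as_utf8_and_size(obj: Option<&'static str>, size: &mut isize) -> *const c_char {
    match obj {
        Some(s) => {
            *size = s.len() as isize;
            s.as_ptr().cast::<c_char>()
        }
        None => ptr::null(),
    }
}

// Null-check the returned pointer, then rebuild a &str from pointer + length.
fn to_str_via_mock(obj: Option<&'static str>) -> Option<&'static str> {
    let mut len: isize = 0;
    let data = mock_as_utf8_and_size(obj, &mut len);
    if data.is_null() {
        return None;
    }
    // SAFETY: the mock only returns pointers into 'static UTF-8 data and
    // reports the matching byte length.
    let bytes = unsafe { std::slice::from_raw_parts(data.cast::<u8>(), len as usize) };
    Some(unsafe { std::str::from_utf8_unchecked(bytes) })
}
```

With the real API, the `'static` lifetime is only sound while the Python object stays alive, which is presumably why the function is kept internal.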
Type Alias and Static Variable (Conditional)
#[cfg(feature = "avx512")]
pub type StrDeserializer = unsafe fn(&str) -> *mut pyo3_ffi::PyObject;
#[cfg(feature = "avx512")]
static mut STR_CREATE_FN: StrDeserializer = super::scalar::str_impl_kind_scalar;
Defines a function pointer type for deserializing Rust `&str` into Python string objects.
A mutable static variable `STR_CREATE_FN` holds the current string-creation implementation; it is initialized to the scalar implementation.
The `set_str_create_fn()` function dynamically sets this to an AVX512-accelerated function if the CPU supports the feature.
Function: set_str_create_fn
pub fn set_str_create_fn()
Checks at runtime if the CPU supports the `avx512vl` feature and updates `STR_CREATE_FN` to use an AVX512-optimized implementation, if available.
Improves string creation performance on supported CPUs.
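The runtime selection described above can be sketched as follows. This is an illustration under stated assumptions, not the file's code: it stores the chosen function in a `std::sync::OnceLock` instead of the `static mut` the source uses, the functions return a `String` rather than a Python object pointer, and `create_scalar`, `create_wide`, `select_impl`, and `create` are hypothetical names.

```rust
use std::sync::OnceLock;

// Signature shared by every candidate implementation (hypothetical; the
// real StrDeserializer returns *mut PyObject, not String).
type StrCreateFn = fn(&str) -> String;

// Baseline implementation that works on every CPU.
fn create_scalar(s: &str) -> String {
    s.to_owned()
}

// Placeholder for a SIMD-accelerated variant; here it simply copies.
fn create_wide(s: &str) -> String {
    s.to_owned()
}

static STR_CREATE_FN: OnceLock<StrCreateFn> = OnceLock::new();

#[cfg(target_arch = "x86_64")]
fn select_impl() -> StrCreateFn {
    // Runtime CPU probe; the result does not depend on compile-time flags.
    if std::is_x86_feature_detected!("avx512vl") {
        create_wide
    } else {
        create_scalar
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn select_impl() -> StrCreateFn {
    create_scalar
}

// Probe the CPU once and remember which implementation to use.
fn set_str_create_fn() {
    STR_CREATE_FN.get_or_init(select_impl);
}

// Call through the stored pointer, defaulting to the scalar path if
// set_str_create_fn() was never called.
fn create(s: &str) -> String {
    let f = STR_CREATE_FN
        .get()
        .copied()
        .unwrap_or(create_scalar as StrCreateFn);
    f(s)
}
```

Defaulting to the scalar path mirrors the source's initial value of `STR_CREATE_FN`, so callers get correct results even before the probe runs.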
Constants for State Bit Masks and Shifts
These constants help interpret and manipulate the internal Python string object's state flags:
`STATE_KIND_SHIFT`, `STATE_KIND_MASK`: Shift amount and bitmask to extract the kind of Unicode representation.
`STATE_COMPACT_ASCII`, `STATE_COMPACT`: Flags indicating compact ASCII and compact Unicode string states.
These are conditioned on platform endianness and Python version features, reflecting Python internal implementation details.
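To make the role of these constants concrete, here is a small sketch of decoding a packed state word with a shift and masks. The numeric values and the `STATE_ASCII` name are assumptions chosen for illustration; the real values depend on endianness and Python version, as noted above, and are not reproduced here.

```rust
// Illustrative layout only: kind occupies 3 bits starting at bit 2,
// followed by a "compact" bit and an "ascii" bit.
const STATE_KIND_SHIFT: u32 = 2;
const STATE_KIND_MASK: u32 = 0b111 << STATE_KIND_SHIFT;
const STATE_COMPACT: u32 = 1 << (STATE_KIND_SHIFT + 3);
// Hypothetical helper constant, not named in the source.
const STATE_ASCII: u32 = 1 << (STATE_KIND_SHIFT + 4);

// Extract the kind field (bytes per code point: 1, 2, or 4).
fn kind(state: u32) -> u32 {
    (state & STATE_KIND_MASK) >> STATE_KIND_SHIFT
}

// True only when both the compact and the ascii bits are set.
fn is_compact_ascii(state: u32) -> bool {
    let wanted = STATE_COMPACT | STATE_ASCII;
    (state & wanted) == wanted
}

// True when the compact bit is set, regardless of the ascii bit.
fn is_compact(state: u32) -> bool {
    (state & STATE_COMPACT) != 0
}
```

Testing several flags with one mask comparison, as `is_compact_ascii` does, keeps the hot path to a single AND and compare.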
Struct: PyStr
#[repr(transparent)]
#[derive(Copy, Clone)]
pub(crate) struct PyStr {
ptr: NonNull<PyObject>,
}
Purpose: A safe Rust wrapper around a non-null pointer to a Python string object (`PyObject`).
Traits: Implements `Copy`, `Clone`, and unsafe `Send`/`Sync` traits to allow thread-safe usage (assuming Python GIL safety is managed externally).
Methods:
from_ptr_unchecked
pub unsafe fn from_ptr_unchecked(ptr: *mut PyObject) -> PyStr
Constructs a `PyStr` from a raw pointer without checking ownership or validity beyond assertions.
Safety: Caller must ensure the pointer is valid and points to a Python string object.
Debug Assertions: In debug builds, checks that the pointer is non-null and that the object has the expected Python type.
from_str_with_hash
pub fn from_str_with_hash(buf: &str) -> PyStr
Creates a Python string object from a Rust `&str` and precomputes its hash value for performance.
Calls `hash()` internally.
Parameters:
`buf`: Rust string slice.
Returns: A new `PyStr` instance wrapping the created Python string.
from_str
pub fn from_str(buf: &str) -> PyStr
Creates a Python string object from a Rust `&str`.
For empty strings, returns a static immortal empty Python string object.
Uses either scalar or AVX512-optimized string creation depending on feature flags.
Parameters:
`buf`: Rust string slice.
Returns: `PyStr` wrapping the new Python string object.
hash
pub fn hash(&mut self)
Calculates and caches the hash of the Python string object.
Uses Python C API functions `Py_HashBuffer` or `_Py_HashBytes` depending on Python version.
Accesses Python internal string data buffer directly using state flags for optimizations.
Has separate implementations depending on platform endianness.
Side Effects: Mutates the internal hash field of the Python string object.
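The compute-once-and-store behavior can be illustrated with a small model. This sketch assumes a hypothetical `FakeStr` type with its own hash slot and uses Rust's `DefaultHasher` purely as a placeholder; it does not reproduce Python's hash algorithm. It follows CPython's convention that a stored hash of `-1` means "not computed yet". Unlike the method described above, the sketch also returns the value so it is easy to inspect.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical stand-in for a string object with a cached-hash slot.
struct FakeStr {
    data: String,
    hash: isize, // -1 means "not computed yet", as in CPython
}

impl FakeStr {
    fn new(data: &str) -> Self {
        FakeStr { data: data.to_owned(), hash: -1 }
    }

    // Compute the hash on first use, store it, and reuse it afterwards.
    fn hash(&mut self) -> isize {
        if self.hash == -1 {
            let mut hasher = DefaultHasher::new();
            self.data.as_bytes().hash(&mut hasher);
            let mut value = hasher.finish() as isize;
            // Keep -1 reserved for "unset" by remapping it.
            if value == -1 {
                value = -2;
            }
            self.hash = value;
        }
        self.hash
    }
}
```

Precomputing the hash at creation time, as `from_str_with_hash` does, pays this cost once up front for strings that are expected to be used as dictionary keys.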
to_str
pub fn to_str(self) -> Option<&'static str>
Converts the Python string object back into a Rust `&str`.
Optimizes access by directly reading internal Python string buffer if possible.
Falls back to `to_str_via_ffi` if the internal representation is not compact or accessible.
Has separate implementations for little endian and other architectures.
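The fast-path-then-fallback structure can be modeled as a match over the possible representations. The three cases below (compact ASCII, a cached UTF-8 copy, everything else) are an assumption about how the checks are grouped, and `Repr`, `slow_path`, and this `to_str` are hypothetical names used only for this sketch.

```rust
// Hypothetical model of the cases the fast path distinguishes.
enum Repr {
    // Compact ASCII: the character buffer can be read directly as UTF-8.
    CompactAscii(&'static [u8]),
    // Compact non-ASCII with a UTF-8 copy already cached on the object.
    CachedUtf8(&'static str),
    // Anything else must go through the slower conversion call.
    Other(Option<&'static str>),
}

// Stand-in for to_str_via_ffi: forwards a precomputed result.
fn slow_path(result: Option<&'static str>) -> Option<&'static str> {
    result
}

fn to_str(repr: Repr) -> Option<&'static str> {
    match repr {
        // The sketch validates the bytes to stay in safe Rust; code that
        // trusts the ASCII flag could skip this check.
        Repr::CompactAscii(bytes) => std::str::from_utf8(bytes).ok(),
        Repr::CachedUtf8(s) => Some(s),
        Repr::Other(result) => slow_path(result),
    }
}
```

Ordering the cheapest and most common case first means typical ASCII strings never reach the fallback.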
as_ptr and as_non_null_ptr
pub fn as_ptr(self) -> *mut PyObject
pub fn as_non_null_ptr(self) -> NonNull<PyObject>
Accessor methods to retrieve the underlying raw Python object pointer or non-null wrapped pointer.
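The wrapper shape used by `PyStr` (a `#[repr(transparent)]`, `Copy` struct around `NonNull`, an unchecked constructor guarded by debug assertions, and two pointer accessors) can be sketched in isolation. `FakeObject` and `Wrapper` are hypothetical stand-ins so the example compiles without `pyo3_ffi`; the `is_str` field replaces the real type check.

```rust
use std::ptr::NonNull;

// Stand-in for pyo3_ffi::PyObject; `is_str` replaces the real type check.
#[repr(C)]
struct FakeObject {
    is_str: bool,
}

// Same in-memory layout as the pointer it wraps.
#[repr(transparent)]
#[derive(Copy, Clone)]
struct Wrapper {
    ptr: NonNull<FakeObject>,
}

impl Wrapper {
    // Safety: `ptr` must be non-null and point to a live string-like object.
    unsafe fn from_ptr_unchecked(ptr: *mut FakeObject) -> Wrapper {
        // These checks run in debug builds only.
        debug_assert!(!ptr.is_null());
        debug_assert!(unsafe { (*ptr).is_str });
        Wrapper { ptr: unsafe { NonNull::new_unchecked(ptr) } }
    }

    // Raw pointer, for passing back to C-style APIs.
    fn as_ptr(self) -> *mut FakeObject {
        self.ptr.as_ptr()
    }

    // Non-null pointer, for Rust code that wants the invariant in the type.
    fn as_non_null_ptr(self) -> NonNull<FakeObject> {
        self.ptr
    }
}
```

Because the wrapper is `Copy` and pointer-sized, taking `self` by value in the accessors costs no more than passing the raw pointer.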
Struct: PyStrSubclass
#[repr(transparent)]
pub(crate) struct PyStrSubclass {
ptr: NonNull<PyObject>,
}
Represents a Python string subclass instance (i.e., a class derived from `str`).
Provides similar capabilities but ensures the object is a subclass, not a direct string type.
Methods:
from_ptr_unchecked
pub unsafe fn from_ptr_unchecked(ptr: *mut PyObject) -> PyStrSubclass
Unsafe constructor from raw pointer.
Asserts that the pointer is non-null and that the object is an instance of a `str` subclass rather than of `str` itself.
to_str
pub fn to_str(&self) -> Option<&'static str>
Converts the subclass string object to a Rust `&str` using the FFI method (no direct buffer access optimizations).
Safer fallback for string subclasses.
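The distinction between "exactly `str`" and "a subclass of `str`" can be illustrated with a toy type hierarchy. This is an analogy only: `TypeInfo`, the three statics, and both functions are hypothetical, and the real code checks CPython type objects rather than this structure.

```rust
// Hypothetical type descriptor: each type optionally points at its base.
struct TypeInfo {
    base: Option<&'static TypeInfo>,
}

static OBJECT_TYPE: TypeInfo = TypeInfo { base: None };
static STR_TYPE: TypeInfo = TypeInfo { base: Some(&OBJECT_TYPE) };
static MY_STR_TYPE: TypeInfo = TypeInfo { base: Some(&STR_TYPE) };

// Exact-type check: identity comparison of the type descriptors.
fn is_exact(ty: &TypeInfo, target: &TypeInfo) -> bool {
    std::ptr::eq(ty, target)
}

// Strict-subclass check: walk the base chain, starting above `ty` itself,
// so the target type is not counted as its own subclass.
fn is_strict_subclass(ty: &TypeInfo, target: &TypeInfo) -> bool {
    let mut current = ty.base;
    while let Some(base) = current {
        if std::ptr::eq(base, target) {
            return true;
        }
        current = base.base;
    }
    false
}
```

Keeping the two cases in separate wrapper types lets the exact-`str` path rely on the internal layout while subclasses take the conservative route.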
Important Implementation Details
Unsafe Code and Performance: The module uses `unsafe` code extensively to directly access Python's internal string structures for performance gains, such as directly reading UTF-8 buffers or computing hashes without overhead.
Python Internal State Awareness: Manipulates Python string internals like `state`, `length`, and `utf8_length` fields to optimize string operations.
Platform and Version Specifics: The code contains conditional compilation for little-endian architectures, Python 3.14 features, and CPU instruction sets, ensuring compatibility and optimization across different environments.
Static Immortal Empty String: Uses an immortal empty string object to avoid unnecessary allocations.
Hash Caching: The Python string hash is computed once and stored internally, mimicking Python's own string hash caching mechanism.
Dynamic Dispatch for String Creation: Uses a function pointer `STR_CREATE_FN` that can be set at runtime to optimized implementations based on CPU features.
Interaction with Other System Components
`crate::typeref` and `crate::util`: Uses internal crate modules for type information and utility functions/macros (e.g., `STR_TYPE`, `isize_to_usize`).
`super::scalar` and `crate::str::avx512`: References scalar and AVX512 optimized implementations of string creation functions.
`pyo3_ffi`: Relies heavily on the `pyo3_ffi` crate, which provides raw bindings to Python's C API to interact with Python objects.
Integration Point: This file acts as a foundational layer for Rust code that needs to create, convert, or manipulate Python strings with minimal overhead, likely used by higher-level abstractions in the PyO3 project.
Usage Examples
// Creating a Python string from a Rust &str
let py_str = PyStr::from_str("Hello, Python!");

// Accessing the underlying Python object pointer
let py_obj_ptr = py_str.as_ptr();

// Getting a Rust &str back from PyStr (if possible)
if let Some(rust_str) = py_str.to_str() {
    println!("Python string content: {}", rust_str);
}

// Creating string with precomputed hash for optimization
let py_str_with_hash = PyStr::from_str_with_hash("hashed string");

// Working with Python string subclass (assuming a valid pointer `subclass_ptr`)
unsafe {
    let py_str_subclass = PyStrSubclass::from_ptr_unchecked(subclass_ptr);
    if let Some(content) = py_str_subclass.to_str() {
        println!("Subclass string content: {}", content);
    }
}
Mermaid Diagram
classDiagram
    class PyStr {
        -ptr: NonNull<PyObject>
        +unsafe from_ptr_unchecked(ptr: *mut PyObject) PyStr
        +from_str(buf: &str) PyStr
        +from_str_with_hash(buf: &str) PyStr
        +hash()
        +to_str() Option<&'static str>
        +as_ptr() *mut PyObject
        +as_non_null_ptr() NonNull<PyObject>
    }
    class PyStrSubclass {
        -ptr: NonNull<PyObject>
        +unsafe from_ptr_unchecked(ptr: *mut PyObject) PyStrSubclass
        +to_str() Option<&'static str>
    }
    PyStrSubclass --> PyObject : wraps pointer
    PyStr --> PyObject : wraps pointer
Summary
[pystr.rs](/projects/287/67734) is a low-level Rust module providing efficient and safe wrappers around Python string objects for Rust code interacting with Python via FFI. It handles the creation, conversion, and hashing of Python strings, optimized for various CPU features and Python internals. It also supports Python string subclasses with careful type checks. This file is a crucial component within the PyO3 ecosystem for string interoperability between Rust and Python.