bytes.rs
Overview
The `bytes.rs` source file provides low-level, unsafe Rust FFI (Foreign Function Interface) bindings to access the internal data and size fields of Python `bytes` objects. Its primary purpose is to enable zero-copy, high-performance retrieval of the raw byte buffer and its length directly from a Python `bytes` object pointer (`PyObject *`), bypassing the Python API overhead.
By exposing these raw pointers and sizes through minimal, inline unsafe functions, this file supports critical buffer manipulation operations in the broader system, particularly for JSON serialization and deserialization workflows that require fast, direct access to Python byte sequences.
Detailed Explanation of Functions
PyBytes_AS_STRING
pub(crate) unsafe fn PyBytes_AS_STRING(op: *mut PyObject) -> *const c_char
Description
Returns a raw pointer to the internal byte buffer (`ob_sval`) of the Python `bytes` object referenced by `op`.
Parameters
`op`: A raw mutable pointer to a Python object (`*mut PyObject`). This is expected to point to a valid Python `bytes` object.
Returns
A raw constant pointer (`*const c_char`) to the first byte of the internal buffer of the `bytes` object.
Usage
This function enables Rust code to obtain the starting address of the byte array contained in the Python `bytes` object, allowing direct read access.
Safety
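The caller must ensure that `op` is a valid pointer to a Python `bytes` object.
Dereferencing invalid pointers or pointers to non-bytes objects leads to undefined behavior.
The returned pointer is valid only for the lifetime of the `bytes` object, so the caller must keep the object alive (for example, by holding a reference to it) for as long as the pointer is in use.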
Example
unsafe fn example(py_bytes_ptr: *mut PyObject) {
    // `py_bytes_ptr` is obtained from the Python API and must point to a live `bytes` object.
    let data_ptr: *const c_char = PyBytes_AS_STRING(py_bytes_ptr);
    // data_ptr can now be used to read the bytes data directly
}
PyBytes_GET_SIZE
pub(crate) unsafe fn PyBytes_GET_SIZE(op: *mut PyObject) -> Py_ssize_t
Description
Retrieves the size (length) of the Python `bytes` object referenced by `op` by accessing its `ob_size` field.
Parameters
`op`: A raw mutable pointer to a Python object (`*mut PyObject`). This should point to a valid Python `bytes` object.
Returns
The size of the `bytes` object buffer as `Py_ssize_t` (the signed, pointer-sized integer type that the Python C API uses for sizes and lengths).
Usage
This function allows Rust code to determine the exact length of the byte buffer, necessary for safe slicing or iteration over the bytes.
Safety
The caller must ensure that `op` points to a valid Python `bytes` object.
Accessing `ob_size` on an invalid pointer or non-bytes object causes undefined behavior.
Example
unsafe fn example(py_bytes_ptr: *mut PyObject) {
    // `py_bytes_ptr` is obtained from the Python API and must point to a live `bytes` object.
    let size: Py_ssize_t = PyBytes_GET_SIZE(py_bytes_ptr);
    // size can now be used to safely read the buffer of length `size`
}
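The pointer and the length are usually consumed together. The following is a minimal sketch, not code from `bytes.rs`, of how the two return values might be combined into a Rust byte slice. The helper name `bytes_as_slice` is hypothetical, and the null and negative-length checks are a defensive assumption rather than something the original functions perform.

```rust
use std::os::raw::c_char;
use std::slice;

/// Hypothetical helper (an assumption, not part of `bytes.rs`): view `len`
/// bytes starting at `ptr` as a byte slice.
///
/// # Safety
/// `ptr` must point to at least `len` readable bytes that stay alive and
/// unmodified for the whole lifetime `'a`.
pub unsafe fn bytes_as_slice<'a>(ptr: *const c_char, len: isize) -> Option<&'a [u8]> {
    // A null pointer or a negative length can never describe a valid buffer.
    if ptr.is_null() || len < 0 {
        return None;
    }
    // `c_char` and `u8` have the same size and alignment, so the cast only
    // changes how each byte is interpreted.
    Some(unsafe { slice::from_raw_parts(ptr.cast::<u8>(), len as usize) })
}
```

With the real accessors, a call would pass `PyBytes_AS_STRING(op)` and `PyBytes_GET_SIZE(op)` as the two arguments while the `bytes` object is kept alive.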
Important Implementation Details
Both functions use Rust's `unsafe` keyword and raw pointer casts to access internal Python C structures:
`PyBytesObject` for accessing the `ob_sval` field, which holds the byte buffer.
`PyVarObject` for accessing the `ob_size` field, which holds the variable size of the object.
The `#[inline(always)]` attribute asks the compiler to inline these functions at every call site, minimizing call overhead.
The functions are marked with `#[allow(non_snake_case)]` to keep the naming consistent with Python C API conventions.
The use of `unsafe` is necessary because the code directly manipulates raw pointers and depends on Python's internal object layout, which Rust's safety guarantees cannot verify.
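The points above can be sketched as follows. This is an illustration under stated assumptions, not the file's actual source: the three structs are simplified stand-ins that model a common CPython layout (real layouts vary across Python versions and build options), and `MockStorage` and `write_mock_bytes` are hypothetical helpers that exist only so the accessors can be exercised without a Python interpreter.

```rust
use std::ffi::c_void;
use std::os::raw::c_char;
use std::ptr;

#[allow(non_camel_case_types)]
pub type Py_ssize_t = isize;

// Simplified stand-ins for the CPython structs (assumed layout).
#[repr(C)]
pub struct PyObject {
    ob_refcnt: Py_ssize_t,
    ob_type: *mut c_void,
}

#[repr(C)]
pub struct PyVarObject {
    ob_base: PyObject,
    ob_size: Py_ssize_t,
}

#[repr(C)]
pub struct PyBytesObject {
    ob_base: PyVarObject,
    ob_shash: Py_ssize_t,
    ob_sval: [c_char; 1], // first byte of a variable-length trailing buffer
}

#[allow(non_snake_case)]
#[inline(always)]
pub unsafe fn PyBytes_AS_STRING(op: *mut PyObject) -> *const c_char {
    // Take the address of `ob_sval` without creating an intermediate reference.
    unsafe { ptr::addr_of!((*op.cast::<PyBytesObject>()).ob_sval).cast::<c_char>() }
}

#[allow(non_snake_case)]
#[inline(always)]
pub unsafe fn PyBytes_GET_SIZE(op: *mut PyObject) -> Py_ssize_t {
    // Every variable-size object starts with a `PyVarObject` header.
    unsafe { (*op.cast::<PyVarObject>()).ob_size }
}

/// Hypothetical backing storage for a hand-built mock object.
#[repr(C, align(16))]
pub struct MockStorage(pub [u8; 128]);

/// Hypothetical helper: lay out a fake bytes object holding `data` in `storage`.
pub fn write_mock_bytes(storage: &mut MockStorage, data: &[u8]) -> *mut PyObject {
    // The header plus the payload plus a trailing NUL must fit in the storage.
    assert!(std::mem::size_of::<PyBytesObject>() + data.len() <= storage.0.len());
    let obj = storage.0.as_mut_ptr().cast::<PyBytesObject>();
    unsafe {
        ptr::addr_of_mut!((*obj).ob_base.ob_size).write(data.len() as Py_ssize_t);
        let buf = ptr::addr_of_mut!((*obj).ob_sval).cast::<u8>();
        ptr::copy_nonoverlapping(data.as_ptr(), buf, data.len());
        buf.add(data.len()).write(0); // CPython keeps a trailing NUL byte
    }
    obj.cast::<PyObject>()
}
```

Using `ptr::addr_of!` rather than `&(*p).ob_sval` avoids creating a reference to the one-element array, whose type does not describe the full trailing buffer.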
Interaction with Other Parts of the System
This file resides in the FFI boundary layer of the Rust codebase, bridging Python's C API to Rust.
It is primarily used by the serialization and deserialization modules that process Python bytes objects to convert JSON data efficiently.
The raw pointers and sizes provided by these functions allow zero-copy operations on Python `bytes`, which significantly improves performance by avoiding extra data duplication.
These functions complement similar bindings for Python `memoryview` objects (found in other files like `buffer.rs`), which handle more complex buffer protocols.
Higher-level Rust abstractions or safe wrappers may build on these unsafe primitives to provide safer interfaces for the rest of the application.
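As an illustration of the last point, a safe wrapper could tie the borrowed buffer to the lifetime of whatever keeps the Python object alive. The sketch below is an assumption about how such a wrapper might look, not an interface that exists in this codebase: `BytesView` and its methods are hypothetical names, and the generic `owner` parameter stands in for a held reference to the Python `bytes` object.

```rust
use std::os::raw::c_char;
use std::slice;

/// Hypothetical safe view over a byte buffer (not part of `bytes.rs`).
/// The lifetime `'a` ties the view to a borrow of the buffer's owner.
pub struct BytesView<'a> {
    data: &'a [u8],
}

impl<'a> BytesView<'a> {
    /// # Safety
    /// `ptr` must point to `len` readable bytes that are kept alive by
    /// `owner` and are not mutated or freed while `owner` is borrowed.
    pub unsafe fn new<O: ?Sized>(_owner: &'a O, ptr: *const c_char, len: usize) -> Self {
        // The resulting slice inherits `'a`, so it cannot outlive the borrow of `owner`.
        BytesView {
            data: unsafe { slice::from_raw_parts(ptr.cast::<u8>(), len) },
        }
    }

    /// Safe access: the unsafe obligations were discharged in `new`.
    pub fn as_slice(&self) -> &'a [u8] {
        self.data
    }
}
```

Only the constructor is `unsafe`; once a view exists, the borrow checker rejects any code that drops or mutably borrows the owner while the slice is still in use.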
Visual Diagram
The following Mermaid class diagram illustrates the relationship between Python internal structures and the Rust accessor functions in this file:
classDiagram
class PyObject {
<<opaque>>
}
class PyBytesObject {
+ob_sval: [u8] "Raw byte buffer"
}
class PyVarObject {
+ob_size: Py_ssize_t "Buffer size"
}
class RustAccessor {
+PyBytes_AS_STRING(op: *mut PyObject) -> *const c_char
+PyBytes_GET_SIZE(op: *mut PyObject) -> Py_ssize_t
}
PyBytesObject --|> PyVarObject : extends
PyVarObject --|> PyObject : extends
RustAccessor ..> PyBytesObject : accesses ob_sval
RustAccessor ..> PyVarObject : accesses ob_size
**Diagram Explanation:**
`PyBytesObject` and `PyVarObject` are internal Python C structs that extend `PyObject`: `PyVarObject` begins with a `PyObject` header, and `PyBytesObject` in turn begins with a `PyVarObject` header.
The Rust functions in `RustAccessor` cast a generic `PyObject` pointer to these specific structs to access `ob_sval` (byte buffer) and `ob_size` (length).
This direct access enables efficient buffer operations in Rust code.
Summary
`bytes.rs` provides minimal, efficient, unsafe Rust functions to access Python `bytes` data pointers and sizes.
These functions are essential for zero-copy buffer access in Rust-based JSON serialization and deserialization.
The file acts as a crucial bridge between Python's internal C API structures and Rust's memory-safe abstractions, leveraging `unsafe` Rust for performance.
It is part of the FFI layer, interacting closely with other modules handling Python buffer protocols and serialization logic.
If you are integrating `bytes.rs` into your Rust code interfacing with Python bytes objects, always ensure pointers passed are valid, and use these functions within `unsafe` blocks to respect Rust’s safety model.