generic.rs
Overview
The `generic.rs` file provides a highly optimized, SIMD-accelerated function for formatting and escaping string data in memory. Specifically, it implements a generic 128-bit SIMD-based routine to escape special characters (`\`, `"`, and control characters below ASCII 32) in a byte string, wrapping the result in double quotes (`"`). The function writes the escaped string directly to a provided output buffer pointer and returns the total number of bytes written.
This file focuses on performance by leveraging architecture-specific SIMD instructions (notably ARM NEON when available on AArch64 targets) to process 16-byte chunks in parallel. It falls back to scalar processing for small inputs or trailing bytes that don't fill a SIMD vector. The implementation is `unsafe` due to raw pointer manipulation and direct memory writes, requiring careful use to avoid undefined behavior.
Detailed API Documentation
Function: format_escaped_str_impl_generic_128
pub(crate) unsafe fn format_escaped_str_impl_generic_128(
    odst: *mut u8,
    value_ptr: *const u8,
    value_len: usize,
) -> usize
Description
Formats a raw byte slice as a double-quoted, escaped string and writes the result into a destination buffer. It escapes backslashes (`\`), double quotes (`"`), and all ASCII control characters (code points less than 32). The function uses 128-bit SIMD vectors (`u8x16`) to accelerate processing of large inputs.
Parameters
odst: *mut u8
Pointer to the start of the destination buffer where the escaped string will be written.
value_ptr: *const u8
Pointer to the input byte string to escape.
value_len: usize
The length of the input byte string in bytes.
Returns
usize
The number of bytes written to the destination buffer, including the surrounding double quotes.
Usage Example
let input = b"Hello\nWorld\"Test\\String";
let mut output = [0u8; 64];
let written = unsafe {
    format_escaped_str_impl_generic_128(output.as_mut_ptr(), input.as_ptr(), input.len())
};
let escaped_str = std::str::from_utf8(&output[..written]).unwrap();
println!("{}", escaped_str);
// Output: "Hello\nWorld\"Test\\String"
*Note*: Because the function is `pub(crate)`, the example above only compiles inside the crate. It also relies on the scalar fallback macro `impl_format_scalar!` and the macro `write_escape!`, which are defined elsewhere in the codebase.
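To make the expected output of the usage example checkable outside the crate, the documented contract can be modelled with a safe scalar function. This is a minimal sketch, not the crate's code: `escape_reference` is a hypothetical name, and the escape table (JSON-style short escapes, `\u00XX` for other control bytes) is an assumption, since the exact sequences emitted by `write_escape!` are not shown here.

```rust
/// Safe scalar model of the documented behaviour (hypothetical helper).
/// Wraps the input in double quotes and escapes `"`, `\` and bytes < 0x20.
fn escape_reference(input: &[u8]) -> Vec<u8> {
    const HEX: &[u8; 16] = b"0123456789abcdef";
    let mut out = Vec::with_capacity(input.len() + 2);
    out.push(b'"'); // leading quote
    for &b in input {
        match b {
            b'"' => out.extend_from_slice(b"\\\""),
            b'\\' => out.extend_from_slice(b"\\\\"),
            // Assumed JSON-style short escapes for common control bytes.
            b'\n' => out.extend_from_slice(b"\\n"),
            b'\r' => out.extend_from_slice(b"\\r"),
            b'\t' => out.extend_from_slice(b"\\t"),
            0x08 => out.extend_from_slice(b"\\b"),
            0x0c => out.extend_from_slice(b"\\f"),
            // Assumed form for the remaining control bytes: \u00XX.
            0x00..=0x1f => {
                out.extend_from_slice(b"\\u00");
                out.push(HEX[(b >> 4) as usize]);
                out.push(HEX[(b & 0x0f) as usize]);
            }
            _ => out.push(b), // everything else is copied unchanged
        }
    }
    out.push(b'"'); // trailing quote
    out
}
```

Such a model is useful mainly as a test oracle: the SIMD routine and the scalar model should produce identical bytes for any input.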
Implementation Details and Algorithm
SIMD-based Escaping Strategy
The function always begins by writing a double quote (`"`) at the start of the output buffer.
For input strings shorter than 16 bytes (the SIMD stride), it delegates to a scalar implementation (via the `impl_format_scalar!` macro).
For inputs of 16 bytes or more:
It processes the input in 16-byte chunks using the `core::simd::u8x16` vector type.
Each 16-byte vector is compared simultaneously against three criteria:
Bytes equal to backslash (`\`, ASCII 92)
Bytes equal to double quote (`"`, ASCII 34)
Bytes less than ASCII 32 (control characters)
The combined mask determines which bytes need escaping.
If no bytes require escaping in the current chunk, it is copied directly to the output.
If escapes are needed, the function finds the first byte that requires escaping:
Copies up to that byte.
Writes the escape sequence for that byte using the `write_escape!` macro.
Advances pointers accordingly and continues processing.
After processing all full strides, the last (potentially partial) stride is handled via a scratch buffer to avoid overreads.
Finally, a closing double quote is written and the total length is computed.
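The steps above can be sketched on stable Rust by replacing the vector comparisons with a scalar loop that builds the same kind of 16-bit mask. This is an illustrative model under stated assumptions, not the file's implementation: `escape_mask`, `push_escape`, and `escape_chunked` are hypothetical names, the escape table is an assumed JSON-style one, and short inputs go through the scratch-buffer path instead of a separate scalar macro.

```rust
const STRIDE: usize = 16;

/// Scalar stand-in for the three SIMD comparisons: bit `i` of the result is
/// set when `chunk[i]` is a backslash, a double quote, or below 0x20.
fn escape_mask(chunk: &[u8; STRIDE]) -> u16 {
    let mut mask = 0u16;
    for (i, &b) in chunk.iter().enumerate() {
        if b == b'\\' || b == b'"' || b < 0x20 {
            mask |= 1u16 << i;
        }
    }
    mask
}

/// Assumed JSON-style escape table (stand-in for `write_escape!`).
fn push_escape(out: &mut Vec<u8>, b: u8) {
    const HEX: &[u8; 16] = b"0123456789abcdef";
    match b {
        b'"' => out.extend_from_slice(b"\\\""),
        b'\\' => out.extend_from_slice(b"\\\\"),
        b'\n' => out.extend_from_slice(b"\\n"),
        b'\r' => out.extend_from_slice(b"\\r"),
        b'\t' => out.extend_from_slice(b"\\t"),
        _ => {
            out.extend_from_slice(b"\\u00");
            out.push(HEX[(b >> 4) as usize]);
            out.push(HEX[(b & 0x0f) as usize]);
        }
    }
}

fn escape_chunked(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len() + 2);
    out.push(b'"');
    let mut pos = 0;

    // Main loop: one full 16-byte chunk per iteration.
    while input.len() - pos >= STRIDE {
        let mut chunk = [0u8; STRIDE];
        chunk.copy_from_slice(&input[pos..pos + STRIDE]);
        let mask = escape_mask(&chunk);
        if mask == 0 {
            // Nothing to escape: copy the whole chunk.
            out.extend_from_slice(&chunk);
            pos += STRIDE;
        } else {
            // The lowest set bit is the first byte that needs escaping.
            let first = mask.trailing_zeros() as usize;
            out.extend_from_slice(&chunk[..first]);
            push_escape(&mut out, chunk[first]);
            pos += first + 1; // resume right after the escaped byte
        }
    }

    // Tail: copy the remaining bytes into a padded scratch buffer so a full
    // 16-byte "vector" can be examined without reading past the input.
    let tail = &input[pos..];
    let mut scratch = [b'a'; STRIDE]; // padding byte never needs escaping
    scratch[..tail.len()].copy_from_slice(tail);
    let mut mask = escape_mask(&scratch);
    let mut start = 0;
    while mask != 0 {
        let first = mask.trailing_zeros() as usize;
        out.extend_from_slice(&scratch[start..first]);
        push_escape(&mut out, scratch[first]);
        start = first + 1;
        mask &= mask - 1; // clear the lowest set bit
    }
    out.extend_from_slice(&scratch[start..tail.len()]);

    out.push(b'"');
    out
}
```

Restarting the scan immediately after each escaped byte keeps the main loop simple at the cost of re-examining some bytes, which matters little when escapes are rare.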
Important Macros (from context)
`impl_format_scalar!`: Handles escaping for inputs smaller than one SIMD stride (the scalar fallback).
`write_escape!`: Writes the escaped form of a character that needs escaping (e.g., converts a newline byte into the two-character sequence `\n`).
`trailing_zeros!`: Returns the index of the least significant set bit, used to locate the first character to escape from the SIMD mask.
Safety and Performance Considerations
The function is marked `unsafe` because it performs raw pointer arithmetic and writes.
The `#[inline(never)]` attribute asks the compiler not to inline the function, which suggests it is large or complex enough that inlining would not pay off.
The `#[cfg_attr(target_arch = "aarch64", target_feature(enable = "neon"))]` attribute enables ARM NEON SIMD instructions on AArch64 targets to boost performance.
The use of `u8x16` SIMD vectors enables 16 bytes to be checked in parallel, significantly speeding up escaping for large strings.
Scratch buffer usage prevents out-of-bounds memory access when handling the final partial stride.
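As a small illustration of the attribute pattern described above, the sketch below applies the same `cfg_attr`/`target_feature` combination to a hypothetical helper, `count_needing_escape`, which is not part of the documented file. On non-AArch64 targets the `cfg_attr` expands to nothing, so the function compiles everywhere; the function is declared `unsafe` both for the raw-pointer contract and because `target_feature` functions have traditionally been required to be `unsafe`.

```rust
/// Counts bytes that would need escaping (hypothetical helper, shown only to
/// demonstrate the attribute; the real routine does the escaping itself).
///
/// Safety: `ptr` must be valid for reads of `len` bytes.
#[inline(never)]
#[cfg_attr(target_arch = "aarch64", target_feature(enable = "neon"))]
unsafe fn count_needing_escape(ptr: *const u8, len: usize) -> usize {
    // SAFETY: the caller guarantees `ptr` is valid for `len` bytes.
    let bytes = unsafe { core::slice::from_raw_parts(ptr, len) };
    bytes
        .iter()
        .filter(|&&b| b == b'\\' || b == b'"' || b < 0x20)
        .count()
}

/// Safe wrapper: a slice always provides a valid pointer/length pair.
fn count_safe(input: &[u8]) -> usize {
    // SAFETY: pointer and length come from the same slice. This sketch
    // assumes NEON is available whenever the attribute is active, as is the
    // case by default on common AArch64 targets.
    unsafe { count_needing_escape(input.as_ptr(), input.len()) }
}
```

Keeping the `unsafe` surface inside one small wrapper like `count_safe` is a common way to confine the pointer contract to a single audited call site.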
Interaction with Other System Components
The function is `pub(crate)`, indicating it is internal to the crate/module and used by higher-level string formatting or serialization routines.
It likely integrates with JSON serialization or other text encoding components where safe escaping of string data is required.
It depends on macros such as `impl_format_scalar!`, `write_escape!`, and `trailing_zeros!`, which are defined elsewhere in the crate.
The use of SIMD and target-specific features means this function is a performance-critical piece, probably called repeatedly in data serialization pipelines.
This file does not handle memory allocation itself; it assumes the caller has allocated sufficient output buffer space.
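Since the caller owns the output buffer, a safe wrapper has to reserve a worst-case size before calling a function of this shape. The sketch below shows one way to do that; everything in it is hypothetical. `format_escaped_stub` is a simple stand-in with the same signature as the documented function, and `worst_case_len` assumes JSON-style escapes in which one input byte grows to at most six output bytes. A SIMD implementation that stores whole 16-byte chunks may need extra slack beyond this bound, so the crate's own reservation logic should be treated as authoritative.

```rust
/// Assumed worst case: every byte becomes `\u00XX` (6 bytes), plus 2 quotes.
fn worst_case_len(input_len: usize) -> usize {
    input_len * 6 + 2
}

/// Appends `bytes` at `*dst` and advances the cursor.
/// Safety: `*dst` must be valid for writes of `bytes.len()` bytes.
unsafe fn put(dst: &mut *mut u8, bytes: &[u8]) {
    unsafe {
        core::ptr::copy_nonoverlapping(bytes.as_ptr(), *dst, bytes.len());
        *dst = (*dst).add(bytes.len());
    }
}

/// Hypothetical stand-in with the documented signature (scalar only, and all
/// control bytes written as `\u00XX` for brevity).
/// Safety: `value_ptr` must be readable for `value_len` bytes and `odst`
/// writable for at least `worst_case_len(value_len)` bytes.
unsafe fn format_escaped_stub(odst: *mut u8, value_ptr: *const u8, value_len: usize) -> usize {
    const HEX: &[u8; 16] = b"0123456789abcdef";
    let mut dst = odst;
    unsafe {
        put(&mut dst, b"\"");
        for i in 0..value_len {
            let b = *value_ptr.add(i);
            if b == b'"' || b == b'\\' {
                put(&mut dst, &[b'\\', b]);
            } else if b < 0x20 {
                let hi = HEX[(b >> 4) as usize];
                let lo = HEX[(b & 0x0f) as usize];
                put(&mut dst, &[b'\\', b'u', b'0', b'0', hi, lo]);
            } else {
                put(&mut dst, &[b]);
            }
        }
        put(&mut dst, b"\"");
    }
    // Bytes written = distance between the cursor and the buffer start.
    dst as usize - odst as usize
}

/// Safe wrapper: reserve the worst case, call the unsafe routine, then set
/// the vector length to the number of bytes actually written.
fn escape_to_vec(input: &[u8]) -> Vec<u8> {
    let mut buf: Vec<u8> = Vec::with_capacity(worst_case_len(input.len()));
    unsafe {
        let written = format_escaped_stub(buf.as_mut_ptr(), input.as_ptr(), input.len());
        debug_assert!(written <= buf.capacity());
        buf.set_len(written);
    }
    buf
}
```

Reserving up front trades some memory for the ability to write without bounds checks inside the hot loop, which is the usual reason a routine like this takes raw pointers.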
Visual Diagram: Function Workflow Flowchart
flowchart TD
    Start["Start: write leading quote"] --> CheckLen{"Input shorter than 16 bytes?"}
    CheckLen -- Yes --> Scalar["Call scalar escaping implementation"]
    Scalar --> WriteEnd["Write trailing quote"]
    CheckLen -- No --> SIMDLoop["Process 16-byte chunks with SIMD"]
    SIMDLoop -->|No escapes detected| CopyChunk["Copy chunk directly to output"]
    CopyChunk --> SIMDLoop
    SIMDLoop -->|Escapes detected| EscapeByte["Find first byte to escape"]
    EscapeByte --> WriteEscape["Copy preceding bytes, write escape sequence"]
    WriteEscape --> SIMDLoop
    SIMDLoop -->|Fewer than 16 bytes left| LastStride["Process last partial stride with scratch buffer"]
    LastStride --> WriteEnd
    WriteEnd --> End["Return total bytes written"]
Summary
Purpose: Efficiently format and escape strings with special characters using SIMD acceleration.
Core Function: `format_escaped_str_impl_generic_128` escapes backslashes, quotes, and control characters in strings.
Performance: Uses 128-bit SIMD vectors (`u8x16`) for parallel processing of 16-byte chunks.
Safety: Unsafe Rust due to direct raw pointer manipulation; must be used carefully.
Integration: Intended for internal use within a crate/module, likely part of a serialization or text encoding pipeline.
Extensibility: Supports architecture-specific SIMD optimizations (e.g., AArch64 NEON).
Fallback: Uses scalar processing for small input or trailing bytes.
This file exemplifies a balance between performance (via SIMD) and correctness (correct escaping) for string formatting in a low-level systems or serialization context.