sse2.rs

Overview

`sse2.rs` is a Rust source file that implements an optimized routine for escaping and formatting strings with SIMD (Single Instruction, Multiple Data) instructions from SSE2, which is part of the baseline x86_64 instruction set. It processes the input in 16-byte chunks, detects bytes that require escaping (backslashes, double quotes, and control characters), and writes the escaped string to a destination buffer.

This file targets performance-critical string escaping, likely as part of a larger serialization or text-processing system such as a JSON encoder, where special characters in strings must be escaped.

The implementation uses low-level CPU instructions to compare and mask 16 bytes in parallel. A chunk that contains nothing to escape is passed over in one step rather than inspected byte by byte, which is intended to improve throughput over a scalar approach.
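To make the "parallel comparison and masking" idea concrete, here is a minimal sketch of how one 16-byte chunk could be reduced to a bitmask with SSE2 intrinsics from `core::arch::x86_64`. The function name `escape_mask` is hypothetical, and detecting control characters with a saturating subtract is one common technique assumed here, not something this document confirms about `sse2.rs`. A scalar fallback is included only so the sketch builds on other architectures.

```rust
/// Returns a mask with bit `i` set when `chunk[i]` is a backslash,
/// a double quote, or a control character (below 0x20).
#[cfg(target_arch = "x86_64")]
pub fn escape_mask(chunk: &[u8; 16]) -> u32 {
    use core::arch::x86_64::*;
    // SAFETY: SSE2 is part of the x86_64 baseline, and the unaligned load
    // reads exactly the 16 bytes owned by `chunk`.
    unsafe {
        let v = _mm_loadu_si128(chunk.as_ptr() as *const __m128i);
        let backslash = _mm_cmpeq_epi8(v, _mm_set1_epi8(b'\\' as i8));
        let quote = _mm_cmpeq_epi8(v, _mm_set1_epi8(b'"' as i8));
        // Unsigned saturating subtract: bytes <= 0x1f become 0, all others
        // stay non-zero, so comparing with zero flags the control characters.
        let control = _mm_cmpeq_epi8(
            _mm_subs_epu8(v, _mm_set1_epi8(0x1f)),
            _mm_setzero_si128(),
        );
        let hits = _mm_or_si128(_mm_or_si128(backslash, quote), control);
        // One bit per byte lane, lane 0 in the lowest bit.
        _mm_movemask_epi8(hits) as u32
    }
}

/// Scalar fallback (assumption: only here so the sketch compiles elsewhere).
#[cfg(not(target_arch = "x86_64"))]
pub fn escape_mask(chunk: &[u8; 16]) -> u32 {
    let mut mask = 0u32;
    for (i, &b) in chunk.iter().enumerate() {
        if b == b'\\' || b == b'"' || b < 0x20 {
            mask |= 1 << i;
        }
    }
    mask
}
```

A mask of zero means the whole chunk can be passed through untouched; any set bit gives the position of a byte that needs an escape sequence.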


Detailed Explanation

Function: format_escaped_str_impl_sse2_128

pub(crate) unsafe fn format_escaped_str_impl_sse2_128(
    odst: *mut u8,
    value_ptr: *const u8,
    value_len: usize,
) -> usize

Purpose

Escapes the `value_len` bytes starting at `value_ptr` and writes the result to the buffer at `odst`, wrapped in double quotes (`"`). The function returns the total number of bytes written, including both quotes.
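The contract described above can be illustrated with a safe, scalar reference version. This is only a sketch: `format_escaped_str_scalar` and `put` are hypothetical names, and the escape table (JSON-style short escapes plus `\u00XX` for other control characters) is an assumption, since this document only says the escaping is "likely" JSON-related.

```rust
/// Copies `bytes` into `dst` at offset `*at` and advances the offset.
fn put(dst: &mut [u8], at: &mut usize, bytes: &[u8]) {
    dst[*at..*at + bytes.len()].copy_from_slice(bytes);
    *at += bytes.len();
}

/// Hypothetical scalar reference: writes `"` + escaped `value` + `"` into
/// `dst` and returns the number of bytes written. Panics if `dst` is too small.
pub fn format_escaped_str_scalar(dst: &mut [u8], value: &[u8]) -> usize {
    const HEX: &[u8; 16] = b"0123456789abcdef";
    let mut n = 0;
    put(dst, &mut n, b"\""); // opening quote
    for &b in value {
        match b {
            b'"' => put(dst, &mut n, b"\\\""),
            b'\\' => put(dst, &mut n, b"\\\\"),
            0x08 => put(dst, &mut n, b"\\b"),
            0x0c => put(dst, &mut n, b"\\f"),
            b'\n' => put(dst, &mut n, b"\\n"),
            b'\r' => put(dst, &mut n, b"\\r"),
            b'\t' => put(dst, &mut n, b"\\t"),
            // Remaining control characters use the six-byte `\u00XX` form.
            0x00..=0x1f => {
                let hi = HEX[(b >> 4) as usize];
                let lo = HEX[(b & 0x0f) as usize];
                put(dst, &mut n, &[b'\\', b'u', b'0', b'0', hi, lo]);
            }
            // Everything else is copied unchanged.
            _ => put(dst, &mut n, &[b]),
        }
    }
    put(dst, &mut n, b"\""); // closing quote
    n
}
```

A reference like this is mainly useful as a test oracle: the SIMD routine should produce the same bytes and the same return value for any input.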

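Parameters

- `odst`: pointer to the start of the destination buffer that receives the quoted, escaped output.
- `value_ptr`: pointer to the first byte of the input string.
- `value_len`: length of the input in bytes.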

Returns

The total number of bytes written to `odst`, including the opening and closing double quotes.

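Safety

The function is `unsafe` because it reads and writes through raw pointers. The caller must ensure that `value_ptr` is valid for reads of `value_len` bytes and that `odst` points to a writable buffer large enough for the entire escaped output, which can be longer than the input.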

Usage Example

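// Callable only from within the crate, since the function is `pub(crate)`.
let input = b"Hello \"world\" \\ test";
// The destination must be large enough for the quoted, escaped output.
let mut output = [0u8; 64];
// SAFETY: both pointers come from live buffers, and `input.len()` matches `input`.
let bytes_written = unsafe {
    format_escaped_str_impl_sse2_128(output.as_mut_ptr(), input.as_ptr(), input.len())
};
// The returned count covers the escaped text plus both surrounding quotes.
let escaped_str = std::str::from_utf8(&output[..bytes_written]).unwrap();
assert_eq!(escaped_str, "\"Hello \\\"world\\\" \\\\ test\"");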
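The example above uses a fixed 64-byte output buffer, which is comfortably larger than the 25 bytes this particular input produces. For arbitrary input, a caller needs a worst-case bound. The helper below is a hypothetical sketch of such a bound, not part of `sse2.rs`. It assumes the longest escape is six bytes (as in the JSON `\u00XX` form) and adds one 16-byte stride of headroom in case the vector path stores a full chunk near the end of the output; both figures are assumptions to verify against the real implementation.

```rust
/// Hypothetical helper: a conservative output size for `value_len` input bytes.
pub const fn escaped_capacity(value_len: usize) -> usize {
    const MAX_ESCAPE_LEN: usize = 6; // assumed longest escape, e.g. `\u001f`
    const QUOTES: usize = 2; // opening and closing `"`
    const STRIDE_SLACK: usize = 16; // assumed headroom for a full-width store
    value_len * MAX_ESCAPE_LEN + QUOTES + STRIDE_SLACK
}
```

Under these assumptions, the 20-byte input from the example would call for a 138-byte buffer if nothing were known about its contents.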

Implementation Details

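SIMD Processing Using SSE2

For inputs of at least 16 bytes, the function initializes four SSE2 registers: one filled with backslashes (`blash`), one filled with double quotes (`quote`), one used to detect control characters (`x20`), and one set to zero. Each 16-byte chunk of input is compared against these registers, and the comparison results are combined into a mask that is non-zero when the chunk contains a byte to escape.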

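Processing Logic

1. Write the opening `"` to the destination.
2. If `value_len` is less than 16, escape the whole input with `impl_format_scalar!`.
3. Otherwise, while at least 16 bytes remain, load a 16-byte chunk and build the escape mask. If the mask is zero, the chunk is passed through unchanged and the source and destination advance by 16 bytes. If it is non-zero, find the first byte to escape, write the bytes before it, write its escaped form, and update the pointers and the remaining count.
4. Handle the final partial chunk through a scratch buffer.
5. Write the closing `"` and return the number of bytes written.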

Macros and External Dependencies

The function relies on macros and helper functions defined elsewhere in the codebase, such as `impl_format_scalar!` for the scalar path. These provide specialized implementations for escaping and bit manipulation.
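As an illustration of the bit manipulation involved, the sketch below shows how the position of the first byte to escape could be read out of a comparison mask. `first_flagged` is a hypothetical name, and whether the codebase's own helper is built on `u32::trailing_zeros` is an assumption.

```rust
/// Index of the first byte flagged in `mask`, or `None` when the chunk is clean.
/// Bit `i` of `mask` is taken to correspond to byte `i` of the chunk.
pub fn first_flagged(mask: u32) -> Option<usize> {
    if mask == 0 {
        None
    } else {
        // The lowest set bit marks the earliest byte that needs escaping.
        Some(mask.trailing_zeros() as usize)
    }
}
```

With this index in hand, everything before the flagged byte can be kept as is, and only that one byte needs an escape sequence before scanning resumes.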


Interaction with Other Parts of the System


Summary


Mermaid Diagram: Flowchart of format_escaped_str_impl_sse2_128

flowchart TD
    Start([Start]) --> WriteQuoteStart["Write initial '\"' to dst"]
    WriteQuoteStart --> CheckLen{"value_len < 16?"}
    CheckLen -- Yes --> ScalarEscaping["impl_format_scalar!(dst, src, value_len)"]
    ScalarEscaping --> WriteQuoteEnd["Write final '\"' to dst"]
    WriteQuoteEnd --> Return["Return bytes written"]

    CheckLen -- No --> InitSIMD["Initialize SIMD registers:\n- blash '\\'\n- quote '\"'\n- x20 for control chars\n- zero"]
    InitSIMD --> LoopStart["Set pointers and nb = value_len\nStart main loop"]
    LoopStart --> LoopCondition{"nb >= 16?"}
    LoopCondition -- No --> HandleLastChunk["Handle last partial chunk using scratch buffer"]
    HandleLastChunk --> WriteQuoteEnd

    LoopCondition -- Yes --> LoadChunk["Load 16 bytes from src"]
    LoadChunk --> CompareMask["Compare for '\\', '\"', control chars\nGenerate mask"]
    CompareMask --> MaskCheck{"mask != 0?"}
    MaskCheck -- No --> AdvancePointers["Advance src/dst by 16\nnb -= 16"]
    AdvancePointers --> LoopCondition

    MaskCheck -- Yes --> EscapeByte["Find first byte to escape\nWrite previous bytes\nWrite escaped byte"]
    EscapeByte --> UpdatePointers["Update pointers and nb"]
    UpdatePointers --> LoopCondition

    Return --> End([End])
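
The control flow in the flowchart can be sketched in safe Rust as follows. This is an illustration under stated assumptions, not the code in `sse2.rs`: it writes to a `Vec<u8>` instead of a raw pointer, computes the mask with a scalar stand-in for the SIMD comparison, handles the final partial chunk with a plain byte loop instead of a scratch buffer, and uses JSON-style escapes. All function names are hypothetical.

```rust
const STRIDE: usize = 16;

fn needs_escape(b: u8) -> bool {
    b == b'"' || b == b'\\' || b < 0x20
}

/// Scalar stand-in for the SIMD compare: bit `i` is set when `chunk[i]`
/// needs escaping.
fn escape_mask(chunk: &[u8]) -> u32 {
    chunk
        .iter()
        .enumerate()
        .fold(0, |m, (i, &b)| m | (u32::from(needs_escape(b)) << i))
}

/// Appends the escaped form of one byte (JSON-style escapes are an assumption).
fn write_escape(out: &mut Vec<u8>, b: u8) {
    match b {
        b'"' => out.extend_from_slice(b"\\\""),
        b'\\' => out.extend_from_slice(b"\\\\"),
        b'\n' => out.extend_from_slice(b"\\n"),
        _ => out.extend_from_slice(format!("\\u{:04x}", b).as_bytes()),
    }
}

/// Byte-by-byte path, used for short inputs and for the tail.
fn escape_scalar(out: &mut Vec<u8>, bytes: &[u8]) {
    for &b in bytes {
        if needs_escape(b) {
            write_escape(out, b);
        } else {
            out.push(b);
        }
    }
}

pub fn format_escaped(value: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(value.len() + 2);
    out.push(b'"'); // opening quote
    if value.len() < STRIDE {
        escape_scalar(&mut out, value);
    } else {
        let mut src = value;
        while src.len() >= STRIDE {
            let mask = escape_mask(&src[..STRIDE]);
            if mask == 0 {
                // Clean chunk: copy all 16 bytes and advance.
                out.extend_from_slice(&src[..STRIDE]);
                src = &src[STRIDE..];
            } else {
                // Copy the bytes before the first flagged one, escape it,
                // then resume scanning right after it.
                let first = mask.trailing_zeros() as usize;
                out.extend_from_slice(&src[..first]);
                write_escape(&mut out, src[first]);
                src = &src[first + 1..];
            }
        }
        // Tail shorter than one stride (scalar loop replaces the scratch buffer).
        escape_scalar(&mut out, src);
    }
    out.push(b'"'); // closing quote
    out
}
```

Note that after an escape the loop returns to the `nb >= 16` check with the remaining length, not to the initialization step, so the remaining count only ever decreases.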
