escape.rs
Overview
This file provides foundational utilities for escaping special characters in strings, primarily aimed at safe and correct serialization of string data (e.g., JSON encoding). It defines constants and macros that transform bytes requiring escaping into their escaped representations. The escaping logic is optimized for performance: it relies on Rust macros and conditional compilation so that, when the `inline_int` feature is enabled, each escape sequence is written as a single 8-byte integer store.
The escaping data and approach in this file are adapted from [cloudwego's sonic-rs](https://github.com/cloudwego/sonic), a high-performance JSON serialization/deserialization library in Rust, indicating a focus on both correctness and speed.
Detailed Explanations
Macros
write_escape!
This macro writes the escaped equivalent of a given byte into a destination pointer buffer.
Signature (conceptual):
`write_escape!(byte: u8, dst: *mut u8)`
Parameters:
- `byte`: The byte value (`u8`) that needs to be escaped. The macro asserts that the byte is less than 96, which corresponds to the valid index range for the escape sequences defined in `QUOTE_TAB`.
- `dst`: A mutable raw pointer (`*mut u8`) to the output buffer where the escaped bytes will be written.
Behavior:
- It retrieves the appropriate escaped sequence for the byte from the `QUOTE_TAB` constant array.
- Writes the escape sequence to the destination address.
- Advances the destination pointer by the length of the written escape sequence.
Conditional Compilation Variants:
- With the `inline_int` feature enabled:
  - The escape sequence (8 bytes) is loaded as a `u64` integer and written to the destination buffer in a single store via `core::ptr::write`. The store is not atomic in the concurrency sense; it is simply one 8-byte write.
  - The pointer `dst` is advanced by the length encoded in the highest byte of the `u64` value (on little-endian targets, this is the entry's 8th byte).
  - This approach minimizes overhead by using integer operations and unsafe unchecked indexing.
- Without `inline_int`:
  - The escape sequence is copied using `core::ptr::copy_nonoverlapping`.
  - The pointer `dst` is advanced by the length stored in the 8th byte of the escape sequence.
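The two variants can be sketched as plain functions. This is an illustration under stated assumptions, not the file's actual macro bodies: the function names and `NEWLINE_ENTRY` are hypothetical, the store is written as `write_unaligned` on the assumption that the output cursor is not 8-byte aligned, and the fallback is assumed to copy the full 8-byte entry.

```rust
use std::ptr;

/// Hypothetical stand-in for one `QUOTE_TAB` entry: the escape for a newline.
const NEWLINE_ENTRY: [u8; 8] = [b'\\', b'n', 0, 0, 0, 0, 0, 2];

/// Sketch of the `inline_int` strategy: one 8-byte store, then advance.
///
/// # Safety
/// `dst` must be valid for writes of 8 bytes.
unsafe fn write_entry_as_u64(entry: &[u8; 8], dst: *mut u8) -> *mut u8 {
    let word = u64::from_ne_bytes(*entry);
    unsafe {
        // Assumption: unaligned store, since an output cursor is rarely aligned.
        ptr::write_unaligned(dst.cast::<u64>(), word);
        // entry[7] is the top byte of `word` on little-endian targets.
        dst.add(entry[7] as usize)
    }
}

/// Sketch of the fallback strategy: copy the entry, then advance.
///
/// # Safety
/// `dst` must be valid for writes of 8 bytes.
unsafe fn write_entry_by_copy(entry: &[u8; 8], dst: *mut u8) -> *mut u8 {
    unsafe {
        // Assumption: the whole 8-byte entry is copied, padding included.
        ptr::copy_nonoverlapping(entry.as_ptr(), dst, 8);
        dst.add(entry[7] as usize)
    }
}
```

Both functions return the advanced pointer, whereas the macro form described above reassigns the caller's `dst` variable in place. Either way, only the first `length` bytes are meaningful output; the padding bytes are overwritten by whatever is written next.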
Usage Example:
    let byte_to_escape: u8 = b'\n'; // newline character
    let mut buffer = [0u8; 16];
    let mut dst_ptr = buffer.as_mut_ptr();
    unsafe {
        write_escape!(byte_to_escape, dst_ptr);
        // dst_ptr now points past the written escape sequence
    }
Constants
NEED_ESCAPED: [u8; 256]
Purpose:
A lookup table indexed by byte values (0-255) indicating whether a byte requires escaping.
- Value `1` means the byte must be escaped.
- Value `0` means the byte does not require escaping.
Details:
- Enables quick detection of bytes that need escaping during serialization.
- For example, control characters (0x00 to 0x1F) and certain special characters like the double quote (`"`, ASCII 34) and the backslash (`\`, ASCII 92) are marked as needing escaping.
Usage:
Used as a fast filter to decide if a byte needs to be replaced by an escape sequence.
Example:
    if NEED_ESCAPED[byte as usize] == 1 {
        // byte requires escaping
    }
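To make the table's contents concrete, here is a minimal sketch that rebuilds an equivalent table from the rule stated above (control characters, double quote, backslash) and uses it to scan a byte slice. The names `build_need_escaped`, `NEED_ESCAPED_SKETCH`, and `first_escape_index` are hypothetical; the real file may well spell the table out as a literal.

```rust
/// Hypothetical reconstruction of a NEED_ESCAPED-style table from the rule
/// described in the text; the real table may be written out literally.
const fn build_need_escaped() -> [u8; 256] {
    let mut table = [0u8; 256];
    let mut i = 0;
    // Control characters 0x00..=0x1F must be escaped.
    while i < 0x20 {
        table[i] = 1;
        i += 1;
    }
    table[b'"' as usize] = 1; // ASCII 34
    table[b'\\' as usize] = 1; // ASCII 92
    table
}

const NEED_ESCAPED_SKETCH: [u8; 256] = build_need_escaped();

/// Returns the index of the first byte that needs escaping, if any.
fn first_escape_index(input: &[u8]) -> Option<usize> {
    input
        .iter()
        .position(|&byte| NEED_ESCAPED_SKETCH[byte as usize] != 0)
}
```

A scan like this lets a serializer copy the unescaped prefix of a string in bulk and fall back to per-byte handling only from the first flagged position.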
QUOTE_TAB: [[u8; 8]; 96]
Purpose:
An array mapping byte values (0..=95) to their escaped byte sequences.
Each entry is an 8-byte array containing:
- The actual escape sequence bytes (e.g., `\n`, `\u0000`).
- Padding zeros to fill 8 bytes.
- A final byte that encodes the length of the escape sequence.
Structure of an entry:
`[escape_bytes..., 0, length]`
For example:
- For newline (`\n`, ASCII 10): `[b'\\', b'n', 0, 0, 0, 0, 0, 2]`. Indicates the escape sequence `\n` has length 2.
- For control character `0x00`: `[b'\\', b'u', b'0', b'0', b'0', b'0', 0, 6]`. Indicates the Unicode escape `\u0000` with length 6.
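The entry layout can be exercised with a small sketch. `pack_entry` and `escape_bytes` are hypothetical helpers written for this illustration; they are not part of the file, which stores its entries as constants.

```rust
/// Hypothetical helper: packs an escape sequence into the 8-byte layout
/// described above (escape bytes, zero padding, length in the last byte).
const fn pack_entry(seq: &[u8]) -> [u8; 8] {
    assert!(seq.len() <= 7, "sequence must leave room for the length byte");
    let mut entry = [0u8; 8];
    let mut i = 0;
    while i < seq.len() {
        entry[i] = seq[i];
        i += 1;
    }
    entry[7] = seq.len() as u8;
    entry
}

/// Returns only the meaningful bytes of an entry, dropping padding and length.
fn escape_bytes(entry: &[u8; 8]) -> &[u8] {
    &entry[..entry[7] as usize]
}

// The two entries shown in the text, rebuilt with the helper.
const NEWLINE: [u8; 8] = pack_entry(b"\\n");
const NUL: [u8; 8] = pack_entry(b"\\u0000");
```

Because the longest sequence (`\uXXXX`) is 6 bytes, every entry fits in 8 bytes with room left for the length byte, which is what allows a whole entry to be treated as one fixed-size value.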
Important Notes:
- The table has only 96 entries because the highest byte that needs escaping, the backslash (ASCII 92), falls below that bound.
- Bytes outside this range presumably do not require escaping or are handled differently.
- Some entries are zero-filled (`[0; 8]`), representing bytes in the range for which no escape is needed or defined.
Usage:
Used as a constant source for the `write_escape!` macro to convert a byte into its escaped form.
Implementation Details & Algorithms
Escape Detection:
- The `NEED_ESCAPED` array allows O(1) detection of whether a byte needs escaping during serialization.
- This enables fast scanning of input strings to identify bytes requiring transformation.
Escape Sequence Storage:
- `QUOTE_TAB` stores escape sequences in fixed 8-byte arrays for each byte needing escaping.
- The last byte encodes the length, enabling variable-length escapes while keeping fixed-size storage that can be loaded and stored as a single `u64`.
Fast Writing via Macros:
- The `write_escape!` macro performs direct unsafe writes to output buffers.
- Because it is a macro, the write is expanded at each call site, so there is no function-call boundary and the compiler can optimize it together with the surrounding code.
- The `inline_int` feature uses a single `u64` write to output the escape sequence, reducing overhead.
- Without `inline_int`, it copies the bytes with `copy_nonoverlapping`.
Safety:
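- The macros use `debug_assert!` to ensure the byte index is valid. This check runs only in builds with debug assertions enabled.
- Use of unsafe operations (`get_unchecked`, `core::ptr::write`, `copy_nonoverlapping`) assumes the caller ensures correctness: the byte must be a valid `QUOTE_TAB` index and `dst` must point to enough writable space.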
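One way a caller might uphold these obligations is to wrap the raw write in a safe function that reserves space first. The sketch below is an assumption about how such a wrapper could look, not code from the file; `push_escape` is a hypothetical name and it targets a `Vec<u8>` rather than the raw cursor the macro uses.

```rust
use std::ptr;

/// Hypothetical safe wrapper: appends the escape stored in `entry` to `out`
/// while upholding the invariants that a raw 8-byte write relies on.
fn push_escape(out: &mut Vec<u8>, entry: &[u8; 8]) {
    let len = entry[7] as usize;
    // A hard assert keeps this safe function sound for any input entry.
    assert!(len < 8, "length byte must describe at most 7 escape bytes");
    // Guarantee 8 writable bytes past the current length before the raw copy.
    out.reserve(8);
    unsafe {
        let dst = out.as_mut_ptr().add(out.len());
        ptr::copy_nonoverlapping(entry.as_ptr(), dst, 8);
        // Only the first `len` bytes count as output; the rest is scratch
        // space that stays outside the vector's length.
        out.set_len(out.len() + len);
    }
}
```

The key point is the order of operations: space is reserved for the full 8-byte write even though the logical length grows by only `len`.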
Interactions with Other Parts of the System
- This file is part of the serialization subsystem (`crate::serialize::writer::str::escape`), likely used internally by string serialization routines.
- It provides the core primitives for escaping characters in strings before writing them to output formats like JSON.
- It is designed to be highly performant, suggesting usage in hot paths of serialization where many strings are processed.
- The macros and constants here are intended for internal use (`pub(crate)`), not for direct public API consumption.
- The file references `crate::serialize::writer::str::escape::QUOTE_TAB` in its macros, indicating it lives in a module hierarchy related to string serialization.
Diagram: Flowchart of Main Functions and Their Relationships
flowchart TD
A[Input Byte] --> B{NEED_ESCAPED marks byte?}
B -- No --> C[Write Byte as-is]
B -- Yes --> E[Lookup escape seq in QUOTE_TAB]
E --> F[write_escape! macro writes escape seq]
F --> G[Advance buffer pointer]
**Explanation:**
- The flow starts with an input byte to be serialized.
- The system checks `NEED_ESCAPED` to decide if the byte requires escaping.
- If not, the byte is written directly.
- If yes, the escape sequence is fetched from `QUOTE_TAB`.
- The `write_escape!` macro handles writing the escape sequence to the output buffer and advances the pointer accordingly.
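The flow above can be sketched end to end in safe Rust. This is a simplified illustration, not the file's implementation: it swaps the raw-pointer writes for `Vec` operations, computes entries on the fly in a hypothetical `entry_for` helper instead of reading `QUOTE_TAB`, and assumes lowercase hex digits and the standard JSON short escapes.

```rust
const HEX: &[u8; 16] = b"0123456789abcdef";

/// Stand-in for the NEED_ESCAPED lookup, using the rule from the text.
fn needs_escape(byte: u8) -> bool {
    byte < 0x20 || byte == b'"' || byte == b'\\'
}

/// Hypothetical helper: computes an entry in the documented layout
/// (escape bytes, zero padding, length in the last byte).
/// Only meaningful for bytes where `needs_escape` is true.
fn entry_for(byte: u8) -> [u8; 8] {
    let mut entry = [0u8; 8];
    let short = match byte {
        b'"' => Some(b'"'),
        b'\\' => Some(b'\\'),
        0x08 => Some(b'b'),
        0x09 => Some(b't'),
        0x0A => Some(b'n'),
        0x0C => Some(b'f'),
        0x0D => Some(b'r'),
        _ => None,
    };
    match short {
        Some(c) => {
            entry[0] = b'\\';
            entry[1] = c;
            entry[7] = 2;
        }
        None => {
            // Remaining control characters use the six-byte \u00XX form.
            entry[..4].copy_from_slice(b"\\u00");
            entry[4] = HEX[(byte >> 4) as usize];
            entry[5] = HEX[(byte & 0x0F) as usize];
            entry[7] = 6;
        }
    }
    entry
}

/// Escapes `input` into a new buffer, following the flow described above.
fn escape_json_bytes(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    for &byte in input {
        if needs_escape(byte) {
            let entry = entry_for(byte);
            out.extend_from_slice(&entry[..entry[7] as usize]);
        } else {
            out.push(byte);
        }
    }
    out
}
```

The production path described in this document differs mainly in mechanics: the table is precomputed, and the write is an unchecked raw-pointer store rather than a bounds-checked append.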
Summary
The `escape.rs` file implements low-level, high-performance utilities for escaping special characters in strings during serialization. Using a combination of lookup tables and unsafe macros, it enables quick detection and transformation of bytes needing escaping. The design trades checked operations for speed: raw pointer writes and, with `inline_int`, a single 8-byte store per escape sequence, guarded only by debug assertions. This module plays a crucial role in the string serialization pipeline within the broader system, ensuring output strings are correctly escaped according to specification (e.g., JSON requirements).