operators.py

Overview

operators.py is a utility module of image preprocessing operators for deep learning computer vision workflows such as detection, recognition, and segmentation. It provides classes and functions that decode, resize, normalize, pad, reformat, and otherwise manipulate images and their annotations. The operators are modular and composable, so they can be chained into flexible preprocessing pipelines.

Key functionalities include:

Image decoding from raw bytes or file paths.
Various resizing strategies preserving aspect ratio or scaling to fixed dimensions.
Image normalization and standardization.
Conversion between different image channel formats (HWC ↔ CHW, RGB ↔ BGR).
Padding images to desired sizes or strides.
Utility functions like Non-Maximum Suppression (NMS).
Support for specialized preprocessing for tasks such as end-to-end OCR and key information extraction (KIE).

Classes and Functions

Class: `DecodeImage`

Purpose:
Decodes an image from a byte buffer and converts it to the desired color mode and channel order.

Constructor Parameters:

img_mode (str, default 'RGB'): Desired output color space. Supported: 'RGB', 'GRAY'.
channel_first (bool, default False): Whether to transpose image channels from HWC to CHW.
ignore_orientation (bool, default False): Whether to ignore EXIF orientation metadata when decoding.

Usage:

# image_bytes: encoded image content, e.g. read from a file opened in 'rb' mode
decoder = DecodeImage(img_mode='RGB', channel_first=True)
data = {'image': image_bytes}
decoded_data = decoder(data)
img = decoded_data['image']  # np.ndarray of shape (C, H, W) because channel_first=True

Details:

Uses cv2.imdecode to decode images from bytes.
Converts BGR (OpenCV default) to RGB if img_mode is 'RGB'.
In 'GRAY' mode, converts the decoded grayscale image to a 3-channel BGR image for consistency.
If channel_first is True, the image shape changes from (H, W, C) to (C, H, W).
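
The color and layout conversions listed above can be sketched with NumPy alone. This is a minimal illustration rather than the module's code: it assumes the image has already been decoded into a BGR array (the cv2.imdecode step is omitted), and the helper name to_output_layout is hypothetical.

```python
import numpy as np

def to_output_layout(img_bgr, img_mode="RGB", channel_first=False):
    """Hypothetical helper mirroring the post-decode steps described above."""
    img = img_bgr
    if img_mode == "RGB":
        # Reverse the channel axis to turn BGR into RGB.
        img = img[:, :, ::-1]
    if channel_first:
        # (H, W, C) -> (C, H, W)
        img = img.transpose((2, 0, 1))
    # Return a contiguous copy so downstream code does not see a strided view.
    return np.ascontiguousarray(img)

# A 4x6 image that is pure blue in BGR order (blue is channel 0).
bgr = np.zeros((4, 6, 3), dtype=np.uint8)
bgr[..., 0] = 255
out = to_output_layout(bgr, img_mode="RGB", channel_first=True)
print(out.shape)  # (3, 4, 6)
```

After the conversion, blue sits in the last channel plane (out[2]), as expected for RGB order.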

Class: `StandardizeImag`

Purpose:
Normalizes image pixel values by subtracting mean and dividing by standard deviation, optionally scaling by 1/255.

Constructor Parameters:

mean (list of floats): Channel-wise mean for normalization.
std (list of floats): Channel-wise standard deviation for normalization.
is_scale (bool, default True): Whether to scale pixel values by 1/255 before normalization.
norm_type (str, default 'mean_std'): Type of normalization; currently supports 'mean_std' or 'none'.

Call Parameters:

im (np.ndarray): Input image.
im_info (dict): Dictionary containing image metadata (unchanged).

Returns:

Tuple (im, im_info) with normalized image and unchanged image info.

Usage:

std_op = StandardizeImag(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
im_norm, im_info = std_op(im, im_info)
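
The arithmetic behind this operator can be sketched as follows. The function name standardize is hypothetical, and the sketch assumes an HWC input so that mean and std broadcast over the last axis; the real class may differ in details such as dtype handling.

```python
import numpy as np

def standardize(im, mean, std, is_scale=True, norm_type="mean_std"):
    """Hypothetical re-statement of the described normalization steps."""
    im = im.astype(np.float32)
    if is_scale:
        # Map 0..255 pixel values into 0..1 before applying mean/std.
        im = im / 255.0
    if norm_type == "mean_std":
        # Shape (1, 1, C) broadcasts across height and width of an HWC image.
        mean_arr = np.array(mean, dtype=np.float32)[np.newaxis, np.newaxis, :]
        std_arr = np.array(std, dtype=np.float32)[np.newaxis, np.newaxis, :]
        im = (im - mean_arr) / std_arr
    return im

mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
white = np.full((2, 2, 3), 255, dtype=np.uint8)
im_norm = standardize(white, mean, std)
im_scaled_only = standardize(white, mean, std, norm_type="none")
```

With norm_type set to 'none', only the optional 1/255 scaling is applied.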

Class: `NormalizeImage`

Purpose:
Normalizes images by subtracting mean and dividing by std with configurable scaling and channel order.

Constructor Parameters:

scale (float or str, optional): Scale factor for pixel values (default 1/255).
mean (list, optional): Mean values per channel (default [0.485, 0.456, 0.406]).
std (list, optional): Std values per channel (default [0.229, 0.224, 0.225]).
order (str, default 'chw'): Channel order, 'chw' or 'hwc'.

Call Parameters:

data (dict): Dictionary containing 'image' as PIL Image or np.ndarray.

Returns:

Modified data dict with normalized 'image' as np.ndarray.

Usage:

normalizer = NormalizeImage()
data = {'image': pil_image}
normalized_data = normalizer(data)
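
The order parameter matters because it decides how the per-channel statistics must be shaped to broadcast against the image. A minimal sketch, assuming three channels; reshape_stats is a hypothetical helper name, not part of the module:

```python
import numpy as np

def reshape_stats(values, order="chw"):
    """Shape per-channel values so they broadcast against a CHW or HWC image."""
    shape = (3, 1, 1) if order == "chw" else (1, 1, 3)
    return np.array(values, dtype=np.float32).reshape(shape)

mean_chw = reshape_stats([0.485, 0.456, 0.406], order="chw")
mean_hwc = reshape_stats([0.485, 0.456, 0.406], order="hwc")

# An HWC image needs the (1, 1, 3) layout; a CHW image needs (3, 1, 1).
img_hwc = np.ones((5, 7, 3), dtype=np.float32)
centered = img_hwc - mean_hwc
```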

Class: `ToCHWImage`

Purpose:
Converts image array from HWC (Height, Width, Channels) format to CHW (Channels, Height, Width).

Call Parameters:

data (dict): Contains 'image' as PIL Image or np.ndarray.

Returns:

Modified data with 'image' transposed to CHW format.

Class: `KeepKeys`

Purpose:
Extracts and returns a list of values corresponding to specified keys from a dictionary.

Constructor Parameters:

keep_keys (list of str): Keys to keep in the output.

Call Parameters:

data (dict): Input dictionary.

Returns:

List of values from data corresponding to the specified keys.

Usage:

keep_keys_op = KeepKeys(['image', 'label'])
values = keep_keys_op(data)
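
The behavior described here amounts to an ordered lookup. The class below is a hypothetical stand-in (KeepKeysSketch) written only to illustrate that the output is a list ordered by keep_keys, not by the dictionary:

```python
class KeepKeysSketch:
    """Hypothetical stand-in illustrating the described behavior."""

    def __init__(self, keep_keys):
        self.keep_keys = keep_keys

    def __call__(self, data):
        # Values come back in the order of keep_keys; other keys are dropped.
        return [data[key] for key in self.keep_keys]

sample = {"path": "a.jpg", "label": 3, "image": "img-placeholder"}
values = KeepKeysSketch(["image", "label"])(sample)
print(values)  # ['img-placeholder', 3]
```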

Class: `Pad`

Purpose:
Pads an image to a target size or to the nearest multiple of a stride size.

Constructor Parameters:

size (int or list/tuple, optional): Target size [height, width] to pad to.
size_div (int, default 32): If size is None, pad to multiples of this value.

Call Parameters:

data (dict): Contains 'image' as np.ndarray.

Returns:

Modified data with padded 'image'.
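
How the padded size follows from size and size_div can be sketched in plain Python. The helper padded_shape is hypothetical, and raising an error when the image exceeds an explicit size is an assumption of this sketch, not documented behavior:

```python
import math

def padded_shape(h, w, size=None, size_div=32):
    """Return the (height, width) an image of size (h, w) would be padded to."""
    if size is not None:
        # An int is treated as a square target; a pair is read as [height, width].
        target_h, target_w = (size, size) if isinstance(size, int) else size
        if h > target_h or w > target_w:
            # Assumption: padding cannot shrink an image.
            raise ValueError("image is larger than the requested padded size")
        return target_h, target_w
    # No explicit size: round each side up to the next multiple of size_div.
    target_h = max(math.ceil(h / size_div) * size_div, size_div)
    target_w = max(math.ceil(w / size_div) * size_div, size_div)
    return target_h, target_w

print(padded_shape(50, 70))            # (64, 96)
print(padded_shape(50, 70, size=128))  # (128, 128)
```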

Class: `LinearResize`

Purpose:
Resizes an image to a target size with optional aspect ratio preservation.

Constructor Parameters:

target_size (int or list of two ints): Target size [height, width].
keep_ratio (bool, default True): Whether to preserve aspect ratio.
interp (int, default cv2.INTER_LINEAR): Interpolation method.

Call Parameters:

im (np.ndarray): Input image.
im_info (dict): Image metadata.

Returns:

Tuple (im, im_info) with resized image and updated metadata including new shape and scale factor.

Key Algorithm:
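If keep_ratio is True, a single scale factor is applied to both axes: the shorter image side is scaled to the smaller target dimension, unless that would make the longer side exceed the larger target dimension, in which case the longer side is scaled to that maximum instead.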
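
The scale selection can be written out as a short function. This is a sketch under the assumption that target_size is a [height, width] pair; the name scale_factors is hypothetical and the module's generate_scale method may differ in detail.

```python
def scale_factors(im_h, im_w, target_size, keep_ratio=True):
    """Return (scale_y, scale_x) for resizing an im_h x im_w image."""
    target_h, target_w = target_size
    if keep_ratio:
        im_min, im_max = min(im_h, im_w), max(im_h, im_w)
        t_min, t_max = min(target_size), max(target_size)
        # Take the smaller candidate so neither side overshoots its limit.
        scale = min(t_min / im_min, t_max / im_max)
        return scale, scale
    # Without ratio preservation each axis is scaled independently.
    return target_h / im_h, target_w / im_w

# Shorter side drives the scale: 800 / 480 is smaller than 1333 / 640.
sy_a, sx_a = scale_factors(480, 640, (800, 1333))
# Longer side caps the scale: 1333 / 1200 is smaller than 800 / 300.
sy_b, sx_b = scale_factors(300, 1200, (800, 1333))
# Independent axes when keep_ratio is False.
sy_c, sx_c = scale_factors(480, 640, (800, 1333), keep_ratio=False)
```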

Class: `Resize`

Purpose:
Resizes image to fixed size, optionally adjusting polygon annotations accordingly.

Constructor Parameters:

size (tuple, default (640, 640)): Target size as (height, width).

Call Parameters:

data (dict): Contains 'image' and optionally 'polys' (list of polygons).

Returns:

Modified data with resized 'image' and scaled 'polys' if present.
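
Scaling the polygons means multiplying x coordinates by the width ratio and y coordinates by the height ratio. A minimal sketch, assuming polygons are lists of (x, y) points; scale_polys is a hypothetical helper name:

```python
def scale_polys(polys, orig_hw, new_hw):
    """Scale polygon points from the original image size to the resized one."""
    ratio_h = new_hw[0] / orig_hw[0]
    ratio_w = new_hw[1] / orig_hw[1]
    # x follows the width ratio, y follows the height ratio.
    return [[(x * ratio_w, y * ratio_h) for x, y in poly] for poly in polys]

polys = [[(10, 20), (110, 20), (110, 60), (10, 60)]]
# A 320x1280 image resized to 640x640: heights double, widths halve.
scaled = scale_polys(polys, orig_hw=(320, 1280), new_hw=(640, 640))
print(scaled[0][0])  # (5.0, 40.0)
```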

Class: `DetResizeForTest`

Purpose:
Specialized resizing for detection tasks during testing with several resize strategies:

Resize to a fixed image_shape, optionally keeping the aspect ratio.
Resize under a limit_side_len constraint applied to the shorter or longer side (limit_type 'min' or 'max').
Resize so that the longer side equals a fixed length (resize_long).

Constructor Parameters:
Accepts keyword args to configure resize behavior:

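image_shape (tuple): Exact shape to resize to.
keep_ratio (bool): Whether to keep the aspect ratio when resizing to image_shape.
limit_side_len (int): Side-length limit applied according to limit_type.
limit_type (str): One of 'min', 'max', 'resize_long'.
resize_long (int): Target length for the longer side.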

Call Parameters:

data (dict): Contains 'image'.

Returns:

Modified data with resized image and shape info [orig_h, orig_w, ratio_h, ratio_w].
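
The limit_side_len strategy can be sketched as below. Several details are assumptions of this sketch rather than documented behavior: the helper name limit_side_shape, the default limit of 736, and rounding each side to a multiple of 32. Only the 'min' and 'max' limit types are covered.

```python
def limit_side_shape(h, w, limit_side_len=736, limit_type="min", multiple=32):
    """Return (new_h, new_w, ratio_h, ratio_w) for an h x w image."""
    if limit_type == "max":
        # Shrink only when the longer side exceeds the limit.
        ratio = limit_side_len / max(h, w) if max(h, w) > limit_side_len else 1.0
    elif limit_type == "min":
        # Enlarge only when the shorter side falls below the limit.
        ratio = limit_side_len / min(h, w) if min(h, w) < limit_side_len else 1.0
    else:
        raise ValueError("this sketch handles only 'min' and 'max'")
    # Snap each side to the nearest multiple, never below one multiple.
    new_h = max(int(round(h * ratio / multiple)) * multiple, multiple)
    new_w = max(int(round(w * ratio / multiple)) * multiple, multiple)
    return new_h, new_w, new_h / h, new_w / w

shape_max = limit_side_shape(1000, 2000, limit_side_len=960, limit_type="max")
shape_min = limit_side_shape(300, 500, limit_side_len=736, limit_type="min")
print(shape_max[:2])  # (480, 960)
print(shape_min[:2])  # (736, 1216)
```

Because of the rounding step, ratio_h and ratio_w can differ slightly from each other, which is why both are reported in the shape info.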

Class: `E2EResizeForTest`

Purpose:
Resizes images for end-to-end OCR tasks, supporting specific dataset configurations (e.g., TotalText).

Constructor Parameters:

max_side_len (int): Maximum size for the longer side.
valid_set (str): Dataset name to trigger dataset-specific resizing logic.

Call Parameters:

data (dict): Contains 'image'.

Returns:

Modified data with resized image and shape info.

Class: `KieResize`

Purpose:
Resizes images and associated bounding boxes for Key Information Extraction (KIE) tasks.

Constructor Parameters:

img_scale (tuple/list): [max_side, min_side] for resizing.

Call Parameters:

data (dict): Contains 'image' and 'points' (bounding boxes).

Returns:

Modified data with resized image, scaled points, original image/boxes, and shape info.

Class: `SRResize`

Purpose:
Resizes images for Super-Resolution tasks, handling both low-resolution (LR) and high-resolution (HR) images.

Constructor Parameters:

imgH (int): Target image height.
imgW (int): Target image width.
down_sample_scale (int): Downsampling scale factor.
keep_ratio (bool): Whether to keep aspect ratio.
min_ratio (float): Minimum ratio allowed.
mask (bool): Whether to apply mask (unused in code).
infer_mode (bool): Whether in inference mode (skips HR image processing).

Call Parameters:

data (dict): Contains 'image_lr', 'image_hr', and 'label'.

Returns:

Modified data with resized 'img_lr' and optionally 'img_hr'.

Class: `ResizeNormalize`

Purpose:
Helper class to resize a PIL image and normalize it by scaling pixel values to [0,1] and converting to CHW format.

Constructor Parameters:

size (tuple): Target size (width, height).
interpolation: PIL interpolation method (default Image.BICUBIC).

Call Parameters:

img (PIL.Image): Input image.

Returns:

Normalized image as numpy array with shape (C, H, W).

Class: `GrayImageChannelFormat`

Purpose:
Converts a color image to grayscale single channel with optional inversion.

Constructor Parameters:

inverse (bool, default False): Whether to invert grayscale values.

Call Parameters:

data (dict): Contains 'image'.

Returns:

Modified data with grayscale 'image' of shape (1, H, W) and original image saved as 'src_image'.

Class: `Permute`

Purpose:
Permutes image dimensions from HWC to CHW.

Call Parameters:

im (np.ndarray): Image array.
im_info (dict): Associated metadata.

Returns:

Tuple (im, im_info) with permuted image.

Class: `PadStride`

Purpose:
Pads images so that height and width are multiples of a specified stride. Useful for models with Feature Pyramid Networks (FPN) that require input sizes divisible by stride.

Constructor Parameters:

stride (int): Stride value to pad to (e.g., 32, 64).

Call Parameters:

im (np.ndarray): Image array in CHW format.
im_info (dict): Metadata.

Returns:

Tuple (padded_im, im_info) with padded image.
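
A minimal NumPy sketch of stride padding for a CHW image follows. The function name pad_to_stride is hypothetical, and placing the original image in the top-left corner with zeros to the bottom and right is an assumption of this sketch.

```python
import math
import numpy as np

def pad_to_stride(im_chw, stride):
    """Zero-pad a (C, H, W) array so H and W become multiples of stride."""
    if stride <= 0:
        # Nothing to align to; return the input unchanged.
        return im_chw
    c, h, w = im_chw.shape
    pad_h = math.ceil(h / stride) * stride
    pad_w = math.ceil(w / stride) * stride
    padded = np.zeros((c, pad_h, pad_w), dtype=im_chw.dtype)
    # Copy the original pixels into the top-left corner.
    padded[:, :h, :w] = im_chw
    return padded

im = np.ones((3, 50, 70), dtype=np.float32)
padded_im = pad_to_stride(im, stride=32)
print(padded_im.shape)  # (3, 64, 96)
```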

Function: `decode_image`

Purpose:
Reads an RGB image from a file path or uses an existing image array.

Parameters:

im_file (str or np.ndarray): Image path or image array.
im_info (dict): Dictionary to store image metadata.

Returns:

Tuple (im, im_info) with image as np.ndarray and updated metadata.

Function: `preprocess`

Purpose:
Applies a sequence of preprocessing operators to an image.

Parameters:

im (str or np.ndarray): Image data or path.
preprocess_ops (list): List of callable operators/functions.

Returns:

Tuple (im, im_info) of processed image and metadata.
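
The chaining pattern can be illustrated with toy operators. This sketch leaves out image loading and uses a plain list in place of an image; preprocess_sketch and the two operators are hypothetical names, and the only thing assumed from the module is the (im, im_info) call convention.

```python
def preprocess_sketch(im, preprocess_ops, im_info=None):
    """Apply each operator in order, threading (im, im_info) through the chain."""
    im_info = dict(im_info or {})
    for op in preprocess_ops:
        im, im_info = op(im, im_info)
    return im, im_info

def scale_by_half(im, im_info):
    # Toy operator: halve every value and record the factor in the metadata.
    im_info["scale_factor"] = 0.5
    return [v * 0.5 for v in im], im_info

def record_length(im, im_info):
    # Toy operator: leave the data untouched, only add metadata.
    im_info["length"] = len(im)
    return im, im_info

out, info = preprocess_sketch([2.0, 4.0], [scale_by_half, record_length])
print(out)   # [1.0, 2.0]
print(info)  # {'scale_factor': 0.5, 'length': 2}
```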

Function: `nms`

Purpose:
Performs Non-Maximum Suppression (NMS) on bounding boxes.

Parameters:

bboxes (np.ndarray): Array of bounding boxes [N, 4] with coordinates [x1, y1, x2, y2].
scores (np.ndarray): Confidence scores for each box.
iou_thresh (float): IoU threshold for suppression.

Returns:

List of indices of boxes that survive NMS.
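
A compact NumPy sketch of greedy NMS with this interface is shown below. It is an illustration of the general technique, not the module's implementation: the name nms_sketch is hypothetical, and box areas are computed as (x2 - x1) * (y2 - y1) without a +1 pixel offset, which is an assumption.

```python
import numpy as np

def nms_sketch(bboxes, scores, iou_thresh):
    """Greedy NMS over [N, 4] boxes in [x1, y1, x2, y2] form; returns kept indices."""
    x1, y1, x2, y2 = bboxes[:, 0], bboxes[:, 1], bboxes[:, 2], bboxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the kept box with every remaining box.
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop boxes that overlap the kept box by more than the threshold.
        order = rest[iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=np.float64)
scores = np.array([0.9, 0.8, 0.7])
kept = nms_sketch(boxes, scores, iou_thresh=0.5)
print(kept)  # [0, 2]
```

The first two boxes overlap with an IoU of 81/119 (about 0.68), so the lower-scoring one is suppressed at a threshold of 0.5 but would survive at 0.7.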

Implementation Details & Algorithms

Most resize operations preserve the aspect ratio where configured and align the output dimensions to a multiple of a stride (e.g., 32 or 128), because convolutional networks with downsampling stages typically require input sizes divisible by such a value.
DetResizeForTest and E2EResizeForTest implement multiple resizing strategies tailored for detection and OCR tasks respectively, including dataset-specific handling.
Padding operations use OpenCV border padding or numpy zero-padding to ensure image sizes meet model requirements.
Normalization and standardization use numpy broadcasting for efficient channel-wise operations.
The nms function implements classical greedy NMS: it sorts boxes by score, repeatedly keeps the highest-scoring remaining box, and suppresses the boxes whose IoU with it exceeds the threshold.

Interaction With Other System Components

This module is typically used as part of the data preprocessing pipeline before feeding images into neural networks for tasks such as detection, recognition, and KIE.
It operates on input dictionaries that may include images and annotations (e.g., polygons for text regions).
The modular classes allow integration with data loading frameworks, enabling flexible pipelines by chaining operators.
Functions like nms are often used post-inference to filter overlapping detection boxes.
Classes like SRResize and E2EResizeForTest are specialized for super-resolution and end-to-end OCR models, so they are used together with the corresponding model components.

Visual Diagram: Class Structure

classDiagram
    class DecodeImage {
        -img_mode: str
        -channel_first: bool
        -ignore_orientation: bool
        +__init__(img_mode, channel_first, ignore_orientation)
        +__call__(data) dict
    }

    class StandardizeImag {
        -mean: list
        -std: list
        -is_scale: bool
        -norm_type: str
        +__init__(mean, std, is_scale, norm_type)
        +__call__(im, im_info) tuple
    }

    class NormalizeImage {
        -scale: float
        -mean: np.ndarray
        -std: np.ndarray
        +__init__(scale, mean, std, order)
        +__call__(data) dict
    }

    class ToCHWImage {
        +__init__()
        +__call__(data) dict
    }

    class KeepKeys {
        -keep_keys: list
        +__init__(keep_keys)
        +__call__(data) list
    }

    class Pad {
        -size: list
        -size_div: int
        +__init__(size, size_div)
        +__call__(data) dict
    }

    class LinearResize {
        -target_size: list
        -keep_ratio: bool
        -interp: int
        +__init__(target_size, keep_ratio, interp)
        +__call__(im, im_info) tuple
        +generate_scale(im) tuple
    }

    class Resize {
        -size: tuple
        +__init__(size)
        +resize_image(img) tuple
        +__call__(data) dict
    }

    class DetResizeForTest {
        -resize_type: int
        -keep_ratio: bool
        -image_shape: tuple
        -limit_side_len: int
        -limit_type: str
        -resize_long: int
        +__init__(**kwargs)
        +__call__(data) dict
        +resize_image_type0(img) tuple
        +resize_image_type1(img) tuple
        +resize_image_type2(img) tuple
        +image_padding(im, value) np.ndarray
    }

    class E2EResizeForTest {
        -max_side_len: int
        -valid_set: str
        +__init__(**kwargs)
        +__call__(data) dict
        +resize_image_for_totaltext(im, max_side_len) tuple
        +resize_image(im, max_side_len) tuple
    }

    class KieResize {
        -max_side: int
        -min_side: int
        +__init__(**kwargs)
        +__call__(data) dict
        +resize_image(img) tuple
        +resize_boxes(im, points, scale_factor) np.ndarray
    }

    class SRResize {
        -imgH: int
        -imgW: int
        -down_sample_scale: int
        -keep_ratio: bool
        -min_ratio: float
        -mask: bool
        -infer_mode: bool
        +__init__(imgH, imgW, down_sample_scale, keep_ratio, min_ratio, mask, infer_mode)
        +__call__(data) dict
    }

    class ResizeNormalize {
        -size: tuple
        -interpolation
        +__init__(size, interpolation)
        +__call__(img) np.ndarray
    }

    class GrayImageChannelFormat {
        -inverse: bool
        +__init__(inverse)
        +__call__(data) dict
    }

    class Permute {
        +__init__()
        +__call__(im, im_info) tuple
    }

    class PadStride {
        -coarsest_stride: int
        +__init__(stride)
        +__call__(im, im_info) tuple
    }

    DecodeImage ..> np.ndarray
    StandardizeImag ..> np.ndarray
    NormalizeImage ..> np.ndarray
    ToCHWImage ..> np.ndarray
    KeepKeys ..> list
    Pad ..> np.ndarray
    LinearResize ..> np.ndarray
    Resize ..> np.ndarray
    DetResizeForTest ..> np.ndarray
    E2EResizeForTest ..> np.ndarray
    KieResize ..> np.ndarray
    SRResize ..>
