operators.py


Overview

operators.py is a utility module for image preprocessing in computer vision workflows, particularly deep learning pipelines for detection, recognition, and segmentation. It provides classes and functions that decode, resize, normalize, pad, reformat, and otherwise manipulate images and their associated annotations. The operators are modular and composable, so they can be chained into flexible preprocessing pipelines.

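Key functionalities include image decoding (DecodeImage, decode_image), normalization (StandardizeImag, NormalizeImage), layout conversion (ToCHWImage, Permute), padding (Pad, PadStride), resizing (LinearResize, Resize, DetResizeForTest, E2EResizeForTest, KieResize, SRResize, ResizeNormalize), grayscale conversion (GrayImageChannelFormat), output key selection (KeepKeys), pipeline composition (preprocess), and non-maximum suppression (nms).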


Classes and Functions

Class: DecodeImage

Purpose:
Decodes an image from a byte buffer and converts it to the desired color mode and channel order.

Constructor Parameters:

Usage:

decoder = DecodeImage(img_mode='RGB', channel_first=True)
data = {'image': image_bytes}
decoded_data = decoder(data)
img = decoded_data['image']  # numpy array, shape depends on channel_first

Details:


Class: StandardizeImag

Purpose:
Normalizes image pixel values by subtracting the mean and dividing by the standard deviation, optionally scaling the values by 1/255.

Constructor Parameters:

Call Parameters:

Returns:

Usage:

std_op = StandardizeImag(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
im_norm, im_info = std_op(im, im_info)
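The arithmetic behind this operator can be sketched in a few lines of NumPy. This is a minimal illustration, not the module's implementation: the helper name standardize is hypothetical, the input is assumed to be an HWC array, and applying the 1/255 scaling before the mean subtraction is an assumption.

```python
import numpy as np


def standardize(im, mean, std, is_scale=True):
    """Hypothetical helper: per-channel (im / 255 - mean) / std on an HWC array."""
    im = im.astype(np.float32)
    if is_scale:
        # Assumption: scaling to [0, 1] happens before the mean is subtracted.
        im = im / 255.0
    # Reshape to (1, 1, C) so the statistics broadcast over height and width.
    mean = np.asarray(mean, dtype=np.float32).reshape(1, 1, -1)
    std = np.asarray(std, dtype=np.float32).reshape(1, 1, -1)
    return (im - mean) / std


# A 2x2 all-white RGB image: every pixel becomes (1 - mean) / std per channel.
white = np.full((2, 2, 3), 255, dtype=np.uint8)
standardized = standardize(
    white, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
)
```

With is_scale left at True, the mean and std values are interpreted on the [0, 1] scale, which matches the constants used in the usage example above.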

Class: NormalizeImage

Purpose:
Normalizes an image by subtracting the mean and dividing by the standard deviation, with a configurable scale factor and channel order.

Constructor Parameters:

Call Parameters:

Returns:

Usage:

normalizer = NormalizeImage()
data = {'image': pil_image}
normalized_data = normalizer(data)

Class: ToCHWImage

Purpose:
Converts an image array from HWC (height, width, channels) layout to CHW (channels, height, width).
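The layout change amounts to a single axis permutation. The sketch below uses a plain NumPy array rather than the operator's dictionary interface, so the variable names are illustrative only.

```python
import numpy as np

# A tiny "image" with H=2, W=3, C=4 so each axis is easy to tell apart.
hwc = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Move the channel axis to the front: (H, W, C) -> (C, H, W).
chw = hwc.transpose((2, 0, 1))
```

After the transpose, chw[c, h, w] refers to the same element as hwc[h, w, c].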

Call Parameters:

Returns:


Class: KeepKeys

Purpose:
Extracts and returns a list of values corresponding to specified keys from a dictionary.

Constructor Parameters:

Call Parameters:

Returns:

Usage:

keep_keys_op = KeepKeys(['image', 'label'])
values = keep_keys_op(data)
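A minimal sketch of this behavior, assuming the values are returned in the order of the configured keys. The class name KeepKeysSketch and the sample dictionary are hypothetical and only serve the illustration.

```python
class KeepKeysSketch:
    """Illustrative stand-in: pick values out of a dict in a fixed key order."""

    def __init__(self, keep_keys):
        self.keep_keys = keep_keys

    def __call__(self, data):
        # The output order follows keep_keys, not the dict's own ordering.
        return [data[key] for key in self.keep_keys]


sample = {'path': 'x.jpg', 'label': 3, 'image': 'pixels'}
kept = KeepKeysSketch(['image', 'label'])(sample)
```

Returning a list rather than a dictionary is convenient as the last step of a pipeline, where downstream code typically unpacks the values positionally.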

Class: Pad

Purpose:
Pads an image to a target size or to the nearest multiple of a stride size.
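The two sizing modes can be illustrated with a small shape calculation. This is a sketch under stated assumptions: the function name padded_shape is hypothetical, and raising an error when the explicit size is smaller than the image is an assumed behavior, not something the module is documented to do.

```python
import math


def padded_shape(height, width, size=None, size_div=32):
    """Hypothetical helper: return the (height, width) an image is padded to."""
    if size is not None:
        target_h, target_w = size
        # Assumption: padding never shrinks the image.
        if target_h < height or target_w < width:
            raise ValueError("target size must not be smaller than the image")
        return target_h, target_w
    # Round each side up to the nearest multiple of size_div.
    return (
        math.ceil(height / size_div) * size_div,
        math.ceil(width / size_div) * size_div,
    )


by_stride = padded_shape(100, 250, size_div=32)
by_size = padded_shape(100, 250, size=(320, 320))
```

For a 100 x 250 image and a divisor of 32, the stride mode yields 128 x 256, the smallest multiples of 32 that contain the image.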

Constructor Parameters:

Call Parameters:

Returns:


Class: LinearResize

Purpose:
Resizes an image to a target size with optional aspect ratio preservation.

Constructor Parameters:

Call Parameters:

Returns:

Key Algorithm:
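If keep_ratio is True, the image is scaled so that its shorter side matches the smaller target dimension. If that scale would push the longer side past the larger target dimension, the scale is reduced so that the longer side fits instead. The aspect ratio is preserved in both cases.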
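The keep_ratio rule can be written out as a short scale computation. The function name keep_ratio_scale is hypothetical, and the rounding used in the overflow check is an assumption; only the keep_ratio=True case described above is covered.

```python
def keep_ratio_scale(im_height, im_width, target_size):
    """Hypothetical helper: one scale factor that preserves the aspect ratio."""
    im_min, im_max = min(im_height, im_width), max(im_height, im_width)
    target_min, target_max = min(target_size), max(target_size)

    # First try to make the shorter side match the smaller target dimension.
    scale = target_min / im_min
    # If the longer side would overflow, fit the longer side instead.
    if round(scale * im_max) > target_max:
        scale = target_max / im_max
    return scale


fits = keep_ratio_scale(480, 640, (800, 1333))      # shorter side drives the scale
capped = keep_ratio_scale(400, 1600, (800, 1333))   # longer side caps the scale
```

In the first call the shorter side reaches 800 without the longer side exceeding 1333, so the scale is 800/480. In the second call that scale would stretch the longer side to 3200, so the scale falls back to 1333/1600.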


Class: Resize

Purpose:
Resizes an image to a fixed size, optionally rescaling its polygon annotations to match.
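Keeping polygon annotations aligned with a resized image means multiplying x coordinates by the width ratio and y coordinates by the height ratio. The sketch below shows only that coordinate step; the function name scale_polygons and the (x, y) point format are assumptions made for illustration.

```python
def scale_polygons(polygons, original_size, new_size):
    """Hypothetical helper: rescale (x, y) points after an image resize.

    Sizes are given as (height, width).
    """
    original_h, original_w = original_size
    new_h, new_w = new_size
    ratio_h = new_h / original_h
    ratio_w = new_w / original_w
    return [
        [(x * ratio_w, y * ratio_h) for x, y in polygon]
        for polygon in polygons
    ]


boxes = [[(10, 20), (110, 20), (110, 60), (10, 60)]]
# The image shrinks to half its height and doubles in width.
scaled = scale_polygons(boxes, original_size=(100, 200), new_size=(50, 400))
```

Because the two ratios are independent, this also handles resizes that change the aspect ratio.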

Constructor Parameters:

Call Parameters:

Returns:


Class: DetResizeForTest

Purpose:
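Specialized resizing for detection tasks at test time. Several resize strategies are available, implemented by the methods resize_image_type0, resize_image_type1, and resize_image_type2 listed in the class diagram.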

Constructor Parameters:
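Accepts keyword arguments that configure the resize behavior. The attributes listed in the class diagram include keep_ratio, image_shape, limit_side_len, limit_type, and resize_long.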

Call Parameters:

Returns:


Class: E2EResizeForTest

Purpose:
Resizes images for end-to-end OCR tasks, supporting specific dataset configurations (e.g., TotalText).

Constructor Parameters:

Call Parameters:

Returns:


Class: KieResize

Purpose:
Resizes images and associated bounding boxes for Key Information Extraction (KIE) tasks.

Constructor Parameters:

Call Parameters:

Returns:


Class: SRResize

Purpose:
Resizes images for Super-Resolution tasks, handling both low-resolution (LR) and high-resolution (HR) images.

Constructor Parameters:

Call Parameters:

Returns:


Class: ResizeNormalize

Purpose:
Helper class to resize a PIL image and normalize it by scaling pixel values to [0,1] and converting to CHW format.

Constructor Parameters:

Call Parameters:

Returns:


Class: GrayImageChannelFormat

Purpose:
Converts a color image to a single-channel grayscale image, with optional inversion.

Constructor Parameters:

Call Parameters:

Returns:


Class: Permute

Purpose:
Permutes image dimensions from HWC to CHW.

Call Parameters:

Returns:


Class: PadStride

Purpose:
Pads images so that height and width are multiples of a specified stride. Useful for models with Feature Pyramid Networks (FPN) that require input sizes divisible by stride.
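A minimal sketch of stride padding, assuming a CHW input and zero padding added at the bottom and right edges. Both of those choices, and the function name pad_to_stride, are assumptions for illustration rather than documented behavior.

```python
import math

import numpy as np


def pad_to_stride(im, stride):
    """Hypothetical helper: zero-pad a CHW array so H and W divide by stride."""
    if stride <= 0:
        return im  # Assumption: a non-positive stride disables padding.
    channels, height, width = im.shape
    padded_h = math.ceil(height / stride) * stride
    padded_w = math.ceil(width / stride) * stride
    padded = np.zeros((channels, padded_h, padded_w), dtype=im.dtype)
    # Copy the original pixels into the top-left corner.
    padded[:, :height, :width] = im
    return padded


ones = np.ones((3, 100, 250), dtype=np.float32)
padded_ones = pad_to_stride(ones, 32)
```

Padding only the bottom and right keeps the original pixel coordinates unchanged, so box coordinates predicted on the padded image need no offset correction.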

Constructor Parameters:

Call Parameters:

Returns:


Function: decode_image

Purpose:
Reads an RGB image from a file path or uses an existing image array.

Parameters:

Returns:


Function: preprocess

Purpose:
Applies a sequence of preprocessing operators to an image.
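The composition pattern can be sketched as a loop in which each operator receives and returns an (im, im_info) pair, matching the call signature shown for LinearResize, Permute, and PadStride in the class diagram. The function apply_operators and the two toy operators are hypothetical, and a plain list stands in for the image.

```python
def apply_operators(im, operators, im_info=None):
    """Hypothetical helper: thread (im, im_info) through each operator in order."""
    im_info = dict(im_info or {})
    for operator in operators:
        im, im_info = operator(im, im_info)
    return im, im_info


def scale_values(im, im_info):
    # Toy operator: map 8-bit values into [0, 1].
    return [value / 255.0 for value in im], im_info


def record_length(im, im_info):
    # Toy operator: store metadata without touching the image.
    return im, dict(im_info, length=len(im))


result, info = apply_operators([0, 255], [scale_values, record_length])
```

Because every operator shares the same signature, steps can be reordered or swapped without changing the loop.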

Parameters:

Returns:


Function: nms

Purpose:
Performs Non-Maximum Suppression (NMS) on bounding boxes.
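The parameters and thresholds of the module's nms function are not listed here, so the following is only a generic greedy IoU-based sketch of the technique. The names iou, greedy_nms, and iou_threshold, and the (x1, y1, x2, y2) box format, are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for index in order:
        # Keep a box only if it does not overlap any already kept box too much.
        if all(iou(boxes[index], boxes[k]) <= iou_threshold for k in keep):
            keep.append(index)
    return keep


candidates = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept_indices = greedy_nms(candidates, scores=[0.9, 0.8, 0.7])
```

The first two boxes overlap with an IoU of 81/119, so the lower-scoring one is suppressed, while the third box is far away and survives.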

Parameters:

Returns:


Implementation Details & Algorithms


Interaction With Other System Components


Visual Diagram: Class Structure

classDiagram
    class DecodeImage {
        -img_mode: str
        -channel_first: bool
        -ignore_orientation: bool
        +__init__(img_mode, channel_first, ignore_orientation)
        +__call__(data) dict
    }

    class StandardizeImag {
        -mean: list
        -std: list
        -is_scale: bool
        -norm_type: str
        +__init__(mean, std, is_scale, norm_type)
        +__call__(im, im_info) tuple
    }

    class NormalizeImage {
        -scale: float
        -mean: np.ndarray
        -std: np.ndarray
        +__init__(scale, mean, std, order)
        +__call__(data) dict
    }

    class ToCHWImage {
        +__init__()
        +__call__(data) dict
    }

    class KeepKeys {
        -keep_keys: list
        +__init__(keep_keys)
        +__call__(data) list
    }

    class Pad {
        -size: list
        -size_div: int
        +__init__(size, size_div)
        +__call__(data) dict
    }

    class LinearResize {
        -target_size: list
        -keep_ratio: bool
        -interp: int
        +__init__(target_size, keep_ratio, interp)
        +__call__(im, im_info) tuple
        +generate_scale(im) tuple
    }

    class Resize {
        -size: tuple
        +__init__(size)
        +resize_image(img) tuple
        +__call__(data) dict
    }

    class DetResizeForTest {
        -resize_type: int
        -keep_ratio: bool
        -image_shape: tuple
        -limit_side_len: int
        -limit_type: str
        -resize_long: int
        +__init__(**kwargs)
        +__call__(data) dict
        +resize_image_type0(img) tuple
        +resize_image_type1(img) tuple
        +resize_image_type2(img) tuple
        +image_padding(im, value) np.ndarray
    }

    class E2EResizeForTest {
        -max_side_len: int
        -valid_set: str
        +__init__(**kwargs)
        +__call__(data) dict
        +resize_image_for_totaltext(im, max_side_len) tuple
        +resize_image(im, max_side_len) tuple
    }

    class KieResize {
        -max_side: int
        -min_side: int
        +__init__(**kwargs)
        +__call__(data) dict
        +resize_image(img) tuple
        +resize_boxes(im, points, scale_factor) np.ndarray
    }

    class SRResize {
        -imgH: int
        -imgW: int
        -down_sample_scale: int
        -keep_ratio: bool
        -min_ratio: float
        -mask: bool
        -infer_mode: bool
        +__init__(imgH, imgW, down_sample_scale, keep_ratio, min_ratio, mask, infer_mode)
        +__call__(data) dict
    }

    class ResizeNormalize {
        -size: tuple
        -interpolation
        +__init__(size, interpolation)
        +__call__(img) np.ndarray
    }

    class GrayImageChannelFormat {
        -inverse: bool
        +__init__(inverse)
        +__call__(data) dict
    }

    class Permute {
        +__init__()
        +__call__(im, im_info) tuple
    }

    class PadStride {
        -coarsest_stride: int
        +__init__(stride)
        +__call__(im, im_info) tuple
    }

    DecodeImage ..> np.ndarray
    StandardizeImag ..> np.ndarray
    NormalizeImage ..> np.ndarray
    ToCHWImage ..> np.ndarray
    KeepKeys ..> list
    Pad ..> np.ndarray
    LinearResize ..> np.ndarray
    Resize ..> np.ndarray
    DetResizeForTest ..> np.ndarray
    E2EResizeForTest ..> np.ndarray
    KieResize ..> np.ndarray
    SRResize ..>