operators.py
Overview
operators.py is a utility module primarily focused on image preprocessing operations commonly used in computer vision workflows, especially those related to deep learning for tasks like detection, recognition, and segmentation. The file provides a collection of classes and functions to decode, resize, normalize, pad, format, and otherwise manipulate images and associated annotations. These operators are designed to be modular and composable, allowing for flexible preprocessing pipelines.
Key functionalities include:
Image decoding from raw bytes or file paths.
Various resizing strategies preserving aspect ratio or scaling to fixed dimensions.
Image normalization and standardization.
Conversion between different image channel formats (HWC ↔ CHW, RGB ↔ BGR).
Padding images to desired sizes or strides.
Utility functions like Non-Maximum Suppression (NMS).
Support for specialized preprocessing for tasks such as end-to-end OCR and key information extraction (KIE).
Classes and Functions
Class: DecodeImage
Purpose:
Decodes an image from a byte buffer and converts it to the desired color mode and channel order.
Constructor Parameters:
img_mode (str, default 'RGB'): Desired output color space. Supported: 'RGB', 'GRAY'.
channel_first (bool, default False): Whether to transpose the image from HWC to CHW layout.
ignore_orientation (bool, default False): Whether to ignore EXIF orientation metadata when decoding.
Usage:
decoder = DecodeImage(img_mode='RGB', channel_first=True)
data = {'image': image_bytes}
decoded_data = decoder(data)
img = decoded_data['image'] # numpy array, shape depends on channel_first
Details:
Uses cv2.imdecode to decode images from bytes.
Converts BGR (OpenCV's default channel order) to RGB if img_mode is 'RGB'.
For grayscale mode, converts the gray image to 3-channel BGR for consistency.
If channel_first is True, the image shape changes from (H, W, C) to (C, H, W).
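A minimal sketch of the color-order and layout steps, assuming the bytes were already decoded with something like cv2.imdecode(np.frombuffer(buf, np.uint8), cv2.IMREAD_COLOR) into an (H, W, 3) BGR array. The helper name to_mode_and_layout is hypothetical, and the GRAY branch and EXIF handling are left out.

```python
import numpy as np


def to_mode_and_layout(img_bgr: np.ndarray, img_mode: str = "RGB",
                       channel_first: bool = False) -> np.ndarray:
    """Hypothetical helper: post-decode color/layout steps on an (H, W, 3) BGR array."""
    img = img_bgr
    if img_mode == "RGB":
        # OpenCV decodes to BGR; reversing the last axis gives RGB
        img = img[:, :, ::-1]
    if channel_first:
        # (H, W, C) -> (C, H, W)
        img = img.transpose((2, 0, 1))
    return img


# A 2x3 image whose pixels are pure blue in BGR order (B=255, G=0, R=0)
bgr = np.zeros((2, 3, 3), dtype=np.uint8)
bgr[:, :, 0] = 255
out = to_mode_and_layout(bgr, img_mode="RGB", channel_first=True)
print(out.shape)     # (3, 2, 3)
print(out[2, 0, 0])  # 255: blue is now the last channel
```

Both steps return views (slicing and transpose do not copy), so they are cheap even for large images.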
Class: StandardizeImag
Purpose:
Normalizes image pixel values by subtracting mean and dividing by standard deviation, optionally scaling by 1/255.
Constructor Parameters:
mean (list of floats): Channel-wise mean for normalization.
std (list of floats): Channel-wise standard deviation for normalization.
is_scale (bool, default True): Whether to scale pixel values by 1/255 before normalization.
norm_type (str, default 'mean_std'): Type of normalization; currently supports 'mean_std' or 'none'.
Call Parameters:
im (np.ndarray): Input image.
im_info (dict): Dictionary containing image metadata (returned unchanged).
Returns:
Tuple (im, im_info) with the normalized image and the unchanged image info.
Usage:
std_op = StandardizeImag(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
im_norm, im_info = std_op(im, im_info)
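The arithmetic can be sketched as follows, assuming the input is an (H, W, C) array; the function name standardize is hypothetical and is not the class's actual method.

```python
import numpy as np


def standardize(im, mean, std, is_scale=True, norm_type="mean_std"):
    """Hypothetical sketch of channel-wise standardization for an (H, W, C) image."""
    im = im.astype(np.float32)
    if is_scale:
        # Map [0, 255] to [0, 1] first
        im = im / 255.0
    if norm_type == "mean_std":
        # Shape (1, 1, C) broadcasts over height and width
        mean_arr = np.asarray(mean, dtype=np.float32)[np.newaxis, np.newaxis, :]
        std_arr = np.asarray(std, dtype=np.float32)[np.newaxis, np.newaxis, :]
        im = (im - mean_arr) / std_arr
    return im


im = np.full((2, 2, 3), 255, dtype=np.uint8)
out = standardize(im, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
print(out[0, 0])  # [1. 1. 1.]
```

With is_scale=True, the ImageNet-style mean and std values from the usage line above apply to pixels in [0, 1], which is why the 1/255 step comes first.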
Class: NormalizeImage
Purpose:
Normalizes images by subtracting mean and dividing by std with configurable scaling and channel order.
Constructor Parameters:
scale (float or str, optional): Scale factor for pixel values (default 1/255).
mean (list, optional): Mean values per channel (default [0.485, 0.456, 0.406]).
std (list, optional): Standard deviation values per channel (default [0.229, 0.224, 0.225]).
order (str, default 'chw'): Channel layout of the input image, 'chw' or 'hwc'; it determines how mean and std are shaped for broadcasting.
Call Parameters:
data (dict): Dictionary containing 'image' as a PIL Image or np.ndarray.
Returns:
Modified data dict with the normalized 'image' as an np.ndarray.
Usage:
normalizer = NormalizeImage()
data = {'image': pil_image}
normalized_data = normalizer(data)
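A sketch of how the order setting interacts with broadcasting, assuming order describes the layout of the incoming image and only controls how mean and std are reshaped (no transpose). The factory make_normalizer is hypothetical, and string-valued scale arguments are not handled here.

```python
import numpy as np


def make_normalizer(scale=None, mean=None, std=None, order="chw"):
    """Hypothetical factory mirroring the NormalizeImage parameters."""
    scale = np.float32(scale if scale is not None else 1.0 / 255.0)
    mean = mean if mean is not None else [0.485, 0.456, 0.406]
    std = std if std is not None else [0.229, 0.224, 0.225]
    # Put the channel axis where the input image keeps it
    shape = (3, 1, 1) if order == "chw" else (1, 1, 3)
    mean_arr = np.array(mean, dtype=np.float32).reshape(shape)
    std_arr = np.array(std, dtype=np.float32).reshape(shape)

    def normalize(img):
        img = np.asarray(img)  # accepts a PIL image or an ndarray
        return (img.astype(np.float32) * scale - mean_arr) / std_arr

    return normalize


hwc = np.zeros((4, 5, 3), dtype=np.uint8)
print(make_normalizer(order="hwc")(hwc).shape)  # (4, 5, 3)
```

With the default order='chw', this 4x5x3 HWC input raises a broadcasting error; a 3x3x3 HWC image would instead broadcast silently along the wrong axis, so order must match the image's actual layout.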
Class: ToCHWImage
Purpose:
Converts image array from HWC (Height, Width, Channels) format to CHW (Channels, Height, Width).
Call Parameters:
data (dict): Contains 'image' as a PIL Image or np.ndarray.
Returns:
Modified data with 'image' transposed to CHW format.
Class: KeepKeys
Purpose:
Extracts and returns a list of values corresponding to specified keys from a dictionary.
Constructor Parameters:
keep_keys (list of str): Keys to keep in the output.
Call Parameters:
data (dict): Input dictionary.
Returns:
List of values from data corresponding to the specified keys, in the order given by keep_keys.
Usage:
keep_keys_op = KeepKeys(['image', 'label'])
values = keep_keys_op(data)
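The behavior amounts to an ordered dictionary lookup. The class below is a hypothetical stand-in, not the module's KeepKeys; in this sketch a missing key raises KeyError.

```python
class KeepKeysSketch:
    """Hypothetical stand-in for KeepKeys: returns values in keep_keys order."""

    def __init__(self, keep_keys):
        self.keep_keys = list(keep_keys)

    def __call__(self, data):
        # A missing key raises KeyError in this sketch
        return [data[key] for key in self.keep_keys]


data = {"image": "img-array", "label": 3, "polys": []}
print(KeepKeysSketch(["label", "image"])(data))  # [3, 'img-array']
```

Returning a list rather than a dict is what lets a data loader unpack the values positionally, so the order of keep_keys matters.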
Class: Pad
Purpose:
Pads an image to a target size or to the nearest multiple of a stride size.
Constructor Parameters:
size (int or list/tuple, optional): Target size [height, width] to pad to.
size_div (int, default 32): If size is None, pad to multiples of this value.
Call Parameters:
data (dict): Contains 'image' as np.ndarray.
Returns:
Modified data with the padded 'image'.
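A sketch of the two padding modes, assuming zeros are added on the bottom and right edges. np.pad stands in for OpenCV border padding (such as cv2.copyMakeBorder), and the helper names and the error on oversized images are hypothetical.

```python
import math

import numpy as np


def padded_size(img_h, img_w, size=None, size_div=32):
    """Return the (height, width) an image is padded to."""
    if size is not None:
        if isinstance(size, int):
            size = [size, size]
        target_h, target_w = size
        if img_h > target_h or img_w > target_w:
            raise ValueError("image is larger than the requested pad size")
        return target_h, target_w
    # Round each side up to the next multiple of size_div (at least size_div)
    target_h = max(math.ceil(img_h / size_div) * size_div, size_div)
    target_w = max(math.ceil(img_w / size_div) * size_div, size_div)
    return target_h, target_w


def pad_image(img, size=None, size_div=32):
    """Zero-pad an (H, W) or (H, W, C) image on the bottom and right."""
    h, w = img.shape[:2]
    target_h, target_w = padded_size(h, w, size, size_div)
    pad_width = [(0, target_h - h), (0, target_w - w)] + [(0, 0)] * (img.ndim - 2)
    return np.pad(img, pad_width, mode="constant", constant_values=0)


print(padded_size(100, 65))                              # (128, 96)
print(pad_image(np.ones((100, 65, 3), np.uint8)).shape)  # (128, 96, 3)
```

Padding only at the bottom and right keeps the original pixel coordinates unchanged, so annotations need no offset.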
Class: LinearResize
Purpose:
Resizes an image to a target size with optional aspect ratio preservation.
Constructor Parameters:
target_size (int or list of two ints): Target size [height, width].
keep_ratio (bool, default True): Whether to preserve the aspect ratio.
interp (int, default cv2.INTER_LINEAR): Interpolation method.
Call Parameters:
im (np.ndarray): Input image.
im_info (dict): Image metadata.
Returns:
Tuple (im, im_info) with the resized image and updated metadata, including the new shape and scale factor.
Key Algorithm:
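If keep_ratio is True, the image is scaled so that its shorter side matches the smaller target dimension; if that would push its longer side past the larger target dimension, the scale is reduced so that the longer side matches the larger target dimension instead.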
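The keep-ratio rule can be sketched as a pure scale computation. The function below is hypothetical (it mirrors the listed generate_scale method only in spirit), and the keep_ratio=False branch, which scales each axis independently, is an assumption.

```python
def generate_scale(im_h, im_w, target_size, keep_ratio=True):
    """Return (scale_y, scale_x) for resizing an im_h x im_w image to target_size [h, w]."""
    if isinstance(target_size, int):
        target_size = [target_size, target_size]
    if keep_ratio:
        im_min, im_max = min(im_h, im_w), max(im_h, im_w)
        target_min, target_max = min(target_size), max(target_size)
        # Match the shorter side to the smaller target dimension...
        scale = target_min / im_min
        # ...unless the longer side would then exceed the larger target dimension
        if round(scale * im_max) > target_max:
            scale = target_max / im_max
        return scale, scale
    # Assumed: without keep_ratio, each axis is scaled independently to fit exactly
    target_h, target_w = target_size
    return target_h / im_h, target_w / im_w


for h, w in [(600, 800), (400, 1200)]:
    sy, sx = generate_scale(h, w, [800, 1333])
    print(round(h * sy), round(w * sx))
# 800 1067
# 444 1333
```

The second image is very wide, so matching its short side to 800 would make it 2400 pixels wide; the cap on the long side wins instead.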
Class: Resize
Purpose:
Resizes image to fixed size, optionally adjusting polygon annotations accordingly.
Constructor Parameters:
size (tuple (height, width), default (640, 640)): Target output size.
Call Parameters:
data (dict): Contains 'image' and optionally 'polys' (list of polygons).
Returns:
Modified data with the resized 'image' and, if present, scaled 'polys'.
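A sketch of the polygon update, assuming each polygon is a list of (x, y) points. The helper resize_polys is hypothetical. The image itself would be resized with something like cv2.resize(img, (resize_w, resize_h)); note that OpenCV's dsize is (width, height) while size here is (height, width).

```python
def resize_polys(polys, orig_h, orig_w, size=(640, 640)):
    """Scale polygon points from an orig_h x orig_w image to size=(height, width)."""
    resize_h, resize_w = size
    ratio_h = resize_h / orig_h
    ratio_w = resize_w / orig_w
    # Each point is (x, y): x scales with width, y with height
    return [[[x * ratio_w, y * ratio_h] for x, y in poly] for poly in polys]


polys = [[[10, 20], [110, 20], [110, 60], [10, 60]]]
print(resize_polys(polys, orig_h=320, orig_w=1280))
# [[[5.0, 40.0], [55.0, 40.0], [55.0, 120.0], [5.0, 120.0]]]
```

Because the resize does not preserve aspect ratio, the two axes get different ratios and a box can change shape.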
Class: DetResizeForTest
Purpose:
Specialized resizing for detection at test time, with several resize strategies:
Resize to a fixed image_shape, optionally keeping the aspect ratio.
Resize with limit_side_len, using min/max side-length constraints.
Resize by scaling the longer side to a fixed length.
Constructor Parameters:
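Accepts keyword arguments that configure the resize behavior:
image_shape (tuple): Resize to this exact shape.
keep_ratio (bool): Whether to keep the aspect ratio when resizing to image_shape.
limit_side_len (int): Side-length limit applied according to limit_type.
limit_type (str): One of 'min', 'max', 'resize_long'.
resize_long (int): Target length for the longer side.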
Call Parameters:
data (dict): Contains 'image'.
Returns:
Modified data with the resized image and 'shape' info [orig_h, orig_w, ratio_h, ratio_w].
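A sketch of the limit_side_len strategy as a shape computation. Snapping each side to the nearest multiple of 32 (minimum 32) is an assumption consistent with the stride alignment described under Implementation Details. The function name is hypothetical, and the image_shape and resize_long strategies are not shown.

```python
def limit_side_resize_shape(h, w, limit_side_len, limit_type):
    """Return (resize_h, resize_w, ratio_h, ratio_w) for the limit_side_len strategy."""
    if limit_type == "max":
        # Shrink only if the longer side exceeds the limit
        ratio = limit_side_len / max(h, w) if max(h, w) > limit_side_len else 1.0
    elif limit_type == "min":
        # Enlarge only if the shorter side is below the limit
        ratio = limit_side_len / min(h, w) if min(h, w) < limit_side_len else 1.0
    elif limit_type == "resize_long":
        # Always scale the longer side to limit_side_len
        ratio = limit_side_len / max(h, w)
    else:
        raise ValueError(f"unsupported limit_type: {limit_type}")
    resize_h, resize_w = int(h * ratio), int(w * ratio)
    # Snap both sides to the nearest multiple of 32, never below 32
    resize_h = max(int(round(resize_h / 32) * 32), 32)
    resize_w = max(int(round(resize_w / 32) * 32), 32)
    return resize_h, resize_w, resize_h / h, resize_w / w


print(limit_side_resize_shape(1080, 1920, 960, "max")[:2])  # (544, 960)
```

The returned ratios, rather than the nominal ratio, are what map predicted boxes back to the original image, because snapping changes the two axes by slightly different amounts.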
Class: E2EResizeForTest
Purpose:
Resizes images for end-to-end OCR tasks, supporting specific dataset configurations (e.g., TotalText).
Constructor Parameters:
max_side_len (int): Maximum size for the longer side.
valid_set (str): Dataset name that triggers dataset-specific resizing logic.
Call Parameters:
data (dict): Contains 'image'.
Returns:
Modified data with the resized image and shape info.
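A sketch of a generic (non-dataset-specific) branch, assuming it caps the longer side at max_side_len and rounds each side up to a multiple of 128, one of the strides mentioned under Implementation Details. The function name and the stride parameter are hypothetical, and the TotalText-specific rule is not reproduced.

```python
def e2e_resize_shape(h, w, max_side_len, stride=128):
    """Return (resize_h, resize_w, ratio_h, ratio_w) for a generic max-side resize."""
    ratio = 1.0
    if max(h, w) > max_side_len:
        # Shrink so the longer side equals max_side_len
        ratio = max_side_len / max(h, w)
    resize_h, resize_w = int(h * ratio), int(w * ratio)
    # Round up (not to nearest) to a multiple of the stride
    resize_h = (resize_h + stride - 1) // stride * stride
    resize_w = (resize_w + stride - 1) // stride * stride
    return resize_h, resize_w, resize_h / h, resize_w / w


print(e2e_resize_shape(600, 1000, 500)[:2])  # (384, 512)
```

Rounding up never crops content, at the cost of slightly stretching the image; the returned ratios account for that stretch.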
Class: KieResize
Purpose:
Resizes images and associated bounding boxes for Key Information Extraction (KIE) tasks.
Constructor Parameters:
img_scale (tuple/list): [max_side, min_side] used for resizing.
Call Parameters:
data (dict): Contains 'image' and 'points' (bounding boxes).
Returns:
Modified data with the resized image, scaled points, the original image and boxes, and shape info.
Class: SRResize
Purpose:
Resizes images for Super-Resolution tasks, handling both low-resolution (LR) and high-resolution (HR) images.
Constructor Parameters:
imgH (int): Target image height.
imgW (int): Target image width.
down_sample_scale (int): Downsampling scale factor.
keep_ratio (bool): Whether to keep the aspect ratio.
min_ratio (float): Minimum ratio allowed.
mask (bool): Whether to apply a mask (unused in the code).
infer_mode (bool): Whether in inference mode (skips HR image processing).
Call Parameters:
data (dict): Contains 'image_lr', 'image_hr', and 'label'.
Returns:
Modified data with the resized 'img_lr' and, unless infer_mode is set, the resized 'img_hr'.
Class: ResizeNormalize
Purpose:
Helper class to resize a PIL image and normalize it by scaling pixel values to [0,1] and converting to CHW format.
Constructor Parameters:
size (tuple): Target size (width, height).
interpolation: PIL interpolation method (default Image.BICUBIC).
Call Parameters:
img (PIL.Image): Input image.
Returns:
Normalized image as numpy array with shape (C, H, W).
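The step after resizing can be sketched with NumPy alone, assuming the PIL image was already resized, for example with pil_img.resize((width, height), Image.BICUBIC). The helper name is hypothetical. The point it illustrates is the axis order: PIL sizes are (width, height), while the resulting array is (3, height, width).

```python
import numpy as np


def resize_normalize_array(img_hwc):
    """Hypothetical post-resize step: HWC uint8 -> CHW float32 in [0, 1]."""
    # np.asarray also accepts a PIL image directly
    arr = np.asarray(img_hwc, dtype=np.float32)
    return arr.transpose((2, 0, 1)) / 255.0


# Stand-in for a PIL image resized to size=(128, 32), i.e. width 128, height 32
resized = np.full((32, 128, 3), 255, dtype=np.uint8)
out = resize_normalize_array(resized)
print(out.shape)  # (3, 32, 128)
```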
Class: GrayImageChannelFormat
Purpose:
Converts a color image to a single-channel grayscale image, with optional inversion.
Constructor Parameters:
inverse (bool, default False): Whether to invert grayscale values.
Call Parameters:
data (dict): Contains 'image'.
Returns:
Modified data with the grayscale 'image' of shape (1, H, W) and the original image saved as 'src_image'.
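A sketch of the conversion on a BGR (H, W, 3) uint8 array. It computes BT.601 luma with NumPy as a stand-in for an OpenCV color conversion (cv2.COLOR_BGR2GRAY uses the same weights, but its fixed-point rounding can differ by 1). The function name is hypothetical, and saving 'src_image' is left out.

```python
import numpy as np


def gray_channel_format(img_bgr, inverse=False):
    """Hypothetical sketch: (H, W, 3) BGR uint8 -> (1, H, W) uint8 grayscale."""
    # BT.601 luma weights in B, G, R order
    weights = np.array([0.114, 0.587, 0.299], dtype=np.float32)
    gray = np.clip(np.rint(img_bgr.astype(np.float32) @ weights), 0, 255).astype(np.uint8)
    if inverse:
        # Flip intensities: 0 <-> 255
        gray = 255 - gray
    return gray[np.newaxis, :, :]  # add the single channel axis


blue = np.zeros((2, 3, 3), dtype=np.uint8)
blue[:, :, 0] = 255
print(gray_channel_format(blue).shape)     # (1, 2, 3)
print(gray_channel_format(blue)[0, 0, 0])  # 29
```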
Class: Permute
Purpose:
Permutes image dimensions from HWC to CHW.
Call Parameters:
im (np.ndarray): Image array.
im_info (dict): Associated metadata.
Returns:
Tuple (im, im_info) with the permuted image.
Class: PadStride
Purpose:
Pads images so that height and width are multiples of a specified stride. Useful for models with Feature Pyramid Networks (FPN) that require input sizes divisible by stride.
Constructor Parameters:
stride (int): Stride value to pad to (e.g., 32, 64).
Call Parameters:
im (np.ndarray): Image array in CHW format.
im_info (dict): Metadata.
Returns:
Tuple (padded_im, im_info) with the padded image.
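A sketch of stride padding on a CHW array using NumPy zero-padding. The function name is hypothetical, the float32 output dtype is an assumption, and treating a non-positive stride as "no padding" is also an assumption.

```python
import numpy as np


def pad_stride(im_chw, stride):
    """Zero-pad a (C, H, W) image so H and W become multiples of stride."""
    if stride <= 0:
        # Assumed: a non-positive stride disables padding
        return im_chw
    c, h, w = im_chw.shape
    pad_h = int(np.ceil(h / stride) * stride)
    pad_w = int(np.ceil(w / stride) * stride)
    padded = np.zeros((c, pad_h, pad_w), dtype=np.float32)
    padded[:, :h, :w] = im_chw  # original pixels stay in the top-left corner
    return padded


im = np.ones((3, 100, 65), dtype=np.float32)
print(pad_stride(im, 32).shape)  # (3, 128, 96)
```

Because the content stays anchored at the origin, the scale information in im_info remains valid and no box offset is needed after inference.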
Function: decode_image
Purpose:
Reads an RGB image from a file path or uses an existing image array.
Parameters:
im_file (str or np.ndarray): Image path or image array.
im_info (dict): Dictionary to store image metadata.
Returns:
Tuple (im, im_info) with the image as an np.ndarray and updated metadata.
Function: preprocess
Purpose:
Applies a sequence of preprocessing operators to an image.
Parameters:
im (str or np.ndarray): Image data or path.
preprocess_ops (list): List of callable operators/functions.
Returns:
Tuple (im, im_info) of the processed image and metadata.
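The chaining pattern can be sketched as a loop that threads (im, im_info) through each operator. This sketch skips the decoding step (it assumes im is already an image) and starts from an empty im_info. The two toy operators working on a nested-list "image" are hypothetical and exist only for illustration.

```python
def preprocess(im, preprocess_ops):
    """Run each (im, im_info) -> (im, im_info) operator in order."""
    # Hypothetical starting metadata; decoding normally populates im_info
    im_info = {}
    for op in preprocess_ops:
        im, im_info = op(im, im_info)
    return im, im_info


def record_shape(im, im_info):
    im_info["im_shape"] = (len(im), len(im[0]))
    return im, im_info


def double_pixels(im, im_info):
    return [[2 * p for p in row] for row in im], im_info


im, info = preprocess([[1, 2, 3], [4, 5, 6]], [record_shape, double_pixels])
print(im)    # [[2, 4, 6], [8, 10, 12]]
print(info)  # {'im_shape': (2, 3)}
```

Any callable with this two-in, two-out signature, such as StandardizeImag, Permute, or PadStride as described above, fits into the list.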
Function: nms
Purpose:
Performs Non-Maximum Suppression (NMS) on bounding boxes.
Parameters:
bboxes (np.ndarray): Array of bounding boxes [N, 4] with coordinates [x1, y1, x2, y2].
scores (np.ndarray): Confidence scores for each box.
iou_thresh (float): IoU threshold for suppression.
Returns:
List of indices of boxes that survive NMS.
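A plain greedy NMS sketch in pure Python. It uses continuous IoU without the "+1" pixel convention some implementations add to box widths, so the module's exact numbers may differ slightly. Boxes whose IoU with an already kept box is strictly greater than iou_thresh are suppressed.

```python
def nms(bboxes, scores, iou_thresh):
    """Greedy NMS over [x1, y1, x2, y2] boxes; returns kept indices, highest score first."""
    def area(b):
        return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

    def iou(a, b):
        inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = inter_w * inter_h
        union = area(a) + area(b) - inter
        return inter / union if union > 0 else 0.0

    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Keep only boxes whose IoU with the kept box is at most iou_thresh
        order = [i for i in order if iou(bboxes[best], bboxes[i]) <= iou_thresh]
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, iou_thresh=0.5))  # [0, 2]
```

Here boxes 0 and 1 overlap with IoU 81/119 (about 0.68), so the lower-scoring box 1 is dropped, while box 2 does not overlap either box and survives.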
Implementation Details & Algorithms
Most resize operations take care to preserve the aspect ratio and to align output sizes to multiples of a stride (e.g., 32 or 128), because convolutional networks with repeated downsampling typically require input dimensions divisible by a fixed factor.
DetResizeForTest and E2EResizeForTest implement multiple resizing strategies tailored for detection and end-to-end OCR tasks, respectively, including dataset-specific handling.
Padding operations use OpenCV border padding or NumPy zero-padding to ensure image sizes meet model requirements.
Normalization and standardization use numpy broadcasting for efficient channel-wise operations.
The nms function implements a classical greedy NMS algorithm: it sorts boxes by score, computes IoU, and suppresses overlapping boxes above a threshold.
Interaction With Other System Components
This module is typically used as part of the data preprocessing pipeline before feeding images into neural networks for tasks such as detection, recognition, and KIE.
It operates on input dictionaries that may include images and annotations (e.g., polygons for text regions).
The modular classes allow integration with data loading frameworks, enabling flexible pipelines by chaining operators.
Functions like nms are often used post-inference to filter overlapping detection boxes.
Classes like SRResize and E2EResizeForTest are specialized for super-resolution and end-to-end OCR models, indicating interaction with the respective model components.
Visual Diagram: Class Structure
classDiagram
class DecodeImage {
-img_mode: str
-channel_first: bool
-ignore_orientation: bool
+__init__(img_mode, channel_first, ignore_orientation)
+__call__(data) dict
}
class StandardizeImag {
-mean: list
-std: list
-is_scale: bool
-norm_type: str
+__init__(mean, std, is_scale, norm_type)
+__call__(im, im_info) tuple
}
class NormalizeImage {
-scale: float
-mean: np.ndarray
-std: np.ndarray
+__init__(scale, mean, std, order)
+__call__(data) dict
}
class ToCHWImage {
+__init__()
+__call__(data) dict
}
class KeepKeys {
-keep_keys: list
+__init__(keep_keys)
+__call__(data) list
}
class Pad {
-size: list
-size_div: int
+__init__(size, size_div)
+__call__(data) dict
}
class LinearResize {
-target_size: list
-keep_ratio: bool
-interp: int
+__init__(target_size, keep_ratio, interp)
+__call__(im, im_info) tuple
+generate_scale(im) tuple
}
class Resize {
-size: tuple
+__init__(size)
+resize_image(img) tuple
+__call__(data) dict
}
class DetResizeForTest {
-resize_type: int
-keep_ratio: bool
-image_shape: tuple
-limit_side_len: int
-limit_type: str
-resize_long: int
+__init__(**kwargs)
+__call__(data) dict
+resize_image_type0(img) tuple
+resize_image_type1(img) tuple
+resize_image_type2(img) tuple
+image_padding(im, value) np.ndarray
}
class E2EResizeForTest {
-max_side_len: int
-valid_set: str
+__init__(**kwargs)
+__call__(data) dict
+resize_image_for_totaltext(im, max_side_len) tuple
+resize_image(im, max_side_len) tuple
}
class KieResize {
-max_side: int
-min_side: int
+__init__(**kwargs)
+__call__(data) dict
+resize_image(img) tuple
+resize_boxes(im, points, scale_factor) np.ndarray
}
class SRResize {
-imgH: int
-imgW: int
-down_sample_scale: int
-keep_ratio: bool
-min_ratio: float
-mask: bool
-infer_mode: bool
+__init__(imgH, imgW, down_sample_scale, keep_ratio, min_ratio, mask, infer_mode)
+__call__(data) dict
}
class ResizeNormalize {
-size: tuple
-interpolation
+__init__(size, interpolation)
+__call__(img) np.ndarray
}
class GrayImageChannelFormat {
-inverse: bool
+__init__(inverse)
+__call__(data) dict
}
class Permute {
+__init__()
+__call__(im, im_info) tuple
}
class PadStride {
-coarsest_stride: int
+__init__(stride)
+__call__(im, im_info) tuple
}
DecodeImage ..> np.ndarray
StandardizeImag ..> np.ndarray
NormalizeImage ..> np.ndarray
ToCHWImage ..> np.ndarray
KeepKeys ..> list
Pad ..> np.ndarray
LinearResize ..> np.ndarray
Resize ..> np.ndarray
DetResizeForTest ..> np.ndarray
E2EResizeForTest ..> np.ndarray
KieResize ..> np.ndarray
SRResize ..>