postprocess.py

Overview

postprocess.py is a utility module providing post-processing tools for text detection and recognition tasks, primarily targeting OCR systems. It implements algorithms to convert raw model outputs (such as segmentation maps or predicted character probabilities) into structured, human-readable results like bounding boxes around detected text regions and decoded text strings.

This file contains two main post-processing components:

- DBPostProcess: Processes text detection output from Differentiable Binarization (DB) models, extracting polygon or quadrilateral bounding boxes from binarized segmentation maps.
- CTCLabelDecode: Decodes character probabilities from CTC-based recognition models into readable text strings.

Additionally, a factory function build_post_process dynamically instantiates these post-processing classes based on configuration.

Classes and Functions

`build_post_process(config, global_config=None)`

Factory function to create post-processing instances based on configuration.

Parameters:
- config (dict): Configuration dictionary containing at least the key "name" specifying the post-processing class to instantiate. Additional keys are passed as parameters to the class constructor.
- global_config (dict, optional): Global configuration parameters merged into config.
Returns:
Instance of the specified post-processing class, or None if "name" is "None".
Raises:
ValueError if the specified post-processing name is unsupported.
Usage example:

config = {"name": "DBPostProcess", "thresh": 0.3}
postprocess = build_post_process(config)
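
The factory behaviour described above can be sketched as follows. This is a minimal illustration rather than the module's actual code: the `SUPPORTED` registry, the stand-in classes, and the name `build_post_process_sketch` are assumptions made for the example, and the real function may look classes up differently.

```python
import copy


class DBPostProcess:
    """Stand-in for the real class; it only records its keyword arguments."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class CTCLabelDecode:
    """Stand-in for the real class; it only records its keyword arguments."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


# Hypothetical registry of supported names (an assumption for this sketch).
SUPPORTED = {"DBPostProcess": DBPostProcess, "CTCLabelDecode": CTCLabelDecode}


def build_post_process_sketch(config, global_config=None):
    config = copy.deepcopy(config)  # do not mutate the caller's dict
    name = config.pop("name")
    if name == "None":
        return None
    if global_config is not None:
        config.update(global_config)  # global settings are merged into config
    if name not in SUPPORTED:
        raise ValueError(f"unsupported post process {name!r}; expected one of {sorted(SUPPORTED)}")
    # Every remaining key is forwarded to the constructor.
    return SUPPORTED[name](**config)


post = build_post_process_sketch({"name": "DBPostProcess", "thresh": 0.3})
print(type(post).__name__, post.kwargs)
```

Copying the configuration before popping "name" keeps the same dictionary reusable across several calls.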

`class DBPostProcess`

Post-processing for Differentiable Binarization (DB) text detection models. Converts binarized segmentation maps into text bounding boxes with confidence scores.

Constructor Parameters

- thresh (float, default=0.3): Threshold to binarize the prediction maps.
- box_thresh (float, default=0.7): Minimum confidence score to keep a box.
- max_candidates (int, default=1000): Maximum number of contours to consider.
- unclip_ratio (float, default=2.0): Ratio used to expand detected polygons.
- use_dilation (bool, default=False): Whether to apply dilation to the binary mask before contour extraction.
- score_mode (str, default="fast"): Scoring method for boxes, either "fast" or "slow".
- box_type (str, default="quad"): Type of box output, either "quad" for quadrilateral or "poly" for polygon.

Important properties

- min_size (int): Minimum size of boxes to keep (3 by default).
- dilation_kernel: Kernel used for dilation if use_dilation is True.

Methods

`__call__(outs_dict, shape_list) -> list[dict]`
Processes a batch of model output maps and returns the detected boxes for each image.
- Parameters:
  - outs_dict (dict): Contains model outputs; expects a 'maps' key holding the prediction maps.
  - shape_list (list of tuples): One tuple (src_h, src_w, ratio_h, ratio_w) per image, giving the original dimensions and the scaling ratios.
- Returns:
  List of dictionaries, one per image, each with a 'points' key containing the detected boxes (list of points).

`polygons_from_bitmap(pred, bitmap, dest_width, dest_height) -> (list, list)`
Extracts polygon boxes from a binarized bitmap using contours and polygon approximation.

`boxes_from_bitmap(pred, bitmap, dest_width, dest_height) -> (np.ndarray, list)`
Extracts quadrilateral boxes from a binarized bitmap using contours and minimum area rectangles.

`unclip(box, unclip_ratio) -> np.ndarray`
Expands a polygon outward by an offset distance derived from its area and perimeter, scaled by unclip_ratio, so that the box better covers the full text region.

`get_mini_boxes(contour) -> (list, float)`
Returns the four points of the minimum bounding box around a contour and the length of its shortest side.

`box_score_fast(bitmap, box) -> float`
Calculates the mean score inside a polygon using a fast bounding box-based mask.

`box_score_slow(bitmap, contour) -> float`
Calculates the mean score inside a polygon using an exact polygon mask (slower).
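
To make the unclip step more concrete, the offset distance can be sketched in plain Python. The sketch assumes the usual DB formulation, distance = area * unclip_ratio / perimeter. The helper names are hypothetical, and the polygon offsetting itself (done with Pyclipper in the module) is not reproduced here.

```python
import math


def polygon_area(points):
    """Absolute area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths, including the closing edge."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:] + points[:1]))


def unclip_distance(points, unclip_ratio=2.0):
    # Assumed formula: larger or more compact regions are expanded further.
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


box = [(0, 0), (100, 0), (100, 20), (0, 20)]  # a 100 x 20 text line
print(round(unclip_distance(box), 2))  # 16.67
```

For the 100 x 20 rectangle the area is 2000 and the perimeter is 240, so with unclip_ratio=2.0 each edge would be pushed outward by roughly 16.7 pixels.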

Usage example

db_postprocess = DBPostProcess(thresh=0.3, box_thresh=0.7)
results = db_postprocess(outs_dict={'maps': pred_maps}, shape_list=[(720, 1280, 1.0, 1.0)])
for res in results:
    print(res['points'])  # list of detected boxes
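
The dest_width and dest_height arguments of the extraction methods suggest that boxes found on the prediction map are rescaled to the source image size taken from shape_list. A minimal sketch of that rescaling, assuming simple proportional scaling with clipping to the image bounds (the function name `scale_box` is hypothetical):

```python
def scale_box(box, map_width, map_height, dest_width, dest_height):
    """Map box points from prediction-map coordinates to source-image coordinates."""
    scaled = []
    for x, y in box:
        # Scale proportionally, round to whole pixels, and clip to the image.
        sx = min(max(round(x / map_width * dest_width), 0), dest_width)
        sy = min(max(round(y / map_height * dest_height), 0), dest_height)
        scaled.append((sx, sy))
    return scaled


box_on_map = [(10, 5), (110, 5), (110, 25), (10, 25)]
print(scale_box(box_on_map, 640, 360, 1280, 720))
# [(20, 10), (220, 10), (220, 50), (20, 50)]
```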

`class BaseRecLabelDecode`

Base class for decoding text recognition outputs, converting between label indices and text strings.

Constructor Parameters

- character_dict_path (str, optional): Path to a file containing the character dictionary, one character per line.
- use_space_char (bool, default=False): Whether to include the space character in the dictionary.

Important attributes

- beg_str and end_str: Special tokens marking start and end of sequences (not heavily used here).
- reverse (bool): If True, reverses the decoded text (used for languages like Arabic).
- character (list): List of characters in the dictionary.
- dict (dict): Mapping from character to index.

Methods

`decode(text_index, text_prob=None, is_remove_duplicate=False) -> list[tuple]`
Converts batches of indices to text strings and average confidence scores.

`pred_reverse(pred) -> str`
Reverses the predicted text string, grouping alphanumeric sequences.

`add_special_char(dict_character) -> list`
Hook to add special characters to the dictionary; default is no change.

`get_ignored_tokens() -> list`
Returns tokens to ignore during decoding (default is [0] for CTC blank token).

Usage example

decoder = BaseRecLabelDecode(character_dict_path="characters.txt")
text_results = decoder.decode([[1, 2, 3]], text_prob=[[0.9, 0.8, 0.95]])
print(text_results)  # e.g. [('abc', 0.8833...)] if indices 1, 2, 3 map to 'a', 'b', 'c' in characters.txt
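
The decoding rules described for decode can be sketched in plain Python. This is an illustration under stated assumptions, not the class's implementation: the function name `decode_indices`, the in-memory character list, and the fallback confidence of 0.0 for empty results are all choices made for the example.

```python
def decode_indices(text_index, characters, text_prob=None,
                   is_remove_duplicate=False, ignored_tokens=(0,)):
    """Turn batches of label indices into (text, mean confidence) tuples."""
    results = []
    for b, indices in enumerate(text_index):
        chars, confs = [], []
        for i, idx in enumerate(indices):
            if idx in ignored_tokens:
                continue  # e.g. the CTC blank at index 0
            if is_remove_duplicate and i > 0 and indices[i - 1] == idx:
                continue  # collapse consecutive repeats of the same index
            chars.append(characters[idx])
            confs.append(text_prob[b][i] if text_prob is not None else 1.0)
        mean_conf = sum(confs) / len(confs) if confs else 0.0
        results.append(("".join(chars), mean_conf))
    return results


characters = ["blank", "a", "b", "c"]  # index 0 is reserved for the blank
result = decode_indices(
    [[1, 1, 0, 2, 3, 3]],
    characters,
    text_prob=[[0.9, 0.7, 0.99, 0.8, 0.95, 0.6]],
    is_remove_duplicate=True,
)
print(result)
```

Here the second 1, the blank, and the second 3 are dropped, so the text is 'abc' and the confidence is the mean of 0.9, 0.8, and 0.95.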

`class CTCLabelDecode(BaseRecLabelDecode)`

Derived class specialized for CTC-based recognition output decoding.

Constructor Parameters

Inherits from BaseRecLabelDecode.

Methods

`__call__(preds, label=None, *args, **kwargs) -> list or tuple`
Decodes predicted probabilities into text strings, optionally decoding ground truth labels as well.
- Parameters:
  - preds (np.ndarray or similar): Model prediction tensor of shape (batch, sequence_length, num_classes).
  - label (np.ndarray, optional): Ground truth labels to decode.
- Returns:
  The decoded (text, confidence) results, or a tuple (decoded_preds, decoded_labels) if label is provided.

The class also overrides add_special_char to add the 'blank' token at index 0.

Usage example

ctc_decoder = CTCLabelDecode(character_dict_path="alphabet.txt")
decoded_text = ctc_decoder(preds=model_outputs)
print(decoded_text)
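
A greedy CTC decode of a single sequence can be sketched as below, to show how the (sequence_length, num_classes) probabilities become text. The sketch assumes best-path decoding (argmax per time step, collapse repeats, drop blanks) and uses nested lists instead of NumPy; the function name `ctc_greedy_decode` is hypothetical.

```python
def ctc_greedy_decode(probs, characters, blank=0):
    """probs: one list of class probabilities per time step."""
    chars, confs = [], []
    prev = None
    for step in probs:
        idx = max(range(len(step)), key=step.__getitem__)  # argmax over classes
        # Keep a symbol only if it is not the blank and not a repeat of the
        # previous time step; a blank between two equal symbols keeps both.
        if idx != blank and idx != prev:
            chars.append(characters[idx])
            confs.append(step[idx])
        prev = idx
    mean_conf = sum(confs) / len(confs) if confs else 0.0
    return "".join(chars), mean_conf


characters = ["blank", "a", "l"]
probs = [
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.20, 0.70],  # l
    [0.20, 0.10, 0.70],  # l again, collapsed into the previous step
    [0.90, 0.05, 0.05],  # blank, separates the two l's
    [0.10, 0.10, 0.80],  # l
]
text, conf = ctc_greedy_decode(probs, characters)
print(text, round(conf, 4))
```

The blank at the fourth step is what allows the doubled letter in 'all' to survive the duplicate removal.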

Implementation Details and Algorithms

DBPostProcess uses OpenCV's contour finding and polygon approximation to extract text regions from binarized segmentation maps. It applies an "unclip" operation that expands polygons based on their geometry to better cover the text area, using the Pyclipper library for polygon offsetting.
The box_score_fast and box_score_slow methods calculate confidence scores inside detected boxes either by approximating with bounding rectangles or by precise polygon masks.
The CTCLabelDecode class implements decoding logic for CTC outputs, including removing duplicate predictions and blank tokens, which is critical for accurate text recognition.
The BaseRecLabelDecode supports custom character dictionaries and can handle languages requiring reversed output (e.g., Arabic) by grouping and reversing text segments.
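
The grouped reversal mentioned for right-to-left scripts can be sketched as follows. The character class that decides which characters stay together is an assumption made for this example, as is the function name; Greek letters stand in for right-to-left characters so the result is easy to read.

```python
import re

# Assumed set of characters that are kept together in reading order
# (Latin letters, digits, and a few symbols); everything else is
# treated as a single-character segment.
_KEEP_TOGETHER = re.compile(r"[a-zA-Z0-9 :*./%+-]")


def pred_reverse_sketch(pred):
    segments = []
    current = ""
    for ch in pred:
        if _KEEP_TOGETHER.search(ch):
            current += ch  # extend the current run, e.g. a number
        else:
            if current:
                segments.append(current)
                current = ""
            segments.append(ch)
    if current:
        segments.append(current)
    # Reverse the order of segments, not the characters inside each run.
    return "".join(reversed(segments))


print(pred_reverse_sketch("αβγ42"))  # 42γβα
```

Reversing segments rather than single characters keeps embedded numbers such as 42 readable instead of turning them into 24.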

Interaction with Other System Components

The module expects to receive raw model outputs (e.g., segmentation maps from DB models or logits from CTC recognition models) and transforms them into structured formats usable by downstream components like text layout analysis, text rendering, or final application logic layers.
build_post_process allows seamless integration with configuration-driven pipelines, enabling dynamic selection of post-processing strategies depending on the detection or recognition model used.
The module relies on external libraries: OpenCV for image processing, NumPy for numerical operations, and Shapely and Pyclipper for polygon geometry manipulations.

Visual Diagram: Flowchart of Main Functions in `DBPostProcess`

flowchart TD
    A[Input: Prediction Map] --> B{Binarize with thresh}
    B --> C[Find Contours]
    C --> D{Limit to max_candidates}
    D --> E[For each contour]
    E --> F[Approximate Polygon or MinAreaRect]
    F --> G["Calculate Box Score (fast or slow)"]
    G --> H{Score > box_thresh?}
    H -- Yes --> I[Unclip polygon to expand box]
    I --> J[Filter by min size]
    J --> K[Scale box to original image size]
    K --> L[Append box and score]
    H -- No --> M[Discard box]
    L --> N[Output: List of boxes and scores]

Summary

The postprocess.py file provides the post-processing utilities for an OCR pipeline: DBPostProcess turns detection maps into scored text boxes, and CTCLabelDecode turns recognition outputs into text strings with confidence scores. Both are configurable and can be selected through build_post_process, so the module can be reused with different DB-based detectors and CTC-based recognizers.
