postprocess.py
Overview
postprocess.py is a utility module providing post-processing tools for text detection and recognition tasks, primarily targeting OCR systems. It implements algorithms to convert raw model outputs (such as segmentation maps or predicted character probabilities) into structured, human-readable results like bounding boxes around detected text regions and decoded text strings.
This file contains two main post-processing components:
DBPostProcess: Processes text detection output from Differentiable Binarization (DB) models, extracting polygon or quadrilateral bounding boxes from binarized segmentation maps.
CTCLabelDecode: Decodes character probabilities from CTC-based recognition models into readable text strings.
Additionally, a factory function build_post_process dynamically instantiates these post-processing classes based on configuration.
Classes and Functions
build_post_process(config, global_config=None)
Factory function to create post-processing instances based on configuration.
Parameters:
config (dict): Configuration dictionary containing at least the key "name", which specifies the post-processing class to instantiate. Additional keys are passed as parameters to the class constructor.
global_config (dict, optional): Global configuration parameters merged into config.
Returns:
Instance of the specified post-processing class, or None if "name" is "None".
Raises:
ValueError if the specified post-processing name is unsupported.
Usage example:
from postprocess import build_post_process  # adjust the import path to your project layout
config = {"name": "DBPostProcess", "thresh": 0.3}
postprocess = build_post_process(config)
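The factory behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the stand-in classes, the `_SUPPORTED` registry, the name `build_post_process_sketch`, and the choice to let `global_config` override keys in `config` are all assumptions made for the example.

```python
class DBPostProcess:
    """Stand-in for the real class; it only stores two settings."""

    def __init__(self, thresh=0.3, box_thresh=0.7, **kwargs):
        self.thresh = thresh
        self.box_thresh = box_thresh


class CTCLabelDecode:
    """Stand-in for the real class; it only stores its settings."""

    def __init__(self, character_dict_path=None, use_space_char=False, **kwargs):
        self.character_dict_path = character_dict_path
        self.use_space_char = use_space_char


# Hypothetical registry; the real module may resolve names differently.
_SUPPORTED = {"DBPostProcess": DBPostProcess, "CTCLabelDecode": CTCLabelDecode}


def build_post_process_sketch(config, global_config=None):
    params = dict(config)  # copy so the caller's dict is not mutated
    name = params.pop("name")
    if name == "None":
        return None
    if global_config is not None:
        # Assumption: global settings are merged on top of the local ones.
        params.update(global_config)
    if name not in _SUPPORTED:
        raise ValueError(f"unsupported post-process: {name}")
    # Remaining keys are forwarded to the class constructor.
    return _SUPPORTED[name](**params)


post = build_post_process_sketch({"name": "DBPostProcess", "thresh": 0.5})
print(type(post).__name__, post.thresh)
```

A registry keyed by class name keeps the lookup explicit and makes the unsupported-name error easy to raise.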
class DBPostProcess
Post-processing for Differentiable Binarization (DB) text detection models. Converts binarized segmentation maps into text bounding boxes with confidence scores.
Constructor Parameters
thresh (float, default=0.3): Threshold to binarize the prediction maps.
box_thresh (float, default=0.7): Minimum confidence score to keep a box.
max_candidates (int, default=1000): Maximum number of contours to consider.
unclip_ratio (float, default=2.0): Ratio used to expand detected polygons.
use_dilation (bool, default=False): Whether to apply dilation to the binary mask before contour extraction.
score_mode (str, default="fast"): Scoring method for boxes, either "fast" or "slow".
box_type (str, default="quad"): Type of box output, either "quad" for quadrilateral or "poly" for polygon.
Important properties
min_size (int): Minimum size of boxes to keep (3 by default).
dilation_kernel: Kernel used for dilation if use_dilation is True.
Methods
__call__(outs_dict, shape_list) -> list[dict]
Processes a batch of model output maps and returns detected boxes per image.
Parameters:
outs_dict (dict): Contains model outputs; expects a 'maps' key with prediction maps.
shape_list (list of tuples): One tuple (src_h, src_w, ratio_h, ratio_w) per image, giving the original dimensions and scaling ratios.
Returns:
List of dictionaries, each with a 'points' key containing detected boxes (list of points).
polygons_from_bitmap(pred, bitmap, dest_width, dest_height) -> (list, list)
Extracts polygon boxes from a binarized bitmap using contours and polygon approximation.
boxes_from_bitmap(pred, bitmap, dest_width, dest_height) -> (np.ndarray, list)
Extracts quadrilateral boxes from a binarized bitmap using contours and minimum area rectangles.
unclip(box, unclip_ratio) -> np.ndarray
Expands a polygon by a distance based on its area and perimeter to separate close text regions.
get_mini_boxes(contour) -> (list, float)
Returns the four points of the minimum bounding box around a contour and the shortest side length.
box_score_fast(bitmap, box) -> float
Calculates the mean score inside a polygon using a fast bounding box-based mask.
box_score_slow(bitmap, contour) -> float
Calculates the mean score inside a polygon using an exact polygon mask (slower).
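To make the unclip description concrete, the sketch below computes an offset distance as area × unclip_ratio / perimeter, the formula commonly used in DB implementations. That this module uses exactly this formula is an assumption, and the helper names are hypothetical. The actual expansion of the polygon (done with Pyclipper in this module) is not shown.

```python
import math


def polygon_area(points):
    """Absolute area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths, closing the polygon back to its first point."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def unclip_distance(points, unclip_ratio=2.0):
    # Assumed formula: offset distance = area * ratio / perimeter.
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


# A 100 x 20 text box: area 2000, perimeter 240.
box = [(0, 0), (100, 0), (100, 20), (0, 20)]
distance = unclip_distance(box, unclip_ratio=2.0)
print(round(distance, 2))
```

Because the distance scales with area divided by perimeter, long thin regions are expanded less than compact ones of the same area.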
Usage example
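from postprocess import DBPostProcess  # adjust the import path to your project layout
db_postprocess = DBPostProcess(thresh=0.3, box_thresh=0.7)
# pred_maps: prediction maps produced by the DB detection model for the batch
results = db_postprocess(outs_dict={'maps': pred_maps}, shape_list=[(720, 1280, 1.0, 1.0)])
for res in results:
    print(res['points'])  # list of detected boxes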
class BaseRecLabelDecode
Base class for decoding text recognition outputs, converting between label indices and text strings.
Constructor Parameters
character_dict_path (str, optional): Path to a file containing the character dictionary, one character per line.
use_space_char (bool, default=False): Whether to include the space character in the dictionary.
Important attributes
beg_str and end_str: Special tokens marking the start and end of sequences (not heavily used here).
reverse (bool): If True, reverses the decoded text (used for languages like Arabic).
character (list): List of characters in the dictionary.
dict (dict): Mapping from character to index.
Methods
decode(text_index, text_prob=None, is_remove_duplicate=False) -> list[tuple]
Converts batches of indices to text strings and average confidence scores.
pred_reverse(pred) -> str
Reverses the predicted text string, grouping alphanumeric sequences.
add_special_char(dict_character) -> list
Hook to add special characters to the dictionary; default is no change.
get_ignored_tokens() -> list
Returns tokens to ignore during decoding (default is [0] for the CTC blank token).
Usage example
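from postprocess import BaseRecLabelDecode  # adjust the import path to your project layout
decoder = BaseRecLabelDecode(character_dict_path="characters.txt")
text_results = decoder.decode([[1, 2, 3]], text_prob=[[0.9, 0.8, 0.95]])
print(text_results)  # e.g. [('abc', 0.8833)] when indices 1-3 map to 'a', 'b', 'c'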
class CTCLabelDecode(BaseRecLabelDecode)
Derived class specialized for CTC-based recognition output decoding.
Constructor Parameters
Inherits the constructor parameters of BaseRecLabelDecode.
Methods
__call__(preds, label=None, *args, **kwargs) -> list or tuple
Decodes predicted probabilities into text strings, optionally decoding ground truth labels.
Parameters:
preds (np.ndarray or similar): Model prediction tensor of shape (batch, sequence_length, num_classes).
label (np.ndarray, optional): Ground truth labels to decode.
Returns:
Decoded text strings, or a tuple (decoded_preds, decoded_labels) if label is provided.
Overrides add_special_char to add a 'blank' token at index 0.
Usage example
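from postprocess import CTCLabelDecode  # adjust the import path to your project layout
ctc_decoder = CTCLabelDecode(character_dict_path="alphabet.txt")
# model_outputs: recognition model output of shape (batch, sequence_length, num_classes)
decoded_text = ctc_decoder(preds=model_outputs)
print(decoded_text)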
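For readers unfamiliar with CTC decoding, here is a minimal greedy-decoding sketch for a single sequence in plain Python. It assumes the 'blank' token sits at index 0, as described above. The function name `ctc_greedy_decode`, the toy character list, and averaging the confidences of the kept characters are illustrative assumptions rather than the module's actual implementation.

```python
def ctc_greedy_decode(probs, characters, blank_index=0):
    """Decode one sequence; probs holds one probability list per timestep."""
    chars, confs = [], []
    previous = None
    for step in probs:
        # Pick the most likely class at this timestep.
        index = max(range(len(step)), key=step.__getitem__)
        # Keep it only if it is not blank and not a repeat of the previous step.
        if index != blank_index and index != previous:
            chars.append(characters[index])
            confs.append(step[index])
        previous = index
    score = sum(confs) / len(confs) if confs else 0.0
    return "".join(chars), score


characters = ["blank", "a", "b"]  # 'blank' at index 0
probs = [
    [0.10, 0.80, 0.10],  # 'a'
    [0.10, 0.70, 0.20],  # 'a' repeated, collapsed
    [0.90, 0.05, 0.05],  # blank, dropped
    [0.20, 0.60, 0.20],  # 'a' again after a blank, kept
    [0.10, 0.20, 0.70],  # 'b'
]
text, score = ctc_greedy_decode(probs, characters)
print(text, round(score, 2))  # aab 0.7
```

The blank between the second and third 'a' is what lets CTC output genuinely doubled letters while still collapsing repeated frames of the same character.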
Implementation Details and Algorithms
DBPostProcess uses OpenCV's contour finding and polygon approximation to extract text regions from binarized segmentation maps. It applies an "unclip" operation that expands polygons based on their geometry to better cover the text area, using the Pyclipper library for polygon offsetting.
The box_score_fast and box_score_slow methods calculate confidence scores inside detected boxes either by approximating with bounding rectangles or by precise polygon masks.
The CTCLabelDecode class implements decoding logic for CTC outputs, including removing duplicate predictions and blank tokens, which is critical for accurate text recognition.
The BaseRecLabelDecode class supports custom character dictionaries and can handle languages requiring reversed output (e.g., Arabic) by grouping and reversing text segments.
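The grouping-and-reversing idea can be illustrated with a small sketch: runs of ASCII letters and digits are kept intact as one segment, every other character is its own segment, and the segment order is then reversed. The function name `reverse_keep_runs` and the exact set of characters treated as a run are assumptions; the module's pred_reverse may group a different character set.

```python
import re


def reverse_keep_runs(text):
    """Reverse segment order while keeping alphanumeric runs readable."""
    segments = []
    run = ""
    for ch in text:
        if re.fullmatch(r"[A-Za-z0-9]", ch):
            run += ch  # extend the current alphanumeric run
        else:
            if run:
                segments.append(run)
                run = ""
            segments.append(ch)  # any other character is its own segment
    if run:
        segments.append(run)
    return "".join(reversed(segments))


print(reverse_keep_runs("ab-12"))  # 12-ab
```

Keeping runs intact means an embedded number such as "12" is not turned into "21" when the surrounding right-to-left text is reversed.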
Interaction with Other System Components
The module expects to receive raw model outputs (e.g., segmentation maps from DB models or logits from CTC recognition models) and transforms them into structured formats usable by downstream components like text layout analysis, text rendering, or final application logic layers.
build_post_process allows integration with configuration-driven pipelines, enabling dynamic selection of post-processing strategies depending on the detection or recognition model used.
The module relies on external libraries: OpenCV for image processing, NumPy for numerical operations, and Shapely and Pyclipper for polygon geometry manipulations.
Visual Diagram: Flowchart of Main Functions in DBPostProcess
flowchart TD
A[Input: Prediction Map] --> B{Binarize with thresh}
B --> C[Find Contours]
C --> D{Limit to max_candidates}
D --> E[For each contour]
E --> F[Approximate Polygon or MinAreaRect]
F --> G[Calculate Box Score (fast or slow)]
G --> H{Score > box_thresh?}
H -- Yes --> I[Unclip polygon to expand box]
I --> J[Filter by min size]
J --> K[Scale box to original image size]
K --> L[Append box and score]
H -- No --> M[Discard box]
L --> N[Output: List of boxes and scores]
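The "Scale box to original image size" step in the flowchart can be sketched as a proportional rescale from prediction-map coordinates to the destination size, followed by clipping. The helper name `scale_box` and the rounding and clipping details are assumptions made for illustration; the module may handle these differently.

```python
def scale_box(box, map_width, map_height, dest_width, dest_height):
    """Map (x, y) points from prediction-map space to destination space."""
    scaled = []
    for x, y in box:
        sx = round(x / map_width * dest_width)
        sy = round(y / map_height * dest_height)
        # Clip so that points never fall outside the destination image.
        sx = min(max(sx, 0), dest_width)
        sy = min(max(sy, 0), dest_height)
        scaled.append((sx, sy))
    return scaled


# A box found on a 320 x 240 map, mapped back to a 1280 x 720 image.
box_on_map = [(10, 5), (320, 5), (320, 240), (10, 240)]
print(scale_box(box_on_map, 320, 240, 1280, 720))
```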
Summary
The postprocess.py file provides essential post-processing utilities for OCR pipelines, converting deep learning model outputs into actionable text detection and recognition results. Its modular design and configurable parameters make it suitable for various OCR architectures, enabling precise text localization and decoding.