postprocess.py


Overview

postprocess.py is a utility module providing post-processing tools for text detection and recognition tasks, primarily targeting OCR systems. It implements algorithms to convert raw model outputs (such as segmentation maps or predicted character probabilities) into structured, human-readable results like bounding boxes around detected text regions and decoded text strings.

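This file contains two main post-processing components: DBPostProcess, which handles text detection output, and the label decoders BaseRecLabelDecode and CTCLabelDecode, which handle text recognition output.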

Additionally, a factory function build_post_process dynamically instantiates these post-processing classes based on configuration.


Classes and Functions

build_post_process(config, global_config=None)

Factory function to create post-processing instances based on configuration.

config = {"name": "DBPostProcess", "thresh": 0.3}
postprocess = build_post_process(config)
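How such a factory might work can be sketched as follows. This is a minimal illustration, not the module's actual code: the registry dictionary, the DummyPostProcess class, and the way global_config is merged into config are all assumptions made for the example.

```python
import copy


class DummyPostProcess:
    """Hypothetical stand-in for a real post-processing class."""

    def __init__(self, thresh=0.3, **kwargs):
        self.thresh = thresh
        self.extra = kwargs  # any options the class does not name explicitly


# Hypothetical registry mapping the "name" field to a class.
REGISTRY = {"DummyPostProcess": DummyPostProcess}


def build_post_process_sketch(config, global_config=None):
    """Instantiate the class named in config["name"] with the remaining keys."""
    config = copy.deepcopy(config)  # do not mutate the caller's dict
    name = config.pop("name")
    if global_config is not None:
        # Assumption: global options are passed on as extra keyword arguments.
        config.update(global_config)
    if name not in REGISTRY:
        raise ValueError(f"post process {name!r} is not supported")
    return REGISTRY[name](**config)


post = build_post_process_sketch({"name": "DummyPostProcess", "thresh": 0.5})
print(type(post).__name__, post.thresh)
```

Popping "name" before instantiation keeps the constructor signatures free of factory-specific keys, so each class only sees its own parameters.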

class DBPostProcess

Post-processing for Differentiable Binarization (DB) text detection models. Binarizes the predicted segmentation (probability) maps and converts the resulting regions into text bounding boxes with confidence scores.

Constructor Parameters

Important properties

Methods

Usage example

# pred_maps is assumed to hold the detection model's predicted maps.
db_postprocess = DBPostProcess(thresh=0.3, box_thresh=0.7)
results = db_postprocess(outs_dict={'maps': pred_maps}, shape_list=[(720, 1280, 1.0, 1.0)])
for res in results:
    print(res['points'])  # detected boxes for one image

class BaseRecLabelDecode

Base class for decoding text recognition outputs, converting between label indices and text strings.

Constructor Parameters

Important attributes

Methods

Usage example

decoder = BaseRecLabelDecode(character_dict_path="characters.txt")
text_results = decoder.decode([[1, 2, 3]], text_prob=[[0.9, 0.8, 0.95]])
print(text_results)  # e.g. [('abc', 0.8833...)] if indices 1-3 map to 'a', 'b', 'c'
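The index-to-text conversion in the example above can be sketched in a few lines. This is an illustrative sketch only: the function name decode_sketch, the in-memory character list, and the use of the mean per-character probability as the confidence are assumptions chosen to match the usage example, not the class's confirmed behavior.

```python
def decode_sketch(characters, text_index, text_prob=None):
    """Map each sequence of label indices to (text, mean confidence)."""
    results = []
    for i, indices in enumerate(text_index):
        chars = [characters[idx] for idx in indices]
        if text_prob is not None:
            confs = list(text_prob[i])
        else:
            confs = [1.0] * len(indices)  # no probabilities given: assume 1.0
        mean_conf = sum(confs) / len(confs) if confs else 0.0
        results.append(("".join(chars), mean_conf))
    return results


# Assumed dictionary: index 0 is reserved, indices 1..3 map to a, b, c.
characters = ["<reserved>", "a", "b", "c"]
decoded = decode_sketch(characters, [[1, 2, 3]], text_prob=[[0.9, 0.8, 0.95]])
text, conf = decoded[0]
print(text, round(conf, 4))  # abc 0.8833
```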

class CTCLabelDecode(BaseRecLabelDecode)

Derived class specialized for CTC-based recognition output decoding.

Constructor Parameters

Methods

Usage example

# model_outputs is assumed to hold the recognition model's raw predictions.
ctc_decoder = CTCLabelDecode(character_dict_path="alphabet.txt")
decoded_text = ctc_decoder(preds=model_outputs)
print(decoded_text)
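For readers unfamiliar with CTC decoding, the standard greedy scheme is sketched below: take the most likely class at each timestep, collapse consecutive repeats, and drop the blank symbol. The function name, the blank index of 0, and the list-based input format are assumptions for illustration; the class itself may differ in these details.

```python
def ctc_greedy_decode(probs, characters, blank=0):
    """Greedy CTC decoding.

    probs: one list of class probabilities per timestep.
    characters: list mapping class index to character.
    blank: index of the CTC blank symbol (assumed to be 0 here).
    """
    chars, confs = [], []
    prev = None
    for step in probs:
        idx = max(range(len(step)), key=step.__getitem__)  # argmax
        # Keep a symbol only if it is not blank and not a repeat of the
        # previous timestep's prediction.
        if idx != blank and idx != prev:
            chars.append(characters[idx])
            confs.append(step[idx])
        prev = idx
    score = sum(confs) / len(confs) if confs else 0.0
    return "".join(chars), score


characters = ["<blank>", "a", "b"]
probs = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a again: collapsed into the previous a
    [0.90, 0.05, 0.05],  # blank: separates the two a's
    [0.10, 0.60, 0.30],  # a
    [0.10, 0.20, 0.70],  # b
    [0.20, 0.10, 0.70],  # b again: collapsed
]
text, score = ctc_greedy_decode(probs, characters)
print(text)  # aab
```

The blank between the two runs of "a" is what allows a genuine double letter to survive the repeat-collapsing step.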

Implementation Details and Algorithms


Interaction with Other System Components


Visual Diagram: Flowchart of Main Functions in DBPostProcess

flowchart TD
    A[Input: Prediction Map] --> B{Binarize with thresh}
    B --> C[Find Contours]
    C --> D{Limit to max_candidates}
    D --> E[For each contour]
    E --> F[Approximate Polygon or MinAreaRect]
    F --> G["Calculate Box Score (fast or slow)"]
    G --> H{Score > box_thresh?}
    H -- Yes --> I[Unclip polygon to expand box]
    I --> J[Filter by min size]
    J --> K[Scale box to original image size]
    K --> L[Append box and score]
    H -- No --> M[Discard box]
    L --> N[Output: List of boxes and scores]
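Several steps of the flowchart can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the class's implementation: contour finding and polygon offsetting are omitted because they normally rely on external libraries, the box score is taken over the box's axis-aligned bounds, the unclip offset uses the area × ratio / perimeter rule from the DB paper, and all function names and the unclip_ratio parameter are hypothetical.

```python
import math


def binarize(prob_map, thresh):
    """Turn a probability map into a 0/1 mask."""
    return [[1 if p > thresh else 0 for p in row] for row in prob_map]


def box_score(prob_map, box):
    """Mean probability inside the axis-aligned bounds of box, a list of (x, y)."""
    h, w = len(prob_map), len(prob_map[0])
    xs = [x for x, _ in box]
    ys = [y for _, y in box]
    x0, x1 = max(0, int(min(xs))), min(w - 1, int(max(xs)))
    y0, y1 = max(0, int(min(ys))), min(h - 1, int(max(ys)))
    values = [prob_map[y][x] for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]
    return sum(values) / len(values)


def unclip_distance(box, unclip_ratio):
    """Offset distance for expanding a polygon: area * ratio / perimeter."""
    n = len(box)
    # Shoelace formula for the polygon area.
    area = abs(sum(box[i][0] * box[(i + 1) % n][1] - box[(i + 1) % n][0] * box[i][1]
                   for i in range(n))) / 2
    perimeter = sum(math.dist(box[i], box[(i + 1) % n]) for i in range(n))
    return area * unclip_ratio / perimeter


def scale_box(box, map_size, image_size):
    """Rescale box coordinates from map size (h, w) to image size (h, w)."""
    map_h, map_w = map_size
    img_h, img_w = image_size
    return [(round(x / map_w * img_w), round(y / map_h * img_h)) for x, y in box]


prob_map = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.8, 0.1],
    [0.1, 0.9, 0.8, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
mask = binarize(prob_map, thresh=0.3)
box = [(1, 1), (2, 1), (2, 2), (1, 2)]  # stand-in for a detected contour's box
score = box_score(prob_map, box)
if score > 0.7:  # box_thresh
    distance = unclip_distance(box, unclip_ratio=2.0)
    scaled = scale_box(box, map_size=(4, 4), image_size=(8, 16))
    print(distance, scaled)
```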

Summary

The postprocess.py file provides post-processing utilities for OCR pipelines, converting model outputs into text bounding boxes and decoded text strings. Its configurable parameters (such as thresh and box_thresh) and the build_post_process factory allow the same code to serve different detection and recognition models.