t_recognizer.py


Overview

t_recognizer.py is a command-line tool for document image analysis. It performs two tasks:

  1. Layout Recognition: Identifying and classifying the regions of a document image, such as paragraphs, titles, figures, and tables.

  2. Table Structure Recognition (TSR): Detecting table components (columns, headers, rows, spanning cells) and reconstructing the table structure from them.
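To make the idea of reconstructing a table from detected components more concrete, the sketch below intersects row boxes with column boxes to obtain a grid of cell boxes. This is a generic technique shown for illustration only, not necessarily how deepdoc does it; the function name `cells_from_rows_and_columns` and the `(x0, y0, x1, y1)` box format are assumptions, and headers and spanning cells are ignored.

```python
def cells_from_rows_and_columns(rows, columns):
    """Intersect row and column boxes into a grid of cell boxes.

    Boxes are (x0, y0, x1, y1) tuples; this format is an assumption
    made for the sketch, not the deepdoc data model.
    """
    grid = []
    for r in sorted(rows, key=lambda b: b[1]):         # rows, top to bottom
        line = []
        for c in sorted(columns, key=lambda b: b[0]):  # columns, left to right
            # A cell is the rectangle where the row and the column overlap.
            line.append((max(r[0], c[0]), max(r[1], c[1]),
                         min(r[2], c[2]), min(r[3], c[3])))
        grid.append(line)
    return grid
```

With two rows and two columns the result is a 2x2 grid, each entry being the overlap of one row with one column.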

The script uses pre-trained deep learning models from the deepdoc library for both tasks and applies Optical Character Recognition (OCR) to extract cell text so that tables can be reconstructed as HTML.

It processes images or PDFs in batch, from either a directory or a single file, and saves visualizations with bounding boxes drawn on them. In TSR mode it also generates HTML files representing the detected table layouts.
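As a rough illustration of accepting either a single file or a directory, the sketch below gathers candidate input paths. The helper name `collect_inputs` and the extension list are assumptions made for this example; the script's own loader (`init_in_out` in the execution flow diagram) may behave differently.

```python
from pathlib import Path

# Hypothetical extension list; the real script may accept a different set.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}


def collect_inputs(path_str):
    """Return a sorted list of supported files for a file or directory path."""
    path = Path(path_str)
    if path.is_file():
        # A single file is accepted only if its extension is supported.
        return [path] if path.suffix.lower() in SUPPORTED_SUFFIXES else []
    if path.is_dir():
        # Non-recursive scan of the directory, sorted for stable output order.
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
        )
    raise FileNotFoundError(f"No such file or directory: {path_str}")
```

Sorting the result keeps output file names in a predictable order across runs.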


Detailed Description

Main Functionalities


Classes and Functions

1. main(args)

Purpose:
Entry point. Loads the input images, runs the selected recognition mode, then visualizes and saves the results.

Parameters:

Returns:

Functionality:

Usage Example:

python t_recognizer.py --inputs ./docs/sample.pdf --output_dir ./results --mode tsr --threshold 0.5

2. get_table_html(img, tb_cpns, ocr)

Purpose:
Generates an HTML representation of a table detected within an image using OCR text extraction and component layout analysis.

Parameters:

Returns:

Implementation Details:

Usage Context:
Called internally during TSR mode processing to produce user-readable table outputs.
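The last step of such a function, turning recognized cell text into markup, can be sketched as follows. This is a simplified stand-in, not the actual implementation: it assumes the cell text has already been arranged into rows, and the name `rows_to_html` and its `header_rows` parameter are hypothetical. Spanning cells are not handled.

```python
from html import escape


def rows_to_html(rows, header_rows=1):
    """Render a list of rows (each a list of cell strings) as an HTML table.

    The first `header_rows` rows are emitted with <th>, the rest with <td>.
    """
    lines = ["<table>"]
    for i, row in enumerate(rows):
        tag = "th" if i < header_rows else "td"
        # Escape OCR text so characters such as '&' or '<' stay valid HTML.
        cells = "".join(f"<{tag}>{escape(text)}</{tag}>" for text in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

Escaping matters here because OCR output is arbitrary text and may contain characters that would otherwise break the markup.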


Important Implementation Details


Interaction with Other System Components


Execution Flow Diagram

flowchart TD
    A[Start: Parse CLI Args] --> B[init_in_out: Load images and outputs]
    B --> C{Mode?}
    C -->|layout| D[Init LayoutRecognizer]
    C -->|tsr| E[Init TableStructureRecognizer & OCR]
    D --> F["Run layout.forward(images, threshold)"]
    E --> G["Run tsr(images, threshold)"]
    G --> H[get_table_html for each image]
    F --> I[Draw bounding boxes on images]
    H --> I
    I --> J[Save annotated images]
    H --> K["Save HTML files (TSR mode only)"]
    J --> L[Log output paths]
    K --> L
    L --> M[End]
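The branching in the flow above can be sketched in a few lines of Python. To keep the example runnable without deepdoc, the recognizers and the HTML builder are passed in as plain callables; `run_pipeline`, `layout_fn`, `tsr_fn`, and `html_fn` are hypothetical names introduced for this sketch, and drawing and saving are left out.

```python
def run_pipeline(images, mode, threshold, layout_fn, tsr_fn, html_fn):
    """Dispatch on mode and return (detections, html_docs).

    layout_fn, tsr_fn and html_fn are stand-ins for the layout recognizer,
    the table structure recognizer and get_table_html respectively.
    """
    if mode == "layout":
        # Layout mode yields detections only; no HTML is produced.
        return layout_fn(images, threshold), []
    if mode == "tsr":
        detections = tsr_fn(images, threshold)
        # One HTML document per image, built from that image's components.
        html_docs = [html_fn(img, comps) for img, comps in zip(images, detections)]
        return detections, html_docs
    raise ValueError(f"Unknown mode: {mode!r}")
```

Injecting the callables also makes the control flow easy to exercise in isolation, for example with small lambda functions in place of the models.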

Class Diagram of Key Imported Models (Conceptual)

classDiagram
    class LayoutRecognizer {
        +forward(images, thr)
        +sort_Y_firstly(layouts, fuzzy)
        +layouts_cleanup(boxes, layouts, margin, portion)
        +find_overlapped_with_threshold(box, layouts, thr)
        +find_horizontally_tightest_fit(box, layouts)
        +labels
    }

    class TableStructureRecognizer {
        +__call__(images, thr)
        +construct_table(boxes, html)
    }

    class OCR {
        +__call__(image_array)
    }

    t_recognizer --> LayoutRecognizer
    t_recognizer --> TableStructureRecognizer
    t_recognizer --> OCR

Summary

t_recognizer.py is a utility script in the InfiniFlow project for document image analysis, focusing on layout and table structure recognition. It provides an end-to-end pipeline covering input loading, detection, OCR text extraction, visualization, and output generation, including HTML tables. The script is designed to be extensible and relies heavily on the deepdoc library’s vision models and utilities.


Appendix: Command Line Arguments

Argument      Description                                                     Default            Required
--inputs      Input file or directory path containing images or PDFs          None               Yes
--output_dir  Directory to save output images and HTML                        ./layouts_outputs  No
--threshold   Confidence threshold for filtering detections                   0.5                No
--mode        Task mode: "layout" for layout recognition or "tsr" for tables  "layout"           No
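The arguments listed above map naturally onto Python's `argparse` module. The sketch below is one way such a parser could be declared. The argument names, defaults, and required flag come from the table; the function name `build_parser`, the help strings, and parsing `--threshold` as a float are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command-line arguments."""
    parser = argparse.ArgumentParser(
        description="Layout and table structure recognition")
    parser.add_argument("--inputs", required=True,
                        help="Input file or directory containing images or PDFs")
    parser.add_argument("--output_dir", default="./layouts_outputs",
                        help="Directory to save output images and HTML")
    # type=float is an assumption; the script may convert the value later.
    parser.add_argument("--threshold", type=float, default=0.5,
                        help="Confidence threshold for filtering detections")
    parser.add_argument("--mode", default="layout",
                        help='Task mode: "layout" or "tsr"')
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```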


Example Usage

python t_recognizer.py --inputs ./sample_docs --output_dir ./output --mode tsr --threshold 0.6

This command processes all images/PDFs in ./sample_docs with table structure recognition at a 0.6 confidence threshold, saving annotated images and HTML files to ./output.
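To show what a confidence threshold does in practice, the sketch below keeps only detections that score at or above it. The dictionary layout with a `score` key, the name `filter_by_score`, and the inclusive comparison are assumptions for illustration; the recognizers may apply the threshold differently.

```python
def filter_by_score(detections, threshold):
    """Keep detections whose 'score' is at or above the threshold."""
    return [d for d in detections if d["score"] >= threshold]


# Hypothetical detections, made up for this example.
detections = [
    {"label": "table", "score": 0.90},
    {"label": "figure", "score": 0.55},
    {"label": "title", "score": 0.60},
]
kept = filter_by_score(detections, 0.6)
```

Raising the threshold trades recall for precision: fewer boxes are drawn, but those that remain are the ones the model is more confident about.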

