recognizer.py


Overview

The recognizer.py file defines the Recognizer class, a core component for object detection and recognition in the InfiniFlow project. The class loads pre-trained models (likely OCR or document layout models) and runs them with ONNX Runtime. It also provides utilities for preprocessing input images, postprocessing model outputs, and sorting or filtering detected bounding boxes by spatial or confidence criteria.

The module processes images in batches to detect objects or text regions and returns structured bounding-box data with classification labels and confidence scores. It also includes several static helper functions for spatial analysis and cleanup of detection results.


Detailed Description

Imports and Dependencies


Recognizer Class

The Recognizer class encapsulates model loading, inference, and postprocessing logic for object recognition.

Initialization

def __init__(self, label_list, task_name, model_dir=None)
labels = ["text", "title", "figure", "table"]
recognizer = Recognizer(label_list=labels, task_name="layout_detection")

Static Sorting and Spatial Utility Methods

Several static methods sort detected boxes by spatial coordinates or other attributes, and compute the overlap between bounding boxes.

sort_Y_firstly(arr, threshold)

sort_X_firstly(arr, threshold)

sort_C_firstly(arr, thr=0)

sort_R_firstly(arr, thr=0)

overlapped_area(a, b, ratio=True)

layouts_cleanup(boxes, layouts, far=2, thr=0.7)
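The two ideas behind these helpers, thresholded spatial sorting and box overlap, can be illustrated with a minimal sketch. It is not taken from recognizer.py: the function names sort_y_first and overlap_ratio are hypothetical, and the box format (dicts with x0, x1, top, and bottom keys) is an assumption made for illustration.

```python
from functools import cmp_to_key


def sort_y_first(boxes, threshold):
    """Sort top-to-bottom; boxes whose tops differ by less than `threshold`
    are treated as one row and ordered left-to-right.

    Assumes each box is a dict with "x0" and "top" keys (hypothetical format).
    """
    def compare(a, b):
        diff = a["top"] - b["top"]
        if abs(diff) < threshold:
            # Same visual row: fall back to the horizontal position.
            diff = a["x0"] - b["x0"]
        return diff

    return sorted(boxes, key=cmp_to_key(compare))


def overlap_ratio(a, b):
    """Intersection area of a and b divided by the area of a (0.0 if disjoint)."""
    width = min(a["x1"], b["x1"]) - max(a["x0"], b["x0"])
    height = min(a["bottom"], b["bottom"]) - max(a["top"], b["top"])
    if width <= 0 or height <= 0:
        return 0.0
    area_a = (a["x1"] - a["x0"]) * (a["bottom"] - a["top"])
    return (width * height) / area_a if area_a > 0 else 0.0
```

A thresholded comparison like this is not a strict ordering, so for long chains of boxes with nearly equal tops the result can depend on the input order. That is usually acceptable for reading-order heuristics but worth keeping in mind when comparing outputs.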


Input Creation and Preprocessing

create_inputs(self, imgs, im_info)

preprocess(self, image_list)


Postprocessing

postprocess(self, boxes, inputs, thr)
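As a rough illustration of what a confidence-based postprocessing step can look like, the sketch below filters raw detections by a threshold and attaches human-readable labels. It is an assumption-laden example rather than the actual postprocess implementation: the function name filter_detections, the raw tuple layout (class_id, score, x0, top, x1, bottom), the output keys, and the choice to keep scores equal to the threshold are all hypothetical.

```python
def filter_detections(raw, label_list, thr):
    """Keep detections scoring at least `thr` and attach readable labels.

    `raw` is assumed to be an iterable of
    (class_id, score, x0, top, x1, bottom) tuples (hypothetical layout).
    """
    kept = []
    for class_id, score, x0, top, x1, bottom in raw:
        if score < thr:
            continue  # below the confidence threshold
        index = int(class_id)
        if not 0 <= index < len(label_list):
            continue  # skip class ids the label list does not cover
        kept.append({
            "type": label_list[index],
            "score": float(score),
            "bbox": [float(x0), float(top), float(x1), float(bottom)],
        })
    return kept
```

For example, with labels ["text", "title", "figure", "table"] and thr=0.7, a detection with class id 0 and score 0.9 is kept as a "text" region, while one with score 0.5 is dropped.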


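Overlap and Matching Utilities

find_overlapped(box, boxes_sorted_by_y, naive=False)

find_horizontally_tightest_fit(box, boxes)

find_overlapped_with_threshold(box, boxes, thr=0.3)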


Inference and Lifecycle Methods

__call__(self, image_list, thr=0.7, batch_size=16)

close(self)

__del__(self)
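The batch_size parameter of the inference call above suggests that the image list is processed in fixed-size chunks. A minimal sketch of that pattern follows; the helper names iter_batches and run_in_batches, and the infer callable they accept, are hypothetical and not part of recognizer.py.

```python
def iter_batches(items, batch_size=16):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_in_batches(items, infer, batch_size=16):
    """Apply `infer` (a hypothetical per-batch callable that returns one
    result per input item) to each batch and collect results in input order."""
    results = []
    for batch in iter_batches(items, batch_size):
        results.extend(infer(batch))
    return results
```

Chunking this way bounds the memory used per inference call while still returning one result per input image, in the original order.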


Important Implementation Details


Interaction with Other Parts of the System


Usage Example

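import cv2

from recognizer import Recognizer

# Labels for the region types the model detects
labels = ["text", "title", "table", "figure"]
recognizer = Recognizer(labels, task_name="layout_detection")

# cv2.imread returns None for unreadable paths, so validate inputs in real code
images = [cv2.imread("doc1.png"), cv2.imread("doc2.png")]
results = recognizer(images, thr=0.6, batch_size=2)

# One list of detections is returned per input image
for i, res in enumerate(results):
    print(f"Image {i} detections:")
    for det in res:
        print(det)

# Release model resources when finished
recognizer.close()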

Mermaid Class Diagram

classDiagram
    class Recognizer {
        -ort_sess
        -run_options
        -input_names
        -output_names
        -input_shape
        -label_list
        +__init__(label_list, task_name, model_dir=None)
        +__call__(image_list, thr=0.7, batch_size=16)
        +close()
        +__del__()
        +create_inputs(imgs, im_info)
        +preprocess(image_list)
        +postprocess(boxes, inputs, thr)
        +sort_Y_firstly(arr, threshold)
        +sort_X_firstly(arr, threshold)
        +sort_C_firstly(arr, thr=0)
        +sort_R_firstly(arr, thr=0)
        +overlapped_area(a, b, ratio=True)
        +layouts_cleanup(boxes, layouts, far=2, thr=0.7)
        +find_overlapped(box, boxes_sorted_by_y, naive=False)
        +find_horizontally_tightest_fit(box, boxes)
        +find_overlapped_with_threshold(box, boxes, thr=0.3)
    }

Summary

The recognizer.py file provides the Recognizer class, which loads and runs ONNX models for document layout or object detection tasks and handles preprocessing, batched inference, and postprocessing. It includes spatial sorting and filtering utilities to refine detection outputs and integrates with the InfiniFlow project's model and operator infrastructure, making it a foundational component for automated recognition workflows in the system.