recognizer.py

Overview

The recognizer.py file defines the Recognizer class, a core component designed for object detection and recognition tasks within the InfiniFlow project. This class loads and runs pre-trained models (likely OCR or document layout models) using ONNX Runtime and provides utilities for preprocessing input images, postprocessing model outputs, and sorting or filtering detected bounding boxes based on spatial or confidence criteria.

The module facilitates batch processing of images to detect objects or text regions, returns structured bounding box data with classification labels and confidence scores, and includes several static helper functions for spatial analysis and cleanup of detection results.

Detailed Description

Imports and Dependencies

Standard-library and third-party modules: gc, logging, os, math, numpy, cv2, and functools.cmp_to_key
Project-specific utilities for file paths and model loading (get_project_base_directory, load_model)
Operators and preprocessing functions from the local package (operators, preprocess)

Recognizer Class

The Recognizer class encapsulates model loading, inference, and postprocessing logic for object recognition.

Initialization

def __init__(self, label_list, task_name, model_dir=None)

Purpose: Initializes the recognizer by loading the ONNX model session for a specific task.
Parameters:
- label_list (list[str]): List of class labels that the model predicts.
- task_name (str): Task identifier string used to load the correct model.
- model_dir (str, optional): Directory path where the model files are stored. Defaults to a project-relative path if not provided.
Behavior:
Loads the inference session with load_model, stores the model's input and output node names, and records the input image shape the model expects.
Usage example:

labels = ["text", "title", "figure", "table"]
recognizer = Recognizer(label_list=labels, task_name="layout_detection")

Static Sorting and Spatial Utility Methods

Several static methods provide sorting of detected boxes by spatial coordinates or attributes, and compute overlap between bounding boxes.

sort_Y_firstly(arr, threshold)

Sorts a list of dicts by their "top" coordinate primarily, then by "x0" if vertical difference is less than threshold.
Used to order boxes top-to-bottom, then left-to-right within similar vertical bands.
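
The banded ordering described above can be sketched with functools.cmp_to_key, which the module imports. This is a minimal illustration under stated assumptions, not the project's code: the function name sort_y_first and the sample boxes are hypothetical.

```python
from functools import cmp_to_key


def sort_y_first(boxes, threshold):
    """Order boxes top-to-bottom; boxes whose "top" values differ by less
    than `threshold` count as one row and are ordered by "x0" instead."""

    def compare(a, b):
        diff = a["top"] - b["top"]
        if abs(diff) < threshold:
            # Same vertical band: fall back to left-to-right order.
            diff = a["x0"] - b["x0"]
        return diff

    return sorted(boxes, key=cmp_to_key(compare))


# Hypothetical sample data: A and B share a row, C sits far below.
boxes = [
    {"name": "A", "top": 10, "x0": 50},
    {"name": "B", "top": 12, "x0": 5},
    {"name": "C", "top": 100, "x0": 0},
]
ordered = [b["name"] for b in sort_y_first(boxes, threshold=5)]
print(ordered)
```

A threshold-based comparator is not strictly transitive in general, so the result for borderline inputs can depend on the input order.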

sort_X_firstly(arr, threshold)

Sorts a list of dicts by "x0" (horizontal, left-to-right) primarily, then "top" if horizontal difference is less than threshold.
Used to order boxes left-to-right, then top-to-bottom within similar horizontal bands.

sort_C_firstly(arr, thr=0)

Sorts boxes primarily by attribute "C" (if present), then "top".
Applies sort_X_firstly as an initial pass, then reorders the result by "C".

sort_R_firstly(arr, thr=0)

Sorts boxes primarily by attribute "R" (if present), then "x0".
Applies sort_Y_firstly as an initial pass, then reorders the result by "R".

overlapped_area(a, b, ratio=True)

Computes the overlapping area between two bounding boxes a and b represented as dicts with keys "top", "bottom", "x0", "x1".
If ratio=True, returns the overlap ratio relative to box a's area; else returns absolute overlap area.
Returns 0 if no overlap.
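
As a rough sketch of the computation described above, the intersection of two axis-aligned boxes can be derived from the four keys. The function name overlap is hypothetical, and details such as assertions or edge-case handling in the real method may differ.

```python
def overlap(a, b, ratio=True):
    """Intersection area of boxes a and b.

    With ratio=True the result is expressed as a fraction of a's area.
    """
    # Width and height of the intersection rectangle (non-positive if disjoint).
    width = min(a["x1"], b["x1"]) - max(a["x0"], b["x0"])
    height = min(a["bottom"], b["bottom"]) - max(a["top"], b["top"])
    if width <= 0 or height <= 0:
        return 0
    area = width * height
    if ratio:
        area_a = (a["x1"] - a["x0"]) * (a["bottom"] - a["top"])
        return area / area_a
    return area


a = {"x0": 0, "x1": 10, "top": 0, "bottom": 10}
b = {"x0": 5, "x1": 15, "top": 5, "bottom": 15}
print(overlap(a, b))               # fraction of a covered by b
print(overlap(a, b, ratio=False))  # absolute intersection area
```

Because the ratio is taken relative to the first argument, overlap(a, b) and overlap(b, a) differ whenever the two boxes have different areas.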

layouts_cleanup(boxes, layouts, far=2, thr=0.7)

Cleans up layout detections by removing overlapping or redundant layout boxes.
For each layout box, examines the neighbouring layout boxes within a window of far positions and removes one box of any pair whose overlap exceeds thr.
Decides which box to keep based on confidence score or total overlap with detected boxes.
Returns filtered list of layout boxes.

Input Creation and Preprocessing

create_inputs(self, imgs, im_info)

Converts a list of preprocessed images and their metadata into a dict of numpy arrays suitable as model inputs.
Handles batch input padding for variable image sizes by zero-padding smaller images.
Returns a dict with keys: 'image', 'im_shape', 'scale_factor'.
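
The zero-padding step can be illustrated as follows. This sketch assumes channel-first (C, H, W) float images and pads at the bottom and right; the function name pad_batch is hypothetical, and the real method also assembles 'im_shape' and 'scale_factor', which are omitted here.

```python
import numpy as np


def pad_batch(images):
    """Stack (C, H, W) images into one (N, C, H, W) array, zero-padding
    smaller images at the bottom and right to the largest height and width."""
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    channels = images[0].shape[0]
    batch = np.zeros((len(images), channels, max_h, max_w), dtype=np.float32)
    for i, img in enumerate(images):
        _, h, w = img.shape
        # Copy the image into the top-left corner; the rest stays zero.
        batch[i, :, :h, :w] = img
    return batch


imgs = [
    np.ones((3, 4, 6), dtype=np.float32),
    np.ones((3, 5, 2), dtype=np.float32),
]
batch = pad_batch(imgs)
print(batch.shape)
```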

preprocess(self, image_list)

Performs preprocessing pipeline on raw images or image paths.
If model expects "scale_factor" input, applies a series of resizing, normalization, permuting, and padding operations using operators.
Otherwise, resizes images to model input shape, scales pixel values to [0, 1], and formats input tensor.
Returns a list of dicts, each dict containing inputs for one image.

Postprocessing

postprocess(self, boxes, inputs, thr)

Processes raw model outputs (bounding boxes and class scores) into structured detection results.
Filters out detections below confidence threshold thr.
Converts box format from [x, y, w, h] to [x1, y1, x2, y2] if needed.
Applies Non-Maximum Suppression (NMS) with IoU threshold 0.2 to reduce overlapping detections.
Returns a list of dicts with keys "type" (label), "bbox" (coordinates), and "score" (confidence).
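
The NMS step can be sketched as a greedy loop over score-sorted boxes. This is a generic illustration, not the project's implementation: the names iou and nms are hypothetical, boxes are assumed to be [x1, y1, x2, y2] with positive area, and class labels are ignored.

```python
import numpy as np


def iou(box, others):
    """IoU between one [x1, y1, x2, y2] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_thr=0.2):
    """Greedy NMS: keep the best-scoring box, drop remaining boxes whose
    IoU with it reaches iou_thr, and repeat on what is left."""
    order = np.argsort(scores)[::-1]  # indices, highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        order = rest[iou(boxes[best], boxes[rest]) < iou_thr]
    return keep


boxes = np.array(
    [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float
)
scores = np.array([0.9, 0.8, 0.95])
kept = nms(boxes, scores)
print(kept)
```

With a threshold as low as 0.2, even moderately overlapping boxes are suppressed, which suits layouts where regions are not expected to overlap much.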

Overlap and Matching Utilities

find_overlapped(box, boxes_sorted_by_y, naive=False)

Finds the index of the box in boxes_sorted_by_y that overlaps the most with box. Uses binary search for efficiency.

find_horizontally_tightest_fit(box, boxes)

Finds the index of the box horizontally closest to box within the same layout group.

find_overlapped_with_threshold(box, boxes, thr=0.3)

Finds the box with overlap above a threshold.

Inference and Lifecycle Methods

__call__(self, image_list, thr=0.7, batch_size=16)

Makes the Recognizer instance callable. Runs inference on a list of images.
Splits images into batches of size batch_size.
Preprocesses images, runs the ONNX model, and postprocesses outputs.
Returns a list of detection results, one entry per input image.
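
The batching step can be sketched independently of the model. In this illustration, iter_batches and run_in_batches are hypothetical helpers, and process_batch stands in for the preprocess, inference, and postprocess chain.

```python
def iter_batches(items, batch_size=16):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_in_batches(images, process_batch, batch_size=16):
    """Apply process_batch to each batch and flatten the per-image results."""
    results = []
    for batch in iter_batches(images, batch_size):
        results.extend(process_batch(batch))
    return results


# Five dummy "images" in batches of two: the last batch is smaller.
sizes = [len(b) for b in iter_batches(list(range(5)), batch_size=2)]
print(sizes)
```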

close(self)

Cleans up the ONNX session and triggers garbage collection.
Should be called explicitly or via destructor to free resources.

__del__(self)

Destructor calls close() to ensure clean resource deallocation.
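
The close/destructor pairing can be sketched with a stand-in class. SessionHolder and its session attribute are hypothetical and only illustrate the pattern; the real class holds an ONNX Runtime session.

```python
import gc


class SessionHolder:
    """Hypothetical stand-in for a class that owns an inference session."""

    def __init__(self, session):
        self.session = session

    def close(self):
        # Safe to call repeatedly: the attribute is dropped only if present.
        if hasattr(self, "session"):
            del self.session
        gc.collect()

    def __del__(self):
        # Fallback cleanup when the object is garbage-collected.
        self.close()


holder = SessionHolder(session=object())
holder.close()
holder.close()  # second call finds nothing left to release
```

Python does not guarantee when, or in some shutdown cases whether, __del__ runs, so calling close() explicitly is the more predictable way to release the session.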

Important Implementation Details

Uses ONNX Runtime session for efficient model inference.
Supports dynamic batching with padding to handle images of different sizes.
Implements custom sorting and cleanup algorithms to refine detected layouts.
Uses classical NMS algorithm for final bounding box filtering based on IoU.
Modular preprocessing pipeline constructed from operator classes for flexibility.
Overlap computations use assertions to check that bounding box coordinates are consistent.

Interaction with Other Parts of the System

Loads model via load_model from .ocr module.
Uses preprocessing operators from .operators module.
Uses project utilities like get_project_base_directory for model path resolution.
Returns detection results that can be consumed by downstream modules for layout analysis, OCR, or document understanding pipelines.

Usage Example

import cv2

from recognizer import Recognizer

labels = ["text", "title", "table", "figure"]
recognizer = Recognizer(labels, task_name="layout_detection")

images = [cv2.imread("doc1.png"), cv2.imread("doc2.png")]
results = recognizer(images, thr=0.6, batch_size=2)

for i, res in enumerate(results):
    print(f"Image {i} detections:")
    for det in res:
        print(det)

recognizer.close()

Mermaid Class Diagram

classDiagram
    class Recognizer {
        -ort_sess
        -run_options
        -input_names
        -output_names
        -input_shape
        -label_list
        +__init__(label_list, task_name, model_dir=None)
        +__call__(image_list, thr=0.7, batch_size=16)
        +close()
        +__del__()
        +create_inputs(imgs, im_info)
        +preprocess(image_list)
        +postprocess(boxes, inputs, thr)
        +sort_Y_firstly(arr, threshold)
        +sort_X_firstly(arr, threshold)
        +sort_C_firstly(arr, thr=0)
        +sort_R_firstly(arr, thr=0)
        +overlapped_area(a, b, ratio=True)
        +layouts_cleanup(boxes, layouts, far=2, thr=0.7)
        +find_overlapped(box, boxes_sorted_by_y, naive=False)
        +find_horizontally_tightest_fit(box, boxes)
        +find_overlapped_with_threshold(box, boxes, thr=0.3)
    }

Summary

The recognizer.py file provides the Recognizer class, which loads and runs ONNX models for document layout or object detection tasks and handles preprocessing, batched inference, and postprocessing. It also includes spatial sorting and filtering utilities for refining detection outputs, and it relies on the InfiniFlow project's model-loading and operator modules. Downstream recognition workflows in the system build on this class.