recognizer.py
Overview
The recognizer.py file defines the Recognizer class, a core component designed for object detection and recognition tasks within the InfiniFlow project. This class loads and runs pre-trained models (likely OCR or document layout models) using ONNX Runtime and provides utilities for preprocessing input images, postprocessing model outputs, and sorting or filtering detected bounding boxes based on spatial or confidence criteria.
The module facilitates batch processing of images to detect objects or text regions, returns structured bounding box data with classification labels and confidence scores, and includes several static helper functions for spatial analysis and cleanup of detection results.
Detailed Description
Imports and Dependencies
gc, logging, os, math, numpy, cv2, functools.cmp_to_key
Project-specific utilities for file paths and model loading (get_project_base_directory, load_model)
Operators and preprocessing functions from the local package (operators, preprocess)
Recognizer Class
The Recognizer class encapsulates model loading, inference, and postprocessing logic for object recognition.
Initialization
def __init__(self, label_list, task_name, model_dir=None)
Purpose: Initializes the recognizer by loading the ONNX model session for a specific task.
Parameters:
label_list (list[str]): List of class labels that the model predicts.
task_name (str): Task identifier string used to load the correct model.
model_dir (str, optional): Directory path where the model files are stored. Defaults to a project-relative path if not provided.
Behavior:
Loads the inference session with load_model, sets the input/output node names, and records the input image shape expected by the model.
Usage example:
labels = ["text", "title", "figure", "table"]
recognizer = Recognizer(label_list=labels, task_name="layout_detection")
Static Sorting and Spatial Utility Methods
Several static methods provide sorting of detected boxes by spatial coordinates or attributes, and compute overlap between bounding boxes.
sort_Y_firstly(arr, threshold)
Sorts a list of dicts primarily by their "top" coordinate, then by "x0" if the vertical difference is less than threshold.
Used to order boxes top-to-bottom, then left-to-right within similar vertical bands.
sort_X_firstly(arr, threshold)
Sorts a list of dicts primarily by "x0" (horizontal, left-to-right), then by "top" if the horizontal difference is less than threshold.
Used to order boxes left-to-right, then top-to-bottom within similar horizontal bands.
sort_C_firstly(arr, thr=0)
Sorts boxes primarily by attribute "C" (if present), then "top".
Uses sort_X_firstly as an initial step and then restores order based on "C".
sort_R_firstly(arr, thr=0)
Sorts boxes primarily by attribute "R" (if present), then "x0".
Uses sort_Y_firstly as an initial step and then restores order based on "R".
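The band-based ordering described for sort_Y_firstly can be sketched with functools.cmp_to_key, which the module imports. This is a minimal illustration, assuming boxes are plain dicts with numeric "top" and "x0" values; the function name sort_y_firstly and the sample data are hypothetical, not the project's code.

```python
from functools import cmp_to_key


def sort_y_firstly(boxes, threshold):
    """Order boxes top-to-bottom; boxes whose "top" values differ by less
    than `threshold` count as one band and are ordered by "x0"."""

    def compare(a, b):
        diff = a["top"] - b["top"]
        if abs(diff) < threshold:
            # Same vertical band: fall back to left-to-right order.
            diff = a["x0"] - b["x0"]
        return diff

    return sorted(boxes, key=cmp_to_key(compare))


boxes = [
    {"top": 10, "x0": 50},
    {"top": 12, "x0": 5},
    {"top": 40, "x0": 0},
]
ordered = sort_y_firstly(boxes, threshold=5)
print([b["x0"] for b in ordered])  # [5, 50, 0]
```

The first two boxes fall into the same band (their "top" values differ by 2), so the one further left comes first. An X-first variant would swap the roles of "top" and "x0" in the comparator.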
overlapped_area(a, b, ratio=True)
Computes the overlapping area between two bounding boxes a and b, represented as dicts with keys "top", "bottom", "x0", "x1".
If ratio=True, returns the overlap ratio relative to box a's area; otherwise returns the absolute overlap area.
Returns 0 if there is no overlap.
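The overlap computation described above can be sketched as follows. This is an illustrative reimplementation under the stated key layout ("top", "bottom", "x0", "x1"), not the module's actual code; treating boxes that only touch at an edge as non-overlapping is an assumption.

```python
def overlapped_area(a, b, ratio=True):
    """Overlap between boxes a and b, as a fraction of a's area if ratio=True."""
    # Width and height of the intersection (non-positive when there is no overlap).
    width = min(a["x1"], b["x1"]) - max(a["x0"], b["x0"])
    height = min(a["bottom"], b["bottom"]) - max(a["top"], b["top"])
    if width <= 0 or height <= 0:
        return 0
    area = width * height
    if not ratio:
        return area
    area_a = (a["x1"] - a["x0"]) * (a["bottom"] - a["top"])
    return area / area_a


a = {"top": 0, "bottom": 10, "x0": 0, "x1": 10}
b = {"top": 5, "bottom": 15, "x0": 5, "x1": 15}
print(overlapped_area(a, b))               # 0.25
print(overlapped_area(a, b, ratio=False))  # 25
```

Because the ratio is taken relative to a's area, the result is not symmetric: swapping the arguments changes the value whenever the two boxes differ in size.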
layouts_cleanup(boxes, layouts, far=2, thr=0.7)
Cleans up layout detections by removing overlapping or redundant layout boxes.
Iterates over layout boxes within a far window and removes boxes with significant overlap (above the thr threshold).
Decides which box to keep based on confidence score or total overlap with detected boxes.
Returns filtered list of layout boxes.
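A simplified sketch of this cleanup is shown below. Several details are assumptions made for illustration: the window is read as "the next far - 1 layouts after the current one", overlap is measured in both directions, ties keep the later layout, and the helper name overlap is hypothetical. The project's implementation may apply additional conditions.

```python
def overlap(a, b, ratio=True):
    """Intersection area of a and b, relative to a's area if ratio=True."""
    width = min(a["x1"], b["x1"]) - max(a["x0"], b["x0"])
    height = min(a["bottom"], b["bottom"]) - max(a["top"], b["top"])
    if width <= 0 or height <= 0:
        return 0
    area = width * height
    if ratio:
        area /= (a["x1"] - a["x0"]) * (a["bottom"] - a["top"])
    return area


def layouts_cleanup(boxes, layouts, far=2, thr=0.7):
    layouts = list(layouts)  # work on a copy of the list
    i = 0
    while i + 1 < len(layouts):
        # Look for a heavily overlapping partner inside the window.
        window = range(i + 1, min(i + far, len(layouts)))
        j = next(
            (k for k in window
             if max(overlap(layouts[i], layouts[k]),
                    overlap(layouts[k], layouts[i])) >= thr),
            None,
        )
        if j is None:
            i += 1
            continue
        a, b = layouts[i], layouts[j]
        if a.get("score") and b.get("score"):
            # Both scored: keep the more confident layout.
            drop = j if a["score"] > b["score"] else i
        else:
            # Otherwise keep the layout that covers more detected box area.
            cover_a = sum(overlap(box, a, ratio=False) for box in boxes)
            cover_b = sum(overlap(box, b, ratio=False) for box in boxes)
            drop = j if cover_a > cover_b else i
        layouts.pop(drop)
    return layouts


layouts = [
    {"top": 0, "bottom": 100, "x0": 0, "x1": 100, "score": 0.9},
    {"top": 5, "bottom": 100, "x0": 0, "x1": 100, "score": 0.6},
    {"top": 200, "bottom": 300, "x0": 0, "x1": 100, "score": 0.8},
]
cleaned = layouts_cleanup(boxes=[], layouts=layouts)
print([l["score"] for l in cleaned])  # [0.9, 0.8]
```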
Input Creation and Preprocessing
create_inputs(self, imgs, im_info)
Converts a list of preprocessed images and their metadata into a dict of numpy arrays suitable as model inputs.
Handles batch input padding for variable image sizes by zero-padding smaller images.
Returns a dict with keys: 'image', 'im_shape', 'scale_factor'.
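The zero-padding step can be illustrated with NumPy. This sketch assumes each image is a channel-first (C, H, W) array and that im_info is a list of dicts carrying "im_shape" and "scale_factor" entries; both are assumptions about the data layout rather than facts taken from the module.

```python
import numpy as np


def create_inputs(imgs, im_info):
    """Stack CHW images of different sizes into one zero-padded batch."""
    inputs = {
        "im_shape": np.array([info["im_shape"] for info in im_info], dtype=np.float32),
        "scale_factor": np.array([info["scale_factor"] for info in im_info], dtype=np.float32),
    }
    max_h = max(img.shape[1] for img in imgs)
    max_w = max(img.shape[2] for img in imgs)
    batch = np.zeros((len(imgs), imgs[0].shape[0], max_h, max_w), dtype=np.float32)
    for k, img in enumerate(imgs):
        _, h, w = img.shape
        batch[k, :, :h, :w] = img  # top-left aligned; the remainder stays zero
    inputs["image"] = batch
    return inputs


imgs = [np.ones((3, 4, 6), dtype=np.float32), np.ones((3, 5, 2), dtype=np.float32)]
im_info = [
    {"im_shape": (4, 6), "scale_factor": (1.0, 1.0)},
    {"im_shape": (5, 2), "scale_factor": (1.0, 1.0)},
]
batch = create_inputs(imgs, im_info)
print(batch["image"].shape)  # (2, 3, 5, 6)
```

Padding to the largest height and width in the batch keeps every image at its own resolution while still producing a single rectangular tensor for the session.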
preprocess(self, image_list)
Performs preprocessing pipeline on raw images or image paths.
If the model expects a "scale_factor" input, applies a series of resizing, normalization, permuting, and padding operations using operators.
Otherwise, resizes images to the model input shape, scales pixel values to [0, 1], and formats the input tensor.
Returns a list of dicts, each dict containing inputs for one image.
Postprocessing
postprocess(self, boxes, inputs, thr)
Processes raw model outputs (bounding boxes and class scores) into structured detection results.
Filters out detections below the confidence threshold thr.
Converts box format from [x, y, w, h] to [x1, y1, x2, y2] if needed.
Applies Non-Maximum Suppression (NMS) with IoU threshold 0.2 to reduce overlapping detections.
Returns a list of dicts with keys "type" (label), "bbox" (coordinates), and "score" (confidence).
Overlap and Matching Utilities
find_overlapped(box, boxes_sorted_by_y, naive=False)
Finds the index of the box in boxes_sorted_by_y that overlaps the most with box. Uses binary search for efficiency.
find_horizontally_tightest_fit(box, boxes)
Finds the index of the box horizontally closest to box within the same layout group.
find_overlapped_with_threshold(box, boxes, thr=0.3)
Finds the box with overlap above a threshold.
Inference and Lifecycle Methods
__call__(self, image_list, thr=0.7, batch_size=16)
Makes the Recognizer instance callable. Runs inference on a list of images.
Splits images into batches of size batch_size.
Preprocesses images, runs the ONNX model, and postprocesses outputs.
Returns list of detection results for all images.
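The batching loop can be sketched independently of the model. In the example below, iter_batches, run_in_batches, and fake_infer are hypothetical names; fake_infer stands in for the preprocess, ONNX inference, and postprocess steps and simply returns one result list per image.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_in_batches(image_list, infer_batch, batch_size=16):
    """Apply `infer_batch` to each batch and flatten the per-image results."""
    results = []
    for batch in iter_batches(image_list, batch_size):
        results.extend(infer_batch(batch))
    return results


def fake_infer(batch):
    # Placeholder for preprocess -> session run -> postprocess.
    return [[{"type": "text", "image": name}] for name in batch]


out = run_in_batches(["a", "b", "c", "d", "e"], fake_infer, batch_size=2)
print(len(out))  # 5
```

The last batch may be smaller than batch_size; the output still contains exactly one entry per input image, in input order.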
close(self)
Cleans up the ONNX session and triggers garbage collection.
Should be called explicitly or via destructor to free resources.
__del__(self)
Destructor calls close() to ensure clean resource deallocation.
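The close-then-destructor pattern can be sketched with a small stand-in class. SessionHolder is hypothetical and holds an arbitrary object instead of an ONNX Runtime session; making close() idempotent is a design assumption that keeps an explicit close() followed by the destructor harmless.

```python
import gc


class SessionHolder:
    """Hypothetical stand-in for a class that owns an inference session."""

    def __init__(self, session):
        self.session = session

    def close(self):
        # Drop the reference so the session can be reclaimed; safe to call twice.
        if getattr(self, "session", None) is not None:
            self.session = None
            gc.collect()

    def __del__(self):
        self.close()


holder = SessionHolder(session=object())
holder.close()
holder.close()  # the second call is a no-op
```

The getattr guard matters because __del__ can run on an object whose __init__ raised before the attribute was assigned.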
Important Implementation Details
Uses ONNX Runtime session for efficient model inference.
Supports dynamic batching with padding to handle images of different sizes.
Implements custom sorting and cleanup algorithms to refine detected layouts.
Uses classical NMS algorithm for final bounding box filtering based on IoU.
Modular preprocessing pipeline constructed from operator classes for flexibility.
Overlap computations carefully handle bounding box coordinate assertions.
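The classical NMS step mentioned in this list can be sketched in plain Python. This is a greedy, class-agnostic variant written for illustration; whether the module suppresses per class or across classes is not stated here, and the helper names iou and nms are hypothetical. Boxes are assumed to be in [x1, y1, x2, y2] form, and the default IoU threshold of 0.2 follows the value quoted for postprocess.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def nms(detections, score_thr, iou_thr=0.2):
    """Greedy NMS over dicts with "bbox" and "score" keys."""
    candidates = sorted(
        (d for d in detections if d["score"] >= score_thr),
        key=lambda d: d["score"],
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Keep a detection only if it does not overlap a stronger kept one.
        if all(iou(det["bbox"], k["bbox"]) <= iou_thr for k in kept):
            kept.append(det)
    return kept


detections = [
    {"type": "text", "bbox": [0, 0, 10, 10], "score": 0.9},
    {"type": "text", "bbox": [1, 1, 11, 11], "score": 0.8},
    {"type": "table", "bbox": [50, 50, 60, 60], "score": 0.75},
    {"type": "text", "bbox": [0, 0, 5, 5], "score": 0.3},
]
kept = nms(detections, score_thr=0.7)
print([d["score"] for d in kept])  # [0.9, 0.75]
```

The second detection is dropped because its IoU with the first is about 0.68, well above 0.2; the last one never enters NMS because its score is below the confidence threshold.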
Interaction with Other Parts of the System
Loads the model via load_model from the .ocr module.
Uses preprocessing operators from the .operators module.
Uses project utilities like get_project_base_directory for model path resolution.
Returns detection results that can be consumed by downstream modules for layout analysis, OCR, or document understanding pipelines.
Usage Example
import cv2
from recognizer import Recognizer

labels = ["text", "title", "table", "figure"]
recognizer = Recognizer(labels, task_name="layout_detection")
images = [cv2.imread("doc1.png"), cv2.imread("doc2.png")]
results = recognizer(images, thr=0.6, batch_size=2)
for i, res in enumerate(results):
    print(f"Image {i} detections:")
    for det in res:
        print(det)
recognizer.close()
Mermaid Class Diagram
classDiagram
class Recognizer {
-ort_sess
-run_options
-input_names
-output_names
-input_shape
-label_list
+__init__(label_list, task_name, model_dir=None)
+__call__(image_list, thr=0.7, batch_size=16)
+close()
+__del__()
+create_inputs(imgs, im_info)
+preprocess(image_list)
+postprocess(boxes, inputs, thr)
+sort_Y_firstly(arr, threshold)
+sort_X_firstly(arr, threshold)
+sort_C_firstly(arr, thr=0)
+sort_R_firstly(arr, thr=0)
+overlapped_area(a, b, ratio=True)
+layouts_cleanup(boxes, layouts, far=2, thr=0.7)
+find_overlapped(box, boxes_sorted_by_y, naive=False)
+find_horizontally_tightest_fit(box, boxes)
+find_overlapped_with_threshold(box, boxes, thr=0.3)
}
Summary
The recognizer.py file provides the Recognizer class, which loads and runs ONNX models for document layout or object detection tasks and covers preprocessing, batched inference, and postprocessing. It also includes spatial sorting and filtering utilities to refine detection outputs, and it relies on the InfiniFlow project's model-loading and operator infrastructure. Downstream recognition workflows in the system build on this file.