layout_recognizer.py


Overview

layout_recognizer.py defines classes and methods for detecting and classifying the structural layout components of document images. It extends a general Recognizer class to identify various document regions such as text blocks, titles, figures, tables, headers, footers, references, and equations within scanned or digital documents.

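The primary functionality includes:

- Detecting layout regions (text, titles, figures, tables, headers, footers, references, equations) on page images with a deep learning model.
- Tagging OCR text boxes with the layout type of the region that contains them.
- Cleaning the results by optionally dropping noise regions such as headers and footers.
- Optionally delegating inference to an accelerated client.
- Supporting a YOLOv10-based model through a subclass with its own preprocessing and postprocessing.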

This module plays a critical role in document digitization pipelines. It consumes OCR results and adds structured layout information, which enables downstream tasks such as semantic tagging and content extraction.


Classes and Functions

Class: LayoutRecognizer

Inherits from deepdoc.vision.Recognizer.

Description

LayoutRecognizer encapsulates a deep learning model for document layout analysis. It predicts bounding boxes and categories of layout elements on pages and integrates OCR results to tag text blocks with layout types.
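
Conceptually, tagging an OCR text box means finding the detected layout region that covers it. The sketch below illustrates that idea only and is not the module's implementation. The dictionary keys (x0, x1, top, bottom, type, layout_type), the helper names, and the 0.4 minimum-overlap ratio are assumptions made for illustration.

```python
def overlap_ratio(box: dict, region: dict) -> float:
    """Fraction of box's area that lies inside region (both use x0/x1/top/bottom keys)."""
    w = min(box["x1"], region["x1"]) - max(box["x0"], region["x0"])
    h = min(box["bottom"], region["bottom"]) - max(box["top"], region["top"])
    if w <= 0 or h <= 0:
        return 0.0
    area = (box["x1"] - box["x0"]) * (box["bottom"] - box["top"])
    return (w * h) / area if area > 0 else 0.0


def tag_boxes(ocr_boxes: list, layouts: list, min_overlap: float = 0.4) -> list:
    """Copy each OCR box and add the type of the region covering most of it ("" if none)."""
    tagged = []
    for box in ocr_boxes:
        best, best_ratio = None, 0.0
        for region in layouts:
            ratio = overlap_ratio(box, region)
            if ratio > best_ratio:
                best, best_ratio = region, ratio
        matched = best is not None and best_ratio >= min_overlap
        tagged.append({**box, "layout_type": best["type"] if matched else ""})
    return tagged


layouts = [
    {"type": "title", "x0": 0, "x1": 100, "top": 0, "bottom": 20},
    {"type": "text", "x0": 0, "x1": 100, "top": 30, "bottom": 200},
]
boxes = [
    {"text": "Intro", "x0": 10, "x1": 60, "top": 5, "bottom": 15},
    {"text": "Body", "x0": 10, "x1": 90, "top": 40, "bottom": 60},
    {"text": "Stray", "x0": 200, "x1": 220, "top": 5, "bottom": 15},
]
print([b["layout_type"] for b in tag_boxes(boxes, layouts)])  # ['title', 'text', '']
```

Boxes that fall outside every region get an empty layout type instead of being forced into the nearest region.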

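Attributes

- labels: list of layout category names the model can predict.
- garbage_layouts: list of layout types, such as headers and footers, that are treated as noise and may be removed when drop=True.
- client: an optional DLAClient for accelerated inference. When it is set, inference is delegated to it. Otherwise it is None and the local model is used.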
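
The garbage_layouts attribute and the drop parameter of __call__ indicate a cleanup pass over noise regions. The sketch below shows one simple approach and does not reproduce the module's exact logic. When drop is enabled, it discards text tagged with a noise layout type if that same text recurs across the document, as running headers and footers do. The noise set, the layout_type key, and the function name are illustrative assumptions.

```python
from collections import Counter

# Assumed set of layout types treated as noise.
GARBAGE_LAYOUTS = {"header", "footer", "reference"}


def drop_repeated_noise(pages: list, drop: bool = True) -> list:
    """Flatten tagged boxes from all pages; if drop, remove noise text that recurs."""
    boxes = [b for page in pages for b in page]
    if not drop:
        return boxes

    def is_noise(b: dict) -> bool:
        return b.get("layout_type") in GARBAGE_LAYOUTS

    # Count each noise text across the whole document (e.g. a running header).
    counts = Counter(b["text"].strip() for b in boxes if is_noise(b))
    repeated = {text for text, n in counts.items() if n > 1}
    return [b for b in boxes if not (is_noise(b) and b["text"].strip() in repeated)]


pages = [
    [{"text": "ACME Report", "layout_type": "header"}, {"text": "Intro", "layout_type": "text"}],
    [{"text": "ACME Report", "layout_type": "header"}, {"text": "Page 2 body", "layout_type": "text"}],
]
print([b["text"] for b in drop_repeated_noise(pages)])  # ['Intro', 'Page 2 body']
```

Requiring repetition makes the filter conservative: a header that appears only once is kept, because it may carry real content.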

Methods

__init__(self, domain)

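Initializes the recognizer. It loads the layout detection model for the given domain and sets up labels, garbage_layouts, and client.

Parameters

- domain: model domain identifier, for example "layout".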
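
The client attribute is typed "DLAClient or None" and the module supports accelerated inference, so inference is presumably routed through the client when one is configured. The sketch below shows that dispatch pattern under this assumption. LayoutClient, LayoutBackend, FakeClient, and local are hypothetical names, and the sketch does not describe DLAClient's real interface.

```python
from typing import Callable, Optional, Protocol


class LayoutClient(Protocol):
    """Hypothetical interface for an accelerated or remote layout service."""

    def predict(self, images: list) -> list: ...


class LayoutBackend:
    """Route inference to an optional client, falling back to a local model."""

    def __init__(self, local_model: Callable[[list, float, int], list],
                 client: Optional[LayoutClient] = None):
        self.local_model = local_model
        self.client = client  # None means the local model is used

    def detect(self, images: list, thr: float = 0.2, batch_size: int = 16) -> list:
        if self.client is not None:
            return self.client.predict(images)
        return self.local_model(images, thr, batch_size)


class FakeClient:
    def predict(self, images: list) -> list:
        return [["remote"] for _ in images]


def local(images: list, thr: float, batch_size: int) -> list:
    return [["local"] for _ in images]


print(LayoutBackend(local).detect(["p1"]))                # [['local']]
print(LayoutBackend(local, FakeClient()).detect(["p1"]))  # [['remote']]
```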

__call__(self, image_list, ocr_res, scale_factor=3, thr=0.2, batch_size=16, drop=True) -> (list, list)

Main inference method to predict and tag layout regions on a list of images.

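Parameters

- image_list: page images to analyze.
- ocr_res: OCR results, one list of text boxes per image. It must have the same length as image_list.
- scale_factor: factor between image pixel coordinates and OCR box coordinates. Detected boxes are divided by it (default 3).
- thr: detection confidence threshold (default 0.2).
- batch_size: number of images per inference batch (default 16).
- drop: whether to remove text that falls in garbage layouts such as headers and footers (default True).

Returns

A tuple of two lists: the OCR text boxes tagged with their layout types, and the detected layout regions for each page.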
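
The effect of scale_factor and thr on the detected regions can be sketched as follows. Low-confidence detections are discarded, and the remaining pixel boxes are mapped back to page coordinates. This sketch assumes each raw detection is a dict with type, score, and bbox given as [x0, top, x1, bottom] in pixels of an image rendered at scale_factor times the OCR coordinate space. The function name to_page_coords is hypothetical.

```python
def to_page_coords(detections: list, scale_factor: float = 3, thr: float = 0.2,
                   page_number: int = 0) -> list:
    """Convert raw detections (bbox in image pixels) into page-coordinate regions."""
    regions = []
    for det in detections:
        if det["score"] < thr:  # discard low-confidence detections
            continue
        x0, top, x1, bottom = det["bbox"]
        regions.append({
            "type": det["type"],
            "score": float(det["score"]),
            "x0": x0 / scale_factor, "x1": x1 / scale_factor,
            "top": top / scale_factor, "bottom": bottom / scale_factor,
            "page_number": page_number,
        })
    # Order regions top-to-bottom, then left-to-right.
    return sorted(regions, key=lambda r: (r["top"], r["x0"]))


dets = [
    {"type": "text", "score": 0.9, "bbox": [30, 90, 300, 600]},
    {"type": "title", "score": 0.8, "bbox": [30, 15, 300, 60]},
    {"type": "figure", "score": 0.1, "bbox": [0, 0, 10, 10]},
]
regions = to_page_coords(dets)
print([(r["type"], r["top"]) for r in regions])  # [('title', 5.0), ('text', 30.0)]
```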

Usage Example

from PIL import Image
from deepdoc.vision.layout_recognizer import LayoutRecognizer

layout_recognizer = LayoutRecognizer(domain="layout")

images = [Image.open("page1.png"), Image.open("page2.png")]
ocr_results = [...]  # one list of OCR text boxes per page, in the same order as images

tagged_ocr, layouts = layout_recognizer(images, ocr_results)

Implementation Details

forward(self, image_list, thr=0.7, batch_size=16)

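Thin wrapper that calls the parent Recognizer's inference method directly. LayoutRecognizer overrides __call__, so forward is the way to get the raw layout detections without OCR tagging or garbage filtering.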

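Parameters

- image_list: page images to analyze.
- thr: detection confidence threshold (default 0.7).
- batch_size: number of images per inference batch (default 16).

Returns

The raw layout detections that Recognizer produces for each image.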
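
forward exists because of how Python resolves an overridden __call__. The subclass's __call__ requires an extra ocr_res argument, so self(image_list) no longer reaches the parent's inference, and super().__call__ has to be called explicitly. The sketch below shows this with hypothetical stand-in classes (BaseRecognizer, TaggingRecognizer) that return dummy detections in place of a real model.

```python
class BaseRecognizer:
    """Stand-in for the parent Recognizer: returns raw detections for each image."""

    def __call__(self, image_list, thr=0.7, batch_size=16):
        return [[{"type": "text", "score": 0.9}] for _ in image_list]


class TaggingRecognizer(BaseRecognizer):
    """Overrides __call__ with an extra ocr_res argument, as LayoutRecognizer does."""

    def __call__(self, image_list, ocr_res, thr=0.2, batch_size=16):
        layouts = super().__call__(image_list, thr, batch_size)
        # Tagging ocr_res against layouts is omitted in this sketch.
        return ocr_res, layouts

    def forward(self, image_list, thr=0.7, batch_size=16):
        # self(image_list) would hit the overridden __call__ and fail without ocr_res,
        # so call the parent implementation explicitly.
        return super().__call__(image_list, thr, batch_size)


rec = TaggingRecognizer()
print(rec.forward(["page"]))  # [[{'type': 'text', 'score': 0.9}]]
```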


Class: LayoutRecognizer4YOLOv10

Inherits from LayoutRecognizer.

Description

Specialized layout recognizer using a YOLOv10 architecture variant. Implements custom preprocessing and postprocessing compatible with YOLOv10 input/output formats.

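Attributes

- labels: layout category names for the YOLOv10 model's classes, overriding the parent's list.
- auto, scaleFill, scaleup, stride, center: preprocessing settings that control how images are resized and padded to the model's input size.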

Methods

__init__(self, domain)

Initializes the recognizer and sets YOLOv10-specific parameters.

preprocess(self, image_list) -> list

Prepares input images for YOLOv10 model inference.

Parameters

- image_list: page images to prepare for inference.

Returns

A list of model inputs, one per image.

Process
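
The attribute names auto, scaleFill, scaleup, stride, and center match the parameters of the "letterbox" resize used in Ultralytics YOLO pipelines. preprocess therefore most likely scales each page to fit the model input without distortion, then pads the remainder. The sketch below assumes this and models only the default behaviour and center; auto, scaleFill, scaleup, and stride are not modeled. The 1024 x 1024 target size and the gray pad value 114 are assumptions. Nearest-neighbour index sampling stands in for the interpolated resize (such as OpenCV's cv2.resize) that a real pipeline would typically use.

```python
import numpy as np


def letterbox(img: np.ndarray, new_shape=(1024, 1024), center: bool = True,
              pad_value: int = 114):
    """Scale an H x W x C image to fit new_shape without distortion, then pad.

    Returns the padded image and (ratio, pad_left, pad_top) for undoing the transform.
    """
    h, w = img.shape[:2]
    r = min(new_shape[0] / h, new_shape[1] / w)  # scale ratio (new / old)
    nh, nw = int(round(h * r)), int(round(w * r))
    # Nearest-neighbour resize by index sampling keeps the sketch NumPy-only.
    rows = np.minimum((np.arange(nh) / r).astype(int), h - 1)
    cols = np.minimum((np.arange(nw) / r).astype(int), w - 1)
    resized = img[rows][:, cols]
    dh, dw = new_shape[0] - nh, new_shape[1] - nw
    top = dh // 2 if center else 0  # center=False puts all padding at bottom/right
    left = dw // 2 if center else 0
    out = np.full((new_shape[0], new_shape[1], img.shape[2]), pad_value, dtype=img.dtype)
    out[top:top + nh, left:left + nw] = resized
    return out, r, left, top


def to_model_input(img: np.ndarray) -> np.ndarray:
    """Normalize to [0, 1] and reorder HWC -> NCHW with a batch dimension of 1."""
    x = img.astype(np.float32) / 255.0
    return x.transpose(2, 0, 1)[np.newaxis]


page = np.zeros((1100, 850, 3), dtype=np.uint8)  # a portrait page image
out, r, left, top = letterbox(page)
x = to_model_input(out)
print(out.shape, left, top, x.shape)  # (1024, 1024, 3) 116 0 (1, 3, 1024, 1024)
```

The returned ratio and padding are what a postprocessing step needs to map detections back onto the original page.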

postprocess(self, boxes, inputs, thr) -> list

Filters and refines raw YOLOv10 output bounding boxes.

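Parameters

- boxes: raw bounding-box output from the YOLOv10 model.
- inputs: the preprocessed input that produced boxes, which carries the information needed to map boxes back to the original image.
- thr: confidence threshold for keeping detections.

Returns

A list of filtered layout detections.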

Process
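
Filtering and refining raw detections commonly involves three steps, sketched below under stated assumptions. First, rows scoring below thr are dropped. Next, coordinates are mapped back to the original image by removing the letterbox padding and dividing by the scale ratio. Finally, overlapping boxes of the same class are deduplicated with greedy non-maximum suppression (NMS). The raw row layout [x0, y0, x1, y1, score, class_id], the 0.45 IoU threshold, and the helper names are assumptions. YOLOv10 is designed to be NMS-free, so in this sketch the NMS step is only a conservative safeguard.

```python
import numpy as np


def _area(b: np.ndarray) -> np.ndarray:
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])


def nms(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.45) -> list:
    """Greedy NMS over N x 4 [x0, y0, x1, y1] boxes; returns kept indices."""
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        iw = np.clip(np.minimum(boxes[i, 2], boxes[rest, 2]) - np.maximum(boxes[i, 0], boxes[rest, 0]), 0, None)
        ih = np.clip(np.minimum(boxes[i, 3], boxes[rest, 3]) - np.maximum(boxes[i, 1], boxes[rest, 1]), 0, None)
        inter = iw * ih
        iou = inter / (_area(boxes[i]) + _area(boxes[rest]) - inter)
        order = rest[iou <= iou_thr]  # drop boxes overlapping the kept one too much
    return keep


def postprocess(raw: np.ndarray, labels: list, ratio: float, pad_left: float,
                pad_top: float, thr: float = 0.2) -> list:
    """raw: N x 6 rows of [x0, y0, x1, y1, score, class_id] in letterboxed pixels."""
    raw = raw[raw[:, 4] > thr]  # drop low-confidence rows
    boxes = raw[:, :4].copy()
    boxes[:, [0, 2]] = (boxes[:, [0, 2]] - pad_left) / ratio  # undo padding, then scaling
    boxes[:, [1, 3]] = (boxes[:, [1, 3]] - pad_top) / ratio
    scores, class_ids = raw[:, 4], raw[:, 5].astype(int)
    results = []
    for cid in np.unique(class_ids):  # NMS is applied per class
        idx = np.where(class_ids == cid)[0]
        for k in nms(boxes[idx], scores[idx]):
            j = idx[k]
            results.append({"type": labels[cid].lower(),
                            "bbox": [float(v) for v in boxes[j]],
                            "score": float(scores[j])})
    return results


raw = np.array([
    [10, 20, 110, 70, 0.9, 1],   # text box
    [12, 22, 108, 72, 0.6, 1],   # near-duplicate, suppressed by NMS
    [10, 80, 110, 90, 0.05, 0],  # below threshold
], dtype=np.float32)
dets = postprocess(raw, ["title", "Text"], ratio=0.5, pad_left=10, pad_top=0)
print([(d["type"], d["bbox"]) for d in dets])  # [('text', [0.0, 40.0, 200.0, 140.0])]
```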


Important Implementation Details and Algorithms


Interaction with Other System Components

This module sits downstream of OCR, because __call__ takes OCR results (ocr_res) as input. It sits upstream of semantic analysis or document understanding modules, which consume the tagged text boxes and per-page layout regions it returns.
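
As an illustration of the downstream side, a consumer might group the tagged text boxes by layout type, for example to pull out titles or to skip figure regions. The sketch below is a hypothetical consumer, not part of this module. It assumes each tagged box is a dict with text and layout_type keys, where an empty layout_type means the box matched no region.

```python
from collections import defaultdict


def group_by_layout(tagged_boxes: list) -> dict:
    """Collect text per layout type; boxes without a type go under 'untyped'."""
    groups = defaultdict(list)
    for box in tagged_boxes:
        groups[box.get("layout_type") or "untyped"].append(box["text"])
    return dict(groups)


tagged = [
    {"text": "Results", "layout_type": "title"},
    {"text": "We found two effects.", "layout_type": "text"},
    {"text": "x", "layout_type": ""},
]
print(group_by_layout(tagged))
# {'title': ['Results'], 'text': ['We found two effects.'], 'untyped': ['x']}
```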


Visual Diagram

classDiagram
    class LayoutRecognizer {
        +labels: list
        +garbage_layouts: list
        +client: DLAClient or None
        +__init__(domain)
        +__call__(image_list, ocr_res, scale_factor=3, thr=0.2, batch_size=16, drop=True)
        +forward(image_list, thr=0.7, batch_size=16)
    }

    class LayoutRecognizer4YOLOv10 {
        +labels: list
        +auto: bool
        +scaleFill: bool
        +scaleup: bool
        +stride: int
        +center: bool
        +__init__(domain)
        +preprocess(image_list)
        +postprocess(boxes, inputs, thr)
    }

    LayoutRecognizer4YOLOv10 --|> LayoutRecognizer

Summary

The layout_recognizer.py module provides a framework for detecting and classifying document layout regions using deep learning. It supports flexible model loading and optional accelerated inference, and it integrates OCR results to produce enriched, cleaned layout annotations. The YOLOv10 subclass shows how the same interface extends to other model architectures with tailored preprocessing and postprocessing. The module is central to structured understanding of complex document images within the InfiniFlow system.