__init__.py


Overview

This __init__.py file is the main initialization module for a document analysis package within the InfiniFlow project. It re-exports the core recognition components, prepares inputs and outputs for image and PDF files, and controls concurrent access to PDF processing resources.

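Key functionalities include re-exporting the recognizer classes (OCR, Recognizer, LayoutRecognizer, TableStructureRecognizer), preparing input images and output paths through init_in_out, and guarding pdfplumber access with a shared lock.

This file connects the lower-level recognition modules with higher-level application workflows that build document processing pipelines on top of them.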


Classes and Functions

Imported Classes (Re-exported)

The following classes are imported from submodules and exposed as part of the package API:

| Class Name | Source Module | Description |
| --- | --- | --- |
| OCR | .ocr | Performs Optical Character Recognition (OCR) on document images. |
| Recognizer | .recognizer | General recognizer class for identifying document elements or content. |
| LayoutRecognizer | .layout_recognizer | Specialized layout analysis using a YOLOv10-based model for detecting document structures. |
| TableStructureRecognizer | .table_structure_recognizer | Recognizes table structures within document images. |


Global Variables

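LOCK_KEY_pdfplumber: the key under which the shared pdfplumber lock is registered in sys.modules.

sys.modules[LOCK_KEY_pdfplumber]: the lock object itself, acquired around PDF processing so that concurrent callers do not work through pdfplumber at the same time.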
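
The pattern suggested by these two names, a lock object stored in sys.modules under a string key, can be sketched as follows. This is a minimal illustration rather than the package's code: the key value, the get_shared_lock helper, and the counter workload are all assumptions made for the example.

```python
import sys
import threading

# Hypothetical key; the actual value of LOCK_KEY_pdfplumber is not shown here.
LOCK_KEY_example = "example_shared_lock"


def get_shared_lock(key: str = LOCK_KEY_example):
    """Return the lock registered under `key`, creating it on first use."""
    # setdefault keeps the first lock that was stored, so every caller in the
    # process ends up sharing the same object.
    return sys.modules.setdefault(key, threading.Lock())


counter = 0


def worker() -> None:
    """Increment the shared counter, one guarded step at a time."""
    global counter
    for _ in range(1000):
        # Only one thread at a time may run the guarded section.
        with get_shared_lock():
            counter += 1


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)
```

Because sys.modules is an ordinary dictionary shared by the whole interpreter, any module that looks up the same key receives the same lock, regardless of how or where it was imported.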


Function: init_in_out(args)

init_in_out(args) -> (List[PIL.Image.Image], List[str])

Description

Prepares input images and corresponding output file paths based on the provided arguments. Supports both image files and multi-page PDFs by converting PDF pages into images. Ensures that output directories exist and manages thread-safe PDF processing.

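Parameters

args: an object exposing two attributes, inputs (path to a single image or PDF file, or to a directory of such files) and output_dir (directory in which results are written; created if it does not exist).

Returns

A tuple (images, outputs): images is a list of PIL.Image.Image objects, one per image file or PDF page, and outputs is the list of corresponding output file paths.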

Usage Example

class Args:
    inputs = "/path/to/input_folder_or_file"
    output_dir = "/path/to/output_folder"

args = Args()
images, outputs = init_in_out(args)

for img, out_path in zip(images, outputs):
    # process() and save_results() are placeholders for your own pipeline code
    result = process(img)
    save_results(result, out_path)
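
Since the example only relies on args.inputs and args.output_dir, the same object could come from argparse. The following is a minimal sketch under that assumption; the flag names --inputs and --output_dir, the default output directory, and the build_arg_parser helper are illustrative and not taken from the package.

```python
import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    """Build a parser whose result exposes `inputs` and `output_dir` attributes."""
    parser = argparse.ArgumentParser(description="Prepare inputs for document analysis")
    # Flag names are assumptions chosen to match the attribute names used above.
    parser.add_argument("--inputs", required=True,
                        help="Path to an image, a PDF, or a directory of such files")
    parser.add_argument("--output_dir", default="./outputs",
                        help="Directory where results are written")
    return parser


# Parse an explicit argument list here; omit the list to read sys.argv instead.
args = build_arg_parser().parse_args(["--inputs", "/path/to/input_folder_or_file"])
print(args.inputs, args.output_dir)
```

The resulting args namespace can then be passed to init_in_out(args) in place of the hand-written Args class.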

Implementation Details


Important Implementation Notes


Interaction with Other Modules


Package Public API

The following names are exported as part of the package's __all__ list, indicating the public interface of this module:

__all__ = [
    "OCR",
    "Recognizer",
    "LayoutRecognizer",
    "TableStructureRecognizer",
    "init_in_out",
]
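
As a reminder of what __all__ controls, here is a small self-contained sketch using a throwaway in-memory module named demo_pkg (a hypothetical name, unrelated to this package): a wildcard import picks up only the names listed in __all__.

```python
import sys
import types

# Build a throwaway module in memory; "demo_pkg" is a hypothetical name.
demo = types.ModuleType("demo_pkg")
exec(
    "__all__ = ['init_in_out']\n"
    "def init_in_out(args):\n"
    "    return [], []\n"
    "def helper():\n"
    "    return None\n",
    demo.__dict__,
)
sys.modules["demo_pkg"] = demo

# Run a wildcard import in a fresh namespace and see which names arrive.
namespace = {}
exec("from demo_pkg import *", namespace)

imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)
```

Names left out of __all__, such as helper in this sketch, remain reachable through an explicit import; __all__ only restricts what a wildcard import brings in.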

Mermaid Diagram: Flowchart of init_in_out Workflow

flowchart TD
    A["Start: receive args.inputs and args.output_dir"] --> B{"Is args.inputs a directory?"}

    B -- Yes --> C["Traverse files recursively using traversal_files"]
    C --> D["For each file: images_and_outputs(file)"]

    B -- No --> D2["images_and_outputs(args.inputs)"]

    subgraph IO["images_and_outputs(fnm)"]
        direction LR
        E{"Is file a PDF?"} -->|Yes| F["pdf_pages(fnm)"]
        E -->|No| G["Open and convert image file"]

        subgraph PP["pdf_pages(fnm)"]
            F1["Acquire LOCK_KEY_pdfplumber lock"]
            F2["Open PDF with pdfplumber"]
            F3["Convert each page to image with zoom factor"]
            F4["Append page images and output paths"]
            F5["Close PDF and release lock"]
            F1 --> F2 --> F3 --> F4 --> F5
        end
        F --> F1
    end

    D --> E
    D2 --> E
    F5 --> H["Ensure output_dir exists or create it"]
    G --> H
    H --> I["Return images list and outputs list"]
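
The branching in the flowchart can be approximated with the standard library alone. The sketch below is not the package's implementation: collect_files stands in for traversal_files, the page count comes from a caller-supplied function instead of pdfplumber, no images are loaded, and the per-page naming scheme is an assumption.

```python
import os
from typing import Callable, List


def collect_files(inputs: str) -> List[str]:
    """Return `inputs` itself if it is a file, else every file beneath it (sorted)."""
    if not os.path.isdir(inputs):
        return [inputs]
    found = []
    for root, _dirs, files in os.walk(inputs):
        for name in files:
            found.append(os.path.join(root, name))
    return sorted(found)


def plan_outputs(inputs: str, output_dir: str,
                 pdf_page_count: Callable[[str], int]) -> List[str]:
    """Map every input image or PDF page to an output path under `output_dir`."""
    # Ensure the output directory exists before any path is handed out.
    os.makedirs(output_dir, exist_ok=True)
    outputs = []
    for path in collect_files(inputs):
        base = os.path.basename(path)
        if path.lower().endswith(".pdf"):
            # One output per page; the "<name>_<index>.jpg" scheme is assumed.
            for index in range(pdf_page_count(path)):
                outputs.append(os.path.join(output_dir, f"{base}_{index}.jpg"))
        else:
            # Non-PDF inputs are treated as single images.
            outputs.append(os.path.join(output_dir, base))
    return outputs
```

Passing the page counter in as a function keeps the sketch free of third-party dependencies; a real caller would obtain the count, and the page images, from the PDF library while holding the shared lock.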

Summary

This __init__.py initializes the document recognition package by exposing the recognizer classes and providing the init_in_out function, which loads images and PDF pages and prepares the matching output paths before they enter the document analysis pipeline. A global lock serializes PDF processing so that it stays thread-safe in concurrent environments.

The module depends on the internal recognition submodules (.ocr, .recognizer, .layout_recognizer, .table_structure_recognizer) and on external utilities such as pdfplumber, PIL, and the traversal_files helper.