t_ocr.py


Overview

t_ocr.py is a utility script designed to perform Optical Character Recognition (OCR) on a collection of input images or PDF files. It leverages the OCR module from the deepdoc.vision package to detect and extract text regions and their content from images. The script supports parallel execution on multiple CUDA-enabled GPUs using asynchronous concurrency with the trio library. Results are output as annotated images with bounding boxes drawn around detected text, as well as corresponding plain text files containing the recognized text.

This tool is primarily intended for batch OCR processing workflows where users provide a directory or individual files as input, and receive both visual and textual OCR outputs in a specified directory.


Detailed Explanation

Imports and Environment Setup


Main Components

Function: main(args)

Primary function that orchestrates the OCR workflow.

Parameters

  • args: the parsed command-line arguments, which main passes on to init_in_out.

Workflow
  1. CUDA Device Detection
    Uses torch.cuda.device_count() to determine the number of available GPUs.

  2. Capacity Limiters for Concurrency
    If multiple GPUs are available, creates a list of trio.CapacityLimiter instances, one per device, to restrict concurrency and avoid device contention.

  3. OCR Initialization
    Instantiates the OCR engine via ocr = OCR().

  4. Input/Output Initialization
    Calls init_in_out(args) which returns:

    • images: a list of loaded images.

    • outputs: corresponding output file paths.

  5. OCR Task Definition (__ocr)
    A synchronous function that:

    • Receives the task index (i), the device ID (id), and the image (img).

    • Converts the image to a NumPy array.

    • Runs OCR to detect text lines and bounding boxes.

    • Formats the OCR output into bounding boxes with coordinates and text.

    • Draws bounding boxes on the original image.

    • Saves the annotated image and a .txt file with extracted text.

  6. Asynchronous OCR Wrapper (__ocr_thread)
    An async wrapper that:

    • If a concurrency limiter is set (multiple GPUs), acquires it with async with limiter so that access to each device is controlled.

    • Runs the synchronous __ocr function in a thread to avoid blocking the event loop.

  7. OCR Launcher (__ocr_launcher)

    • If multiple GPUs are detected, runs concurrent OCR tasks across devices using a nursery.

    • With a single GPU, or on CPU only, runs the OCR tasks sequentially.

    • Uses await trio.sleep(0.1) to stagger task starts slightly.

  8. Execution
    Calls trio.run(__ocr_launcher) to run the async event loop and start OCR tasks.

  9. Completion
    Prints a completion message after all OCR tasks finish.
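
The control flow of steps 2, 6, and 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the script's actual code. It swaps in the standard-library asyncio for trio so that it runs without extra dependencies: asyncio.Semaphore(1) stands in for trio.CapacityLimiter, asyncio.to_thread for running the blocking call in a worker thread, and asyncio.gather for the nursery. The names fake_ocr, ocr_worker, launch, and run_demo are hypothetical, as are the round-robin device assignment, the limiter capacity of one, and the shortened stagger delay. It requires Python 3.9 or later.

```python
import asyncio


def fake_ocr(task_index, device_id, image):
    # Hypothetical stand-in for the blocking OCR call.
    # It only reports which device handled which task.
    return (task_index, device_id)


async def ocr_worker(task_index, device_id, image, limiter, results):
    if limiter is not None:
        # Multi-device case: hold the device's limiter while the blocking
        # work runs in a worker thread, keeping the event loop responsive.
        async with limiter:
            results[task_index] = await asyncio.to_thread(
                fake_ocr, task_index, device_id, image
            )
    else:
        # Single-device or CPU case: call the function directly.
        results[task_index] = fake_ocr(task_index, device_id, image)


async def launch(images, device_count):
    results = {}
    if device_count > 1:
        # One limiter per device (assumed capacity: one task at a time).
        limiters = [asyncio.Semaphore(1) for _ in range(device_count)]
        tasks = []
        for i, img in enumerate(images):
            dev = i % device_count  # assumed round-robin assignment
            tasks.append(
                asyncio.create_task(ocr_worker(i, dev, img, limiters[dev], results))
            )
            await asyncio.sleep(0.01)  # stagger task starts slightly
        await asyncio.gather(*tasks)
    else:
        for i, img in enumerate(images):
            await ocr_worker(i, 0, img, None, results)
    return results


def run_demo(images, device_count):
    # Start the event loop and run all tasks to completion.
    return asyncio.run(launch(images, device_count))
```

Calling run_demo(["a", "b", "c"], 2) dispatches the three placeholder tasks across two pretend devices, while run_demo(["a", "b"], 1) takes the sequential branch. Holding a per-device limiter around the threaded call is what keeps two tasks from using the same device at once.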

Returns

None. Outputs are saved to disk.
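
As an illustration of the formatting and saving described in step 5, the sketch below turns raw OCR lines into bounding-box records and writes the recognized text to a file. The input shape (a four-point quadrilateral paired with a (text, score) tuple), the helper names format_boxes and save_text, and the convention of appending .txt to the image output path are all assumptions made for the example; the real OCR return format may differ. Drawing the boxes on the image is left out.

```python
from pathlib import Path


def format_boxes(raw_lines):
    """Convert assumed raw OCR lines into dicts with text and an axis-aligned bbox."""
    boxes = []
    for quad, (text, _score) in raw_lines:
        xs = [point[0] for point in quad]
        ys = [point[1] for point in quad]
        # Collapse the quadrilateral to [x0, y0, x1, y1].
        boxes.append({"text": text, "bbox": [min(xs), min(ys), max(xs), max(ys)]})
    return boxes


def save_text(boxes, image_output_path):
    """Write one recognized line per row next to the annotated image (assumed naming)."""
    txt_path = Path(str(image_output_path) + ".txt")
    txt_path.write_text("\n".join(b["text"] for b in boxes), encoding="utf-8")
    return txt_path
```

With an annotated image saved as page.jpg, save_text would write page.jpg.txt in the same directory under this naming assumption.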

Usage Example
python t_ocr.py --inputs ./input_images --output_dir ./ocr_results

Command-Line Interface

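The script uses argparse to define two arguments:

  • --inputs: a directory of images or PDF files, or a single file, to run OCR on.

  • --output_dir: the directory where the annotated images and text files are written.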
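
A minimal sketch of an argparse parser with the two arguments used in the usage example, --inputs and --output_dir. The function name build_parser, the help strings, the required flag, and the default output directory are assumptions for illustration and are not taken from the script.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Batch OCR over images or PDF files")
    # Assumed to be required: there is nothing to process without inputs.
    parser.add_argument(
        "--inputs",
        required=True,
        help="Directory of images or PDFs, or a single file",
    )
    # The default value here is an assumption for the sketch.
    parser.add_argument(
        "--output_dir",
        default="./ocr_outputs",
        help="Directory for annotated images and text files",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.inputs, args.output_dir)
```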


Important Implementation Details and Algorithms


Interactions with Other System Components

This file acts as a CLI tool that ties together the OCR module from deepdoc.vision, CUDA device detection through torch, and the concurrency primitives of trio to perform batched OCR with GPU acceleration and asynchronous concurrency.


Visual Diagram

flowchart TD
    A[Start: Parse CLI Arguments]
    B[Initialize OCR Engine]
    C[Load Inputs & Prepare Outputs]
    D{Detect CUDA Devices}
    E["Create CapacityLimiters (if multiple GPUs)"]
    F[For Each Image]
    G{Multiple GPUs?}
    H[Run OCR Task with CapacityLimiter]
    I[Run OCR Task Sequentially]
    J["OCR Task (__ocr)"]
    K[Draw Bounding Boxes]
    L[Save Annotated Image & Text]
    M[All Tasks Completed]

    A --> D --> E --> B --> C
    C --> F
    F --> G
    G -- Yes --> H --> J --> K --> L
    G -- No --> I --> J --> K --> L
    L --> F
    F --> M

Summary

t_ocr.py is an asynchronous batch OCR script that loads images or PDFs, applies OCR to extract text and bounding boxes, annotates the images, and saves both visual and textual results. When more than one CUDA GPU is available, it spreads tasks across the devices and uses one capacity limiter per device to avoid contention; otherwise it processes the inputs sequentially. The script is run from the command line with two arguments that specify the input and output locations. It relies on the OCR module of the deepdoc.vision package and can serve as a component in document digitization or analysis pipelines.

