document_app.py


Overview

document_app.py is a Flask-based web API module that manages document operations within the InfiniFlow platform. It serves as a controller layer, exposing RESTful endpoints for uploading, parsing, listing, modifying, and deleting documents associated with knowledgebases. It coordinates several backend services (document, file, task, user, and knowledgebase) to carry out workflows such as document ingestion, parsing, metadata management, task queuing, and access control.

The API routes defined in this file require user authentication (using flask_login) and validate their input. The endpoints accept documents from file uploads and from web crawling, support different document parsers, and cover document lifecycle operations such as running parsing tasks, changing document status, renaming, and retrieving thumbnails or file content.
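The authenticate, validate, then delegate flow described above can be illustrated with a small framework-free sketch. It deliberately avoids Flask and flask_login so it runs on the standard library alone. Every name in it (validate_fields, InMemoryDocumentService, update_name, the response dictionaries and their codes) is a hypothetical stand-in chosen for illustration, not the actual code or signatures in document_app.py.

```python
from functools import wraps


def validate_fields(*required):
    """Hypothetical decorator: require a logged-in user and all named fields."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(user, payload):
            # Stand-in for the login check a real route gets from flask_login.
            if user is None:
                return {"code": 401, "message": "Login required."}
            # Reject the request if any required field is absent or empty.
            missing = [name for name in required if not payload.get(name)]
            if missing:
                return {"code": 400, "message": "Missing: " + ", ".join(missing)}
            return handler(user, payload)
        return wrapper
    return decorator


class InMemoryDocumentService:
    """Stand-in for a backend service; NOT the real DocumentService."""

    def __init__(self):
        self.docs = {"d1": {"name": "old.pdf", "owner": "alice"}}

    def accessible(self, doc_id, user):
        # Assumed signature: True only if the document exists and belongs to user.
        doc = self.docs.get(doc_id)
        return doc is not None and doc["owner"] == user

    def update_name(self, doc_id, name):
        self.docs[doc_id]["name"] = name


service = InMemoryDocumentService()


@validate_fields("doc_id", "name")
def rename(user, payload):
    """Controller-style handler: check access, then delegate to the service."""
    if not service.accessible(payload["doc_id"], user):
        return {"code": 403, "message": "No authorization."}
    service.update_name(payload["doc_id"], payload["name"])
    return {"code": 0, "data": True}
```

In this sketch the handler holds no storage logic of its own: it only checks preconditions and forwards the call, which is the role a controller layer plays relative to its services.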

Key Features:


Detailed Description of Endpoints and Functions

1. upload()


2. web_crawl()


3. create()


4. list_docs()


5. get_filter()


6. docinfos()


7. thumbnails()


8. change_status()


9. rm()


10. run()


11. rename()


12. get(doc_id)


13. change_parser()


14. get_image(image_id)


15. upload_and_parse()


16. parse()


17. set_meta()


Important Implementation Details and Algorithms


Interaction with Other Parts of the System


Visual Diagram

classDiagram
    class DocumentApp {
        +upload()
        +web_crawl()
        +create()
        +list_docs()
        +get_filter()
        +docinfos()
        +thumbnails()
        +change_status()
        +rm()
        +run()
        +rename()
        +get(doc_id)
        +change_parser()
        +get_image(image_id)
        +upload_and_parse()
        +parse()
        +set_meta()
    }

    class DocumentService {
        +get_by_id()
        +query()
        +insert()
        +update_by_id()
        +remove_document()
        +get_by_kb_id()
        +accessible()
        +accessible4deletion()
        +get_filter_by_kb_id()
        +get_by_ids()
        +get_thumbnails()
        +clear_chunk_num_when_rerun()
        +increment_chunk_num()
        +update_parser_config()
        +count_by_kb_id()
        +get_tenant_id()
    }

    class FileService {
        +upload_document()
        +get_root_folder()
        +init_knowledgebase_docs()
        +get_kb_folder()
        +new_a_file_from_kb()
        +add_file_from_kb()
        +filter_delete()
        +get_by_id()
        +update_by_id()
        +parse_docs()
    }

    class TaskService {
        +cancel_all_task_of()
        +filter_delete()
        +queue_tasks()
    }

    class KnowledgebaseService {
        +get_by_id()
        +query()
        +delete_field_map()
    }

    class File2DocumentService {
        +get_storage_address()
        +get_by_document_id()
        +delete_by_document_id()
    }

    class UserTenantService {
        +query()
    }

    DocumentApp --> DocumentService : uses
    DocumentApp --> FileService : uses
    DocumentApp --> TaskService : uses
    DocumentApp --> KnowledgebaseService : uses
    DocumentApp --> File2DocumentService : uses
    DocumentApp --> UserTenantService : uses

Summary

document_app.py is a critical component for document lifecycle management in the InfiniFlow platform. It exposes authenticated APIs that cover document upload, creation, listing, parsing, metadata setting, and deletion, and it validates input on each route. The module integrates multiple backend services and utilities to manage knowledgebase documents, including workflows such as web crawling and changing a document's parser. Its design abstracts storage and parsing details while giving callers control over documents and their processing state.