file2document_app.py
Overview
The file2document_app.py file provides RESTful API endpoints for managing the conversion of files into documents within the InfiniFlow system. It handles associating files (or folders containing files) with documents in one or more knowledgebases, as well as removing these associations. This process involves querying file metadata, knowledgebase configurations, and document records, then performing create and delete operations on documents and their mappings to files.
This file is a part of the backend API layer, built with Flask and Flask-Login for authentication, and interacts extensively with service-layer modules dealing with files, documents, knowledgebases, and their relationships.
Classes and Functions
This file does not define any classes but contains two main Flask route handler functions:
1. convert()
Purpose
Convert one or more files (or folders) into documents linked to specified knowledgebases. This includes:
Handling folders by recursively fetching all inner files.
Removing existing document mappings and documents related to the files.
Creating new documents for each file-knowledgebase pair.
Returning a JSON response with the new file-document mappings.
Decorators
@manager.route('/convert', methods=['POST']): Exposes the function at the /convert path as a POST endpoint.
@login_required: Requires the user to be logged in.
@validate_request("file_ids", "kb_ids"): Validates that the POST request JSON contains file_ids and kb_ids.
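To make the role of the validation decorator concrete, here is a minimal framework-free sketch. The names require_keys and convert_stub, and the shape of the error dictionary, are hypothetical and invented for this example; the real validate_request helper reads the Flask request body and may behave differently.

```python
from functools import wraps


def require_keys(*required):
    """Hypothetical stand-in for a request-validation decorator.

    Assumption: the handler receives the parsed JSON body as its first
    argument. The real helper inspects the Flask request instead.
    """
    def decorator(handler):
        @wraps(handler)
        def wrapper(payload, *args, **kwargs):
            missing = [key for key in required if key not in payload]
            if missing:
                # Return an error payload instead of raising, in the spirit
                # of the JSON error responses this module returns.
                return {"ok": False, "message": "missing: " + ", ".join(missing)}
            return handler(payload, *args, **kwargs)
        return wrapper
    return decorator


@require_keys("file_ids", "kb_ids")
def convert_stub(payload):
    # Placeholder body: only reports how many file/knowledgebase pairs exist.
    return {"ok": True, "pairs": len(payload["file_ids"]) * len(payload["kb_ids"])}
```

Calling convert_stub with a body that lacks kb_ids returns the error dictionary without running the handler body, which is the behavior the decorator is meant to guarantee.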
Parameters
None (uses request.json to extract input).
Expected JSON body keys:
file_ids (list of strings): IDs of files or folders to be converted.
kb_ids (list of strings): Knowledgebase IDs to associate the documents with.
Returns
JSON response with status and data:
On success: List of created file-to-document mappings (file2documents).
On failure: Error messages indicating missing files, documents, tenants, or database errors.
Usage Example
POST /convert
Content-Type: application/json
Authorization: Bearer <token>
{
"file_ids": ["file123", "folder456"],
"kb_ids": ["kb789", "kb101"]
}
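The same request can be assembled from Python. The sketch below only builds the request object and does not send it. The helper name build_convert_request, the base URL, and the Bearer scheme are assumptions carried over from the example above, so adjust them to the actual deployment.

```python
import json
import urllib.request


def build_convert_request(base_url, token, file_ids, kb_ids):
    """Build (but do not send) a POST request for the convert endpoint.

    Assumptions: base_url and the Authorization scheme are placeholders
    taken from the usage example, not confirmed deployment values.
    """
    body = json.dumps({"file_ids": file_ids, "kb_ids": kb_ids}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/convert",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
    )


req = build_convert_request(
    "http://localhost:8000",  # hypothetical host and port
    "<token>",
    ["file123", "folder456"],
    ["kb789", "kb101"],
)
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
```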
Implementation Details
Retrieves files by IDs.
For folders, recursively finds all innermost files.
For each file:
Deletes existing documents and mappings.
Creates new documents for each knowledgebase:
Generates a unique UUID for the document.
Determines the parser to use based on file type and knowledgebase parser config.
Inserts document and mapping records.
Returns the new mappings as JSON.
Error handling is performed at multiple steps, returning appropriate error results if any entity cannot be found or operations fail.
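The delete-then-create flow above can be sketched with in-memory stand-ins for the database tables. This is an illustration under stated assumptions, not the module's code: convert_files, the dictionary layouts, and the use of uuid.uuid4() in place of get_uuid() are all hypothetical, and parser selection and folder expansion are left out.

```python
import uuid


def convert_files(file_ids, kb_ids, documents, mappings):
    """Replace each file's documents with one new document per knowledgebase.

    documents: dict of doc_id -> {"kb_id": ..., "file_id": ...}
    mappings:  list of {"id", "file_id", "document_id"} dicts
    Both are in-memory stand-ins (assumptions) for the real tables.
    """
    created = []
    for file_id in file_ids:
        # Step 1: drop stale documents and mappings for this file.
        stale = [m for m in mappings if m["file_id"] == file_id]
        for mapping in stale:
            documents.pop(mapping["document_id"], None)
            mappings.remove(mapping)

        # Step 2: create one document and one mapping per knowledgebase.
        for kb_id in kb_ids:
            doc_id = uuid.uuid4().hex
            documents[doc_id] = {"kb_id": kb_id, "file_id": file_id}
            mapping = {"id": uuid.uuid4().hex, "file_id": file_id, "document_id": doc_id}
            mappings.append(mapping)
            created.append(mapping)
    return created


documents = {"old-doc": {"kb_id": "kb789", "file_id": "file123"}}
mappings = [{"id": "m0", "file_id": "file123", "document_id": "old-doc"}]
created = convert_files(["file123"], ["kb789", "kb101"], documents, mappings)
```

After the call, the stale document is gone and the file has exactly one document per knowledgebase, which is why converting the same file twice does not accumulate duplicates.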
2. rm()
Purpose
Remove documents and their associations with files for given file IDs.
Decorators
@manager.route('/rm', methods=['POST']): Exposes the function at the /rm path as a POST endpoint.
@login_required: Requires the user to be logged in.
@validate_request("file_ids"): Validates that the POST request JSON contains file_ids.
Parameters
None (uses request.json).
Expected JSON body key:
file_ids (list of strings): IDs of files whose document associations are to be removed.
Returns
JSON response indicating success (true) or failure with appropriate messages.
Usage Example
POST /rm
Content-Type: application/json
Authorization: Bearer <token>
{
"file_ids": ["file123", "file456"]
}
Implementation Details
For each file ID:
Retrieves associated file-to-document mappings.
Deletes all mappings for the file.
For each related document:
Retrieves tenant info.
Removes the document from the database.
Returns success or error results accordingly.
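The removal steps above can be sketched the same way, again with in-memory stand-ins. The function name remove_file_documents, the (ok, message) return shape, and the tenant_id field are assumptions made for this illustration; the real handler goes through the service layer and returns JSON results.

```python
def remove_file_documents(file_ids, documents, mappings):
    """Delete mappings and their documents for the given files.

    documents: dict of doc_id -> {"tenant_id": ...}   (hypothetical layout)
    mappings:  list of {"file_id", "document_id"} dicts
    Returns (True, None) on success or (False, message) at the first error.
    """
    for file_id in file_ids:
        related = [m for m in mappings if m["file_id"] == file_id]
        # Delete all mappings for the file before touching the documents.
        mappings[:] = [m for m in mappings if m["file_id"] != file_id]
        for mapping in related:
            doc_id = mapping["document_id"]
            doc = documents.get(doc_id)
            if doc is None:
                return False, f"Document {doc_id} not found"
            if doc.get("tenant_id") is None:
                return False, "Tenant not found"
            del documents[doc_id]
    return True, None
```

Returning a status tuple instead of raising keeps the sketch close to the error-result style the endpoint uses, where each failed lookup short-circuits into an error response.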
Important Implementation Details
File Type Handling: If the input file is a folder, the system uses FileService.get_all_innermost_file_ids to retrieve all files within it to process individually.
Document Removal: Before creating new document entries, existing documents linked to the files are deleted to avoid duplicates or stale data.
Parser Resolution: The parser for the document is determined dynamically based on the file type, file name, and knowledgebase parser ID.
Error Handling: The code uses multiple layers of error checks, returning JSON error responses instead of raising exceptions directly.
UUID Generation: Each new document and file-to-document mapping is assigned a unique UUID using get_uuid().
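The folder handling described under File Type Handling amounts to a recursive walk that keeps only non-folder entries. The sketch below shows the idea on a small in-memory tree. The function innermost_file_ids and the tree and types dictionaries are hypothetical and do not reflect the signature or storage model of FileService.get_all_innermost_file_ids.

```python
def innermost_file_ids(node_id, tree, types):
    """Collect IDs of all non-folder files at or below node_id.

    tree:  dict of folder_id -> list of child IDs (hypothetical layout)
    types: dict of id -> "folder" or any other file type
    """
    if types[node_id] != "folder":
        return [node_id]
    result = []
    for child_id in tree.get(node_id, []):
        result.extend(innermost_file_ids(child_id, tree, types))
    return result


types = {"folder456": "folder", "sub": "folder", "a.pdf": "pdf", "b.docx": "doc"}
tree = {"folder456": ["a.pdf", "sub"], "sub": ["b.docx"]}
file_ids = innermost_file_ids("folder456", tree, types)
# file_ids == ["a.pdf", "b.docx"]
```

A plain file passed in directly comes back as a single-element list, so callers can treat files and folders uniformly.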
Interactions with Other Components
FileService: Retrieves file metadata, including folder contents and parser information.
File2DocumentService: Manages mappings between files and documents (fetching, inserting, deleting).
DocumentService: Handles document creation, retrieval, and removal, including tenant validation.
KnowledgebaseService: Retrieves knowledgebase information used to associate documents.
API Utilities: Helper functions for consistent API response formatting and request validation.
Flask & Flask-Login: Provide routing, request handling, and session-based user authentication.
The endpoints exposed here are likely consumed by frontend components or other backend services that manage file ingestion, knowledgebase management, and document processing workflows in the InfiniFlow platform.
Mermaid Class Diagram
classDiagram
class file2document_app {
<<module>>
+convert()
+rm()
}
class FileService {
+get_by_ids(ids: list) File[]
+get_all_innermost_file_ids(folder_id: str, acc: list) list
+get_by_id(file_id: str) (bool, File)
+get_parser(type: str, name: str, kb_parser_id: str) str
}
class File2DocumentService {
+get_by_file_id(file_id: str) list
+insert(data: dict) File2Document
+delete_by_file_id(file_id: str)
}
class DocumentService {
+get_by_id(doc_id: str) (bool, Document)
+get_tenant_id(doc_id: str) str
+remove_document(doc: Document, tenant_id: str) bool
+insert(data: dict) Document
}
class KnowledgebaseService {
+get_by_id(kb_id: str) (bool, Knowledgebase)
}
file2document_app ..> FileService : uses
file2document_app ..> File2DocumentService : uses
file2document_app ..> DocumentService : uses
file2document_app ..> KnowledgebaseService : uses
Summary
The file2document_app.py file is a backend API module responsible for converting files into documents and managing their lifecycle within knowledgebases. It provides secure endpoints for creating and deleting file-document relationships, ensuring data consistency through careful validation and error handling. The module integrates tightly with service layers that abstract database operations, making it a crucial part of the InfiniFlow document ingestion and management pipeline.