file_service.py


Overview

file_service.py is a core service module in the InfiniFlow system that manages file-related operations for a multi-tenant knowledge management platform. It handles the files and folders stored in tenants' knowledge bases: hierarchical folder structures, file metadata, document upload and parsing, and integration with storage backends.

The FileService class encapsulates operations such as retrieving files by folder, creating folders recursively, managing knowledge base folders, uploading and parsing documents, moving files, and interacting with linked documents. It leverages Peewee ORM for database interactions, integrates with external parsers, and uses a pluggable storage implementation for file blobs.


Classes and Methods

Class: FileService(CommonService)

Service class managing file entities and associated operations.


Methods

get_by_pf_id(tenant_id, pf_id, page_number, items_per_page, orderby, desc, keywords)

Retrieves a paginated list of files under a specific parent folder (identified by pf_id) for a tenant, sorted by orderby in ascending or descending order (desc) and optionally filtered by search keywords.
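
The filter, sort, and slice steps behind a paginated listing can be sketched against an in-memory list. This is a minimal illustration, not the service's query: the dict-based records, the helper name paginate_children, the 1-indexed pages, and the (page, total) return shape are all assumptions made for the example.

```python
from operator import itemgetter


def paginate_children(files, pf_id, page_number, items_per_page,
                      orderby="name", desc=False, keywords=""):
    """Return (page, total) for the direct children of pf_id (hypothetical model)."""
    # Keep only direct children whose name contains the keyword (case-insensitive).
    matches = [
        f for f in files
        if f["parent_id"] == pf_id and keywords.lower() in f["name"].lower()
    ]
    matches.sort(key=itemgetter(orderby), reverse=desc)
    # Pages are assumed to be 1-indexed in this sketch.
    start = (page_number - 1) * items_per_page
    return matches[start:start + items_per_page], len(matches)


files = [
    {"id": "f1", "parent_id": "root", "name": "report.pdf"},
    {"id": "f2", "parent_id": "root", "name": "notes.txt"},
    {"id": "f3", "parent_id": "root", "name": "Report-2.pdf"},
    {"id": "f4", "parent_id": "other", "name": "report.pdf"},
]
page, total = paginate_children(files, "root", 1, 1, keywords="report")
```

Returning the total alongside the page lets a caller compute the number of pages without a second pass.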


get_kb_id_by_file_id(file_id)

Fetches knowledge base IDs and names linked to a given file ID.


get_by_pf_id_name(id, name)

Retrieves a file record by parent folder ID and file name.


get_id_list_by_id(id, name, count, res)

Recursively resolves the list of folder names in name, starting at index count beneath the entry id, and collects the ID of each matched entry in res.
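
The recursion can be sketched with a plain dictionary in place of the database lookup. The children mapping below is a hypothetical stand-in, and stopping at the first missing path component is an assumption of this sketch rather than confirmed behavior.

```python
def get_id_list_by_id(children, parent_id, names, count, res):
    """Resolve names[count:] beneath parent_id, appending each matched id to res."""
    # children maps (parent_id, name) -> child id; hypothetical stand-in for the DB lookup.
    if count >= len(names):
        return res
    child_id = children.get((parent_id, names[count]))
    if child_id is None:
        return res  # the path breaks here; return what was resolved so far
    res.append(child_id)
    return get_id_list_by_id(children, child_id, names, count + 1, res)


children = {("root", "docs"): "d1", ("d1", "2024"): "d2"}
ids = get_id_list_by_id(children, "root", ["docs", "2024", "q1"], 0, [])
```

Here "q1" does not exist under "2024", so only the first two components are resolved.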


get_all_innermost_file_ids(folder_id, result_ids)

Retrieves the IDs of all innermost entries (those with nothing nested beneath them) reachable from folder_id, accumulating them in result_ids.
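
One plausible reading of "innermost" is "entries with no children". Under that assumption, the traversal is a depth-first walk that records an ID only when nothing is nested below it. The tree mapping here is a hypothetical in-memory model, not the File table.

```python
def get_all_innermost_ids(tree, folder_id, result_ids):
    """Collect ids of entries under folder_id that have no children of their own."""
    # tree maps a folder id to the ids of its direct children (hypothetical model).
    kids = tree.get(folder_id, [])
    if not kids:
        result_ids.append(folder_id)  # nothing below: this is an innermost entry
        return result_ids
    for kid in kids:
        get_all_innermost_ids(tree, kid, result_ids)
    return result_ids


tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": []}
leaves = get_all_innermost_ids(tree, "root", [])
```

Note that under this reading an empty folder such as "b" counts as innermost, as does a folder_id that has no children at all.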


create_folder(file, parent_id, name, count)

Recursively creates a folder hierarchy from the given list of folder names.
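
A minimal sketch of recursive folder creation, assuming one new folder per name, each nested inside the previous one. The store dictionary, the helper name create_folder_chain, and the record fields are illustrative; the real method's handling of the file argument and of the final path component is not shown in this document and is not reproduced here.

```python
import uuid


def create_folder_chain(store, parent_id, names, count=0):
    """Create one folder per remaining name; return the id of the deepest one."""
    if count >= len(names):
        return parent_id
    folder_id = uuid.uuid4().hex
    # Each new folder points at the folder created in the previous step.
    store[folder_id] = {"parent_id": parent_id, "name": names[count], "type": "folder"}
    return create_folder_chain(store, folder_id, names, count + 1)


store = {}
leaf_id = create_folder_chain(store, "root", ["projects", "2024", "reports"])
```

Following parent_id from leaf_id back to "root" visits "reports", "2024", and "projects" in that order.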


is_parent_folder_exist(parent_id)

Checks whether a folder with the given ID exists.


get_root_folder(tenant_id)

Retrieves or creates the root folder for a given tenant.
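
The "retrieve or create" behavior is a get-or-create pattern. The sketch below assumes, purely for illustration, that a root folder is marked by pointing at itself (parent_id equal to id) and is named "/"; both conventions, and the list-of-dicts store, are assumptions rather than details taken from this document.

```python
import uuid


def get_root_folder(files, tenant_id):
    """Return the tenant's root folder, creating it on first use."""
    for f in files:
        # Assumed convention: the root is its own parent.
        if f["tenant_id"] == tenant_id and f["parent_id"] == f["id"]:
            return f
    root_id = uuid.uuid4().hex
    root = {"id": root_id, "parent_id": root_id, "tenant_id": tenant_id, "name": "/"}
    files.append(root)
    return root


files = []
first = get_root_folder(files, "tenant-a")
second = get_root_folder(files, "tenant-a")
other = get_root_folder(files, "tenant-b")
```

Calling the function twice for the same tenant returns the same record, so callers do not need to check for existence first.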


get_kb_folder(tenant_id)

Fetches the knowledge base folder under the tenant's root folder, creating it if necessary.


new_a_file_from_kb(tenant_id, name, parent_id, ty=FileType.FOLDER.value, size=0, location="")

Creates a new file record linked to a knowledge base folder.


init_knowledgebase_docs(root_id, tenant_id)

Initializes knowledge base documents under the root folder for a tenant.


get_parent_folder(file_id)

Retrieves the parent folder of a given file.


get_all_parent_folders(start_id)

Retrieves all parent folders in the hierarchy for a given file ID.
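
Walking up the hierarchy amounts to following parent_id until the root is reached. This sketch assumes that the root is its own parent and that the starting entry is included in the result; both are choices made for the example, and the files mapping is a hypothetical in-memory model.

```python
def get_all_parent_folders(files, start_id):
    """Return the chain of records from start_id up to the root, inclusive."""
    chain = []
    current = files.get(start_id)
    while current is not None:
        chain.append(current)
        if current["parent_id"] == current["id"]:
            break  # assumed convention: the root points at itself
        current = files.get(current["parent_id"])
    return chain


files = {
    "root": {"id": "root", "parent_id": "root"},
    "docs": {"id": "docs", "parent_id": "root"},
    "a.pdf": {"id": "a.pdf", "parent_id": "docs"},
}
chain_ids = [f["id"] for f in get_all_parent_folders(files, "a.pdf")]
```

The resulting order, nearest ancestor first, is convenient for building breadcrumb-style paths by reversing the list.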


insert(file)

Inserts a new file record into the database.


delete(file)

Deletes the given file record.


delete_by_pf_id(folder_id)

Deletes files where parent_id equals folder_id.


delete_folder_by_pf_id(user_id, folder_id)

Recursively deletes a folder and all of its subfolders and files for a user.
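
Recursive deletion can be sketched as a post-order walk: remove everything beneath a folder first, then the folder itself. The in-memory files mapping and the returned count are assumptions of this example, and the user_id ownership check is omitted because its details are not described here.

```python
def delete_folder(files, folder_id):
    """Remove folder_id and everything beneath it; return how many entries were removed."""
    removed = 0
    # Snapshot the direct children before mutating the mapping.
    child_ids = [
        fid for fid, f in files.items()
        if f["parent_id"] == folder_id and fid != folder_id
    ]
    for child_id in child_ids:
        removed += delete_folder(files, child_id)
    if files.pop(folder_id, None) is not None:
        removed += 1
    return removed


files = {
    "root": {"parent_id": "root"},
    "docs": {"parent_id": "root"},
    "old": {"parent_id": "docs"},
    "a.pdf": {"parent_id": "old"},
    "keep.txt": {"parent_id": "root"},
}
removed = delete_folder(files, "docs")
```

Deleting children before the parent means a failure partway through never leaves orphaned entries whose parent is already gone.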


get_file_count(tenant_id)

Returns the total number of files for a tenant.


get_folder_size(folder_id)

Calculates the cumulative size of a folder (including nested files/folders).
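
The cumulative size is a recursive sum: files contribute their own size, and subfolders contribute the size of their contents. The record layout and the "folder" type value below are hypothetical stand-ins for the real model.

```python
def get_folder_size(files, folder_id):
    """Sum the sizes of all files nested anywhere beneath folder_id."""
    total = 0
    for f in files.values():
        if f["parent_id"] != folder_id or f["id"] == folder_id:
            continue  # not a direct child (or the folder itself)
        if f["type"] == "folder":
            total += get_folder_size(files, f["id"])
        else:
            total += f["size"]
    return total


files = {
    "docs": {"id": "docs", "parent_id": "root", "type": "folder", "size": 0},
    "a.pdf": {"id": "a.pdf", "parent_id": "docs", "type": "pdf", "size": 100},
    "sub": {"id": "sub", "parent_id": "docs", "type": "folder", "size": 0},
    "b.txt": {"id": "b.txt", "parent_id": "sub", "type": "doc", "size": 20},
}
size = get_folder_size(files, "docs")
```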


add_file_from_kb(doc, kb_folder_id, tenant_id)

Adds a file entry linked to a knowledge base document.


move_file(file_ids, folder_id)

Moves multiple files to a new parent folder.


upload_document(kb, file_objs, user_id)

Uploads multiple document files to a knowledge base, then processes and stores them.


parse_docs(file_objs, user_id)

Parses multiple document files concurrently.
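
Concurrent parsing of several files can be sketched with a thread pool from the standard library. The placeholder parse_one function, the (filename, bytes) input pairs, and the worker count are assumptions for this example and do not reflect the parsers the service actually calls.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_one(filename, blob):
    # Placeholder parser (an assumption for this sketch): decode bytes as UTF-8 text.
    return blob.decode("utf-8", errors="replace")


def parse_docs(file_objs, max_workers=4):
    """file_objs is a list of (filename, bytes) pairs; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(parse_one, name, blob) for name, blob in file_objs]
        # Collecting in submission order keeps results aligned with the inputs.
        return [future.result() for future in futures]


texts = parse_docs([("a.txt", b"alpha"), ("b.txt", b"beta")])
```

Calling result() on each future re-raises any exception from the worker, so a failed parse surfaces to the caller instead of being silently dropped.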


parse(filename, blob, img_base64=True, tenant_id=None)

Parses a single document blob into text or a base64-encoded image.


get_parser(doc_type, filename, default)

Determines parser type based on document type and file extension.
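
Parser selection of this kind is typically a short chain of checks: first on the document type, then on the file extension, with the caller's default as the fallback. The type strings, parser names, and extension lists below are illustrative assumptions, not the service's actual constants.

```python
import re


def get_parser(doc_type, filename, default):
    """Pick a parser name from the document type, then the extension, else the default."""
    # All string constants here are hypothetical placeholders.
    if doc_type == "visual":
        return "picture"
    if doc_type == "aural":
        return "audio"
    if re.search(r"\.(ppt|pptx|pages)$", filename, re.IGNORECASE):
        return "presentation"
    if re.search(r"\.eml$", filename, re.IGNORECASE):
        return "email"
    return default


choice = get_parser("pdf", "slides.PPTX", "naive")
```

Checking the document type before the extension means an image named "deck.pptx.png" is still routed by its type.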


get_blob(user_id, location)

Retrieves a binary blob from the user's downloads storage.


put_blob(user_id, location, blob)

Stores a binary blob in the user's downloads storage.
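
The pair get_blob and put_blob can be pictured as thin wrappers over a pluggable storage object that exposes put and get. The sketch below uses a hypothetical in-memory backend, and the per-user bucket naming scheme is an assumption made for the example.

```python
class InMemoryStorage:
    """Hypothetical storage backend keeping blobs in nested dictionaries."""

    def __init__(self):
        self._buckets = {}

    def put(self, bucket, location, blob):
        self._buckets.setdefault(bucket, {})[location] = blob

    def get(self, bucket, location):
        return self._buckets.get(bucket, {}).get(location)


STORAGE = InMemoryStorage()


def downloads_bucket(user_id):
    # Assumed naming scheme: one downloads bucket per user.
    return f"{user_id}-downloads"


def put_blob(user_id, location, blob):
    STORAGE.put(downloads_bucket(user_id), location, blob)


def get_blob(user_id, location):
    return STORAGE.get(downloads_bucket(user_id), location)


put_blob("u1", "reports/a.pdf", b"%PDF-1.7")
data = get_blob("u1", "reports/a.pdf")
```

Because the wrappers only depend on put and get, swapping the in-memory backend for an object store would not change their callers.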


Important Implementation Details and Algorithms


Interaction with Other System Components


Usage Example - Uploading Documents

from api.db.services.file_service import FileService

# kb is a Knowledgebase record, files is a list of uploaded FileStorage objects,
# and user_id is the ID of the current user; the caller supplies all three.
errors, uploaded_files = FileService.upload_document(kb, files, user_id)

if errors:
    print("Some files failed to upload:", errors)
else:
    print("All files uploaded successfully.")

Mermaid Class Diagram

classDiagram
    CommonService <|-- FileService
    class FileService {
        +model: File
        +get_by_pf_id(tenant_id, pf_id, page_number, items_per_page, orderby, desc, keywords)
        +get_kb_id_by_file_id(file_id)
        +get_by_pf_id_name(id, name)
        +get_id_list_by_id(id, name, count, res)
        +get_all_innermost_file_ids(folder_id, result_ids)
        +create_folder(file, parent_id, name, count)
        +is_parent_folder_exist(parent_id)
        +get_root_folder(tenant_id)
        +get_kb_folder(tenant_id)
        +new_a_file_from_kb(tenant_id, name, parent_id, ty, size, location)
        +init_knowledgebase_docs(root_id, tenant_id)
        +get_parent_folder(file_id)
        +get_all_parent_folders(start_id)
        +insert(file)
        +delete(file)
        +delete_by_pf_id(folder_id)
        +delete_folder_by_pf_id(user_id, folder_id)
        +get_file_count(tenant_id)
        +get_folder_size(folder_id)
        +add_file_from_kb(doc, kb_folder_id, tenant_id)
        +move_file(file_ids, folder_id)
        +upload_document(kb, file_objs, user_id)
        +parse_docs(file_objs, user_id)
        +parse(filename, blob, img_base64, tenant_id)
        +get_parser(doc_type, filename, default)
        +get_blob(user_id, location)
        +put_blob(user_id, location, blob)
    }

Summary

file_service.py implements the file management service for multi-tenant, hierarchical file storage integrated with knowledge bases in InfiniFlow. It provides CRUD operations for files and folders, document upload and parsing, and works with other database services and the storage backend. The design relies on recursive folder handling, parser selection by document type and file extension, and links between files and knowledge base documents.

This module is what lets users organize, upload, parse, and manage knowledge documents within their tenant spaces, and it underpins the platform's file-related workflows.