common.py
Overview
The common.py file is a utility module within the InfiniFlow project. It provides high-level helper functions for creating datasets, documents, chunks, chat assistants, and sessions in bulk through the ragflow_sdk API, which simplifies common workflows such as testing, data ingestion, and initialization.
The module wraps repetitive lower-level SDK calls in batch functions, so scripts or services that interact with the InfiniFlow knowledge graph and conversational AI components can reuse them instead of repeating the same loops.
Detailed Documentation of Functions
1. batch_create_datasets
def batch_create_datasets(client: RAGFlow, num: int) -> list[DataSet]:
Purpose: Creates multiple datasets in batch using the provided RAGFlow client.
Parameters:
client (RAGFlow): An instance of the RAGFlow client used to interact with the backend service.
num (int): The number of datasets to create.
Returns:
list[DataSet]: a list of newly created DataSet objects.
Usage Example:
client = RAGFlow(...)
datasets = batch_create_datasets(client, 5)
for ds in datasets:
print(ds.name)
Implementation Details:
Uses a list comprehension to create datasets with names formatted as "dataset_i", where i ranges from 0 to num-1.
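The naming pattern described above can be sketched as follows. This is a minimal illustration, not the module's actual source: StubClient and StubDataSet are hypothetical stand-ins so the example runs without a backend, and the sketch assumes the client exposes a create_dataset(name=...) method.

```python
from dataclasses import dataclass


@dataclass
class StubDataSet:
    """Hypothetical stand-in for ragflow_sdk's DataSet."""
    name: str


class StubClient:
    """Hypothetical stand-in for the RAGFlow client; makes no network calls."""

    def create_dataset(self, name: str) -> StubDataSet:
        return StubDataSet(name=name)


def batch_create_datasets(client, num: int) -> list:
    # One create call per index; names follow the "dataset_i" pattern.
    return [client.create_dataset(name=f"dataset_{i}") for i in range(num)]


datasets = batch_create_datasets(StubClient(), 3)
print([ds.name for ds in datasets])
```

With a real client, each iteration would be a separate request to the backend, so the total time grows with num.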
2. bulk_upload_documents
def bulk_upload_documents(dataset: DataSet, num: int, tmp_path: Path) -> list[Document]:
Purpose: Uploads multiple text documents to a specified dataset in bulk.
Parameters:
dataset (DataSet): The target dataset where documents will be uploaded.
num (int): The number of documents to create and upload.
tmp_path (Path): Filesystem path where temporary text files will be created before upload.
Returns:
list[Document]: a list of uploaded Document objects.
Usage Example:
from pathlib import Path
tmp_dir = Path("/tmp")
uploaded_docs = bulk_upload_documents(dataset, 3, tmp_dir)
for doc in uploaded_docs:
print(doc.display_name)
Implementation Details:
For each document:
Creates a temporary text file named "ragflow_test_upload_i.txt" in tmp_path using the helper create_txt_file.
Reads the file content as bytes (blob).
Collects document metadata (display_name, blob) into a list.
Then uses the DataSet.upload_documents method to upload all documents in one batch.
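The steps above can be sketched as below. Several parts are assumptions made for illustration: create_txt_file here is a hypothetical stand-in for the real helper in utils.file_utils, StubDataSet replaces the SDK's DataSet and simply echoes back the display names it receives instead of returning Document objects, and display_name is assumed to be the file's name.

```python
import tempfile
from pathlib import Path


def create_txt_file(path: Path) -> Path:
    """Hypothetical stand-in for utils.file_utils.create_txt_file."""
    path.write_text("This is a test TXT file.")
    return path


class StubDataSet:
    """Hypothetical stand-in for DataSet; returns the names it was given."""

    def upload_documents(self, document_list: list) -> list:
        return [d["display_name"] for d in document_list]


def bulk_upload_documents(dataset, num: int, tmp_path: Path) -> list:
    payload = []
    for i in range(num):
        fp = create_txt_file(tmp_path / f"ragflow_test_upload_{i}.txt")
        # Each entry pairs a display name (assumed: the file name) with raw bytes.
        payload.append({"display_name": fp.name, "blob": fp.read_bytes()})
    # A single call hands the whole batch to the dataset.
    return dataset.upload_documents(payload)


with tempfile.TemporaryDirectory() as tmp:
    uploaded = bulk_upload_documents(StubDataSet(), 2, Path(tmp))
print(uploaded)
```

Unlike the other helpers, this one builds its payload first and makes one upload call for the whole batch.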
3. batch_add_chunks
def batch_add_chunks(document: Document, num: int) -> list[Chunk]:
Purpose: Adds multiple chunks to a given document.
Parameters:
document (Document): The document to which chunks will be added.
num (int): Number of chunks to add.
Returns:
list[Chunk]: a list of created Chunk objects.
Usage Example:
chunks = batch_add_chunks(document, 10)
for chunk in chunks:
print(chunk.content)
Implementation Details:
Uses a list comprehension to add chunks with content "chunk test i", where i ranges from 0 to num-1.
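A minimal sketch of this pattern, assuming the document object exposes an add_chunk(content=...) method. StubDocument is a hypothetical in-memory stand-in that returns the content string rather than a Chunk object.

```python
class StubDocument:
    """Hypothetical stand-in for Document; keeps chunk contents in memory."""

    def __init__(self) -> None:
        self.chunks = []

    def add_chunk(self, content: str) -> str:
        self.chunks.append(content)
        return content


def batch_add_chunks(document, num: int) -> list:
    # One add call per index; content follows the "chunk test i" pattern.
    return [document.add_chunk(content=f"chunk test {i}") for i in range(num)]


doc = StubDocument()
chunks = batch_add_chunks(doc, 3)
print(chunks)
```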
4. batch_create_chat_assistants
def batch_create_chat_assistants(client: RAGFlow, num: int) -> list[Chat]:
Purpose: Creates multiple chat assistant entities in batch.
Parameters:
client (RAGFlow): The RAGFlow client instance.
num (int): Number of chat assistants to create.
Returns:
list[Chat]: a list of created Chat objects.
Usage Example:
chat_assistants = batch_create_chat_assistants(client, 4)
for chat in chat_assistants:
print(chat.name)
Implementation Details:
Uses a list comprehension to create chats named "test_chat_assistant_i".
5. batch_add_sessions_with_chat_assistant
def batch_add_sessions_with_chat_assistant(chat_assistant: Chat, num) -> list[Session]:
Purpose: Adds multiple sessions linked to a specific chat assistant.
Parameters:
chat_assistant (Chat): The chat assistant instance under which sessions will be created.
num (int): Number of sessions to create.
Returns:
list[Session]: a list of created Session objects.
Usage Example:
sessions = batch_add_sessions_with_chat_assistant(chat_assistant, 5)
for session in sessions:
print(session.name)
Implementation Details:
Uses a list comprehension to create sessions named "session_with_chat_assistant_i".
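Because sessions belong to a chat assistant, the two chat helpers are usually chained. The sketch below shows one way that could look. It assumes the client exposes create_chat(name=...) and the chat exposes create_session(name=...). StubClient, StubChat, and StubSession are hypothetical stand-ins so the example runs offline.

```python
from dataclasses import dataclass


@dataclass
class StubSession:
    """Hypothetical stand-in for ragflow_sdk's Session."""
    name: str


@dataclass
class StubChat:
    """Hypothetical stand-in for ragflow_sdk's Chat."""
    name: str

    def create_session(self, name: str) -> StubSession:
        return StubSession(name=name)


class StubClient:
    """Hypothetical stand-in for the RAGFlow client; makes no network calls."""

    def create_chat(self, name: str) -> StubChat:
        return StubChat(name=name)


def batch_create_chat_assistants(client, num: int) -> list:
    return [client.create_chat(name=f"test_chat_assistant_{i}") for i in range(num)]


def batch_add_sessions_with_chat_assistant(chat_assistant, num: int) -> list:
    return [
        chat_assistant.create_session(name=f"session_with_chat_assistant_{i}")
        for i in range(num)
    ]


# Create two assistants, then two sessions under each one.
chats = batch_create_chat_assistants(StubClient(), 2)
sessions_by_chat = {
    chat.name: batch_add_sessions_with_chat_assistant(chat, 2) for chat in chats
}
print({name: [s.name for s in sess] for name, sess in sessions_by_chat.items()})
```

Note that session names restart at index 0 for every assistant, so names are unique only within a single chat.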
Important Implementation Details and Algorithms
Batch Processing via List Comprehensions: The batch functions use Python list comprehensions to issue one SDK call per item, which keeps each helper short. A list comprehension evaluates its items in order, so the calls run sequentially.
Temporary File Creation: The bulk_upload_documents function depends on an external utility, create_txt_file (from utils.file_utils), to generate temporary text files before uploading their contents as documents.
Data Encapsulation: Document upload requires packaging file contents into a dictionary containing display_name and raw blob data, which the dataset API consumes.
Interactions with Other System Components
ragflow_sdk Integration: This file directly interacts with the ragflow_sdk library, leveraging its core classes like RAGFlow, DataSet, Document, Chunk, Chat, and Session to perform CRUD operations on knowledge graph and conversational AI entities.
File Utilities: It depends on utils.file_utils.create_txt_file for generating temporary text files necessary for document upload workflows.
Higher-level Use Cases:
This module facilitates test data preparation, bulk initialization, or automated data seeding for the InfiniFlow platform.
Can be used within scripts or services that require rapid provisioning of datasets, documents, chats, and sessions for experimentation or deployment.
Visual Diagram: Function Flowchart
flowchart TD
A[RAGFlow Client] -->|batch_create_datasets| B[DataSet List]
B -->|bulk_upload_documents| C[Document List]
C -->|batch_add_chunks| D[Chunk List]
A -->|batch_create_chat_assistants| E[Chat List]
E -->|batch_add_sessions_with_chat_assistant| F[Session List]
subgraph File Utilities
G[create_txt_file]
end
G -->|creates text files| H[Temporary Files]
H -->|read as blobs| C
Explanation:
The flowchart illustrates how the main batch functions interact.
The RAGFlow client is the entry point for creating datasets and chat assistants.
Documents are uploaded to datasets after temporary files are created by create_txt_file.
Chunks are added to documents.
Sessions are created under chat assistants.
File utilities support document upload by generating temporary files.
Summary
The common.py file provides batch utility functions for creating datasets, documents, chunks, chats, and sessions within the InfiniFlow ecosystem via the ragflow_sdk. It reduces each bulk operation to a single function call and handles the temporary files needed for uploads, which supports data onboarding and testing workflows. By wrapping these repetitive tasks in reusable functions, it shortens the setup code that tests and seeding scripts would otherwise repeat.