common.py
Overview
common.py is a utility module in the InfiniFlow project that provides functions for calling the InfiniFlow RESTful API. It manages datasets, documents, chunks, chat assistants, and sessions through CRUD (Create, Read, Update, Delete) operations and batch helpers, all issued as HTTP requests.
The module hides the underlying HTTP communication, including multipart file uploads and streamed file downloads, so that other system components or clients can manage datasets and conversational AI resources without handling requests directly. It builds API endpoints from the configuration constants HOST_ADDRESS and VERSION, so the same code can target different hosts and API versions.
Detailed Descriptions
Constants
HEADERS: Default HTTP headers specifying the JSON content type.
DATASETS_API_URL: API path to the datasets endpoint.
FILE_API_URL: API path to the dataset documents endpoint.
FILE_CHUNK_API_URL: API path to the dataset chunks endpoint.
CHUNK_API_URL: API path to the document chunks endpoint.
CHAT_ASSISTANT_API_URL: API path to the chat assistants endpoint.
SESSION_WITH_CHAT_ASSISTANT_API_URL: API path to sessions under chat assistants.
SESSION_WITH_AGENT_API_URL: API path to sessions under agents.
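Per-resource paths such as FILE_API_URL have to embed IDs at call time. A minimal sketch of one way to do that follows; the host, port, version string, and exact path layouts are assumptions for illustration, not values taken from configs or common.py.

```python
# Hypothetical values; in common.py these come from the configs module.
HOST_ADDRESS = "http://127.0.0.1:9380"
VERSION = "v1"

# Assumed path layouts. Doubled braces survive the f-string and become
# str.format() placeholders that are filled with IDs at call time.
DATASETS_API_URL = f"/api/{VERSION}/datasets"
FILE_API_URL = f"/api/{VERSION}/datasets/{{dataset_id}}/documents"
CHUNK_API_URL = f"/api/{VERSION}/datasets/{{dataset_id}}/documents/{{document_id}}/chunks"


def build_url(template: str, **ids: str) -> str:
    """Substitute resource IDs into a path template and prefix the host address."""
    return f"{HOST_ADDRESS}{template.format(**ids)}"


if __name__ == "__main__":
    print(build_url(FILE_API_URL, dataset_id="ds1"))
    # http://127.0.0.1:9380/api/v1/datasets/ds1/documents
```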
Dataset Management Functions
These functions manage datasets via the InfiniFlow API.
create_dataset(auth, payload=None, *, headers=HEADERS, data=None)
Purpose: Create a new dataset.
Parameters:
auth: Authentication tuple or object for API access.
payload (dict, optional): JSON body with dataset details (e.g., name).
headers (dict, optional): HTTP headers; defaults to the JSON content type.
data (optional): Alternative data payload.
Returns: Parsed JSON response from the API.
Usage:
res = create_dataset(auth, {"name": "my_dataset"})
dataset_id = res["data"]["id"]
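Under the hood, a call like create_dataset amounts to a JSON POST. The sketch below reproduces that with the standard library's urllib.request rather than requests, which the module itself uses; authentication is omitted, and the host and path are assumed values.

```python
import json
import urllib.request

HOST_ADDRESS = "http://127.0.0.1:9380"  # hypothetical; normally imported from configs
DATASETS_API_URL = "/api/v1/datasets"   # assumed path
HEADERS = {"Content-Type": "application/json"}


def build_create_dataset_request(payload=None, *, headers=HEADERS):
    """Prepare, but do not send, a JSON POST to the datasets endpoint."""
    body = None if payload is None else json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        HOST_ADDRESS + DATASETS_API_URL, data=body, headers=headers, method="POST"
    )


def create_dataset(payload=None, *, headers=HEADERS, timeout=10):
    """Send the request and return the parsed JSON response body."""
    req = build_create_dataset_request(payload, headers=headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Splitting construction from sending lets the payload be inspected without a running server. Unlike requests, urlopen raises urllib.error.HTTPError on 4xx and 5xx status codes, so error bodies would have to be read from the exception.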
list_datasets(auth, params=None, *, headers=HEADERS)
List all datasets or filter by query parameters.
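Optional filters travel as query parameters. This standard-library sketch shows how a params dict becomes a query string, comparable to the encoding requests applies to its params argument. The filter names page, page_size, and name are hypothetical, not a documented list.

```python
from urllib.parse import urlencode


def build_list_url(base_url, params=None):
    """Append optional filter parameters to base_url as an encoded query string."""
    if not params:
        return base_url
    return f"{base_url}?{urlencode(params)}"


print(build_list_url("http://127.0.0.1:9380/api/v1/datasets",
                     {"page": 1, "page_size": 30, "name": "my dataset"}))
# http://127.0.0.1:9380/api/v1/datasets?page=1&page_size=30&name=my+dataset
```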
update_dataset(auth, dataset_id, payload=None, *, headers=HEADERS, data=None)
Update dataset metadata by dataset ID.
delete_datasets(auth, payload=None, *, headers=HEADERS, data=None)
Delete the datasets specified in the payload.
batch_create_datasets(auth, num)
Create multiple datasets with autogenerated names.
Returns list of dataset IDs.
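The batch helpers loop over the single-resource call. This sketch takes the creation function as a parameter so it runs without a server; the dataset_{i} naming scheme is an assumption, while the res["data"]["id"] access matches the create_dataset usage shown earlier.

```python
def batch_create_datasets(create_fn, num):
    """Create `num` datasets with generated names and return their IDs in order.

    create_fn stands in for create_dataset with auth already bound; it takes a
    payload dict and returns the parsed JSON response.
    """
    ids = []
    for i in range(num):
        res = create_fn({"name": f"dataset_{i}"})  # hypothetical naming scheme
        ids.append(res["data"]["id"])
    return ids


# Offline usage with a fake creator in place of the real API call.
def fake_create(payload):
    return {"data": {"id": "id-" + payload["name"]}}


print(batch_create_datasets(fake_create, 3))
# ['id-dataset_0', 'id-dataset_1', 'id-dataset_2']
```

Against a live server, functools.partial(create_dataset, auth) would supply a create_fn with the same one-argument shape.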
File Management Within Dataset
Functions to upload, download, list, update, delete, and parse documents within datasets.
upload_documents(auth, dataset_id, files_path=None)
Upload one or multiple files to a dataset.
Uses multipart form encoding for file upload.
Parameters:
files_path (list of str or Path): Paths of the files to upload.
Returns: API response JSON containing uploaded document info.
Notes: Opens files in binary mode and ensures closure after upload.
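The open, upload, close sequence can be sketched as follows. It swaps in contextlib.ExitStack for the explicit finally block that common.py uses; the guarantee is the same, and it also covers the case where a later file fails to open. The send callable and the ("file", (filename, fileobj)) field layout are assumptions standing in for the MultipartEncoder POST.

```python
from contextlib import ExitStack
from pathlib import Path


def upload_documents(send, files_path=None):
    """Open each file in binary mode, pass them to `send` as multipart fields, then close them.

    `send` is a hypothetical stand-in for the HTTP call (in common.py, a POST
    with a MultipartEncoder body). The field name "file" is assumed.
    """
    with ExitStack() as stack:
        fields = []
        for fp in files_path or []:
            path = Path(fp)
            fh = stack.enter_context(path.open("rb"))
            fields.append(("file", (path.name, fh)))
        # Leaving the with-block closes every opened file, even if send() raises
        # or a later file fails to open.
        return send(fields)
```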
download_document(auth, dataset_id, document_id, save_path)
Download a document from a dataset and save it locally.
Streams response to avoid loading large files into memory.
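Streaming means copying the body to disk in fixed-size pieces instead of reading it all at once. The sketch works on any binary file-like object, such as the response returned by urllib.request.urlopen; with requests, the equivalent pattern is stream=True combined with Response.iter_content(chunk_size). The function name and default chunk size are illustrative.

```python
def stream_to_file(response, save_path, chunk_size=8192):
    """Copy a binary file-like response to save_path in chunks; return bytes written.

    At most chunk_size bytes of the body are held in memory at a time.
    """
    written = 0
    with open(save_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # empty bytes signals end of stream
                break
            out.write(chunk)
            written += len(chunk)
    return written


if __name__ == "__main__":
    import io
    import os
    import tempfile

    fake_body = io.BytesIO(b"x" * 20_000)  # stands in for an HTTP response
    target = os.path.join(tempfile.mkdtemp(), "doc.bin")
    print(stream_to_file(fake_body, target, chunk_size=4096))  # 20000
```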
list_documents(auth, dataset_id, params=None)
List documents in a dataset with optional filtering.
update_document(auth, dataset_id, document_id, payload=None)
Update document metadata.
delete_documents(auth, dataset_id, payload=None)
Delete documents specified in the payload.
parse_documents(auth, dataset_id, payload=None)
Trigger parsing (e.g., text extraction or chunking) of documents.
stop_parse_documents(auth, dataset_id, payload=None)
Stop ongoing document parsing.
bulk_upload_documents(auth, dataset_id, num, tmp_path)
Helper function to generate and upload multiple test text documents.
Uses the create_txt_file utility to create temporary text files.
Returns a list of uploaded document IDs.
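A sketch of the bulk-upload helper, assuming the upload response lists the new documents under data, each with an id field. make_txt_file is a hypothetical stand-in for create_txt_file, whose real signature is not shown here, and the file names are illustrative.

```python
from pathlib import Path


def make_txt_file(path):
    """Hypothetical stand-in for utils.file_utils.create_txt_file."""
    path.write_text("This is a test document.\n", encoding="utf-8")
    return path


def bulk_upload_documents(upload_fn, num, tmp_path):
    """Write `num` text files under tmp_path, upload them in one call, return document IDs.

    upload_fn stands in for upload_documents(auth, dataset_id, files_path) with
    auth and dataset_id already bound.
    """
    tmp_path = Path(tmp_path)
    fps = [make_txt_file(tmp_path / f"test_upload_{i}.txt") for i in range(num)]
    res = upload_fn(fps)
    # Assumed response shape: {"data": [{"id": ...}, ...]}
    return [doc["id"] for doc in res["data"]]
```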
Chunk Management Within Dataset
Chunks represent segmented parts of documents, useful for fine-grained retrieval or analysis.
add_chunk(auth, dataset_id, document_id, payload=None)
Add a chunk (piece of content) to a specific document.
list_chunks(auth, dataset_id, document_id, params=None)
List chunks of a document.
update_chunk(auth, dataset_id, document_id, chunk_id, payload=None)
Update chunk content or metadata.
delete_chunks(auth, dataset_id, document_id, payload=None)
Delete the chunks specified in the payload.
retrieval_chunks(auth, payload=None)
Perform retrieval queries on chunks (likely a search or similarity lookup).
batch_add_chunks(auth, dataset_id, document_id, num)
Batch add multiple chunks with generated content.
Returns list of created chunk IDs.
Chat Assistant Management
Functions for managing chat assistants, which presumably represent conversational agents.
create_chat_assistant(auth, payload=None)
Create a new chat assistant.
list_chat_assistants(auth, params=None)
List existing chat assistants.
update_chat_assistant(auth, chat_assistant_id, payload=None)
Update chat assistant details.
delete_chat_assistants(auth, payload=None)
Delete the chat assistants specified in the payload.
batch_create_chat_assistants(auth, num)
Create multiple chat assistants with autogenerated names.
Returns list of chat assistant IDs.
Session Management
Sessions represent interaction histories or contexts under chat assistants or agents.
create_session_with_chat_assistant(auth, chat_assistant_id, payload=None)
Create a session under a specified chat assistant.
list_session_with_chat_assistants(auth, chat_assistant_id, params=None)
List sessions for a chat assistant.
update_session_with_chat_assistant(auth, chat_assistant_id, session_id, payload=None)
Update session details.
delete_session_with_chat_assistants(auth, chat_assistant_id, payload=None)
Delete sessions under a chat assistant.
batch_add_sessions_with_chat_assistant(auth, chat_assistant_id, num)
Batch create sessions with autogenerated names.
Returns list of session IDs.
Important Implementation Details
HTTP Requests: Uses the requests library to communicate with the backend API.
Multipart Uploads: Uses requests_toolbelt.MultipartEncoder to upload multiple files in one HTTP request with the correct Content-Type.
File Handling: Files opened in binary mode are closed in finally blocks to prevent resource leaks.
Dynamic URL Construction: API URLs are built from format strings that incorporate dataset IDs, document IDs, and other identifiers.
Batch Operations: Several batch helper functions automate the creation of multiple resources by looping and invoking single-resource creation functions.
Streaming Downloads: Document downloads use streaming to handle large files efficiently.
Authentication: Functions expect an auth parameter compatible with requests authentication mechanisms.
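Besides a (user, password) tuple, requests accepts any callable that receives the outgoing request, modifies it, and returns it. The sketch below follows that convention to attach an API key, assuming the server expects an Authorization: Bearer header; BearerAuth is a hypothetical name, and it is exercised on a urllib Request so it runs without requests installed.

```python
import urllib.request


class BearerAuth:
    """Hypothetical auth callable that adds an API key as a bearer token."""

    def __init__(self, token):
        self._token = token

    def __call__(self, request):
        # Both requests.PreparedRequest and urllib.request.Request keep a
        # mutable `headers` mapping, so this assignment works for either.
        request.headers["Authorization"] = f"Bearer {self._token}"
        return request


req = urllib.request.Request("http://127.0.0.1:9380/api/v1/datasets")
BearerAuth("my-api-key")(req)
print(req.get_header("Authorization"))  # Bearer my-api-key
```

With requests, an instance would be passed directly, for example auth=BearerAuth(api_key), in place of a tuple.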
Interaction with Other System Components
Configurations: Imports HOST_ADDRESS and VERSION from configs to build API URLs.
Utility Functions: Uses create_txt_file from utils.file_utils to generate temporary files for bulk uploads.
REST API: Acts as a client wrapper around the InfiniFlow REST API, abstracting HTTP details from other components.
Data Layer: Facilitates data ingestion, retrieval, and management for datasets and chat assistants, likely consumed by higher-level modules such as UI layers or AI processing services.
Usage Example
from common import create_dataset, upload_documents, list_documents
auth = ('user', 'password')
# Create a new dataset
dataset_response = create_dataset(auth, {"name": "My Dataset"})
dataset_id = dataset_response['data']['id']
# Upload documents to the dataset
files = ['doc1.txt', 'doc2.txt']
upload_response = upload_documents(auth, dataset_id, files)
# List documents in the dataset
documents = list_documents(auth, dataset_id)
print(documents)
Mermaid Diagram - Function Flowchart
flowchart TD
A[Dataset Management]
A --> create_dataset
A --> list_datasets
A --> update_dataset
A --> delete_datasets
A --> batch_create_datasets
B[Document Management]
B --> upload_documents
B --> download_document
B --> list_documents
B --> update_document
B --> delete_documents
B --> parse_documents
B --> stop_parse_documents
B --> bulk_upload_documents
C[Chunk Management]
C --> add_chunk
C --> list_chunks
C --> update_chunk
C --> delete_chunks
C --> retrieval_chunks
C --> batch_add_chunks
D[Chat Assistant Management]
D --> create_chat_assistant
D --> list_chat_assistants
D --> update_chat_assistant
D --> delete_chat_assistants
D --> batch_create_chat_assistants
E[Session Management]
E --> create_session_with_chat_assistant
E --> list_session_with_chat_assistants
E --> update_session_with_chat_assistant
E --> delete_session_with_chat_assistants
E --> batch_add_sessions_with_chat_assistant
A --> B
B --> C
D --> E
Summary
The common.py file is a utility module that provides programmatic access to the InfiniFlow backend API. It encapsulates the HTTP request logic for managing datasets, documents, document chunks, chat assistants, and sessions, and adds batch-operation helpers and safe file handling. Because each function wraps a single endpoint, the module can be reused and extended as a building block for higher-level workflows in the InfiniFlow system.