common.py
Overview
common.py is a utility module that provides functions for interacting with the InfiniFlow backend REST API. It covers datasets, documents (files), chunks, chat assistants, and sessions, wrapping the HTTP requests to each endpoint so that callers can create, retrieve, update, and delete (CRUD) these entities and run batch operations on them.
This file acts as a client-side connector for the InfiniFlow system, handling authentication, request formatting, multipart file uploads, and response parsing. It is designed to be used by higher-level application components or scripts that require programmatic access to InfiniFlow’s resources.
Detailed Functionality
The functions in common.py are grouped by the resource or domain they manage:
Dataset Management
File (Document) Management within Datasets
Chunk Management within Datasets
Chat Assistant Management
Session Management with Chat Assistants
All HTTP requests use the requests library and expect an auth parameter (for authentication, e.g., HTTPBasicAuth). The base URL is configurable via the HOST_ADDRESS environment variable.
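As a rough illustration of the configuration described above, the sketch below reads HOST_ADDRESS from the environment and joins it with an endpoint path. The fallback address, the endpoint template, and the helper name build_url are hypothetical and chosen only for illustration; the module's actual constants may differ.

```python
import os

# Hypothetical fallback; the module's real default is not shown in this document.
DEFAULT_HOST = "http://127.0.0.1:9380"

# Assumed endpoint template, filled in with str.format().
DOCUMENTS_PATH = "/api/v1/datasets/{dataset_id}/documents"


def build_url(path, host=None):
    """Join the configured base URL with an endpoint path."""
    base = host or os.getenv("HOST_ADDRESS", DEFAULT_HOST)
    # Normalize slashes so the two parts are joined by exactly one "/".
    return f"{base.rstrip('/')}/{path.lstrip('/')}"


print(build_url(DOCUMENTS_PATH.format(dataset_id="dataset123"), host="http://localhost:9380"))
```

Passing host explicitly keeps the helper easy to test; when it is omitted, the environment variable decides which server is targeted.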
Constants
| Constant | Description |
|---|---|
| HEADERS | Default HTTP headers for JSON content. |
| | Base URL for API requests (default |
| | API endpoint path for dataset operations. |
| | API endpoint path for documents within datasets. |
| | API endpoint for file chunk operations. |
| | API endpoint for document chunk operations. |
| | API endpoint path for chat assistant operations. |
| | API endpoint for chat assistant sessions. |
| | API endpoint for agent sessions (not explicitly used in this file). |
| | Placeholder invalid token string. |
| | Max length for dataset names (128 chars). |
| | Max length for document names (128 chars). |
| | Max length for chat assistant names (255 chars). |
| | Max length for chat assistant session names (255 chars). |
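The length limits above lend themselves to boundary checks. The following is a minimal sketch of such a check; the constant names and the helper fits_limit are hypothetical, and only the numeric limits (128 and 255) come from the table.

```python
# Hypothetical constant names; only the values are taken from the table above.
DATASET_NAME_LIMIT = 128
CHAT_ASSISTANT_NAME_LIMIT = 255


def fits_limit(name, limit):
    """Return True if the name is no longer than the given character limit."""
    return len(name) <= limit


# Boundary cases a caller might exercise: exactly at the limit, and one over.
at_limit = "a" * DATASET_NAME_LIMIT
over_limit = "a" * (DATASET_NAME_LIMIT + 1)
print(fits_limit(at_limit, DATASET_NAME_LIMIT), fits_limit(over_limit, DATASET_NAME_LIMIT))
```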
Dataset Management Functions
create_dataset(auth, payload=None, *, headers=HEADERS, data=None)
Creates a new dataset.
Parameters:
auth: Authentication handler (e.g., HTTPBasicAuth).
payload(dict, optional): JSON payload describing dataset properties (e.g., {"name": "dataset1"}).
headers(dict, optional): HTTP headers.
data(optional): Alternative data payload.
Returns: Parsed JSON response from the API.
Usage:
res = create_dataset(auth, {"name": "my_dataset"})
print(res)
list_datasets(auth, params=None, *, headers=HEADERS)
Lists datasets, optionally filtered by parameters.
Parameters:
auth: Authentication handler.
params(dict, optional): Query parameters to filter the dataset list.
Returns: Parsed JSON response containing datasets.
Usage:
datasets = list_datasets(auth, {"limit": 10})
update_dataset(auth, dataset_id, payload=None, *, headers=HEADERS, data=None)
Updates dataset metadata.
Parameters:
dataset_id(str): Identifier of the dataset.
payload(dict, optional): Updated dataset fields.
Returns: Parsed JSON response.
Usage:
update_dataset(auth, "1234", {"name": "updated_name"})
delete_datasets(auth, payload=None, *, headers=HEADERS, data=None)
Deletes datasets in batch via payload.
Parameters:
payload(dict, optional): Typically includes dataset IDs to delete.
Returns: Parsed JSON response.
Usage:
delete_datasets(auth, {"ids": ["1234", "5678"]})
batch_create_datasets(auth, num)
Creates multiple datasets named sequentially.
Parameters:
num(int): Number of datasets to create.
Returns: List of created dataset IDs.
Usage:
ids = batch_create_datasets(auth, 5)
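The batch helper can be pictured as a loop over the single-item call that collects the returned IDs. This is a sketch under stated assumptions, not the module's code: the naming pattern, the response shape {"code": 0, "data": {"id": ...}}, and the stand-in fake_create_dataset are all hypothetical, used so the example runs without a server.

```python
def batch_create(create_fn, auth, num):
    """Call create_fn num times with sequential names and collect the returned IDs."""
    ids = []
    for i in range(num):
        res = create_fn(auth, {"name": f"dataset_{i}"})
        # Assumed response shape: {"code": 0, "data": {"id": "..."}}
        ids.append(res["data"]["id"])
    return ids


def fake_create_dataset(auth, payload):
    """Stand-in for create_dataset that echoes an ID derived from the name."""
    return {"code": 0, "data": {"id": f"id-for-{payload['name']}"}}


print(batch_create(fake_create_dataset, auth=None, num=3))
```

Injecting the create function keeps the loop independent of the network, which makes it straightforward to check in isolation.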
File (Document) Management Functions
upload_documnets(auth, dataset_id, files_path=None)
Uploads one or more documents/files to a dataset.
Parameters:
dataset_id(str): Target dataset identifier.
files_path(list of str, optional): Paths to files for upload.
Returns: Parsed JSON response with uploaded document info.
Implementation Detail: Uses requests_toolbelt.MultipartEncoder for multipart file upload.
Usage:
upload_documnets(auth, "dataset123", ["file1.txt", "file2.pdf"])
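To make the multipart step more concrete, the sketch below prepares the list of file fields that a multipart encoder typically consumes. It uses only the standard library; the field name "file" and the helper build_file_fields are assumptions for illustration, and the actual encoding call is left out.

```python
from contextlib import ExitStack
from pathlib import Path
import tempfile


def build_file_fields(stack, files_path):
    """Open each path and return (field_name, (filename, file_object)) tuples."""
    fields = []
    for fp in files_path:
        p = Path(fp)
        # ExitStack closes every opened file when its with-block exits.
        f = stack.enter_context(p.open("rb"))
        fields.append(("file", (p.name, f)))  # "file" is an assumed field name
    return fields


with tempfile.TemporaryDirectory() as tmp, ExitStack() as stack:
    paths = []
    for name in ("file1.txt", "file2.txt"):
        p = Path(tmp) / name
        p.write_text("sample content")
        paths.append(p)
    fields = build_file_fields(stack, paths)
    # In a real upload, a list like this could be handed to
    # requests_toolbelt.MultipartEncoder(fields=fields).
    print([(key, filename) for key, (filename, _) in fields])
```

Opening the files through an ExitStack means all handles are released together, even if one of the paths fails to open partway through.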
download_document(auth, dataset_id, document_id, save_path)
Downloads a document from a dataset and saves it locally.
Parameters:
dataset_id(str)
document_id(str): Document to download.
save_path(str or Path): Local path to save the file.
Returns: The requests.Response object.
Usage:
download_document(auth, "dataset123", "doc456", "/tmp/mydoc.txt")
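Saving a downloaded document usually comes down to writing the response body to disk in chunks. The sketch below shows that step on its own; whether the module streams the response is not stated here, so the chunked approach, the helper save_chunks, and the fake stream standing in for a streamed response body are assumptions.

```python
from pathlib import Path
import tempfile


def save_chunks(chunks, save_path):
    """Write an iterable of byte chunks to save_path and return the bytes written."""
    written = 0
    with Path(save_path).open("wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                written += f.write(chunk)
    return written


# Stand-in for the byte chunks of a streamed HTTP response.
fake_stream = [b"hello ", b"", b"world"]
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "mydoc.txt"
    size = save_chunks(fake_stream, target)
    content = target.read_bytes()
print(size, content)
```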
list_documnets(auth, dataset_id, params=None)
Lists documents in a dataset.
Parameters:
dataset_id(str)
params(dict, optional): Query parameters.
Returns: Parsed JSON list of documents.
Usage:
list_documnets(auth, "dataset123", {"limit": 10})
update_documnet(auth, dataset_id, document_id, payload=None)
Updates document metadata.
Parameters:
document_id(str)
payload(dict): Updated document info.
Returns: Parsed JSON response.
Usage:
update_documnet(auth, "dataset123", "doc456", {"name": "new_name"})
delete_documnets(auth, dataset_id, payload=None)
Deletes one or more documents in a dataset.
Parameters:
payload(dict): Typically containing document IDs.
Returns: Parsed JSON response.
Usage:
delete_documnets(auth, "dataset123", {"ids": ["doc456", "doc789"]})
parse_documnets(auth, dataset_id, payload=None)
Triggers parsing of documents in a dataset to create chunks.
Parameters:
payload(dict): Parsing options or document IDs.
Returns: Parsed JSON response.
Usage:
parse_documnets(auth, "dataset123", {"document_ids": ["doc456"]})
stop_parse_documnets(auth, dataset_id, payload=None)
Stops ongoing document parsing.
Parameters:
payload(dict): Information to identify parsing job.
Returns: Parsed JSON response.
Usage:
stop_parse_documnets(auth, "dataset123", {"job_id": "xyz"})
bulk_upload_documents(auth, dataset_id, num, tmp_path)
Helper to create temporary files and bulk upload them.
Parameters:
num(int): Number of files to upload.
tmp_path(Path): Directory to create temp files.
Returns: List of uploaded document IDs.
Usage:
bulk_upload_documents(auth, "dataset123", 10, Path("/tmp"))
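The file-generation half of this helper can be sketched as follows. The function make_txt_files is a hypothetical stand-in for the imported create_txt_file utility, whose exact signature is not shown in this document; the file names and contents are likewise invented for illustration.

```python
from pathlib import Path
import tempfile


def make_txt_files(tmp_path, num):
    """Create num small text files under tmp_path and return their paths."""
    paths = []
    for i in range(num):
        p = Path(tmp_path) / f"test_upload_{i}.txt"
        p.write_text(f"sample document {i}\n", encoding="utf-8")
        paths.append(p)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    created = make_txt_files(tmp, 3)
    names = [p.name for p in created]
    all_exist = all(p.exists() for p in created)
print(names, all_exist)
```

The resulting list of paths is the kind of value that would then be passed on as files_path to the upload call.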
Chunk Management Functions
add_chunk(auth, dataset_id, document_id, payload=None)
Adds a chunk to a specific document.
Parameters:
dataset_id(str)
document_id(str)
payload(dict): Chunk data, e.g., {"content": "chunk text"}
Returns: Parsed JSON response.
Usage:
add_chunk(auth, "dataset123", "doc456", {"content": "chunk content"})
list_chunks(auth, dataset_id, document_id, params=None)
Lists chunks for a document.
Parameters:
params(dict, optional): Query filters.
Returns: Parsed JSON list of chunks.
Usage:
list_chunks(auth, "dataset123", "doc456")
update_chunk(auth, dataset_id, document_id, chunk_id, payload=None)
Updates a chunk’s data.
Parameters:
chunk_id(str)
payload(dict): Updated chunk info.
Returns: Parsed JSON response.
Usage:
update_chunk(auth, "dataset123", "doc456", "chunk789", {"content": "updated content"})
delete_chunks(auth, dataset_id, document_id, payload=None)
Deletes chunks in batch.
Parameters:
payload(dict): Chunk IDs or criteria.
Returns: Parsed JSON response.
Usage:
delete_chunks(auth, "dataset123", "doc456", {"ids": ["chunk789"]})
retrieval_chunks(auth, payload=None)
Retrieves chunks based on retrieval query.
Parameters:
payload(dict): Retrieval parameters.
Returns: Parsed JSON with retrieval results.
Usage:
retrieval_chunks(auth, {"query": "search text"})
batch_add_chunks(auth, dataset_id, document_id, num)
Creates several chunks with test content.
Parameters:
num(int): Number of chunks to add.
Returns: List of created chunk IDs.
Usage:
batch_add_chunks(auth, "dataset123", "doc456", 5)
Chat Assistant Management Functions
create_chat_assistant(auth, payload=None)
Creates a new chat assistant.
Parameters:
payload(dict): Chat assistant properties, e.g., {"name": "assistant1", "dataset_ids": []}
Returns: Parsed JSON response.
Usage:
create_chat_assistant(auth, {"name": "assistant1", "dataset_ids": []})
list_chat_assistants(auth, params=None)
Lists chat assistants.
Parameters:
params(dict, optional): Filters.
Returns: Parsed JSON response.
Usage:
list_chat_assistants(auth)
update_chat_assistant(auth, chat_assistant_id, payload=None)
Updates chat assistant info.
Parameters:
chat_assistant_id(str)
payload(dict)
Returns: Parsed JSON response.
Usage:
update_chat_assistant(auth, "assistant123", {"name": "new_name"})
delete_chat_assistants(auth, payload=None)
Deletes chat assistants.
Parameters:
payload(dict): Assistant IDs.
Returns: Parsed JSON response.
Usage:
delete_chat_assistants(auth, {"ids": ["assistant123"]})
batch_create_chat_assistants(auth, num)
Creates multiple chat assistants.
Parameters:
num(int)
Returns: List of created chat assistant IDs.
Usage:
batch_create_chat_assistants(auth, 3)
Session Management with Chat Assistants
create_session_with_chat_assistant(auth, chat_assistant_id, payload=None)
Creates a session linked to a chat assistant.
Parameters:
chat_assistant_id(str)
payload(dict): Session details like {"name": "session1"}
Returns: Parsed JSON response.
Usage:
create_session_with_chat_assistant(auth, "assistant123", {"name": "session1"})
list_session_with_chat_assistants(auth, chat_assistant_id, params=None)
Lists sessions for a chat assistant.
Parameters:
chat_assistant_id(str)
params(dict, optional)
Returns: Parsed JSON list of sessions.
Usage:
list_session_with_chat_assistants(auth, "assistant123")
update_session_with_chat_assistant(auth, chat_assistant_id, session_id, payload=None)
Updates session info.
Parameters:
session_id(str)
payload(dict)
Returns: Parsed JSON.
Usage:
update_session_with_chat_assistant(auth, "assistant123", "session456", {"name": "updated"})
delete_session_with_chat_assistants(auth, chat_assistant_id, payload=None)
Deletes sessions linked to a chat assistant.
Parameters:
payload(dict): Session IDs.
Returns: Parsed JSON.
Usage:
delete_session_with_chat_assistants(auth, "assistant123", {"ids": ["session456"]})
batch_add_sessions_with_chat_assistant(auth, chat_assistant_id, num)
Creates multiple sessions for a chat assistant.
Parameters:
num(int)
Returns: List of created session IDs.
Usage:
batch_add_sessions_with_chat_assistant(auth, "assistant123", 5)
Important Implementation Details
HTTP Requests: The module uses the requests library for all HTTP calls. Authentication is passed as an auth parameter, which should be compatible with requests (e.g., BasicAuth or Bearer tokens).
Multipart File Uploads: Uses requests_toolbelt.MultipartEncoder to handle multipart uploads of multiple files in upload_documnets.
Temporary File Handling: bulk_upload_documents creates temporary text files using a utility function create_txt_file imported from libs.utils.file_utils to simulate document uploads.
API Endpoint URLs: Constructed dynamically with dataset, document, chunk, chat assistant, and session IDs using Python f-string formatting and string .format().
Batch Operations: Several functions support batch creation or deletion by looping calls or sending bulk payloads.
Error Handling: Minimal error handling is present; it mainly assumes API responses are JSON and returns them directly. Caller should handle exceptions or error codes.
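Because the functions return parsed JSON without inspecting it, callers may want a small guard of their own. The sketch below is one possible shape for it; the ApiError class, the ensure_ok helper, and the response convention (a "code" of 0 on success, with "message" and "data" fields) are assumptions and should be checked against the actual backend responses.

```python
class ApiError(RuntimeError):
    """Raised when a parsed API response reports a failure."""


def ensure_ok(res):
    """Return res["data"] if the response signals success, else raise ApiError."""
    # Assumed convention: {"code": 0, "data": ...} on success,
    # {"code": <non-zero>, "message": "..."} on failure.
    if res.get("code") != 0:
        raise ApiError(f"API error {res.get('code')}: {res.get('message', 'no message')}")
    return res.get("data")


print(ensure_ok({"code": 0, "data": {"id": "1234"}}))
try:
    ensure_ok({"code": 102, "message": "example failure"})
except ApiError as exc:
    print(exc)
```

Wrapping each call, for example ensure_ok(create_dataset(auth, payload)), turns silent failures into exceptions at the call site.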
Interaction with Other System Components
libs.utils.file_utils.create_txt_file: Used for generating temporary text files for upload testing.
Environment Configuration: Reads HOST_ADDRESS from environment variables to determine API server location.
API Backend: This module is designed to work as a client to the InfiniFlow backend REST API. It assumes the backend implements the specified endpoints and JSON schemas.
Authentication: Requires external authentication management; the auth parameter is passed into every function.
Higher-level Applications: This module is likely imported and used by command-line tools, integration tests, or application components that need to manage datasets, documents, chunks, chat assistants, and sessions.
Visual Diagram
The following Mermaid class diagram visualizes the logical grouping of functions in common.py. Since the file is a utility module without classes, the diagram shows function groups as "utility classes" for clarity.
classDiagram
class DatasetManagement {
+create_dataset(auth, payload)
+list_datasets(auth, params)
+update_dataset(auth, dataset_id, payload)
+delete_datasets(auth, payload)
+batch_create_datasets(auth, num)
}
class DocumentManagement {
+upload_documnets(auth, dataset_id, files_path)
+download_document(auth, dataset_id, document_id, save_path)
+list_documnets(auth, dataset_id, params)
+update_documnet(auth, dataset_id, document_id, payload)
+delete_documnets(auth, dataset_id, payload)
+parse_documnets(auth, dataset_id, payload)
+stop_parse_documnets(auth, dataset_id, payload)
+bulk_upload_documents(auth, dataset_id, num, tmp_path)
}
class ChunkManagement {
+add_chunk(auth, dataset_id, document_id, payload)
+list_chunks(auth, dataset_id, document_id, params)
+update_chunk(auth, dataset_id, document_id, chunk_id, payload)
+delete_chunks(auth, dataset_id, document_id, payload)
+retrieval_chunks(auth, payload)
+batch_add_chunks(auth, dataset_id, document_id, num)
}
class ChatAssistantManagement {
+create_chat_assistant(auth, payload)
+list_chat_assistants(auth, params)
+update_chat_assistant(auth, chat_assistant_id, payload)
+delete_chat_assistants(auth, payload)
+batch_create_chat_assistants(auth, num)
}
class SessionManagement {
+create_session_with_chat_assistant(auth, chat_assistant_id, payload)
+list_session_with_chat_assistants(auth, chat_assistant_id, params)
+update_session_with_chat_assistant(auth, chat_assistant_id, session_id, payload)
+delete_session_with_chat_assistants(auth, chat_assistant_id, payload)
+batch_add_sessions_with_chat_assistant(auth, chat_assistant_id, num)
}
DatasetManagement --> DocumentManagement : manages documents
DocumentManagement --> ChunkManagement : manages chunks