common.py
Overview
common.py is a utility module designed to facilitate interaction with a remote knowledge base and document management service exposed via a RESTful API. The file provides a set of functions to manage datasets (knowledge bases) and documents on a specified host server, abstracting the HTTP request details and simplifying common operations such as creating, listing, updating, and removing datasets, as well as uploading and parsing documents.
The module is primarily intended for use in applications that need to programmatically manage knowledge bases and their associated documents, enabling integration with the backend service without requiring the user to handle raw HTTP requests.
Constants
HOST_ADDRESS (str):
The base URL of the remote service. Defaults to "http://127.0.0.1:9380" but can be overridden by setting the environment variable HOST_ADDRESS.
DATASET_NAME_LIMIT (int):
A constant defining the maximum allowed length for dataset names (128 characters). This limit is defined but not enforced within this file.
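Because DATASET_NAME_LIMIT is defined but not enforced, calling code may want to check names before sending them. The following is a minimal sketch, assuming the constants are defined roughly as described above. check_dataset_name is a hypothetical helper and is not part of common.py.

```python
import os

# Assumed to mirror the constants described above; the real definitions
# in common.py may differ in detail.
HOST_ADDRESS = os.getenv("HOST_ADDRESS", "http://127.0.0.1:9380")
DATASET_NAME_LIMIT = 128


def check_dataset_name(name: str) -> str:
    """Hypothetical helper: reject names longer than DATASET_NAME_LIMIT."""
    if len(name) > DATASET_NAME_LIMIT:
        raise ValueError(
            f"dataset name is {len(name)} characters; "
            f"the limit is {DATASET_NAME_LIMIT}"
        )
    return name
```

A caller could wrap the name in this check before passing it to create_dataset, so that an over-long name fails locally instead of depending on how the server reacts.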
Functions
1. create_dataset(auth: str, dataset_name: str) -> dict
Creates a new dataset (knowledge base) on the remote server.
Parameters:
auth (str): Authorization token or header value required for API authentication.
dataset_name (str): The name of the dataset to be created.
Returns:
A dictionary parsed from the JSON response of the server, typically containing status and dataset details.
Usage Example:
response = create_dataset(auth="Bearer token123", dataset_name="MyDataset")
print(response)
2. list_dataset(auth: str, page_number: int, page_size: int = 30) -> dict
Retrieves a paginated list of datasets available on the server.
Parameters:
auth (str): Authorization token.
page_number (int): The page number to retrieve.
page_size (int, optional): Number of datasets per page (default is 30).
Returns:
A dictionary containing the list of datasets and pagination info.
Usage Example:
datasets = list_dataset(auth="Bearer token123", page_number=1, page_size=20)
print(datasets)
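Since list_dataset returns one page at a time, collecting every dataset means looping over pages. The sketch below shows one way to do that. It assumes page numbers start at 1 and that the last page is shorter than page_size or empty. The response layout is not specified in this document, so the caller supplies extract_items. Both iter_datasets and its parameters are hypothetical names.

```python
from typing import Callable, Iterator, List


def iter_datasets(
    fetch_page: Callable[[int, int], dict],
    extract_items: Callable[[dict], List[dict]],
    page_size: int = 30,
) -> Iterator[dict]:
    """Yield datasets page by page until a short or empty page is returned."""
    page_number = 1  # assumption: the server numbers pages from 1
    while True:
        response = fetch_page(page_number, page_size)
        items = extract_items(response)
        yield from items
        # A page with fewer items than requested is treated as the last one.
        if len(items) < page_size:
            break
        page_number += 1
```

With the real module, fetch_page could be a small lambda that calls list_dataset(auth, page_number, page_size), and extract_items would pick the dataset list out of whatever structure the server actually returns.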
3. rm_dataset(auth: str, dataset_id: str) -> dict
Removes (deletes) a dataset specified by its identifier.
Parameters:
auth (str): Authorization token.
dataset_id (str): Unique identifier of the dataset to be removed.
Returns:
JSON response as a dictionary indicating success or failure.
Usage Example:
result = rm_dataset(auth="Bearer token123", dataset_id="dataset_123")
print(result)
4. update_dataset(auth: str, json_req: dict) -> dict
Updates dataset information based on the provided JSON request payload.
Parameters:
auth (str): Authorization token.
json_req (dict): A dictionary containing fields and values to update for the dataset.
Returns:
JSON response as a dictionary with update status.
Usage Example:
update_info = {"kb_id": "dataset_123", "name": "UpdatedName"}
response = update_dataset(auth="Bearer token123", json_req=update_info)
print(response)
5. upload_file(auth: str, dataset_id: str, path: str) -> dict
Uploads a file associated with a dataset.
Parameters:
auth (str): Authorization token.
dataset_id (str): Identifier of the dataset to which the file will be uploaded.
path (str): Local file path of the file to upload.
Returns:
JSON response from the server indicating success or failure of the upload.
Important Note:
The file is opened in binary mode and sent as a multipart form data request.
Usage Example:
response = upload_file(auth="Bearer token123", dataset_id="dataset_123", path="/path/to/file.pdf")
print(response)
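To illustrate the binary-mode multipart upload mentioned in the note, here is a rough sketch and not the module's actual code. It takes a session object with a requests-style post() method (for example requests.Session()) and a full upload URL, because the upload endpoint is not named in this document. The form field names "kb_id" and "file" are assumptions, as is the function name upload_file_sketch.

```python
from typing import Any


def upload_file_sketch(session: Any, url: str, auth: str,
                       dataset_id: str, path: str) -> dict:
    """Send one file as multipart form data and return the parsed JSON.

    session is anything with a requests-style post(), for example
    requests.Session(). The form field names below are assumptions.
    """
    headers = {"Authorization": auth}
    form_fields = {"kb_id": dataset_id}  # assumed field name
    # Binary mode keeps the bytes intact; text mode could alter them.
    with open(path, "rb") as handle:
        files = {"file": handle}  # assumed field name
        # Passing files= makes requests encode the body as multipart/form-data.
        response = session.post(url, headers=headers,
                                data=form_fields, files=files)
    return response.json()
```

The with block closes the file even if the request raises, which avoids leaking file handles during bulk uploads.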
6. list_document(auth: str, dataset_id: str) -> dict
Lists all documents associated with a given dataset.
Parameters:
auth (str): Authorization token.
dataset_id (str): Identifier of the dataset whose documents are to be listed.
Returns:
Dictionary containing document list and metadata.
Usage Example:
docs = list_document(auth="Bearer token123", dataset_id="dataset_123")
print(docs)
7. get_docs_info(auth: str, doc_ids: list) -> dict
Fetches detailed information about a list of document IDs.
Parameters:
auth (str): Authorization token.
doc_ids (list of str): List of document identifiers.
Returns:
Dictionary with detailed document metadata.
Usage Example:
info = get_docs_info(auth="Bearer token123", doc_ids=["doc1", "doc2"])
print(info)
8. parse_docs(auth: str, doc_ids: list) -> dict
Triggers parsing and processing of specified documents on the server.
Parameters:
auth (str): Authorization token.
doc_ids (list of str): List of document IDs to be parsed.
Returns:
Server response indicating parsing status.
Usage Example:
parse_result = parse_docs(auth="Bearer token123", doc_ids=["doc1", "doc2"])
print(parse_result)
9. parse_file(auth: str, document_id: str)
Description:
This function is declared but not implemented (it contains only a pass statement). It appears intended to parse a single file/document by its ID, likely similar in function to parse_docs but for individual files.
Parameters:
auth (str): Authorization token.
document_id (str): Identifier of the document to parse.
Returns:
None (not implemented).
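If parse_file is indeed meant to be the single-document counterpart of parse_docs, one plausible way to fill in the stub is to delegate to it. This is a guess at the intent and not the author's implementation. The function name parse_file_sketch is hypothetical, and parse_docs is passed in as a parameter so the example stands alone.

```python
from typing import Callable, List


def parse_file_sketch(
    auth: str,
    document_id: str,
    parse_docs: Callable[[str, List[str]], dict],
) -> dict:
    """Parse one document by wrapping its ID in a single-element list."""
    return parse_docs(auth, [document_id])
```

Inside common.py itself the third parameter would be unnecessary, since parse_docs is already defined in the same module.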
Implementation Details
All functions communicate with a backend server using HTTP POST requests.
Authorization is consistently handled by passing an "Authorization" header with each request.
The backend API endpoints are derived from the base HOST_ADDRESS with specific paths such as /v1/kb/create for dataset creation.
The module uses the requests library for HTTP communication.
File uploads are handled using multipart form data in the upload_file function.
Pagination is supported in list_dataset via query parameters.
JSON responses from the server are parsed and returned as Python dictionaries.
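The request pattern described in this list can be sketched as follows. This is an illustration under assumptions, not the module's source. post_json and create_dataset_sketch are hypothetical names, the session argument is any object with a requests-style post() method (for example requests.Session()), and the "name" payload key is assumed. Only the /v1/kb/create path and the default host come from this document.

```python
from typing import Any

DEFAULT_HOST = "http://127.0.0.1:9380"  # default HOST_ADDRESS described above


def post_json(session: Any, path: str, auth: str, payload: dict,
              host: str = DEFAULT_HOST) -> dict:
    """POST a JSON payload with an Authorization header; return parsed JSON."""
    url = f"{host}{path}"  # endpoint = base address + path
    headers = {"Authorization": auth}
    response = session.post(url, headers=headers, json=payload)
    return response.json()


def create_dataset_sketch(session: Any, auth: str, dataset_name: str) -> dict:
    # The path is the one named above; the "name" key is an assumption.
    return post_json(session, "/v1/kb/create", auth, {"name": dataset_name})
```

Routing every call through one helper keeps the header and URL handling in a single place, which also makes it a natural spot for timeouts or status checks later.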
Interaction with Other Parts of the System
This module acts as a client-side interface to the InfiniFlow knowledge base backend services.
It can be imported and used by higher-level components or services that need to manage datasets and documents.
It depends on environment configuration (HOST_ADDRESS) to target the correct backend endpoint.
It requires an authentication mechanism external to this file to provide valid tokens for API access.
The file does not include error handling, so it expects calling code to manage exceptions or response validations.
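Calling code can add the missing response validation with a small wrapper. The sketch below assumes, without confirmation from this document, that responses carry a status field named "code" whose success value is 0. Both are parameters so they can be adjusted to the real API. ApiError and ensure_ok are hypothetical names.

```python
class ApiError(RuntimeError):
    """Raised when a response does not look like a success."""


def ensure_ok(response: object, code_key: str = "code",
              ok_value: object = 0) -> dict:
    """Return the response unchanged if it signals success, else raise."""
    if not isinstance(response, dict):
        raise ApiError(f"expected a dict, got {type(response).__name__}")
    # Assumed convention: a status field equal to ok_value means success.
    if response.get(code_key) != ok_value:
        raise ApiError(f"request failed: {response!r}")
    return response
```

A caller would wrap each module call in this check, for example passing the result of create_dataset to ensure_ok, so that failures surface as exceptions instead of silently propagating as dictionaries.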
Diagram: Flowchart of Functions and Their Relationships
flowchart TD
A[Start] --> B[create_dataset]
A --> C[list_dataset]
A --> D[rm_dataset]
A --> E[update_dataset]
A --> F[upload_file]
A --> G[list_document]
A --> H[get_docs_info]
A --> I[parse_docs]
A --> J["parse_file (unimplemented)"]
F --> G
G --> H
H --> I
Diagram Explanation:
The flowchart illustrates the module's main functions as independent API calls starting from a common entry point (user/application).
The upload and document-related functions (upload_file, list_document, get_docs_info, parse_docs) form a logical chain representing document lifecycle operations.
Dataset management functions (create_dataset, list_dataset, rm_dataset, update_dataset) are independent but related to dataset administration.
parse_file is shown as unimplemented and sits outside the document lifecycle chain.
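The document lifecycle chain can be sketched as a single helper that calls the four functions in order. This is a hypothetical composition, not something the module provides. api stands for the imported common module (or any object exposing the same four functions), and extract_ids is supplied by the caller because the layout of the list_document response is not specified here.

```python
from typing import Any, Callable, List


def ingest_file(api: Any, auth: str, dataset_id: str, path: str,
                extract_ids: Callable[[dict], List[str]]) -> dict:
    """Run upload -> list -> info -> parse and return each step's response."""
    uploaded = api.upload_file(auth, dataset_id, path)
    listing = api.list_document(auth, dataset_id)
    # Note: this covers every document in the dataset, not only the new one.
    doc_ids = extract_ids(listing)
    info = api.get_docs_info(auth, doc_ids)
    parsed = api.parse_docs(auth, doc_ids)
    return {"uploaded": uploaded, "listing": listing,
            "info": info, "parsed": parsed}
```

Returning every intermediate response lets the caller inspect or validate each step, which matters here because the module itself performs no error handling.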
Summary
The common.py file is a concise and focused utility module for managing knowledge bases and documents via a REST API. Its simple function-based interface abstracts HTTP details, enabling easy integration into larger systems that require knowledge base operations or document processing workflows. To maximize robustness, future enhancements could include error handling, input validation, and implementing currently stubbed functions like parse_file.