dataset_example.py
Overview
dataset_example.py is an example script demonstrating CRUD (Create, Read, Update, Delete) operations on a dataset using the RAGFlow SDK. It shows how to instantiate the RAGFlow client, create a dataset, update its properties, print the dataset details, and finally delete the dataset. The example illustrates the basic usage pattern of the RAGFlow SDK's dataset management capabilities.
Detailed Explanation
Imports and Constants
from ragflow_sdk import RAGFlow
Imports the main SDK class, RAGFlow, which is used to interact with the API.
import sys
Used for exiting the script with appropriate status codes.
HOST_ADDRESS
The base URL for the RAGFlow API endpoint. Here it is set to localhost.
API_KEY
API key string used for authentication with the RAGFlow service.
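Hard-coding both constants is fine for a local demo, but they can also be read from the environment so the key stays out of the source file. The following is a minimal sketch, not part of the original script: the variable names RAGFLOW_HOST and RAGFLOW_API_KEY and the helper load_settings are hypothetical, and the localhost default is an assumption.

```python
import os

# Hypothetical environment variable names; the original script hard-codes both values.
HOST_ENV = "RAGFLOW_HOST"
KEY_ENV = "RAGFLOW_API_KEY"


def load_settings(env=None):
    """Return (host_address, api_key) read from a mapping such as os.environ."""
    env = os.environ if env is None else env
    # Assumed default: a RAGFlow server running on the local machine.
    host_address = env.get(HOST_ENV, "http://127.0.0.1")
    api_key = env.get(KEY_ENV)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise ValueError(f"{KEY_ENV} is not set")
    return host_address, api_key
```

With this in place, the two constants could be assigned as HOST_ADDRESS, API_KEY = load_settings().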
Main Flow
The script is wrapped in a try-except block to handle exceptions gracefully.
1. Creating a RAGFlow instance
ragflow_instance = RAGFlow(api_key=API_KEY, base_url=HOST_ADDRESS)
Purpose:
Creates a client instance to communicate with the RAGFlow API.
Parameters:
api_key (str): Authentication token for API access.
base_url (str): The API server address.
Usage:
This instance is used to perform all subsequent dataset operations.
2. Creating a Dataset
dataset_instance = ragflow_instance.create_dataset(name="dataset_instance")
Purpose:
Creates a new dataset resource with the specified name.
Parameters:
name (str): Name of the dataset to create.
Returns:
An object representing the newly created dataset, stored here as dataset_instance.
Usage:
This object is used in the later steps: update() is called on it, and its id is passed to delete_datasets().
3. Updating the Dataset
updated_message = {"name": "updated_dataset"}
updated_dataset = dataset_instance.update(updated_message)
Purpose:
Updates properties of the existing dataset. Here the name is updated.
Parameters:
updated_message (dict): Key-value pairs representing the fields to update.
Returns:
The updated dataset object reflecting the applied changes.
Usage Example:
To change the dataset's name or other metadata fields.
4. Reading (Printing) Dataset Information
print(dataset_instance)
print(updated_dataset)
Purpose:
Prints the details of the original and updated dataset objects.
Usage:
Useful for verifying the state of the dataset before and after update operations.
5. Deleting the Dataset
to_be_deleted_datasets = [dataset_instance.id]
ragflow_instance.delete_datasets(ids=to_be_deleted_datasets)
Purpose:
Deletes one or more datasets by their IDs.
Parameters:
ids (list of str): List of dataset IDs to be deleted.
Usage:
Cleans up the created dataset to avoid leftover test data.
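In the script, the delete call is skipped if an earlier step raises, which can leave the test dataset behind. One way to guarantee cleanup is a try/finally wrapper. The sketch below is an illustration only: FakeDataset and FakeClient are in-memory stand-ins (not SDK classes) that mirror just the create_dataset and delete_datasets calls described above, and with_temporary_dataset is a hypothetical helper.

```python
import itertools


class FakeDataset:
    """In-memory stand-in for the dataset object; not the SDK class."""

    def __init__(self, dataset_id, name):
        self.id = dataset_id
        self.name = name


class FakeClient:
    """In-memory stand-in exposing only the two client calls used here."""

    def __init__(self):
        self.datasets = {}
        self._next_id = itertools.count(1)

    def create_dataset(self, name):
        dataset = FakeDataset(str(next(self._next_id)), name)
        self.datasets[dataset.id] = dataset
        return dataset

    def delete_datasets(self, ids):
        for dataset_id in ids:
            self.datasets.pop(dataset_id, None)


def with_temporary_dataset(client, name, work):
    """Create a dataset, run work(dataset), and delete the dataset afterwards."""
    dataset = client.create_dataset(name=name)
    try:
        return work(dataset)
    finally:
        # Runs on success and on failure, so no test data is left behind.
        client.delete_datasets(ids=[dataset.id])
```

Because the helper only relies on create_dataset(name=...) and delete_datasets(ids=...), the same wrapper should work with a real client instance, assuming those calls behave as described in this document.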
6. Exit Handling
If all operations succeed, prints "test done" and exits with code 0.
If an exception occurs, prints the error string and exits with code -1.
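A common variation on this pattern, which is not what the script itself does, is to return the status from a function rather than calling sys.exit inside the try block. This keeps the flow testable without terminating the interpreter. In the sketch below, run_example is a hypothetical callable standing in for the dataset steps, errors go to stderr instead of stdout, and the failure status is 1 rather than -1; all three are assumptions of the sketch.

```python
import sys


def main(run_example):
    """Run the example steps and translate the outcome into an exit status."""
    try:
        run_example()
    except Exception as exc:  # generic handler, as in the original script
        # Report the error text; stderr keeps it separate from normal output.
        print(str(exc), file=sys.stderr)
        return 1
    print("test done")
    return 0


# Typical entry point, left as a comment so importing this sketch has no side effects:
#     sys.exit(main(run_example))
```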
Important Implementation Details
The script assumes the RAGFlow SDK handles HTTP requests internally with proper authentication using the API key.
Dataset CRUD operations are executed synchronously.
The dataset update operation takes a dictionary, allowing flexible updates to multiple fields if needed.
Dataset deletion accepts a list, enabling batch removal.
Error handling is generic, capturing and printing any exception raised.
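The points above about the update dictionary and the deletion list can be illustrated by building both payloads for more than one item. This is a sketch only: the SimpleNamespace objects stand in for dataset objects, the IDs are made up, and "description" is a hypothetical field name that the source does not confirm.

```python
from types import SimpleNamespace

# Stand-ins for dataset objects; only the id attribute matters here.
datasets = [SimpleNamespace(id="id-1"), SimpleNamespace(id="id-2")]

# One list of IDs covers the whole batch, in the shape delete_datasets(ids=...) expects.
to_be_deleted_datasets = [dataset.id for dataset in datasets]

# One dict can carry several fields in the shape update(...) expects.
# "description" is a hypothetical field name used for illustration.
updated_message = {"name": "updated_dataset", "description": "renamed by the example"}
```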
Interaction with Other Parts of the System
ragflow_sdk package:
This file depends on the external SDK ragflow_sdk, which must be installed and accessible. The SDK abstracts the API communication with the RAGFlow backend service.
RAGFlow API Server:
The script interacts with the RAGFlow API server at the specified HOST_ADDRESS. The server performs dataset management operations on the backend.
Datasets:
The dataset is a core entity managed by RAGFlow. This script exemplifies basic lifecycle management of datasets.
Usage Example
Run the script directly after setting proper HOST_ADDRESS and API_KEY values:
python dataset_example.py
Expected output (simplified):
<Dataset object at 0x... with name 'dataset_instance'>
<Dataset object at 0x... with name 'updated_dataset'>
test done
If any error occurs (e.g., network or authentication failure), the error message will be printed and the script will exit with code -1.
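A calling process, such as a CI job, sees the script's outcome only through its exit status. The sketch below shows how a parent process can observe that status; exit_status_of is a hypothetical helper, and the child command is a one-line stand-in for the script rather than the script itself. On POSIX systems an exit code of -1 is typically reported as 255, so checking for a nonzero status is more portable than comparing against -1.

```python
import subprocess
import sys


def exit_status_of(code):
    """Run a child Python process that calls sys.exit(code) and return its status."""
    child = subprocess.run(
        [sys.executable, "-c", f"import sys; sys.exit({code})"]
    )
    return child.returncode


status = exit_status_of(-1)
# Any nonzero status signals failure, whatever the platform maps -1 to.
print("failed" if status != 0 else "succeeded")
```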
Mermaid Class Diagram
The following diagram represents the structure and main interactions in this file. The primary classes involved are RAGFlow and the dataset object returned by create_dataset().
classDiagram
class RAGFlow {
+__init__(api_key: str, base_url: str)
+create_dataset(name: str) Dataset
+delete_datasets(ids: list)
}
class Dataset {
+id: str
+name: str
+update(update_fields: dict) Dataset
+__str__()
}
RAGFlow --o Dataset : creates
Dataset --> Dataset : update()
Summary
The dataset_example.py file is a straightforward demonstration of how to perform CRUD operations on datasets using the RAGFlow SDK. It is ideal for developers who want to understand the basic API usage patterns for managing datasets in the InfiniFlow ecosystem. The clean, stepwise operations with exception handling make it a good starting point for integration or testing purposes.