ragflow.py
Overview
ragflow.py defines the RAGFlow class, which serves as a Python client SDK for interacting with the InfiniFlow backend API. The primary purpose of this file is to provide an abstraction layer for making HTTP requests to the API endpoints related to datasets, chats, agents, and document retrieval functionalities.
This class encapsulates RESTful operations (POST, GET, PUT, DELETE) and exposes high-level methods for managing:
DataSets: creating, listing, retrieving, and deleting datasets.
Chats: creating, listing, and deleting chat sessions.
Agents: creating, updating, listing, and deleting intelligent agents.
Retrieval: retrieving relevant document chunks from datasets based on a query.
The file uses several domain entities imported from sibling modules (Agent, Chat, Chunk, DataSet) to represent and manipulate the data returned by the backend.
Classes and Methods
Class: RAGFlow
Purpose
Acts as an API client to communicate with the InfiniFlow backend, managing knowledge datasets, chat sessions, retrieval operations, and intelligent agents.
Initialization
def __init__(self, api_key, base_url, version="v1")
Parameters:
api_key (str): The authentication token for API access.
base_url (str): The base URL of the backend server (e.g., http://<host_address>).
version (str, optional): API version; defaults to "v1".
Description:
Initializes the client with the authorization header and constructs the full API URL.
Usage:
ragflow = RAGFlow(api_key="your_api_key", base_url="http://localhost:8000")
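The state the constructor prepares can be sketched without a server. This is a minimal stand-in, not the SDK class: `ClientConfig` is a hypothetical name, and both the `/api/<version>` path segment and the `Bearer` authorization scheme are assumptions about how the full API URL and header are built.

```python
from dataclasses import dataclass, field


@dataclass
class ClientConfig:
    """Hypothetical stand-in for the state RAGFlow.__init__ sets up."""
    api_key: str
    base_url: str
    version: str = "v1"
    api_url: str = field(init=False)
    authorization_header: dict = field(init=False)

    def __post_init__(self) -> None:
        # Assumption: endpoints live under <base_url>/api/<version>.
        # A trailing slash on base_url is stripped to avoid "//" in paths.
        self.api_url = f"{self.base_url.rstrip('/')}/api/{self.version}"
        # Assumption: the API key is sent as a Bearer token.
        self.authorization_header = {"Authorization": f"Bearer {self.api_key}"}


cfg = ClientConfig(api_key="your_api_key", base_url="http://localhost:8000")
print(cfg.api_url)               # http://localhost:8000/api/v1
print(cfg.authorization_header)  # {'Authorization': 'Bearer your_api_key'}
```

Computing the URL and header once at construction time means every request helper can reuse them without rebuilding strings per call.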
HTTP Request Methods
These methods wrap the requests library calls with appropriate headers and URL formatting.
def post(self, path, json=None, stream=False, files=None)
Sends a POST request. Returns the raw requests.Response object.
def get(self, path, params=None, json=None)
Sends a GET request. Returns the raw requests.Response object.
def delete(self, path, json)
Sends a DELETE request. Returns the raw requests.Response object.
def put(self, path, json)
Sends a PUT request. Returns the raw requests.Response object.
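The shared shape of these four helpers can be sketched with an injectable transport, so the URL and header handling can be checked without a network. `HttpHelpersSketch`, the `Transport` signature, the `Bearer` scheme, and the `/datasets` path in the usage lines are all illustrative assumptions; in the SDK itself the requests library does the sending.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical transport: (method, url, headers, request kwargs) -> response.
Transport = Callable[[str, str, Dict[str, str], Dict[str, Any]], Any]


class HttpHelpersSketch:
    """Illustrative stand-in for the post/get/put/delete helpers, not the SDK class."""

    def __init__(self, api_url: str, api_key: str, send: Transport) -> None:
        self.api_url = api_url
        self.authorization_header = {"Authorization": f"Bearer {api_key}"}
        self._send = send

    def _request(self, method: str, path: str, **kwargs: Any) -> Any:
        # Every helper prefixes the relative path with the API root and
        # attaches the same authorization header.
        return self._send(method, self.api_url + path, self.authorization_header, kwargs)

    def post(self, path: str, json: Optional[dict] = None, stream: bool = False, files: Any = None) -> Any:
        return self._request("POST", path, json=json, stream=stream, files=files)

    def get(self, path: str, params: Optional[dict] = None, json: Optional[dict] = None) -> Any:
        return self._request("GET", path, params=params, json=json)

    def delete(self, path: str, json: Any) -> Any:
        return self._request("DELETE", path, json=json)

    def put(self, path: str, json: Any) -> Any:
        return self._request("PUT", path, json=json)


# Usage with a fake transport that records calls instead of sending them.
recorded: List[Tuple[str, str, Dict[str, str], Dict[str, Any]]] = []

def fake_send(method, url, headers, kwargs):
    recorded.append((method, url, headers, kwargs))
    return {"code": 0}

client = HttpHelpersSketch("http://localhost:8000/api/v1", "your_api_key", fake_send)
client.delete("/datasets", {"ids": ["dataset-1"]})
```

A real transport could be `lambda method, url, headers, kwargs: requests.request(method, url, headers=headers, **kwargs)`, since `requests.request` accepts the `json`, `params`, `stream`, and `files` keyword arguments used above.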
Dataset Management
create_dataset
def create_dataset(self, name: str, avatar: Optional[str] = None, description: Optional[str] = None, embedding_model: Optional[str] = None, permission: str = "me", chunk_method: str = "naive", parser_config: Optional[DataSet.ParserConfig] = None) -> DataSet:
Parameters:
name (str): Name of the dataset.
avatar (Optional[str]): URL or path to an avatar image.
description (Optional[str]): Description of the dataset.
embedding_model (Optional[str]): Model used for embedding generation.
permission (str): Permission scope; defaults to "me".
chunk_method (str): Method used for chunking text; defaults to "naive".
parser_config (Optional[DataSet.ParserConfig]): Optional parser configuration.
Returns:
A DataSet instance representing the newly created dataset.
Exceptions:
Raises Exception if the backend returns an error.
Usage Example:
dataset = ragflow.create_dataset( name="MyDataset", description="A sample knowledge base", embedding_model="text-embedding-ada-002" )
delete_datasets
def delete_datasets(self, ids: list[str] | None = None)
Deletes datasets by their IDs.
Parameters:
ids (list of str or None): IDs of the datasets to delete. If None (the default), no IDs are specified in the request.
Raises an Exception if deletion fails.
get_dataset
def get_dataset(self, name: str) -> DataSet
Retrieves a dataset by name.
Returns a single DataSet object. Raises Exception if no dataset with that name is found.
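A name lookup with a "not found" exception can be sketched on top of a name-filtered listing call. Delegating to such a call and taking the first match is an assumption about how get_dataset works; `DataSetStub`, `get_dataset_by_name`, and the in-memory fake listing are hypothetical helpers for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DataSetStub:
    """Hypothetical stand-in for the SDK's DataSet entity."""
    id: str
    name: str


def get_dataset_by_name(list_datasets: Callable[..., List[DataSetStub]], name: str) -> DataSetStub:
    # Assumption: the lookup reuses a name-filtered listing call and
    # returns the first match.
    matches = list_datasets(name=name)
    if not matches:
        raise Exception(f"Dataset {name} not found")
    return matches[0]


# In-memory fake standing in for the backend listing.
store = [DataSetStub("d1", "My Knowledge Base"), DataSetStub("d2", "Archive")]

def fake_list_datasets(name=None):
    return [d for d in store if name is None or d.name == name]

found = get_dataset_by_name(fake_list_datasets, "Archive")
print(found.id)  # d2
```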
list_datasets
def list_datasets(self, page: int = 1, page_size: int = 30, orderby: str = "create_time", desc: bool = True, id: str | None = None, name: str | None = None) -> list[DataSet]
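Lists datasets with pagination, sorting, and optional filtering by id or name. Results are sorted by orderby (default "create_time"), in descending order when desc is True.
Returns a list of DataSet instances. Raises Exception on error.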
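Because list_datasets returns one page at a time, collecting every dataset means walking the pages. The generator below is a hypothetical helper, not part of the SDK; it assumes that a page past the end comes back short or empty rather than raising. The same pattern would apply to list_chats and list_agents, which share the page and page_size parameters.

```python
from typing import Any, Callable, Iterator, List


def iter_all_pages(list_page: Callable[..., List[Any]], page_size: int = 30, **filters: Any) -> Iterator[Any]:
    """Yield every item from a 1-based listing method taking page/page_size."""
    page = 1
    while True:
        batch = list_page(page=page, page_size=page_size, **filters)
        yield from batch
        # A short (or empty) page means the listing is exhausted.
        if len(batch) < page_size:
            return
        page += 1


# Fake listing with 65 items, standing in for a backend call.
def fake_list(page: int, page_size: int, **filters: Any) -> List[int]:
    items = list(range(65))
    start = (page - 1) * page_size
    return items[start:start + page_size]

everything = list(iter_all_pages(fake_list))
print(len(everything))  # 65
```

Against a live client the call would look like `iter_all_pages(ragflow.list_datasets, orderby="create_time")`, with any extra keyword arguments forwarded as filters.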
Chat Management
create_chat
def create_chat(self, name: str, avatar: str = "", dataset_ids=None, llm: Chat.LLM | None = None, prompt: Chat.Prompt | None = None) -> Chat
Parameters:
name (str): Chat session name.
avatar (str): Avatar for the chat.
dataset_ids (list or None): List of dataset IDs to associate.
llm (Chat.LLM or None): Language model configuration.
prompt (Chat.Prompt or None): Prompt configuration.
Returns:
A Chat instance representing the created chat session.
Details:
If the llm or prompt parameters are omitted, default configurations are created internally.
Usage Example:
chat = ragflow.create_chat(name="SupportChat", dataset_ids=[dataset.id])
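The "defaults when omitted" behaviour can be sketched with None sentinels. `LLMSettings`, `PromptSettings`, and `build_chat_request` are hypothetical, and their fields and values are illustrative only; the SDK's Chat.LLM and Chat.Prompt define their own fields and defaults.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class LLMSettings:
    # Hypothetical fields and values; the SDK's Chat.LLM defines its own.
    model_name: Optional[str] = None
    temperature: float = 0.1


@dataclass
class PromptSettings:
    # Hypothetical field and value; the SDK's Chat.Prompt defines its own.
    top_n: int = 8


def build_chat_request(
    name: str,
    avatar: str = "",
    dataset_ids: Optional[List[str]] = None,
    llm: Optional[LLMSettings] = None,
    prompt: Optional[PromptSettings] = None,
) -> Dict[str, Any]:
    # None is the "not provided" sentinel; a fresh default object is built
    # per call, so no mutable default is shared between requests.
    if llm is None:
        llm = LLMSettings()
    if prompt is None:
        prompt = PromptSettings()
    return {
        "name": name,
        "avatar": avatar,
        "dataset_ids": list(dataset_ids or []),
        "llm": asdict(llm),
        "prompt": asdict(prompt),
    }


request_body = build_chat_request("SupportChat", dataset_ids=["d1"])
```

Using None rather than an object instance as the default avoids Python's shared-mutable-default pitfall, because a new configuration object is created on every call that omits it.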
delete_chats
def delete_chats(self, ids: list[str] | None = None)
Deletes chats by ID.
Raises an Exception on failure.
list_chats
def list_chats(self, page: int = 1, page_size: int = 30, orderby: str = "create_time", desc: bool = True, id: str | None = None, name: str | None = None) -> list[Chat]
Returns a paginated list of Chat objects.
Retrieval
retrieve
def retrieve(self, dataset_ids, document_ids=None, question="", page=1, page_size=30, similarity_threshold=0.2, vector_similarity_weight=0.3, top_k=1024, rerank_id: str | None = None, keyword: bool = False, cross_languages: list[str]|None = None, metadata_condition: dict | None = None)
Parameters:
dataset_ids (list): List of dataset IDs to search in.
document_ids (list or None): Optional list of document IDs to restrict the search to.
question (str): Query text.
page (int): Page number for pagination.
page_size (int): Number of chunks per page.
similarity_threshold (float): Minimum similarity score a chunk must reach to be returned.
vector_similarity_weight (float): Weight of vector similarity when it is combined with keyword similarity in the ranking score.
top_k (int): Number of candidate chunks considered in the vector similarity computation.
rerank_id (str or None): Optional rerank model ID.
keyword (bool): Whether to enable keyword-based matching.
cross_languages (list or None): List of language codes for cross-lingual search.
metadata_condition (dict or None): Metadata filters.
Returns:
A list of Chunk objects matching the query.
Details:
Sends a POST request to the /retrieval endpoint with the filtering and ranking parameters.
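The request body sent to /retrieval can be sketched from the method signature. The defaults below are copied from retrieve; that each JSON key mirrors the Python parameter of the same name, and that unset optional filters are sent as null, are assumptions. `build_retrieval_body` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional


def build_retrieval_body(
    dataset_ids: List[str],
    document_ids: Optional[List[str]] = None,
    question: str = "",
    page: int = 1,
    page_size: int = 30,
    similarity_threshold: float = 0.2,
    vector_similarity_weight: float = 0.3,
    top_k: int = 1024,
    rerank_id: Optional[str] = None,
    keyword: bool = False,
    cross_languages: Optional[List[str]] = None,
    metadata_condition: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    # Assumption: each JSON key mirrors the parameter of the same name,
    # and unset optional filters are sent as null.
    return {
        "dataset_ids": dataset_ids,
        "document_ids": document_ids,
        "question": question,
        "page": page,
        "page_size": page_size,
        "similarity_threshold": similarity_threshold,
        "vector_similarity_weight": vector_similarity_weight,
        "top_k": top_k,
        "rerank_id": rerank_id,
        "keyword": keyword,
        "cross_languages": cross_languages,
        "metadata_condition": metadata_condition,
    }


body = build_retrieval_body(["d1"], question="What is the refund policy?", keyword=True)
```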
Agent Management
list_agents
def list_agents(self, page: int = 1, page_size: int = 30, orderby: str = "update_time", desc: bool = True, id: str | None = None, title: str | None = None) -> list[Agent]
Lists agents with pagination and filters.
Returns a list of Agent objects.
create_agent
def create_agent(self, title: str, dsl: dict, description: str | None = None) -> None
Creates a new agent.
title (str): The agent's name.
dsl (dict): Domain-specific language (DSL) dict defining the agent's behavior.
description (str or None): Optional description of the agent.
Raises an Exception on failure.
update_agent
def update_agent(self, agent_id: str, title: str | None = None, description: str | None = None, dsl: dict | None = None) -> None
Updates agent attributes.
Any subset of title, description, or dsl can be updated.
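A partial update like this is commonly implemented by sending only the fields the caller supplied. The helper below is a hypothetical sketch of that approach, assuming omitted (None) fields keep their current server-side values; the body would then go out with PUT to a per-agent path such as /agents/<agent_id>, which is also an assumption.

```python
from typing import Any, Dict, Optional


def build_agent_update(
    title: Optional[str] = None,
    description: Optional[str] = None,
    dsl: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    # Only fields the caller actually supplied are sent; None means
    # "leave unchanged" (assumption about server-side semantics).
    candidates = {"title": title, "description": description, "dsl": dsl}
    return {key: value for key, value in candidates.items() if value is not None}


print(build_agent_update(title="SupportBot v2"))  # {'title': 'SupportBot v2'}
```

One consequence of using None as "unchanged" is that an empty string still counts as a supplied value, so `description=""` would clear the description, while there is no way to send an explicit null.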
delete_agent
def delete_agent(self, agent_id: str) -> None
Deletes an agent by ID.
Implementation Details and Algorithms
REST Client Abstraction:
The class wraps the HTTP methods (post, get, put, delete) and sends the same authorization header with every request.
Error Handling:
API responses are expected to have a JSON body containing a "code" key. A value of 0 indicates success; otherwise, an exception is raised with the backend's error message.
Data Modeling:
The class converts raw JSON responses into domain objects (DataSet, Chat, Agent, Chunk), which encapsulate additional logic in their respective modules.
Default Configuration:
For chat creation, default language model and prompt configurations are constructed when none are provided, which keeps the common case simple.
Flexible Retrieval:
The retrieve method supports advanced filtering and ranking parameters, including similarity thresholds, reranking, keyword search, cross-language retrieval, and metadata filtering.
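The "code" convention described under Error Handling can be captured in one small helper. `unwrap` is a hypothetical name, and reading the payload from a "data" key and the error text from a "message" key are assumptions about the response layout; only the `code == 0` rule comes from the description above.

```python
from typing import Any, Dict


def unwrap(payload: Dict[str, Any]) -> Any:
    """Return the data of a successful response, or raise with the backend's message."""
    # code == 0 signals success.
    if payload.get("code") == 0:
        # Assumption: the useful payload sits under "data".
        return payload.get("data")
    # Assumption: the backend's error text is under "message".
    raise Exception(payload.get("message", "Unknown error"))


print(unwrap({"code": 0, "data": {"id": "d1"}}))  # {'id': 'd1'}
```

With a requests.Response in hand, usage would be `unwrap(response.json())`. Centralizing the check keeps every resource method to a single request-then-unwrap pattern.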
Interaction with Other Modules
Imports from the modules package:
Agent: Represents intelligent agents, their properties, and behaviors.
Chat: Models chat sessions, including LLM configurations and prompts.
Chunk: Represents portions of documents retrieved as relevant knowledge.
DataSet: Represents knowledge datasets with parser configurations.
RAGFlow acts as the coordinator, using these entities to instantiate objects from API responses and to send properly formatted requests.
The file relies on the third-party requests library for HTTP communication.
Usage Summary
```python
from ragflow import RAGFlow

# Initialize the client
ragflow = RAGFlow(api_key="your_api_key", base_url="http://localhost:8000")

# Create a dataset
dataset = ragflow.create_dataset(name="My Knowledge Base")

# Create a chat linked to the dataset
chat = ragflow.create_chat(name="Support Chat", dataset_ids=[dataset.id])

# Retrieve relevant chunks from the dataset
chunks = ragflow.retrieve(dataset_ids=[dataset.id], question="What is the refund policy?")

# List agents
agents = ragflow.list_agents()

# Create an agent (placeholder DSL; a real DSL must match the backend's agent schema)
ragflow.create_agent(title="SupportBot", dsl={"type": "faq_bot", "config": {}})
```
Visual Diagram
```mermaid
classDiagram
    class RAGFlow {
        - user_key: str
        - api_url: str
        - authorization_header: dict
        + __init__(api_key, base_url, version="v1")
        + post(path, json=None, stream=False, files=None)
        + get(path, params=None, json=None)
        + delete(path, json)
        + put(path, json)
        + create_dataset(name, avatar=None, description=None, embedding_model=None, permission="me", chunk_method="naive", parser_config=None) DataSet
        + delete_datasets(ids)
        + get_dataset(name) DataSet
        + list_datasets(page=1, page_size=30, orderby="create_time", desc=True, id=None, name=None) list~DataSet~
        + create_chat(name, avatar="", dataset_ids=None, llm=None, prompt=None) Chat
        + delete_chats(ids)
        + list_chats(page=1, page_size=30, orderby="create_time", desc=True, id=None, name=None) list~Chat~
        + retrieve(dataset_ids, document_ids=None, question="", page=1, page_size=30, similarity_threshold=0.2, vector_similarity_weight=0.3, top_k=1024, rerank_id=None, keyword=False, cross_languages=None, metadata_condition=None) list~Chunk~
        + list_agents(page=1, page_size=30, orderby="update_time", desc=True, id=None, title=None) list~Agent~
        + create_agent(title, dsl, description=None)
        + update_agent(agent_id, title=None, description=None, dsl=None)
        + delete_agent(agent_id)
    }
    RAGFlow --> DataSet : creates/manages
    RAGFlow --> Chat : creates/manages
    RAGFlow --> Chunk : retrieves
    RAGFlow --> Agent : creates/manages
```
Summary
ragflow.py implements a comprehensive client interface for the InfiniFlow backend API, focusing on knowledge dataset management, chat session handling, retrieval of relevant document chunks, and intelligent agent lifecycle management. It encapsulates HTTP communication, error handling, and domain entity manipulation, enabling developers to build applications that leverage InfiniFlow's retrieval-augmented generation capabilities with ease.