conftest.py
Overview
The conftest.py file is a pytest configuration and fixture module for testing the InfiniFlow RAGFlow system through its HTTP API, focusing on dataset creation, document uploading, parsing, chunking, and chat assistant management. It provides reusable fixtures that set up and tear down test prerequisites such as temporary files, datasets, documents, chunks, and chat assistants. It also defines a polling helper that waits for asynchronous document parsing, and it relies on helper functions imported from other modules for API calls and test file creation.
Detailed Explanation
Imports and Dependencies
Standard Library:
time.sleep: Used to introduce delays.
Third-Party:
pytest: Testing framework providing fixture support.
Project Modules:
common: Contains batch operations and API helpers for datasets, documents, and chat assistants.
libs.auth: Provides the HTTP API authentication class.
utils: Utility functions, including a wait decorator.
utils.file_utils: Functions to create various test file formats.
Functions and Fixtures
1. condition(_auth, _dataset_id)
Purpose:
Polls the document list API for a given dataset until all documents have finished parsing (run status is "DONE").
Parameters:
_auth (RAGFlowHttpApiAuth): Authenticated HTTP API client instance.
_dataset_id (str): The ID of the dataset whose documents are checked.
Returns:
bool: Returns True if all documents are parsed; False otherwise.
Usage:
Used as a wait condition in tests to ensure asynchronous document parsing completes before proceeding.
Decorator:
Wrapped with @wait_for(30, 1, "Document parsing timeout"), which retries every 1 second for up to 30 seconds.
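The polling behaviour described above can be sketched as follows. This is a minimal illustration rather than the project's code: the body of wait_for, the list_documents stub, and the response shape (data.docs[].run) are assumptions made for the example. Only the decorator arguments and the "DONE" status come from the description above.

```python
import functools
import time


def wait_for(timeout=10, interval=1, error_msg="Timeout"):
    """Hypothetical stand-in for utils.wait_for: retry until True or timeout."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + timeout
            while True:
                if func(*args, **kwargs) is True:
                    return True
                if time.monotonic() >= deadline:
                    # Assumption: the real decorator may fail the test differently.
                    raise TimeoutError(error_msg)
                time.sleep(interval)
        return wrapper
    return decorator


def list_documents(auth, dataset_id):
    """Stub for the document list helper; the response shape is assumed."""
    return {"data": {"docs": [{"id": "doc-1", "run": "DONE"}]}}


@wait_for(30, 1, "Document parsing timeout")
def condition(_auth, _dataset_id):
    res = list_documents(_auth, _dataset_id)
    # Parsing counts as finished only when every document reports "DONE".
    return all(doc["run"] == "DONE" for doc in res["data"]["docs"])
```

Raising TimeoutError is a choice made for this sketch. A test-suite helper could just as well fail the test through the test framework.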
2. generate_test_files(request, tmp_path)
Type: pytest fixture (parameterized)
Scope: Function (default)
Purpose:
Generates one or more test files of various formats (docx, excel, ppt, image, pdf, txt, md, json, eml, html) in a temporary directory.
Parameters:
request: pytest FixtureRequest object to read parameter values.
tmp_path: pytest built-in temporary directory path object.
Returns:
dict: A dictionary where keys are file types and values are the corresponding Path objects of created files.
Behavior:
If request.param matches a file type, only that file is created; if it is empty, a file of every supported type is created.
Usage Example:
    @pytest.mark.parametrize("generate_test_files", ["pdf"], indirect=True)
    def test_pdf_upload(generate_test_files):
        pdf_file = generate_test_files["pdf"]
        # Use pdf_file in test...
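How such a fixture might dispatch on its parameter can be sketched with a plain function. The writer functions, file names, and the CREATORS mapping below are hypothetical placeholders that emit small text files. The real helpers in utils.file_utils are not shown in this document, and only three of the listed formats are covered to keep the sketch short.

```python
import json
import tempfile
from pathlib import Path


def _write_txt(path: Path) -> None:
    path.write_text("plain text sample\n", encoding="utf-8")


def _write_md(path: Path) -> None:
    path.write_text("# Sample heading\n", encoding="utf-8")


def _write_json(path: Path) -> None:
    path.write_text(json.dumps({"sample": True}), encoding="utf-8")


# Hypothetical mapping: file type -> (file name, writer function).
CREATORS = {
    "txt": ("sample.txt", _write_txt),
    "md": ("sample.md", _write_md),
    "json": ("sample.json", _write_json),
}


def build_test_files(param: str, tmp_dir: Path) -> dict:
    """Create the requested file type, or every type when param is empty."""
    files = {}
    for file_type, (name, writer) in CREATORS.items():
        if param in (file_type, ""):
            target = tmp_dir / name
            writer(target)
            files[file_type] = target
    return files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        created = build_test_files("md", Path(tmp))
        print(sorted(created))
```

In a fixture, param would come from request.param and tmp_dir from tmp_path.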
3. ragflow_tmp_dir(request, tmp_path_factory)
Type: pytest fixture
Scope: Class
Purpose:
Creates a temporary directory specific to the test class.
Parameters:
request: pytest FixtureRequest object that provides the current test class name.
tmp_path_factory: pytest factory to create temporary directories.
Returns:
Path: Temporary directory path for the class.
Usage:
Used for class-scoped tests requiring isolated filesystem workspace.
4. HttpApiAuth(token)
Type: pytest fixture
Scope: Session
Purpose:
Instantiates and provides an authenticated API client for the session.
Parameters:
token (str): Authorization token (injected elsewhere in the test environment).
Returns:
RAGFlowHttpApiAuth: Authenticated HTTP API client instance.
Usage:
Provides API authorization for all API-interacting tests.
5. clear_datasets(request, HttpApiAuth)
Type: pytest fixture
Scope: Function
Purpose:
Cleans up all datasets after each test function.
Parameters:
request: pytest FixtureRequest to register cleanup.
HttpApiAuth: Authenticated API client.
Implementation:
Registers a finalizer that calls delete_datasets with no IDs (meaning delete all).
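The finalizer pattern can be illustrated without a running RAGFlow server. In the sketch below, FakeRequest mimics only the addfinalizer part of pytest's request object, and delete_datasets is a stub that records its call. Both are assumptions for illustration, and the {"ids": None} payload is an assumed way of expressing "no IDs".

```python
calls = []


def delete_datasets(auth, payload=None):
    """Stub standing in for the helper in common; it only records the call."""
    calls.append((auth, payload))


class FakeRequest:
    """Mimics request.addfinalizer; pytest itself runs finalizers at teardown."""

    def __init__(self):
        self._finalizers = []

    def addfinalizer(self, func):
        self._finalizers.append(func)

    def teardown(self):
        # Finalizers run in reverse registration order, as pytest does.
        while self._finalizers:
            self._finalizers.pop()()


def clear_datasets_body(request, auth):
    """Body of a clear_datasets-style fixture, without the fixture decorator."""
    def cleanup():
        delete_datasets(auth, {"ids": None})  # assumed payload for "delete all"

    request.addfinalizer(cleanup)
```

Nothing is deleted when the fixture body runs. The deletion happens only when the registered finalizer is invoked at teardown.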
6. clear_chat_assistants(request, HttpApiAuth)
Type: pytest fixture
Scope: Function
Purpose:
Cleans up all chat assistants after each test function.
Implementation:
Registers a finalizer calling delete_chat_assistants.
7. clear_session_with_chat_assistants(request, HttpApiAuth, add_chat_assistants)
Type: pytest fixture
Scope: Function
Purpose:
Cleans up chat assistant sessions after tests.
Parameters:
add_chat_assistants: Fixture creating chat assistants, providing their IDs.
Implementation:
Finalizer deletes sessions for each chat assistant ID.
8. add_dataset(request, HttpApiAuth)
Type: pytest fixture
Scope: Class
Purpose:
Creates a single dataset for tests at class scope and ensures cleanup.
Returns:
str: Dataset ID.
Implementation:
Uses batch_create_datasets to create one dataset and deletes all datasets on teardown.
9. add_dataset_func(request, HttpApiAuth)
Type: pytest fixture
Scope: Function
Purpose:
Similar to add_dataset but scoped to each test function.
Returns:
str: Dataset ID.
10. add_document(HttpApiAuth, add_dataset, ragflow_tmp_dir)
Type: pytest fixture
Scope: Class
Purpose:
Uploads a single document into the dataset and returns both dataset and document IDs.
Returns:
Tuple[str, str]: (dataset_id, document_id)
Implementation:
Uses bulk_upload_documents to upload a document into the dataset, with the class-scoped temporary directory (ragflow_tmp_dir) as the location of the source file.
11. add_chunks(HttpApiAuth, add_document)
Type: pytest fixture
Scope: Class
Purpose:
Parses the uploaded document, waits for completion, adds chunks to the document, and returns their IDs.
Returns:
Tuple[str, str, List[str]]: (dataset_id, document_id, chunk_ids)
Implementation Details:
Calls the parse_documents API.
Waits using condition to verify parsing completion.
Adds 4 chunks via batch_add_chunks.
Sleeps 1 second to avoid timing issues.
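The four steps of add_chunks can be sketched as a plain function. All helpers below are stubs with assumed signatures, payloads, and return values. They stand in for the real functions in common and for the condition helper, which are not reproduced in this document. The settle_seconds parameter is also an addition for the sketch, so the delay can be shortened when experimenting.

```python
from time import sleep


def parse_documents(auth, dataset_id, payload):
    """Stub: would ask the server to start parsing the listed documents."""
    return {"code": 0}


def condition(auth, dataset_id):
    """Stub: the real helper polls until every document's run status is DONE."""
    return True


def batch_add_chunks(auth, dataset_id, document_id, num):
    """Stub: would add num chunks and return their IDs."""
    return [f"chunk-{i}" for i in range(num)]


def add_chunks_body(auth, add_document, settle_seconds=1.0):
    """Parse, wait, add 4 chunks, pause briefly, then return all IDs."""
    dataset_id, document_id = add_document
    parse_documents(auth, dataset_id, {"document_ids": [document_id]})
    condition(auth, dataset_id)  # blocks until parsing has finished
    chunk_ids = batch_add_chunks(auth, dataset_id, document_id, 4)
    sleep(settle_seconds)  # short pause to avoid timing issues
    return dataset_id, document_id, chunk_ids
```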
12. add_chat_assistants(request, HttpApiAuth, add_document)
Type: pytest fixture
Scope: Class
Purpose:
Creates multiple chat assistants associated with a parsed document.
Returns:
Tuple[str, str, List[str]]: (dataset_id, document_id, list of chat assistant IDs)
Implementation:
Parses the document and waits for completion.
Creates 5 chat assistants via batch_create_chat_assistants.
Registers cleanup to delete all chat assistants after tests.
Important Implementation Details
The use of the @wait_for decorator on condition ensures robust waiting logic for asynchronous document parsing, retrying with interval and timeout.
Cleanup logic is consistently implemented using pytest's request.addfinalizer to maintain test isolation and prevent resource leakage.
Temporary files for tests are dynamically generated using utility functions, supporting a broad range of document types to test ingestion and parsing capabilities.
The fixtures are hierarchically composed, e.g., add_chunks depends on add_document, which depends on add_dataset.
The subtle sleep(1) after adding chunks addresses race condition or propagation delay issues (noted as issues/6487).
Interactions with Other Parts of the System
API Layer:
The file heavily interfaces with the HTTP API via the RAGFlowHttpApiAuth client and API helper functions (batch_create_datasets, bulk_upload_documents, etc.) defined in the common module.
File Utilities:
Uses utils.file_utils to generate test files in various formats, supporting end-to-end document ingestion workflows.
Test Framework:
Integrates with pytest for fixture management, parameterization, and test lifecycle hooks.
Authentication:
Leverages libs.auth.RAGFlowHttpApiAuth to manage authenticated requests seamlessly.
Asynchronous Processing:
Uses the wait_for decorator and polling logic to handle asynchronous document parsing before proceeding with dependent operations.
This file forms the foundational test setup for validating core functionalities of dataset and document handling in the InfiniFlow RAGFlow project.
Usage Workflow Summary
Setup:
Generate test files (optional, parameterized).
Create a temporary directory scoped by class or function.
Authenticate HTTP API client for session.
Dataset & Document:
Create datasets (class or function scoped).
Upload documents to datasets.
Processing:
Parse documents and wait for completion.
Add chunks to documents.
Chat Assistants:
Create chat assistants associated with documents.
Teardown:
Clean up datasets, documents, chunks, and chat assistants after tests.
Visual Diagram
flowchart TD
subgraph Fixtures
direction TB
GenerateTestFiles[generate_test_files]
RagflowTmpDir[ragflow_tmp_dir]
HttpApiAuth[HttpApiAuth]
ClearDatasets[clear_datasets]
ClearChatAssistants[clear_chat_assistants]
ClearSessions[clear_session_with_chat_assistants]
AddDataset[add_dataset]
AddDatasetFunc[add_dataset_func]
AddDocument[add_document]
AddChunks[add_chunks]
AddChatAssistants[add_chat_assistants]
end
GenerateTestFiles -->|creates| TestFiles
RagflowTmpDir -->|provides| TmpDir
AddDataset --> AddDocument
AddDocument --> AddChunks
AddDocument --> AddChatAssistants
HttpApiAuth --> ClearDatasets
HttpApiAuth --> ClearChatAssistants
HttpApiAuth --> ClearSessions
HttpApiAuth --> AddDataset
HttpApiAuth --> AddDatasetFunc
HttpApiAuth --> AddDocument
HttpApiAuth --> AddChunks
HttpApiAuth --> AddChatAssistants
AddChatAssistants --> ClearSessions
condition["condition()"]
AddChunks --> condition
AddChatAssistants --> condition
Summary
The conftest.py file is a critical pytest configuration module for InfiniFlow's RAGFlow testing suite. It provides a comprehensive set of fixtures for creating and cleaning up datasets, documents, chunks, and chat assistants, integrates asynchronous waiting for document parsing, and supports testing with diverse file types. Its design ensures modular, reusable, and isolated test environments for reliable automated testing of the RAGFlow backend services.