conftest.py
Overview
The conftest.py file serves as a centralized configuration and fixture provider for pytest-based automated testing within the InfiniFlow project. It defines a collection of pytest fixtures and utility functions that facilitate setting up, managing, and tearing down test resources related to datasets, documents, chat assistants, and file artifacts. These fixtures help standardize test preparation steps such as:
Generating various test files of different formats (e.g., DOCX, PDF, JSON).
Creating and cleaning up datasets, documents, chunks, and chat assistants.
Managing asynchronous workflows such as document parsing.
Providing reusable client API instances for interacting with the RAGFlow backend.
The file relies heavily on the pytest framework's fixture mechanism and integrates with the RAGFlow SDK, utility modules, and common batch operations to streamline test setup and cleanup.
Detailed Descriptions
Functions and Fixtures
condition(_dataset: DataSet) -> bool
A helper function decorated with the @wait_for retry wrapper that checks if all documents within a given dataset have finished processing.
Parameters:
_dataset (DataSet): The dataset object whose documents' status is checked.
Returns:
bool: Returns True if every document's .run status is "DONE", otherwise False.
Usage:
This function is used as a wait condition to poll until document parsing is complete before proceeding with dependent test logic.
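The polling behaviour described here can be sketched as follows. This is a minimal stand-in rather than the project's actual wait_for helper: the decorator's signature (timeout, interval, error_msg), the TimeoutError it raises, and the FakeDataset and FakeDocument classes are all assumptions made for illustration.

```python
import time
from functools import wraps


def wait_for(timeout=30, interval=1, error_msg="Timeout"):
    """Hypothetical retry decorator: re-run a predicate until it returns True."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + timeout
            while True:
                if func(*args, **kwargs) is True:
                    return True
                if time.monotonic() >= deadline:
                    raise TimeoutError(error_msg)
                time.sleep(interval)
        return wrapper
    return decorator


class FakeDocument:
    """Stand-in for an SDK document; only the .run status is modelled."""
    def __init__(self, run):
        self.run = run


class FakeDataset:
    """Stand-in dataset whose documents report DONE after a few polls."""
    def __init__(self, polls_until_done):
        self.polls_left = polls_until_done

    def list_documents(self):
        self.polls_left -= 1
        status = "DONE" if self.polls_left <= 0 else "RUNNING"
        return [FakeDocument(status), FakeDocument(status)]


# Short timeout and interval so the sketch runs quickly.
@wait_for(timeout=2, interval=0.01, error_msg="Document parsing timeout")
def condition(_dataset):
    # True only when every document in the dataset has finished parsing.
    return all(doc.run == "DONE" for doc in _dataset.list_documents())
```

Calling condition(FakeDataset(polls_until_done=3)) polls three times and then returns True; a dataset that never finishes would raise TimeoutError once the deadline passes.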
generate_test_files(request: FixtureRequest, tmp_path: Path) -> dict[str, Path]
Pytest fixture that generates multiple types of test files in a temporary directory. The fixture selectively creates files based on the parameter passed to it.
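Parameters:
request (FixtureRequest): Carries the fixture parameter that selects which file types to create.
tmp_path (Path): Temporary directory in which the files are written.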
Returns:
dict[str, Path]: A mapping of file types (e.g., "pdf", "txt") to their corresponding file paths.
Usage:
Test cases can parametrize this fixture to generate required test file types on demand for document upload or parsing tests.
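The selection logic might look roughly like the sketch below. It uses only the standard library, so the writer functions, the WRITERS registry, the file names, and the rule that None means "create everything" are assumptions; the real fixture delegates to helpers such as create_docx_file and create_pdf_file.

```python
import json
from pathlib import Path


def _write_txt(path: Path) -> Path:
    path.write_text("ragflow test\n", encoding="utf-8")
    return path


def _write_json(path: Path) -> Path:
    path.write_text(json.dumps({"title": "ragflow test"}), encoding="utf-8")
    return path


def _write_md(path: Path) -> Path:
    path.write_text("# ragflow test\n", encoding="utf-8")
    return path


# Hypothetical registry: file type -> (file name, writer function).
WRITERS = {
    "txt": ("ragflow_test.txt", _write_txt),
    "json": ("ragflow_test.json", _write_json),
    "md": ("ragflow_test.md", _write_md),
}


def build_test_files(file_type, tmp_path: Path) -> dict:
    """Create the requested file type, or every type when file_type is None."""
    files = {}
    for kind, (name, writer) in WRITERS.items():
        if file_type is None or file_type == kind:
            files[kind] = writer(tmp_path / name)
    return files
```

Inside a fixture, the selector would typically come from request.param (set through indirect parametrization) and the directory from pytest's tmp_path.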
ragflow_tmp_dir(request: FixtureRequest, tmp_path_factory: Path) -> Path
Pytest fixture creating a uniquely named temporary directory scoped to the test class.
Parameters:
request (FixtureRequest): Provides access to the requesting test context.
tmp_path_factory (Path): Factory to create temporary paths.
Returns:
Path: The path of the newly created temporary directory named after the test class.
Usage:
Provides isolated filesystem workspace per test class to avoid conflicts during file operations.
client(token: str) -> RAGFlow
Session-scoped pytest fixture that instantiates an authenticated RAGFlow client API object.
Parameters:
token (str): API key token for authorization.
Returns:
RAGFlow: Configured RAGFlow client instance pointing to the test server.
Usage:
Used by tests needing to interact programmatically with the backend API.
clear_datasets(request: FixtureRequest, client: RAGFlow)
Function-scoped fixture that ensures deletion of all datasets after a test completes.
Parameters:
request (FixtureRequest): Used to register cleanup finalizer.
client (RAGFlow): The API client.
Returns:
None
Implementation Detail:
Registers a finalizer that calls client.delete_datasets(ids=None) to remove all datasets.
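The finalizer pattern can be illustrated without a running server. In this sketch FakeClient and FakeRequest are hypothetical stand-ins for the RAGFlow client and pytest's FixtureRequest, and clear_datasets is written as a plain function; in conftest.py it would be wrapped with a pytest fixture decorator.

```python
class FakeClient:
    """Stand-in client that keeps datasets in memory instead of on a server."""
    def __init__(self):
        self.datasets = ["ds_1", "ds_2"]

    def delete_datasets(self, ids=None):
        # ids=None is treated here as "delete everything".
        if ids is None:
            self.datasets.clear()
        else:
            self.datasets = [d for d in self.datasets if d not in ids]


class FakeRequest:
    """Mimics the single FixtureRequest method the fixture relies on."""
    def __init__(self):
        self._finalizers = []

    def addfinalizer(self, func):
        self._finalizers.append(func)

    def run_finalizers(self):
        # pytest runs finalizers in reverse registration order.
        while self._finalizers:
            self._finalizers.pop()()


def clear_datasets(request, client):
    """Fixture body: do nothing during setup, delete all datasets at teardown."""
    def cleanup():
        client.delete_datasets(ids=None)

    request.addfinalizer(cleanup)
```

Nothing is deleted while the test runs; the datasets disappear only when run_finalizers() is called, which corresponds to pytest tearing the fixture down.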
clear_chat_assistants(request: FixtureRequest, client: RAGFlow)
Function-scoped fixture that cleans up all chat assistants after a test.
Parameters:
request (FixtureRequest)
client (RAGFlow)
Returns:
None
Implementation Detail:
Finalizer calls client.delete_chats(ids=None) to delete all chat assistants.
clear_session_with_chat_assistants(request, add_chat_assistants)
Function-scoped fixture that deletes all sessions associated with chat assistants after a test.
Parameters:
add_chat_assistants (tuple): Provides a tuple with datasets, documents, and chat assistants.
Returns:
None
Implementation Detail:
Iterates over chat assistants and attempts to delete sessions, catching exceptions silently to avoid test failures.
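A best-effort cleanup loop of this kind could be sketched as below. FakeChat and delete_all_sessions are hypothetical names, and the delete_sessions(ids=None) call on the stand-in is an assumption about how "delete every session" is expressed.

```python
class FakeChat:
    """Stand-in chat assistant that can be told to fail on cleanup."""
    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.sessions = ["s1", "s2"]

    def delete_sessions(self, ids=None):
        if self.fail:
            raise RuntimeError(f"{self.name}: backend unavailable")
        self.sessions.clear()


def delete_all_sessions(chat_assistants):
    """Try every assistant; one failure must not block the others."""
    for chat in chat_assistants:
        try:
            chat.delete_sessions(ids=None)
        except Exception:
            # Deliberately ignored so that teardown never fails the test.
            pass
```

The trade-off is that a failed deletion goes unnoticed, so leftover sessions may only surface in a later test; logging the exception instead of discarding it would be one way to keep teardown quiet while leaving a trace.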
add_dataset(request: FixtureRequest, client: RAGFlow) -> DataSet
Class-scoped fixture that creates a new dataset for tests and ensures its deletion afterward.
Parameters:
request (FixtureRequest)
client (RAGFlow)
Returns:
DataSet: The newly created dataset.
Implementation Detail:
Uses batch_create_datasets to create one dataset, then registers a finalizer to delete all datasets.
add_dataset_func(request: FixtureRequest, client: RAGFlow) -> DataSet
Function-scoped version of the add_dataset fixture, providing dataset setup and teardown for each test function.
Parameters:
request (FixtureRequest)
client (RAGFlow)
Returns:
DataSet
add_document(add_dataset: DataSet, ragflow_tmp_dir: Path) -> tuple[DataSet, Document]
Class-scoped fixture that uploads a document to a dataset and returns both.
Parameters:
add_dataset (DataSet): Dataset to which the document is uploaded.
ragflow_tmp_dir (Path): Temporary directory for document storage.
Returns:
Tuple of:
DataSet
Document
Implementation Detail:
Uses bulk_upload_documents to upload a single document.
add_chunks(request: FixtureRequest, add_document: tuple[DataSet, Document]) -> tuple[DataSet, Document, list[Chunk]]
Class-scoped fixture that adds chunks to an uploaded document after ensuring document parsing is complete.
Parameters:
request (FixtureRequest)
add_document (tuple): Dataset and document tuple.
Returns:
Tuple of:
DataSet
Document
list[Chunk]: List of created chunks.
Implementation Details:
Registers a finalizer to delete chunks after test.
Initiates asynchronous parsing of documents.
Waits for parsing completion using condition.
Calls batch_add_chunks to add 4 chunks.
Includes a 1-second sleep to address known issue #6487.
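The order of these steps matters, and the sketch below makes it explicit. It is written as a plain function over in-memory fakes: FakeDocument, FakeDataset, the finalizers list (standing in for request.addfinalizer), the wait_until_parsed callable (standing in for condition), and the chunk text are all assumptions for illustration.

```python
import time


class FakeDocument:
    """Stand-in document that records chunk operations in a shared log."""
    def __init__(self, log):
        self.id = "doc_1"
        self.run = "UNSTART"
        self.chunks = []
        self._log = log

    def add_chunk(self, content):
        self._log.append("add_chunk")
        self.chunks.append(content)
        return content

    def delete_chunks(self):
        self._log.append("delete_chunks")
        self.chunks.clear()


class FakeDataset:
    """Stand-in dataset whose parsing completes immediately."""
    def __init__(self, document, log):
        self.document = document
        self._log = log

    def async_parse_documents(self, document_ids):
        self._log.append("parse")
        self.document.run = "DONE"


def add_chunks(finalizers, dataset, document, wait_until_parsed, settle=1.0):
    """Fixture body with the steps in the order listed above."""
    finalizers.append(document.delete_chunks)      # 1. register cleanup first
    dataset.async_parse_documents([document.id])   # 2. start parsing
    wait_until_parsed(dataset)                     # 3. block until parsing is done
    chunks = [document.add_chunk(content=f"chunk test {i}") for i in range(4)]  # 4. add 4 chunks
    time.sleep(settle)                             # 5. pause (workaround for issue #6487)
    return dataset, document, chunks
```

Registering the finalizer before any work starts means the chunks are removed even if a later step raises.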
add_chat_assistants(request, client, add_document) -> tuple[DataSet, Document, list[Chat]]
Class-scoped fixture to create multiple chat assistants associated with a document.
Parameters:
client (RAGFlow)
add_document (tuple): Dataset and document.
Returns:
Tuple of:
DataSet
Document
list[Chat]: List of created chat assistants.
Implementation Details:
Registers cleanup to delete all chats after tests.
Waits for document parsing completion before creating chat assistants.
Important Implementation Details
Asynchronous Document Parsing Management:
The condition function combined with the @wait_for decorator provides polling logic to wait (up to 30 seconds, polling every 1 second) until all documents in a dataset are fully parsed (status "DONE"). This ensures dependent tests do not proceed prematurely.
Batch Operations:
The file utilizes batch functions (batch_create_datasets, batch_add_chunks, batch_create_chat_assistants, etc.) imported from common to efficiently create multiple entities in one call, improving test speed and consistency.
Resource Cleanup via Finalizers:
Most fixtures register finalizers using request.addfinalizer to perform teardown operations, such as deleting datasets, chats, or chunks. This pattern ensures test isolation and prevents resource leakage.
Parameterized Test File Creation:
The generate_test_files fixture dynamically creates specific test file types based on input parameters, supporting flexible test scenarios involving document ingestion.
Temporary Directory Management:
Unique temporary directories are created per test class using tmp_path_factory.mktemp to avoid conflicts between tests.
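To make the batch-helper idea concrete, here is a minimal sketch of what a helper like batch_create_datasets could look like. It reuses the name for readability only: the body, the dataset naming scheme, and the FakeClient and FakeDataSet classes are assumptions, not the contents of the common module.

```python
class FakeDataSet:
    """Stand-in for the SDK's DataSet; only the name is modelled."""
    def __init__(self, name):
        self.name = name


class FakeClient:
    """Stand-in client that records every dataset it creates."""
    def __init__(self):
        self.created = []

    def create_dataset(self, name):
        dataset = FakeDataSet(name)
        self.created.append(dataset)
        return dataset


def batch_create_datasets(client, num):
    """Create `num` datasets with predictable names and return them in order."""
    return [client.create_dataset(name=f"dataset_{i}") for i in range(num)]
```

A fixture that needs a single dataset can then take the first element of batch_create_datasets(client, 1), which keeps single-entity and multi-entity setup on the same code path.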
Interaction with Other System Components
RAGFlow SDK:
The fixtures interact closely with the RAGFlow SDK classes such as RAGFlow, DataSet, Document, Chunk, and Chat to manipulate core domain entities during tests.
Utility Modules:
File creation functions (e.g., create_docx_file, create_pdf_file) are imported from utils.file_utils to generate sample documents for upload tests.
Common Batch Helpers:
Batch operations are imported from a common module, abstracting repetitive API calls into reusable functions for creating datasets, chat assistants, chunks, and uploading documents.
Configuration Constants:
HOST_ADDRESS and VERSION constants are imported from configs to configure the RAGFlow client connections.
Testing Framework (pytest):
Relies on pytest's fixture and parameterization system extensively for modular and maintainable test setup.
Usage Examples
Example of using the add_chunks fixture in a test (pytest style):
def test_chunk_processing(add_chunks):
    dataset, document, chunks = add_chunks
    assert len(chunks) == 4
    for chunk in chunks:
        assert chunk.content is not None
Example of parameterizing generate_test_files fixture:
import pytest

@pytest.mark.parametrize("generate_test_files", ["pdf"], indirect=True)
def test_pdf_upload(generate_test_files):
    pdf_path = generate_test_files["pdf"]
    # Use pdf_path to upload and test
    assert pdf_path.exists()
Mermaid Flowchart Diagram
The following flowchart illustrates the key fixtures and their dependencies showing how test resources are composed from base elements and cleaned up after tests.
flowchart TD
    A["RAGFlow Client (client)"] --> B["Datasets (add_dataset)"]
    B --> C["Documents (add_document)"]
    C --> D["Chunks (add_chunks)"]
    C --> E["Chat Assistants (add_chat_assistants)"]
    F[generate_test_files] --> C
    G["Temporary Directory (ragflow_tmp_dir)"] --> C
    subgraph Cleanup
        B -.->|delete_datasets| H[Dataset Cleanup]
        E -.->|delete_chats| I[Chat Assistants Cleanup]
        D -.->|delete_chunks| J[Chunk Cleanup]
    end
    C -.->|async_parse_documents + wait_for condition| D
    C -.->|async_parse_documents + wait_for condition| E
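The same dependency structure can be expressed as data and checked programmatically. The sketch below is an illustration only: it encodes the fixture dependencies from the signatures documented earlier (omitting request and token) and uses the standard library's graphlib, available from Python 3.9, to derive one valid setup order. It is not how pytest resolves fixtures internally.

```python
from graphlib import TopologicalSorter

# Each fixture maps to the fixtures it requests, per the documented signatures.
FIXTURE_DEPENDENCIES = {
    "client": set(),
    "ragflow_tmp_dir": set(),
    "add_dataset": {"client"},
    "add_document": {"add_dataset", "ragflow_tmp_dir"},
    "add_chunks": {"add_document"},
    "add_chat_assistants": {"client", "add_document"},
}


def setup_order(dependencies):
    """Return one valid order in which the fixtures can be set up."""
    return list(TopologicalSorter(dependencies).static_order())
```

Every fixture appears after the fixtures it depends on; teardown would run through the same list in reverse.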
Summary
conftest.py is a critical infrastructure file for setting up the testing environment in the InfiniFlow project. It abstracts the complexity of preparing datasets, documents, chunks, and chat assistants while ensuring proper cleanup and synchronization. By leveraging pytest fixtures, batch operations, and utility functions, it enables robust, repeatable, and isolated tests that interact with the RAGFlow backend and file system in a controlled manner.