conftest.py
Overview
The conftest.py file serves as a configuration and fixture provider for pytest-based testing within the InfiniFlow project. Its primary purpose is to define reusable pytest fixtures and utility functions that facilitate automated testing of document parsing and chunk management workflows using the ragflow_sdk.
Specifically, this file provides:
A custom wait condition to verify that document parsing has completed.
A pytest fixture that adds chunks to a document, ensuring that the document has been parsed before chunk addition.
Cleanup logic to remove chunks after tests complete, maintaining test isolation and environment consistency.
This file acts as a bridge between the test cases and the underlying SDK operations, abstracting common setup and teardown steps for document chunk management.
Detailed Explanation
Imports and Dependencies
time.sleep: Used to introduce a delay that works around a timing-related issue.
pytest: Core testing framework used for fixtures and test execution.
common.batch_add_chunks: Utility function that adds chunks to a document in batches.
pytest.FixtureRequest: Type hint for the pytest fixture request object.
ragflow_sdk: SDK providing the Chunk, DataSet, and Document classes used to interact with document data.
utils.wait_for: Decorator utility that retries a condition until it succeeds or a specified timeout expires.
Functions and Fixtures
1. condition(_dataset: DataSet) -> bool
A condition function decorated with @wait_for that waits up to 30 seconds, polling every 1 second, for all documents in a dataset to reach the "DONE" run status, indicating that parsing is complete.
Parameters:
_dataset (DataSet): The dataset object whose documents are to be checked.
Returns:
bool: Returns True if all documents have run == "DONE", otherwise False.
Usage:
The function is used as a wait condition in add_chunks_func to ensure that document parsing completes before chunks are added.
Implementation Detail:
The function lists up to 1000 documents in the dataset and iterates over them, returning False as soon as any document is not yet done. The @wait_for decorator re-invokes this condition until it returns True or the timeout expires.
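A minimal sketch of the check that condition performs on each poll, written without the SDK so it runs on its own. FakeDocument, FakeDataSet, and parsing_done are hypothetical names; the list_documents(page_size=...) signature and the non-DONE status strings are assumptions, and the @wait_for decorator is deliberately left off so that a single pass is visible.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for ragflow_sdk.Document; only `run` is modeled."""
    id: str
    run: str = "UNSTART"  # assumed status string for a document not yet parsed


@dataclass
class FakeDataSet:
    """Hypothetical stand-in for ragflow_sdk.DataSet."""
    documents: list[FakeDocument] = field(default_factory=list)

    def list_documents(self, page_size: int = 30) -> list[FakeDocument]:
        # Assumed signature: return at most `page_size` documents.
        return self.documents[:page_size]


def parsing_done(dataset: FakeDataSet) -> bool:
    """One pass of the check: True only if every listed document is DONE."""
    for document in dataset.list_documents(page_size=1000):
        if document.run != "DONE":
            return False  # stop at the first document that is still parsing
    return True


ds = FakeDataSet([FakeDocument("a", "DONE"), FakeDocument("b", "RUNNING")])
print(parsing_done(ds))  # False
ds.documents[1].run = "DONE"
print(parsing_done(ds))  # True
```

Because the listing is capped at 1000 documents, any documents beyond that first page are never inspected by this check.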
2. add_chunks_func(request: FixtureRequest, add_document: tuple[DataSet, Document]) -> tuple[DataSet, Document, list[Chunk]]
A pytest fixture with function scope that:
Accepts another fixture, add_document, which provides a (DataSet, Document) tuple.
Initiates asynchronous parsing of the document.
Waits for parsing to complete using the condition function.
Adds chunks to the parsed document using the batch_add_chunks utility.
Sleeps for 1 second (a workaround for known issue #6487).
Cleans up by deleting the document's chunks after the test completes.
Parameters:
request (FixtureRequest): Built-in pytest fixture providing information about the requesting test context.
add_document (tuple[DataSet, Document]): Fixture providing the dataset and document objects.
Returns:
tuple[DataSet, Document, list[Chunk]]: The dataset, document, and list of chunks added.
Usage Example:
def test_chunk_processing(add_chunks_func):
    dataset, document, chunks = add_chunks_func
    # Perform assertions or operations with the chunks
    assert len(chunks) == 4
Implementation Details:
Registers a cleanup finalizer (via request.addfinalizer) that deletes all chunks from the document after the test finishes, ensuring no side effects.
Uses dataset.async_parse_documents to start parsing.
Relies on the condition function to block until parsing is done.
Calls batch_add_chunks (from common) to add exactly 4 chunks.
Sleeps for 1 second to mitigate timing issues (noted as issue #6487).
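The fixture's setup and teardown order can be sketched with in-memory fakes. Everything here is hypothetical: FakeDocument, FakeDataSet, FakeRequest, and add_chunks_setup stand in for the SDK objects, pytest's FixtureRequest, and the fixture body. The add_chunk method is an assumption, parsing completes instantly, and chunks are created inline rather than through the real batch_add_chunks.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeDocument:
    """Hypothetical stand-in for ragflow_sdk.Document."""
    id: str
    run: str = "UNSTART"
    chunks: list[str] = field(default_factory=list)

    def add_chunk(self, content: str) -> str:
        # Assumed behavior: store the chunk and return it.
        self.chunks.append(content)
        return content

    def delete_chunks(self, ids: list[str]) -> None:
        # Only the ids=[] case used by the cleanup is modeled: drop every chunk.
        assert ids == [], "this fake models only ids=[]"
        self.chunks.clear()


@dataclass
class FakeDataSet:
    """Hypothetical stand-in for ragflow_sdk.DataSet; parsing finishes instantly."""
    documents: list[FakeDocument]

    def async_parse_documents(self, document_ids: list[str]) -> None:
        for doc in self.documents:
            if doc.id in document_ids:
                doc.run = "DONE"


class FakeRequest:
    """Hypothetical stand-in for pytest.FixtureRequest that only records finalizers."""

    def __init__(self) -> None:
        self.finalizers: list[Callable[[], None]] = []

    def addfinalizer(self, finalizer: Callable[[], None]) -> None:
        self.finalizers.append(finalizer)

    def teardown(self) -> None:
        # pytest runs finalizers in reverse order of registration.
        while self.finalizers:
            self.finalizers.pop()()


def add_chunks_setup(request, add_document, num_chunks=4):
    """Sketch of the fixture body: register cleanup, parse, wait, add chunks."""
    dataset, document = add_document

    def cleanup() -> None:
        document.delete_chunks(ids=[])

    request.addfinalizer(cleanup)  # registered before any setup work
    dataset.async_parse_documents([document.id])
    # The real fixture blocks here on condition(dataset); this fake parses synchronously.
    assert all(d.run == "DONE" for d in dataset.documents)
    chunks = [document.add_chunk(f"chunk test {i}") for i in range(num_chunks)]
    # The real fixture also calls sleep(1) here (issue #6487); omitted in the sketch.
    return dataset, document, chunks


doc = FakeDocument("doc-1")
req = FakeRequest()
dataset, document, chunks = add_chunks_setup(req, (FakeDataSet([doc]), doc))
print(len(chunks), len(document.chunks))  # 4 4
req.teardown()
print(len(document.chunks))  # 0
```

Registering the finalizer before any setup work is a choice made in this sketch. In pytest, a finalizer that has already been registered still runs if a later setup step raises, so a failure partway through setup still triggers cleanup.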
Important Implementation Details
Wait and Retry Logic:
The @wait_for decorator on condition implements a polling mechanism that re-checks parsing completion at 1-second intervals for up to 30 seconds. This handles asynchronous document parsing without blocking indefinitely.
Test Isolation:
The cleanup function ensures that chunks added during a test do not persist afterward. This prevents interdependence between tests and keeps the test environment clean.
Sleeping Workaround:
The explicit sleep(1) call is a temporary fix for a known issue (#6487), which suggests possible race conditions or eventual-consistency delays in chunk addition or parsing.
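The polling behavior can be illustrated with a small decorator. This is a hypothetical re-implementation, not the project's utils.wait_for: the (timeout, interval, error_msg) argument order is inferred from the @wait_for(30, 1, "Document parsing timeout") call in the diagram, and raising TimeoutError when the deadline passes is an assumed failure mode.

```python
import functools
import time


def wait_for(timeout: float, interval: float, error_msg: str):
    """Hypothetical sketch of utils.wait_for: poll until the check returns True."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + timeout
            while True:
                if func(*args, **kwargs) is True:
                    return True
                if time.monotonic() >= deadline:
                    # Assumed failure mode: raise so the calling fixture errors out.
                    raise TimeoutError(error_msg)
                time.sleep(interval)  # wait before the next poll
        return wrapper
    return decorator


attempts = {"n": 0}


@wait_for(timeout=2, interval=0.01, error_msg="Document parsing timeout")
def becomes_ready():
    attempts["n"] += 1
    return attempts["n"] >= 3  # pretend parsing finishes on the third poll


print(becomes_ready(), attempts["n"])  # True 3


@wait_for(timeout=0.05, interval=0.01, error_msg="Document parsing timeout")
def never_ready():
    return False


try:
    never_ready()
except TimeoutError as exc:
    print(exc)  # Document parsing timeout
```

With timeout=30 and interval=1, as used for condition, the check runs on the order of 30 times before giving up. Measuring the deadline with time.monotonic() keeps it unaffected by wall-clock adjustments.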
Interaction with Other System Components
ragflow_sdk Classes:
DataSet: Represents a collection of documents.
Document: Represents a document that can be parsed and chunked.
Chunk: Represents chunks derived from documents.
Fixtures:
Relies on another fixture, add_document (not defined in this file), which must provide a dataset and a document.
Utility Functions:
batch_add_chunks: Adds chunks in batches to a document.
wait_for: Decorator enabling retry/wait logic.
Testing Framework:
Integrates tightly with pytest, using fixtures and finalizers to manage setup and teardown.
This file is foundational for tests that require documents to be fully parsed and chunked before executing test logic.
Mermaid Diagram
flowchart TD
    A["add_chunks_func fixture"] -->|uses| B["add_document fixture"]
    A -->|calls| C["dataset.async_parse_documents([document.id])"]
    A -->|waits for| D["condition(dataset)"]
    D -->|loops over| E["dataset.list_documents()"]
    E -->|checks| F["document.run == 'DONE'"]
    A -->|calls| G["batch_add_chunks(document, 4)"]
    A -->|registers cleanup| H["document.delete_chunks(ids=[])"]
    A -->|sleeps| I["sleep(1)"]
    subgraph Decorators
        J["@wait_for(30, 1, 'Document parsing timeout')"]
    end
    D -.->|decorated by| J
Summary
The conftest.py file is a pytest configuration module that provides utility fixtures and condition checks necessary for testing document parsing and chunk addition workflows in the InfiniFlow project. It abstracts asynchronous parsing synchronization, chunk batch addition, and cleanup logic, enabling reliable and isolated testing of document-related features.