conftest.py
Overview
conftest.py is a Pytest configuration and fixture file designed to support testing workflows related to document processing within the InfiniFlow system. It provides reusable fixtures and utility conditions that automate common setup and teardown tasks during tests, specifically focusing on adding and managing "chunks" of data within documents and verifying their processing state.
The primary functionality includes:
Polling and waiting for document processing completion.
Adding chunks to documents in a controlled test environment.
Ensuring cleanup of added test data after test execution.
This file integrates with core testing utilities and APIs exposed by the InfiniFlow project to enable reliable, repeatable, and isolated test cases involving document chunk management.
Detailed Description
Imports and Dependencies
time.sleep: Used to introduce delays, particularly after chunk additions, to avoid race conditions.
pytest: The Pytest framework, used to define fixtures and test utilities.
Functions imported from the common module:
batch_add_chunks: Adds multiple chunks to a document.
delete_chunks: Removes chunks from a document.
list_documents: Retrieves document metadata and status.
parse_documents: Triggers document parsing.
Decorator imported from the utils module:
wait_for: A retry decorator that repeatedly checks a condition until it is met or a timeout expires.
Functions and Fixtures
condition(_auth, _dataset_id)
@wait_for(30, 1, "Document parsing timeout")
def condition(_auth, _dataset_id) -> bool:
Purpose: Polls the documents in a dataset to determine whether all of them have finished processing (their run status is "DONE").
Parameters:
_auth: Authentication token or object required for API calls.
_dataset_id: Identifier of the dataset containing the documents.
Returns:
True if all documents have status "DONE", otherwise False.
Behavior:
Uses list_documents to fetch the documents.
Iterates through the documents, checking each one's run status.
Returns False as soon as any document is not done.
Decorator:
@wait_for(30, 1, "Document parsing timeout") means the condition is retried every 1 second for up to 30 seconds, and a timeout error is raised if it is still not met.
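The check that condition performs on each poll can be sketched as below. This is a hypothetical reconstruction: the response shape returned by list_documents (a data.docs list whose entries carry a run field) and the names fake_list_documents and all_documents_done are assumptions made for illustration. The decorator is left off so the predicate runs exactly once per call.

```python
from typing import Any, Callable


def fake_list_documents(_auth: Any, _dataset_id: str) -> dict:
    """Hypothetical stand-in for common.list_documents.

    The {"data": {"docs": [{"run": ...}]}} shape is an assumption.
    """
    return {
        "data": {
            "docs": [
                {"id": "doc-1", "run": "DONE"},
                {"id": "doc-2", "run": "RUNNING"},
            ]
        }
    }


def all_documents_done(
    _auth: Any,
    _dataset_id: str,
    list_documents: Callable[[Any, str], dict] = fake_list_documents,
) -> bool:
    """Body of the polling predicate, without the wait_for decorator."""
    res = list_documents(_auth, _dataset_id)
    for doc in res["data"]["docs"]:
        if doc["run"] != "DONE":
            return False  # at least one document is still being processed
    return True  # every document reported DONE
```

In this sketch an empty document list counts as done, because no document fails the check.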
Usage Example:
# Wait until all documents in the dataset are done processing (times out after 30 s)
condition(auth_token, dataset_id)
add_chunks_func(request, HttpApiAuth, add_document)
@pytest.fixture(scope="function")
def add_chunks_func(request, HttpApiAuth, add_document):
Purpose: Pytest fixture that prepares a test dataset and document with chunks added, ensuring cleanup after the test.
Scope: Function-level (runs once per test function).
Parameters:
request: Pytest's built-in fixture, providing test context and finalizer registration.
HttpApiAuth: Authentication context for API calls.
add_document: Another fixture, presumably one that creates a document and returns (dataset_id, document_id).
Returns: A tuple (dataset_id, document_id, chunk_ids), where:
dataset_id: ID of the dataset containing the document.
document_id: ID of the document to which chunks were added.
chunk_ids: List of IDs of the added chunks.
Implementation Details:
Defines a cleanup() function that deletes all chunks added during the test.
Registers cleanup() as a finalizer so that the chunks are deleted after the test, keeping the test environment clean.
Uses the add_document fixture to create a document.
Calls parse_documents to trigger parsing of the newly added document.
Calls condition to wait until parsing is complete.
Adds 4 chunks to the document using batch_add_chunks.
Sleeps for 1 second as a workaround for a race condition (referenced as issue #6487).
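The setup and teardown order listed above can be traced with the sketch below. Everything here is a hypothetical stand-in: FakeRequest models only pytest's addfinalizer (finalizers run in reverse registration order at teardown), the in-memory helpers replace the real common functions, and their argument lists are assumptions. The fixture body is left undecorated so it can be called directly, and the delay parameter is an addition so the sketch can skip the 1-second sleep.

```python
from time import sleep
from typing import Callable


class FakeRequest:
    """Hypothetical stand-in for pytest's `request`; only addfinalizer is modelled."""

    def __init__(self) -> None:
        self._finalizers: list[Callable[[], None]] = []

    def addfinalizer(self, fn: Callable[[], None]) -> None:
        self._finalizers.append(fn)

    def teardown(self) -> None:
        while self._finalizers:
            self._finalizers.pop()()  # last registered runs first


# In-memory stand-ins for the common helpers (hypothetical signatures).
CHUNKS: dict[str, list[str]] = {}
CALLS: list[str] = []


def parse_documents(auth, dataset_id, document_ids):
    CALLS.append("parse")


def condition(auth, dataset_id):
    CALLS.append("wait")  # the real condition polls until parsing is DONE
    return True


def batch_add_chunks(auth, dataset_id, document_id, num):
    ids = [f"{document_id}-chunk-{i}" for i in range(num)]
    CHUNKS.setdefault(document_id, []).extend(ids)
    CALLS.append("add")
    return ids


def delete_chunks(auth, dataset_id, document_id):
    CHUNKS.pop(document_id, None)
    CALLS.append("delete")


def add_chunks_func(request, auth, add_document, delay: float = 1.0):
    """Undecorated sketch of the fixture body (the real one uses @pytest.fixture)."""

    def cleanup():
        # Reads dataset_id/document_id only when teardown runs, after they are set.
        delete_chunks(auth, dataset_id, document_id)

    request.addfinalizer(cleanup)

    dataset_id, document_id = add_document
    parse_documents(auth, dataset_id, [document_id])
    condition(auth, dataset_id)
    chunk_ids = batch_add_chunks(auth, dataset_id, document_id, 4)
    sleep(delay)  # the file sleeps 1 s here (issue #6487)
    return dataset_id, document_id, chunk_ids
```

cleanup is a closure over dataset_id and document_id, which are assigned after it is registered; that works because the closure only reads them when the finalizer actually runs.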
Usage Example:
In test code, you can use this fixture as follows:
def test_chunk_processing(add_chunks_func):
    dataset_id, document_id, chunk_ids = add_chunks_func
    assert len(chunk_ids) == 4  # the fixture adds 4 chunks
    # Proceed with tests that require chunks to be present
Important Implementation Details
Polling with Timeout: The condition function uses the custom wait_for decorator to implement polling with a timeout, so tests do not proceed until document parsing is confirmed complete or the timeout is reached.
Test Data Cleanup: The fixture uses request.addfinalizer to guarantee cleanup of any chunks added during the test, preventing side effects across tests.
Race Condition Handling: A fixed 1-second delay (sleep(1)) after chunk addition mitigates a known race condition (issue #6487) by giving downstream processes time to register the changes.
Fixture Composition: The fixture depends on another fixture, add_document, a modular design that promotes code reuse.
Interaction with Other System Components
Common Module: Provides core functions to manipulate documents and chunks, which this file uses to perform setup and teardown.
Utils Module: Supplies the wait_for decorator used to implement the retry logic.
Test Suite: This file acts as the backbone for tests that require document chunk manipulation, supplying ready-to-use fixtures and conditions.
API Layer: The HttpApiAuth parameter indicates reliance on HTTP API authentication, suggesting that these tests interact with a RESTful or similar API for document management.
Mermaid Flowchart Diagram of Function Relationships
flowchart TD
A[add_chunks_func fixture]
B[cleanup finalizer]
C[add_document fixture]
D[parse_documents]
E[condition]
F[list_documents]
G[batch_add_chunks]
H[delete_chunks]
I["sleep(1)"]
A -->|depends on| C
A --> D
A --> E
A --> G
A --> I
A --> B
B --> H
E --> F
Diagram Explanation:
The add_chunks_func fixture depends on the add_document fixture.
It triggers parsing via parse_documents.
It waits for parsing to complete by calling condition, which queries documents via list_documents.
It adds chunks using batch_add_chunks.
It includes a sleep delay to address a race condition.
It registers a cleanup function that deletes the chunks using delete_chunks.
Summary
conftest.py is a utility and fixture configuration file that facilitates testing of document chunk processing in the InfiniFlow system. It encapsulates common test setup patterns such as waiting for document parsing to complete, adding multiple chunks, and cleaning up afterward. The file leverages modular fixtures and retry decorators to provide robust and maintainable test infrastructure, integrating tightly with the system's API and common utility modules.