test_retrieval_chunks.py
Overview
This file contains a suite of automated tests designed to validate the functionality of the retrieval_chunks API endpoint in the InfiniFlow system. The tests focus on verifying authorization, parameter handling, pagination, filtering, and concurrency aspects of chunk retrieval from datasets or documents.
The tests are implemented with the pytest framework and rely on parameterized test cases to cover a wide range of inputs and expected outputs. They exercise the chunk retrieval API under different scenarios, including error handling for invalid inputs and concurrent access.
Detailed Explanation
Imported Modules
os: Used to retrieve environment variables for conditional test skipping.
concurrent.futures.ThreadPoolExecutor and as_completed: For executing concurrent retrieval tests.
pytest: Testing framework used for writing and running tests.
retrieval_chunks (from common): The function under test, responsible for retrieving chunks based on parameters.
INVALID_API_TOKEN (from configs): A constant representing an invalid API token used in authorization tests.
RAGFlowHttpApiAuth (from libs.auth): Class representing HTTP API authorization credentials.
Classes and Their Methods
TestAuthorization
Tests related to API authorization.
Methods
test_invalid_auth(self, invalid_auth, expected_code, expected_message)
Tests the behavior of the retrieval_chunks function when provided with invalid or missing authorization.
Parameters:
invalid_auth: Either None or an instance of RAGFlowHttpApiAuth initialized with an invalid token.
expected_code: Expected error code returned by the API.
expected_message: Expected error message string.
Returns: None. Uses assertions to validate API responses.
Usage Example:
auth = RAGFlowHttpApiAuth(INVALID_API_TOKEN)
response = retrieval_chunks(auth)
assert response["code"] == 109
assert "Authentication error" in response["message"]
TestChunksRetrieval
Comprehensive tests for chunk retrieval functionality, including parameter validation and concurrency.
Common Parameters in Tests
HttpApiAuth: Valid authorization credentials fixture.
add_chunks: Fixture that adds chunks to the system and returns identifiers (dataset_id, document_id, ...).
payload: Dictionary containing query parameters for chunk retrieval.
expected_code: Expected API response code.
expected_page_size: Expected number of chunks returned.
expected_message: Expected error or status message.
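To make expected_page_size concrete, the sketch below computes how many chunks a paginated request would return. It assumes 1-indexed pages and that out-of-range pages come back empty; the source does not state how the API counts pages, and the helper name expected_chunk_count is hypothetical.

```python
def expected_chunk_count(total: int, page: int, page_size: int) -> int:
    """Items on a 1-indexed `page` when `total` items are split into pages of `page_size`.

    Assumption: pages start at 1 and a page past the end is empty.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size  # index of the first item on this page
    return max(0, min(page_size, total - start))


# With 5 chunks in total and page_size 2: pages 1 and 2 are full,
# page 3 holds the single leftover chunk, and page 4 is empty.
for page in (1, 2, 3, 4):
    print(page, expected_chunk_count(5, page, 2))
```

A table of (payload, expected_page_size) pairs can be derived this way instead of being counted by hand, which helps when the number of chunks created by the fixture changes.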
Methods
test_basic_scenarios(self, HttpApiAuth, add_chunks, payload, expected_code, expected_page_size, expected_message): Tests basic parameter combinations related to dataset_ids and document_ids.
test_page(self, HttpApiAuth, add_chunks, payload, expected_code, expected_page_size, expected_message): Tests pagination parameters page and page_size, including invalid and edge cases. Some tests are skipped due to known issues.
test_page_size(self, HttpApiAuth, add_chunks, payload, expected_code, expected_page_size, expected_message): Tests different values of the page_size parameter, including string inputs and invalid values.
test_vector_similarity_weight(self, HttpApiAuth, add_chunks, payload, expected_code, expected_page_size, expected_message): Tests the vector_similarity_weight parameter influencing retrieval ranking, including invalid type handling.
test_top_k(self, HttpApiAuth, add_chunks, payload, expected_code, expected_page_size, expected_message): Tests the top_k parameter, which controls the number of top chunks returned. Tests include handling of negative values, strings, and environment-specific skips.
test_rerank_id(self, HttpApiAuth, add_chunks, payload, expected_code, expected_message): (Skipped) Tests the rerank_id parameter for reranking models. Includes a test for unknown rerank IDs.
test_keyword(self, HttpApiAuth, add_chunks, payload, expected_code, expected_page_size, expected_message): (Skipped) Tests the keyword boolean/string parameter controlling keyword search behavior.
test_highlight(self, HttpApiAuth, add_chunks, payload, expected_code, expected_highlight, expected_message): Tests the highlight parameter that controls whether chunks include highlighted search terms.
test_invalid_params(self, HttpApiAuth, add_chunks): Tests API behavior with unexpected parameters, expecting graceful handling.
test_concurrent_retrieval(self, HttpApiAuth, add_chunks): Tests concurrent execution of retrieval requests (100 parallel requests) to validate thread-safety and consistency.
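The pattern behind test_concurrent_retrieval can be sketched with the two concurrent.futures names the file imports. This is a minimal illustration, not the suite's code: fake_retrieval_chunks is a stand-in that makes no network call, run_concurrently is a hypothetical helper, and the worker count of 5 is an assumption (the source only states that 100 requests are issued).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_retrieval_chunks(auth, payload):
    """Stand-in for retrieval_chunks: no network, always reports success."""
    return {"code": 0, "data": {"chunks": []}}


def run_concurrently(auth, payload, count=100, max_workers=5):
    """Submit `count` retrieval calls to a thread pool and collect every response."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(fake_retrieval_chunks, auth, payload)
            for _ in range(count)
        ]
        # as_completed yields each future as it finishes, in completion order;
        # result() re-raises any exception raised inside the worker.
        return [future.result() for future in as_completed(futures)]


responses = run_concurrently(
    auth=None,
    payload={"question": "chunk", "dataset_ids": ["ds-1"]},
)
assert len(responses) == 100
assert all(r["code"] == 0 for r in responses)
```

Collecting results through future.result() matters here: a request that raised inside a worker thread would otherwise fail silently and the test would still pass.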
Important Implementation Details and Algorithms
Parameterized Testing: Most tests use pytest.mark.parametrize to run the same test logic with multiple input/output pairs, improving coverage and reducing code duplication.
Conditional Test Skips: Some test cases are skipped based on known issues (issues/6646, issues/6648) or environment variables (DOC_ENGINE), allowing flexible testing across different deployment contexts.
Concurrent Requests: test_concurrent_retrieval uses ThreadPoolExecutor to simulate multiple clients querying the retrieval API simultaneously, ensuring correct behavior under load.
Error Handling Validation: Tests extensively check for correct error codes and messages when invalid inputs are provided, ensuring robustness.
Dynamic Payload Mutation: Tests dynamically update payload dictionaries with IDs obtained from test fixtures to ensure realistic and valid test inputs.
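The parameterization, conditional-skip, and payload-mutation techniques listed above might combine as in the sketch below. Everything except pytest's own API is an assumption made for illustration: fake_retrieval_chunks is a stand-in, DATASET_ID replaces the value a fixture would supply, and the error code 100, its message, the default page size, and the engine name "example_engine" are hypothetical.

```python
import os

import pytest

DATASET_ID = "ds-1"  # hypothetical; the real suite takes this from the add_chunks fixture


def fake_retrieval_chunks(auth, payload):
    """Stand-in for retrieval_chunks; rejects a non-positive page_size."""
    page_size = payload.get("page_size", 30)
    if not isinstance(page_size, int) or page_size < 1:
        return {"code": 100, "message": "invalid page_size"}
    # Pretend the dataset holds 5 chunks in total.
    return {"code": 0, "data": {"chunks": ["chunk"] * min(page_size, 5)}}


# Conditional skip: this mark drops a case when DOC_ENGINE names the hypothetical backend.
skip_on_example_engine = pytest.mark.skipif(
    os.getenv("DOC_ENGINE") == "example_engine",
    reason="backend-specific behavior",
)

CASES = [
    ({"page_size": 1}, 0, 1, ""),
    ({"page_size": 5}, 0, 5, ""),
    pytest.param({"page_size": 0}, 100, 0, "invalid page_size", marks=skip_on_example_engine),
]


@pytest.mark.parametrize("payload, expected_code, expected_page_size, expected_message", CASES)
def test_page_size_sketch(payload, expected_code, expected_page_size, expected_message):
    # Dynamic payload mutation: merge in the ID at run time, on a copy so the
    # shared case dictionary is left untouched.
    payload = {"question": "chunk", "dataset_ids": [DATASET_ID], **payload}
    res = fake_retrieval_chunks(None, payload)
    assert res["code"] == expected_code
    if expected_code == 0:
        assert len(res["data"]["chunks"]) == expected_page_size
    else:
        assert expected_message in res["message"]
```

Attaching the skip to a single pytest.param keeps the remaining cases running on every backend, which is the main reason to prefer it over skipping the whole test function.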
Interaction with Other Parts of the System
retrieval_chunks function (from common): This is the core API function under test, responsible for fetching text chunks based on provided parameters.
Authorization (RAGFlowHttpApiAuth): The tests interact with the authentication subsystem by providing valid or invalid API tokens.
Configurations (configs): Use of constants like INVALID_API_TOKEN to simulate authentication failures.
Test Fixtures (HttpApiAuth, add_chunks): External fixtures provide setup data such as authorized credentials and pre-added chunks in datasets/documents.
Environment Variables (os.getenv("DOC_ENGINE")): Influence test behavior based on the configured document engine backend.
Usage Example
A typical test case flow in this file:
Prepare a payload dictionary with parameters such as "question", "dataset_ids", "page", "page_size", etc.
Obtain valid API authorization from a fixture.
Call retrieval_chunks with the authorization and payload.
Assert that the response's code, message, and returned chunk count match expected values.
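from common import retrieval_chunks

def test_example(HttpApiAuth, add_chunks):
    # add_chunks returns several identifiers; only the dataset ID is needed here.
    dataset_id, _, _ = add_chunks
    payload = {"question": "example", "dataset_ids": [dataset_id], "page_size": 3}
    res = retrieval_chunks(HttpApiAuth, payload)
    # A code of 0 signals success; the chunk count should match page_size.
    assert res["code"] == 0
    assert len(res["data"]["chunks"]) == 3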
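The final assertion step of this flow can be factored into one helper so that success and failure cases share the same checks. The sketch below is an assumption about how that could look, not code from the file: check_response is a hypothetical name, and the sample responses are hand-written to match the shapes used in the examples in this document.

```python
def check_response(res, expected_code, expected_page_size=None, expected_message=""):
    """Check the code first, then the chunk count on success or the message on failure."""
    assert res["code"] == expected_code, res
    if expected_code == 0:
        if expected_page_size is not None:
            assert len(res["data"]["chunks"]) == expected_page_size
    else:
        # Substring match, so a longer server message still passes.
        assert expected_message in res["message"]


# Hand-written sample responses (not real API output).
ok = {"code": 0, "data": {"chunks": [{"content": "a"}, {"content": "b"}, {"content": "c"}]}}
denied = {"code": 109, "message": "Authentication error"}

check_response(ok, 0, expected_page_size=3)
check_response(denied, 109, expected_message="Authentication error")
```

Branching on expected_code avoids indexing into res["data"] on an error response, where that key may be absent.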
Mermaid Diagram: Class Structure
classDiagram
    class TestAuthorization {
        +test_invalid_auth(invalid_auth, expected_code, expected_message)
    }
    class TestChunksRetrieval {
        +test_basic_scenarios(HttpApiAuth, add_chunks, payload, expected_code, expected_page_size, expected_message)
        +test_page(HttpApiAuth, add_chunks, payload, expected_code, expected_page_size, expected_message)
        +test_page_size(HttpApiAuth, add_chunks, payload, expected_code, expected_page_size, expected_message)
        +test_vector_similarity_weight(HttpApiAuth, add_chunks, payload, expected_code, expected_page_size, expected_message)
        +test_top_k(HttpApiAuth, add_chunks, payload, expected_code, expected_page_size, expected_message)
        +test_rerank_id(HttpApiAuth, add_chunks, payload, expected_code, expected_message)
        +test_keyword(HttpApiAuth, add_chunks, payload, expected_code, expected_page_size, expected_message)
        +test_highlight(HttpApiAuth, add_chunks, payload, expected_code, expected_highlight, expected_message)
        +test_invalid_params(HttpApiAuth, add_chunks)
        +test_concurrent_retrieval(HttpApiAuth, add_chunks)
    }
Summary
This file is a comprehensive test suite for the chunk retrieval API in InfiniFlow.
It validates authorization, parameter correctness, pagination, filtering, ranking, highlighting, and concurrency.
Uses pytest features like parameterization and fixtures for modular, maintainable tests.
Skips some tests conditionally to accommodate known issues and environment differences.
Includes concurrency testing to ensure thread-safe and performant retrieval operations.