test_list_chunks.py
Overview
test_list_chunks.py is a test suite that verifies the functionality, robustness, and correctness of the list_chunks API endpoint in the InfiniFlow system. The endpoint retrieves document chunks according to query parameters such as page number, page size, and keyword filter, and it requires valid authorization credentials.
The tests are implemented using the pytest framework and cover scenarios including:
Authorization validation with valid and invalid tokens.
Pagination behavior and edge cases.
Page size constraints and limits.
Keyword-based filtering of chunks.
Handling of invalid or unexpected parameters.
Concurrent access to the list_chunks API.
Default behavior and consistency after batch chunk additions.
By exercising these cases, the test suite ensures that the chunk listing functionality behaves as expected under normal, boundary, and erroneous conditions.
Detailed Explanation
Imports and Dependencies
os: Used for environment variable checks.
concurrent.futures.ThreadPoolExecutor, as_completed: Facilitate concurrent execution of API calls.
pytest: Testing framework for structuring and running tests.
batch_add_chunks and list_chunks (from common): Utility functions that add chunks and list chunks, respectively.
INVALID_API_TOKEN (from configs): A constant representing an invalid API token, used for testing authorization failures.
RAGFlowWebApiAuth (from libs.auth): Authentication class used to create API auth tokens.
Test Classes
1. TestAuthorization
This class validates the authorization mechanism of the list_chunks API.
Method: test_invalid_auth
Purpose: Tests the API response when called without valid authentication credentials.
Parameters (via pytest.mark.parametrize):
invalid_auth: Either None (no auth) or a RAGFlowWebApiAuth instance with an invalid token.
expected_code: Expected HTTP-like response code (401 Unauthorized).
expected_message: Expected message string indicating unauthorized access.
Test: Calls list_chunks with invalid auth and expects a 401 error.
Usage Example:
    res = list_chunks(None, {"doc_id": "document_id"})
    assert res["code"] == 401
    assert "<Unauthorized" in res["message"]
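The two parametrized cases can be modeled without a running server. The sketch below is a minimal stand-in, not the project's code: FakeAuth, fake_list_chunks, and VALID_TOKEN are hypothetical names, and the full message text is an assumption chosen only to match the "<Unauthorized" prefix asserted above.

```python
from dataclasses import dataclass
from typing import Optional

VALID_TOKEN = "valid-token"  # hypothetical; the real suite gets tokens from fixtures


@dataclass
class FakeAuth:
    """Hypothetical stand-in for RAGFlowWebApiAuth; it only carries a token."""
    token: str


def fake_list_chunks(auth: Optional[FakeAuth], payload: dict) -> dict:
    """Hypothetical stub that rejects missing or unknown tokens with code 401."""
    if auth is None or auth.token != VALID_TOKEN:
        # Assumed message format; only the "<Unauthorized" prefix comes from the text.
        return {"code": 401, "message": "<Unauthorized '401: Unauthorized'>"}
    return {"code": 0, "data": {"chunks": []}}


# The same cases the parametrized test describes: no auth, then an invalid token.
for invalid_auth in (None, FakeAuth("invalid-token")):
    res = fake_list_chunks(invalid_auth, {"doc_id": "document_id"})
    assert res["code"] == 401
    assert "<Unauthorized" in res["message"]
```

Both cases go through the same assertions, which is why the real test expresses them as one parametrized function.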
2. TestChunksList
This class contains multiple tests focusing on the behavior of the chunk listing API with respect to pagination, page size, keyword filtering, parameter validation, concurrency, and default chunk listing.
Pagination Tests
Method: test_page
Purpose: Validates correct handling of the page parameter.
Parameters:
params: Dict containing page and size.
expected_code: Expected response code (0 for success, 100 for error).
expected_page_size: Number of chunks expected in the response.
expected_message: Expected error message if the code is not 0.
Notes: Some edge cases are marked to skip due to current limitations.
Behavior:
Checks the response when page is None, negative, zero, a string, or a valid integer.
Asserts that the count of returned chunks matches expectations.
Example Usage:
    payload = {"doc_id": doc_id, "page": 2, "size": 2}
    res = list_chunks(WebApiAuth, payload)
    assert res["code"] == 0
    assert len(res["data"]["chunks"]) == 2
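To see why page 2 with size 2 yields two chunks, it can help to model pagination as a slice over an in-memory list. This is only a sketch under stated assumptions: paginate is a hypothetical helper, pages are assumed to be 1-based, the fixture is assumed to hold 5 chunks, and invalid values are simply rejected here without mirroring how the server treats them.

```python
from typing import List


def paginate(chunks: List[str], page: int = 1, size: int = 30) -> List[str]:
    """Return the 1-based `page` of `chunks`, with `size` items per page.

    Hypothetical model: the defaults and the 1-based numbering are assumptions.
    """
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive in this model")
    start = (page - 1) * size
    return chunks[start:start + size]


chunks = [f"chunk-{i}" for i in range(5)]  # assumed fixture size of 5 chunks

assert len(paginate(chunks, page=1, size=2)) == 2  # first full page
assert len(paginate(chunks, page=2, size=2)) == 2  # second full page
assert len(paginate(chunks, page=3, size=2)) == 1  # final partial page
assert paginate(chunks, page=4, size=2) == []      # past the end
```

Under this model, the expected_page_size column of the parameter table is just the length of the slice for each page and size pair.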
Page Size Tests
Method: test_page_size
Purpose: Tests the API response to various size parameter values.
Parameters & Behavior:
Handles size as None, zero, positive integers, strings, and negative values.
Validates whether the number of chunks returned matches the expected page size.
Some invalid inputs are skipped in tests due to current implementation constraints.
Example:
    payload = {"doc_id": doc_id, "size": 1}
    res = list_chunks(WebApiAuth, payload)
    assert len(res["data"]["chunks"]) == 1
Keyword Filtering Tests
Method: test_keywords
Purpose: Checks filtering of chunks by keywords.
Parameters:
params: Dict with different keyword values (None, empty string, specific keywords).
expected_page_size: Expected number of filtered chunks returned.
Notes: Some tests are conditionally skipped based on environment due to known issues.
Behavior:
Verifies that valid keywords filter the chunk list accordingly.
Returns all chunks if the keyword is empty or None.
Example:
    payload = {"doc_id": doc_id, "keywords": "content"}
    res = list_chunks(WebApiAuth, payload)
    assert len(res["data"]["chunks"]) == 1
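The filtering rules described above can be illustrated with a small in-memory model. This is a sketch only: filter_chunks is a hypothetical helper, the sample chunk texts are invented, and a case-insensitive substring match stands in for whatever search the real backend performs.

```python
from typing import List, Optional


def filter_chunks(chunks: List[str], keywords: Optional[str] = None) -> List[str]:
    """Hypothetical model: a substring match stands in for the real search."""
    if not keywords:  # None and "" both mean "no filter"
        return list(chunks)
    needle = keywords.lower()
    return [c for c in chunks if needle in c.lower()]


# Invented sample data; exactly one chunk mentions "content".
chunks = ["intro text", "chunk content here", "appendix", "glossary", "index"]

assert len(filter_chunks(chunks, None)) == 5
assert len(filter_chunks(chunks, "")) == 5
assert len(filter_chunks(chunks, "content")) == 1
assert filter_chunks(chunks, "unknown") == []
```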
Invalid Parameters Test
Method: test_invalid_params
Purpose: Ensures that unexpected parameters do not break functionality.
Behavior: Sends a payload with an unknown parameter "a": "b" and asserts normal operation with the default chunk count.
Example:
    payload = {"doc_id": doc_id, "a": "b"}
    res = list_chunks(WebApiAuth, payload)
    assert res["code"] == 0
Concurrent Requests Test
Method: test_concurrent_list
Purpose: Checks the API robustness under concurrent access.
Behavior:
Submits 100 requests to a thread pool of 5 workers, so at most 5 requests run at the same time.
Asserts all responses return the expected number of chunks.
Example:
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(list_chunks, WebApiAuth, {"doc_id": doc_id}) for _ in range(100)]
        for future in as_completed(futures):
            assert len(future.result()["data"]["chunks"]) == 5
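The same pattern can be run without a server by swapping in a stub. The sketch below assumes a hypothetical fake_list_chunks that always returns 5 chunks; run_concurrent is likewise an illustrative helper and not part of the suite.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_list_chunks(auth, payload):
    """Hypothetical stub standing in for the real HTTP call; always returns 5 chunks."""
    return {"code": 0, "data": {"chunks": [f"chunk-{i}" for i in range(5)]}}


def run_concurrent(call, count=100, max_workers=5):
    """Submit `count` calls to a pool of `max_workers` threads and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(call, None, {"doc_id": "doc"}) for _ in range(count)]
        # future.result() re-raises any exception raised inside the worker thread,
        # so a failed request fails the caller instead of being silently dropped.
        return [future.result() for future in as_completed(futures)]


responses = run_concurrent(fake_list_chunks)
assert len(responses) == 100
assert all(len(r["data"]["chunks"]) == 5 for r in responses)
```

as_completed yields futures in completion order, not submission order, which is fine here because every response is checked against the same expectation.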
Default Behavior Test
Method: test_default
Purpose: Validates default chunk listing behavior, both before and after a batch chunk addition.
Behavior:
Lists chunks for a document.
Adds 31 chunks in batch.
Waits 3 seconds (to allow eventual consistency).
Lists chunks again and checks the size of the returned page. The example expects 30 chunks even though 31 were added, which suggests that the default page size caps the result at 30.
Example:
    import time

    res = list_chunks(WebApiAuth, {"doc_id": doc_id})
    batch_add_chunks(WebApiAuth, doc_id, 31)
    time.sleep(3)
    res = list_chunks(WebApiAuth, {"doc_id": doc_id})
    assert len(res["data"]["chunks"]) == 30
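The test relies on a fixed sleep to let the added chunks become visible. A common alternative, shown here purely as a sketch and not as what the suite does, is to poll until a condition holds or a timeout expires. wait_until and the simulated chunk_count backend are hypothetical names introduced for illustration.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 3.0, interval: float = 0.1) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # one final check at the deadline


# Simulated backend whose writes become visible only after a short delay.
visible_at = time.monotonic() + 0.2


def chunk_count() -> int:
    return 31 if time.monotonic() >= visible_at else 0


assert wait_until(lambda: chunk_count() == 31, timeout=2.0, interval=0.05)
```

Polling returns as soon as the data is visible and fails with a clear timeout otherwise, whereas a fixed sleep always costs the full delay and can still be too short on a slow backend.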
Important Implementation Details
Use of Parameterization: The tests use pytest.mark.parametrize extensively to cover multiple input variations efficiently.
Skipping Tests: Some tests with invalid inputs or edge cases are marked to skip, indicating current known limitations or external dependencies.
Concurrency Testing: Employs ThreadPoolExecutor to issue concurrent requests and checks that every response stays consistent. With 5 workers this is a thread-safety smoke test, not a load or scalability benchmark.
Sleep for Eventual Consistency: The default behavior test incorporates a delay to accommodate asynchronous chunk addition propagation.
Error Handling: Tests verify that the API returns appropriate error codes and messages for invalid inputs.
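Parameterization and skipping combine naturally in pytest: a single case in the table can carry its own skip mark via pytest.param. The sketch below illustrates the pattern with a hypothetical fake_list stub and invented cases; it does not reproduce the suite's actual parameter table.

```python
import pytest


def fake_list(size: int, total: int = 5) -> dict:
    """Hypothetical stub: returns up to `size` of `total` chunks and never validates input."""
    count = min(total, max(size, 0))
    return {"code": 0, "data": {"chunks": list(range(count))}}


@pytest.mark.parametrize(
    "params, expected_code, expected_page_size",
    [
        ({"size": 1}, 0, 1),
        ({"size": 6}, 0, 5),  # only 5 chunks exist in this model
        # A skipped case stays visible in the table and in the test report.
        pytest.param(
            {"size": -1}, 100, 0,
            marks=pytest.mark.skip(reason="stub does not reject negative sizes"),
        ),
    ],
)
def test_page_size_model(params, expected_code, expected_page_size):
    res = fake_list(**params)
    assert res["code"] == expected_code
    assert len(res["data"]["chunks"]) == expected_page_size
```

Keeping the skipped case in the table documents the intended behavior, so the mark can be removed once the limitation is resolved.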
Interaction With Other System Components
list_chunks Function: Core API call under test, responsible for retrieving document chunks.
batch_add_chunks Function: Utility to add multiple chunks to a document, used to set up test data.
Authentication (RAGFlowWebApiAuth): Used to simulate authorized and unauthorized API requests.
Configuration (INVALID_API_TOKEN): Used to test invalid authorization flows.
Environment Variables: Some tests conditionally skip parts based on the environment (e.g., DOC_ENGINE).
This file acts as a validation layer ensuring that the document chunk listing API behaves correctly within the larger InfiniFlow system, especially relating to document chunk management and authentication.
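An environment-dependent skip of the kind mentioned above is typically written with pytest.mark.skipif. The sketch below is an assumption-laden illustration: the engine value "infinity", the reason text, and the test body are all hypothetical and are not taken from the suite.

```python
import os

import pytest

# Read once at import time; pytest evaluates the skipif condition during collection.
DOC_ENGINE = os.getenv("DOC_ENGINE", "").lower()


@pytest.mark.skipif(
    DOC_ENGINE == "infinity",  # assumed engine name, for illustration only
    reason="keyword filtering has a known issue on this engine (assumed)",
)
def test_keyword_filter_model():
    # Invented sample data standing in for chunks returned by the API.
    chunks = ["chunk content", "other"]
    assert [c for c in chunks if "content" in c] == ["chunk content"]
```

Because the condition is evaluated when the module is collected, the variable has to be set before pytest starts, not inside a fixture.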
Visual Diagram
classDiagram
class TestAuthorization {
+test_invalid_auth(invalid_auth, expected_code, expected_message)
}
class TestChunksList {
+test_page(WebApiAuth, add_chunks, params, expected_code, expected_page_size, expected_message)
+test_page_size(WebApiAuth, add_chunks, params, expected_code, expected_page_size, expected_message)
+test_keywords(WebApiAuth, add_chunks, params, expected_page_size)
+test_invalid_params(WebApiAuth, add_chunks)
+test_concurrent_list(WebApiAuth, add_chunks)
+test_default(WebApiAuth, add_document)
}
TestAuthorization ..> list_chunks : calls
TestChunksList ..> list_chunks : calls
TestChunksList ..> batch_add_chunks : calls
TestAuthorization ..> RAGFlowWebApiAuth : uses
TestChunksList ..> RAGFlowWebApiAuth : uses
Summary
This test module test_list_chunks.py provides rigorous testing for the chunk listing API in InfiniFlow, covering authorization, pagination, filtering, concurrency, and error cases. It ensures the API returns correct data slices, handles invalid inputs gracefully, and maintains consistent behavior under load. The use of parameterized tests and concurrency simulations strengthens confidence in the system's chunk retrieval mechanisms.