test_list_chunks.py
Overview
test_list_chunks.py is a comprehensive test suite designed to validate the behavior and robustness of the list_chunks API function within the InfiniFlow system. This file uses the pytest framework to run a variety of test scenarios that cover:
Authorization validation and error handling.
Pagination parameters (page, page_size) and their edge cases.
Filtering functionality using keywords and chunk IDs.
Handling of invalid parameters, dataset IDs, and document IDs.
Concurrent access and thread safety of the list_chunks endpoint.
Interaction with chunk addition (batch_add_chunks) to verify updates.
The tests ensure that the list_chunks function correctly handles valid and invalid inputs, maintains expected behavior under concurrent requests, and properly enforces permissions and ownership rules.
Classes and Test Cases
TestAuthorization
This class contains tests related to authorization and authentication for accessing the list_chunks API.
Method: test_invalid_auth
Purpose: Validates that the API rejects requests with invalid or missing authorization tokens.
Parameters:
invalid_auth: An invalid or None authorization object.
expected_code: Expected error code from the API response.
expected_message: Expected error message describing the authorization failure.
Usage Example:
def test_invalid_auth(self, invalid_auth, expected_code, expected_message):
    # Call the endpoint with a missing or invalid auth object.
    res = list_chunks(invalid_auth, "dataset_id", "document_id")
    assert res["code"] == expected_code
    assert res["message"] == expected_message
Behavior:
Tests with None authorization should return code 0 and the message "`Authorization` can't be empty".
Tests with an invalid API token should return code 109 and an appropriate authentication error message.
TestChunksList
This class groups tests for the core functionality of listing chunks, including pagination, filtering, concurrency, and error scenarios.
Method: test_page
Purpose: Tests the handling of the page parameter in pagination.
Parameters:
params: Dictionary containing page and page_size.
expected_code: Expected API response code.
expected_page_size: Expected number of chunks returned.
expected_message: Expected message if any error occurs.
Details:
Valid pages are expected to return the number of chunks given by expected_page_size.
Negative pages and invalid strings are skipped due to known issues or unsupported behavior.
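The (expected_code, expected_page_size, expected_message) triple used by test_page, and likewise by test_page_size and test_id, suggests a shared assertion pattern: compare the code, then check the chunk count on success or the message on failure. The sketch below is a minimal illustration of that pattern; the helper name check_list_response is hypothetical, and the response shape {"code", "message", "data": {"chunks"}} is an assumption based on the descriptions in this document.

```python
from typing import Any


def check_list_response(
    res: dict[str, Any],
    expected_code: int,
    expected_page_size: int,
    expected_message: str,
) -> None:
    """Hypothetical helper: assert on a list_chunks-style response dict.

    Assumes the shape {"code": ..., "message": ..., "data": {"chunks": [...]}}.
    """
    assert res["code"] == expected_code, res
    if expected_code == 0:
        # Success path: only the number of returned chunks is compared.
        assert len(res["data"]["chunks"]) == expected_page_size, res
    else:
        # Error path: the message must match exactly; "data" is not read.
        assert res["message"] == expected_message, res


# Illustrative response, not captured API output.
ok = {"code": 0, "message": "", "data": {"chunks": [{"id": "a"}, {"id": "b"}]}}
check_list_response(ok, expected_code=0, expected_page_size=2, expected_message="")
```

Branching on expected_code keeps one parametrize table usable for both success and error cases, since error responses need not carry any chunk data.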
Method: test_page_size
Purpose: Tests different values for the page_size parameter.
Parameters: Same as test_page, but focusing on page_size.
Details:
A page_size of 0, or one greater than 5, is expected to return 5 chunks.
Invalid values such as negative numbers or non-numeric strings are either skipped or expected to return an error code and message.
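The expected counts for page and page_size follow from ordinary offset pagination over a small fixture. The sketch below models that arithmetic under stated assumptions: pages are 1-based, a missing or zero page_size falls back to a default of 30 (the default page length test_default observes), and the add_chunks fixture holds 5 chunks, which is consistent with the expected counts of 5 described here. The function expected_page_len is hypothetical and is not the server's implementation.

```python
from typing import Optional

DEFAULT_PAGE_SIZE = 30  # assumed default; test_default expects 30 chunks per page


def expected_page_len(total: int, page: int = 1, page_size: Optional[int] = None) -> int:
    """Items on a 1-based page under simple offset pagination (assumed model)."""
    size = page_size or DEFAULT_PAGE_SIZE  # None or 0 -> assumed default
    start = (page - 1) * size
    # Items remaining after the offset, capped at one page and floored at zero.
    return max(0, min(size, total - start))


TOTAL = 5  # assumed number of chunks provided by the add_chunks fixture
print([expected_page_len(TOTAL, page_size=s) for s in (None, 0, 1, 5, 6)])  # [5, 5, 1, 5, 5]
print([expected_page_len(TOTAL, page=p, page_size=2) for p in (1, 2, 3, 4)])  # [2, 2, 1, 0]
```

Under this model, a page_size of 6 returns 5 simply because only 5 chunks exist, not because the server enforces a maximum of 5.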
Method: test_keywords
Purpose: Checks filtering chunks by keywords.
Parameters:
params: Contains the keywords filter.
expected_page_size: Expected number of chunks matching the keyword filter.
Details:
Tests various keyword inputs, including empty strings, None, numeric strings, and known keywords.
Some cases are skipped when running under specific document engines because of known issues.
Method: test_id
Purpose: Tests filtering by specific chunk ID.
Parameters:
chunk_id: The chunk ID to filter by; can be None, an empty string, a callable, or an unknown string.
expected_code: Expected API response code.
expected_page_size: Expected number of chunks.
expected_message: Expected error message, if any.
Details:
Valid chunk IDs return exactly one chunk.
Unknown or invalid IDs return errors or empty results.
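Real chunk IDs exist only after the fixture runs, so a parametrize table cannot list them directly. Placing a callable in the table and resolving it inside the test body against fixture data is one common way around this, and may be what the callable chunk_id case refers to. The stdlib-only sketch below is hypothetical: the helper build_params and the "id" query key are assumptions, not confirmed details of the test file.

```python
from typing import Any, Callable, Union

# A case is a literal value or a function that picks a value from fixture IDs.
ChunkIdSpec = Union[None, str, Callable[[list[str]], str]]


def build_params(chunk_id: ChunkIdSpec, chunk_ids: list[str]) -> dict[str, Any]:
    """Hypothetical helper: turn a parametrized chunk_id case into query params."""
    if callable(chunk_id):
        # Deferred case: choose a real ID from fixture data at run time.
        return {"id": chunk_id(chunk_ids)}
    # Literal cases (None, "", unknown strings) pass through unchanged.
    return {"id": chunk_id}


# Case table in the style of pytest.mark.parametrize (values illustrative).
cases = [None, "", lambda ids: ids[0], "unknown"]
fixture_ids = ["chunk-a", "chunk-b", "chunk-c"]  # stand-in for IDs from add_chunks
for case in cases:
    print(build_params(case, fixture_ids))
```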
Method: test_invalid_params
Purpose: Verifies that unknown query parameters do not affect the successful response.
Details:
Passing an unknown parameter returns the default chunk list without errors.
Method: test_concurrent_list
Purpose: Stress-tests concurrency by sending 100 requests to list_chunks through a thread pool.
Details:
Uses ThreadPoolExecutor with five workers, so at most five requests are in flight at once.
Verifies that all 100 responses return the expected number of chunks.
Method: test_default
Purpose: Validates the default behavior of list_chunks before and after batch-adding chunks.
Details:
Checks the initial chunk count.
Adds 31 chunks using batch_add_chunks.
Waits 3 seconds to allow background processing to complete.
Verifies that the chunk count increased by 31 and that default pagination returns only 30 chunks.
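A fixed sleep(3) passes only if background processing finishes within three seconds, and it always pays the full wait. A common alternative, swapped in here and not what the test file does, is to poll until the expected condition holds or a timeout expires. The helper wait_until below is hypothetical, and the commented usage assumes the response exposes the document's chunk count under data["doc"]["chunk_count"].

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll predicate() until it returns True or timeout seconds elapse.

    Returns True on success and False on timeout. Hypothetical helper.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Possible use inside test_default (assumed response shape, not verified):
# assert wait_until(
#     lambda: list_chunks(HttpApiAuth, dataset_id, document_id)["data"]["doc"]["chunk_count"]
#     == chunks_count + 31
# )
```

Polling returns as soon as the data is visible and fails with a clear timeout otherwise, at the cost of a few extra requests.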
Method: test_invalid_dataset_id
Purpose: Tests API response for invalid or unauthorized dataset IDs.
Parameters:
dataset_id: Dataset identifier string.
expected_code: Expected response code.
expected_message: Expected error message.
Details:
Empty or invalid dataset IDs return appropriate errors.
Method: test_invalid_document_id
Purpose: Tests API response for invalid or unauthorized document IDs.
Parameters:
document_id: Document identifier string.
expected_code: Expected response code.
expected_message: Expected error message.
Details:
Empty or invalid document IDs return appropriate errors.
Important Implementation Details
Use of pytest.mark.parametrize:
Many tests use parameterization to cover multiple input scenarios efficiently.
Skipping Tests:
Some tests are skipped, either unconditionally (pytest.mark.skip) or conditionally (pytest.mark.skipif), because of known issues or unsupported features.
Concurrent Testing:
test_concurrent_list uses Python's ThreadPoolExecutor to issue many simultaneous requests and check that every one of them is served correctly.
Sleep for eventual consistency:
test_default introduces a sleep(3) delay after adding chunks so that background processing can complete before verification.
Error Handling Validation:
Tests check that error codes and messages match the expected values for various invalid inputs.
Interaction with Other Modules
Imports:
batch_add_chunks and list_chunks come from a common utilities module and represent the core API operations tested here.
RAGFlowHttpApiAuth from libs.auth is used to create authorization tokens.
INVALID_API_TOKEN from configs is used to test invalid authentication.
Fixtures and Helpers:
HttpApiAuth, add_chunks, and add_document are pytest fixtures assumed to provide valid authentication and pre-populated datasets, documents, and chunks for testing.
API Under Test:
list_chunks(auth, dataset_id, document_id, params=None) is the main function under test; it returns a dictionary containing a status code, a message, and chunk data.
Usage Examples
Example usage of list_chunks in tests:
def test_page_size(self, HttpApiAuth, add_chunks):
    dataset_id, document_id, _ = add_chunks
    # Request a page of two chunks.
    params = {"page_size": 2}
    res = list_chunks(HttpApiAuth, dataset_id, document_id, params=params)
    assert res["code"] == 0
    assert len(res["data"]["chunks"]) == 2
Example of concurrent invocation:
from concurrent.futures import ThreadPoolExecutor, as_completed

def test_concurrent_list(self, HttpApiAuth, add_chunks):
    dataset_id, document_id, _ = add_chunks
    # Issue 100 list requests through a pool of five worker threads.
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(list_chunks, HttpApiAuth, dataset_id, document_id) for _ in range(100)]
    # as_completed yields each future once it has finished.
    responses = list(as_completed(futures))
    assert len(responses) == 100
    assert all(future.result()["code"] == 0 for future in futures)
Mermaid Diagram: Test Class Structure
classDiagram
class TestAuthorization {
+test_invalid_auth(invalid_auth, expected_code, expected_message)
}
class TestChunksList {
+test_page(params, expected_code, expected_page_size, expected_message)
+test_page_size(params, expected_code, expected_page_size, expected_message)
+test_keywords(params, expected_page_size)
+test_id(chunk_id, expected_code, expected_page_size, expected_message)
+test_invalid_params()
+test_concurrent_list()
+test_default()
+test_invalid_dataset_id(dataset_id, expected_code, expected_message)
+test_invalid_document_id(document_id, expected_code, expected_message)
}
Summary
test_list_chunks.py checks that the chunk listing functionality of the InfiniFlow platform behaves correctly across a wide range of scenarios, including authorization, pagination, filtering, concurrency, and error situations. It uses pytest parameterization and fixtures to keep the tests organized, efficient, and maintainable. Its skip markers also record known integration issues, such as problems with specific document engines, so that they stay visible rather than silently failing.
This file interacts heavily with authentication modules and chunk management utilities, embodying an integration test layer between API endpoints and data services. It is essential for maintaining the reliability and security of the chunk listing feature in the InfiniFlow system.