test_delete_chunks.py
Overview
The test_delete_chunks.py file contains a suite of automated tests that validate the functionality and robustness of the chunk deletion API in the InfiniFlow system. It covers deletion of chunks that belong to documents within datasets, with emphasis on authorization, input validation, concurrency, and boundary cases.
This file uses the pytest framework for test structure and parametrization, and relies on helper functions such as delete_chunks, list_chunks, and batch_add_chunks from other modules for setup and verification. It also issues concurrent requests using Python's ThreadPoolExecutor.
Contents and Structure
The file contains:
TestAuthorization: A class dedicated to testing authorization-related error handling in chunk deletion.
TestChunksDeletion: A comprehensive test class covering multiple scenarios for chunk deletion, including invalid inputs, partial deletes, concurrency, large-scale deletes, and basic functionality checks.
Classes and Methods
1. TestAuthorization
Tests how the deletion API handles invalid or missing authorization tokens.
Methods
test_invalid_auth(invalid_auth, expected_code, expected_message)
Parameters:
invalid_auth: An authorization object, or None, representing invalid or missing credentials.
expected_code: The expected error code returned by the API.
expected_message: The expected error message.
Returns: None. This is a test method; it passes or fails through assertions.
Description: Calls delete_chunks with invalid or missing authorization and asserts that the response contains the expected error code and message.
Usage Example (pytest normally supplies these arguments through parametrization; this shows one case):
TestAuthorization().test_invalid_auth(None, 0, "`Authorization` can't be empty")
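The parametrized authorization check can be sketched without a running server. In this minimal sketch, fake_delete_chunks is a hypothetical in-memory stand-in for the real delete_chunks helper that models only the missing-credentials case listed above, and the AUTH_CASES list plays the role of the pytest.mark.parametrize table. The success response shape is an assumption.

```python
# Hypothetical stand-in for delete_chunks: it models only the authorization
# check, not the HTTP request the real helper sends.
def fake_delete_chunks(auth, dataset_id, document_id, payload=None):
    if auth is None:
        # The documented response for missing credentials.
        return {"code": 0, "message": "`Authorization` can't be empty"}
    return {"code": 0, "message": ""}  # hypothetical success response


# Each tuple mirrors one parametrized case: (invalid_auth, expected_code, expected_message).
AUTH_CASES = [
    (None, 0, "`Authorization` can't be empty"),
]


def run_auth_cases(cases):
    """Return the cases whose response differs from the expectation."""
    failures = []
    for invalid_auth, expected_code, expected_message in cases:
        res = fake_delete_chunks(invalid_auth, "dataset_id", "document_id")
        if res["code"] != expected_code or res["message"] != expected_message:
            failures.append((invalid_auth, res))
    return failures


print(run_auth_cases(AUTH_CASES))  # [] means every case matched
```

Because the documented error code for missing credentials is 0, the assertion on the message is what distinguishes this failure from a successful call.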
2. TestChunksDeletion
Extensive tests for the chunk deletion functionality, including error cases, concurrency, and large data sets.
Methods
test_invalid_dataset_id(HttpApiAuth, add_chunks_func, dataset_id, expected_code, expected_message)
Tests deletion with invalid dataset IDs.
Parameters:
HttpApiAuth: Valid authorization object.
add_chunks_func: A fixture providing a tuple of (dataset_id, document_id, chunk_ids) with chunks already added.
dataset_id: The dataset ID under test.
expected_code: Expected error code.
expected_message: Expected error message.
Behavior: Attempts to delete chunks using an invalid dataset ID and asserts that the error response matches the expectation.
test_invalid_document_id(HttpApiAuth, add_chunks_func, document_id, expected_code, expected_message)
Tests deletion with invalid document IDs. The parameters mirror test_invalid_dataset_id, with document_id as the value under test.
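The two ID-validation tests can be illustrated with a small in-memory model. This is a sketch under stated assumptions: STORE, fake_delete_chunks, the error code 102, and the message texts are all hypothetical; the real expected codes and messages live in the tests' parametrize tables. The point shown is that an unknown dataset or document ID is rejected before any chunk is touched.

```python
# Hypothetical in-memory model of the server: {dataset_id: {document_id: chunk_ids}}.
STORE = {"ds_1": {"doc_1": {"c1", "c2", "c3", "c4"}}}
ERROR_CODE = 102  # hypothetical; the real expected codes come from the parametrize tables


def fake_delete_chunks(dataset_id, document_id, payload):
    documents = STORE.get(dataset_id)
    if documents is None:
        return {"code": ERROR_CODE, "message": f"unknown dataset {dataset_id!r}"}
    chunks = documents.get(document_id)
    if chunks is None:
        return {"code": ERROR_CODE, "message": f"unknown document {document_id!r}"}
    for chunk_id in payload["chunk_ids"]:
        chunks.discard(chunk_id)
    return {"code": 0, "message": ""}


payload = {"chunk_ids": ["c1"]}
bad_dataset = fake_delete_chunks("invalid_dataset_id", "doc_1", payload)
bad_document = fake_delete_chunks("ds_1", "invalid_document_id", payload)
# Both calls fail before touching any chunks.
print(bad_dataset["code"], bad_document["code"], len(STORE["ds_1"]["doc_1"]))  # 102 102 4
```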
test_delete_partial_invalid_id(HttpApiAuth, add_chunks_func, payload)
Tests deletion where the chunk ID list mixes invalid IDs with valid ones.
Behavior: Expects partial success and verifies the number of chunks remaining after deletion.
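A minimal sketch of the partial-success semantics, assuming an in-memory set in place of the server: fake_delete_chunks is hypothetical, as are the error code 102 and the message format. Valid IDs are removed even though the call as a whole reports an error, which is why the test checks the remaining chunk count rather than the response alone.

```python
# Hypothetical chunks of one document; the real test reads them back with list_chunks.
chunks = {"c1", "c2", "c3", "c4"}


def fake_delete_chunks(payload):
    requested = payload["chunk_ids"]
    found = [chunk_id for chunk_id in requested if chunk_id in chunks]
    for chunk_id in found:
        chunks.discard(chunk_id)
    if len(found) < len(requested):
        # Partial success: valid IDs are deleted, but the response reports the shortfall.
        return {"code": 102, "message": f"deleted {len(found)} of {len(requested)} chunks"}
    return {"code": 0, "message": ""}


res = fake_delete_chunks({"chunk_ids": ["invalid_id", "c1", "c2"]})
print(res["code"], res["message"])  # 102 deleted 2 of 3 chunks
print(sorted(chunks))  # ['c3', 'c4']
```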
test_repeated_deletion(HttpApiAuth, add_chunks_func)
Tests deleting the same chunks twice: the first deletion should succeed, and the second should return an error indicating that zero chunks were deleted.
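The repeated-deletion expectation follows from counting how many requested IDs were actually removed. In this sketch, fake_delete_chunks, the code 102, and the message wording are hypothetical; only the "zero chunks deleted on the second attempt" behavior comes from the test description.

```python
# Hypothetical in-memory chunk set standing in for one document's chunks.
chunks = {"c1", "c2", "c3"}


def fake_delete_chunks(chunk_ids):
    deleted = 0
    for chunk_id in chunk_ids:
        if chunk_id in chunks:
            chunks.remove(chunk_id)
            deleted += 1
    if deleted != len(chunk_ids):
        return {"code": 102, "message": f"deleted {deleted} chunks, expected {len(chunk_ids)}"}
    return {"code": 0, "message": ""}


ids = ["c1", "c2"]
first = fake_delete_chunks(ids)   # removes both chunks
second = fake_delete_chunks(ids)  # nothing left to remove
print(first["code"], second["code"], second["message"])  # 0 102 deleted 0 chunks, expected 2
```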
test_duplicate_deletion(HttpApiAuth, add_chunks_func)
Tests a deletion payload containing duplicate chunk IDs; expects partial success along with error messages about the duplicates.
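One way to get "partial success plus a duplicate warning" is to deduplicate the request before deleting. This sketch assumes that approach; fake_delete_chunks and the response shape with success_count and errors are hypothetical and not confirmed by the test description.

```python
from collections import Counter

# Hypothetical in-memory chunk set standing in for one document's chunks.
chunks = {"c1", "c2", "c3"}


def fake_delete_chunks(chunk_ids):
    counts = Counter(chunk_ids)
    duplicates = [chunk_id for chunk_id, n in counts.items() if n > 1]
    # Counter keeps first-seen order, so each ID is processed once.
    deleted = [chunk_id for chunk_id in counts if chunk_id in chunks]
    chunks.difference_update(deleted)
    errors = [f"Duplicate chunk ids: {', '.join(duplicates)}"] if duplicates else []
    return {"code": 0, "data": {"success_count": len(deleted), "errors": errors}}


res = fake_delete_chunks(["c1", "c2", "c3", "c1", "c2"])
print(res["data"])  # {'success_count': 3, 'errors': ['Duplicate chunk ids: c1, c2']}
print(chunks)  # set()
```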
test_concurrent_deletion(HttpApiAuth, add_document)
Tests concurrent deletion of chunks using multiple threads.
Parameters:
HttpApiAuth: Valid authorization object.
add_document: A fixture providing (dataset_id, document_id) for a document ready for chunk operations.
Behavior: Adds 100 chunks, deletes them concurrently from multiple threads (one request per chunk ID), and asserts that every deletion succeeds.
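The concurrency pattern can be reproduced locally with ThreadPoolExecutor against a lock-protected store. FakeChunkStore is a hypothetical stand-in for the server, and the pool size of 5 workers is an assumption; the 100 chunks and the one-request-per-ID shape come from the test description.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeChunkStore:
    """Hypothetical thread-safe stand-in for the server-side chunk storage."""

    def __init__(self, chunk_ids):
        self._lock = threading.Lock()
        self._chunks = set(chunk_ids)

    def delete(self, chunk_id):
        # The lock makes check-and-remove atomic, so concurrent requests cannot race.
        with self._lock:
            if chunk_id in self._chunks:
                self._chunks.remove(chunk_id)
                return {"code": 0}
            return {"code": 102, "message": f"chunk {chunk_id} not found"}

    def count(self):
        with self._lock:
            return len(self._chunks)


chunk_ids = [f"chunk_{i}" for i in range(100)]  # the test adds 100 chunks
store = FakeChunkStore(chunk_ids)

# One request per chunk ID, spread across a small pool (5 workers is an assumption).
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(store.delete, chunk_id) for chunk_id in chunk_ids]
responses = [future.result() for future in futures]

print(all(r["code"] == 0 for r in responses), store.count())  # True 0
```

Because every ID is distinct, each request should succeed exactly once; a non-zero remaining count or a failed response would point to a consistency problem under concurrency.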
test_delete_1k(HttpApiAuth, add_document)
Tests deletion of a large number of chunks (1,000).
Behavior: Adds 1,000 chunks, deletes them all, and verifies that no chunks remain. A short delay is included to work around a timing issue (see issue 6487).
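The add, wait, delete, verify flow can be sketched with in-memory stand-ins. fake_batch_add_chunks, fake_delete_chunks, and fake_list_chunks are hypothetical substitutes for batch_add_chunks, delete_chunks, and list_chunks; the placement of the delay between adding and deleting, and its 0.1 s length, are assumptions.

```python
import time

# Hypothetical in-memory stand-ins for batch_add_chunks / delete_chunks / list_chunks.
store = set()


def fake_batch_add_chunks(num):
    chunk_ids = [f"chunk_{i}" for i in range(num)]
    store.update(chunk_ids)
    return chunk_ids


def fake_delete_chunks(chunk_ids):
    store.difference_update(chunk_ids)
    return {"code": 0}


def fake_list_chunks():
    return sorted(store)


chunk_ids = fake_batch_add_chunks(1_000)
time.sleep(0.1)  # placeholder for the real test's short wait; the actual delay may differ
res = fake_delete_chunks(chunk_ids)  # all 1,000 IDs in a single payload
print(res["code"], len(fake_list_chunks()))  # 0 0
```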
test_basic_scenarios(HttpApiAuth, add_chunks_func, payload, expected_code, expected_message, remaining)
Covers common cases with different payloads, including empty, invalid, partial, and full deletes.
Parameters:
payload: The chunk ID payload to send for deletion; some cases use a lambda that builds the payload from the chunk IDs created by the fixture.
expected_code: Expected API response code.
expected_message: Expected error message, if any.
remaining: Expected number of chunks remaining after the operation.
Behavior: Sends the deletion request and asserts both the API response and the remaining chunk count.
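The mix of static and lambda payloads can be sketched as a small case table. Here fake_delete_chunks, the error code 102, the four-chunk starting state, and the fresh state per case (mimicking a function-scoped fixture) are all assumptions; the empty-payload case is omitted because its expected behavior is not described.

```python
# Hypothetical in-memory stand-in for delete_chunks: deletes the IDs it finds
# and reports an error if any requested ID was missing.
def fake_delete_chunks(chunks, payload):
    requested = payload["chunk_ids"]
    found = [chunk_id for chunk_id in requested if chunk_id in chunks]
    chunks.difference_update(found)
    if len(found) < len(requested):
        return {"code": 102, "message": f"deleted {len(found)} of {len(requested)} chunks"}
    return {"code": 0, "message": ""}


# (payload, expected_code, remaining); callables build the payload from the fresh chunk IDs.
CASES = [
    ({"chunk_ids": ["invalid_id"]}, 102, 4),
    (lambda ids: {"chunk_ids": ids[:1]}, 0, 3),
    (lambda ids: {"chunk_ids": ids}, 0, 0),
]


def run_case(payload, expected_code, remaining):
    # Fresh state per case, mimicking a function-scoped fixture (an assumption here).
    chunks = {f"chunk_{i}" for i in range(4)}
    if callable(payload):
        payload = payload(sorted(chunks))
    res = fake_delete_chunks(chunks, payload)
    return res["code"] == expected_code and len(chunks) == remaining


print([run_case(*case) for case in CASES])  # [True, True, True]
```

Resolving the lambdas against freshly created IDs lets a single parametrize table express partial and full deletes, even though chunk IDs are only known at run time.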
Important Implementation Details
Use of Fixtures and Parametrization:
Tests rely on common fixtures such as HttpApiAuth, add_chunks_func, and add_document to provide authenticated access and prepared datasets, documents, and chunks, which promotes reuse and isolation.
Concurrency Testing:
test_concurrent_deletion uses ThreadPoolExecutor to send multiple simultaneous deletion requests, each targeting an individual chunk ID, to check that the deletion API remains thread-safe and consistent.
Partial and Duplicate ID Handling:
The API's behavior when the chunk ID list contains invalid or duplicate entries is tested, verifying that the system deletes what it can and returns appropriate warnings or errors.
Large-Scale Testing:
test_delete_1k assesses performance and correctness for bulk operations, an important real-world scenario.
Error Handling:
Tests cover HTTP-level errors (404 Not Found, 405 Method Not Allowed), authentication errors, and internal errors.
Interaction with Other System Components
delete_chunks function: The core function under test, imported from common. It executes the chunk deletion API call.
list_chunks function: Used to verify which chunks remain after deletion operations.
batch_add_chunks function: Used during setup to populate documents with chunks for the deletion tests.
RAGFlowHttpApiAuth class: Used to build authentication objects, including invalid tokens for the authorization tests.
Test Framework:
Uses pytest for test structure, parametrization, and priority markers (p1, p3).
Concurrency Utilities:
Uses Python's concurrent.futures.ThreadPoolExecutor to issue concurrent chunk deletion requests.
Visual Diagram
classDiagram
class TestAuthorization {
+test_invalid_auth(invalid_auth, expected_code, expected_message)
}
class TestChunksDeletion {
+test_invalid_dataset_id(HttpApiAuth, add_chunks_func, dataset_id, expected_code, expected_message)
+test_invalid_document_id(HttpApiAuth, add_chunks_func, document_id, expected_code, expected_message)
+test_delete_partial_invalid_id(HttpApiAuth, add_chunks_func, payload)
+test_repeated_deletion(HttpApiAuth, add_chunks_func)
+test_duplicate_deletion(HttpApiAuth, add_chunks_func)
+test_concurrent_deletion(HttpApiAuth, add_document)
+test_delete_1k(HttpApiAuth, add_document)
+test_basic_scenarios(HttpApiAuth, add_chunks_func, payload, expected_code, expected_message, remaining)
}
TestAuthorization ..> delete_chunks : calls
TestChunksDeletion ..> delete_chunks : calls
TestChunksDeletion ..> list_chunks : calls
TestChunksDeletion ..> batch_add_chunks : calls
Summary
Purpose: To rigorously test the chunk deletion API's correctness, error handling, concurrency, and performance.
Usage: Run with pytest to validate changes or catch regressions in the chunk deletion functionality.
Scope: Authorization checks, invalid inputs, partial deletes, duplicate IDs, concurrency, and large-scale deletes.
System Role: Ensures data integrity and API reliability for chunk deletion operations in the InfiniFlow platform.