test_delete_chunks.py
Overview
The test_delete_chunks.py file contains a comprehensive suite of automated test cases designed to verify the correctness, robustness, and security of the chunk deletion functionality in the InfiniFlow system. Specifically, it tests the delete_chunks API endpoint, which handles the removal of data chunks from documents within datasets.
This file uses the pytest testing framework and covers scenarios including authorization checks, validation of dataset and document IDs, handling of invalid or duplicate chunk IDs, concurrency in deletion operations, and performance on large-scale chunk deletion. It also validates the system's response messages and codes to ensure they conform to expected error handling and success criteria.
Detailed Explanation
Imports
ThreadPoolExecutor from concurrent.futures: Used to test concurrent chunk deletions.
pytest: The test framework used to define and run the tests.
Functions and constants from common:
  INVALID_API_TOKEN: An invalid token for testing authentication failure.
  batch_add_chunks: Utility to add multiple chunks efficiently.
  delete_chunks: The core API call to delete chunks (subject under test).
  list_chunks: API call to list chunks, used for validation after deletion.
RAGFlowHttpApiAuth from libs.auth: Handles API authentication.
Class: TestAuthorization
Tests authorization failures for the delete_chunks API.
Method: test_invalid_auth
Parameters (via pytest.mark.parametrize):
  auth: The authorization method or None.
  expected_code: Expected response code from the API.
  expected_message: Expected error message string.
Functionality:
Calls delete_chunks with invalid or missing authorization and verifies that the API returns the correct error code and message indicating authentication failure.
Example Usage:
test = TestAuthorization()
test.test_invalid_auth(None, 0, "`Authorization` can't be empty")
Class: TestChunksDeletion
Contains multiple test cases that validate different aspects of chunk deletion.
Method: test_invalid_dataset_id
Parameters (via pytest.mark.parametrize):
  dataset_id: Dataset ID to test (empty or invalid).
  expected_code: Expected error code.
  expected_message: Expected error message.
Details:
Tests deletion with invalid dataset IDs, ensuring the API returns appropriate error codes and messages such as "404 Not Found" or ownership errors.
Method: test_invalid_document_id
Parameters (via pytest.mark.parametrize):
  document_id: Document ID to test (empty or invalid).
  expected_code: Expected error code.
  expected_message: Expected error message.
Details:
Tests deletion with invalid document IDs, checking that the API responds with a method-not-allowed error or a document-not-found error.
Method: test_delete_partial_invalid_id
Parameters (via pytest.mark.parametrize):
  payload: Different payloads mixing valid and invalid chunk IDs.
Details:
Tests partial deletion where some chunk IDs are invalid. Checks that the system deletes valid chunks and reports the discrepancy between expected and actual deletions.
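The partial-deletion behavior can be illustrated with a minimal sketch. The function fake_delete_chunks and the in-memory store below are hypothetical stand-ins, not the real backend or the delete_chunks helper from common; the success code 0 is an assumption, while the error code 102 and the message shape follow the usage example quoted elsewhere in this document.

```python
# Hypothetical in-memory stand-in for the chunk store; not the real backend.
def fake_delete_chunks(store, chunk_ids):
    """Delete the requested IDs that exist; report a mismatch if any were missing."""
    existing = [cid for cid in chunk_ids if cid in store]
    for cid in existing:
        store.discard(cid)
    if len(existing) != len(chunk_ids):
        # Fewer chunks were deleted than requested: valid ones are still removed.
        return {
            "code": 102,
            "message": f"deleted chunks {len(existing)}, expect {len(chunk_ids)}",
        }
    # Assumed success shape; the real API response may carry more fields.
    return {"code": 0}


store = {"chunk1", "chunk2", "chunk3", "chunk4"}
res = fake_delete_chunks(
    store, ["invalid_id", "chunk1", "chunk2", "chunk3", "chunk4"]
)
print(res["message"])
```

The point of the sketch is the pair of checks the test makes: the valid chunks are gone from the store, and the response still flags that one requested ID could not be deleted.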
Method: test_repeated_deletion
Details:
Tests the behavior when attempting to delete chunks that have already been deleted. Expects the second attempt to return an error reporting that zero chunks were deleted.
Method: test_duplicate_deletion
Details:
Tests deletion requests containing duplicate chunk IDs. Confirms that duplicates are detected and the count of successfully deleted unique chunks is correct.
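One way duplicate IDs could be separated from unique ones is sketched below. The helper split_duplicates is a hypothetical name introduced for illustration; how the backend actually detects duplicates is not described in this document.

```python
def split_duplicates(chunk_ids):
    """Return (unique_ids_in_request_order, ids_that_appeared_more_than_once)."""
    seen = set()
    unique, duplicates = [], []
    for cid in chunk_ids:
        if cid in seen:
            # Record each duplicated ID once, however often it repeats.
            if cid not in duplicates:
                duplicates.append(cid)
        else:
            seen.add(cid)
            unique.append(cid)
    return unique, duplicates


ids = ["c1", "c2", "c3", "c4"]
unique, duplicates = split_duplicates(ids * 2)  # every ID is sent twice
print(len(unique), duplicates)
```

With every ID sent twice, the number of unique IDs equals the number of chunks that can actually be deleted, which is the count the test compares against.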
Method: test_concurrent_deletion
Details:
Tests concurrent deletion of chunks using a thread pool with multiple workers to simulate parallel requests. Verifies all deletions succeed without conflict.
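The thread-pool pattern described above can be sketched as follows. FakeChunkStore is a hypothetical in-memory stand-in for the backend, and the worker count, chunk count, and response codes are assumptions chosen for illustration rather than values taken from the test file.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeChunkStore:
    """Hypothetical thread-safe stand-in for the chunk backend."""

    def __init__(self, chunk_ids):
        self._ids = set(chunk_ids)
        self._lock = threading.Lock()

    def delete(self, chunk_id):
        # The lock makes check-then-remove atomic across worker threads.
        with self._lock:
            if chunk_id in self._ids:
                self._ids.remove(chunk_id)
                return {"code": 0}
            return {"code": 102, "message": "deleted chunks 0, expect 1"}

    def remaining(self):
        with self._lock:
            return len(self._ids)


chunk_ids = [f"chunk_{i}" for i in range(100)]
store = FakeChunkStore(chunk_ids)

# Submit one deletion per chunk ID; the pool runs them in parallel.
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(store.delete, cid) for cid in chunk_ids]

# Leaving the with-block waits for all tasks, so every result is ready here.
responses = [f.result() for f in futures]
print(all(r["code"] == 0 for r in responses), store.remaining())
```

Because each worker deletes a distinct ID, every request is expected to succeed; the assertion over all responses is what would surface a race in the code under test.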
Method: test_delete_1k
Details:
Tests deletion of a large number (1,000) of chunks to verify API scalability and correctness under load.
Method: test_basic_scenarios
Parameters (via pytest.mark.parametrize):
  payload: Various payloads including None, invalid IDs, malformed data, and empty lists.
  expected_code: Expected response code.
  expected_message: Expected response message.
  remaining: Expected number of chunks remaining after deletion.
Details:
Covers a broad range of basic deletion scenarios, validating error handling, partial success, and empty deletion requests.
Important Implementation Details and Algorithms
Test Parameterization:
Many tests use pytest.mark.parametrize to run the same test logic with different input parameters and expected outcomes. This enhances test coverage with minimal code duplication.
Concurrent Deletion Testing:
The test_concurrent_deletion method uses Python's ThreadPoolExecutor to simulate multiple parallel deletion requests on individual chunk IDs, testing race conditions and concurrency safety in the backend.
Chunk ID Validation:
Tests verify that the API correctly handles invalid, duplicate, and empty chunk ID lists, ensuring robust input validation.
Error Code and Message Verification:
Each test asserts not only the success or failure of operations but also the exact error codes and messages returned by the API, ensuring consistent and informative feedback to users.
Use of Fixtures (get_http_api_auth, add_chunks_func, add_document):
The tests rely on pytest fixtures to provide authenticated sessions and pre-populated datasets/documents with chunks, promoting reusable and modular test setup.
Sleep for Backend Consistency:
In test_delete_1k, a sleep(1) call is used, likely to account for eventual consistency or propagation delays before verifying deletions, which suggests asynchronous backend processing.
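As an alternative to a fixed sleep, a test could poll until the expected state is reached. The sketch below is not what test_delete_1k does; wait_until and chunks_gone are hypothetical names, and the counter merely simulates a backend that becomes consistent after a few checks.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.1):
    """Poll predicate() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated backend: the deletion becomes visible on the third check.
state = {"calls": 0}


def chunks_gone():
    state["calls"] += 1
    return state["calls"] >= 3


ok = wait_until(chunks_gone, timeout=2.0, interval=0.01)
print(ok, state["calls"])
```

Polling returns as soon as the condition holds and fails explicitly on timeout, whereas a fixed sleep always costs the full delay and can still be too short on a slow backend.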
Interaction with Other System Components
common.py:
Provides utility functions (batch_add_chunks, delete_chunks, list_chunks) and constants (INVALID_API_TOKEN) used to interact with the chunk APIs.
libs.auth:
Supplies the RAGFlowHttpApiAuth class for API key management and authentication.
Backend Chunk Management API:
The tests exercise the chunk deletion API endpoints of the InfiniFlow backend, validating their behavior under various conditions.
Dataset and Document Management:
Chunks are linked to documents and datasets, so the tests implicitly depend on dataset and document creation and ownership validation.
Usage Examples
Example: Test invalid authorization
from common import INVALID_API_TOKEN, delete_chunks
from libs.auth import RAGFlowHttpApiAuth

auth = RAGFlowHttpApiAuth(INVALID_API_TOKEN)
response = delete_chunks(auth, "dataset_id", "document_id")
assert response["code"] == 109
assert "API key is invalid" in response["message"]
Example: Test deleting chunks with some invalid IDs
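# auth must be a valid credential here, e.g. the one supplied by the get_http_api_auth fixture.
# Four valid IDs plus one invalid ID: five requested, four deleted.
valid_chunk_ids = ["chunk1", "chunk2", "chunk3", "chunk4"]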
payload = {"chunk_ids": ["invalid_id"] + valid_chunk_ids}
response = delete_chunks(auth, "dataset_id", "document_id", payload)
assert response["code"] == 102
assert "deleted chunks 4, expect 5" in response["message"]
Mermaid Diagram: Class and Test Structure
classDiagram
class TestAuthorization {
+test_invalid_auth(auth, expected_code, expected_message)
}
class TestChunksDeletion {
+test_invalid_dataset_id(get_http_api_auth, add_chunks_func, dataset_id, expected_code, expected_message)
+test_invalid_document_id(get_http_api_auth, add_chunks_func, document_id, expected_code, expected_message)
+test_delete_partial_invalid_id(get_http_api_auth, add_chunks_func, payload)
+test_repeated_deletion(get_http_api_auth, add_chunks_func)
+test_duplicate_deletion(get_http_api_auth, add_chunks_func)
+test_concurrent_deletion(get_http_api_auth, add_document)
+test_delete_1k(get_http_api_auth, add_document)
+test_basic_scenarios(get_http_api_auth, add_chunks_func, payload, expected_code, expected_message, remaining)
}
Summary
test_delete_chunks.py validates the chunk deletion API for correctness, security, and concurrency.
It includes authorization tests, input validation, concurrency stress tests, and large-scale deletion tests.
Uses pytest with parameterized tests and fixtures for reusable setup.
Interacts closely with authentication utilities and chunk management APIs.
Provides robust verification of error handling and response messaging.
This test suite is essential for maintaining the integrity and reliability of the chunk deletion feature within the InfiniFlow system, ensuring that data management behaves as expected under various edge cases and load conditions.