test_delete_documents.py
Overview
test_delete_documents.py is a test suite designed to validate the functionality and robustness of the document deletion API endpoints within the InfiniFlow system. It leverages the pytest framework to organize and run a series of automated tests that check authorization, input validation, concurrency handling, and performance when deleting documents from datasets.
The tests primarily focus on the delete_documnets API (noting the consistent misspelling in the source), ensuring that it behaves correctly under various scenarios including invalid authorization, invalid input payloads, partial and full deletions, repeated or duplicate deletions, and high-volume deletion requests.
Detailed Description of Classes and Functions
Imported Modules and Dependencies
concurrent.futures.ThreadPoolExecutor: Used to test concurrent deletion requests.
pytest: Testing framework used for parameterized tests and marking test priorities.
common module:
  INVALID_API_TOKEN: A constant representing an invalid token.
  bulk_upload_documents(): Utility to upload multiple documents for testing.
  delete_documnets(): API call to delete documents.
  list_documnets(): API call to list documents.
libs.auth.RAGFlowHttpApiAuth: Class for API authentication.
Class: TestAuthorization
Purpose:
Tests how the deletion API handles authorization failures.
Method: test_invalid_auth(auth, expected_code, expected_message)
Parameters:
auth: Authentication object or None.
expected_code: Expected response code from the API.
expected_message: Expected error message from the API.
Functionality:
Calls delete_documnets with invalid or missing authorization and asserts that the API returns the correct error code and message.
Usage Example:
auth = RAGFlowHttpApiAuth(INVALID_API_TOKEN)
res = delete_documnets(auth, "dataset_id")
assert res["code"] == 109
assert res["message"] == "Authentication error: API key is invalid!"
Class: TestDocumentsDeletion
Purpose:
Contains comprehensive tests for document deletion with varying input payloads and dataset states.
Method: test_basic_scenarios(get_http_api_auth, add_documents_func, payload, expected_code, expected_message, remaining)
Parameters:
get_http_api_auth: Fixture providing valid API authentication.
add_documents_func: Fixture adding documents and returning (dataset_id, document_ids).
payload: The ids payload for deletion; can be None, a dict, a string, or a callable returning a dict.
expected_code: Expected API response code.
expected_message: Expected API response message if an error occurs.
remaining: Number of documents expected to remain after deletion.
Functionality:
Tests how the API handles different payloads, including empty, invalid IDs, malformed JSON, and valid IDs. It verifies API response and that the remaining documents count matches expectations.
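The callable form of the payload can be sketched as follows. This is a minimal illustration, assuming the test resolves a callable by passing it the list of uploaded document IDs; resolve_payload and the sample IDs are hypothetical names, not taken from the test file.

```python
def resolve_payload(payload, document_ids):
    """Hypothetical helper: turn a parameterized payload into the value sent to the API.

    A callable is invoked with the uploaded document IDs; anything else
    (None, a dict, a raw string) is passed through unchanged.
    """
    if callable(payload):
        return payload(document_ids)
    return payload


# Sample IDs standing in for the ones returned by the upload fixture.
document_ids = ["doc-1", "doc-2", "doc-3"]

cases = [
    None,                            # no payload at all
    {"ids": []},                     # empty ID list
    "not json",                      # malformed body sent as a raw string
    lambda ids: {"ids": ids[:1]},    # delete only the first document
    lambda ids: {"ids": ids},        # delete every document
]

resolved = [resolve_payload(case, document_ids) for case in cases]
```

Deferring ID selection to a callable lets the parameter table be written before any documents exist, since the real IDs are only known after the fixture has uploaded them.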
Method: test_invalid_dataset_id(get_http_api_auth, add_documents_func, dataset_id, expected_code, expected_message)
Parameters:
dataset_id: Dataset ID to test with (empty string or invalid ID).
Others as above.
Functionality:
Attempts deletion on invalid datasets and verifies proper error handling.
Method: test_delete_partial_invalid_id(get_http_api_auth, add_documents_func, payload)
Parameters:
payload: Callable generating a list of IDs containing a mix of valid and invalid document IDs.
Functionality:
Tests deletion when the payload contains some invalid document IDs and ensures all valid docs are deleted while reporting errors for invalid ones.
Method: test_repeated_deletion(get_http_api_auth, add_documents_func)
Functionality:
Deletes documents once successfully and attempts to delete the same documents again, expecting a "Documents not found" error.
Method: test_duplicate_deletion(get_http_api_auth, add_documents_func)
Functionality:
Tests deletion when duplicate document IDs are supplied in the same request, ensuring duplicates are detected and error messages are returned, but valid deletions still proceed.
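The partial, repeated, and duplicate deletion behaviors described for this class can be modeled with a small in-memory stand-in. This is a sketch under stated assumptions, not the server's implementation: InMemoryDocumentStore, ERROR_CODE, and the duplicate-ID message wording are hypothetical, and only the "Documents not found" phrase comes from the description above.

```python
ERROR_CODE = 1  # placeholder; the real API's error code is not assumed here


class InMemoryDocumentStore:
    """Hypothetical stand-in for a dataset's documents (not the real API)."""

    def __init__(self, document_ids):
        self._ids = set(document_ids)

    def remaining(self):
        return len(self._ids)

    def delete(self, ids):
        errors = []
        unique_ids = []
        for doc_id in ids:
            if doc_id in unique_ids:
                # Assumed wording for the duplicate report.
                errors.append(f"Duplicate document id: {doc_id}")
            else:
                unique_ids.append(doc_id)

        missing = [doc_id for doc_id in unique_ids if doc_id not in self._ids]
        if missing:
            errors.append(f"Documents not found: {missing}")

        # Valid IDs are removed even when other IDs in the request failed.
        self._ids.difference_update(unique_ids)

        if errors:
            return {"code": ERROR_CODE, "message": "; ".join(errors)}
        return {"code": 0, "message": ""}


store = InMemoryDocumentStore(["a", "b", "c"])

# Partial: one valid ID and one unknown ID in the same request.
partial = store.delete(["a", "no-such-id"])
after_partial = store.remaining()

# Repeated: deleting an already-deleted document.
repeated = store.delete(["a"])
after_repeated = store.remaining()

# Duplicate: the same valid ID listed twice.
duplicate = store.delete(["b", "b"])
after_duplicate = store.remaining()
```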
Function: test_concurrent_deletion(get_http_api_auth, add_dataset, tmp_path)
Purpose:
Tests the behavior of the deletion API under concurrent deletion requests.
Parameters:
get_http_api_auth: Valid API authentication.
add_dataset: Fixture to add a dataset.
tmp_path: Temporary directory path fixture for bulk upload.
Functionality:
Uploads 100 documents, then concurrently issues 100 deletion requests (one per document) using a thread pool with 5 workers. Verifies all deletions succeed with code 0.
Usage Example:
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(delete_documnets, auth, dataset_id, {"ids": [doc_id]}) for doc_id in document_ids]
    responses = [f.result() for f in futures]
assert all(r["code"] == 0 for r in responses)
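The snippet above depends on a live API. A self-contained sketch of the same pattern, run against an in-memory stand-in, is shown here; ThreadSafeStore and its non-zero error code are hypothetical and only illustrate why each request should succeed exactly once when the backend serializes access.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ThreadSafeStore:
    """Hypothetical in-memory backend guarded by a lock (not the real API)."""

    def __init__(self, document_ids):
        self._ids = set(document_ids)
        self._lock = threading.Lock()

    def delete(self, doc_id):
        # The lock makes the check-then-remove step atomic across threads.
        with self._lock:
            if doc_id not in self._ids:
                return {"code": 1, "message": f"Documents not found: {doc_id}"}
            self._ids.remove(doc_id)
            return {"code": 0}

    def count(self):
        with self._lock:
            return len(self._ids)


document_ids = [f"doc-{i}" for i in range(100)]
store = ThreadSafeStore(document_ids)

# One deletion request per document, spread over 5 worker threads.
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(store.delete, doc_id) for doc_id in document_ids]
    responses = [future.result() for future in futures]
```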
Function: test_delete_1k(get_http_api_auth, add_dataset, tmp_path)
Purpose:
Tests API performance and correctness when deleting a large batch (1,000) of documents.
Parameters:
Same as test_concurrent_deletion.
Functionality:
Uploads 1,000 documents, verifies upload, deletes all in one request, and confirms all documents are deleted.
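The upload, verify, delete, verify sequence can be sketched against an in-memory stand-in. FakeDataset and its methods are hypothetical placeholders for bulk_upload_documents, list_documnets, and delete_documnets; they show only the order of checks, not the real calls or their signatures.

```python
class FakeDataset:
    """Hypothetical in-memory dataset (not the real API)."""

    def __init__(self):
        self._docs = {}

    def bulk_upload(self, count):
        ids = [f"doc-{i}" for i in range(count)]
        for doc_id in ids:
            self._docs[doc_id] = b""
        return ids

    def list_total(self):
        return len(self._docs)

    def delete(self, ids):
        for doc_id in ids:
            self._docs.pop(doc_id, None)
        return {"code": 0}


dataset = FakeDataset()
uploaded_ids = dataset.bulk_upload(1000)

total_before = dataset.list_total()       # confirm the upload landed
response = dataset.delete(uploaded_ids)   # a single request with all IDs
total_after = dataset.list_total()        # confirm nothing is left
```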
Important Implementation Details and Algorithms
Parameterized Testing:
Uses pytest.mark.parametrize extensively to test multiple input scenarios without duplicating code.
Payload Flexibility:
Supports payloads as raw dicts, None, strings, or callables for dynamic generation of IDs, enabling flexible test case definitions.
Error Handling Validation:
Checks API response codes and messages for invalid authentication, dataset access errors, malformed payloads, and invalid document IDs.
Concurrency Testing:
Simulates concurrent deletions to uncover race conditions or data consistency issues.
Duplicate ID Handling:
Tests the API’s ability to detect and report duplicate document IDs in deletion requests, while still performing deletions on valid entries.
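As an illustration of the parameterized style described above, the sketch below drives one test body from a table of cases and tags it with a priority marker. The validate_ids_payload function, its messages, and its error code are hypothetical and exist only to give the test something to check; they do not reflect the server's validation rules.

```python
import pytest


def validate_ids_payload(payload):
    """Hypothetical validator returning (code, message) for a deletion payload."""
    if not isinstance(payload, dict):
        return 1, "payload must be a JSON object"
    ids = payload.get("ids", [])
    if not all(isinstance(doc_id, str) for doc_id in ids):
        return 1, "ids must be strings"
    return 0, ""


@pytest.mark.p1  # priority marker, as used by the suite
@pytest.mark.parametrize(
    "payload, expected_code, expected_message",
    [
        ({"ids": ["doc-1"]}, 0, ""),
        ("not a dict", 1, "payload must be a JSON object"),
        ({"ids": [42]}, 1, "ids must be strings"),
    ],
)
def test_validate_ids_payload(payload, expected_code, expected_message):
    code, message = validate_ids_payload(payload)
    assert code == expected_code
    assert message == expected_message
```

Each tuple in the table becomes its own test case in the pytest report, so a failure points at the specific payload that caused it.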
Interaction with Other System Components
common Module:
Provides utility functions for document listing, bulk uploading, and deletion, which this test suite invokes to drive tests.
Authentication (libs.auth.RAGFlowHttpApiAuth):
Used to simulate authenticated API calls with valid or invalid tokens.
Dataset and Document Management System:
These tests interact with the system's datasets by adding documents and then attempting deletions, verifying the system's data integrity and access control.
ThreadPoolExecutor:
Used to test concurrent access patterns against the deletion API.
Usage and Running Tests
The tests are designed to be run using pytest.
Tests are marked with priority levels (p1, p2, p3) which can be used to selectively run tests.
Fixtures such as get_http_api_auth, add_documents_func, and add_dataset are expected to be defined elsewhere in the test suite to provide necessary setup.
Example command to run all tests:
pytest test_delete_documents.py
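Selecting tests by priority relies on pytest's marker mechanism. The source does not show how the p1, p2, and p3 markers are registered, so the following conftest.py fragment is only one plausible arrangement; the marker descriptions are made up for illustration.

```python
# conftest.py (sketch; the project's actual registration may differ)
PRIORITY_MARKERS = ("p1", "p2", "p3")


def pytest_configure(config):
    """Register the priority markers so pytest recognizes them."""
    for name in PRIORITY_MARKERS:
        config.addinivalue_line("markers", f"{name}: priority level {name}")
```

With the markers registered, a command such as pytest -m p1 test_delete_documents.py would run only the tests marked p1.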
Mermaid Class Diagram
classDiagram
class TestAuthorization {
+test_invalid_auth(auth, expected_code, expected_message)
}
class TestDocumentsDeletion {
+test_basic_scenarios(get_http_api_auth, add_documents_func, payload, expected_code, expected_message, remaining)
+test_invalid_dataset_id(get_http_api_auth, add_documents_func, dataset_id, expected_code, expected_message)
+test_delete_partial_invalid_id(get_http_api_auth, add_documents_func, payload)
+test_repeated_deletion(get_http_api_auth, add_documents_func)
+test_duplicate_deletion(get_http_api_auth, add_documents_func)
}
TestAuthorization ..> RAGFlowHttpApiAuth : uses
TestDocumentsDeletion ..> delete_documnets : calls
TestDocumentsDeletion ..> list_documnets : calls
%% Standalone test functions
class test_concurrent_deletion {
+test_concurrent_deletion(get_http_api_auth, add_dataset, tmp_path)
}
class test_delete_1k {
+test_delete_1k(get_http_api_auth, add_dataset, tmp_path)
}
test_concurrent_deletion ..> ThreadPoolExecutor : uses
test_concurrent_deletion ..> delete_documnets : calls
test_delete_1k ..> bulk_upload_documents : calls
test_delete_1k ..> delete_documnets : calls
Summary
test_delete_documents.py is a critical component of the InfiniFlow testing infrastructure, ensuring that document deletion operations are secure, reliable, and performant. It validates error handling, concurrency, and edge cases, thereby helping maintain the integrity of the document management subsystem.