test_delete_datasets.py
Overview
test_delete_datasets.py is a test suite designed to verify the functionality, robustness, and security of the dataset deletion feature in the InfiniFlow platform via the ragflow_sdk client. It uses the pytest framework for structuring and running the tests. This file primarily focuses on validating:
Authorization handling when deleting datasets with invalid credentials.
Capability of the system to handle large-scale deletions (e.g., 1000 datasets).
Correctness and edge cases of the dataset deletion API, including validation of UUID formats, handling of duplicate or invalid IDs, and concurrency scenarios.
The tests simulate real-world usage patterns, including concurrent deletion requests, partial invalid input, and error handling, ensuring the dataset deletion logic is robust and secure.
Detailed Breakdown
Imports and Dependencies
uuid: For UUID manipulation and validation.
concurrent.futures.ThreadPoolExecutor, as_completed: For concurrent execution of deletion calls.
pytest: Test framework for defining and running tests.
common.batch_create_datasets: Utility function to create multiple datasets for testing.
configs.HOST_ADDRESS, configs.INVALID_API_TOKEN: Configuration constants for the test environment.
ragflow_sdk.RAGFlow: The SDK client used to interact with the InfiniFlow backend.
Classes and Their Functionality
1. TestAuthorization
This class tests the authorization mechanism related to dataset deletion.
Methods
test_auth_invalid(self, invalid_auth, expected_message)
Parameters:
invalid_auth: An invalid API token or None.
expected_message: Expected error message string upon failure.
Functionality: Instantiates the client with invalid authentication and asserts that an exception with the expected message is raised when attempting to delete datasets.
Usage Example:
client = RAGFlow(None, HOST_ADDRESS)
with pytest.raises(Exception) as excinfo:
    client.delete_datasets()
assert str(excinfo.value) == "Authentication error: API key is invalid!"
2. TestCapability
This class tests the ability of the system to handle bulk and concurrent dataset deletions.
Methods
test_delete_dataset_1k(self, client)
Parameters:
client: pytest fixture providing an authenticated client.
Functionality: Creates 1000 datasets and deletes them all at once. Verifies that no datasets remain afterward.
Usage Example:
datasets = batch_create_datasets(client, 1000)
client.delete_datasets(ids=[d.id for d in datasets])
assert len(client.list_datasets()) == 0
test_concurrent_deletion(self, client)
Parameters:
client: pytest fixture providing an authenticated client.
Functionality: Creates 1000 datasets and deletes each dataset individually using concurrent threads (max 5 workers). Verifies that all requests complete and that no datasets remain.
Concurrency Handling: Uses ThreadPoolExecutor to issue delete requests in parallel, checking that concurrent deletions complete without interfering with one another.
Usage Example:
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(client.delete_datasets, ids=[dataset.id]) for dataset in datasets]
results = list(as_completed(futures))
assert len(results) == 1000
assert len(client.list_datasets()) == 0
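One detail of the concurrent pattern above is worth spelling out: as_completed yields every finished future, whether the call returned or raised, so counting the results confirms only that each request finished. The following is a minimal sketch of a stricter check that calls result() on each future so a failed deletion is re-raised. FakeClient and delete_concurrently are hypothetical names used for illustration; FakeClient is an in-memory stand-in, not part of ragflow_sdk, and exists only so the snippet runs without a server.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


class FakeClient:
    """Hypothetical in-memory stand-in for the SDK client (not ragflow_sdk)."""

    def __init__(self, ids):
        self._ids = set(ids)
        self._lock = threading.Lock()  # guards the shared set across threads

    def delete_datasets(self, ids):
        with self._lock:
            missing = [i for i in ids if i not in self._ids]
            if missing:
                raise Exception(f"unknown dataset ids: {missing}")
            self._ids.difference_update(ids)

    def list_datasets(self):
        with self._lock:
            return sorted(self._ids)


def delete_concurrently(client, dataset_ids, max_workers=5):
    """Delete each ID in its own request and return the number of successes."""
    succeeded = 0
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(client.delete_datasets, ids=[i]) for i in dataset_ids]
        for future in as_completed(futures):
            future.result()  # re-raises any exception from the worker thread
            succeeded += 1
    return succeeded


ids = [f"ds-{n}" for n in range(100)]
client = FakeClient(ids)
deleted = delete_concurrently(client, ids)
remaining = client.list_datasets()
```

With this shape, a single failed request surfaces as an exception in the test body instead of being counted as a completed future.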
3. TestDatasetsDelete
This class contains comprehensive tests around dataset deletion input validation, edge cases, and error scenarios.
Methods
test_ids(self, client, add_datasets_func, func, remaining)
Parameters:
client: Authenticated client fixture.
add_datasets_func: Fixture that adds multiple datasets.
func: Function that generates the payload for deletion (subset or all dataset IDs).
remaining: Expected number of datasets remaining after deletion.
Functionality: Deletes datasets based on the func input and verifies the number of datasets left.
Tested scenarios: Single dataset deletion and multiple dataset deletion.
Usage Example:
payload = {"ids": [dataset.id for dataset in add_datasets_func][:1]}  # delete one
client.delete_datasets(**payload)
assert len(client.list_datasets()) == 2
test_ids_empty(self, client)
Tests behavior when deletion is called with an empty ids list.
Expected: No datasets are deleted.
test_ids_none(self, client)
Tests behavior when ids is explicitly None.
Expected: All datasets are deleted.
test_id_not_uuid(self, client)
Tests an invalid UUID string format.
Expected: Exception raised with a message about invalid UUID1 format; no datasets deleted.
test_id_not_uuid1(self, client)
Tests a UUID that is validly formatted but is not a version 1 UUID.
Expected: Exception about invalid UUID1 format.
test_id_wrong_uuid(self, client)
Tests a UUID that the user lacks permission to delete.
Expected: Exception regarding lack of permission; datasets remain.
test_ids_partial_invalid(self, client, add_datasets_func, func)
Tests combinations where some IDs are valid and one is invalid.
Expected: Exception about permission; datasets remain unchanged.
test_ids_duplicate(self, client, add_datasets_func)
Tests deletion with duplicated IDs in the payload.
Expected: Exception about duplicate IDs; datasets remain.
test_repeated_delete(self, client, add_datasets_func)
Tests deleting already deleted datasets again.
Expected: Exception about lack of permission.
test_field_unsupported(self, client)
Tests passing unsupported keyword arguments in the deletion call.
Expected: Python TypeError about an unexpected argument.
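The expectations listed above can be summarised as a small in-memory model. This is a sketch of the behavior the tests describe, not the backend's implementation: the class DatasetStore, the helper error_of, the error message strings, and the order in which the checks run are all assumptions made for illustration.

```python
class DatasetStore:
    """Hypothetical in-memory model; not the backend or ragflow_sdk."""

    def __init__(self, ids):
        self.ids = set(ids)

    def delete_datasets(self, ids=None):
        if ids is None:
            # Mirrors test_ids_none: no ID filter means delete everything.
            self.ids.clear()
            return
        if len(set(ids)) != len(ids):
            # Mirrors test_ids_duplicate. The message text is assumed.
            raise Exception("duplicate ids")
        if not set(ids) <= self.ids:
            # Mirrors test_id_wrong_uuid, test_ids_partial_invalid and
            # test_repeated_delete. The message text is assumed.
            raise Exception("lacks permission")
        # All checks passed, so the request is applied as a whole.
        # An empty list reaches this point and removes nothing.
        self.ids -= set(ids)


def error_of(call, **kwargs):
    """Return (exception type name, message), or None if the call succeeds."""
    try:
        call(**kwargs)
    except Exception as exc:
        return type(exc).__name__, str(exc)
    return None


store = DatasetStore(["a", "b", "c"])

store.delete_datasets(ids=[])
count_after_empty = len(store.ids)

duplicate_error = error_of(store.delete_datasets, ids=["a", "a"])
partial_error = error_of(store.delete_datasets, ids=["a", "zzz"])
count_after_errors = len(store.ids)

store.delete_datasets(ids=["a"])
repeat_error = error_of(store.delete_datasets, ids=["a"])
count_after_single = len(store.ids)

# An unknown keyword never reaches the method body: Python raises TypeError.
unsupported_error = error_of(store.delete_datasets, unknown_field="x")

store.delete_datasets(ids=None)
count_after_none = len(store.ids)
```

The model validates the whole request before removing anything, which matches the documented expectation that datasets remain unchanged when any ID in the payload is rejected.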
Important Implementation Details
UUID Validation: The tests check for UUID1 format specifically, indicating that the backend likely requires UUID1-format identifiers for datasets.
Permission Handling: Several tests verify that users cannot delete datasets they do not have permission for, ensuring access control is enforced.
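Concurrency: Concurrent deletion tests check that the backend handles multiple simultaneous delete requests: every request completes and no datasets remain afterward. This is an outcome check that would expose lost or corrupted deletions, not a proof that race conditions are absent.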
Error Handling: The suite ensures meaningful error messages are presented for invalid inputs, enhancing debuggability.
Use of Fixtures: client, add_dataset_func, and add_datasets_func are pytest fixtures (presumed defined elsewhere) that set up authenticated clients and dataset states for testing.
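The UUID validation point above can be illustrated with Python's standard uuid module. This is a minimal sketch, assuming a check based on the parsed UUID's version field; the helper name is_uuid1 is hypothetical, and the source shows only that non-UUID1 IDs are rejected, not how the backend performs the check.

```python
import uuid


def is_uuid1(value):
    """Return True if value parses as a UUID whose version field is 1."""
    try:
        return uuid.UUID(value).version == 1
    except (ValueError, AttributeError, TypeError):
        # Malformed strings raise ValueError; non-string inputs raise
        # AttributeError or TypeError depending on their type.
        return False


valid_id = uuid.uuid1().hex    # time-based UUID, 32 hex characters
random_id = uuid.uuid4().hex   # well-formed UUID, but version 4
malformed_id = "not_uuid"      # not parseable as a UUID at all
```

Here random_id corresponds to the test_id_not_uuid1 case (valid format, wrong version) and malformed_id to the test_id_not_uuid case.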
Interaction With Other System Components
ragflow_sdk.RAGFlow client: This is the primary interface used to interact with the InfiniFlow backend API. The file tests the delete_datasets and list_datasets methods.
common.batch_create_datasets: Utility for creating test datasets in bulk.
Configuration (configs): Provides the host address and the invalid token used for negative tests.
pytest Fixtures: External fixtures provide prepared clients and dataset state. This implies integration with a larger test framework where datasets and clients are managed.
Usage Example Summary
# Example: Deleting a single dataset
datasets = batch_create_datasets(client, 3) # creates 3 datasets
dataset_id = datasets[0].id
client.delete_datasets(ids=[dataset_id])
remaining = client.list_datasets()
assert len(remaining) == 2
Mermaid Class Diagram
classDiagram
class TestAuthorization {
+test_auth_invalid(invalid_auth, expected_message)
}
class TestCapability {
+test_delete_dataset_1k(client)
+test_concurrent_deletion(client)
}
class TestDatasetsDelete {
+test_ids(client, add_datasets_func, func, remaining)
+test_ids_empty(client)
+test_ids_none(client)
+test_id_not_uuid(client)
+test_id_not_uuid1(client)
+test_id_wrong_uuid(client)
+test_ids_partial_invalid(client, add_datasets_func, func)
+test_ids_duplicate(client, add_datasets_func)
+test_repeated_delete(client, add_datasets_func)
+test_field_unsupported(client)
}
Summary
test_delete_datasets.py is a comprehensive test suite that ensures dataset deletion via the InfiniFlow SDK is secure, robust, and behaves correctly under various conditions, including invalid inputs, concurrency, and authorization failures. It plays a critical role in maintaining data integrity and access control in the InfiniFlow platform.