test_dataset.py
Overview
The test_dataset.py file contains a suite of automated test functions designed to validate the behavior and robustness of dataset management operations within the InfiniFlow system. It tests dataset creation, listing, deletion, updates, and handling of edge cases such as duplicated or invalid dataset names.
The tests primarily interact with dataset-related API functions imported from the common module:
create_dataset
list_dataset
rm_dataset
update_dataset
These tests ensure the dataset management layer behaves as expected under normal and boundary conditions, validating both success and failure scenarios.
Detailed Descriptions of Functions
1. test_dataset(get_auth)
Purpose:
Test basic dataset creation, listing, and deletion functionality.
Parameters:
get_auth: Authentication token or credentials required by the dataset API functions.
Workflow:
Creates a dataset named "test_create_dataset".
Lists all datasets page-by-page (each page contains up to 150 datasets).
Collects all dataset IDs.
Deletes all datasets found in the listing.
Assertions:
Creation response code is 0 (success).
Deletion response code is 0 for every dataset.
Usage example:
test_dataset(get_auth_token)
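The create, list, and delete workflow above can be sketched against an in-memory stand-in, so it runs without a server. The FakeDatasetAPI class, its response shape ({"code": ..., "data": [...]}), the non-zero code used for a missing dataset, and the list_all_dataset_ids helper are all assumptions made for illustration; the real functions live in the common module and take an auth argument.

```python
import itertools

PAGE_SIZE = 150  # page size stated in the workflow above


class FakeDatasetAPI:
    """Hypothetical in-memory stand-in for the dataset API in `common`."""

    def __init__(self):
        self._datasets = {}  # dataset id -> name, in creation order
        self._ids = itertools.count(1)

    def create_dataset(self, name):
        dataset_id = str(next(self._ids))
        self._datasets[dataset_id] = name
        return {"code": 0, "data": {"id": dataset_id}}

    def list_dataset(self, page_number):
        # Pages are 1-indexed; each page holds at most PAGE_SIZE entries.
        items = list(self._datasets.items())
        start = (page_number - 1) * PAGE_SIZE
        page = [{"id": i, "name": n} for i, n in items[start:start + PAGE_SIZE]]
        return {"code": 0, "data": page}

    def rm_dataset(self, dataset_id):
        if self._datasets.pop(dataset_id, None) is None:
            # Arbitrary non-zero code chosen for this sketch.
            return {"code": 1, "message": "dataset not found"}
        return {"code": 0}


def list_all_dataset_ids(api):
    """Collect IDs page by page; a page shorter than PAGE_SIZE ends the loop."""
    dataset_ids = []
    page_number = 1
    while True:
        res = api.list_dataset(page_number)
        assert res["code"] == 0
        page = res["data"]
        dataset_ids.extend(item["id"] for item in page)
        if len(page) < PAGE_SIZE:
            break
        page_number += 1
    return dataset_ids


api = FakeDatasetAPI()
assert api.create_dataset("test_create_dataset")["code"] == 0
ids = list_all_dataset_ids(api)
for dataset_id in ids:
    assert api.rm_dataset(dataset_id)["code"] == 0
```

When the total count is an exact multiple of the page size, the loop fetches one extra, empty page before stopping, which keeps the termination check simple.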
2. test_dataset_1k_dataset(get_auth)
Purpose:
Stress test dataset creation, listing, and deletion by creating 1000 datasets.
Parameters:
get_auth: Authentication credentials.
Workflow:
Creates 1000 datasets named "test_create_dataset_0" through "test_create_dataset_999".
Lists all datasets with pagination.
Deletes all datasets found.
Assertions:
Each creation returns success code 0.
All deletions return success code 0.
Usage example:
test_dataset_1k_dataset(get_auth_token)
3. test_duplicated_name_dataset(get_auth)
Purpose:
Test the system's handling of multiple datasets with the same base name.
Parameters:
get_auth: Authentication credentials.
Workflow:
Creates 20 datasets all named "test_create_dataset".
Lists first page of datasets.
Validates that listed datasets' names match the pattern ^test_create_dataset.*.
Deletes all matching datasets.
Assertions:
All creations successful (code == 0).
All listed datasets match the naming pattern.
All deletions successful.
Usage example:
test_duplicated_name_dataset(get_auth_token)
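The name check in this test can be sketched with Python's re module. The pattern is the one given above; the sample names and their suffix format are made up for illustration, since this document only says that listed names match the pattern.

```python
import re

NAME_PATTERN = re.compile(r"^test_create_dataset.*")


def all_names_match(names):
    """Return True if every name starts with the expected base name."""
    return all(NAME_PATTERN.match(name) is not None for name in names)


# Hypothetical listing: the "(n)" suffix format is an assumption.
listed = ["test_create_dataset", "test_create_dataset(1)", "test_create_dataset(2)"]
assert all_names_match(listed)
assert not all_names_match(listed + ["other_dataset"])
```

Because re.match anchors at the start of the string, the check passes for any name that begins with the base name, whatever suffix follows.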
4. test_invalid_name_dataset(get_auth)
Purpose:
Verify the system rejects invalid dataset names.
Parameters:
get_auth: Authentication credentials.
Tests performed:
Attempts to create a dataset with a non-string name (0).
Attempts to create a dataset with an empty string.
Attempts to create a dataset with a very long string exceeding DATASET_NAME_LIMIT.
Expected result:
All attempts fail with error code 102.
Usage example:
test_invalid_name_dataset(get_auth_token)
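The three rejected inputs can be sketched as follows. The numeric value of DATASET_NAME_LIMIT and the validate_name function are assumptions for illustration: the real limit is defined in the common module, and the real validation happens on the server. Only the error code 102 comes from the description above.

```python
import random
import string

DATASET_NAME_LIMIT = 128  # assumed value; the real limit is defined in `common`


def random_name_over_limit(limit=DATASET_NAME_LIMIT):
    """Append random alphanumeric characters until the name exceeds the limit."""
    alphabet = string.ascii_letters + string.digits
    name = ""
    while len(name) <= limit:
        name += random.choice(alphabet)
    return name


def validate_name(name, limit=DATASET_NAME_LIMIT):
    """Hypothetical validator mirroring the three rejected cases above."""
    if not isinstance(name, str) or name == "" or len(name) > limit:
        return {"code": 102, "message": "invalid dataset name"}
    return {"code": 0}


for bad_name in (0, "", random_name_over_limit()):
    assert validate_name(bad_name)["code"] == 102
assert validate_name("test_create_dataset")["code"] == 0
```

Whether a name of exactly the limit length is accepted is not stated in this document; the sketch treats only names longer than the limit as invalid.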
5. test_update_different_params_dataset_success(get_auth)
Purpose:
Test successful update of a dataset with various parameters.
Parameters:
get_auth: Authentication credentials.
Workflow:
Creates a dataset.
Lists datasets and selects the first dataset ID.
Sends an update request with multiple parameters:
kb_id: ID of the dataset.
name, description, permission, parser_id, language.
Deletes all datasets after test.
Assertions:
Creation, update, and deletion return success code 0.
Usage example:
test_update_different_params_dataset_success(get_auth_token)
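A request body with the fields listed above might be assembled like this. The build_update_request helper is hypothetical and every value is a placeholder; this document does not state which values the API accepts for permission, parser_id, or language.

```python
def build_update_request(dataset_id):
    """Build an update payload with the fields listed above.

    All values are placeholders chosen for illustration only.
    """
    return {
        "kb_id": dataset_id,
        "name": "test_update_dataset",
        "description": "updated by test",
        "permission": "me",
        "parser_id": "naive",
        "language": "English",
    }


json_req = build_update_request("dataset-id-from-listing")
assert json_req["kb_id"] == "dataset-id-from-listing"

# With the real API the test would then do something along these lines:
#   res = update_dataset(get_auth, json_req)
#   assert res.get("code") == 0
```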
6. test_update_different_params_dataset_fail(get_auth)
Purpose:
Test failure case when updating a dataset with invalid parameters.
Parameters:
get_auth: Authentication credentials.
Workflow:
Creates a dataset.
Lists datasets and selects the first dataset ID.
Attempts to update the dataset with invalid parameters (an id key with an invalid value).
Deletes all datasets after the test.
Assertions:
Update returns failure code 101.
Creation and deletion succeed.
Usage example:
test_update_different_params_dataset_fail(get_auth_token)
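The failure case can be illustrated with a stub in place of the server. This document does not say why the real API answers with 101; the fake_update_dataset function below assumes the id key is rejected as an unexpected field, and both the stub and the sample values are hypothetical.

```python
# Fields the success test sends; anything else is treated as unexpected here.
ALLOWED_KEYS = {"kb_id", "name", "description", "permission", "parser_id", "language"}


def fake_update_dataset(json_req):
    """Hypothetical server stub: rejects unexpected keys with code 101."""
    unexpected = set(json_req) - ALLOWED_KEYS
    if unexpected:
        return {"code": 101, "message": f"unexpected keys: {sorted(unexpected)}"}
    return {"code": 0}


bad_req = {"kb_id": "some-dataset-id", "id": "xxx"}  # placeholder values
res = fake_update_dataset(bad_req)
assert res["code"] == 101, res.get("message")
```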
Important Implementation Details
Pagination Handling:
The tests for listing datasets handle pagination by looping over page numbers and fetching datasets until fewer than 150 datasets are returned, indicating the last page.
Error Code Checking:
All API calls return a dictionary with a "code" key indicating success (0) or failure (non-zero). Tests assert expected codes to verify operation status.
Regular Expression Matching:
The test for duplicated dataset names uses regex to verify dataset names conform to a pattern.
Random String Generation:
For testing invalid names, random alphanumeric strings are generated until exceeding the allowed dataset name length limit.
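The error code convention lends itself to a small assertion helper. The assert_code function is a hypothetical convenience, not part of the test file, and it assumes that failed responses carry a "message" key alongside "code".

```python
def assert_code(res, expected=0):
    """Assert the response code, surfacing the server message on failure."""
    actual = res.get("code")
    assert actual == expected, (
        f"expected code {expected}, got {actual}: {res.get('message')}"
    )
    return res


# Success and expected-failure checks read the same way.
assert_code({"code": 0})
assert_code({"code": 102, "message": "invalid dataset name"}, expected=102)
```

Returning the response lets a caller chain the check with further inspection of the payload.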
Interaction with Other Parts of the System
This file depends on the common module for core dataset management functions: create_dataset, list_dataset, rm_dataset, update_dataset.
It assumes an authentication mechanism is provided via the get_auth parameter, which supplies credentials or tokens to API functions.
The dataset management API is expected to support pagination, dataset creation with name constraints, update operations, and error codes.
These tests would typically be run as part of a continuous integration process to ensure dataset API stability.
Visual Diagram: Structure of test_dataset.py
classDiagram
class test_dataset {
+test_dataset(get_auth)
+test_dataset_1k_dataset(get_auth)
+test_duplicated_name_dataset(get_auth)
+test_invalid_name_dataset(get_auth)
+test_update_different_params_dataset_success(get_auth)
+test_update_different_params_dataset_fail(get_auth)
}
%% Utility functions imported from common
class common {
+create_dataset(auth, name)
+list_dataset(auth, page_number)
+rm_dataset(auth, dataset_id)
+update_dataset(auth, json_req)
+DATASET_NAME_LIMIT
}
test_dataset ..> common : uses
Summary
test_dataset.py is a focused test suite file validating the core dataset management operations in the InfiniFlow platform. It systematically tests creation, listing, deletion, updates, and error handling, ensuring the dataset API behaves correctly under various scenarios. The file relies on the common module for actual API calls and requires authentication credentials injected through the get_auth parameter. Its thorough tests and assertions help maintain the stability and correctness of the dataset subsystem.