test_create_dataset.py
Overview
The test_create_dataset.py file is a comprehensive test suite for validating the dataset creation functionality of the InfiniFlow platform’s HTTP API. It primarily uses the pytest framework alongside property-based testing with hypothesis to verify correct behavior, boundary conditions, error handling, and concurrency aspects of the dataset creation endpoint.
This file covers various scenarios such as:
Authorization validation (valid/invalid tokens)
HTTP request validation (content type, payload format)
Dataset creation with valid and invalid dataset names
Handling of optional fields like avatar images, descriptions, embedding models, permissions, chunk methods, and parser configurations
Capacity and concurrency testing (creating many datasets sequentially and in parallel)
Validation of parser configuration bug fixes and default values
The tests ensure that the API responds correctly with appropriate status codes and messages, and that dataset objects are created or rejected as expected.
Detailed Explanation of Components
Imports
pytest: Testing framework.
hypothesis: For property-based testing (@given, @example, @settings).
ThreadPoolExecutor, as_completed: For concurrent test execution.
create_dataset: The function under test, which calls the API to create datasets.
DATASET_NAME_LIMIT, INVALID_API_TOKEN: Configuration constants.
RAGFlowHttpApiAuth: Authentication helper class.
Utility functions like encode_avatar, create_image_file, and valid_names for test data preparation.
Test Classes and Their Methods
All test classes are decorated with @pytest.mark.usefixtures("clear_datasets") to ensure a clean state before each test by clearing existing datasets.
1. TestAuthorization
Purpose
Tests related to authorization during dataset creation.
Methods
test_auth_invalid(invalid_auth, expected_code, expected_message)
Parametrized test checking API behavior when authorization is missing or invalid.
Parameters:
invalid_auth: None or an invalid RAGFlowHttpApiAuth object.
expected_code: Expected API response code (0 when the authorization header is missing, 109 for an invalid API key).
expected_message: Expected error message string.
Returns: None (asserts inside the test)
Usage Example:
test_auth_invalid(None, 0, "`Authorization` can't be empty")
test_auth_invalid(RAGFlowHttpApiAuth(INVALID_API_TOKEN), 109, "Authentication error: API key is invalid!")
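The two cases above are parametrization inputs rather than direct calls. The following is a minimal sketch of how such a case table can drive a single assertion body. It assumes a hypothetical in-memory stand-in, fake_create_dataset, in place of the real HTTP helper. The token strings and the stand-in's logic are illustrative only and simply mirror the expected codes and messages listed above.

```python
# Hypothetical stand-in for the real create_dataset helper. It mimics only
# the two authorization outcomes listed above and makes no HTTP call.
VALID_TOKEN = "valid-token"      # assumed placeholder, not a real API key
INVALID_TOKEN = "invalid-token"  # plays the role of INVALID_API_TOKEN


def fake_create_dataset(token, payload):
    if token is None:
        return {"code": 0, "message": "`Authorization` can't be empty"}
    if token != VALID_TOKEN:
        return {"code": 109, "message": "Authentication error: API key is invalid!"}
    return {"code": 0, "data": {"name": payload["name"]}}


# Each tuple is one case: (auth, expected_code, expected_message).
AUTH_CASES = [
    (None, 0, "`Authorization` can't be empty"),
    (INVALID_TOKEN, 109, "Authentication error: API key is invalid!"),
]


def check_auth_case(auth, expected_code, expected_message):
    res = fake_create_dataset(auth, {"name": "auth_test"})
    assert res["code"] == expected_code, res
    assert res["message"] == expected_message, res


# Run every case through the same assertion body.
for case in AUTH_CASES:
    check_auth_case(*case)
```

With pytest, the same table would typically be handed to @pytest.mark.parametrize so that each tuple is reported as a separate test.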
2. TestRquest (Note: Class name likely a typo for "TestRequest")
Purpose
Tests HTTP request validation such as content type and payload format.
Methods
test_content_type_bad(HttpApiAuth)
Tests that the API rejects unsupported content types.
test_payload_bad(HttpApiAuth, payload, expected_message)
Parametrized test to check malformed JSON and invalid payload types.
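To make the three rejection categories concrete, here is a rough sketch of the kind of checks these tests exercise. The function validate_request, its reason strings, and the sample payloads are all hypothetical. They are not taken from the server implementation, and the real API's messages may differ.

```python
import json


def validate_request(content_type, body):
    """Return (ok, reason) for a raw request. Hypothetical checks only."""
    if content_type != "application/json":
        return False, f"unsupported content type: {content_type}"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False, "malformed JSON"
    if not isinstance(payload, dict):
        return False, "payload must be a JSON object"
    return True, ""


# Each tuple is one case: (content_type, body, expected_ok).
REQUEST_CASES = [
    ("text/xml", '{"name": "ds"}', False),          # unsupported content type
    ("application/json", "a", False),               # not valid JSON
    ("application/json", '"a"', False),             # valid JSON, wrong payload type
    ("application/json", '{"name": "ds"}', True),   # well-formed request
]

for content_type, body, expected_ok in REQUEST_CASES:
    ok, reason = validate_request(content_type, body)
    assert ok is expected_ok, (content_type, body, reason)
```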
3. TestCapability
Purpose
Tests the scalability and concurrency capabilities of dataset creation.
Methods
test_create_dataset_1k(HttpApiAuth)
Creates 1000 datasets sequentially to test bulk creation capability.
test_create_dataset_concurrent(HttpApiAuth)
Creates 100 datasets concurrently using a thread pool to test thread safety and concurrency.
4. TestDatasetCreate
Purpose
Extensive tests validating dataset creation fields, including boundary cases and invalid inputs.
Key Fields Tested:
Name: Using valid names, invalid names (empty, too long, non-string), duplicate names, case-insensitivity.
Avatar: Valid base64 encoded images, invalid prefixes, exceeding length limits, null and unset avatars.
Description: Normal text, exceeding length, null and unset descriptions.
Embedding Model: Valid identifiers, invalid models, missing or malformed format.
Permission: Valid permissions ("me", "team"), invalid values, unset and null handling.
Chunk Method: Valid chunk methods (e.g., "naive", "book"), invalid values, unset and null handling.
Parser Config: Detailed testing of all supported parser config fields with valid and invalid values, including nested fields like raptor and graphrag.
Unsupported Fields: Ensures extra, unknown fields are rejected.
Example for Name Validation:
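# Generated names come from the valid_names() strategy.
@given(name=valid_names())
# Always include a 128-character name in addition to the generated ones.
@example("a" * 128)
def test_name(self, HttpApiAuth, name):
    res = create_dataset(HttpApiAuth, {"name": name})
    # Creation succeeds and the stored name matches the input exactly.
    assert res["code"] == 0
    assert res["data"]["name"] == name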
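The invalid-name categories listed above (empty, too long, non-string) can be pictured as boundary checks around the name limit. The sketch below is hypothetical: name_error and its reason strings are invented for illustration, and the value 128 for DATASET_NAME_LIMIT is an assumption inferred from the 128-character example above rather than read from the configuration module.

```python
DATASET_NAME_LIMIT = 128  # assumed value, inferred from the "a" * 128 example


def name_error(name):
    """Return a reason string for an invalid name, or None if it passes."""
    if not isinstance(name, str):
        return "name must be a string"
    if len(name) == 0:
        return "name must not be empty"
    if len(name) > DATASET_NAME_LIMIT:
        return "name exceeds the length limit"
    return None


# Boundary cases on either side of the assumed limit.
assert name_error("a" * DATASET_NAME_LIMIT) is None
assert name_error("a" * (DATASET_NAME_LIMIT + 1)) is not None
assert name_error("") is not None
assert name_error(123) is not None
```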
5. TestParserConfigBugFix
Purpose
Tests bug fixes and default values for the nested parser configuration fields, especially around the presence and defaulting of raptor and graphrag subfields.
Methods
Tests that missing raptor and graphrag fields are automatically added with default values.
Tests behavior when only one of these fields is present.
Tests behavior when both fields are present.
Tests with various chunk methods to ensure these defaults persist.
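The defaulting behavior these tests check can be sketched as a small merge step. Everything here is an assumption made for illustration: the helper with_parser_defaults, the default values for raptor and graphrag, and the chunk_token_num key used as sample input are not taken from the source, and the server's real defaults may differ.

```python
import copy

# Hypothetical defaults, assumed for illustration only.
DEFAULT_RAPTOR = {"use_raptor": False}
DEFAULT_GRAPHRAG = {"use_graphrag": False}


def with_parser_defaults(parser_config):
    """Return a copy of parser_config with missing nested fields filled in."""
    merged = copy.deepcopy(parser_config) if parser_config else {}
    # setdefault leaves caller-supplied values untouched.
    merged.setdefault("raptor", copy.deepcopy(DEFAULT_RAPTOR))
    merged.setdefault("graphrag", copy.deepcopy(DEFAULT_GRAPHRAG))
    return merged


# Both fields missing: both defaults are added, other keys are kept.
cfg = with_parser_defaults({"chunk_token_num": 512})
assert cfg["raptor"] == DEFAULT_RAPTOR
assert cfg["graphrag"] == DEFAULT_GRAPHRAG
assert cfg["chunk_token_num"] == 512

# Only raptor supplied: it is preserved and graphrag is defaulted.
cfg = with_parser_defaults({"raptor": {"use_raptor": True}})
assert cfg["raptor"] == {"use_raptor": True}
assert cfg["graphrag"] == DEFAULT_GRAPHRAG
```

Copying the defaults before inserting them keeps one dataset's configuration from sharing mutable state with another's.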
Important Implementation Details and Algorithms
Concurrency Testing: Uses ThreadPoolExecutor to simulate multiple dataset creations in parallel, verifying that all succeed without race conditions.
Property-Based Testing: Utilizes hypothesis to generate a variety of input names and configurations, improving test coverage beyond static examples.
Validation Logic Coverage: Tests cover detailed validation rules (e.g., length limits, type checks, format patterns) for dataset fields.
Default Value Enforcement: Tests verify that default settings are applied for unset or null optional fields, ensuring API robustness.
Error Message Verification: Each invalid input test asserts that the API returns precise and informative error messages, which is critical for client debugging.
Interaction with Other Parts of the System
create_dataset function: The central utility function in the common module responsible for making HTTP calls to the dataset creation API.
Authentication (libs.auth.RAGFlowHttpApiAuth): Handles API token management for authorized requests.
Configuration Constants (configs): Provide limits and invalid token constants used during tests.
Utilities (utils, utils.file_utils, utils.hypothesis_utils): For preparing test data (e.g., generating valid names, encoding avatars, creating image files).
Dataset Management: Tests depend on a fixture clear_datasets to reset the dataset state before each run, ensuring test isolation.
This test file ensures the dataset creation API conforms to expected interface contracts and handles edge cases gracefully, thereby supporting the reliability of the broader InfiniFlow platform.
Visual Diagram
The following Mermaid class diagram depicts the structure of test classes and their main methods in this file:
classDiagram
class TestAuthorization {
+test_auth_invalid(invalid_auth, expected_code, expected_message)
}
class TestRquest {
+test_content_type_bad(HttpApiAuth)
+test_payload_bad(HttpApiAuth, payload, expected_message)
}
class TestCapability {
+test_create_dataset_1k(HttpApiAuth)
+test_create_dataset_concurrent(HttpApiAuth)
}
class TestDatasetCreate {
+test_name(HttpApiAuth, name)
+test_name_invalid(HttpApiAuth, name, expected_message)
+test_name_duplicated(HttpApiAuth)
+test_name_case_insensitive(HttpApiAuth)
+test_avatar(HttpApiAuth, tmp_path)
+test_avatar_exceeds_limit_length(HttpApiAuth)
+test_avatar_invalid_prefix(HttpApiAuth, tmp_path, name, prefix, expected_message)
+test_avatar_unset(HttpApiAuth)
+test_avatar_none(HttpApiAuth)
+test_description(HttpApiAuth)
+test_description_exceeds_limit_length(HttpApiAuth)
+test_description_unset(HttpApiAuth)
+test_description_none(HttpApiAuth)
+test_embedding_model(HttpApiAuth, name, embedding_model)
+test_embedding_model_invalid(HttpApiAuth, name, embedding_model)
+test_embedding_model_format(HttpApiAuth, name, embedding_model)
+test_embedding_model_unset(HttpApiAuth)
+test_embedding_model_none(HttpApiAuth)
+test_permission(HttpApiAuth, name, permission)
+test_permission_invalid(HttpApiAuth, name, permission)
+test_permission_unset(HttpApiAuth)
+test_permission_none(HttpApiAuth)
+test_chunk_method(HttpApiAuth, name, chunk_method)
+test_chunk_method_invalid(HttpApiAuth, name, chunk_method)
+test_chunk_method_unset(HttpApiAuth)
+test_chunk_method_none(HttpApiAuth)
+test_parser_config(HttpApiAuth, name, parser_config)
+test_parser_config_invalid(HttpApiAuth, name, parser_config, expected_message)
+test_parser_config_empty(HttpApiAuth)
+test_parser_config_unset(HttpApiAuth)
+test_parser_config_none(HttpApiAuth)
+test_unsupported_field(HttpApiAuth, payload)
}
class TestParserConfigBugFix {
+test_parser_config_missing_raptor_and_graphrag(HttpApiAuth)
+test_parser_config_with_only_raptor(HttpApiAuth)
+test_parser_config_with_only_graphrag(HttpApiAuth)
+test_parser_config_with_both_fields(HttpApiAuth)
+test_parser_config_different_chunk_methods(HttpApiAuth, chunk_method)
}
Summary
test_create_dataset.py exercises the dataset creation API endpoint of the InfiniFlow system, with emphasis on validation, error handling, concurrent operation, and parser configuration defaults. It relies on authentication helpers, configuration constants, and utility functions to keep the tests meaningful and repeatable.