test_create_dataset.py


Overview

test_create_dataset.py is an automated test suite for the dataset creation features of the InfiniFlow platform's SDK (ragflow_sdk). It exercises the create_dataset method of the RAGFlow client, checking that datasets are created correctly across the supported configurations and that invalid input is rejected.

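The tests cover authorization failures, capability checks (bulk creation in test_create_dataset_1k and concurrent creation), and validation of each creation parameter: name (valid, invalid, duplicated, case-insensitive), avatar, description, embedding model, permission, chunk method, and parser config. They also cover unsupported fields in the request payload.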

The use of pytest and hypothesis frameworks enables parameterized, randomized, and property-based testing to ensure dataset creation behaves as expected under diverse inputs.
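As a rough illustration of the property-based style mentioned above, the sketch below uses hypothesis to generate names and checks that a created dataset keeps the name it was given. FakeClient, NAME_LIMIT, and the valid_names strategy are hypothetical stand-ins written for this example. They are not the real RAGFlow client, its limits, or the suite's own strategies.

```python
from hypothesis import given, settings, strategies as st

NAME_LIMIT = 128  # assumed limit, for illustration only


class FakeClient:
    """Hypothetical in-memory stand-in for the RAGFlow client."""

    def __init__(self):
        self.datasets = {}

    def create_dataset(self, name):
        # Reject blank and over-long names, as a server might.
        if not name.strip():
            raise ValueError("name must not be empty")
        if len(name) > NAME_LIMIT:
            raise ValueError("name too long")
        self.datasets[name] = {"name": name}
        return self.datasets[name]


# Strategy for names the fake accepts: non-blank and within the limit.
valid_names = st.text(min_size=1, max_size=NAME_LIMIT).filter(lambda s: s.strip())


@settings(max_examples=50)
@given(name=valid_names)
def test_name_roundtrip(name):
    # Property: any accepted name is stored unchanged.
    client = FakeClient()
    dataset = client.create_dataset(name=name)
    assert dataset["name"] == name
```

A function decorated with given can be run by pytest or called directly with no arguments. In both cases hypothesis supplies the generated inputs.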


Key Components

1. Test Classes

All test classes use the pytest.mark.usefixtures("clear_datasets") decorator to ensure a clean state (datasets cleared) before each test execution.

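1.1. TestAuthorization

Checks that dataset creation with invalid credentials fails with the expected message (test_auth_invalid).

1.2. TestCapability

Checks bulk creation (test_create_dataset_1k) and concurrent creation (test_create_dataset_concurrent).

1.3. TestDatasetCreate

Checks each create_dataset parameter in turn: name, avatar, description, embedding model, permission, chunk method, and parser config, including the unset, None, and invalid cases, plus unsupported fields.

1.4. TestParserConfigBugFix

Regression tests for parser_config handling when the raptor and graphrag fields are both missing, when only one of them is present, when both are present, and across different chunk methods.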


Important Implementation Details & Algorithms


Interaction with Other System Components


Usage Examples

Example: Creating a dataset with a valid name

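# `client` is assumed to be a pytest fixture that yields an authenticated RAGFlow client.
def test_create_dataset_with_valid_name(client):
    dataset = client.create_dataset(name="valid_dataset_name")
    # The returned dataset should carry the requested name.
    assert dataset.name == "valid_dataset_name"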

Example: Expecting failure due to invalid avatar prefix

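import pytest

# create_image_file and encode_avatar are helpers assumed to come from the test suite's utilities.
def test_create_dataset_with_invalid_avatar_prefix(client, tmp_path):
    fn = create_image_file(tmp_path / "test.png")
    # Prepend a bogus prefix in place of a MIME prefix.
    invalid_avatar = "invalid_prefix" + encode_avatar(fn)
    with pytest.raises(Exception) as excinfo:
        client.create_dataset(name="test", avatar=invalid_avatar)
    assert "Missing MIME prefix" in str(excinfo.value)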
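
To make the "Missing MIME prefix" failure above more concrete, here is a small sketch of what an avatar string with and without a MIME prefix might look like. The data-URI rule, the check_avatar function, and the encode_bytes helper are hypothetical. The source does not state the server's actual validation rules, so this only illustrates the general idea.

```python
import base64
import re

# Hypothetical rule: an avatar must look like "data:<mime>;base64,<payload>".
_PREFIX = re.compile(r"^data:[\w.+-]+/[\w.+-]+;base64,")


def encode_bytes(raw: bytes) -> str:
    """Base64-encode raw image bytes into an ASCII string."""
    return base64.b64encode(raw).decode("ascii")


def check_avatar(avatar: str) -> str:
    """Return the avatar unchanged, or raise if the MIME prefix is absent."""
    if not _PREFIX.match(avatar):
        raise ValueError("Missing MIME prefix")
    return avatar


payload = encode_bytes(b"\x89PNG fake image bytes")

# Accepted: the payload carries a MIME prefix.
ok = check_avatar("data:image/png;base64," + payload)

# Rejected: an arbitrary string stands where the MIME prefix should be.
try:
    check_avatar("invalid_prefix" + payload)
except ValueError as exc:
    error_message = str(exc)
```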

File Structure Diagram

The file contains no classes with properties, only test classes with multiple test methods. The following Mermaid class diagram summarizes the test classes and their key test methods:

classDiagram
    class TestAuthorization {
        +test_auth_invalid(invalid_auth, expected_message)
    }
    class TestCapability {
        +test_create_dataset_1k(client)
        +test_create_dataset_concurrent(client)
    }
    class TestDatasetCreate {
        +test_name(client, name)
        +test_name_invalid(client, name, expected_message)
        +test_name_duplicated(client)
        +test_name_case_insensitive(client)
        +test_avatar(client, tmp_path)
        +test_avatar_exceeds_limit_length(client)
        +test_avatar_invalid_prefix(client, tmp_path, name, prefix, expected_message)
        +test_avatar_unset(client)
        +test_description(client)
        +test_description_exceeds_limit_length(client)
        +test_description_unset(client)
        +test_description_none(client)
        +test_embedding_model(client, name, embedding_model)
        +test_embedding_model_invalid(client, name, embedding_model)
        +test_embedding_model_format(client, name, embedding_model)
        +test_embedding_model_unset(client)
        +test_embedding_model_none(client)
        +test_permission(client, name, permission)
        +test_permission_invalid(client, name, permission)
        +test_permission_unset(client)
        +test_permission_none(client)
        +test_chunk_method(client, name, chunk_method)
        +test_chunk_method_invalid(client, name, chunk_method)
        +test_chunk_method_unset(client)
        +test_chunk_method_none(client)
        +test_parser_config(client, name, parser_config)
        +test_parser_config_invalid(client, name, parser_config, expected_message)
        +test_parser_config_empty(client)
        +test_parser_config_unset(client)
        +test_parser_config_none(client)
        +test_unsupported_field(client, payload)
    }
    class TestParserConfigBugFix {
        +test_parser_config_missing_raptor_and_graphrag(client)
        +test_parser_config_with_only_raptor(client)
        +test_parser_config_with_only_graphrag(client)
        +test_parser_config_with_both_fields(client)
        +test_parser_config_different_chunk_methods(client, chunk_method)
    }
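
The concurrent case listed under TestCapability can be pictured with the sketch below, which submits many create_dataset calls through a thread pool and collects the results. The FakeClient class, the dataset count, and the worker count are assumptions chosen for this example. The actual test may use different numbers and assertions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


class FakeClient:
    """Hypothetical thread-safe stand-in for the RAGFlow client."""

    def __init__(self):
        self._lock = threading.Lock()
        self._names = set()

    def create_dataset(self, name):
        # Guard shared state so concurrent callers cannot corrupt it.
        with self._lock:
            if name in self._names:
                raise ValueError("duplicate dataset name")
            self._names.add(name)
        return name


def create_concurrently(client, count, workers=5):
    """Create `count` datasets in parallel and return the created names."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [
            executor.submit(client.create_dataset, f"dataset_{i}")
            for i in range(count)
        ]
        # result() re-raises any exception from a worker thread.
        return [future.result() for future in as_completed(futures)]


results = create_concurrently(FakeClient(), count=100)
```

Because as_completed yields futures in completion order, the returned names are not guaranteed to be in submission order. A test in this style would typically assert on the count of successful results instead.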

Summary


This documentation should assist developers and QA engineers in understanding the coverage, intent, and extensibility of the dataset creation tests in the InfiniFlow system.