test_update_dataset.py


Overview

test_update_dataset.py is a comprehensive test suite designed to validate the functionality, robustness, and correctness of the dataset update API in the InfiniFlow system. This file uses the pytest framework along with property-based testing via hypothesis to ensure that the update_dataset API behaves as expected under a wide variety of input scenarios, including edge cases and invalid inputs.

The tests cover:

This file plays a critical role in maintaining the quality and reliability of the dataset update functionality by preventing regressions and ensuring adherence to expected input/output contracts.


Classes and Their Responsibilities

TestAuthorization

Tests related to authentication and authorization of the update dataset API.

TestRquest (likely a typo, should be TestRequest)

Tests related to the request format, content types, and payload validation.

TestCapability

Tests concurrency capability of the update API.

TestDatasetUpdate

Extensive tests covering the correctness and validation of dataset update operations, including all supported fields and their constraints.

Key test categories within this class:


Detailed Explanations of Key Tests and Usage Examples

1. TestAuthorization.test_auth_invalid

@pytest.mark.parametrize(
    "auth, expected_code, expected_message",
    [
        (None, 0, "`Authorization` can't be empty"),
        (RAGFlowHttpApiAuth(INVALID_API_TOKEN), 109, "Authentication error: API key is invalid!"),
    ],
    ids=["empty_auth", "invalid_api_token"],
)
def test_auth_invalid(self, auth, expected_code, expected_message):
    res = update_dataset(auth, "dataset_id")
    assert res["code"] == expected_code, res
    assert res["message"] == expected_message, res

2. TestDatasetUpdate.test_name

@given(name=valid_names())
@example("a" * 128)
@settings(max_examples=20, suppress_health_check=[HealthCheck.function_scoped_fixture])
def test_name(self, get_http_api_auth, add_dataset_func, name):
    dataset_id = add_dataset_func
    payload = {"name": name}
    res = update_dataset(get_http_api_auth, dataset_id, payload)
    assert res["code"] == 0, res

    res = list_datasets(get_http_api_auth)
    assert res["code"] == 0, res
    assert res["data"][0]["name"] == name, res

3. TestDatasetUpdate.test_avatar

def test_avatar(self, get_http_api_auth, add_dataset_func, tmp_path):
    dataset_id = add_dataset_func
    fn = create_image_file(tmp_path / "ragflow_test.png")
    payload = {
        "avatar": f"data:image/png;base64,{encode_avatar(fn)}",
    }
    res = update_dataset(get_http_api_auth, dataset_id, payload)
    assert res["code"] == 0, res

    res = list_datasets(get_http_api_auth)
    assert res["code"] == 0, res
    assert res["data"][0]["avatar"] == f"data:image/png;base64,{encode_avatar(fn)}", res

4. TestDatasetUpdate.test_parser_config_invalid


Important Implementation Details


Interaction with Other System Components

The test file ensures that backend dataset update functionality conforms to expected input validation and business rules before deployment, thus preventing faulty data or unauthorized access.


Visual Diagram

classDiagram
    class TestAuthorization {
        +test_auth_invalid(auth, expected_code, expected_message)
    }

    class TestRquest {
        +test_bad_content_type(get_http_api_auth, add_dataset_func)
        +test_payload_bad(get_http_api_auth, add_dataset_func, payload, expected_message)
        +test_payload_empty(get_http_api_auth, add_dataset_func)
        +test_payload_unset(get_http_api_auth, add_dataset_func)
    }

    class TestCapability {
        +test_update_dateset_concurrent(get_http_api_auth, add_dataset_func)
    }

    class TestDatasetUpdate {
        +test_dataset_id_not_uuid(get_http_api_auth)
        +test_dataset_id_not_uuid1(get_http_api_auth)
        +test_dataset_id_wrong_uuid(get_http_api_auth)
        +test_name(get_http_api_auth, add_dataset_func, name)
        +test_name_invalid(get_http_api_auth, add_dataset_func, name, expected_message)
        +test_name_duplicated(get_http_api_auth, add_datasets_func)
        +test_name_case_insensitive(get_http_api_auth, add_datasets_func)
        +test_avatar(get_http_api_auth, add_dataset_func, tmp_path)
        +test_avatar_exceeds_limit_length(get_http_api_auth, add_dataset_func)
        +test_avatar_invalid_prefix(get_http_api_auth, add_dataset_func, tmp_path, avatar_prefix, expected_message)
        +test_avatar_none(get_http_api_auth, add_dataset_func)
        +test_description(get_http_api_auth, add_dataset_func)
        +test_description_exceeds_limit_length(get_http_api_auth, add_dataset_func)
        +test_description_none(get_http_api_auth, add_dataset_func)
        +test_embedding_model(get_http_api_auth, add_dataset_func, embedding_model)
        +test_embedding_model_invalid(get_http_api_auth, add_dataset_func, name, embedding_model)
        +test_embedding_model_format(get_http_api_auth, add_dataset_func, name, embedding_model)
        +test_embedding_model_none(get_http_api_auth, add_dataset_func)
        +test_permission(get_http_api_auth, add_dataset_func, permission)
        +test_permission_invalid(get_http_api_auth, add_dataset_func, permission)
        +test_permission_none(get_http_api_auth, add_dataset_func)
        +test_chunk_method(get_http_api_auth, add_dataset_func, chunk_method)
        +test_chunk_method_invalid(get_http_api_auth, add_dataset_func, chunk_method)
        +test_chunk_method_none(get_http_api_auth, add_dataset_func)
        +test_pagerank(get_http_api_auth, add_dataset_func, pagerank)
        +test_pagerank_invalid(get_http_api_auth, add_dataset_func, pagerank, expected_message)
        +test_pagerank_none(get_http_api_auth, add_dataset_func)
        +test_parser_config(get_http_api_auth, add_dataset_func, parser_config)
        +test_parser_config_invalid(get_http_api_auth, add_dataset_func, parser_config, expected_message)
        +test_parser_config_empty(get_http_api_auth, add_dataset_func)
        +test_parser_config_none(get_http_api_auth, add_dataset_func)
        +test_parser_config_empty_with_chunk_method_change(get_http_api_auth, add_dataset_func)
        +test_parser_config_unset_with_chunk_method_change(get_http_api_auth, add_dataset_func)
        +test_parser_config_none_with_chunk_method_change(get_http_api_auth, add_dataset_func)
        +test_field_unsupported(get_http_api_auth, add_dataset_func, payload)
        +test_field_unset(get_http_api_auth, add_dataset_func)
    }

    TestAuthorization <|-- TestRquest : "shares auth validation"
    TestDatasetUpdate <-- TestCapability : "shares dataset update tests"

Summary

test_update_dataset.py is a critical quality assurance component for the InfiniFlow platform's dataset update API. It rigorously verifies input validation, authorization, concurrency handling, and property updates with a rich set of test cases covering nominal, boundary, and error conditions. The file relies on pytest fixtures, parameterization, and property-based testing to achieve thorough coverage. This test suite helps prevent regression and ensures that only valid and authorized updates are applied to datasets, maintaining data integrity and system reliability.