test_create_kb.py
Overview
This file contains automated test cases for verifying the functionality and robustness of the "create knowledge base (KB)" API endpoint within the InfiniFlow system. The tests focus on validating authorization mechanisms, capability under high load and concurrency, and the correctness of dataset naming conventions. The test suite leverages pytest as the testing framework and uses hypothesis for property-based testing to cover a wide range of input scenarios.
The primary function under test is create_kb, imported from the common module, which presumably sends a request to the backend to create a new knowledge base or dataset. Authentication is handled via RAGFlowWebApiAuth, and the tests use predefined constants like DATASET_NAME_LIMIT and INVALID_API_TOKEN from configuration files.
Detailed Explanation
Imports and Fixtures
ThreadPoolExecutor, as_completed: Used for testing concurrent requests.
pytest: Testing framework used for structuring and running test cases.
create_kb: The API call function to create a knowledge base.
DATASET_NAME_LIMIT, INVALID_API_TOKEN: Constants for validation and negative testing.
hypothesis (with given, example, settings): For parameterized and property-based testing.
RAGFlowWebApiAuth: Represents authentication tokens for API requests.
valid_names: Generator that produces valid dataset names for testing.
The tests use the clear_datasets fixture to ensure dataset state isolation by cleaning the datasets before each test class runs.
Classes and Their Tests
1. TestAuthorization
Tests the authorization behavior of the create_kb API.
Method:
test_auth_invalid
Parameters:
invalid_auth: An authentication object or None representing invalid credentials.
expected_code: The expected HTTP status code.
expected_message: The expected error message string.
Purpose: Validate that unauthorized or invalid API tokens are rejected by the API.
Usage: Runs with two cases - no auth and invalid token, expecting HTTP 401 Unauthorized responses.
Assertions: Checks that the response code and message match expected unauthorized error values.
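The two authorization cases can be sketched as a table of inputs and expected outputs. This is a minimal illustration, not the suite's actual code: FakeAuth, fake_create_kb, VALID_TOKEN, and the "Unauthorized" message text are hypothetical stand-ins for RAGFlowWebApiAuth, create_kb, and the backend's real response. In the real suite the cases would be supplied through pytest.mark.parametrize rather than a loop.

```python
from typing import Optional

# Hypothetical values; the real suite imports INVALID_API_TOKEN from its configs.
VALID_TOKEN = "valid-token"
INVALID_API_TOKEN = "invalid-token"


class FakeAuth:
    """Hypothetical stand-in for RAGFlowWebApiAuth: it only carries a token."""

    def __init__(self, token: str) -> None:
        self.token = token


def fake_create_kb(auth: Optional[FakeAuth], payload: dict) -> dict:
    """Stub for create_kb: rejects a missing or unknown token with code 401."""
    if auth is None or auth.token != VALID_TOKEN:
        return {"code": 401, "message": "Unauthorized"}  # assumed message text
    return {"code": 0, "message": "success"}


# One row per case: (invalid_auth, expected_code, expected_message).
AUTH_CASES = [
    (None, 401, "Unauthorized"),
    (FakeAuth(INVALID_API_TOKEN), 401, "Unauthorized"),
]


def check_auth_invalid(invalid_auth, expected_code, expected_message) -> None:
    """Assert that the stub rejects the given credentials as expected."""
    res = fake_create_kb(invalid_auth, {"name": "auth_test"})
    assert res["code"] == expected_code, res
    assert res["message"] == expected_message, res


for case in AUTH_CASES:
    check_auth_invalid(*case)
```

Keeping the cases in a table makes it cheap to add further negative cases, such as an expired token, without touching the assertion logic.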
2. TestCapability
Tests the system's capacity to handle a large number of dataset creations and concurrent requests.
Method:
test_create_kb_1k
Parameters:
WebApiAuth (valid authentication)
Purpose: Attempts to create 1000 datasets sequentially to test scalability.
Assertions: Confirms all API responses have code == 0 (success).
Method:
test_create_kb_concurrent
Parameters:
WebApiAuth
Purpose: Tests API behavior under concurrent creation requests (100 datasets, 5 threads).
Implementation: Uses ThreadPoolExecutor to submit concurrent create_kb calls.
Assertions: Confirms all concurrent responses succeed and the count matches the request count.
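The concurrent pattern described above can be sketched roughly as follows. fake_create_kb and create_concurrently are hypothetical names introduced for illustration, and the dataset name format is an assumption; only the use of ThreadPoolExecutor, as_completed, 100 requests, and 5 workers comes from the description.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_create_kb(auth, payload: dict) -> dict:
    """Stub for create_kb: always reports success (hypothetical)."""
    return {"code": 0, "message": "success", "name": payload["name"]}


def create_concurrently(auth, count: int = 100, max_workers: int = 5) -> list:
    """Submit `count` creation calls to a pool of `max_workers` threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(fake_create_kb, auth, {"name": f"dataset_{i}"})
            for i in range(count)
        ]
        # as_completed yields futures as they finish, not in submission order.
        return [future.result() for future in as_completed(futures)]


responses = create_concurrently(auth=None)

# Mirror the two assertions: every call succeeded and none was lost.
assert len(responses) == 100
assert all(r["code"] == 0 for r in responses)
```

Calling future.result() re-raises any exception thrown inside a worker, so a failed request surfaces as a test failure instead of being silently dropped.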
3. TestDatasetCreate
Tests validation logic related to dataset naming conventions.
Method:
test_name
Parameters:
WebApiAuth, name (generated using the valid_names hypothesis strategy)
Purpose: Checks that valid dataset names (including edge cases like 128-char names) succeed.
Testing approach: Property-based testing with up to 20 examples.
Assertions: Ensures response code signals success.
Method:
test_name_invalid
Parameters:
WebApiAuth, name (various invalid inputs), expected_message (expected error)
Purpose: Validates that invalid dataset names are rejected, such as empty strings, whitespace-only strings, overly long names, non-string types, and None.
Assertions: All invalid inputs produce error code 102 with appropriate error messages.
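The naming rules these cases exercise can be illustrated with a small validator. This is a sketch under stated assumptions, not the backend's implementation: validate_name is a hypothetical helper, the limit of 128 is inferred from the 128-character edge case mentioned earlier, length is counted in characters, and the error message strings are invented placeholders.

```python
DATASET_NAME_LIMIT = 128  # assumed value; the real constant lives in the configs


def validate_name(name) -> dict:
    """Return a create_kb-style response for a candidate dataset name."""
    if not isinstance(name, str):
        # Covers None and non-string types such as integers.
        return {"code": 102, "message": "Dataset name must be a string."}
    if name.strip() == "":
        # Covers both the empty string and whitespace-only names.
        return {"code": 102, "message": "Dataset name can't be empty."}
    if len(name) > DATASET_NAME_LIMIT:
        return {"code": 102, "message": "Dataset name is too long."}
    return {"code": 0, "message": "success"}


# Invalid inputs of the kinds listed above: all must yield code 102.
INVALID_NAMES = ["", " ", "a" * (DATASET_NAME_LIMIT + 1), 0, None]
for candidate in INVALID_NAMES:
    assert validate_name(candidate)["code"] == 102, candidate

# The boundary case: a name of exactly DATASET_NAME_LIMIT characters passes.
assert validate_name("a" * DATASET_NAME_LIMIT)["code"] == 0
```

Checking the type before the content matters here, because calling strip() on None or an integer would raise an exception instead of returning a clean error response.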
Method:
test_name_duplicated
Parameters:
WebApiAuth
Purpose: Tests that creating datasets with duplicated names is allowed (idempotency or case normalization assumed).
Assertions: Both creation calls succeed with code 0.
Method:
test_name_case_insensitive
Parameters:
WebApiAuth
Purpose: Verifies dataset names are treated case-insensitively.
Assertions: Creating datasets with uppercase and lowercase variants of the same name both succeed.
Important Implementation Details
The tests assume that the create_kb function returns a dictionary with at least the keys code and message.
The success condition is indicated by code == 0.
Unauthorized access returns HTTP 401 errors.
Dataset name validation is strict: names must be non-empty strings, with maximum length specified by DATASET_NAME_LIMIT.
Concurrency tests use a thread pool with a fixed max worker count of 5 to simulate realistic parallel access.
Property-based testing via hypothesis enhances test coverage by generating diverse valid names.
Interaction With Other Parts of the System
create_kb (common module): Core API interaction function tested here.
RAGFlowWebApiAuth (libs.auth): Authentication handler providing authorization tokens.
Configuration (configs): Provides constants like dataset name limits and invalid tokens for testing.
valid_names (utils.hypothesis_utils): Supplies a generator for valid dataset names.
Test Infrastructure: The clear_datasets fixture ensures that each test class starts from a clean state to avoid inter-test pollution.
The file focuses solely on testing the KB creation API; it does not implement any creation logic itself, but ensures the backend conforms to expected behavior under various conditions.
Usage Examples
The tests are run through the pytest CLI. For example, to run a single test class:
pytest test_create_kb.py -k TestAuthorization -v
This would run all tests in the TestAuthorization class with verbose output.
Mermaid Class Diagram
Below is a class diagram illustrating the test classes and their main methods:
classDiagram
class TestAuthorization {
+test_auth_invalid(invalid_auth, expected_code, expected_message)
}
class TestCapability {
+test_create_kb_1k(WebApiAuth)
+test_create_kb_concurrent(WebApiAuth)
}
class TestDatasetCreate {
+test_name(WebApiAuth, name)
+test_name_invalid(WebApiAuth, name, expected_message)
+test_name_duplicated(WebApiAuth)
+test_name_case_insensitive(WebApiAuth)
}
Summary
test_create_kb.py is a pytest suite focused on validating the creation of knowledge bases/datasets through an API.
Tests cover authorization, concurrency, capacity, and input validation.
Uses hypothesis for property-based testing of input names.
Ensures the API behaves correctly under normal, boundary, and erroneous conditions.
Interacts primarily with create_kb from common and authentication from libs.auth.
The file improves system reliability by detecting regressions or contract violations in the KB creation API early.