test_add_chunk.py
Overview
test_add_chunk.py is a test suite designed to rigorously validate the functionality, correctness, and robustness of the chunk addition feature in the InfiniFlow platform's document management subsystem. This file primarily contains unit and integration tests that verify adding chunks to documents under various scenarios, including authentication, input validation, concurrency, and edge cases like deleted documents.
The tests use the pytest framework and call the system's API through helper functions such as add_chunk, list_chunks, and delete_documnets (the misspelling is in the source). They authenticate with the HTTP API authentication class RAGFlowHttpApiAuth. The tests cover both expected successful operations and various failure modes, checking that the API returns the appropriate error codes and messages.
Detailed Explanation
Imported Modules and Dependencies
concurrent.futures.ThreadPoolExecutor: Used to run concurrent chunk additions in one test.
pytest: Testing framework for parameterized tests and test marking.
common: Provides constants (INVALID_API_TOKEN) and helper API functions (add_chunk, list_chunks, delete_documnets).
libs.auth.RAGFlowHttpApiAuth: Authentication handler for API requests.
Functions
validate_chunk_details(dataset_id, document_id, payload, res)
Purpose:
Helper function to assert that the chunk details returned by the API match the expected values provided in the input payload.
Parameters:
dataset_id (str): Identifier for the dataset to which the chunk belongs.
document_id (str): Identifier for the document to which the chunk belongs.
payload (dict): The input data used when adding the chunk. Expected keys include "content", optional "important_keywords", and optional "questions".
res (dict): The API response from adding the chunk, expected to contain chunk details under res["data"]["chunk"].
Behavior:
Asserts dataset_id and document_id are correctly echoed in the chunk response.
Validates the chunk's content matches payload content.
If present, validates important_keywords and questions fields, trimming strings and filtering empty entries for questions.
Usage Example:
validate_chunk_details("dataset123", "doc456", {"content": "text", "important_keywords": ["key1"]}, add_chunk_response)
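The behavior above can be sketched roughly as follows. This is a hypothetical re-creation, not the module's actual code: the key names inside the chunk, the exact normalization applied to questions, and the sample response are all assumptions made for illustration.

```python
def validate_chunk_details_sketch(dataset_id, document_id, payload, res):
    """Hypothetical re-creation of the helper; field names are assumed."""
    chunk = res["data"]["chunk"]
    # The IDs passed to the API should be echoed back in the chunk.
    assert chunk["dataset_id"] == dataset_id
    assert chunk["document_id"] == document_id
    assert chunk["content"] == payload["content"]
    if "important_keywords" in payload:
        assert chunk["important_keywords"] == payload["important_keywords"]
    if "questions" in payload:
        # Assumed normalization: strip whitespace, then drop empty entries.
        expected = [str(q).strip() for q in payload["questions"] if str(q).strip()]
        assert chunk["questions"] == expected


# Fabricated payload and response, used only to exercise the sketch.
sample_payload = {"content": "text", "questions": [" q1 ", ""]}
sample_res = {
    "code": 0,
    "data": {
        "chunk": {
            "dataset_id": "dataset123",
            "document_id": "doc456",
            "content": "text",
            "questions": ["q1"],
        }
    },
}
validate_chunk_details_sketch("dataset123", "doc456", sample_payload, sample_res)
```

The helper returns nothing; a mismatch in any field surfaces as an AssertionError inside the calling test.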
Classes and Tests
class TestAuthorization
Tests around authorization failures when adding chunks.
Uses pytest.mark.parametrize to test multiple scenarios:
No authorization header (expected error code 0 and message about missing Authorization).
Invalid API token (expected error code 109 and authentication error message).
Highlighted Test Method:
test_invalid_auth(auth, expected_code, expected_message)
Parameters:
auth: The authentication object to use for the API call; can be None or an invalid token.
expected_code (int): Expected error code from the API.
expected_message (str): Expected error message string.
Purpose:
Ensure that the system correctly rejects unauthorized or invalid authentication attempts.
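The table-driven shape of this test might look like the sketch below. The error codes 0 and 109 come from the description above; fake_add_chunk, the placeholder messages, and the "invalid-token" value are hypothetical stand-ins so the example runs without a server.

```python
# Hypothetical stand-in for the real API helper. It only mimics the two
# failure modes described above and makes no network calls.
def fake_add_chunk(auth, dataset_id, document_id, payload=None):
    if auth is None:
        return {"code": 0, "message": "<missing Authorization message>"}
    return {"code": 109, "message": "<authentication error message>"}


# Case table in the (auth, expected_code, expected_message) layout.
# The message strings are placeholders, not the API's real wording.
CASES = [
    (None, 0, "<missing Authorization message>"),
    ("invalid-token", 109, "<authentication error message>"),
]


def check_invalid_auth(auth, expected_code, expected_message):
    res = fake_add_chunk(auth, "dataset_id", "document_id")
    assert res["code"] == expected_code
    assert res["message"] == expected_message


for case in CASES:
    check_invalid_auth(*case)
```

Under pytest, the same table would typically be passed to pytest.mark.parametrize("auth, expected_code, expected_message", CASES) so that each row is reported as a separate test.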
class TestAddChunk
This class contains comprehensive tests for the add_chunk API with various inputs, validation rules, and scenarios.
Test Methods and Their Purposes:
test_content(get_http_api_auth, add_document, payload, expected_code, expected_message)
Validates the behavior when adding chunks with different types and values of content.
Checks for required content, type errors, and empty or whitespace-only content.
Verifies successful chunk additions increment chunk count.
test_important_keywords(get_http_api_auth, add_document, payload, expected_code, expected_message)
Checks that the important_keywords field, if provided, must be a list of strings.
Handles empty strings, duplicate keywords, and invalid types.
test_questions(get_http_api_auth, add_document, payload, expected_code, expected_message)
Similar to test_important_keywords, but for the questions field. Validates list type and string contents.
test_invalid_dataset_id(get_http_api_auth, add_document, dataset_id, expected_code, expected_message)
Tests API response when invalid or empty dataset IDs are used, including ownership checks.
test_invalid_document_id(get_http_api_auth, add_document, document_id, expected_code, expected_message)
Similar to dataset ID tests but focuses on document ID validation and ownership.
test_repeated_add_chunk(get_http_api_auth, add_document)
Ensures that adding the same chunk content multiple times works correctly and increments chunk count each time.
test_add_chunk_to_deleted_document(get_http_api_auth, add_document)
Tests that adding a chunk to a document that has been deleted is rejected with an appropriate error.
test_concurrent_add_chunk(get_http_api_auth, add_document) (Skipped)
Tests concurrent chunk additions using a thread pool to ensure thread safety and correctness under concurrent load.
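The thread-pool pattern used by the concurrent test can be sketched as follows. FakeChunkStore is a hypothetical in-memory stand-in for the backend, and the worker count and chunk count are arbitrary values chosen for illustration; the real test calls the HTTP API instead.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeChunkStore:
    """In-memory stand-in for the backend (hypothetical, for illustration)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._chunks = []

    def add_chunk(self, content):
        # The lock keeps the append and the count consistent across threads.
        with self._lock:
            self._chunks.append(content)
        return {"code": 0}

    def chunk_count(self):
        with self._lock:
            return len(self._chunks)


store = FakeChunkStore()
before = store.chunk_count()
chunk_num = 50

# Submit all additions at once and wait for every response.
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = [
        executor.submit(store.add_chunk, f"chunk test {i}")
        for i in range(chunk_num)
    ]
    responses = [future.result() for future in futures]

assert all(r["code"] == 0 for r in responses)
assert store.chunk_count() == before + chunk_num
```

The two final assertions mirror what such a test is after: every request succeeds, and the count grows by exactly the number of chunks submitted.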
Parameters commonly used in tests
get_http_api_auth: Fixture providing valid authentication credentials.
add_document: Fixture that adds a document and returns (dataset_id, document_id).
payload: Dict with chunk data, e.g., {"content": "text", "important_keywords": [...], "questions": [...]}.
expected_code: Expected API response code (0 for success, others for errors).
expected_message: Expected error or success message from API.
Important Implementation Details and Algorithms
Input Validation:
Tests cover strict validation of content as a required string, and of important_keywords and questions as optional lists of strings only.
Ownership and Existence Checks:
The API is expected to verify that the dataset and document IDs exist and belong to the authenticated user.
Concurrency Handling:
The concurrent addition test is currently skipped. When enabled, it is meant to check that concurrent chunk additions all succeed and that the chunk count increases by exactly the number of chunks added.
Error Handling:
Tests verify that errors are meaningful, with both code and message returned, including HTTP-like error messages for missing or invalid resources.
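To make the input rules concrete, here is a minimal sketch of a validator that enforces them. It is an assumption about how such checks could be written, not the server's implementation, and the error strings are invented for the example.

```python
def validate_payload_sketch(payload):
    """Return an error string, or None if the payload passes (assumed rules)."""
    content = payload.get("content")
    if content is None:
        return "content is required"
    if not isinstance(content, str):
        return "content must be a string"
    if not content.strip():
        # Empty or whitespace-only content is treated as missing.
        return "content is required"
    for field in ("important_keywords", "questions"):
        if field in payload:
            value = payload[field]
            if not isinstance(value, list):
                return f"{field} must be a list"
            if not all(isinstance(item, str) for item in value):
                return f"{field} must contain only strings"
    return None


print(validate_payload_sketch({"content": "chunk test"}))
print(validate_payload_sketch({"content": "   "}))
print(validate_payload_sketch({"content": "a", "questions": "not a list"}))
```

Each parameterized case in the test classes pairs one such payload with the code and message the API is expected to return for it.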
Interactions with Other System Components
API Functions from common Module:
add_chunk(auth, dataset_id, document_id, payload): Adds a chunk to a document via the API.
list_chunks(auth, dataset_id, document_id): Retrieves the list of chunks for a document, including chunk count.
delete_documnets(auth, dataset_id, {"ids": [...]}): Deletes specified documents.
Authentication Handler:
Uses RAGFlowHttpApiAuth to manage API key-based authentication.
Document Management Subsystem:
This test file indirectly tests the document and chunk storage and validation logic by invoking API endpoints.
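How the three helpers combine in the deleted-document scenario can be illustrated with an in-memory fake. FakeApi, its simplified method signatures, the "chunk_count" key, and the placeholder error code and message are all hypothetical; the real helpers take an auth object and a dataset ID and talk to the HTTP API.

```python
PLACEHOLDER_ERROR_CODE = 1  # not the API's real code, which is not shown here


class FakeApi:
    """Hypothetical in-memory stand-in for the helpers described above."""

    def __init__(self):
        # Maps document_id to the list of chunk contents stored for it.
        self.documents = {"doc1": []}

    def add_chunk(self, document_id, payload):
        if document_id not in self.documents:
            return {
                "code": PLACEHOLDER_ERROR_CODE,
                "message": "<document not found message>",
            }
        self.documents[document_id].append(payload["content"])
        return {"code": 0}

    def list_chunks(self, document_id):
        count = len(self.documents[document_id])
        return {"code": 0, "data": {"chunk_count": count}}

    def delete_documents(self, ids):
        for doc_id in ids:
            self.documents.pop(doc_id, None)
        return {"code": 0}


api = FakeApi()
assert api.add_chunk("doc1", {"content": "chunk test"})["code"] == 0
assert api.list_chunks("doc1")["data"]["chunk_count"] == 1

# After deletion, a further add against the same document must be rejected.
api.delete_documents(["doc1"])
res = api.add_chunk("doc1", {"content": "chunk test"})
assert res["code"] != 0
```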
Usage Example of a Typical Test Case
def test_content_example(get_http_api_auth, add_document):
    dataset_id, document_id = add_document
    payload = {"content": "example chunk"}
    res = add_chunk(get_http_api_auth, dataset_id, document_id, payload)
    assert res["code"] == 0
    validate_chunk_details(dataset_id, document_id, payload, res)
Mermaid Class Diagram
classDiagram
class TestAuthorization {
+test_invalid_auth(auth, expected_code, expected_message)
}
class TestAddChunk {
+test_content(get_http_api_auth, add_document, payload, expected_code, expected_message)
+test_important_keywords(get_http_api_auth, add_document, payload, expected_code, expected_message)
+test_questions(get_http_api_auth, add_document, payload, expected_code, expected_message)
+test_invalid_dataset_id(get_http_api_auth, add_document, dataset_id, expected_code, expected_message)
+test_invalid_document_id(get_http_api_auth, add_document, document_id, expected_code, expected_message)
+test_repeated_add_chunk(get_http_api_auth, add_document)
+test_add_chunk_to_deleted_document(get_http_api_auth, add_document)
+test_concurrent_add_chunk(get_http_api_auth, add_document)
}
TestAuthorization --> validate_chunk_details : uses
TestAddChunk --> validate_chunk_details : uses
Summary
test_add_chunk.py is a comprehensive test module focused on validating the chunk addition API of the InfiniFlow platform. It ensures the API handles authentication, input validation, ownership verification, concurrency, and error reporting correctly. The modular test classes and parameterized tests allow extensive coverage of edge cases and expected behaviors, safeguarding the document chunking functionality.
Notes
There is a minor typo in the imported delete_documnets function name (likely meant to be delete_documents).
Some tests are marked with priority tags (p1, p2, p3), indicating their importance or test execution order.
The concurrent test is currently skipped due to a known issue (issues/6411).