test_update_document.py
Overview
test_update_document.py is a test suite module designed to validate the behavior and robustness of document update functionalities within the InfiniFlow system, specifically focusing on the ragflow_sdk's Document update operations. The tests ensure that document metadata, naming conventions, chunking methods, and parser configurations are correctly handled, enforcing validation rules and constraints.
The file leverages the pytest framework for parameterized testing, covering a wide range of input scenarios including normal cases, boundary values, invalid inputs, and skipped cases due to known issues. This suite helps maintain the integrity of document update operations, preventing invalid states and ensuring consistency across dataset documents.
Detailed Explanations
Imports
pytest: The testing framework used for organizing and running the test cases.
configs.DOCUMENT_NAME_LIMIT: A constant defining the maximum allowed length for a document name.
ragflow_sdk.DataSet: The SDK interface representing datasets and their contained documents.
Class: TestDocumentsUpdated
This class contains tests for document updates, covering the name, meta_fields, and chunk_method fields, as well as attempts to update fields that are invalid or cannot be changed.
Methods:
test_name(self, add_documents, name, expected_message)
Purpose:
Tests updating the name attribute of a document, verifying validation on length, type, file extension, and duplication.
Parameters:
add_documents: pytest fixture providing a tuple (dataset, documents), where documents is a list of document objects in the dataset.
name: The new name to assign to the document (varied through pytest.mark.parametrize).
expected_message: The expected error message substring if the update should fail; an empty string means the update is expected to succeed.
Behavior:
If an error is expected, asserts that the exception contains the expected message.
If successful, verifies that the document's name is updated in the dataset.
Examples:
# Valid update
test_name(add_documents, "new_name.txt", "")  # Should succeed
# Invalid update: changing extension
test_name(add_documents, "", "The extension of file can't be changed")  # Should raise error
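The rules that test_name exercises (type, length, extension, duplication) can be sketched as a single check. This is only an illustration, not the server's implementation: validate_new_name, the limit value of 128, the file name ragflow_test_upload_0.txt, and every message except "The extension of file can't be changed" are assumptions made for the sketch. Case handling is deliberately left out.

```python
import os

# Assumed value for illustration; the real limit comes from configs.DOCUMENT_NAME_LIMIT.
DOCUMENT_NAME_LIMIT = 128


def validate_new_name(old_name, new_name, existing_names, limit=DOCUMENT_NAME_LIMIT):
    """Hypothetical stand-in for the checks applied to a rename.

    Returns an error message, or "" when the new name is acceptable.
    """
    if not isinstance(new_name, str):
        return "name must be a string"  # placeholder wording
    if len(new_name) > limit:
        return "name is too long"  # placeholder wording
    # The extension must survive the rename; an empty name has no extension.
    if os.path.splitext(new_name)[1] != os.path.splitext(old_name)[1]:
        return "The extension of file can't be changed"
    if new_name != old_name and new_name in existing_names:
        return "duplicate name in dataset"  # placeholder wording
    return ""


# A table in the spirit of the parametrized cases (file names are illustrative).
existing = {"ragflow_test_upload_0.txt", "ragflow_test_upload_1.txt"}
cases = [
    ("new_name.txt", ""),
    ("", "The extension of file can't be changed"),
    ("ragflow_test_upload_0", "The extension of file can't be changed"),
    ("ragflow_test_upload_1.txt", "duplicate"),
    (0, "must be a string"),
]
for new_name, expected in cases:
    message = validate_new_name("ragflow_test_upload_0.txt", new_name, existing)
    if expected:
        assert expected in message
    else:
        assert message == ""
```

As in the test itself, an empty expected string means the update should succeed, and any other value is matched as a substring of the error.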
test_meta_fields(self, add_documents, meta_fields, expected_message)
Purpose:
Tests updating the meta_fields attribute, which must be a dictionary.
Parameters:
add_documents: pytest fixture (dataset, documents).
meta_fields: The metadata dictionary or invalid type to test.
expected_message: Expected error message substring.
Behavior:
Raises an exception if meta_fields is not a dictionary.
Updates meta_fields successfully if valid.
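The dictionary-only rule amounts to a type guard. A minimal sketch, assuming a hypothetical helper check_meta_fields and an assumed error wording (the source does not quote the real message):

```python
def check_meta_fields(meta_fields):
    """Hypothetical type guard mirroring the rule that meta_fields must be a dict."""
    if not isinstance(meta_fields, dict):
        raise TypeError("meta_fields must be a dictionary")  # assumed wording
    return dict(meta_fields)  # shallow copy so the caller's dict is not aliased


# Accepted inputs: any dictionary, including an empty one.
for value in ({"author": "alice"}, {}):
    assert check_meta_fields(value) == value

# Rejected inputs: anything that is not a dictionary.
for value in ("text", 1, ["a"], None):
    try:
        check_meta_fields(value)
    except TypeError as exc:
        assert "dictionary" in str(exc)
    else:
        raise AssertionError(f"{value!r} should have been rejected")
```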
test_chunk_method(self, add_documents, chunk_method, expected_message)
Purpose:
Tests updating the chunk_method attribute with allowed values and invalid inputs.
Parameters:
add_documents: pytest fixture (dataset, documents).
chunk_method: The chunking strategy string to set.
expected_message: Expected error message if invalid.
Valid chunk methods include:
"naive", "manual", "qa", "table", "paper", "book", "laws", "presentation", "picture", "one", "knowledge_graph", "email", "tag".
Behavior:
Raises error if chunk_method is invalid or unknown.
Otherwise, confirms update is reflected in dataset.
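The allowed values above behave like a fixed membership set. The sketch below assumes a hypothetical helper check_chunk_method; only the method names and the "doesn't exist" substring come from this document, and the rest of the message is made up.

```python
# The thirteen chunk methods listed above.
VALID_CHUNK_METHODS = frozenset({
    "naive", "manual", "qa", "table", "paper", "book", "laws",
    "presentation", "picture", "one", "knowledge_graph", "email", "tag",
})


def check_chunk_method(chunk_method):
    """Hypothetical membership check; returns the value when it is allowed."""
    # Testing the type first keeps unhashable inputs (e.g. lists) from raising TypeError.
    if not isinstance(chunk_method, str) or chunk_method not in VALID_CHUNK_METHODS:
        raise ValueError(f"chunk_method {chunk_method!r} doesn't exist")
    return chunk_method


assert check_chunk_method("naive") == "naive"

for bad in ("invalid_method", "", None, ["naive"]):
    try:
        check_chunk_method(bad)
    except ValueError as exc:
        assert "doesn't exist" in str(exc)
    else:
        raise AssertionError(f"{bad!r} should have been rejected")
```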
test_invalid_field(self, add_documents, payload, expected_message)
Purpose:
Tests that certain document fields are immutable or invalid for update, and that attempting to change them raises appropriate errors.
Parameters:
add_documents: pytest fixture.
payload: A dictionary with fields to update (usually invalid/immutable fields).
expected_message: Expected error substring on failure.
Fields tested include:
chunk_count, create_date, create_time, created_by, dataset_id, id, location, process_begin_at, process_duration, progress, progress_msg, run, size, source_type, thumbnail, token_count, type, update_date, update_time.
Behavior:
The test expects an exception containing the expected message when trying to update these fields.
Note:
Many test cases are skipped due to known issues (issues/6104).
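One way to picture the immutable-field rule is a guard that rejects any payload touching the listed fields. This is a sketch under assumptions: reject_immutable and its error wording are hypothetical, and only the field names are taken from the list above.

```python
# The nineteen fields listed above as not updatable.
IMMUTABLE_FIELDS = frozenset({
    "chunk_count", "create_date", "create_time", "created_by", "dataset_id",
    "id", "location", "process_begin_at", "process_duration", "progress",
    "progress_msg", "run", "size", "source_type", "thumbnail", "token_count",
    "type", "update_date", "update_time",
})


def reject_immutable(payload):
    """Hypothetical guard: raise if the payload touches any read-only field."""
    blocked = sorted(IMMUTABLE_FIELDS.intersection(payload))
    if blocked:
        # Placeholder wording; the real messages are not quoted in this document.
        raise ValueError(f"cannot update read-only field(s): {', '.join(blocked)}")
    return payload


assert reject_immutable({"name": "new_name.txt"}) == {"name": "new_name.txt"}

try:
    reject_immutable({"id": "abc", "progress": 1.0})
except ValueError as exc:
    assert "id, progress" in str(exc)
else:
    raise AssertionError("payload with immutable fields should be rejected")
```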
Class: TestUpdateDocumentParserConfig
This class tests updating the parser_config associated with a document, which configures how document content is parsed and chunked.
Methods:
test_parser_config(self, client, add_documents, chunk_method, parser_config, expected_message)
Purpose:
Validates the update of parser_config for a document along with the chunk_method, including boundary checks on numeric parameters and type checks.
Parameters:
client: The client instance used to create expected ParserConfig objects.
add_documents: pytest fixture providing (dataset, documents).
chunk_method: The chunking method string to assign.
parser_config: Dictionary with parser configuration parameters to apply.
expected_message: Expected error message substring if update should fail.
Behavior:
If invalid configuration is given, asserts that an exception with the expected message is raised.
If valid, asserts that all provided config parameters are correctly reflected in the document's parser config.
If parser_config is empty, checks that the default config is applied.
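Checking that every provided parameter is reflected in the stored config is a subset comparison that has to descend into nested dictionaries such as raptor. A minimal sketch, assuming plain dictionaries on both sides (the real test compares against SDK objects) and a hypothetical helper config_mismatches; the sample values are illustrative only.

```python
def config_mismatches(expected, actual, path=""):
    """Return the dotted keys where `actual` does not reflect `expected`.

    Keys that exist only in `actual` (for example server-side defaults) are ignored.
    """
    problems = []
    for key, want in expected.items():
        where = f"{path}.{key}" if path else key
        if key not in actual:
            problems.append(where)
        elif isinstance(want, dict) and isinstance(actual[key], dict):
            # Recurse so nested settings are compared key by key.
            problems.extend(config_mismatches(want, actual[key], where))
        elif actual[key] != want:
            problems.append(where)
    return problems


sent = {"chunk_token_num": 512, "raptor": {"use_raptor": True}}
stored = {"chunk_token_num": 512, "delimiter": "\n", "raptor": {"use_raptor": True}}
assert config_mismatches(sent, stored) == []

stored["raptor"]["use_raptor"] = False
assert config_mismatches(sent, stored) == ["raptor.use_raptor"]
```

Returning the list of mismatched keys, rather than a bare boolean, makes a failing assertion point at the offending parameter.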
Parameters validated include:
chunk_token_num (range 1 to 100,000,000)
layout_recognize (e.g. "DeepDOC", "Naive")
html4excel (Boolean)
delimiter (string)
task_page_size (range 1 to 100,000,000)
raptor (dict with use_raptor Boolean)
auto_keywords (range 0 to 32)
auto_questions (range 0 to 10)
topn_tags (range 0 to 10)
Note:
Many parameter edge cases are skipped due to a known issue (issues/6098).
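The ranges and types listed above can be written down as a small validation table. The sketch assumes the bounds are inclusive and uses a hypothetical helper parser_config_errors with made-up messages; it is not the validation code the server runs.

```python
# Integer parameters and their bounds as listed above (assumed inclusive).
INT_RANGES = {
    "chunk_token_num": (1, 100_000_000),
    "task_page_size": (1, 100_000_000),
    "auto_keywords": (0, 32),
    "auto_questions": (0, 10),
    "topn_tags": (0, 10),
}


def parser_config_errors(parser_config):
    """Hypothetical validator: return a list of problems, empty when the config is fine."""
    errors = []
    for key, (low, high) in INT_RANGES.items():
        if key not in parser_config:
            continue
        value = parser_config[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{key} must be an integer")
        elif not low <= value <= high:
            errors.append(f"{key} must be between {low} and {high}")
    if "html4excel" in parser_config and not isinstance(parser_config["html4excel"], bool):
        errors.append("html4excel must be a boolean")
    if "delimiter" in parser_config and not isinstance(parser_config["delimiter"], str):
        errors.append("delimiter must be a string")
    if "raptor" in parser_config:
        raptor = parser_config["raptor"]
        if not isinstance(raptor, dict) or not isinstance(raptor.get("use_raptor"), bool):
            errors.append("raptor.use_raptor must be a boolean")
    return errors


# Boundary values are accepted; one step outside is rejected.
assert parser_config_errors({"auto_keywords": 0}) == []
assert parser_config_errors({"auto_keywords": 32}) == []
assert parser_config_errors({"auto_keywords": 33}) == ["auto_keywords must be between 0 and 32"]
assert parser_config_errors({"chunk_token_num": 0}) != []
assert parser_config_errors({"raptor": {"use_raptor": True}}) == []
```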
Important Implementation Details and Algorithms
Parameterized Testing:
Each test method extensively uses pytest.mark.parametrize to run multiple data-driven test cases, ensuring broad coverage of input scenarios.
Exception Handling:
Tests that expect failure wrap the update call in pytest.raises(Exception) context managers and verify that the exception message contains the expected substring.
Case Sensitivity and Duplication Checks:
Document names are tested for duplication within the same dataset, including names that differ only in letter case (e.g., "ragflow_test_upload_1.txt" vs "RAGFLOW_TEST_UPLOAD_1.TXT").
Immutable Fields Enforcement:
Certain fields critical to document identity or system state (like id, progress, chunk_count) are explicitly prevented from being updated, enforcing data integrity.
Parser Configuration Validation:
The parser config update tests ensure that only valid keys and values (including nested dictionaries) are accepted and that default configurations apply correctly when none are provided.
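The shared shape of these tests (one table of cases, a failure branch that matches a message substring, a success branch that checks the stored value) can be sketched without pytest or a running server. FakeDocument, run_case, and the "read-only" message are hypothetical stand-ins; in the real suite pytest.mark.parametrize supplies the rows and pytest.raises plays the role of the try/except.

```python
class FakeDocument:
    """Hypothetical stand-in for an SDK document; it only knows about `name`."""

    def __init__(self, name):
        self.name = name

    def update(self, payload):
        if "id" in payload:
            raise Exception("id is read-only")  # placeholder wording
        self.name = payload.get("name", self.name)


def run_case(document, payload, expected_message):
    """Apply one table row: expect a matching failure, or a visible change."""
    if expected_message:
        try:
            document.update(payload)
        except Exception as exc:
            # The error must mention the expected substring.
            assert expected_message in str(exc), str(exc)
        else:
            raise AssertionError("update unexpectedly succeeded")
    else:
        document.update(payload)
        for key, value in payload.items():
            assert getattr(document, key) == value


# Each row is (payload, expected_message); "" means success is expected.
CASES = [
    ({"name": "new_name.txt"}, ""),
    ({"id": "other"}, "read-only"),
]
for payload, expected_message in CASES:
    run_case(FakeDocument("old_name.txt"), payload, expected_message)
```

Keeping the expected message in the table lets a single test body serve both positive and negative cases, which matches the structure described above.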
Interaction with Other System Components
ragflow_sdk.DataSet and Documents:
The tests manipulate and verify documents within a dataset, using methods such as document.update() and dataset.list_documents(). This implies that the file interacts directly with the SDK's document and dataset management API.
Configuration Constants:
The file imports DOCUMENT_NAME_LIMIT from a configs module, indicating dependency on system-wide configuration for validation.
Client Instance:
The TestUpdateDocumentParserConfig tests receive a client parameter, suggesting that the tests require an authenticated or configured client context to instantiate or validate parser configuration objects.
Issue Tracking:
Some test cases are skipped due to open issues (e.g., issues/6098, issues/6104), showing integration with the team's issue tracking and test management process.
Usage Examples
# Example: Updating a document's name successfully
# Inside a test, pytest injects the add_documents fixture as (dataset, documents)
dataset, documents = add_documents
doc = documents[0]
doc.update({"name": "updated_name.txt"})
updated_doc = dataset.list_documents(id=doc.id)[0]
assert updated_doc.name == "updated_name.txt"

# Example: Attempting to update with invalid chunk_method raises error
with pytest.raises(Exception) as excinfo:
    doc.update({"chunk_method": "invalid_method"})
assert "doesn't exist" in str(excinfo.value)
Visual Diagram
classDiagram
    class TestDocumentsUpdated {
        +test_name(name, expected_message)
        +test_meta_fields(meta_fields, expected_message)
        +test_chunk_method(chunk_method, expected_message)
        +test_invalid_field(payload, expected_message)
    }
    class TestUpdateDocumentParserConfig {
        +test_parser_config(chunk_method, parser_config, expected_message)
    }
    TestDocumentsUpdated <-- pytest
    TestUpdateDocumentParserConfig <-- pytest
Summary
This test module rigorously validates the update operations on document objects within datasets, focusing on:
Name validation (length, extension, uniqueness)
Meta fields type enforcement
Allowed chunking methods
Immutable fields protection
Complex parser configuration validation with nested keys and value ranges
It employs parameterized tests with both positive and negative cases and integrates with the broader InfiniFlow SDK ecosystem. The tests help ensure that document updates conform to expected business rules, maintaining dataset consistency and preventing invalid states.