test_stop_parse_documents.py
Overview
This file contains automated test cases for the "stop parse documents" functionality of the InfiniFlow system's document processing API. Its main purpose is to validate that the API endpoint responsible for stopping document parsing behaves correctly under various scenarios — including authorization failures, invalid inputs, partial successes, concurrency, and large-scale document operations.
The tests ensure the system maintains data integrity, enforces access controls, and properly updates document states when parse operations are interrupted. The file uses the pytest framework for test management and assertions, and relies on helper functions from the common test utilities and on the authentication wrapper for the system's HTTP API.
Detailed Explanation of Components
Helper Functions
validate_document_parse_done(auth, dataset_id, document_ids)
Purpose:
Validates that the specified documents have completed parsing successfully.
Parameters:
auth: Authentication object used to make API calls.
dataset_id (str): Identifier of the dataset containing the documents.
document_ids (list[str]): List of document IDs to validate.
Functionality:
For each document ID, it fetches the document metadata and asserts that:
The document's run status is "DONE".
The process_begin_at timestamp is populated.
The process_duration is greater than zero.
The progress is greater than zero.
The progress_msg contains the phrase "Task done".
Usage Example:
validate_document_parse_done(auth, "dataset123", ["doc1", "doc2"])
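The checks listed above could be sketched roughly as follows. This is a minimal illustration, not the file's actual code: fetch_document and the FAKE_DOCS table are hypothetical stand-ins for the metadata lookup, which the real helper performs through the API client, and the sample field values are invented.

```python
# Hypothetical in-memory stand-in for the backend's document metadata.
FAKE_DOCS = {
    "doc1": {
        "run": "DONE",
        "process_begin_at": "2024-01-01 00:00:00",
        "process_duration": 1.5,
        "progress": 1.0,
        "progress_msg": "Task done (1.50s)",
    },
}


def fetch_document(auth, dataset_id, document_id):
    """Hypothetical lookup; the real helper queries the HTTP API instead."""
    return FAKE_DOCS[document_id]


def validate_document_parse_done(auth, dataset_id, document_ids):
    """Assert that every listed document reports a completed parse."""
    for document_id in document_ids:
        doc = fetch_document(auth, dataset_id, document_id)
        assert doc["run"] == "DONE"
        assert doc["process_begin_at"]  # timestamp must be populated
        assert doc["process_duration"] > 0
        assert doc["progress"] > 0
        assert "Task done" in doc["progress_msg"]


# Passes silently because the fake record satisfies every check.
validate_document_parse_done(None, "dataset123", ["doc1"])
```

Because the helper relies on plain assert statements, a failing check surfaces in pytest as an ordinary assertion failure pointing at the offending field.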
validate_document_parse_cancel(auth, dataset_id, document_ids)
Purpose:
Validates that the specified documents have had their parsing operation canceled.
Parameters:
auth: Authentication object used to make API calls.
dataset_id (str): Identifier of the dataset containing the documents.
document_ids (list[str]): List of document IDs to validate.
Functionality:
For each document ID, it fetches the document metadata and asserts that:
The document's run status is "CANCEL".
The process_begin_at timestamp is populated.
The progress is exactly 0.0, indicating no progress.
Usage Example:
validate_document_parse_cancel(auth, "dataset123", ["doc3", "doc4"])
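A comparable sketch for the cancellation checks is shown below. As before, fetch_document and CANCELED_DOCS are hypothetical stand-ins for the API-backed lookup, and the sample timestamps are invented. The point of interest is that progress is compared for exact equality with 0.0 rather than against a threshold.

```python
# Hypothetical metadata for two documents whose parse was stopped.
CANCELED_DOCS = {
    "doc3": {"run": "CANCEL", "process_begin_at": "2024-01-01 00:00:00", "progress": 0.0},
    "doc4": {"run": "CANCEL", "process_begin_at": "2024-01-01 00:00:05", "progress": 0.0},
}


def fetch_document(auth, dataset_id, document_id):
    """Hypothetical lookup; the real helper queries the HTTP API instead."""
    return CANCELED_DOCS[document_id]


def validate_document_parse_cancel(auth, dataset_id, document_ids):
    """Assert that every listed document reports a canceled parse."""
    for document_id in document_ids:
        doc = fetch_document(auth, dataset_id, document_id)
        assert doc["run"] == "CANCEL"
        assert doc["process_begin_at"]  # parsing had started before the stop
        assert doc["progress"] == 0.0   # progress is reset, not merely low


validate_document_parse_cancel(None, "dataset123", ["doc3", "doc4"])
```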
Test Classes
class TestAuthorization
Purpose:
Tests that the stop_parse_documents API endpoint enforces proper authorization and returns appropriate error codes and messages for invalid or missing tokens.
Test Methods:
test_invalid_auth(auth, expected_code, expected_message)
Parameters:
auth: Either None or an invalid token wrapped in RAGFlowHttpApiAuth.
expected_code: Expected integer error code returned by the API.
expected_message: Expected error message string returned by the API.
Description:
Calls stop_parse_documnets with invalid or missing authorization and verifies the error code and message.
Usage Example:
Uses pytest.mark.parametrize to test multiple invalid authorization cases.
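The table-driven shape of such a test might look like the sketch below. Everything here is assumed for illustration: fake_stop_parse stands in for the real API client, the token is a plain string rather than a RAGFlowHttpApiAuth object, and the codes and messages are placeholders rather than the values the API actually returns. A plain loop replaces pytest.mark.parametrize so that the sketch runs without pytest installed.

```python
# Hypothetical cases: (auth, expected_code, expected_message).
# The codes and messages are placeholders, not the API's real values.
INVALID_AUTH_CASES = [
    (None, 401, "missing token"),
    ("invalid-token", 403, "invalid token"),
]

VALID_TOKEN = "secret"  # hypothetical token accepted by the fake client


def fake_stop_parse(auth, dataset_id, payload=None):
    """Hypothetical stand-in for the API client; returns a response dict."""
    if auth is None:
        return {"code": 401, "message": "missing token"}
    if auth != VALID_TOKEN:
        return {"code": 403, "message": "invalid token"}
    return {"code": 0, "message": ""}


def check_invalid_auth(auth, expected_code, expected_message):
    """Body of the test: one call, two assertions on the response."""
    res = fake_stop_parse(auth, "dataset_id")
    assert res["code"] == expected_code
    assert res["message"] == expected_message


# pytest.mark.parametrize would generate one test per tuple; the loop
# below exercises the same cases sequentially.
for case in INVALID_AUTH_CASES:
    check_invalid_auth(*case)
```

With pytest, the same tuples would be passed to the decorator, and each case would be reported as a separate test.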
class TestDocumentsParseStop
Purpose:
Contains comprehensive tests for the behavior of stopping document parsing, including validation of payloads, dataset IDs, repeated requests, partial invalid document lists, and duplicate document IDs.
Note:
The entire class is marked with @pytest.mark.skip, indicating these tests are currently skipped, possibly due to instability or development status.
Test Methods:
test_basic_scenarios(self, get_http_api_auth, add_documents_func, payload, expected_code, expected_message)
Tests various payloads including empty lists, invalid document IDs, malformed JSON, and valid document IDs. It uses a wait condition to ensure documents finish parsing and verifies that stopping parse updates document statuses correctly.
test_invalid_dataset_id(self, get_http_api_auth, add_documents_func, invalid_dataset_id, expected_code, expected_message)
Tests that stopping parse with invalid or empty dataset IDs produces appropriate errors.
test_stop_parse_partial_invalid_document_id(self, get_http_api_auth, add_documents_func, payload)
Tests stopping parse with a mixture of valid and invalid document IDs, ensuring the API responds with the correct failure and does not cancel valid documents.
test_repeated_stop_parse(self, get_http_api_auth, add_documents_func)
Tests behavior when attempting to stop parsing multiple times on the same documents, verifying correct error handling.
test_duplicate_stop_parse(self, get_http_api_auth, add_documents_func)
Tests stopping parse with duplicate document IDs included in the request, verifying success counts and error reporting for duplicates.
Individual Test Functions Outside Classes
test_stop_parse_100_files(get_http_api_auth, add_dataset_func, tmp_path)
Purpose:
Tests the stop parse functionality with a large batch of 100 documents, verifying scalability and correctness.
Details:
Uploads 100 documents, starts parsing, then stops parsing and validates the cancellation state.
Note:
Marked as skipped due to instability.
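Preparing a batch of this size usually starts with generating the input files. The sketch below shows one way to do that with the standard library only. create_text_files is a hypothetical helper, and tempfile.TemporaryDirectory stands in for pytest's tmp_path fixture; the upload, parse, and stop calls of the real test are not reproduced.

```python
import tempfile
from pathlib import Path


def create_text_files(directory, count):
    """Write `count` small text files and return their paths in order."""
    paths = []
    for i in range(count):
        path = Path(directory) / f"doc_{i:03d}.txt"
        path.write_text(f"sample content {i}\n", encoding="utf-8")
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp_dir:
    files = create_text_files(tmp_dir, 100)
    # Every file exists while the temporary directory is alive.
    assert len(files) == 100
    assert all(p.exists() for p in files)
```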
test_concurrent_parse(get_http_api_auth, add_dataset_func, tmp_path)
Purpose:
Tests stopping parsing concurrently on multiple documents to validate thread safety and correctness under parallel operations.
Details:
Uses a ThreadPoolExecutor to send multiple stop parse requests in parallel, then validates that all stops succeeded and documents are canceled.
Note:
Marked as skipped due to instability.
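The pattern of fanning out stop requests and then checking every response can be sketched as follows. FakeParseService is a hypothetical in-memory substitute for the backend, and its status values and response codes are assumptions made for the example; only the ThreadPoolExecutor usage mirrors what the test is described as doing.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeParseService:
    """Hypothetical thread-safe stand-in for the backend's parse state."""

    def __init__(self, document_ids):
        self._lock = threading.Lock()
        self.status = {doc_id: "RUNNING" for doc_id in document_ids}

    def stop_parse(self, doc_id):
        # The lock makes the check-and-update atomic across threads.
        with self._lock:
            if self.status.get(doc_id) != "RUNNING":
                return {"code": 1, "message": f"cannot stop {doc_id}"}
            self.status[doc_id] = "CANCEL"
            return {"code": 0, "message": ""}


document_ids = [f"doc{i}" for i in range(20)]
service = FakeParseService(document_ids)

# Submit one stop request per document and collect all responses.
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(service.stop_parse, d) for d in document_ids]
    responses = [f.result() for f in futures]

assert all(r["code"] == 0 for r in responses)
assert all(s == "CANCEL" for s in service.status.values())
```

Collecting results through f.result() also re-raises any exception thrown inside a worker, so a failing request cannot pass unnoticed.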
Important Implementation Details and Algorithms
Waiting for Document Parsing Completion:
The test_basic_scenarios test uses a decorator function wait_for to repeatedly poll document statuses until all documents finish parsing or a timeout occurs. This ensures tests only proceed once the system reaches a stable state.
Parameterized Testing:
Leveraging pytest.mark.parametrize extensively allows testing multiple input/output combinations, improving coverage and reducing boilerplate.
Concurrent Execution:
The concurrency test utilizes Python's ThreadPoolExecutor to simulate multiple simultaneous API calls, revealing potential race conditions or concurrency issues.
Validation Assertions:
The helper validation functions assert multiple document metadata fields, ensuring the document lifecycle states and progress indicators are consistent with expected behavior.
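A polling decorator of the kind described under "Waiting for Document Parsing Completion" might look like the sketch below. The parameter names (timeout, interval, error_msg) and the use of TimeoutError are assumptions for illustration; the actual wait_for in libs.utils may differ in signature and failure behavior.

```python
import functools
import time


def wait_for(timeout=10.0, interval=1.0, error_msg="Timeout"):
    """Retry the wrapped condition until it is truthy or time runs out."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + timeout
            while True:
                if func(*args, **kwargs):
                    return True
                if time.monotonic() >= deadline:
                    raise TimeoutError(error_msg)
                time.sleep(interval)

        return wrapper

    return decorator


# Hypothetical condition that reports success on the third poll.
polls = {"count": 0}


@wait_for(timeout=2.0, interval=0.01, error_msg="Document parsing timeout")
def all_documents_done():
    polls["count"] += 1
    return polls["count"] >= 3


assert all_documents_done() is True
```

In a real test, the condition would query document statuses through the API and return True only when every document reports a finished parse.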
Interaction With Other System Parts
API Utility Functions (common):
The file imports several helper API functions such as bulk_upload_documents, list_documnets (note the typo in "documents"), parse_documnets, and stop_parse_documnets. These functions act as clients to the backend HTTP API, enabling tests to simulate user actions like uploading, parsing, and stopping parse operations.
Authentication (libs.auth):
RAGFlowHttpApiAuth is used to represent API tokens for authorization headers in requests.
Utilities (libs.utils):
The wait_for utility is used to implement retry logic with timeout while waiting for asynchronous operations to complete.
Pytest Framework:
The test cases are structured using pytest decorators and assertions, integrating with the broader test suite for automation.
Visual Diagram
The following Mermaid class diagram summarizes the structure of the test classes and helper functions in this file.
classDiagram
class validate_document_parse_done {
+auth
+dataset_id: str
+document_ids: list[str]
}
class validate_document_parse_cancel {
+auth
+dataset_id: str
+document_ids: list[str]
}
class TestAuthorization {
+test_invalid_auth(auth, expected_code, expected_message)
}
class TestDocumentsParseStop {
+test_basic_scenarios(get_http_api_auth, add_documents_func, payload, expected_code, expected_message)
+test_invalid_dataset_id(get_http_api_auth, add_documents_func, invalid_dataset_id, expected_code, expected_message)
+test_stop_parse_partial_invalid_document_id(get_http_api_auth, add_documents_func, payload)
+test_repeated_stop_parse(get_http_api_auth, add_documents_func)
+test_duplicate_stop_parse(get_http_api_auth, add_documents_func)
}
class test_stop_parse_100_files {
+get_http_api_auth
+add_dataset_func
+tmp_path
}
class test_concurrent_parse {
+get_http_api_auth
+add_dataset_func
+tmp_path
}
TestAuthorization ..> stop_parse_documnets : calls
TestDocumentsParseStop ..> stop_parse_documnets : calls
test_stop_parse_100_files ..> stop_parse_documnets : calls
test_concurrent_parse ..> stop_parse_documnets : calls
validate_document_parse_done ..> list_documnets : calls
validate_document_parse_cancel ..> list_documnets : calls
Summary
The test_stop_parse_documents.py file is a comprehensive test suite focused on verifying the robustness, correctness, and security of the "stop parse documents" API feature in InfiniFlow. It covers edge cases, authorization, concurrency, and large-scale operations using well-structured pytest tests supported by utility functions for validation and synchronization. This testing ensures that document parsing stoppage behaves predictably and securely within the system.