test_stop_parse_documents.py
Overview
This file contains automated tests for the stop_parse_documents API endpoint within the InfiniFlow system. The primary focus is to verify the behavior and robustness of stopping document parsing operations in various scenarios, including authorization validation, input validation, concurrency, and edge cases. It also ensures that document parsing states are correctly updated when parsing is stopped.
The tests are written using the pytest framework and leverage utility functions from shared modules for document operations (uploading, listing, parsing, and stopping parsing). Some tests are marked as skipped due to instability or incomplete implementation.
Detailed Explanation
Imported Modules and Utilities
concurrent.futures.ThreadPoolExecutor: Used for running concurrent stop parse requests in a thread pool.
time.sleep: Adds delays for asynchronous operations.
pytest: Testing framework for parameterized and marked test cases.
common: Custom module containing the utility functions bulk_upload_documents, list_documents, parse_documents, and stop_parse_documents.
configs: Contains constants such as INVALID_API_TOKEN.
libs.auth: Contains the RAGFlowHttpApiAuth class for API authentication.
utils: Contains the helper function wait_for, used to implement polling logic.
Functions
validate_document_parse_done(auth, dataset_id, document_ids)
Purpose:
Verify that each document in the given list has completed parsing successfully.
Parameters:
auth: Authorization object for API calls.
dataset_id (str): Identifier for the dataset containing the documents.
document_ids (list[str]): List of document IDs to validate.
Behavior:
Fetches each document using list_documents.
Asserts that the document's run status is "DONE".
Checks that process_begin_at is populated, process_duration is positive, progress is greater than zero, and progress_msg contains "Task done".
Usage Example:
validate_document_parse_done(auth, "dataset123", ["doc1", "doc2"])
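The checks listed above can be sketched as follows. This is a minimal, self-contained illustration rather than the suite's actual code: the stub list_documents, the params argument, and the response shape ({"data": {"docs": [...]}}) are assumptions made so the example runs without a server. Only the field names come from the description above.

```python
# Hypothetical in-memory stand-in for the API; the real suite calls the
# list_documents helper from the common module instead.
FAKE_DOCS = {
    "doc1": {
        "id": "doc1",
        "run": "DONE",
        "process_begin_at": "2024-01-01 00:00:00",
        "process_duration": 1.5,
        "progress": 1.0,
        "progress_msg": "Task done (1.5s)",
    },
}


def list_documents(auth, dataset_id, params):
    """Stub: return the document whose id is params["id"] (assumed response shape)."""
    doc = FAKE_DOCS[params["id"]]
    return {"code": 0, "data": {"docs": [doc]}}


def validate_document_parse_done(auth, dataset_id, document_ids):
    """Assert that every listed document finished parsing."""
    for document_id in document_ids:
        res = list_documents(auth, dataset_id, params={"id": document_id})
        doc = res["data"]["docs"][0]
        assert doc["run"] == "DONE"
        assert doc["process_begin_at"]  # populated, i.e. non-empty
        assert doc["process_duration"] > 0
        assert doc["progress"] > 0
        assert "Task done" in doc["progress_msg"]


validate_document_parse_done(auth=None, dataset_id="dataset123", document_ids=["doc1"])
```

Because the helper relies on plain assert statements, a document in any other state makes the calling test fail at the first unmet condition.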
validate_document_parse_cancel(auth, dataset_id, document_ids)
Purpose:
Confirm that each document's parsing was canceled correctly.
Parameters:
auth: Authorization object for API calls.
dataset_id (str): Dataset identifier.
document_ids (list[str]): List of document IDs to check.
Behavior:
Fetches each document.
Asserts that run status is "CANCEL".
Confirms that process_begin_at is populated.
Validates that progress is exactly 0.0, indicating no further processing.
Usage Example:
validate_document_parse_cancel(auth, "dataset123", ["doc3", "doc4"])
Classes
TestAuthorization
Tests related to authorization validation for the stop_parse_documents API.
test_invalid_auth(self, invalid_auth, expected_code, expected_message)
Parameters:
invalid_auth: An invalid or missing authorization token/object.
expected_code (int): Expected error code from the API response.
expected_message (str): Expected error message.
Behavior:
Calls stop_parse_documents with invalid authorization.
Asserts that the returned error code and message match expectations.
Test Cases:
No authorization provided (None).
Invalid API token (INVALID_API_TOKEN).
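A table of cases like the two above might look like the sketch below. Everything concrete in it is a placeholder: the token value, the error codes and messages, and the stub stop_parse_documents are invented so the example runs on its own, and the real suite hands such tuples to pytest.mark.parametrize instead of looping over them.

```python
# Placeholder value; the real constant is imported from configs.
INVALID_API_TOKEN = "invalid_key_123"


def stop_parse_documents(auth, dataset_id):
    """Stub endpoint wrapper that accepts exactly one token (hypothetical)."""
    if auth is None:
        return {"code": 401, "message": "missing authorization"}
    if auth != "valid-token":
        return {"code": 403, "message": "invalid API key"}
    return {"code": 0}


# (invalid_auth, expected_code, expected_message); all values are placeholders.
INVALID_AUTH_CASES = [
    (None, 401, "missing authorization"),
    (INVALID_API_TOKEN, 403, "invalid API key"),
]


def check_invalid_auth(invalid_auth, expected_code, expected_message):
    """Body of the test: call the endpoint and compare code and message."""
    res = stop_parse_documents(invalid_auth, "dataset_id")
    assert res["code"] == expected_code
    assert res["message"] == expected_message


# A plain loop stands in for pytest's parametrization here.
for case in INVALID_AUTH_CASES:
    check_invalid_auth(*case)
```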
TestDocumentsParseStop
Contains parameterized tests for stopping document parsing with various payloads and conditions.
Note: This class is currently marked with @pytest.mark.skip, so its tests are skipped during runs.
test_basic_scenarios(self, HttpApiAuth, add_documents_func, payload, expected_code, expected_message)
Parameters:
HttpApiAuth: Valid authorization fixture.
add_documents_func: Fixture returning (dataset_id, document_ids).
payload: Payload to send to the stop parse API; can be None, a dict, a string, or a callable returning a dict.
expected_code (int): Expected API response code.
expected_message (str): Expected message for error cases.
Behavior:
Adds documents to a dataset.
Starts parsing documents.
Calls stop_parse_documents with the given payload.
Validates API response code and message.
If stopping succeeded, checks that documents requested to stop are canceled, and others are done.
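The last step depends on knowing which documents were named in the request and which were left to finish. A minimal sketch of that bookkeeping follows, assuming the payload carries the IDs under a "document_ids" key and that a callable payload receives the full list of document IDs. Both helper names are hypothetical.

```python
def resolve_payload(payload, document_ids):
    """Return the request body, calling the payload first if it is callable."""
    return payload(document_ids) if callable(payload) else payload


def split_expected_states(payload, document_ids):
    """Return (ids expected to be canceled, ids expected to finish parsing)."""
    body = resolve_payload(payload, document_ids)
    to_cancel = body["document_ids"]
    cancel_set = set(to_cancel)
    to_finish = [doc_id for doc_id in document_ids if doc_id not in cancel_set]
    return to_cancel, to_finish


ids = ["doc1", "doc2", "doc3"]
canceled, done = split_expected_states(lambda r: {"document_ids": r[:1]}, ids)
print(canceled, done)  # ['doc1'] ['doc2', 'doc3']
```

The first list would then go to validate_document_parse_cancel and the second to validate_document_parse_done.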
Important Implementation Detail:
Uses a nested condition function decorated with wait_for to poll document parsing status with a timeout.
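The polling pattern can be illustrated with a stand-in decorator. The version below is an assumption about how such a helper could work, not the wait_for shipped in utils: its parameter names, defaults, and the choice to raise AssertionError on timeout are all hypothetical.

```python
import functools
import time


def wait_for(timeout=10, interval=1, error_msg="Timeout"):
    """Hypothetical polling decorator: retry until the function returns True."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + timeout
            while True:
                if func(*args, **kwargs) is True:
                    return True
                if time.monotonic() >= deadline:
                    raise AssertionError(error_msg)
                time.sleep(interval)
        return wrapper
    return decorator


# Simulated document states: the second poll reports completion.
_states = iter(["RUNNING", "DONE"])


@wait_for(timeout=2, interval=0.01, error_msg="Document parsing timeout")
def condition():
    return next(_states) == "DONE"


condition()
```

Wrapping the status check this way keeps the retry loop out of the test body, so the test reads as a single call followed by its assertions.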
test_invalid_dataset_id(self, HttpApiAuth, add_documents_func, invalid_dataset_id, expected_code, expected_message)
Parameters:
HttpApiAuth: Authorization.
add_documents_func: Fixture for dataset and documents.
invalid_dataset_id (str): Dataset ID expected to be invalid.
expected_code: Expected error code.
expected_message: Expected error message.
Behavior:
Attempts to stop parsing with an invalid dataset ID.
Asserts proper error handling.
test_stop_parse_partial_invalid_document_id(self, HttpApiAuth, add_documents_func, payload)
Behavior:
Tests stopping parsing when some document IDs are invalid.
Confirms that the API returns an error and no documents are partially stopped.
test_repeated_stop_parse(self, HttpApiAuth, add_documents_func)
Behavior:
Stops parsing documents successfully once.
Attempts to stop parsing those same documents again.
Expects an error indicating no documents can be stopped because they are either done or at zero progress.
test_duplicate_stop_parse(self, HttpApiAuth, add_documents_func)
Behavior:
Calls stop parsing with duplicate document IDs.
Ensures duplicates are handled gracefully, counting success only once and listing duplicates in errors.
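To make the expectation concrete, a test could work out beforehand how many unique IDs a payload contains and which ones are repeated. The helper below is a hypothetical illustration of that calculation only. It does not describe how the server deduplicates or which response fields carry the result.

```python
from collections import Counter


def split_duplicates(document_ids):
    """Return (unique ids in first-seen order, ids that appear more than once)."""
    counts = Counter(document_ids)
    unique = list(dict.fromkeys(document_ids))
    duplicates = [doc_id for doc_id in unique if counts[doc_id] > 1]
    return unique, duplicates


unique, duplicates = split_duplicates(["doc1", "doc2", "doc1", "doc3", "doc2"])
print(len(unique), duplicates)  # 3 ['doc1', 'doc2']
```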
Standalone Tests (Marked as skipped)
test_stop_parse_100_files(HttpApiAuth, add_dataset_func, tmp_path)
Uploads 100 documents.
Starts parsing all.
Calls stop parsing on all documents.
Validates cancellation.
test_concurrent_parse(HttpApiAuth, add_dataset_func, tmp_path)
Uploads 50 documents.
Starts parsing.
Stops parsing documents concurrently using 5 threads.
Verifies all stop requests succeeded and canceled states are valid.
Important Implementation Details and Algorithms
Polling for Completion:
The wait_for decorator implements polling to check the completion status of documents with a timeout and interval. This ensures tests wait for asynchronous parsing operations to finish before asserting.
Parameterized Testing:
Extensive use of pytest.mark.parametrize enables testing multiple input scenarios, including invalid inputs, partial successes, and edge cases, without duplicating code.
Concurrent Execution:
The concurrent stop parse test uses ThreadPoolExecutor to simulate multiple parallel client requests for stopping document parsing, validating thread safety and correctness under concurrency.
Error Handling Verification:
Tests verify that the API properly rejects invalid tokens, unauthorized dataset access, invalid document IDs, and repeated or duplicate stop requests.
Interaction with Other System Components
API Utilities (common module):
This test file depends on utility functions for interacting with the API endpoints: bulk_upload_documents, list_documents, parse_documents, and stop_parse_documents.
These are likely wrappers around HTTP API calls for dataset/document management.
Authentication (libs.auth):
Uses RAGFlowHttpApiAuth for generating authorization headers or tokens.
Configuration (configs):
Uses constants like INVALID_API_TOKEN to simulate error conditions.
Test Fixtures:
The tests depend on fixtures like HttpApiAuth, add_documents_func, and add_dataset_func to provide preconfigured authenticated sessions and datasets with uploaded documents.
Visual Diagram
The following Mermaid class diagram summarizes the structure of the test classes and utility functions used in this file:
classDiagram
    class TestAuthorization {
        +test_invalid_auth(invalid_auth, expected_code, expected_message)
    }
    class TestDocumentsParseStop {
        +test_basic_scenarios(HttpApiAuth, add_documents_func, payload, expected_code, expected_message)
        +test_invalid_dataset_id(HttpApiAuth, add_documents_func, invalid_dataset_id, expected_code, expected_message)
        +test_stop_parse_partial_invalid_document_id(HttpApiAuth, add_documents_func, payload)
        +test_repeated_stop_parse(HttpApiAuth, add_documents_func)
        +test_duplicate_stop_parse(HttpApiAuth, add_documents_func)
    }
    class UtilityFunctions {
        +validate_document_parse_done(auth, dataset_id, document_ids)
        +validate_document_parse_cancel(auth, dataset_id, document_ids)
    }
    class ConcurrentTests {
        +test_stop_parse_100_files(HttpApiAuth, add_dataset_func, tmp_path)
        +test_concurrent_parse(HttpApiAuth, add_dataset_func, tmp_path)
    }
    TestAuthorization --> UtilityFunctions
    TestDocumentsParseStop --> UtilityFunctions
    ConcurrentTests --> UtilityFunctions
Summary
test_stop_parse_documents.py is a comprehensive test suite focused on validating the "stop parsing document" functionality in the InfiniFlow system.
It covers authorization, input validation, normal and edge cases, concurrency, and error scenarios.
The file uses pytest with parameterization and fixtures to maintain modular, reusable, and scalable tests.
Validation helper functions ensure document parse states conform to expectations after stop operations.
Concurrency and large-scale tests, although skipped, indicate a focus on system robustness.
This file integrates closely with core document management APIs and authentication mechanisms.
This documentation should enable developers and testers to understand the purpose, design, and usage of the tests in this file, and how to extend or maintain them effectively.