test_parse_documents.py

Overview

test_parse_documents.py is a pytest test suite that validates the document parsing API of the InfiniFlow system. It exercises parsing of documents within datasets, with a focus on authorization, input validation, concurrency, and bulk operations, and checks that the API behaves correctly under normal conditions, edge cases, and error states.

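The tests cover invalid authorization, basic payload scenarios, invalid dataset IDs, partially invalid document IDs, repeated and duplicate parse requests, bulk parsing of 100 files, and concurrent parse requests.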

Utility functions wait for asynchronous parsing to complete, verify document status, and validate the details of the parsing results.


Detailed Explanations

Imported Modules and Utilities


Functions

condition(_auth, _dataset_id, _document_ids=None)

Wait condition function decorated with @wait_for(30, 1, "Document parsing timeout"). It polls the document list in a dataset and checks if all targeted documents have finished parsing ("run" == "DONE").


validate_document_details(auth, dataset_id, document_ids)

Verifies detailed parsing results for each document in document_ids through a series of assertions.


Test Classes and Methods

class TestAuthorization

Tests related to API authorization.


class TestDocumentsParse

Tests core document parsing functionality with various payloads and dataset conditions.


Standalone Test Functions
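The class diagram lists two module-level tests, test_parse_100_files and test_concurrent_parse. A concurrent test of this kind might be structured roughly as below. This is only a sketch under stated assumptions: `fake_parse_documents` is a hypothetical stub standing in for the real HTTP helper, and `parse_concurrently`, the worker count, and the document IDs are illustrative names and values, not taken from the suite.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_parse_documents(auth, dataset_id, payload):
    # Assumed stub: the real helper would send an HTTP request and
    # return the decoded JSON response.
    return {"code": 0, "document_ids": payload["document_ids"]}


def parse_concurrently(auth, dataset_id, document_ids, max_workers=5):
    """Submit one parse request per document and collect every response."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(
                fake_parse_documents,
                auth,
                dataset_id,
                {"document_ids": [doc_id]},
            )
            for doc_id in document_ids
        ]
        # as_completed yields futures in completion order, not submission order.
        return [future.result() for future in as_completed(futures)]


document_ids = [f"doc-{i}" for i in range(20)]
responses = parse_concurrently("auth", "dataset-1", document_ids)

# Every request should come back, and each should report success.
assert len(responses) == len(document_ids)
assert all(res["code"] == 0 for res in responses)
```

Because responses arrive in completion order, a test built this way would compare counts or sorted IDs rather than rely on positional order.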


Important Implementation Details


Interactions with Other System Components


Usage Examples

Example: Wait for parsing to complete and validate

# Wait until documents finish parsing
condition(auth, dataset_id, document_ids)

# Validate detailed parsing results
validate_document_details(auth, dataset_id, document_ids)

Example: Parsing documents with error handling

res = parse_documents(auth, dataset_id, {"document_ids": ["invalid_id"]})
if res["code"] != 0:
    print(f"Error: {res['message']}")

Mermaid Class Diagram

classDiagram
    class TestAuthorization {
        +test_invalid_auth(invalid_auth, expected_code, expected_message)
    }
    class TestDocumentsParse {
        +test_basic_scenarios(HttpApiAuth, add_documents_func, payload, expected_code, expected_message)
        +test_invalid_dataset_id(HttpApiAuth, add_documents_func, dataset_id, expected_code, expected_message)
        +test_parse_partial_invalid_document_id(HttpApiAuth, add_documents_func, payload)
        +test_repeated_parse(HttpApiAuth, add_documents_func)
        +test_duplicate_parse(HttpApiAuth, add_documents_func)
    }
    class Functions {
        +condition(_auth, _dataset_id, _document_ids=None)
        +validate_document_details(auth, dataset_id, document_ids)
        +test_parse_100_files(HttpApiAuth, add_dataset_func, tmp_path)
        +test_concurrent_parse(HttpApiAuth, add_dataset_func, tmp_path)
    }
    TestAuthorization --> Functions : uses
    TestDocumentsParse --> Functions : uses

Summary

test_parse_documents.py verifies the document parsing feature in InfiniFlow. It validates authorization, input correctness, concurrency, and bulk processing, combining direct assertions on API responses with polling for asynchronous parsing completion. Together, the tests check that the parsing API behaves as expected across normal scenarios, edge cases, and error states.