test_paser_documents.py

Overview

This file contains automated tests for the document parsing API of the InfiniFlow system. The tests cover authorization checks, starting a parse, stopping a parse, and handling of concurrent parse requests.

The tests ensure that documents can be parsed correctly, that invalid or unauthorized requests are handled properly, and that the system behaves as expected under concurrent parsing scenarios. The file uses the pytest framework for structuring tests and assertions.


Detailed Explanations

Imported Modules and Utilities


Functions

condition(_auth, _kb_id, _document_ids=None)


validate_document_parse_done(auth, _kb_id, _document_ids)


validate_document_parse_cancel(auth, _kb_id, _document_ids)


Classes and Tests

TestAuthorization


TestDocumentsParse


test_parse_100_files


test_concurrent_parse
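
The general shape of a concurrent parse check can be sketched as follows. This is a minimal sketch under stated assumptions, not the test's actual body: `fake_parse_documents` is a hypothetical stand-in for the real `parse_documents` API call, and `parse_concurrently`, the worker count, and the document IDs are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_parse_documents(auth, payload):
    """Hypothetical stand-in for the real API call; always reports success."""
    return {"code": 0, "doc_ids": payload["doc_ids"]}


def parse_concurrently(auth, document_ids, max_workers=5):
    """Submit one parse request per document and collect every response."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(fake_parse_documents, auth, {"doc_ids": [doc_id], "run": "1"})
            for doc_id in document_ids
        ]
        # as_completed yields futures in completion order, not submission order.
        return [future.result() for future in as_completed(futures)]


responses = parse_concurrently("fake-auth", [f"doc_{i}" for i in range(20)])
assert len(responses) == 20
assert all(response["code"] == 0 for response in responses)
```

Sending one request per document keeps each response attributable to a single document ID, which makes a failure easier to trace than a single batched call would.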


TestDocumentsParseStop


Important Implementation Details


Interaction with Other Parts of the System

This file is an integration test suite for the document parsing subsystem. It validates the API's correctness, error handling, and behavior under concurrent requests.


Usage Examples

Example of triggering a document parse and validating completion in a test:

kb_id, document_ids = add_documents_func  # fixture that adds documents
res = parse_documents(WebApiAuth, {"doc_ids": document_ids, "run": "1"})
assert res["code"] == 0

# Wait until parsing is done
condition(WebApiAuth, kb_id, document_ids)

# Validate parsing results
validate_document_parse_done(WebApiAuth, kb_id, document_ids)

Example of testing invalid authorization:

res = parse_documents(None)
assert res["code"] == 401
assert "<Unauthorized" in res["message"]

Mermaid Diagram: Class and Function Structure

classDiagram
    class TestAuthorization {
        +test_invalid_auth(invalid_auth, expected_code, expected_message)
    }
    class TestDocumentsParse {
        +test_basic_scenarios(WebApiAuth, add_documents_func, payload, expected_code, expected_message)
        +test_parse_partial_invalid_document_id(WebApiAuth, add_documents_func, payload)
        +test_repeated_parse(WebApiAuth, add_documents_func)
        +test_duplicate_parse(WebApiAuth, add_documents_func)
    }
    class TestDocumentsParseStop {
        +test_basic_scenarios(WebApiAuth, add_documents_func, payload, expected_code, expected_message)
        +test_stop_parse_partial_invalid_document_id(WebApiAuth, add_documents_func, payload)
    }
    class Functions {
        +condition(_auth, _kb_id, _document_ids)
        +validate_document_parse_done(auth, _kb_id, _document_ids)
        +validate_document_parse_cancel(auth, _kb_id, _document_ids)
        +test_parse_100_files(WebApiAuth, add_dataset_func, tmp_path)
        +test_concurrent_parse(WebApiAuth, add_dataset_func, tmp_path)
    }
    TestAuthorization ..> Functions : uses
    TestDocumentsParse ..> Functions : uses
    TestDocumentsParseStop ..> Functions : uses

Summary

test_paser_documents.py is a pytest-based test suite that validates document parsing operations in the InfiniFlow platform. It covers authorization, normal and edge-case parsing scenarios, stopping a parse, concurrent requests, and batch parsing of 100 files. Helper functions assert parsing state and progress, and concurrency utilities simulate parallel requests. Together, the tests check that the document parsing API meets its functional and authorization requirements.