test_download_document.py
Overview
test_download_document.py is a pytest test suite for the document download feature of the InfiniFlow system. It covers authorization checks, downloads of different file types, error handling for invalid dataset and document IDs, repeated downloads of the same file, and concurrent downloads.
The tests call the helper functions download_document, upload_documents, and bulk_upload_documents to exercise the API and to check data integrity, error reporting, and behavior under concurrent load.
Detailed Breakdown
Imports and Dependencies
json: For parsing JSON responses.
concurrent.futures (ThreadPoolExecutor, as_completed): To perform concurrent downloads.
pytest: Test framework used for defining tests and parametrizing them.
common: Module containing helper functions: bulk_upload_documents, download_document, upload_documents.
configs: Contains configuration constants like INVALID_API_TOKEN.
libs.auth.RAGFlowHttpApiAuth: Custom authentication class to simulate API authorization.
requests.codes: HTTP status codes.
utils.compare_by_hash: Utility to verify file integrity by comparing hashes.
Classes and Functions
Class: TestAuthorization
Tests authorization behavior for the document download API.
Method: test_invalid_auth
Tests the system's response when authorization is missing or invalid.
Parameters:
invalid_auth: Either None (no auth) or an invalid API token wrapped in RAGFlowHttpApiAuth.
tmp_path: Temporary directory provided by pytest for saving test files.
expected_code: Expected error code in the JSON response.
expected_message: Expected error message string.
Behavior:
Calls download_document with invalid or missing auth.
Asserts the HTTP status is OK (indicating a valid HTTP transaction).
Loads the response JSON from the saved file.
Asserts that the returned error code and message match expected values.
Usage Example:
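    from common import download_document
    from libs.auth import RAGFlowHttpApiAuth

    # Build an auth object around a token the server will reject
    test_auth = RAGFlowHttpApiAuth("invalid_token")
    res = download_document(test_auth, "dataset_id", "document_id", "/tmp/test.txt")
    # The HTTP exchange itself succeeds; the error is reported in the saved JSON body
    assert res.status_code == 200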
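The step that follows the download, loading the saved body and checking its error fields, can be sketched as below. This is a minimal, self-contained illustration and not the suite's own code: the helper read_error_payload, the "code" and "message" key names, and the sample values are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def read_error_payload(path):
    """Load a saved response body and return its (code, message) pair.

    Assumes the body is a JSON object with "code" and "message" keys;
    both key names are illustrative.
    """
    with open(path, "r", encoding="utf-8") as f:
        body = json.load(f)
    return body["code"], body["message"]


# Simulate what a rejected download might leave on disk: the saved "file"
# is a JSON error document rather than the requested content.
# The values below are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    saved = Path(tmp) / "test.txt"
    saved.write_text(
        json.dumps({"code": 1, "message": "placeholder error message"}),
        encoding="utf-8",
    )

    code, message = read_error_payload(saved)
    assert code == 1
    assert message == "placeholder error message"
```

Keeping the parsing in one small helper means each parametrized case only has to state the expected pair.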
Function: test_file_type_validation
Parametrized test that validates downloading various file types after upload.
Parameters:
HttpApiAuth: Valid authentication fixture.
add_dataset: Fixture that provides a dataset ID.
generate_test_files: Fixture that generates test files of various types (docx, excel, ppt, image, pdf, txt, md, json, eml, html).
request: Pytest request object to access parametrization data.
Behavior:
Uploads a single file of the specified type.
Downloads the same file.
Compares the uploaded and downloaded files by hash to ensure integrity.
Usage Example:
    # Upload and download a PDF file, then compare hashes
    res = upload_documents(auth, dataset_id, [pdf_file])
    doc_id = res["data"][0]["id"]
    res = download_document(auth, dataset_id, doc_id, "/tmp/downloaded.pdf")
    assert compare_by_hash(pdf_file, "/tmp/downloaded.pdf")
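The source does not show how compare_by_hash is implemented. One plausible shape for such a helper is sketched below, assuming SHA-256 digests computed over chunked reads; the function names file_sha256 and files_match_by_hash are hypothetical and the real utility may use a different algorithm.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match_by_hash(path_a, path_b):
    """Return True when both files have the same SHA-256 digest."""
    return file_sha256(path_a) == file_sha256(path_b)


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original.pdf"
    downloaded = Path(tmp) / "downloaded.pdf"
    corrupted = Path(tmp) / "corrupted.pdf"
    original.write_bytes(b"%PDF-1.4 sample bytes")
    downloaded.write_bytes(b"%PDF-1.4 sample bytes")
    corrupted.write_bytes(b"%PDF-1.4 sample bytez")

    assert files_match_by_hash(original, downloaded)
    assert not files_match_by_hash(original, corrupted)
```

Reading in chunks keeps memory use flat for large files, which matters more for office documents and images than for the small text fixtures.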
Class: TestDocumentDownload
Tests for various edge cases and validation related to document downloads.
Method: test_invalid_document_id
Tests behavior when a non-existent or unauthorized document ID is used.
Parameters:
HttpApiAuth: Valid authentication.
add_documents: Fixture returning a tuple (dataset_id, document_ids).
tmp_path: Temporary directory.
document_id: Invalid document ID to test.
expected_code: Expected error code.
expected_message: Expected error message.
Behavior:
Attempts to download with invalid document ID.
Checks response JSON for correct error reporting.
Method: test_invalid_dataset_id
Tests behavior when an invalid or empty dataset ID is provided.
Parameters:
HttpApiAuth, add_documents, tmp_path, dataset_id, expected_code, expected_message.
Behavior:
Attempts download with invalid dataset ID.
Verifies response error codes/messages.
Method: test_same_file_repeat
Downloads the same file multiple times to ensure consistent output.
Parameters:
HttpApiAuth, add_documents, tmp_path, ragflow_tmp_dir.
Behavior:
Downloads the same document 5 times.
Validates that each downloaded file matches the originally uploaded file by hash.
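The repeat-and-compare loop can be sketched without a running server. In the illustration below, fake_download is a hypothetical stand-in for download_document that simply copies a local file, and the digest helper assumes SHA-256; neither is taken from the suite.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def fake_download(source, destination):
    """Hypothetical stand-in for download_document: copies a local file."""
    shutil.copyfile(source, destination)


def digest(path):
    """SHA-256 of a small file, read in one go."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def repeated_downloads_match(source, out_dir, repeats=5):
    """Fetch the same source `repeats` times; True if every copy matches it."""
    expected = digest(source)
    for i in range(repeats):
        target = Path(out_dir) / f"copy_{i}.txt"
        fake_download(source, target)
        if digest(target) != expected:
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    uploaded = Path(tmp) / "uploaded.txt"
    uploaded.write_text("sample document body", encoding="utf-8")
    assert repeated_downloads_match(uploaded, tmp, repeats=5)
```

Writing each copy to its own path, rather than overwriting one file, leaves every download available for inspection if a comparison fails.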
Function: test_concurrent_download
Tests the system’s ability to handle multiple simultaneous document downloads.
Parameters:
HttpApiAuth: Valid authentication.
add_dataset: Fixture for creating dataset.
tmp_path: Temporary directory.
Behavior:
Uses bulk_upload_documents to upload 20 documents.
Uses a thread pool to download all 20 documents concurrently with up to 5 worker threads.
Waits for all downloads to complete.
Checks that all downloaded files match their corresponding uploaded originals by hash.
Usage Example:
    document_ids = bulk_upload_documents(auth, dataset_id, 20, tmp_path)
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [
            executor.submit(download_document, auth, dataset_id, doc_id, tmp_path / f"file_{i}.txt")
            for i, doc_id in enumerate(document_ids)
        ]
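A runnable variant of the same pattern, including the wait for completion, might look like the sketch below. It assumes a hypothetical fake_download that writes a payload derived from the document ID instead of calling the API, so the focus stays on submitting work to the pool and collecting results with as_completed.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def fake_download(doc_id, destination):
    """Hypothetical stand-in for download_document: writes a payload derived from the ID."""
    Path(destination).write_text(f"content of {doc_id}", encoding="utf-8")
    return doc_id


def download_all(document_ids, out_dir, max_workers=5):
    """Download every document concurrently; return the set of IDs that finished."""
    finished = set()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(fake_download, doc_id, Path(out_dir) / f"file_{i}.txt")
            for i, doc_id in enumerate(document_ids)
        ]
        for future in as_completed(futures):
            # result() re-raises any exception raised in the worker thread
            finished.add(future.result())
    return finished


with tempfile.TemporaryDirectory() as tmp:
    ids = [f"doc_{i}" for i in range(20)]
    done = download_all(ids, tmp)
    assert done == set(ids)
    for i, doc_id in enumerate(ids):
        content = (Path(tmp) / f"file_{i}.txt").read_text(encoding="utf-8")
        assert content == f"content of {doc_id}"
```

Calling result() on every future matters: without it, an exception inside a worker would go unnoticed and the test could pass with missing files.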
Important Implementation Details and Algorithms
Hash Comparison for File Integrity:
The utility function compare_by_hash is used extensively to ensure that files downloaded are bitwise identical to the files uploaded, confirming data integrity.
Parametrized Testing:
Pytest's @pytest.mark.parametrize is used to run the same test logic against multiple input values (e.g., different file types, invalid auth scenarios) to maximize coverage with minimal redundancy.
Concurrent Execution:
The use of ThreadPoolExecutor in test_concurrent_download simulates real-world concurrent access patterns, ensuring the backend service can handle multiple parallel downloads reliably.
Error Handling Verification:
Tests for invalid IDs and authorization simulate potential user errors or malicious requests, ensuring the API responds with meaningful error codes and messages rather than crashing or returning misleading information.
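The idea behind parametrized error checks, one piece of test logic driven by a table of cases, can be shown without pytest so that it runs standalone. In this sketch the case values, the fake_lookup stand-in for the API, and the "code" and "message" key names are all placeholders rather than the suite's real data.

```python
# Each tuple is one case: (document_id, expected_code, expected_message).
# All values are placeholders, not the suite's real cases.
CASES = [
    ("", 100, "document id is empty"),
    ("missing_id", 102, "document not found"),
]


def fake_lookup(document_id):
    """Hypothetical stand-in for the API: returns an error payload for a bad ID."""
    if not document_id:
        return {"code": 100, "message": "document id is empty"}
    return {"code": 102, "message": "document not found"}


def run_cases(cases):
    """Run every case and return descriptions of the ones that failed."""
    failures = []
    for document_id, expected_code, expected_message in cases:
        payload = fake_lookup(document_id)
        actual = (payload["code"], payload["message"])
        if actual != (expected_code, expected_message):
            failures.append(f"{document_id!r}: got {payload}")
    return failures


assert run_cases(CASES) == []
```

With pytest, the same table would be passed to @pytest.mark.parametrize so that each tuple is collected and reported as its own test.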
Interaction with Other System Components
common Module:
Provides core functions such as download_document, upload_documents, and bulk_upload_documents which are API wrappers or helpers used to interact with the backend document storage and retrieval services.
libs.auth.RAGFlowHttpApiAuth:
Represents API authentication tokens, used to simulate valid and invalid authorization scenarios.
configs:
Supplies constants such as INVALID_API_TOKEN for testing authentication failures.
utils.compare_by_hash:
Ensures that files before upload and after download are identical, verifying the integrity of the upload-download pipeline.
Pytest Fixtures:
Fixtures like HttpApiAuth, add_dataset, add_documents, generate_test_files, and tmp_path are leveraged for setup and teardown, making tests isolated and repeatable.
Taken together, these tests check the document download API end to end: correct file contents, rejection of invalid credentials and IDs, and stable behavior under concurrent requests.
Diagram: Class and Function Structure
classDiagram
class TestAuthorization {
+test_invalid_auth(invalid_auth, tmp_path, expected_code, expected_message)
}
class TestDocumentDownload {
+test_invalid_document_id(HttpApiAuth, add_documents, tmp_path, document_id, expected_code, expected_message)
+test_invalid_dataset_id(HttpApiAuth, add_documents, tmp_path, dataset_id, expected_code, expected_message)
+test_same_file_repeat(HttpApiAuth, add_documents, tmp_path, ragflow_tmp_dir)
}
class test_file_type_validation {
+test_file_type_validation(HttpApiAuth, add_dataset, generate_test_files, request)
}
class test_concurrent_download {
+test_concurrent_download(HttpApiAuth, add_dataset, tmp_path)
}
%% Relationships
TestAuthorization --> common : uses download_document
TestDocumentDownload --> common : uses download_document
test_file_type_validation --> common : uses upload_documents, download_document
test_concurrent_download --> common : uses bulk_upload_documents, download_document
TestAuthorization --> libs.auth.RAGFlowHttpApiAuth
test_file_type_validation --> utils.compare_by_hash
TestDocumentDownload --> utils.compare_by_hash
test_concurrent_download --> utils.compare_by_hash
Summary
This file is a pytest test suite focused on verifying document download functionality.
Covers authorization, file type support, error handling, consistency of repeated downloads, and concurrent downloads.
Uses fixtures and parametrization for broad and reusable testing.
Relies on hash comparison to ensure file integrity.
Interacts with authentication, document upload/download APIs, and utility modules.
Checks that the backend reports errors clearly, rejects invalid credentials, and returns intact files under repeated and concurrent downloads.
This documentation should facilitate understanding, maintenance, and extension of the test suite for the InfiniFlow document download feature.