test_upload_documents.py
Overview
test_upload_documents.py is a pytest suite that validates uploading documents to datasets through the HTTP API of RAGFlow, InfiniFlow's RAG engine. It covers authorization checks, file type validation, filename constraints, dataset ID verification, duplicate handling, large batch uploads, and concurrent uploads.
The suite uses shared helper functions and pytest fixtures to drive realistic upload scenarios against the upload endpoint. It checks both successful responses and the specific error codes and messages returned for invalid requests.
Detailed Breakdown
Imports and Dependencies
string: Used for string manipulation and character translation.
concurrent.futures.ThreadPoolExecutor: Runs upload operations concurrently.
pytest: Test framework used to structure the tests (classes, markers, parametrization, fixtures).
requests: Sends raw HTTP requests in some test scenarios.
From common: DOCUMENT_NAME_LIMIT, HOST_ADDRESS, INVALID_API_TOKEN, upload_documnets, and list_datasets. The helper name upload_documnets is misspelled in common itself, so the tests call it exactly as spelled.
From libs.auth: RAGFlowHttpApiAuth, a custom authentication class for API requests.
From libs.utils.file_utils: create_txt_file, a utility that generates text files for testing.
From requests_toolbelt: MultipartEncoder, which builds multipart/form-data bodies for HTTP requests.
Classes and Tests
Class: TestAuthorization
Purpose: Tests authorization scenarios for the document upload API.
Markers: @pytest.mark.p1 (priority 1); uses the clear_datasets fixture to reset state.
Tests:
test_invalid_auth(auth, expected_code, expected_message)
Tests uploading documents with missing or invalid authorization.
Parameters:
auth: An auth object, or None.
expected_code: Expected response code.
expected_message: Expected error message.
Behavior:
Calls upload_documnets with the given auth and asserts that the response contains the expected code and message.
Test Cases:
No authorization provided → code 0, message "`Authorization` can't be empty".
Invalid API token (INVALID_API_TOKEN) → code 109, message indicating an authentication error.
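The two cases above follow pytest's table-driven parametrize pattern: one row per scenario, one call per row. Below is a minimal standard-library sketch of that pattern. fake_upload and VALID_TOKEN are hypothetical stand-ins that only reproduce the two documented outcomes; they are not RAGFlow's server logic. The invalid-token check uses a substring match because the exact message wording is not given here.

```python
# Hypothetical stand-in for the server: NOT RAGFlow's implementation,
# just enough logic to reproduce the two documented outcomes.
VALID_TOKEN = "valid-token"  # hypothetical token value


def fake_upload(token):
    """Return a response dict shaped like the API's JSON body."""
    if token is None:
        return {"code": 0, "message": "`Authorization` can't be empty"}
    if token != VALID_TOKEN:
        return {"code": 109, "message": "Authentication error: API key is invalid!"}
    return {"code": 0, "message": "", "data": []}


# Each tuple mirrors one parametrize row: (auth, expected_code, expected_message).
CASES = [
    (None, 0, "`Authorization` can't be empty"),
    ("invalid-token", 109, "Authentication error"),
]


def run_cases(cases):
    """Run every row the way pytest.mark.parametrize would: one call per row."""
    results = []
    for auth, expected_code, expected_message in cases:
        res = fake_upload(auth)
        ok = res["code"] == expected_code and expected_message in res["message"]
        results.append(ok)
    return results


print(run_cases(CASES))  # [True, True]
```

In the real suite each row becomes a separate test item, so a failure in one case is reported without hiding the other.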
Class: TestDocumentsUpload
Purpose: Tests various scenarios of uploading documents to datasets.
Markers: A mix of priority markers (p1, p2, p3) indicating test criticality.
Tests:
test_valid_single_upload(get_http_api_auth, add_dataset_func, tmp_path)
Uploads a single valid text file. Asserts success code 0 and checks the dataset ID and file name in the response.
test_file_type_validation(get_http_api_auth, add_dataset_func, generate_test_files, request)
Parametrized over the supported file types: docx, excel, ppt, image, pdf, txt, md, json, eml, html. Uploads each type and asserts the upload succeeds.
test_unsupported_file_type(get_http_api_auth, add_dataset_func, tmp_path, file_type)
Parametrized over unsupported file types: exe, unknown. Expects code 500 and a message indicating the file type is not supported.
test_missing_file(get_http_api_auth, add_dataset_func)
Attempts an upload without any files. Expects code 101 and message "No file part!".
test_empty_file(get_http_api_auth, add_dataset_func, tmp_path)
Uploads an empty file. Asserts success, with the reported file size 0.
test_filename_empty(get_http_api_auth, add_dataset_func, tmp_path)
Uploads a file with an empty filename by sending a raw HTTP request. Expects code 101 and message "No file selected!".
test_filename_exceeds_max_length(get_http_api_auth, add_dataset_func, tmp_path)
Uploads a file whose base name is DOCUMENT_NAME_LIMIT - 3 characters long; with a four-character extension such as .txt, the full filename is one character over the limit. Expects code 101 and a message about the filename length limit.
test_invalid_dataset_id(get_http_api_auth, tmp_path)
Attempts an upload with an invalid dataset ID. Expects code 100 and a message indicating the dataset was not found.
test_duplicate_files(get_http_api_auth, add_dataset_func, tmp_path)
Uploads the same file twice in a single request. Asserts that both copies are accepted and that the duplicate's name gets an index suffix such as (1).
test_same_file_repeat(get_http_api_auth, add_dataset_func, tmp_path)
Uploads the same file 10 times in succession. Asserts that each upload succeeds and that later copies receive index suffixes to avoid name conflicts.
test_filename_special_characters(get_http_api_auth, add_dataset_func, tmp_path)
Uploads a file whose name contains special characters, with the characters that are unsafe in filenames replaced by underscores. Asserts the upload succeeds and the returned filename matches the sanitized name.
test_multiple_files(get_http_api_auth, add_dataset_func, tmp_path)
Uploads 20 files in one batch. Asserts success and verifies that the dataset's document count matches the number of uploaded files.
test_concurrent_upload(get_http_api_auth, add_dataset_func, tmp_path)
Uploads 20 files concurrently through a thread pool with 5 workers. Verifies that every upload succeeds and that the dataset's document count is correct afterwards.
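The concurrency pattern in test_concurrent_upload can be sketched with the standard library alone: 20 uploads submitted to a ThreadPoolExecutor with 5 workers, then a check that all succeeded and the document count is right. FakeDataset is a hypothetical in-memory stand-in for the server and the file names are illustrative; the real test calls upload_documnets against a live dataset instead.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeDataset:
    """Hypothetical in-memory dataset; the lock keeps appends safe across threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self.documents = []

    def upload(self, name):
        with self._lock:
            self.documents.append(name)
        return {"code": 0, "data": [{"name": name}]}


def concurrent_upload(dataset, count=20, workers=5):
    """Submit one upload per file to a thread pool and collect the responses."""
    names = [f"ragflow_test_{i}.txt" for i in range(count)]  # hypothetical file names
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(dataset.upload, name) for name in names]
    # Leaving the with-block waits for every submitted task to finish.
    return [future.result() for future in futures]


ds = FakeDataset()
responses = concurrent_upload(ds)
assert all(r["code"] == 0 for r in responses)
print(len(ds.documents))  # 20
```

Checking the final document count, not just the per-request codes, is what catches uploads that report success but are lost under contention.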
Important Implementation Details
The tests rely on shared helpers: upload_documnets (a wrapper around the document upload API), list_datasets (queries dataset state), and create_txt_file (generates test files).
Duplicate-name handling is tested by checking that an index such as (1) is appended to the names of repeated files.
File type validation is enforced server-side; unsupported types return code 500.
Authorization errors are checked explicitly for both missing and invalid tokens.
The concurrent-upload test checks that parallel requests all succeed and that none are missing from the dataset's document count.
Some tests bypass the upload helper and send raw HTTP requests with multipart/form-data bodies, which lets them construct malformed uploads such as a file with an empty filename.
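The duplicate-name expectations can be written as a small helper. This sketch assumes the index is inserted between the stem and the extension (ragflow_test(1).txt) and that numbering starts at 1 for the second copy; expected_names is a hypothetical helper, not part of common.

```python
from pathlib import Path


def expected_names(filename, uploads):
    """Names expected after uploading `filename` `uploads` times.

    The first copy keeps its name; copy i (i >= 1) gets "(i)" before the extension.
    """
    p = Path(filename)
    return [p.name if i == 0 else f"{p.stem}({i}){p.suffix}" for i in range(uploads)]


print(expected_names("ragflow_test.txt", 3))
# ['ragflow_test.txt', 'ragflow_test(1).txt', 'ragflow_test(2).txt']
```

The same helper covers test_duplicate_files (two copies in one request) and test_same_file_repeat (ten sequential uploads), since both expect the same numbering scheme.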
Interaction with Other Components
Common module: Provides constants and helpers such as upload_documnets and list_datasets.
libs.auth: Provides the authentication class used to attach valid or invalid API tokens to requests.
libs.utils.file_utils: Supplies file creation utilities for generating test files.
File upload API: The endpoint under test is defined by FILE_API_URL and served from HOST_ADDRESS.
Datasets: The tests interact with dataset entities, checking document counts and dataset IDs.
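Raw-request tests such as test_filename_empty build the endpoint URL from HOST_ADDRESS and FILE_API_URL and encode the body themselves (the suite uses MultipartEncoder from requests_toolbelt for this). The standard-library sketch below shows what a multipart/form-data body with an empty filename looks like on the wire, without sending it. The HOST_ADDRESS and FILE_API_URL values and the "file" field name are assumptions for illustration; the real values come from common and the server.

```python
import uuid

HOST_ADDRESS = "http://127.0.0.1:9380"  # hypothetical; the real value comes from common
FILE_API_URL = "/api/v1/datasets/{dataset_id}/documents"  # hypothetical URL template


def build_upload_url(dataset_id):
    """Join the host and the endpoint template for one dataset."""
    return HOST_ADDRESS + FILE_API_URL.format(dataset_id=dataset_id)


def encode_multipart(field, filename, content, boundary=None):
    """Build a multipart/form-data body with a single file part (RFC 7578 layout)."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


# An empty filename: the part is present, but filename="" carries no name.
body, content_type = encode_multipart("file", "", b"hello", boundary="XYZ")
print(build_upload_url("abc123"))
# http://127.0.0.1:9380/api/v1/datasets/abc123/documents
print(b'filename=""' in body)  # True
```

Because the file part exists but its filename is empty, the server reaches the "No file selected!" check rather than the "No file part!" one.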
Usage Examples
Example: Uploading a Single Valid File
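from common import upload_documnets
from libs.utils.file_utils import create_txt_file


def test_valid_single_upload(get_http_api_auth, add_dataset_func, tmp_path):
    # The add_dataset_func fixture provides the ID of a dataset to upload into
    dataset_id = add_dataset_func
    # Write a small text file into pytest's per-test temporary directory
    fp = create_txt_file(tmp_path / "ragflow_test.txt")
    res = upload_documnets(get_http_api_auth, dataset_id, [fp])
    # Code 0 means success; data holds one entry per uploaded file
    assert res["code"] == 0
    assert res["data"][0]["dataset_id"] == dataset_id
    assert res["data"][0]["name"] == fp.name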
This example demonstrates how to create a text file, upload it to a dataset using authenticated API calls, and assert the expected success response.
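The filename edge cases from TestDocumentsUpload can be reproduced with plain string operations, using the string module's character translation. This sketch assumes DOCUMENT_NAME_LIMIT is 128, treats the Windows-reserved characters <>:"/\|?* as the unsafe set, and uses string.punctuation as the source of special characters; all three are assumptions for illustration.

```python
import string

DOCUMENT_NAME_LIMIT = 128  # hypothetical value; the real constant lives in common

# Assumed set of characters that are unsafe in filenames (Windows-reserved set).
ILLEGAL_CHARS = '<>:"/\\|?*'
TRANSLATION = str.maketrans({ch: "_" for ch in ILLEGAL_CHARS})


def special_char_filename(ext=".txt"):
    """All ASCII punctuation, with the unsafe characters swapped for underscores."""
    return string.punctuation.translate(TRANSLATION) + ext


def over_limit_filename(ext=".txt"):
    """A stem of DOCUMENT_NAME_LIMIT - 3 chars; a 4-char extension makes it 1 char too long."""
    return "a" * (DOCUMENT_NAME_LIMIT - 3) + ext


name = special_char_filename()
print(name)
print(len(over_limit_filename()))  # 129
```

Building the special-character name from string.punctuation exercises every ASCII punctuation mark at once, while the translation table keeps the name valid on common filesystems.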
Mermaid Class Diagram
classDiagram
class TestAuthorization {
+test_invalid_auth(auth, expected_code, expected_message)
}
class TestDocumentsUpload {
+test_valid_single_upload(get_http_api_auth, add_dataset_func, tmp_path)
+test_file_type_validation(get_http_api_auth, add_dataset_func, generate_test_files, request)
+test_unsupported_file_type(get_http_api_auth, add_dataset_func, tmp_path, file_type)
+test_missing_file(get_http_api_auth, add_dataset_func)
+test_empty_file(get_http_api_auth, add_dataset_func, tmp_path)
+test_filename_empty(get_http_api_auth, add_dataset_func, tmp_path)
+test_filename_exceeds_max_length(get_http_api_auth, add_dataset_func, tmp_path)
+test_invalid_dataset_id(get_http_api_auth, tmp_path)
+test_duplicate_files(get_http_api_auth, add_dataset_func, tmp_path)
+test_same_file_repeat(get_http_api_auth, add_dataset_func, tmp_path)
+test_filename_special_characters(get_http_api_auth, add_dataset_func, tmp_path)
+test_multiple_files(get_http_api_auth, add_dataset_func, tmp_path)
+test_concurrent_upload(get_http_api_auth, add_dataset_func, tmp_path)
}
Summary
test_upload_documents.py exercises RAGFlow's document upload feature across authorization, file type validation, error conditions, duplicate handling, batch and concurrent uploads, and filename edge cases. Its reliance on shared helpers, fixtures, and pytest markers and parametrization keeps it maintainable and easy to extend.