test_upload_documents.py
Overview
test_upload_documents.py is a comprehensive test suite designed to verify the functionality, robustness, and security of the document upload feature in the InfiniFlow system. Using the pytest framework, it validates various scenarios including authorization handling, file type and name validations, handling of duplicate and concurrent uploads, and boundary conditions such as empty files or missing files.
The file ensures that the document upload API behaves correctly according to functional requirements and error handling expectations. It also helps maintain the integrity and reliability of the document upload subsystem by catching regressions or defects early in the development cycle.
Detailed Description of Classes and Functions
Imports and Dependencies
Standard and third-party libraries:
string: Used for string manipulation.
concurrent.futures.ThreadPoolExecutor, as_completed: Used to test concurrent uploads.
pytest: Testing framework.
requests: For making HTTP requests.
requests_toolbelt.MultipartEncoder: To construct multipart file upload requests.
Project-specific imports:
common: Provides FILE_API_URL, list_datasets, and upload_documents helper functions.
configs: Provides constants like DOCUMENT_NAME_LIMIT, HOST_ADDRESS, and INVALID_API_TOKEN.
libs.auth.RAGFlowHttpApiAuth: Authentication class for API requests.
utils.file_utils.create_txt_file: Utility to create temporary text files for testing.
Class: TestAuthorization
Tests related to authorization errors during document upload.
Method: test_invalid_auth
Purpose: Validates that the upload API properly rejects unauthorized requests.
Parameters:
invalid_auth: An invalid or missing authentication object.
expected_code: Expected error code returned by the API.
expected_message: Expected error message describing the authorization failure.
Behavior: Calls upload_documents with invalid or missing auth and asserts the returned error code and message match expectations.
Usage Example:
res = upload_documents(None, "dataset_id")
assert res["code"] == 0
assert res["message"] == "`Authorization` can't be empty"
Class: TestDocumentsUpload
This class contains many methods testing different aspects of document upload including file validation, naming, concurrency, and dataset handling.
Method: test_valid_single_upload
Purpose: Tests uploading a single valid text file.
Parameters:
HttpApiAuth: Valid authentication object.
add_dataset_func: Fixture to create a new dataset and return its ID.
tmp_path: Temporary directory provided by pytest.
Behavior: Creates a text file, uploads it, and asserts success with correct dataset ID and filename.
Return: None (assertions inside the test).
Example Usage:
res = upload_documents(HttpApiAuth, dataset_id, [fp])
assert res["code"] == 0
assert res["data"][0]["dataset_id"] == dataset_id
assert res["data"][0]["name"] == fp.name
Method: test_file_type_validation
Purpose: Validates supported file types by uploading various document formats.
Parameters:
generate_test_files: Fixture generating test files of different formats (docx, excel, ppt, image, pdf, txt, md, json, eml, html).
Other parameters similar to test_valid_single_upload.
Behavior: Uploads one file of each supported type, asserting successful upload.
Example: Parameterized test iterates over file types, uploading and verifying each.
Method: test_unsupported_file_type
Purpose: Ensures that unsupported file types (e.g., .exe, .unknown) are rejected.
Parameters: Similar to above.
Behavior: Creates empty files with unsupported extensions, uploads, and expects an error code 500 with a specific message.
Example Assertion:
assert res["message"] == f"ragflow_test.{file_type}: This type of file has not been supported yet!"
Method: test_missing_file
Purpose: Tests API response when no file is provided.
Behavior: Calls upload without files and expects error code 101 with "No file part!" message.
Method: test_empty_file
Purpose: Verifies that empty files can be uploaded and recognized with size 0.
Behavior: Creates an empty file, uploads it, asserts success and size = 0.
Method: test_filename_empty
Purpose: Tests uploading a file with an empty filename.
Behavior: Constructs a multipart upload with an empty filename, expects error 101 "No file selected!".
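A helper that takes file paths cannot express an empty filename, which is why this case drops down to a hand-built multipart request. The sketch below assembles such a body with the standard library only, to show what ends up on the wire. The suite itself uses MultipartEncoder; the helper name build_multipart_body, the field name "file", and the boundary string here are assumptions made for illustration.

```python
def build_multipart_body(field: str, filename: str, payload: bytes, boundary: str) -> bytes:
    """Assemble a single-part multipart/form-data body by hand."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: text/plain\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail


# Hypothetical field name and boundary; the empty filename is the edge case.
BOUNDARY = "ragflow-test-boundary"
body = build_multipart_body("file", "", b"hello", boundary=BOUNDARY)

# The matching header a client would send alongside this body.
content_type = f"multipart/form-data; boundary={BOUNDARY}"
```

Posting this body with the Content-Type header above would present the server with a file part whose filename is the empty string, which is the condition the test expects to be answered with "No file selected!".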
Method: test_filename_max_length
Purpose: Tests uploading a file with a filename at the maximum allowed length.
Behavior: Creates a file with a name length equal to DOCUMENT_NAME_LIMIT, uploads, and asserts success.
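One way to produce a file whose name sits exactly on the limit is to pad the stem so that stem plus extension add up to the limit. This is a minimal sketch, not the suite's own code: the value 128 for DOCUMENT_NAME_LIMIT and the helper make_max_length_file are assumptions, and the real constant should be imported from configs.

```python
import tempfile
from pathlib import Path

# Assumed value for illustration; the real constant is defined in configs.
DOCUMENT_NAME_LIMIT = 128


def make_max_length_file(directory: Path, limit: int = DOCUMENT_NAME_LIMIT) -> Path:
    """Create a .txt file whose full name (stem + extension) is `limit` characters."""
    suffix = ".txt"
    stem = "a" * (limit - len(suffix))
    path = directory / f"{stem}{suffix}"
    path.write_text("boundary test", encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    fp = make_max_length_file(Path(tmp))
    name_length = len(fp.name)
    existed = fp.exists()
```

Counting the extension as part of the name is a design choice in this sketch; if the API measures only the stem, the padding arithmetic would need to change accordingly.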
Method: test_invalid_dataset_id
Purpose: Validates behavior when an invalid dataset ID is provided.
Behavior: Attempts upload with an invalid dataset ID, expects error code 100 and specific lookup error message.
Method: test_duplicate_files
Purpose: Tests uploading duplicate files within the same request.
Behavior: Uploads the same file twice, expects two entries with the second file renamed with a suffix (1).
Method: test_same_file_repeat
Purpose: Tests uploading the same file multiple times sequentially.
Behavior: Uploads the same file three times, expecting (1), (2), etc. to be appended to the filename on subsequent uploads.
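The renaming behavior that test_duplicate_files and test_same_file_repeat check can be pictured with a small client-side model. This is only a sketch of the expected naming pattern, not the server's implementation: the function next_available_name is hypothetical, and placing the counter between the stem and the extension is an assumption.

```python
from pathlib import PurePath


def next_available_name(name: str, existing: set) -> str:
    """Return `name`, or `stem(N).ext` with the smallest N >= 1 that is unused."""
    if name not in existing:
        return name
    path = PurePath(name)
    counter = 1
    while True:
        candidate = f"{path.stem}({counter}){path.suffix}"
        if candidate not in existing:
            return candidate
        counter += 1


# Simulate uploading the same file three times into one dataset.
stored = set()
results = []
for _ in range(3):
    final_name = next_available_name("ragflow_test.txt", stored)
    stored.add(final_name)
    results.append(final_name)

print(results)
```

A test can then build its expected names the same way and compare them with the name field of each returned document.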
Method: test_filename_special_characters
Purpose: Validates handling of filenames with special characters.
Behavior: Creates a filename replacing illegal characters with underscores, uploads, and asserts success.
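The substitution step can be sketched with str.maketrans and the string module. The set of characters treated as illegal below is an assumption (those commonly rejected by file systems), as is the helper name sanitize_filename.

```python
import string

# Assumed set of characters that cannot appear in a file name.
ILLEGAL_CHARS = '<>:"/\\|?*'


def sanitize_filename(raw: str, replacement: str = "_") -> str:
    """Replace every illegal character in `raw` with `replacement`."""
    table = str.maketrans({ch: replacement for ch in ILLEGAL_CHARS})
    return raw.translate(table)


# Build a file name out of all ASCII punctuation, minus the illegal characters.
safe_name = sanitize_filename(string.punctuation) + ".txt"
```

Starting from string.punctuation keeps every other special character in the name, so the upload still exercises characters such as !, #, and & while remaining creatable on disk.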
Method: test_multiple_files
Purpose: Tests uploading multiple files in a single request.
Behavior: Creates and uploads 20 files, then verifies dataset document count reflects the upload.
Method: test_concurrent_upload
Purpose: Tests concurrent uploads to the same dataset.
Behavior: Uses a thread pool to upload 20 files concurrently, asserts all succeed, and verifies document count.
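The submit-and-collect pattern described here can be sketched without a running server by substituting an in-memory stand-in for the upload call. FakeDataset is hypothetical and is not the real API; the worker count of 5 is likewise an assumption.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


class FakeDataset:
    """In-memory stand-in for a dataset; NOT the real upload API."""

    def __init__(self):
        self._lock = threading.Lock()
        self.documents = []

    def upload(self, name):
        # The lock makes the intent explicit: each upload is recorded exactly once.
        with self._lock:
            self.documents.append(name)
        return {"code": 0, "data": [{"name": name}]}


def upload_concurrently(dataset, names, max_workers=5):
    """Submit one upload per name and collect responses as they finish."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(dataset.upload, name) for name in names]
        return [future.result() for future in as_completed(futures)]


dataset = FakeDataset()
names = [f"ragflow_test_{i}.txt" for i in range(20)]
responses = upload_concurrently(dataset, names)
```

Because as_completed yields futures in completion order, the responses list is not guaranteed to match the order of names; assertions should check counts and codes rather than positions.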
Important Implementation Details
The tests rely heavily on helper functions like upload_documents and list_datasets for abstracting API interaction.
Uploads are tested with both direct function calls (upload_documents) and raw HTTP requests (using requests and MultipartEncoder) to test edge cases such as empty filenames.
File naming collision handling is tested by checking for auto-appended suffixes (1), (2), etc.
Concurrent uploads use Python's ThreadPoolExecutor to simulate real-world multi-threaded upload scenarios.
Parametrization in pytest is extensively used to cover various file types and edge cases efficiently.
The tests are tagged with pytest.mark decorators to indicate priority (p1, p2, p3) and to use fixtures like clear_datasets for test isolation.
Interaction with Other System Components
Authentication (libs.auth.RAGFlowHttpApiAuth): Used to simulate authorized and unauthorized API requests.
API URL and Configs (common, configs): Provide endpoint URLs and configuration constants.
Helper utilities (utils.file_utils.create_txt_file): Used to generate temporary files for upload.
Document Upload API (upload_documents): The core API under test; responsible for storing documents in datasets.
Dataset Listing API (list_datasets): Used to verify the state of datasets after uploads.
HTTP Requests: Some tests bypass helper functions to test low-level HTTP behaviors, ensuring coverage beyond the abstraction.
The tests collectively ensure that the upload API integrates correctly with dataset management, authentication, and file handling subsystems.
Visual Diagram
classDiagram
class TestAuthorization {
+test_invalid_auth(invalid_auth, expected_code, expected_message)
}
class TestDocumentsUpload {
+test_valid_single_upload(HttpApiAuth, add_dataset_func, tmp_path)
+test_file_type_validation(HttpApiAuth, add_dataset_func, generate_test_files, request)
+test_unsupported_file_type(HttpApiAuth, add_dataset_func, tmp_path, file_type)
+test_missing_file(HttpApiAuth, add_dataset_func)
+test_empty_file(HttpApiAuth, add_dataset_func, tmp_path)
+test_filename_empty(HttpApiAuth, add_dataset_func, tmp_path)
+test_filename_max_length(HttpApiAuth, add_dataset_func, tmp_path)
+test_invalid_dataset_id(HttpApiAuth, tmp_path)
+test_duplicate_files(HttpApiAuth, add_dataset_func, tmp_path)
+test_same_file_repeat(HttpApiAuth, add_dataset_func, tmp_path)
+test_filename_special_characters(HttpApiAuth, add_dataset_func, tmp_path)
+test_multiple_files(HttpApiAuth, add_dataset_func, tmp_path)
+test_concurrent_upload(HttpApiAuth, add_dataset_func, tmp_path)
}
TestAuthorization --> upload_documents
TestDocumentsUpload --> upload_documents
TestDocumentsUpload --> list_datasets
TestDocumentsUpload --> create_txt_file
TestDocumentsUpload --> requests
Summary
test_upload_documents.py serves as an exhaustive test harness for the document upload functionality in InfiniFlow, covering:
Authorization validation
File type support and rejection
Filename edge cases (empty, max length, special characters)
Handling of missing and empty files
Duplicate file naming logic
Batch and concurrent uploads
Dataset integration and document counts
The suite leverages pytest's rich feature set for parameterization, fixtures, and marking to organize tests by priority and scenario. It uses both high-level helper functions and low-level HTTP requests to ensure thorough validation of the upload API's behavior under diverse conditions.
This module plays a critical role in maintaining the quality and reliability of document ingestion in the InfiniFlow system by automatically verifying that feature changes do not break expected behaviors.