test_upload_documents.py
Overview
test_upload_documents.py is a test suite designed to validate the functionality, robustness, and security of the document upload feature within the InfiniFlow system. Using the pytest framework, it systematically tests various scenarios related to uploading documents to knowledge bases (KBs) via a web API.
The file contains tests for:
Authorization and authentication handling,
Valid and invalid file uploads,
File type and size validations,
Filename constraints and special characters,
Handling of multiple and concurrent file uploads,
Error conditions such as missing files or invalid knowledge base IDs.
This suite ensures the document upload endpoint behaves correctly under diverse conditions and meets the expected API contract.
Detailed Explanation of Classes and Methods
Class: TestAuthorization
This class focuses on verifying that the API properly handles authorization failures.
Method: test_invalid_auth(invalid_auth, expected_code, expected_message)
Purpose: Tests the API response when invalid or missing authentication credentials are provided.
Parameters:
invalid_auth: An authentication object, or None to simulate no auth.
expected_code: The expected HTTP or API error code (e.g., 401).
expected_message: The expected error message string.
Returns: None, but asserts that the API response matches expected failure codes.
Usage: Parametrized to test both None and invalid token scenarios.
Example:
test_invalid_auth(None, 401, "<Unauthorized '401: Unauthorized'>")
test_invalid_auth(RAGFlowWebApiAuth(INVALID_API_TOKEN), 401, "<Unauthorized '401: Unauthorized'>")
Implementation details: Calls upload_documents with invalid auth and asserts the response error matches expectations.
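The parametrized cases described above can be sketched as a small table run against a stand-in for upload_documents. This is only an illustration: fake_upload_documents, VALID_TOKEN, and the {"code", "message"} response shape are assumptions made for the sketch, and no HTTP request is issued.

```python
# Hypothetical stand-in for common.upload_documents: no HTTP is performed.
VALID_TOKEN = "valid-token"  # assumed placeholder, not a real credential


def fake_upload_documents(auth, payload=None, files=None):
    """Return an API-style dict; reject anything but the valid token."""
    if auth != VALID_TOKEN:
        return {"code": 401, "message": "<Unauthorized '401: Unauthorized'>"}
    return {"code": 0, "data": []}


# Each tuple mirrors one parametrized case: (auth, expected_code, expected_message).
CASES = [
    (None, 401, "<Unauthorized '401: Unauthorized'>"),
    ("invalid-token", 401, "<Unauthorized '401: Unauthorized'>"),
]


def check_invalid_auth(invalid_auth, expected_code, expected_message):
    res = fake_upload_documents(invalid_auth)
    assert res["code"] == expected_code, res
    assert res["message"] == expected_message, res


for case in CASES:
    check_invalid_auth(*case)
```

In the real suite the same table would presumably be supplied through pytest's parametrization rather than an explicit loop.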
Class: TestDocumentsUpload
This class contains multiple test methods to validate various document upload scenarios.
Method: test_valid_single_upload(WebApiAuth, add_dataset_func, tmp_path)
Purpose: Tests uploading a single valid document.
Parameters:
WebApiAuth: Valid authentication object.
add_dataset_func: Fixture to create or get a knowledge base ID.
tmp_path: Temporary filesystem path for creating test files.
Returns: None; asserts successful upload and correct metadata.
Usage: Creates a simple .txt file and uploads it.
Example:
test_valid_single_upload(WebApiAuth, add_dataset_func, tmp_path)
Implementation: Uses the helper create_txt_file to generate the file, then calls upload_documents.
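A minimal sketch of this flow, assuming a create_txt_file helper that simply writes text to a path and an upload helper that returns per-file metadata. Both functions below are hypothetical stand-ins written for the example; the kb_id, name, and size fields are assumed, and the temporary directory plays the role of pytest's tmp_path fixture.

```python
import tempfile
from pathlib import Path


def create_txt_file(path: Path, text: str = "This is a sample text file.") -> Path:
    """Hypothetical equivalent of utils.file_utils.create_txt_file."""
    path.write_text(text, encoding="utf-8")
    return path


def fake_upload_documents(auth, payload, files):
    """Stand-in for the real helper: echoes metadata instead of calling the API."""
    return {
        "code": 0,
        "data": [
            {"kb_id": payload["kb_id"], "name": f.name, "size": f.stat().st_size}
            for f in files
        ],
    }


with tempfile.TemporaryDirectory() as tmp:  # plays the role of pytest's tmp_path
    fp = create_txt_file(Path(tmp) / "ragflow_test.txt")
    res = fake_upload_documents("token", {"kb_id": "kb-123"}, [fp])

assert res["code"] == 0, res
assert res["data"][0]["name"] == "ragflow_test.txt", res
```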
Method: test_file_type_validation(WebApiAuth, add_dataset_func, generate_test_files, request)
Purpose: Tests uploading supported file types.
Parameters:
WebApiAuth, add_dataset_func: As above.
generate_test_files: Fixture that generates various file types (docx, pdf, image, etc.).
request: Pytest request object used to get the current parameter.
Returns: None; asserts upload success for each supported file type.
Usage: Parametrized over multiple file types.
Implementation: Uploads one file per supported type and validates response.
Method: test_unsupported_file_type(WebApiAuth, add_dataset_func, tmp_path, file_type)
Purpose: Tests API response when unsupported file types are uploaded.
Parameters:
file_type: The unsupported file extension under test, e.g., "exe" or "unknown".
Returns: None; asserts an error response with code 500 and appropriate message.
Implementation: Creates an empty file with the unsupported extension and uploads it.
Method: test_missing_file(WebApiAuth, add_dataset_func)
Purpose: Tests behavior when no file is provided in the upload request.
Returns: None; expects error code 101 with message "No file part!".
Method: test_empty_file(WebApiAuth, add_dataset_func, tmp_path)
Purpose: Tests uploading an empty file.
Returns: None; expects success but with file size reported as 0.
Method: test_filename_empty(WebApiAuth, add_dataset_func, tmp_path)
Purpose: Tests API response when file is uploaded with an empty filename.
Returns: None; expects error code 101 and message "No file selected!".
Implementation: Uses MultipartEncoder to craft a request with an empty filename.
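To show what such a request body looks like on the wire, the sketch below builds a single-part multipart/form-data payload by hand using only the standard library. It is not the MultipartEncoder class the test uses; build_multipart and the field name "file" are assumptions made for illustration.

```python
import uuid


def build_multipart(field: str, filename: str, content: bytes):
    """Build a single-part multipart/form-data body by hand.

    Returns (body, content_type). This is a stdlib-only illustration, not
    the MultipartEncoder class the test itself uses.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


# An empty filename is the condition the endpoint is described as rejecting
# with code 101 and "No file selected!".
body, content_type = build_multipart("file", "", b"hello")
```

The part still carries content; only the filename attribute is empty, which is the case an ordinary file-upload helper would not normally produce.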
Method: test_filename_exceeds_max_length(WebApiAuth, add_dataset_func, tmp_path)
Purpose: Tests uploading a file whose filename is exactly at the maximum allowed length. Despite the method name, the filename does not exceed the limit.
Returns: None; expects success and verifies filename is preserved.
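A filename that sits exactly at the limit can be constructed as sketched below. The value 255 and the assumption that the limit counts characters including the extension are illustrative guesses; the real DOCUMENT_NAME_LIMIT is imported from configs, and name_at_limit is a hypothetical helper.

```python
DOCUMENT_NAME_LIMIT = 255  # assumed value; the real constant lives in configs


def name_at_limit(limit: int, ext: str = ".txt") -> str:
    """Return a filename whose total length, extension included, equals limit."""
    return "a" * (limit - len(ext)) + ext


name = name_at_limit(DOCUMENT_NAME_LIMIT)
assert len(name) == DOCUMENT_NAME_LIMIT
```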
Method: test_invalid_kb_id(WebApiAuth, tmp_path)
Purpose: Tests upload with an invalid knowledge base ID.
Returns: None; expects error code 100 with relevant lookup error message.
Method: test_duplicate_files(WebApiAuth, add_dataset_func, tmp_path)
Purpose: Tests uploading duplicate files in one request.
Returns: None; expects successful upload with unique filenames generated for duplicates (e.g., file.txt, file(1).txt).
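The naming pattern the test expects can be reproduced with a short function like the one below. This is a guess at the behavior for illustration, not the server's actual implementation; dedupe_names is a hypothetical name.

```python
from pathlib import PurePath


def dedupe_names(names):
    """Rename repeats as stem(1).ext, stem(2).ext, ... in upload order."""
    taken = set()
    result = []
    for name in names:
        p = PurePath(name)
        candidate, n = name, 0
        # Bump the counter until the candidate is unused.
        while candidate in taken:
            n += 1
            candidate = f"{p.stem}({n}){p.suffix}"
        taken.add(candidate)
        result.append(candidate)
    return result


print(dedupe_names(["file.txt", "file.txt", "file.txt"]))
# ['file.txt', 'file(1).txt', 'file(2).txt']
```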
Method: test_filename_special_characters(WebApiAuth, add_dataset_func, tmp_path)
Purpose: Tests uploading a file with special characters in the filename.
Returns: None; expects success and filename sanitized/replaced for illegal characters.
Implementation: Illegal filename characters are replaced with underscores (_).
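A minimal sketch of this kind of replacement follows. The set of characters treated as illegal is an assumption (the ones Windows forbids in filenames), since this document does not list them, and sanitize_filename is a hypothetical name.

```python
import re

# Assumed set of illegal characters (the ones Windows forbids in filenames).
ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*]')


def sanitize_filename(name: str) -> str:
    """Replace each illegal character with a single underscore."""
    return ILLEGAL_CHARS.sub("_", name)


print(sanitize_filename("re<po>rt:v1?.txt"))
# re_po_rt_v1_.txt
```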
Method: test_multiple_files(WebApiAuth, add_dataset_func, tmp_path)
Purpose: Tests uploading multiple files in a single request (e.g., 20 files).
Returns: None; expects all files uploaded successfully and knowledge base document count updated accordingly.
Method: test_concurrent_upload(WebApiAuth, add_dataset_func, tmp_path)
Purpose: Tests concurrent uploads using multiple threads to simulate parallel requests.
Returns: None; expects all uploads to succeed and document count to match total uploads.
Implementation: Uses ThreadPoolExecutor with 5 workers to upload 20 files concurrently.
Important Implementation Details and Algorithms
Filename Sanitization: Special characters in filenames are replaced with underscores to avoid issues with file handling.
Duplicate Filename Handling: When duplicate files are uploaded, the system appends (1), (2), etc., to filenames to ensure uniqueness.
File Type Validation: Only certain file types are accepted; unsupported types return a 500 error with descriptive messages.
Concurrent Uploads: The test suite verifies thread-safety and data consistency when multiple uploads happen simultaneously.
Interaction with Other System Components
upload_documents function: Central utility function from the common module used to perform the actual upload API call. This abstracts the HTTP request details.
Authentication: Uses RAGFlowWebApiAuth from libs.auth to provide token-based authentication to the API.
Temporary Files: Uses create_txt_file from utils.file_utils to generate temporary test files.
Configuration Constants: Such as DOCUMENT_NAME_LIMIT, HOST_ADDRESS, and INVALID_API_TOKEN, imported from configs.
API Endpoints: Interacts with the document upload endpoint defined by DOCUMENT_APP_URL and HOST_ADDRESS.
Fixtures: Pytest fixtures like add_dataset_func, WebApiAuth, tmp_path, and generate_test_files provide setup of knowledge bases, auth, temp files, and test files.
Document Listing: Uses list_kbs from common to verify the state of knowledge bases after uploads.
Visual Diagram
classDiagram
class TestAuthorization {
+test_invalid_auth(invalid_auth, expected_code, expected_message)
}
class TestDocumentsUpload {
+test_valid_single_upload(WebApiAuth, add_dataset_func, tmp_path)
+test_file_type_validation(WebApiAuth, add_dataset_func, generate_test_files, request)
+test_unsupported_file_type(WebApiAuth, add_dataset_func, tmp_path, file_type)
+test_missing_file(WebApiAuth, add_dataset_func)
+test_empty_file(WebApiAuth, add_dataset_func, tmp_path)
+test_filename_empty(WebApiAuth, add_dataset_func, tmp_path)
+test_filename_exceeds_max_length(WebApiAuth, add_dataset_func, tmp_path)
+test_invalid_kb_id(WebApiAuth, tmp_path)
+test_duplicate_files(WebApiAuth, add_dataset_func, tmp_path)
+test_filename_special_characters(WebApiAuth, add_dataset_func, tmp_path)
+test_multiple_files(WebApiAuth, add_dataset_func, tmp_path)
+test_concurrent_upload(WebApiAuth, add_dataset_func, tmp_path)
}
TestAuthorization ..> upload_documents : uses
TestDocumentsUpload ..> upload_documents : uses
TestDocumentsUpload ..> create_txt_file : uses
TestDocumentsUpload ..> list_kbs : uses
TestAuthorization ..> RAGFlowWebApiAuth : uses
TestDocumentsUpload ..> RAGFlowWebApiAuth : uses
Summary
test_upload_documents.py is a pytest-based automated test suite.
It rigorously tests document upload APIs with a focus on authorization, file validation, naming constraints, and concurrency.
The tests use helper utilities and fixtures to abstract file creation and authentication.
It validates both expected successes and failure cases with precise assertions.
The suite ensures the document upload feature is stable, secure, and conforms to system requirements before deployment.