test_upload_documents.py
Overview
test_upload_documents.py is a pytest test suite that verifies the document upload features of the InfiniFlow system. It covers uploads to datasets across supported and unsupported file types, filename constraints (empty, maximum-length, and special-character names), duplicate handling, empty and missing files, batch uploads, and concurrent uploads.
This file primarily interacts with:
The dataset object (likely part of the InfiniFlow application), which exposes the method upload_documents.
Utilities such as create_txt_file from utils.file_utils to generate test files.
Configuration constants like DOCUMENT_NAME_LIMIT.
The client interface to query or verify dataset states post-upload.
Classes and Methods
Class: TestDocumentsUpload
This class encapsulates multiple test methods focused on the document upload functionality to datasets. It uses pytest's fixtures and markers to organize the tests by priority and parameterization.
Method: test_valid_single_upload(self, add_dataset_func, tmp_path)
Purpose: Tests uploading a single valid text file document to a dataset.
Parameters:
add_dataset_func: pytest fixture providing a dataset instance.
tmp_path: pytest fixture providing a temporary directory path for file creation.
Returns: None. Asserts that the uploaded document's dataset ID matches and the filename is preserved.
Usage:
def test_valid_single_upload(self, add_dataset_func, tmp_path):
    dataset = add_dataset_func
    fp = create_txt_file(tmp_path / "ragflow_test.txt")
    with fp.open("rb") as f:
        blob = f.read()
    documents = dataset.upload_documents([{"display_name": fp.name, "blob": blob}])
    for document in documents:
        assert document.dataset_id == dataset.id
        assert document.name == fp.name
Method: test_file_type_validation(self, add_dataset_func, generate_test_files, request)
Purpose: Parameterized test that checks uploading supported file types (docx, excel, ppt, image, pdf, txt, md, json, eml, html).
Parameters:
add_dataset_func: fixture for dataset.
generate_test_files: fixture to generate test files of varying types.
request: pytest's request object, used to retrieve the current parameter value.
Returns: None. Asserts dataset consistency and file name correctness.
Usage: Called multiple times automatically by pytest with different file types.
Method: test_unsupported_file_type(self, add_dataset_func, tmp_path, file_type)
Purpose: Tests that uploading unsupported file types (exe, unknown) raises a proper exception.
Parameters:
add_dataset_func: fixture for dataset.
tmp_path: temporary directory path.
file_type: parameterized file type string.
Returns: None. Asserts that the exception message matches expected unsupported file type warning.
Example Exception Message:
"ragflow_test.exe: This type of file has not been supported yet!"
Method: test_missing_file(self, add_dataset_func)
Purpose: Tests that uploading with no files raises an exception indicating missing file parts.
Returns: None. Checks for a "No file part!" exception.
Method: test_empty_file(self, add_dataset_func, tmp_path)
Purpose: Tests uploading an empty file and verifies the document size is zero.
Returns: None.
Method: test_filename_empty(self, add_dataset_func, tmp_path)
Purpose: Tests that uploading a file with an empty filename raises an exception.
Returns: None. Checks for a "No file selected!" exception.
Method: test_filename_max_length(self, add_dataset_func, tmp_path)
Purpose: Tests uploading a file with a filename at the maximum allowed length (DOCUMENT_NAME_LIMIT).
Returns: None. Verifies document dataset ID and filename correctness.
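A boundary-length filename of the kind this test needs can be built as in the minimal sketch below. The numeric limit is a placeholder, the helper name_at_limit is hypothetical, and the sketch assumes the limit counts the extension as part of the name.

```python
# Placeholder value for illustration only; the real limit is whatever
# DOCUMENT_NAME_LIMIT is set to in the project's configuration.
DOCUMENT_NAME_LIMIT = 255


def name_at_limit(limit: int, suffix: str = ".txt", fill: str = "a") -> str:
    """Return a filename whose total length (stem plus suffix) equals `limit`."""
    if limit <= len(suffix):
        raise ValueError("limit must be longer than the suffix")
    return fill * (limit - len(suffix)) + suffix


# Hypothetical helper in use: a name sitting exactly on the boundary.
boundary_name = name_at_limit(DOCUMENT_NAME_LIMIT)
```

Deriving the name from the constant, rather than hard-coding a string, keeps the boundary case valid if the limit changes.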
Method: test_duplicate_files(self, add_dataset_func, tmp_path)
Purpose: Tests uploading two files with the same name simultaneously and ensures renaming for duplicates (e.g., file.txt and file(1).txt).
Returns: None.
Method: test_same_file_repeat(self, add_dataset_func, tmp_path)
Purpose: Tests uploading the same file multiple times sequentially, verifying incremental renaming.
Returns: None.
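The renaming pattern checked by the two duplicate tests, file.txt followed by file(1).txt, can be illustrated with a small sketch. The function dedupe_name is hypothetical and only mirrors the naming pattern described here; how the server actually picks the index is not stated in this document.

```python
from pathlib import PurePath


def dedupe_name(name: str, taken: set[str]) -> str:
    """Return `name` if unused, else `stem(N).suffix` with the smallest free N >= 1."""
    if name not in taken:
        return name
    path = PurePath(name)
    index = 1
    while f"{path.stem}({index}){path.suffix}" in taken:
        index += 1
    return f"{path.stem}({index}){path.suffix}"


# Simulate uploading the same display name three times in a row.
taken: set[str] = set()
stored = []
for _ in range(3):
    final = dedupe_name("file.txt", taken)
    taken.add(final)
    stored.append(final)
```

After the loop, stored holds file.txt, file(1).txt, and file(2).txt, which is the incremental sequence a repeat-upload test would assert on.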
Method: test_filename_special_characters(self, add_dataset_func, tmp_path)
Purpose: Tests uploading a file with special characters in the filename, which are translated to safe characters like underscores.
Returns: None.
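Replacing unsafe characters with underscores can be sketched with a translation table. The set of illegal characters below is an assumption (the characters reserved in Windows filenames), not something confirmed by this document, and sanitize is a hypothetical helper.

```python
# Assumed set of characters treated as illegal; chosen because they are
# reserved in Windows filenames, not taken from the InfiniFlow source.
ILLEGAL_CHARS = '<>:"/\\|?*'
TRANSLATION = str.maketrans({ch: "_" for ch in ILLEGAL_CHARS})


def sanitize(name: str) -> str:
    """Replace every assumed-illegal character with an underscore."""
    return name.translate(TRANSLATION)


safe = sanitize('report<v2>:"final"?.txt')
```

A translation table handles all replacements in one pass and leaves every other character, including the dot before the extension, untouched.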
Method: test_multiple_files(self, client, add_dataset_func, tmp_path)
Purpose: Tests uploading multiple files (20 in this case) in a single batch upload.
Parameters:
client: interface to query dataset state.
Returns: None. Verifies number of uploaded documents matches expected and dataset document count is updated.
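A batch of 20 files can be turned into the list-of-dictionaries shape that upload_documents is described as accepting, as in the sketch below. The helper build_payload and the file names are hypothetical, and the sketch stops short of calling the real API.

```python
import tempfile
from pathlib import Path


def build_payload(paths):
    """Turn file paths into dicts with display_name and blob keys."""
    return [{"display_name": p.name, "blob": p.read_bytes()} for p in paths]


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(20):
        p = Path(tmp) / f"ragflow_test_{i}.txt"
        p.write_text(f"content {i}")
        paths.append(p)
    # Read the blobs while the temporary directory still exists.
    payload = build_payload(paths)
```

In a pytest test the tmp_path fixture would replace the explicit temporary directory, and payload would be passed to dataset.upload_documents in a single call.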
Method: test_concurrent_upload(self, client, add_dataset_func, tmp_path)
Purpose: Tests uploading multiple files concurrently using a ThreadPoolExecutor with 5 workers to simulate parallel uploads.
Returns: None. Asserts that all uploads complete and dataset document count matches the number of files uploaded.
Implementation detail: Uses Python's concurrent.futures.ThreadPoolExecutor and as_completed to manage concurrency.
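The ThreadPoolExecutor and as_completed pattern can be sketched against an in-memory stand-in. FakeDataset is hypothetical and exists only to make the example runnable; it is not the InfiniFlow dataset class, and the count of 20 files is an arbitrary choice for the sketch.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


class FakeDataset:
    """Hypothetical in-memory stand-in for a dataset; not the InfiniFlow API."""

    def __init__(self):
        self._lock = threading.Lock()
        self.documents = []

    def upload_documents(self, files):
        names = [f["display_name"] for f in files]
        # Guard the shared list so concurrent uploads cannot interleave.
        with self._lock:
            self.documents.extend(names)
        return names


dataset = FakeDataset()
payloads = [{"display_name": f"ragflow_test_{i}.txt", "blob": b"x"} for i in range(20)]

with ThreadPoolExecutor(max_workers=5) as executor:
    # One upload call per file, submitted to a pool of 5 workers.
    futures = [executor.submit(dataset.upload_documents, [p]) for p in payloads]
    # result() re-raises any exception raised inside a worker thread.
    responses = [future.result() for future in as_completed(futures)]
```

Collecting results through as_completed means a failure in any worker surfaces in the test thread, while the final document count confirms that no upload was lost.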
Important Implementation Details and Algorithms
Filename Handling for Duplicates: When uploading files with duplicate names, the system automatically renames subsequent files by appending an index in parentheses (e.g., file(1).txt). This is verified in the tests test_duplicate_files and test_same_file_repeat.
File Type Validation: The tests confirm that only certain file types are allowed to be uploaded; unsupported types raise exceptions with specific error messages.
Special Characters in Filenames: Illegal characters in filenames are replaced with underscores to ensure safe storage and retrieval.
Concurrency: The test_concurrent_upload test simulates multiple simultaneous uploads to ensure thread safety and proper document handling under parallel conditions.
Use of Fixtures: The tests rely heavily on pytest fixtures such as add_dataset_func, tmp_path, generate_test_files, and client to abstract setup and provide reusable components for test isolation.
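The file type validation noted in the list above can be illustrated with an extension allow-list. This is a sketch under stated assumptions: the set of suffixes is invented for illustration (the document names categories such as docx, excel, ppt, and image, not exact extensions), check_file_type and error_for are hypothetical, and ValueError stands in for whatever exception the real API raises. Only the message format comes from the example exception message quoted earlier.

```python
from pathlib import PurePath

# Assumed allow-list for illustration; the exact extensions accepted by
# the real system are not given in this document.
SUPPORTED_SUFFIXES = {
    ".docx", ".xlsx", ".pptx", ".png", ".pdf",
    ".txt", ".md", ".json", ".eml", ".html",
}


def check_file_type(display_name: str) -> None:
    """Raise ValueError for names whose suffix is not on the allow-list."""
    if PurePath(display_name).suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"{display_name}: This type of file has not been supported yet!")


def error_for(display_name: str):
    """Return the rejection message for a name, or None if it is accepted."""
    try:
        check_file_type(display_name)
    except ValueError as exc:
        return str(exc)
    return None
```

A test for unsupported types would then compare the raised message against the expected string, which is the same style of assertion test_unsupported_file_type is described as making.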
Interaction with Other System Components
dataset.upload_documents(): The central method under test, responsible for accepting a list of dictionaries with display_name and binary blob keys and returning the uploaded document objects.
create_txt_file utility: Used to generate temporary text files for testing uploads.
DOCUMENT_NAME_LIMIT constant: Governs the maximum allowed length for document filenames.
client.get_dataset(): Used to retrieve dataset metadata, verifying document counts after uploads.
Pytest fixtures: Provide datasets, temporary file paths, and test files of various types.
This file is part of the automated testing suite ensuring the integrity of document upload features in the InfiniFlow system.
Visual Diagram
classDiagram
class TestDocumentsUpload {
<<test class>>
+test_valid_single_upload(add_dataset_func, tmp_path)
+test_file_type_validation(add_dataset_func, generate_test_files, request)
+test_unsupported_file_type(add_dataset_func, tmp_path, file_type)
+test_missing_file(add_dataset_func)
+test_empty_file(add_dataset_func, tmp_path)
+test_filename_empty(add_dataset_func, tmp_path)
+test_filename_max_length(add_dataset_func, tmp_path)
+test_duplicate_files(add_dataset_func, tmp_path)
+test_same_file_repeat(add_dataset_func, tmp_path)
+test_filename_special_characters(add_dataset_func, tmp_path)
+test_multiple_files(client, add_dataset_func, tmp_path)
+test_concurrent_upload(client, add_dataset_func, tmp_path)
}
class Dataset {
+upload_documents(files: List[Dict]) Document[]
+id
+name
+document_count
}
class Client {
+get_dataset(name: str) Dataset
}
TestDocumentsUpload --> Dataset : uses
TestDocumentsUpload --> Client : uses (for verification)
Summary
This test file covers document upload from happy-path scenarios through edge cases and concurrency. It checks that the dataset's upload mechanism behaves correctly across file types, empty files, naming conventions, and parallel execution. Pytest parameterization, markers, and fixtures keep the coverage maintainable as the suite grows.