test_list_chunks.py

Overview

test_list_chunks.py is a comprehensive test suite designed to validate the behavior and robustness of the list_chunks API function within the InfiniFlow system. This file uses the pytest framework to run a variety of test scenarios that cover:

- Authorization validation and error handling.
- Pagination parameters (page, page_size) and their edge cases.
- Filtering functionality using keywords and chunk IDs.
- Handling of invalid parameters, dataset IDs, and document IDs.
- Concurrent access and thread safety of the list_chunks endpoint.
- Interaction with chunk addition (batch_add_chunks) to verify updates.

The tests ensure that the list_chunks function correctly handles valid and invalid inputs, maintains expected behavior under concurrent requests, and properly enforces permissions and ownership rules.

Classes and Test Cases

TestAuthorization

This class contains tests related to authorization and authentication for accessing the list_chunks API.

Method: `test_invalid_auth`

Purpose: Validates that the API rejects requests with invalid or missing authorization tokens.
Parameters:
- invalid_auth: An invalid or None authorization object.
- expected_code: Expected error code from the API response.
- expected_message: Expected error message describing the authorization failure.
Usage Example:

def test_invalid_auth(self, invalid_auth, expected_code, expected_message):
    res = list_chunks(invalid_auth, "dataset_id", "document_id")
    assert res["code"] == expected_code
    assert res["message"] == expected_message

Behavior:
- Tests with None authorization should return code 0 and message "Authorization can't be empty".
- Tests with an invalid API token should return code 109 and an appropriate authentication error message.
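
The two cases above can be read as a table of inputs and expected results. The sketch below is a minimal, self-contained illustration of that idea, not the suite's actual code: `FakeAuth` and `fake_list_chunks` are hypothetical stand-ins for `RAGFlowHttpApiAuth` and `list_chunks`, and the message used for code 109 is a placeholder because its exact wording is not given here.

```python
from typing import Optional


class FakeAuth:
    """Hypothetical stand-in for RAGFlowHttpApiAuth; it only carries a token."""

    def __init__(self, token: str) -> None:
        self.token = token


VALID_TOKEN = "valid-token"  # assumption: every other token is rejected


def fake_list_chunks(auth: Optional[FakeAuth], dataset_id: str, document_id: str) -> dict:
    """Hypothetical stub that mimics only the two documented auth failures."""
    if auth is None:
        return {"code": 0, "message": "Authorization can't be empty"}
    if auth.token != VALID_TOKEN:
        # Placeholder text: the real message is not specified in this document.
        return {"code": 109, "message": "<authentication error>"}
    return {"code": 0, "message": "", "data": {"chunks": []}}


# Each row: (auth object, expected code, expected message).
AUTH_CASES = [
    (None, 0, "Authorization can't be empty"),
    (FakeAuth("invalid-token"), 109, "<authentication error>"),
]


def check_auth_cases() -> int:
    """Run every row and return the number of rows checked."""
    for auth, expected_code, expected_message in AUTH_CASES:
        res = fake_list_chunks(auth, "dataset_id", "document_id")
        assert res["code"] == expected_code
        assert res["message"] == expected_message
    return len(AUTH_CASES)
```

In a pytest suite the same rows would typically be handed to pytest.mark.parametrize so that each row is reported as its own test case.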

TestChunksList

This class groups tests for the core functionality of listing chunks, including pagination, filtering, concurrency, and error scenarios.

Method: `test_page`

Purpose: Tests the handling of the page parameter in pagination.
Parameters:
- params: Dictionary containing page and page_size.
- expected_code: Expected API response code.
- expected_page_size: Expected number of chunks returned.
- expected_message: Expected message if any error occurs.
Details:
- Valid pages should return chunks accordingly.
- Negative pages and invalid strings are skipped due to known issues or unsupported behavior.

Method: `test_page_size`

Purpose: Tests different values for the page_size parameter.
Parameters: Same as test_page but focusing on page_size.
Details:
- A page_size of 0, or any value above 5, yields 5 chunks, the maximum these cases expect.
- Invalid values like negative or non-numeric strings are skipped or expected to raise errors.
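
One reading that is consistent with these results, and with the 30-chunk default noted for test_default, is that a missing or zero page_size falls back to a default of 30 and the fixture document holds five chunks. The sketch below models that reading only; `paginate`, `DEFAULT_PAGE_SIZE`, and the five-chunk fixture are assumptions for illustration, not the server's implementation.

```python
from typing import List, Optional

DEFAULT_PAGE_SIZE = 30  # assumption, based on the 30-chunk default seen in test_default


def paginate(chunks: List[str], page: Optional[int] = None, page_size: Optional[int] = None) -> List[str]:
    """Hypothetical model of the documented results, not the server's code."""
    page = page or 1                       # assumption: None or 0 means the first page
    size = page_size or DEFAULT_PAGE_SIZE  # assumption: None or 0 means the default size
    start = (page - 1) * size
    return chunks[start:start + size]


# Assumption: the add_chunks fixture leaves five chunks in the document.
fixture_chunks = [f"chunk-{i}" for i in range(5)]

all_with_zero = paginate(fixture_chunks, page_size=0)        # falls back to the default size
all_with_six = paginate(fixture_chunks, page_size=6)         # more than is available
last_partial = paginate(fixture_chunks, page=3, page_size=2) # only one chunk is left
```

Under this model an oversized page_size is not capped at all; the result is short simply because the document has no more chunks to return.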

Method: `test_keywords`

Purpose: Checks filtering chunks by keywords.
Parameters:
- params: Contains the keywords filter.
- expected_page_size: Expected number of chunks matching the keyword filter.
Details:
- Tests various keyword inputs including empty, None, numeric strings, and known keywords.
- Some cases are skipped when running under specific document engines due to known issues.

Method: `test_id`

Purpose: Tests filtering by specific chunk ID.
Parameters:
- chunk_id: The chunk ID to filter by; can be None, an empty string, an unknown string, or a callable that picks a valid ID from the chunks the fixture created.
- expected_code: Expected API response code.
- expected_page_size: Expected number of chunks.
- expected_message: Expected error message if any.
Details:
- Valid chunk IDs return exactly one chunk.
- Unknown or invalid IDs return errors or empty results.
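
A likely reason for allowing a callable is that valid chunk IDs exist only after the fixture has run, so a parametrized case cannot hard-code one. The sketch below shows one way such a case could be resolved into query parameters; `build_id_params` is a hypothetical helper named for illustration and is not taken from the test file.

```python
from typing import Callable, List, Union

# A case may supply None, a literal string, or a function that picks an ID.
ChunkIdSpec = Union[None, str, Callable[[List[str]], str]]


def build_id_params(chunk_id: ChunkIdSpec, chunk_ids: List[str]) -> dict:
    """Hypothetical helper: turn a test case's chunk_id into query params."""
    if callable(chunk_id):
        # Resolve against the IDs the fixture actually created.
        return {"id": chunk_id(chunk_ids)}
    # None, empty, and unknown strings are passed through unchanged.
    return {"id": chunk_id}


created_ids = ["id-a", "id-b"]  # illustrative IDs, not real ones
first_chunk_params = build_id_params(lambda ids: ids[0], created_ids)
unknown_params = build_id_params("unknown", created_ids)
```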

Method: `test_invalid_params`

Purpose: Verifies that unknown query parameters do not affect the successful response.
Details:
- Passing an unknown parameter returns default chunk list without errors.

Method: `test_concurrent_list`

Purpose: Exercises concurrent access by submitting 100 list_chunks requests to a thread pool.
Details:
- Uses ThreadPoolExecutor with five workers.
- Verifies all responses return the expected number of chunks.
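
The pattern can be tried without a running server by swapping the HTTP call for a stub. In the sketch below, `stub_list_chunks` and `EXPECTED_CHUNKS` are assumptions made for illustration; only the pool size of five and the 100 submissions come from the description above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

EXPECTED_CHUNKS = 5  # assumption: the fixture pre-populates five chunks


def stub_list_chunks(auth, dataset_id, document_id):
    """Hypothetical stub standing in for the real HTTP call."""
    chunks = [f"chunk-{i}" for i in range(EXPECTED_CHUNKS)]
    return {"code": 0, "data": {"chunks": chunks}}


def run_concurrent_list(count: int = 100, workers: int = 5) -> list:
    """Submit `count` calls to a pool of `workers` threads and collect the results."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [
            executor.submit(stub_list_chunks, "auth", "dataset_id", "document_id")
            for _ in range(count)
        ]
        # as_completed yields each future as soon as its call has finished.
        return [future.result() for future in as_completed(futures)]


results = run_concurrent_list()
```

Checking the chunk count of every response, rather than only the status code, is what catches a request that succeeded but returned partial data.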

Method: `test_default`

Purpose: Validates default behavior of list_chunks before and after batch adding chunks.
Details:
- Checks initial chunk count.
- Adds 31 chunks using batch_add_chunks.
- Waits 3 seconds (to allow processing).
- Verifies that chunk count increased by 31 and only 30 chunks are returned by default pagination.
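
The difference between the returned page and the reported total is easiest to see with an in-memory model. The sketch below is an illustration under stated assumptions: `InMemoryDocument` is hypothetical, and the `data.doc.chunk_count` field is an assumed location for the total count rather than a layout confirmed by this document.

```python
DEFAULT_PAGE_SIZE = 30  # the default implied by the 30-chunk expectation


class InMemoryDocument:
    """Hypothetical stand-in for a document whose chunks live on the server."""

    def __init__(self) -> None:
        self.chunks: list = []

    def batch_add_chunks(self, num: int) -> None:
        start = len(self.chunks)
        self.chunks.extend(f"chunk text {start + i}" for i in range(num))

    def list_chunks(self) -> dict:
        # Return only the first page, but report the full count separately.
        return {
            "code": 0,
            "data": {
                "chunks": self.chunks[:DEFAULT_PAGE_SIZE],
                "doc": {"chunk_count": len(self.chunks)},  # assumed field name
            },
        }


def default_listing_flow() -> tuple:
    """Mirror the documented steps: count, add 31 chunks, list again."""
    doc = InMemoryDocument()
    before = doc.list_chunks()["data"]["doc"]["chunk_count"]
    doc.batch_add_chunks(31)
    after = doc.list_chunks()
    return before, after


count_before, response_after = default_listing_flow()
```

Adding 31 chunks, one more than the default page size, is what lets a single run confirm both the page limit and the total.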

Method: `test_invalid_dataset_id`

Purpose: Tests API response for invalid or unauthorized dataset IDs.
Parameters:
- dataset_id: Dataset identifier string.
- expected_code: Expected response code.
- expected_message: Expected error message.
Details:
- Empty or invalid dataset IDs return appropriate errors.

Method: `test_invalid_document_id`

Purpose: Tests API response for invalid or unauthorized document IDs.
Parameters:
- document_id: Document identifier string.
- expected_code: Expected response code.
- expected_message: Expected error message.
Details:
- Empty or invalid document IDs return appropriate errors.

Important Implementation Details

Use of pytest.mark.parametrize:
Many tests use parameterization to cover multiple input scenarios efficiently.
Skipping Tests:
Some tests are skipped conditionally because of known issues or unsupported features, indicated with pytest.mark.skip or pytest.mark.skipif.
Concurrent Testing:
test_concurrent_list uses Python's ThreadPoolExecutor to issue overlapping requests and checks that every response is still complete and consistent; it does not measure performance.
Sleep for eventual consistency:
test_default introduces a sleep(3) delay after adding chunks to allow eventual consistency or background processing to complete before verification.
Error Handling Validation:
Tests validate that error codes and messages match expected values for various invalid inputs, ensuring the API's robustness.
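
A fixed sleep(3) always costs three seconds and can still be too short on a slow backend. A common alternative, which is not what the test file does, is to poll until the expected state appears or a timeout expires. The sketch below shows that technique; `wait_until` is a hypothetical helper name.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 3.0, interval: float = 0.1) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False  # gave up; the caller decides how to report it
        time.sleep(interval)
```

In test_default the predicate could compare the reported chunk count with the expected total, so the test continues as soon as the new chunks are visible.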

Interaction with Other Modules

Imports:
- batch_add_chunks and list_chunks come from a common utilities module and represent core API operations tested here.
- RAGFlowHttpApiAuth from libs.auth is used to create authorization tokens.
- INVALID_API_TOKEN from configs is used to test invalid authentication.
Fixtures and Helpers:
- HttpApiAuth, add_chunks, and add_document are pytest fixtures assumed to provide valid authentication and pre-populated datasets/documents/chunks for testing.
API Under Test:
- list_chunks(auth, dataset_id, document_id, params=None) is the main function under test, which returns a dictionary containing status codes, messages, and chunk data.
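
Because the missing-authorization case is reported with code 0, checking the code alone does not prove that a response carries chunk data. The sketch below is a hypothetical helper, `chunks_from`, that treats a response as successful only when the code is 0 and a data field is present; the field names follow the examples in this document.

```python
from typing import Any, Dict, List


def chunks_from(res: Dict[str, Any]) -> List[Any]:
    """Return the chunk list of a successful response, or raise with the server message."""
    if res.get("code") != 0 or "data" not in res:
        raise AssertionError(
            f"list_chunks failed: code={res.get('code')} message={res.get('message')}"
        )
    return res["data"]["chunks"]


ok_response = {"code": 0, "message": "", "data": {"chunks": ["a", "b"]}}
ok_chunks = chunks_from(ok_response)
```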

Usage Examples

Example usage of list_chunks in tests:

def test_page_size(self, HttpApiAuth, add_chunks):
    dataset_id, document_id, _ = add_chunks
    params = {"page_size": 2}
    res = list_chunks(HttpApiAuth, dataset_id, document_id, params=params)
    assert res["code"] == 0
    assert len(res["data"]["chunks"]) == 2

Example of concurrent invocation:

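from concurrent.futures import ThreadPoolExecutor, as_completed

def test_concurrent_list(self, HttpApiAuth, add_chunks):
    dataset_id, document_id, _ = add_chunks
    # Submit 100 calls; the pool runs at most five at a time.
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(list_chunks, HttpApiAuth, dataset_id, document_id) for _ in range(100)]
    # Leaving the with block waits for every call, so all futures are done here.
    responses = list(as_completed(futures))
    assert len(responses) == 100
    assert all(future.result()["code"] == 0 for future in futures)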

Mermaid Diagram: Test Class Structure

classDiagram
    class TestAuthorization {
        +test_invalid_auth(invalid_auth, expected_code, expected_message)
    }
    class TestChunksList {
        +test_page(params, expected_code, expected_page_size, expected_message)
        +test_page_size(params, expected_code, expected_page_size, expected_message)
        +test_keywords(params, expected_page_size)
        +test_id(chunk_id, expected_code, expected_page_size, expected_message)
        +test_invalid_params()
        +test_concurrent_list()
        +test_default()
        +test_invalid_dataset_id(dataset_id, expected_code, expected_message)
        +test_invalid_document_id(document_id, expected_code, expected_message)
    }

Summary

test_list_chunks.py checks that chunk listing in the InfiniFlow platform behaves correctly across authorization, pagination, filtering, concurrency, and error scenarios. It relies on pytest parameterization and fixtures to keep the cases compact and maintainable, and its conditional skips record known problems with specific document engines and invalid inputs.

The file depends on the authentication helpers and the chunk management utilities, so it acts as an integration test layer between the API endpoints and the data services behind them.
