dataset.py

Overview

The dataset.py file implements a set of RESTful API endpoints for managing datasets within the InfiniFlow platform. It provides operations to create, delete, update, and list datasets, along with additional endpoints to manage and retrieve knowledge graph data associated with datasets.

These APIs are secured with token-based authentication and integrate tightly with underlying database services and search infrastructure to maintain dataset information and related documents.

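Key functionalities include:

POST /datasets: create a dataset
DELETE /datasets: delete one or more datasets
PUT /datasets/<dataset_id>: update dataset properties
GET /datasets: list datasets with pagination
GET /datasets/<dataset_id>/knowledge_graph: retrieve knowledge graph and mind map data
DELETE /datasets/<dataset_id>/knowledge_graph: delete knowledge graph data
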
API Endpoint Functions

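All route handlers are decorated with @token_required, which enforces authentication and supplies the tenant_id argument each handler receives.
The handlers delegate database access and domain logic to service classes such as KnowledgebaseService, TenantService, DocumentService, File2DocumentService, and FileService.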
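
To illustrate the general pattern of a token-checking decorator that resolves a tenant and passes it to the handler, here is a minimal, framework-free sketch. It is not the project's actual @token_required implementation: the API_TOKENS table, the headers argument, and the shape of the error dictionaries are all assumptions made for the example.

```python
from functools import wraps

# Hypothetical in-memory token table; a real service would query a token store.
API_TOKENS = {"secret-token": "tenant-1"}


def token_required(handler):
    """Resolve a bearer token to a tenant ID and pass it to the handler."""

    @wraps(handler)
    def wrapper(headers, *args, **kwargs):
        auth = headers.get("Authorization", "")
        scheme, _, token = auth.partition(" ")
        if scheme != "Bearer" or not token:
            # Assumed error shape, for illustration only.
            return {"code": 401, "message": "Authorization header is missing or malformed"}
        tenant_id = API_TOKENS.get(token)
        if tenant_id is None:
            return {"code": 401, "message": "Invalid API token"}
        # Inject the resolved tenant so the handler never parses headers itself.
        return handler(tenant_id, *args, **kwargs)

    return wrapper


@token_required
def list_datasets(tenant_id):
    # Stand-in handler: only echoes the tenant it was given.
    return {"code": 0, "data": [], "tenant_id": tenant_id}
```

Calling list_datasets({"Authorization": "Bearer secret-token"}) reaches the handler with tenant_id set to "tenant-1", while a missing or unknown token returns the 401-style dictionary without running the handler.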


1. create(tenant_id)

Route: POST /datasets
Purpose: Create a new dataset under the specified tenant.

Parameters:

Returns:

Implementation Details:

Example Usage:

POST /datasets
Authorization: Bearer <token>
Content-Type: application/json

{
  "name": "My Dataset",
  "embedding_model": "text-embedding-ada-002",
  "chunk_method": "naive"
}
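
The same request can be built from Python using only the standard library. This is a sketch under assumptions: the base URL http://localhost:8000 is a placeholder, the helper name build_create_request is hypothetical, and the token must be replaced with a real API token.

```python
import json
import urllib.request


def build_create_request(base_url, token, name, embedding_model, chunk_method):
    """Build (but do not send) a POST /datasets request."""
    payload = {
        "name": name,
        "embedding_model": embedding_model,
        "chunk_method": chunk_method,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/datasets",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder base URL and token; both are assumptions for this example.
request = build_create_request(
    "http://localhost:8000",
    "<token>",
    "My Dataset",
    "text-embedding-ada-002",
    "naive",
)
# To send it against a running server: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the headers and body easy to inspect before any network call is made.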

2. delete(tenant_id)

Route: DELETE /datasets
Purpose: Delete one or multiple datasets for the tenant.

Parameters:

Returns:

Implementation Details:


3. update(tenant_id, dataset_id)

Route: PUT /datasets/<dataset_id>
Purpose: Update dataset properties.

Parameters:

Returns:

Implementation Details:


4. list_datasets(tenant_id)

Route: GET /datasets
Purpose: Retrieve a paginated list of datasets accessible by the tenant.

Query Parameters:

Returns:

Implementation Details:
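
As a rough illustration of what paginating a list of datasets involves, the sketch below slices an in-memory list by page number and page size. The parameter names page and page_size, the default of 30, and the Dataset record are assumptions for the example; the real handler most likely paginates in the database query rather than in memory.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical minimal record; the real model has more fields.
    id: str
    name: str


def paginate(items, page=1, page_size=30):
    """Return the slice of items that belongs to a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive integers")
    start = (page - 1) * page_size
    return items[start:start + page_size]


datasets = [Dataset(id=str(i), name=f"ds-{i}") for i in range(5)]
second_page = paginate(datasets, page=2, page_size=2)  # datasets with id "2" and "3"
```

Requesting a page past the end returns an empty list rather than raising an error, which lets a client stop iterating as soon as it receives an empty page.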


5. knowledge_graph(tenant_id, dataset_id)

Route: GET /datasets/<dataset_id>/knowledge_graph
Purpose: Retrieve the knowledge graph and mind map data for a dataset.

Parameters:

Returns:

Implementation Details:


6. delete_knowledge_graph(tenant_id, dataset_id)

Route: DELETE /datasets/<dataset_id>/knowledge_graph
Purpose: Delete the knowledge graph data associated with a dataset.

Parameters:

Returns:

Implementation Details:
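
The dataset-scoped routes (PUT /datasets/<dataset_id>, and GET and DELETE /datasets/<dataset_id>/knowledge_graph) all place the dataset ID in the URL path. The sketch below shows one way a client might build such requests with the standard library. The helper name dataset_request, the base URL, and the example ID abc123 are assumptions, not part of the documented API.

```python
import urllib.request
from urllib.parse import quote


def dataset_request(base_url, token, method, dataset_id, suffix=""):
    """Build (but do not send) a request for a dataset-scoped route."""
    # Percent-encode the ID so characters like "/" cannot alter the path.
    path = f"/datasets/{quote(dataset_id, safe='')}{suffix}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {token}"},
        method=method,
    )


# Placeholder base URL, token, and dataset ID.
get_graph = dataset_request(
    "http://localhost:8000", "<token>", "GET", "abc123", "/knowledge_graph"
)
delete_graph = dataset_request(
    "http://localhost:8000", "<token>", "DELETE", "abc123", "/knowledge_graph"
)
```

The GET and DELETE requests share one URL and differ only in method, which matches how the two knowledge graph routes are defined.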


Important Implementation Details


Interaction with Other System Components


Visual Diagram - Class and Function Structure

Below is a Mermaid class diagram of the route handlers in this file and the services they depend on. Because the file defines route handler functions rather than classes, the diagram groups the handlers under a notional DatasetAPI class and lists the service methods they use.

classDiagram
    class DatasetAPI {
        +create(tenant_id)
        +delete(tenant_id)
        +update(tenant_id, dataset_id)
        +list_datasets(tenant_id)
        +knowledge_graph(tenant_id, dataset_id)
        +delete_knowledge_graph(tenant_id, dataset_id)
    }

    class KnowledgebaseService {
        +get_or_none()
        +save()
        +get_by_id()
        +delete_by_id()
        +update_by_id()
        +accessible()
        +get_list()
    }

    class TenantService {
        +get_by_id()
        +get_joined_tenants_by_user_id()
    }

    class DocumentService {
        +query()
        +remove_document()
    }

    class File2DocumentService {
        +get_by_document_id()
        +delete_by_document_id()
    }

    class FileService {
        +filter_delete()
    }

    DatasetAPI ..> KnowledgebaseService : uses
    DatasetAPI ..> TenantService : uses
    DatasetAPI ..> DocumentService : uses
    DatasetAPI ..> File2DocumentService : uses
    DatasetAPI ..> FileService : uses

Summary

The dataset.py module is the API layer for managing datasets in the InfiniFlow platform. It exposes authenticated, validated, and permission-aware endpoints for the dataset lifecycle (create, update, list, delete) and for retrieving and deleting a dataset's knowledge graph. The handlers keep database operations and search-infrastructure integration behind REST interfaces, so clients never interact with those layers directly.