db_models.py


Overview

The db_models.py file is a core module of the InfiniFlow application. It defines the database schema models, manages the database connection, and provides utility classes and functions for database operations. It uses the Peewee ORM framework to define models for entities such as users, tenants, knowledge bases, documents, and dialogs. In addition to the model definitions, the file provides custom field types (JSON, list, and serialized fields), a pooled database connection with retry logic, advisory database locks, and table initialization and migration utilities.

Overall, this module provides a robust and extensible foundation for managing persistent storage and data integrity in the InfiniFlow system.


Detailed Explanations

Utilities and Constants

singleton(cls, *args, **kw)

A decorator implementing the singleton pattern keyed by class and process ID to ensure only one instance is created per process.

Usage:

@singleton
class MyClass:
    pass

obj1 = MyClass()
obj2 = MyClass()
assert obj1 is obj2  # passes: both calls return the same instance
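The following is a minimal sketch of one way to write a per-process singleton decorator. It is not the module's source. It assumes that constructor arguments are fixed when the decorator is applied, so the decorated name is called with no arguments, as in the usage above. The name per_process_singleton is hypothetical.

```python
import os


def per_process_singleton(cls, *args, **kw):
    # One cache per decorated class. Entries are keyed by (class, PID), so a
    # forked worker process builds its own instance instead of reusing one
    # inherited from its parent.
    instances = {}

    def get_instance():
        key = (cls, os.getpid())
        if key not in instances:
            instances[key] = cls(*args, **kw)
        return instances[key]

    return get_instance


@per_process_singleton
class ConnectionHolder:
    def __init__(self):
        self.opened = 0


first = ConnectionHolder()
second = ConnectionHolder()
first.opened += 1
print(second.opened)  # 1, because both names refer to the same object
```

Because the decorator returns a function, the decorated name no longer refers to a class. As a result, isinstance() checks against it fail.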

CONTINUOUS_FIELD_TYPE

A set of Peewee field types treated as "continuous": integer, float, and datetime fields. Queries use it to decide which fields support range filters.


AUTO_DATE_TIMESTAMP_FIELD_PREFIX

A set of prefix strings for timestamp-related fields that should be handled automatically, e.g., "create", "update", "start".


Enum Classes

TextFieldType(Enum)

Defines database-specific text field types for LongTextField.


Custom Field Classes

LongTextField(TextField)

A subclass of Peewee's TextField for large text values. Its column type is taken from TextFieldType according to the configured database type (MySQL or PostgreSQL).
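Peewee fields declare their SQL column type through the field_type class attribute. The sketch below shows an enum-based lookup of that type by database name. The concrete values (LONGTEXT for MySQL, TEXT for PostgreSQL) are an assumption. They reflect the fact that MySQL's plain TEXT column is limited to 64 KB, while PostgreSQL's TEXT has no declared length limit. The helper long_text_column_type is hypothetical.

```python
from enum import Enum


class TextFieldType(Enum):
    # Assumed values: MySQL needs LONGTEXT (up to 4 GB) for large payloads,
    # while PostgreSQL's TEXT has no declared length limit.
    MYSQL = "LONGTEXT"
    POSTGRES = "TEXT"


def long_text_column_type(database_type: str) -> str:
    """Map a configured database name such as "mysql" to its large-text column type."""
    return TextFieldType[database_type.upper()].value


print(long_text_column_type("mysql"))     # LONGTEXT
print(long_text_column_type("postgres"))  # TEXT
```

In a Peewee field subclass, the result of this lookup would be assigned to field_type, so that table creation emits the right column type for each backend.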


JSONField(LongTextField)

Field to store JSON-serializable Python objects as text in the database.


ListField(JSONField)

A specialization of JSONField whose default value is an empty list ([]) instead of an empty dict.
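The following sketch shows how a JSON text field and its list variant can convert values in both directions. It uses the hook names that Peewee fields use: db_value (Python to database) and python_value (database to Python). JSONCodec and ListCodec are hypothetical stand-ins, not Peewee fields. Two behaviors are assumptions: storing None as the serialized default, and reading empty text back as the default.

```python
import copy
import json


class JSONCodec:
    """Hypothetical stand-in for a JSON text field with Peewee-style hooks."""

    default_value = {}

    def __init__(self, object_hook=None):
        self.object_hook = object_hook

    def _default(self):
        # Hand out a fresh copy so rows never share one mutable default.
        return copy.deepcopy(self.default_value)

    def db_value(self, value):
        # None is stored as the serialized default instead of SQL NULL.
        return json.dumps(self._default() if value is None else value)

    def python_value(self, value):
        # NULL or empty text is read back as the default.
        if not value:
            return self._default()
        return json.loads(value, object_hook=self.object_hook)


class ListCodec(JSONCodec):
    # The only difference from JSONCodec: the default is a list, not a dict.
    default_value = []


print(JSONCodec().db_value({"chunk_size": 128}))  # {"chunk_size": 128}
print(ListCodec().python_value(None))             # []
```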


SerializedField(LongTextField)

Field to store serialized Python objects using either Pickle (base64 encoded) or JSON.
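The following sketch shows the two serialization modes. In Pickle mode, pickle output is bytes, so it is base64-encoded into ASCII text that a text column can hold. In JSON mode, values go through json with an optional object hook. SerializedType and SerializedCodec are hypothetical names for illustration. The sketch also assumes that None passes through unchanged.

```python
import base64
import json
import pickle
from enum import IntEnum


class SerializedType(IntEnum):  # hypothetical name for the two modes
    PICKLE = 1
    JSON = 2


class SerializedCodec:
    def __init__(self, serialized_type=SerializedType.PICKLE, object_hook=None):
        self.serialized_type = serialized_type
        self.object_hook = object_hook

    def db_value(self, value):
        if value is None:
            return None
        if self.serialized_type == SerializedType.PICKLE:
            # Pickle produces bytes; base64 turns them into ASCII text.
            return base64.b64encode(pickle.dumps(value)).decode("ascii")
        return json.dumps(value)

    def python_value(self, value):
        if value is None:
            return None
        if self.serialized_type == SerializedType.PICKLE:
            return pickle.loads(base64.b64decode(value))
        return json.loads(value, object_hook=self.object_hook)


pickled = SerializedCodec()
stored = pickled.db_value({"ids": {1, 2}})  # sets survive pickling but not JSON
restored = pickled.python_value(stored)
```

The choice between the two modes is a trade-off. Pickle round-trips arbitrary Python objects, but unpickling can execute code embedded in the data, so it should only be used for trusted rows. JSON is safer and human-readable, which is the mode JsonSerializedField presets.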


Helper Functions

is_continuous_field(cls: typing.Type) -> bool

Checks if a Peewee field class or its bases are considered continuous (integer, float, datetime).
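The following is a minimal reconstruction of the check described above, based on that description rather than the module's source. It walks the class hierarchy, so a subclass such as BigIntegerField counts as continuous because it derives from IntegerField. To keep the block self-contained, the field classes are hypothetical stand-ins that mirror Peewee's inheritance.

```python
# Hypothetical stand-ins that mirror Peewee's field class hierarchy.
class Field: pass
class IntegerField(Field): pass
class BigIntegerField(IntegerField): pass
class FloatField(Field): pass
class DateTimeField(Field): pass
class CharField(Field): pass


CONTINUOUS_FIELD_TYPE = {IntegerField, FloatField, DateTimeField}


def is_continuous_field(cls: type) -> bool:
    # True if cls itself, or any ancestor below the generic Field base,
    # is one of the continuous field types.
    if cls in CONTINUOUS_FIELD_TYPE:
        return True
    return any(
        base not in (Field, object) and is_continuous_field(base)
        for base in cls.__bases__
    )


print(is_continuous_field(BigIntegerField))  # True
print(is_continuous_field(CharField))        # False
```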


auto_date_timestamp_field() -> set

Returns the set of model field names formed by appending _time to each prefix in AUTO_DATE_TIMESTAMP_FIELD_PREFIX (for example, create_time).


auto_date_timestamp_db_field() -> set

Returns the same names in database column form, prefixed with f_ (for example, f_create_time), following the database field naming convention.


remove_field_name_prefix(field_name: str) -> str

Removes the prefix "f_" from a field name if present, used for converting database field names to human-readable keys.
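The three naming helpers can be re-created from their descriptions as in the sketch below. It is not the module's source. The prefix set is abbreviated to the three examples given earlier, and the real set may contain more entries.

```python
# Abbreviated for illustration; the module's real prefix set may be larger.
AUTO_DATE_TIMESTAMP_FIELD_PREFIX = {"create", "update", "start"}


def auto_date_timestamp_field() -> set:
    # Model attribute names, e.g. "create_time".
    return {f"{prefix}_time" for prefix in AUTO_DATE_TIMESTAMP_FIELD_PREFIX}


def auto_date_timestamp_db_field() -> set:
    # Database column names, e.g. "f_create_time".
    return {f"f_{prefix}_time" for prefix in AUTO_DATE_TIMESTAMP_FIELD_PREFIX}


def remove_field_name_prefix(field_name: str) -> str:
    # "f_create_time" -> "create_time"; names without the prefix pass through.
    return field_name.removeprefix("f_")


print(sorted(auto_date_timestamp_field()))        # ['create_time', 'start_time', 'update_time']
print(remove_field_name_prefix("f_update_time"))  # update_time
```

Stripping the prefix from every database name yields exactly the set of model names, which is the relationship the two set-building helpers are meant to keep.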


Core Model Classes

BaseModel(Model)

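Base class for all database models. It provides:

- Timestamp fields: create_time and update_time (BigIntegerField) and create_date and update_date (DateTimeField).
- Serialization helpers: to_dict() and to_human_model_dict(only_primary_with).
- get_primary_keys_name(), which returns the names of the model's primary key fields.
- query(**kwargs), a filtering helper that supports range filters on continuous fields and optional ordering (reverse, order_by).
- Overrides of insert() and _normalize_data() that fill in the timestamp fields automatically.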
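BaseModel carries create_time/update_time and create_date/update_date columns, and its timestamp fields are handled automatically. The sketch below shows one way such normalization could work on a plain dict before it is written. Its behavior rests on several assumptions: *_time values are epoch milliseconds; create_time is set only on insert; update_time is refreshed on every write; and each *_date column is derived from its *_time value in UTC (the real module may use local time). The function normalize_row is hypothetical.

```python
import time
from datetime import datetime, timezone
from typing import Optional

AUTO_DATE_TIMESTAMP_FIELD_PREFIX = {"create", "update", "start"}  # abbreviated


def normalize_row(row: dict, is_insert: bool, now_ms: Optional[int] = None) -> dict:
    """Hypothetical helper: stamp create/update times and derive *_date columns."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    out = dict(row)
    if is_insert:
        out["create_time"] = now_ms
    out["update_time"] = now_ms  # refreshed on every insert and update
    for prefix in AUTO_DATE_TIMESTAMP_FIELD_PREFIX:
        ts = out.get(f"{prefix}_time")
        if ts is not None:
            # Keep the readable *_date column in sync with the *_time value.
            out[f"{prefix}_date"] = datetime.fromtimestamp(ts / 1000, tz=timezone.utc)
    return out


row = normalize_row({"nickname": "ada"}, is_insert=True, now_ms=1672531200000)
print(row["create_date"])  # 2023-01-01 00:00:00+00:00
```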


JsonSerializedField(SerializedField)

A convenience subclass of SerializedField preset to use JSON serialization with a custom object hook (utils.from_dict_hook).


Database Connection & Pooling

RetryingPooledMySQLDatabase(PooledMySQLDatabase)

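A subclass of Peewee's PooledMySQLDatabase (from playhouse.pool) that retries a query when the connection to the server is lost. A lost connection is recognized by MySQL client error codes 2013 ("Lost connection to MySQL server during query") and 2006 ("MySQL server has gone away").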
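The following minimal sketch shows the retry-on-lost-connection idea independently of Peewee. OperationalError here is a stand-in for the driver's exception. PyMySQL, for example, stores the numeric MySQL error code in args[0]. The callables run_query and reconnect are hypothetical. The backoff schedule and the max_retries default are also assumptions.

```python
import time


class OperationalError(Exception):
    """Hypothetical stand-in for the driver's OperationalError."""


# 2006: server has gone away; 2013: lost connection during query.
RETRYABLE_MYSQL_CODES = {2006, 2013}


def is_connection_lost(exc: Exception) -> bool:
    return bool(exc.args) and exc.args[0] in RETRYABLE_MYSQL_CODES


def execute_with_reconnect(run_query, reconnect, max_retries=5, retry_delay=1.0):
    """Run run_query(); on a lost-connection error, reconnect and try again."""
    for attempt in range(max_retries + 1):
        try:
            return run_query()
        except OperationalError as exc:
            # Other errors (e.g. SQL syntax errors) and the final attempt re-raise.
            if not is_connection_lost(exc) or attempt == max_retries:
                raise
            reconnect()
            time.sleep(retry_delay * (2 ** attempt))
```

Only the two connection-loss codes trigger a retry. Repeating a query that failed for any other reason (a syntax error or a constraint violation, for example) would only fail again.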


PooledDatabase(Enum)

Enum mapping database types to their pooled database classes:

- MYSQL: RetryingPooledMySQLDatabase
- POSTGRES: PooledPostgresqlDatabase (playhouse.pool)


DatabaseMigrator(Enum)

Enum mapping database types to their migrator classes:

- MYSQL: MySQLMigrator (playhouse.migrate)
- POSTGRES: PostgresqlMigrator (playhouse.migrate)
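PooledDatabase, DatabaseMigrator, and DatabaseLock all map database types to classes. The sketch below shows that dispatch: an Enum whose member values are classes, looked up by the upper-cased database type name. The stub classes and open_database are hypothetical stand-ins for the real Peewee classes and configuration code.

```python
from enum import Enum


class PoolStub:
    """Hypothetical stand-in for a pooled database class."""

    def __init__(self, database: str, **options):
        self.database = database
        self.options = options


class MySQLPoolStub(PoolStub):
    pass


class PostgresPoolStub(PoolStub):
    pass


class PooledBackend(Enum):
    # Member values are classes; .value returns the class itself.
    MYSQL = MySQLPoolStub
    POSTGRES = PostgresPoolStub


def open_database(database_type: str, database: str, **options):
    # "postgres" -> PooledBackend["POSTGRES"] -> PostgresPoolStub(database, **options)
    backend_cls = PooledBackend[database_type.upper()].value
    return backend_cls(database, **options)


db = open_database("postgres", "example_db", max_connections=32)
print(type(db).__name__, db.options)  # PostgresPoolStub {'max_connections': 32}
```

An unsupported database type fails early with a KeyError at the enum lookup, rather than later at connection time.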


BaseDataBase

A singleton class managing the database connection instance.

Usage:

db_instance = BaseDataBase()
db = db_instance.database_connection

Retry Decorator

with_retry(max_retries=3, retry_delay=1.0)

Decorator to add retry logic with exponential backoff to decorated functions, typically database operations.
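The following sketch shows one way to write such a decorator, under stated assumptions. It is not the module's with_retry. Here max_retries counts total attempts, and the wait doubles after each failure (retry_delay, 2 * retry_delay, and so on). The name retry_with_backoff is hypothetical.

```python
import functools
import logging
import time


def retry_with_backoff(max_retries=3, retry_delay=1.0):
    """Retry the wrapped call; max_retries counts total attempts (an assumption)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    logging.warning("%s failed (attempt %d/%d): %s",
                                    func.__name__, attempt + 1, max_retries, exc)
                    if attempt == max_retries - 1:
                        raise  # out of attempts: surface the last error
                    # Exponential backoff: retry_delay, 2x, 4x, ...
                    time.sleep(retry_delay * (2 ** attempt))
        return wrapper
    return decorator


attempts = []


@retry_with_backoff(max_retries=3, retry_delay=0.01)
def flaky_update():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "saved"


print(flaky_update(), len(attempts))  # saved 3
```

Catching every Exception keeps the sketch short. A production version would usually retry only transient errors, such as lost connections or lock timeouts.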


Database Lock Classes

These classes provide named advisory locks so that concurrent processes can coordinate through the database.

PostgresDatabaseLock

Advisory lock implementation for PostgreSQL.


MysqlDatabaseLock

Advisory lock implementation for MySQL.

DatabaseLock(Enum)

Maps database types to their respective lock implementations:

- MYSQL: MysqlDatabaseLock
- POSTGRES: PostgresDatabaseLock
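PostgreSQL advisory locks (pg_advisory_lock, pg_try_advisory_lock) are keyed by integers, whereas MySQL named locks (GET_LOCK, RELEASE_LOCK) take a string. A PostgreSQL implementation therefore has to hash the lock name to a stable integer. Whether the module's classes use exactly these database functions is an assumption. The sketch below shows two things: the name-to-key hashing, and a lock interface with lock/unlock, context-manager, and decorator support. InProcessNamedLock is hypothetical and is backed by threading.Lock instead of the database, so it only coordinates threads within one process.

```python
import hashlib
import threading
from functools import wraps


def advisory_lock_key(name: str) -> int:
    # Stable integer key for a lock name: MD5 reduced modulo 2**31 - 1
    # (an arbitrary choice for this sketch).
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16) % (2**31 - 1)


class InProcessNamedLock:
    """Hypothetical stand-in: same interface, backed by threading.Lock."""

    _registry = {}
    _registry_guard = threading.Lock()

    def __init__(self, name: str, timeout: float = 10):
        self.name = name
        self.timeout = timeout
        key = advisory_lock_key(name)
        with self._registry_guard:
            # Every instance created with the same name shares one lock.
            self._lock = self._registry.setdefault(key, threading.Lock())

    def lock(self) -> bool:
        if not self._lock.acquire(timeout=self.timeout):
            raise TimeoutError(f"could not acquire lock {self.name!r} within {self.timeout}s")
        return True

    def unlock(self) -> bool:
        self._lock.release()
        return True

    def __enter__(self):
        self.lock()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.unlock()
        return False  # never swallow exceptions raised in the critical section

    def __call__(self, func):
        # Allows @InProcessNamedLock("name") as a function decorator.
        @wraps(func)
        def wrapper(*args, **kwargs):
            with self:
                return func(*args, **kwargs)
        return wrapper


with InProcessNamedLock("init_database_tables", timeout=5):
    pass  # critical section
```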


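Global Database Variables

DB: the shared database connection, obtained from BaseDataBase().database_connection. All data models use it.

DB.lock: the DatabaseLock implementation for the configured database type, so DB.lock(name) creates a named lock.
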

Connection Management

close_connection()

Closes stale database connections older than 30 seconds to maintain pool health.


Data Model Classes

All models inherit from DataBaseModel, which uses the shared DB connection.

User

Represents a system user with authentication fields, preferences, and status flags.


Tenant

Represents an organizational tenant with default model IDs and credit balance.


UserTenant

Associates users with tenants and roles.


InvitationCode

Stores invitation codes linked to users and tenants.


LLMFactories

Represents providers of Large Language Models (LLM).


LLM

Represents specific LLMs with factory ID, type, tags, and support flags.


TenantLLM

Tenant-specific LLM configuration including API keys and usage stats.


TenantLangfuse

Stores Langfuse API keys and host info per tenant.


Knowledgebase

Represents a knowledge base with metadata, configuration, and access permissions.


Document

Represents documents associated with knowledge bases, including processing status and metadata.


File

Represents files and folders with hierarchical parent-child relationships.


File2Document

Associates files to documents.


Task

Represents processing tasks for documents with progress, priority, and retry details.


Dialog

Represents dialog applications with language, LLM settings, prompts, filters, and status.


Conversation

Stores conversation messages and references related to dialogs.


APIToken

Stores API tokens linked to tenants and optionally dialogs.


API4Conversation

Tracks API calls for conversations with usage stats and error logging.


UserCanvas

Represents user-created canvases with permissions and DSL configuration.


CanvasTemplate

Templates for canvases with title, description, and configuration.


UserCanvasVersion

Versioned snapshots of user canvases.


MCPServer

Stores configuration for MCP (Model Context Protocol) servers.


Search

Represents saved search configurations with filters and parameters.


Database Initialization and Migration

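init_database_tables(alter_fields=[])

Creates the database tables for the data models defined in this module if they do not already exist.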

fill_db_model_object(model_object, human_model_dict)

Populates a model instance's attributes from a dictionary with keys matching model fields.
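Peewee exposes declared fields as class attributes, so checking the model class with hasattr() is a cheap way to keep only keys that match a field. The sketch below illustrates this. Silently skipping unknown keys is an assumption, and DocumentStub and fill_model_object are hypothetical names.

```python
class DocumentStub:
    """Hypothetical model stand-in: fields are declared as class attributes."""

    id = None
    name = None
    progress = 0.0


def fill_model_object(model_object, human_model_dict: dict):
    # Copy values only for keys the model class declares; other keys are skipped.
    for key, value in human_model_dict.items():
        if hasattr(model_object.__class__, key):
            setattr(model_object, key, value)
    return model_object


doc = fill_model_object(DocumentStub(), {"name": "report.pdf", "progress": 0.5, "unknown": 1})
print(doc.name, doc.progress, hasattr(doc, "unknown"))  # report.pdf 0.5 False
```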


migrate_db()

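Applies incremental schema changes, such as adding new columns to existing tables, with playhouse.migrate.migrate. Each step is wrapped in its own exception handler. A step that fails, for example because its column already exists, is therefore skipped, and the remaining steps still run.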
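The following sketch shows the "run every step, tolerate failures" pattern that makes such migrations safe to re-run on every startup. To stay self-contained, it uses the standard library's sqlite3 module in place of playhouse.migrate and a real MySQL or PostgreSQL server. The table and column names are illustrative only.

```python
import sqlite3


def apply_migrations(conn: sqlite3.Connection, statements: list) -> list:
    """Run each schema change on its own; failures are collected, not fatal."""
    skipped = []
    for stmt in statements:
        try:
            conn.execute(stmt)
        except sqlite3.OperationalError:
            # e.g. "duplicate column name" when an earlier run already added it
            skipped.append(stmt)
    conn.commit()
    return skipped


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (id TEXT PRIMARY KEY)")
steps = [
    "ALTER TABLE document ADD COLUMN source_type TEXT DEFAULT ''",
    "ALTER TABLE document ADD COLUMN progress REAL DEFAULT 0",
]
first_run = apply_migrations(conn, steps)   # both columns are added
second_run = apply_migrations(conn, steps)  # both steps fail and are skipped
columns = [row[1] for row in conn.execute("PRAGMA table_info(document)")]
print(first_run, len(second_run), columns)  # [] 2 ['id', 'source_type', 'progress']
```

The trade-off is that a broad exception handler also hides genuine migration errors. Logging each skipped step makes those easier to spot.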


Interaction with Other System Components


Important Implementation Details


Usage Examples

Querying with Filters and Ordering

# Get all users created between two timestamps, newest first
users = User.query(
    create_time=["2023-01-01 00:00:00", "2023-02-01 00:00:00"],
    reverse=True,
    order_by="create_time",
)

Using Database Lock as Context Manager

lock = DB.lock("my_lock_name")

with lock:
    # Critical section: only the current holder of "my_lock_name" runs this
    do_database_updates()  # placeholder for your own update logic

Decorating a Method with Retry

@with_retry(max_retries=5, retry_delay=2)
def update_user_email(user_id, new_email):
    user = User.get(User.id == user_id)
    user.email = new_email
    user.save()

Mermaid Class Diagram

classDiagram
    class BaseModel {
        +BigIntegerField create_time
        +DateTimeField create_date
        +BigIntegerField update_time
        +DateTimeField update_date
        +to_dict()
        +to_human_model_dict(only_primary_with: list)
        +get_primary_keys_name()
        +query(**kwargs)
        +insert(__data=None, **insert)
        +_normalize_data(data, kwargs)
    }
    class DataBaseModel {
        <<BaseModel>>
    }
    class User {
        +CharField id
        +CharField access_token
        +CharField nickname
        +CharField password
        +CharField email
        +CharField language
        +BooleanField is_superuser
        +get_id()
    }
    class Tenant {
        +CharField id
        +CharField name
        +CharField public_key
        +IntegerField credit
    }
    class Document {
        +CharField id
        +CharField kb_id
        +CharField parser_id
        +JSONField parser_config
        +CharField source_type
        +CharField name
        +IntegerField size
        +FloatField progress
    }
    class Task {
        +CharField id
        +CharField doc_id
        +IntegerField from_page
        +IntegerField to_page
        +CharField task_type
        +IntegerField priority
        +FloatField progress
    }
    BaseModel <|-- DataBaseModel
    DataBaseModel <|-- User
    DataBaseModel <|-- Tenant
    DataBaseModel <|-- Document
    DataBaseModel <|-- Task

Summary

db_models.py is a comprehensive database model and management module for the InfiniFlow platform. It uses the Peewee ORM to define the data schema, and it provides database connection handling, serialization, locking, and migration utilities. It hides backend-specific details (column types, connection pools, migrators, and locks) behind common interfaces, so the same code runs on either MySQL or PostgreSQL.