knowledgebase_service.py
Overview
knowledgebase_service.py defines the KnowledgebaseService class, a specialized service layer for managing knowledge base entities within the InfiniFlow platform. Extending a generic CommonService, this class encapsulates business logic related to knowledge bases, including access control, document parsing status, tenant-based organization, parser configuration management, and CRUD operations.
This service interacts heavily with the database models (Knowledgebase, Document, Tenant, User, UserTenant) via the Peewee ORM and supports multi-tenant environments with permission enforcement.
Key functionalities provided include:
Verifying user permissions for accessing or deleting knowledge bases.
Checking document parsing status to ensure readiness for downstream processes.
Managing knowledge base metadata and parser configurations.
Enumerating knowledge bases by tenants and supporting pagination and filtering.
Handling document associations and counts atomically.
Classes and Methods
Class: KnowledgebaseService
Extends CommonService. The primary service class to manage knowledge base operations.
Attributes:
model: The Peewee Knowledgebase model class used for database queries.
Methods:
accessible4deletion(kb_id: str, user_id: str) -> bool
Checks if the specified user has permission to delete a knowledge base. Only the creator of the knowledge base has deletion rights.
Parameters:
kb_id: Knowledge base unique identifier.
user_id: Unique identifier of the user attempting deletion.
Returns:
True if the user is the creator; otherwise False.
Usage:
if KnowledgebaseService.accessible4deletion("kb123", "user456"):
    # proceed with deletion
    pass
Note: Returns False if the knowledge base does not exist or the user is not its creator.
is_parsed_done(kb_id: str) -> (bool, str|None)
Determines whether all documents in the knowledge base have finished parsing successfully.
Parameters:
kb_id: Knowledge base identifier.
Returns:
Tuple (True, None) if all documents are parsed.
Tuple (False, error_message) if parsing is incomplete or has failed.
Details:
Checks the parsing run status of each document. Chat initiation is disallowed if any document's parsing is running, canceled, or failed, or if a document has not started parsing and has no chunks.
Example:
done, msg = KnowledgebaseService.is_parsed_done("kb123")
if not done:
    print(msg)
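The status rules above can be sketched in plain Python. This is a hypothetical, self-contained stand-in rather than the service's code: the real method gathers documents through DocumentService.get_by_kb_id, whereas here `docs` is a list of dicts with assumed `name`, `run`, and `chunk_num` keys, `check_parsed_done` is an illustrative name, and the TaskStatus values are placeholders, not the platform's actual codes.

```python
from enum import Enum
from typing import Optional


class TaskStatus(Enum):
    # Hypothetical stand-in for the platform's TaskStatus enum; values are placeholders.
    UNSTART = "0"
    RUNNING = "1"
    CANCEL = "2"
    DONE = "3"
    FAIL = "4"


# Statuses that block chat creation regardless of chunk count.
BLOCKING = {TaskStatus.RUNNING.value, TaskStatus.CANCEL.value, TaskStatus.FAIL.value}


def check_parsed_done(kb_name: str, docs: list[dict]) -> tuple[bool, Optional[str]]:
    """Return (True, None) if every document is ready, else (False, reason)."""
    for doc in docs:
        run = doc["run"]
        if run in BLOCKING:
            return False, f"Document '{doc['name']}' in '{kb_name}' is not ready (run={run})."
        # An unstarted document blocks only when it has no chunks yet.
        if run == TaskStatus.UNSTART.value and doc["chunk_num"] == 0:
            return False, f"Document '{doc['name']}' in '{kb_name}' has not been parsed yet."
    return True, None


docs = [
    {"name": "a.pdf", "run": TaskStatus.DONE.value, "chunk_num": 12},
    {"name": "b.pdf", "run": TaskStatus.UNSTART.value, "chunk_num": 0},
]
done, msg = check_parsed_done("kb123", docs)
if not done:
    print(msg)
```

The function stops at the first blocking document, so the message names only one offender even when several documents are not ready.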
list_documents_by_ids(kb_ids: List[str]) -> List[str]
Fetches document IDs associated with the specified knowledge base IDs.
Parameters:
kb_ids: List of knowledge base IDs.
Returns: List of document IDs linked to those knowledge bases.
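The lookup amounts to a join from knowledge bases to their documents, filtered by an IN list of IDs. A minimal sketch using Python's built-in sqlite3 module instead of Peewee; the `knowledgebase` and `document` table layouts are assumptions for illustration, and the ORDER BY exists only to make the sketch's output deterministic.

```python
import sqlite3


def list_documents_by_ids(conn: sqlite3.Connection, kb_ids: list[str]) -> list[str]:
    """Return the IDs of documents that belong to any of kb_ids."""
    if not kb_ids:
        return []  # nothing to look up; also avoids an empty IN () list
    placeholders = ", ".join("?" for _ in kb_ids)  # one bound parameter per ID
    rows = conn.execute(
        "SELECT d.id FROM knowledgebase AS k "
        "JOIN document AS d ON d.kb_id = k.id "
        f"WHERE k.id IN ({placeholders}) ORDER BY d.id",
        kb_ids,
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE knowledgebase (id TEXT PRIMARY KEY);
    CREATE TABLE document (id TEXT PRIMARY KEY, kb_id TEXT);
    INSERT INTO knowledgebase VALUES ('kb1'), ('kb2');
    INSERT INTO document VALUES ('d1', 'kb1'), ('d2', 'kb1'), ('d3', 'kb2');
""")
print(list_documents_by_ids(conn, ["kb1"]))  # ['d1', 'd2']
```

Binding each ID as a `?` parameter, rather than formatting the IDs into the SQL string, keeps the query safe from injection.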
get_by_tenant_ids(joined_tenant_ids: List[str], user_id: str, page_number: int, items_per_page: int, orderby: str, desc: bool, keywords: str, parser_id: Optional[str] = None) -> (List[dict], int)
Retrieves paginated knowledge bases owned by or shared with the user via tenants.
Parameters:
joined_tenant_ids: List of tenant IDs the user has joined.
user_id: Current user ID.
page_number: Pagination page index.
items_per_page: Number of items per page.
orderby: Field to sort by.
desc: Whether to sort in descending order.
keywords: Search keywords used to filter by name.
parser_id: Optional filter for parser type.
Returns: Tuple of list of knowledge base dicts and total count.
Details:
Filters by tenant membership and permission (TEAM permission or ownership by the user).
Supports keyword search and parser filtering.
Joins with User to retrieve tenant info.
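To make the visibility, search, and paging rules concrete, here is an in-memory sketch. It is not the Peewee query: the extra `kbs` argument (a list of dicts), the `"team"` permission string, and the dict keys are hypothetical; ownership is modeled as `tenant_id == user_id`, on the assumption that a user acts as their own tenant; and the case-insensitive substring match is this sketch's reading of "filter names". The total is counted before the page is sliced, so it reflects every match.

```python
from typing import Optional


def get_by_tenant_ids(
    kbs: list[dict],
    joined_tenant_ids: list[str],
    user_id: str,
    page_number: int,
    items_per_page: int,
    orderby: str,
    desc: bool,
    keywords: str,
    parser_id: Optional[str] = None,
) -> tuple[list[dict], int]:
    joined = set(joined_tenant_ids)
    # Visible: shared with the team by a joined tenant, or owned by the user.
    visible = [
        kb for kb in kbs
        if (kb["tenant_id"] in joined and kb["permission"] == "team")
        or kb["tenant_id"] == user_id
    ]
    if keywords:
        visible = [kb for kb in visible if keywords.lower() in kb["name"].lower()]
    if parser_id:
        visible = [kb for kb in visible if kb["parser_id"] == parser_id]
    visible.sort(key=lambda kb: kb[orderby], reverse=desc)
    total = len(visible)  # counted before slicing, so it reflects all matches
    if page_number and items_per_page:
        start = (page_number - 1) * items_per_page
        visible = visible[start:start + items_per_page]
    return visible, total


kbs = [
    {"name": "Alpha", "tenant_id": "u1", "permission": "me", "parser_id": "naive", "create_time": 1},
    {"name": "beta", "tenant_id": "t9", "permission": "team", "parser_id": "qa", "create_time": 2},
    {"name": "Gamma", "tenant_id": "t9", "permission": "me", "parser_id": "naive", "create_time": 3},
]
page, total = get_by_tenant_ids(kbs, ["t9"], "u1", 1, 10, "create_time", True, "")
print([kb["name"] for kb in page], total)  # ['beta', 'Alpha'] 2
```

"Gamma" is excluded because its tenant was joined but it is not shared with the team, and the user does not own it.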
get_kb_ids(tenant_id: str) -> List[str]
Returns all knowledge base IDs belonging to a given tenant.
Parameters:
tenant_id: Tenant unique identifier.
Returns: List of knowledge base IDs.
get_detail(kb_id: str) -> Optional[dict]
Fetches detailed information about a knowledge base, including metadata and configuration.
Parameters:
kb_id: Knowledge base ID.
Returns: Dictionary of knowledge base fields, or None if not found.
Details: Joins with Tenant to ensure the tenant's status is valid.
update_parser_config(id: str, config: dict) -> None
Merges a new parser configuration into the existing one for a knowledge base.
Parameters:
id: Knowledge base ID.
config: Partial or full configuration dictionary to merge.
Raises:
LookupError if the knowledge base is not found.
Implementation Detail: Uses a depth-first traversal to deeply merge nested dicts and lists without overwriting entire structures.
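The depth-first merge can be sketched as a small recursive function. This illustrates the technique rather than reproducing the service's code: `deep_merge` is a hypothetical name, the config keys in the example are made up, and where old and new values have different types the sketch simply lets the new value win, which the real method may handle differently. The list union here keeps first-seen order and requires hashable items.

```python
def deep_merge(old: dict, new: dict) -> dict:
    """Depth-first merge of `new` into `old` (in place); returns `old`."""
    for key, value in new.items():
        if key not in old:
            old[key] = value  # brand-new key: just add it
        elif isinstance(value, dict) and isinstance(old[key], dict):
            deep_merge(old[key], value)  # recurse instead of replacing the sub-dict
        elif isinstance(value, list) and isinstance(old[key], list):
            # Union of both lists, keeping first-seen order (items must be hashable).
            old[key] = list(dict.fromkeys(old[key] + value))
        else:
            old[key] = value  # scalars, or mismatched types: the new value wins
    return old


cfg = {"chunk_size": 128, "options": {"ocr": False, "lang": "en"}, "tags": ["a"]}
deep_merge(cfg, {"options": {"ocr": True}, "tags": ["b", "a"]})
print(cfg)
# {'chunk_size': 128, 'options': {'ocr': True, 'lang': 'en'}, 'tags': ['a', 'b']}
```

Note that `"lang"` survives even though the update's `"options"` dict omits it; a plain `dict.update` would have replaced the whole sub-dict.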
delete_field_map(id: str) -> None
Removes the "field_map" key from the knowledge base's parser configuration.
Parameters:
id: Knowledge base ID.
Raises:
LookupError if the knowledge base is not found.
get_field_map(ids: List[str]) -> dict
Aggregates and returns field mappings across multiple knowledge bases.
Parameters:
ids: List of knowledge base IDs.
Returns: Combined dictionary of all "field_map" entries found.
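Both field-map helpers reduce to simple dict operations once the parser configurations are loaded. In this hypothetical sketch the functions take parser_config dicts directly instead of knowledge base IDs (so the LookupError path is omitted), and aggregation uses `dict.update`, meaning a later knowledge base's mapping wins when two define the same key; whether the service resolves such clashes the same way is an assumption.

```python
def get_field_map(parser_configs: list[dict]) -> dict:
    """Combine the "field_map" entries of several parser configurations."""
    combined: dict = {}
    for conf in parser_configs:
        if conf and "field_map" in conf:
            combined.update(conf["field_map"])  # a later config wins on a key clash
    return combined


def delete_field_map(parser_config: dict) -> None:
    """Remove "field_map" in place; a config without one is left unchanged."""
    parser_config.pop("field_map", None)


configs = [{"field_map": {"f1": "title"}}, {}, {"field_map": {"f2": "price"}}]
print(get_field_map(configs))  # {'f1': 'title', 'f2': 'price'}
delete_field_map(configs[0])
print(configs[0])  # {}
```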
get_by_name(kb_name: str, tenant_id: str) -> (bool, Optional[Knowledgebase])
Retrieves a knowledge base by name within a tenant's scope.
Parameters:
kb_name: Name of the knowledge base.
tenant_id: Tenant ID.
Returns: Tuple (exists, knowledgebase_instance).
get_all_ids() -> List[str]
Returns all knowledge base IDs in the system.
get_list(joined_tenant_ids: List[str], user_id: str, page_number: int, items_per_page: int, orderby: str, desc: bool, id: Optional[str], name: Optional[str]) -> List[dict]
Fetches knowledge bases filtered by multiple criteria with pagination.
Parameters:
joined_tenant_ids: Tenant IDs the user belongs to.
user_id: Current user ID.
page_number: Pagination page.
items_per_page: Items per page.
orderby: Field to order by.
desc: Descending order flag.
id: Optional filter by knowledge base ID.
name: Optional filter by knowledge base name.
Returns: List of knowledge base dictionaries.
accessible(kb_id: str, user_id: str) -> bool
Checks if a knowledge base is accessible by a user, based on tenant membership.
Parameters:
kb_id: Knowledge base ID.
user_id: User ID.
Returns:
True if accessible, else False.
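The two permission checks differ in what they match on: `accessible` asks whether the user is a member of the tenant that owns the knowledge base, while `accessible4deletion` asks whether the user created it. A sqlite3 sketch of both, with assumed `knowledgebase(id, tenant_id, created_by)` and `user_tenant(user_id, tenant_id)` tables standing in for the Peewee Knowledgebase and UserTenant models; the column names are illustrative.

```python
import sqlite3

SCHEMA = """
CREATE TABLE knowledgebase (id TEXT PRIMARY KEY, tenant_id TEXT, created_by TEXT);
CREATE TABLE user_tenant (user_id TEXT, tenant_id TEXT);
"""


def accessible(conn: sqlite3.Connection, kb_id: str, user_id: str) -> bool:
    """True if user_id is a member of the tenant that owns kb_id."""
    row = conn.execute(
        "SELECT 1 FROM knowledgebase AS k "
        "JOIN user_tenant AS ut ON ut.tenant_id = k.tenant_id "
        "WHERE k.id = ? AND ut.user_id = ? LIMIT 1",
        (kb_id, user_id),
    ).fetchone()
    return row is not None


def accessible4deletion(conn: sqlite3.Connection, kb_id: str, user_id: str) -> bool:
    """True only if user_id created kb_id; a missing knowledge base also yields False."""
    row = conn.execute(
        "SELECT 1 FROM knowledgebase WHERE id = ? AND created_by = ? LIMIT 1",
        (kb_id, user_id),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO knowledgebase VALUES ('kb1', 't1', 'alice')")
conn.executemany("INSERT INTO user_tenant VALUES (?, ?)", [("alice", "t1"), ("bob", "t1")])
print(accessible(conn, "kb1", "bob"), accessible4deletion(conn, "kb1", "bob"))  # True False
```

Bob can read the knowledge base through tenant membership but cannot delete it, because only its creator holds deletion rights.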
get_kb_by_id(kb_id: str, user_id: str) -> List[dict]
Returns knowledge base info by ID if accessible by user.
Parameters:
kb_id: Knowledge base ID.
user_id: User ID.
Returns: List with one knowledge base dict or empty list.
get_kb_by_name(kb_name: str, user_id: str) -> List[dict]
Returns knowledge base info by name if accessible by user.
Parameters:
kb_name: Knowledge base name.
user_id: User ID.
Returns: List with one knowledge base dict or empty list.
atomic_increase_doc_num_by_id(kb_id: str) -> int
Atomically increments the document count of a knowledge base by 1.
Parameters:
kb_id: Knowledge base ID.
Returns: Number of rows updated (should be 1 on success).
Details: Also updates the update_time and update_date fields.
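The increment is atomic because the arithmetic happens inside a single UPDATE statement rather than as a read-then-write in Python, so two concurrent calls cannot both read the same old value. A sqlite3 sketch under assumed column names and timestamp formats; in Peewee the same idea is written as a field expression such as `Model.doc_num + 1` inside `update()`.

```python
import sqlite3
import time


def atomic_increase_doc_num_by_id(conn: sqlite3.Connection, kb_id: str) -> int:
    """Add 1 to doc_num in a single UPDATE and return the number of rows changed."""
    now_ms = int(time.time() * 1000)  # assumed: epoch milliseconds
    now_str = time.strftime("%Y-%m-%d %H:%M:%S")  # assumed date format
    cur = conn.execute(
        # The increment is evaluated by the database, not read into Python first.
        "UPDATE knowledgebase SET doc_num = doc_num + 1, update_time = ?, update_date = ? "
        "WHERE id = ?",
        (now_ms, now_str, kb_id),
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE knowledgebase (id TEXT PRIMARY KEY, doc_num INTEGER, "
    "update_time INTEGER, update_date TEXT)"
)
conn.execute("INSERT INTO knowledgebase VALUES ('kb1', 0, 0, '')")
print(atomic_increase_doc_num_by_id(conn, "kb1"))  # 1
print(atomic_increase_doc_num_by_id(conn, "missing"))  # 0
```

A return value of 0 signals that no row matched the ID, which matches the "should be 1 on success" contract above.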
update_document_number_in_init(kb_id: str, doc_num: int) -> None
Sets the document number for a knowledge base during system initialization.
Parameters:
kb_id: Knowledge base ID.
doc_num: New document count to set.
Note: Use only during system initialization. The method saves only the fields Peewee reports as dirty rather than rewriting the whole row.
Implementation Details and Algorithms
Database Context Management: All methods accessing the database are decorated with @DB.connection_context(), ensuring proper connection lifecycle management.
Permission Checks: Access control is enforced by joining knowledge base records with tenant and user membership tables.
Deep Configuration Merge: The update_parser_config method performs a recursive deep merge of nested dictionaries and a union of list values, so existing config entries are not overwritten inadvertently.
Pagination and Sorting: Methods that fetch lists support pagination, customizable ordering, and filtering through Peewee's query API.
Atomic Updates: The atomic_increase_doc_num_by_id method increments the counter inside the UPDATE statement itself, together with the timestamp fields, so concurrent increments are not lost.
Status Checks: Parsing status checks rely on the TaskStatus enum and the related document service to reflect real-time processing states.
Interactions with Other System Components
Database Models: Relies on the Knowledgebase, Document, Tenant, User, and UserTenant models for data retrieval and manipulation.
CommonService Base: Inherits general CRUD operations and utilities from CommonService.
DocumentService: Calls DocumentService.get_by_kb_id to gather documents associated with knowledge bases.
Enums and Utilities: Uses StatusEnum, TenantPermission, and TaskStatus for status codes and permissions, and utility functions such as current_timestamp and datetime_format for date handling.
Multi-tenant Access Control: Coordinates with tenant and user membership models to enforce scoped access.
Visual Diagram
classDiagram
class KnowledgebaseService {
+model: Knowledgebase
+accessible4deletion(kb_id: str, user_id: str) bool
+is_parsed_done(kb_id: str) (bool, str|None)
+list_documents_by_ids(kb_ids: List[str]) List[str]
+get_by_tenant_ids(joined_tenant_ids: List[str], user_id: str, page_number: int, items_per_page: int, orderby: str, desc: bool, keywords: str, parser_id: Optional[str]) (List[dict], int)
+get_kb_ids(tenant_id: str) List[str]
+get_detail(kb_id: str) dict
+update_parser_config(id: str, config: dict) None
+delete_field_map(id: str) None
+get_field_map(ids: List[str]) dict
+get_by_name(kb_name: str, tenant_id: str) (bool, Knowledgebase)
+get_all_ids() List[str]
+get_list(joined_tenant_ids: List[str], user_id: str, page_number: int, items_per_page: int, orderby: str, desc: bool, id: Optional[str], name: Optional[str]) List[dict]
+accessible(kb_id: str, user_id: str) bool
+get_kb_by_id(kb_id: str, user_id: str) List[dict]
+get_kb_by_name(kb_name: str, user_id: str) List[dict]
+atomic_increase_doc_num_by_id(kb_id: str) int
+update_document_number_in_init(kb_id: str, doc_num: int) None
}
KnowledgebaseService --> Knowledgebase : uses model
CommonService <|-- KnowledgebaseService : inherits
KnowledgebaseService ..> DocumentService : calls get_by_kb_id
KnowledgebaseService --> UserTenant : joins for permission check
KnowledgebaseService --> Tenant : joins for tenant validation
KnowledgebaseService --> User : joins for user info in queries
Summary
knowledgebase_service.py provides the core business logic for knowledge base management in InfiniFlow, ensuring secure, tenant-aware access and maintaining the integrity of knowledge base states and configurations. It acts as the bridge between the database layer and higher-level application components that manipulate or display knowledge base data. The service's comprehensive method set covers validation, querying, updating, and status checking, supporting robust and scalable multi-tenant knowledge base operations.