knowledge.ts


Overview

The knowledge.ts file defines TypeScript interfaces representing various entities related to a knowledge base system. These interfaces specify the shapes of data objects used throughout the knowledge management domain, including knowledge bases, knowledge files, tenants, data chunks, testing results, parser configurations, and knowledge graph structures.

This file acts as a foundational schema layer: it gives APIs, UI components, parsers, and code that talks to backend services a shared, compile-time-checked description of each data shape. It contains no executable logic, and because TypeScript interfaces are erased at compile time, it performs no runtime validation by itself.


Interfaces and Types

1. IKnowledge

Represents a knowledge base entity in the system.

| Property | Type | Description |
| --- | --- | --- |
| `avatar?` | `any` | Optional avatar or image associated with the KB. |
| `chunk_num` | `number` | Number of chunks in the KB. |
| `create_date` | `string` | Date when the KB was created (formatted string). |
| `create_time` | `number` | Creation time as a timestamp or numeric value. |
| `created_by` | `string` | Identifier of the creator. |
| `description` | `string` | Description or summary of the KB. |
| `doc_num` | `number` | Number of documents included. |
| `id` | `string` | Unique identifier of the KB. |
| `name` | `string` | Name of the knowledge base. |
| `parser_config` | `ParserConfig` | Configuration settings for the parser. |
| `parser_id` | `string` | Identifier for the parser used. |
| `permission` | `string` | Permission level or access control string. |
| `similarity_threshold` | `number` | Threshold for similarity-based operations. |
| `status` | `string` | Current status of the KB (e.g., active, inactive). |
| `tenant_id` | `string` | Tenant or organization identifier. |
| `token_num` | `number` | Number of tokens contained in the KB. |
| `update_date` | `string` | Last update date. |
| `update_time` | `number` | Last update time as a timestamp or numeric value. |
| `vector_similarity_weight` | `number` | Weight factor used for vector similarity calculations. |
| `embd_id` | `string` | Identifier of the embedding model used. |
| `nickname` | `string` | Nickname or alias for the KB (a required property, despite reading like an optional one). |
| `operator_permission` | `number` | Numeric permission level for operators. |
| `size` | `number` | Size of the KB (units not specified, likely bytes or token count). |

Usage example:

// Timestamps are shown as Unix seconds chosen to match the date strings;
// the interface itself does not state the unit.
const kb: IKnowledge = {
  chunk_num: 10,
  create_date: "2024-06-01",
  create_time: 1717200000, // 2024-06-01T00:00:00Z
  created_by: "user123",
  description: "Technical documentation KB",
  doc_num: 5,
  id: "kb001",
  name: "TechDocs",
  parser_config: { chunk_token_num: 512, layout_recognize: true },
  parser_id: "parserA",
  permission: "read-write",
  similarity_threshold: 0.8,
  status: "active",
  tenant_id: "tenantX",
  token_num: 10000,
  update_date: "2024-06-10",
  update_time: 1717977600, // 2024-06-10T00:00:00Z
  vector_similarity_weight: 0.5,
  embd_id: "embedModel1",
  nickname: "TechDocsKB",
  operator_permission: 2,
  size: 2048,
};

2. IKnowledgeResult

A container for paginated or bulk knowledge base results.

| Property | Type | Description |
| --- | --- | --- |
| `kbs` | `IKnowledge[]` | Array of knowledge base entities. |
| `total` | `number` | Total number of knowledge bases available. |
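
As a rough illustration of how `total` and `kbs` could drive pagination, the sketch below computes a page count and checks whether more results remain. The `pageSize` parameter and both helper names are hypothetical, and the interfaces are trimmed local stand-ins carrying only the fields used here.

```typescript
// Trimmed local stand-ins for the interfaces described above (illustration only).
interface KnowledgeSummary {
  id: string;
  name: string;
}

interface KnowledgeResult {
  kbs: KnowledgeSummary[];
  total: number;
}

// Hypothetical helper: number of pages needed to show `total` items.
function pageCount(result: KnowledgeResult, pageSize: number): number {
  if (pageSize <= 0) {
    throw new RangeError("pageSize must be positive");
  }
  return Math.ceil(result.total / pageSize);
}

// Hypothetical helper: true if items remain beyond those already loaded.
function hasMore(result: KnowledgeResult, loadedSoFar: number): boolean {
  return loadedSoFar < result.total;
}

const firstPage: KnowledgeResult = {
  kbs: [
    { id: "kb001", name: "TechDocs" },
    { id: "kb002", name: "HR Policies" },
  ],
  total: 5,
};
```

Here `total` is treated as the count across all pages rather than the length of `kbs`, which is how the table above describes it.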


3. Raptor

Parser-related flag indicating usage of a "Raptor" feature.

| Property | Type | Description |
| --- | --- | --- |
| `use_raptor` | `boolean` | Indicates if Raptor is used. |


4. ParserConfig

Configuration settings for parsing documents within a knowledge base.

| Property | Type | Description |
| --- | --- | --- |
| `from_page?` | `number` (optional) | Starting page number to parse. |
| `to_page?` | `number` (optional) | Ending page number to parse. |
| `auto_keywords?` | `number` (optional) | Number of automatic keywords to extract. |
| `auto_questions?` | `number` (optional) | Number of automatic questions to generate. |
| `chunk_token_num?` | `number` (optional) | Number of tokens per chunk. |
| `delimiter?` | `string` (optional) | Delimiter used for chunking. |
| `html4excel?` | `boolean` (optional) | Whether to parse HTML for Excel files. |
| `layout_recognize?` | `boolean` (optional) | Whether to use layout recognition. |
| `raptor?` | `Raptor` (optional) | Raptor feature configuration. |
| `tag_kb_ids?` | `string[]` (optional) | IDs of tagged KBs. |
| `topn_tags?` | `number` (optional) | Number of top tags to consider. |
| `graphrag?` | `{ use_graphrag?: boolean }` (optional) | GraphRAG feature toggle. |
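
Because every `ParserConfig` property is optional, callers typically need some way to fill in the gaps. The sketch below shows one possible approach: a shallow merge over a defaults object. The default values and the `withDefaults` helper are hypothetical; the file itself does not state any defaults.

```typescript
// Local copy of the shapes from the tables above.
interface Raptor {
  use_raptor: boolean;
}

interface ParserConfig {
  from_page?: number;
  to_page?: number;
  auto_keywords?: number;
  auto_questions?: number;
  chunk_token_num?: number;
  delimiter?: string;
  html4excel?: boolean;
  layout_recognize?: boolean;
  raptor?: Raptor;
  tag_kb_ids?: string[];
  topn_tags?: number;
  graphrag?: { use_graphrag?: boolean };
}

// Hypothetical defaults, chosen only for this example.
const DEFAULTS: ParserConfig = {
  chunk_token_num: 512,
  delimiter: "\n",
  layout_recognize: true,
  raptor: { use_raptor: false },
};

// Shallow merge: any property present in `overrides` wins, and nested
// objects such as `raptor` are replaced whole rather than merged.
function withDefaults(overrides: ParserConfig): ParserConfig {
  return { ...DEFAULTS, ...overrides };
}

const config = withDefaults({
  chunk_token_num: 256,
  graphrag: { use_graphrag: true },
});
```

A shallow merge keeps the helper small; a deep merge would be needed if nested objects gained more than one property each.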


5. IKnowledgeFileParserConfig

Subset of parser config specifically for knowledge file parsing.

| Property | Type | Description |
| --- | --- | --- |
| `chunk_token_num` | `number` | Tokens per chunk. |
| `layout_recognize` | `boolean` | Layout recognition enabled or not. |
| `pages` | `number[][]` | Array representing page ranges or groups. |
| `task_page_size` | `number` | Number of pages to process per task batch. |


6. IKnowledgeFile

Represents a file within a knowledge base, including parsing status and metadata.

| Property | Type | Description |
| --- | --- | --- |
| `chunk_num` | `number` | Number of chunks in the file. |
| `create_date` | `string` | Creation date of the file. |
| `create_time` | `number` | Creation timestamp or numeric time. |
| `created_by` | `string` | Creator identifier. |
| `id` | `string` | Unique file ID. |
| `kb_id` | `string` | ID of the knowledge base this file belongs to. |
| `location` | `string` | Storage location or path. |
| `name` | `string` | File name. |
| `parser_id` | `string` | ID of the parser used. |
| `process_begin_at?` | `any` (optional) | Timestamp when processing started. |
| `process_duration` | `number` | Duration of the parse process in milliseconds. |
| `progress` | `number` | Parsing progress percentage (0-100). |
| `progress_msg` | `string` | Parsing log or status message. |
| `run` | `RunningStatus` | Parsing status enum (imported externally). |
| `size` | `number` | File size (probably in bytes). |
| `source_type` | `string` | Source type (e.g., PDF, DOCX). |
| `status` | `string` | Status of the file (e.g., enabled). |
| `thumbnail?` | `any` (optional) | Optional base64 image thumbnail. |
| `token_num` | `number` | Number of tokens in the file. |
| `type` | `string` | File type or category. |
| `update_date` | `string` | Last updated date. |
| `update_time` | `number` | Last updated timestamp. |
| `parser_config` | `IKnowledgeFileParserConfig` | Parser configuration for this file. |
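
The per-file counters (`chunk_num`, `token_num`) and the `kb_id` back-reference make it straightforward to roll files up per knowledge base. The following is a minimal sketch of such a roll-up; the `totalsByKb` helper and the `KbTotals` shape are hypothetical, and the file interface is trimmed to the fields used.

```typescript
// Trimmed local stand-in for IKnowledgeFile (only the fields used here).
interface KnowledgeFileStats {
  kb_id: string;
  name: string;
  chunk_num: number;
  token_num: number;
}

// Hypothetical aggregate shape for this example.
interface KbTotals {
  files: number;
  chunks: number;
  tokens: number;
}

// Sum file, chunk, and token counts for each knowledge base ID.
function totalsByKb(files: KnowledgeFileStats[]): Map<string, KbTotals> {
  const totals = new Map<string, KbTotals>();
  for (const file of files) {
    const current = totals.get(file.kb_id) ?? { files: 0, chunks: 0, tokens: 0 };
    totals.set(file.kb_id, {
      files: current.files + 1,
      chunks: current.chunks + file.chunk_num,
      tokens: current.tokens + file.token_num,
    });
  }
  return totals;
}

const sampleFiles: KnowledgeFileStats[] = [
  { kb_id: "kb001", name: "intro.pdf", chunk_num: 4, token_num: 1200 },
  { kb_id: "kb001", name: "api.docx", chunk_num: 6, token_num: 1800 },
  { kb_id: "kb002", name: "policy.pdf", chunk_num: 3, token_num: 900 },
];

const totals = totalsByKb(sampleFiles);
```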


7. ITenantInfo

Represents tenant-specific configuration and identifiers, likely for integration with various AI or processing services.

| Property | Type | Description |
| --- | --- | --- |
| `asr_id` | `string` | ASR (Automatic Speech Recognition) service ID. |
| `embd_id` | `string` | Embedding service/model ID. |
| `img2txt_id` | `string` | Image to Text service ID. |
| `llm_id` | `string` | Large Language Model ID. |
| `name` | `string` | Tenant or organization name. |
| `parser_ids` | `string` | Comma-separated parser IDs associated. |
| `role` | `string` | Role designation (e.g., admin, user). |
| `tenant_id` | `string` | Unique tenant identifier. |
| `chat_id` | `string` | Chat service or bot ID. |
| `speech2text_id` | `string` | Speech to text service ID. |
| `tts_id` | `string` | Text to speech service ID. |
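
Since `parser_ids` is a single comma-separated string rather than an array, consumers have to split it before use. A minimal sketch, assuming plain commas as separators and no escaping; the `parseParserIds` helper name and the sample IDs are hypothetical.

```typescript
// Split a comma-separated parser ID string into a clean list:
// surrounding whitespace is trimmed and empty entries are dropped.
function parseParserIds(parserIds: string): string[] {
  return parserIds
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
}

const ids = parseParserIds("naive, qa,,table ");
const none = parseParserIds("");
```

Filtering out empty entries matters for the empty-string case, because `"".split(",")` yields a one-element array containing an empty string.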


8. IChunk

Represents a chunk of content within a document or knowledge file, including keywords and positioning data.

| Property | Type | Description |
| --- | --- | --- |
| `available_int` | `number` | Enable flag (0 = disabled, 1 = enabled). |
| `chunk_id` | `string` | Unique chunk identifier. |
| `content_with_weight` | `string` | Content text possibly annotated with weights. |
| `doc_id` | `string` | Document ID the chunk belongs to. |
| `doc_name` | `string` | Document name. |
| `image_id` | `string` | Associated image ID if any. |
| `important_kwd?` | `string[]` (optional) | Array of important keywords related to chunk. |
| `question_kwd?` | `string[]` (optional) | Question keywords extracted from content. |
| `tag_kwd?` | `string[]` (optional) | Tag keywords associated with the chunk. |
| `positions` | `number[][]` | Coordinates or positions within the document. |
| `tag_feas?` | `Record<string, number>` (optional) | Tag features with numeric values. |
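
The `available_int` flag and the optional keyword arrays are the fields most likely to need defensive handling. The sketch below filters enabled chunks and gathers their distinct `important_kwd` values; both helper names are hypothetical and the interface is trimmed to the fields used.

```typescript
// Trimmed local stand-in for IChunk (only the fields used here).
interface Chunk {
  available_int: number; // 0 = disabled, 1 = enabled
  chunk_id: string;
  content_with_weight: string;
  important_kwd?: string[];
}

// Keep only chunks whose enable flag is set.
function enabledChunks(chunks: Chunk[]): Chunk[] {
  return chunks.filter((chunk) => chunk.available_int === 1);
}

// Collect distinct keywords in sorted order, tolerating a missing array.
function collectKeywords(chunks: Chunk[]): string[] {
  const seen = new Set<string>();
  for (const chunk of chunks) {
    for (const keyword of chunk.important_kwd ?? []) {
      seen.add(keyword);
    }
  }
  return Array.from(seen).sort();
}

const sampleChunks: Chunk[] = [
  { available_int: 1, chunk_id: "c1", content_with_weight: "Parsers split text.", important_kwd: ["parser", "chunk"] },
  { available_int: 0, chunk_id: "c2", content_with_weight: "Hidden text.", important_kwd: ["hidden"] },
  { available_int: 1, chunk_id: "c3", content_with_weight: "Chunks hold tokens.", important_kwd: ["chunk", "token"] },
  { available_int: 1, chunk_id: "c4", content_with_weight: "No keywords here." },
];

const visible = enabledChunks(sampleChunks);
const keywords = collectKeywords(visible);
```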


9. ITestingChunk

Extended chunk interface used for testing or evaluation purposes, including similarity metrics and vector embeddings.

| Property | Type | Description |
| --- | --- | --- |
| `chunk_id` | `string` | Chunk identifier. |
| `content_ltks` | `string` | Likely a tokenized form of the chunk content; the abbreviation is not explained in the file. |
| `content_with_weight` | `string` | Content with weighting information. |
| `doc_id` | `string` | Document ID. |
| `doc_name` | `string` | Document name. |
| `img_id` | `string` | Image ID (possibly deprecated as `image_id` is also present). |
| `image_id` | `string` | Image ID. |
| `important_kwd` | `any[]` | Array of important keywords. |
| `kb_id` | `string` | Knowledge base ID this chunk belongs to. |
| `similarity` | `number` | Similarity score. |
| `term_similarity` | `number` | Term-based similarity score. |
| `vector` | `number[]` | Embedding vector representation. |
| `vector_similarity` | `number` | Similarity based on vector comparison. |
| `highlight` | `string` | Highlighted text or terms. |
| `positions` | `number[][]` | Positions in the document. |
| `docnm_kwd` | `string` | Document name keywords. |
| `doc_type_kwd` | `string` | Document type keywords. |


10. ITestingDocument

Represents metadata summary for a document in testing contexts.

| Property | Type | Description |
| --- | --- | --- |
| `count` | `number` | Number of occurrences or matches. |
| `doc_id` | `string` | Document identifier. |
| `doc_name` | `string` | Document name. |


11. ITestingResult

Testing results container including chunks and documents.

| Property | Type | Description |
| --- | --- | --- |
| `chunks` | `ITestingChunk[]` | Array of testing chunks. |
| `documents` | `ITestingDocument[]` | Array of testing documents. |
| `total` | `number` | Total number of results. |
| `labels?` | `Record<string, number>` (optional) | Optional label counts or scores. |
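
To show how the `similarity` score on each testing chunk might be consumed, the sketch below drops chunks under a threshold and returns the best matches first. The `rankChunks` helper and its `threshold` and `topN` parameters are hypothetical; whether the real system filters on the client or the server is not stated in the file.

```typescript
// Trimmed local stand-in for ITestingChunk (only the fields used here).
interface TestingChunk {
  chunk_id: string;
  doc_id: string;
  similarity: number;
}

// Keep chunks at or above `threshold`, highest similarity first, at most `topN`.
// `filter` returns a new array, so the caller's array is not reordered.
function rankChunks(chunks: TestingChunk[], threshold: number, topN: number): TestingChunk[] {
  return chunks
    .filter((chunk) => chunk.similarity >= threshold)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topN);
}

const retrieved: TestingChunk[] = [
  { chunk_id: "c1", doc_id: "d1", similarity: 0.42 },
  { chunk_id: "c2", doc_id: "d1", similarity: 0.91 },
  { chunk_id: "c3", doc_id: "d2", similarity: 0.77 },
  { chunk_id: "c4", doc_id: "d3", similarity: 0.85 },
];

const best = rankChunks(retrieved, 0.5, 2);
```

A threshold such as `similarity_threshold` from `IKnowledge` would be a natural value to pass in, though the file does not say how that property is applied.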


12. INextTestingResult

A variant of ITestingResult: the document list is exposed as doc_aggs instead of documents, and an optional isRuned flag is added.

| Property | Type | Description |
| --- | --- | --- |
| `chunks` | `ITestingChunk[]` | Array of testing chunks. |
| `doc_aggs` | `ITestingDocument[]` | Aggregated documents. |
| `total` | `number` | Total count. |
| `labels?` | `Record<string, number>` (optional) | Optional labels. |
| `isRuned?` | `boolean` (optional) | Flag indicating if the test has run. |
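
The two result shapes differ only in the name of the document list and the extra `isRuned` flag, so one can be derived from the other mechanically. The adapter below is a hypothetical sketch of that mapping; the file does not say whether the codebase performs such a conversion, and the interfaces are trimmed local stand-ins.

```typescript
// Trimmed local stand-ins for the testing interfaces described above.
interface TestingDocument {
  count: number;
  doc_id: string;
  doc_name: string;
}

interface TestingChunkRef {
  chunk_id: string;
  doc_id: string;
}

interface TestingResult {
  chunks: TestingChunkRef[];
  documents: TestingDocument[];
  total: number;
  labels?: Record<string, number>;
}

interface NextTestingResult {
  chunks: TestingChunkRef[];
  doc_aggs: TestingDocument[];
  total: number;
  labels?: Record<string, number>;
  isRuned?: boolean;
}

// Hypothetical adapter: rename `documents` to `doc_aggs` and mark the run as done.
function toNextResult(result: TestingResult): NextTestingResult {
  const { documents, ...rest } = result;
  return { ...rest, doc_aggs: documents, isRuned: true };
}

const legacy: TestingResult = {
  chunks: [{ chunk_id: "c1", doc_id: "d1" }],
  documents: [{ count: 1, doc_id: "d1", doc_name: "intro.pdf" }],
  total: 1,
};

const next = toNextResult(legacy);
```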


13. IRenameTag

Defines a rename operation for tags.

| Property | Type | Description |
| --- | --- | --- |
| `fromTag` | `string` | Existing tag name. |
| `toTag` | `string` | New tag name to rename to. |
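
As an illustration of what a rename described by `fromTag` and `toTag` could mean for a list of tags (for example, a chunk's `tag_kwd`), the sketch below applies the rename locally and removes any duplicate it creates. The `applyRename` helper is hypothetical; the actual rename is presumably carried out by a backend call that this file does not describe.

```typescript
// Local copy of the IRenameTag shape from the table above.
interface RenameTag {
  fromTag: string;
  toTag: string;
}

// Replace every occurrence of `fromTag` with `toTag`, then drop duplicates
// while keeping first-seen order (a Set preserves insertion order).
function applyRename(tags: string[], rename: RenameTag): string[] {
  const renamed = tags.map((tag) => (tag === rename.fromTag ? rename.toTag : tag));
  return Array.from(new Set(renamed));
}

const merged = applyRename(["ml", "ai", "nlp"], { fromTag: "ml", toTag: "ai" });
const untouched = applyRename(["ai", "nlp"], { fromTag: "ml", toTag: "ai" });
```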


14. IKnowledgeGraph

Represents a knowledge graph structure and associated mind map visualization data.

| Property | Type | Description |
| --- | --- | --- |
| `graph` | `Record<string, any>` | Core graph data structure (nodes, edges). |
| `mind_map` | `TreeData` | Tree structure for mind map visualization (from @antv/g6 library). |
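
Because `mind_map` is a tree, most consumers will walk it recursively. The sketch below counts nodes and measures depth using a local `MindMapNode` type, a hypothetical stand-in that assumes only an `id` and an optional `children` array; the real `TreeData` type from @antv/g6 may carry additional fields.

```typescript
// Hypothetical stand-in for the tree type; not the @antv/g6 definition.
interface MindMapNode {
  id: string;
  children?: MindMapNode[];
}

// Total number of nodes in the tree, including the root.
function countNodes(node: MindMapNode): number {
  const children = node.children ?? [];
  return 1 + children.reduce((sum, child) => sum + countNodes(child), 0);
}

// Depth of the tree, where a lone root has depth 1.
function maxDepth(node: MindMapNode): number {
  const children = node.children ?? [];
  return 1 + children.reduce((deepest, child) => Math.max(deepest, maxDepth(child)), 0);
}

const mindMap: MindMapNode = {
  id: "root",
  children: [
    { id: "parsers", children: [{ id: "pdf" }, { id: "docx" }] },
    { id: "retrieval" },
  ],
};

const nodeCount = countNodes(mindMap);
const depth = maxDepth(mindMap);
```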


Important Implementation Details


Interactions with Other System Components


Visual Diagram

classDiagram
    class IKnowledge {
        +avatar?: any
        +chunk_num: number
        +create_date: string
        +create_time: number
        +created_by: string
        +description: string
        +doc_num: number
        +id: string
        +name: string
        +parser_config: ParserConfig
        +parser_id: string
        +permission: string
        +similarity_threshold: number
        +status: string
        +tenant_id: string