document.ts
Overview
document.ts defines TypeScript interfaces and types that model the structure and metadata of documents within the system, particularly focusing on document processing status, parsing configurations, and filtering criteria. This file does not contain executable code but provides a strongly-typed contract for how document-related data should be represented and handled throughout the application.
Key purposes include:
Defining the shape of a document's metadata and processing information (IDocumentInfo).
Specifying how parser configurations for documents are structured (IParserConfig and related types).
Defining filters to categorize or query documents based on their run status or file suffixes (IDocumentInfoFilter).
These interfaces enable type safety, autocompletion, and easier maintenance when working with document-related data in the system.
Interfaces and Types
IDocumentInfo
Represents detailed metadata and operational data for a document in the system. This includes identification, processing state, parser settings, and various descriptive fields.
| Property | Type | Description |
|---|---|---|
| chunk_num | number | Number of chunks the document is divided into. |
| create_date | string | Date when the document was created. |
| create_time | number | Timestamp or time value representing creation time (exact format depends on implementation). |
| created_by | string | Identifier or name of the user/system that created the document. |
| id | string | Unique identifier for the document. |
| kb_id | string | Identifier for the knowledge base to which this document belongs. |
| location | string | File path or storage location of the document. |
| name | string | Name or title of the document. |
| parser_config | IParserConfig | Configuration object detailing how the document should be parsed. |
| parser_id | string | Identifier of the parser used on the document. |
| process_begin_at | string | Optional. Timestamp when document processing began. |
| process_duration | number | Duration of processing (likely in milliseconds or seconds). |
| progress | number | Numeric representation of processing progress (e.g., percentage). |
| progress_msg | string | Human-readable message about current processing progress or status. |
| run | RunningStatus | Enum value indicating the current running status of document processing (imported). |
| size | number | Size of the document (likely in bytes). |
| source_type | string | Type/category of the source document (e.g., "pdf", "docx"). |
| status | string | Current status of the document (e.g., "completed", "failed"). |
| thumbnail | string | URL or base64 string for a thumbnail preview image of the document. |
| token_num | number | Number of tokens extracted or associated with the document (potentially for NLP use). |
| type | string | Type of document (might overlap with another field). |
| update_date | string | Date when the document metadata was last updated. |
| update_time | number | Timestamp or time value for last update time. |
| meta_fields | Record<string, any> | Optional. Additional metadata fields as key-value pairs, flexible for extensibility. |
Usage Example
import { RunningStatus } from "@/constants/knowledge";

// IDocumentInfo is exported from document.ts; import it using your project's path.
const docInfo: IDocumentInfo = {
  chunk_num: 10,
  create_date: "2024-06-01",
  create_time: 1717200000, // Unix seconds assumed, chosen to match create_date
  created_by: "user123",
  id: "doc_001",
  kb_id: "kb_abc",
  location: "/files/doc_001.pdf",
  name: "Project Plan",
  parser_config: {
    pages: [1, 2, 3],
    delimiter: "\n",
    html4excel: false,
    layout_recognize: true,
    raptor: { use_raptor: true },
    graphrag: { use_graphrag: false }
  },
  parser_id: "parser_v1",
  process_duration: 1200,
  progress: 100,
  progress_msg: "Completed",
  run: RunningStatus.Completed, // member name assumed; check the enum definition
  size: 204800,
  source_type: "pdf",
  status: "completed",
  thumbnail: "https://example.com/thumbs/doc_001.png",
  token_num: 4500,
  type: "report",
  update_date: "2024-06-10",
  update_time: 1717977600, // Unix seconds assumed, chosen to match update_date
  meta_fields: { department: "finance" }
};
IParserConfig
Defines configuration options for how a document should be parsed. This interface is used within IDocumentInfo to customize processing behavior.
| Property | Type | Description |
|---|---|---|
| delimiter | string | Optional. String used to split or parse the document's content (e.g., newline, comma). |
| html4excel | boolean | Optional. Flag indicating if the document is an HTML file formatted for Excel export. |
| layout_recognize | boolean | Optional. Enables layout recognition to better understand document structure during parsing. |
| pages | any[] | Array representing page numbers or page objects to parse from the document. |
| raptor | Raptor | Optional. Configuration specific to the "Raptor" parsing module (see below). |
| graphrag | GraphRag | Optional. Configuration specific to the "GraphRag" parsing module (see below). |
Raptor
Configuration options for the Raptor parsing engine/module.
| Property | Type | Description |
|---|---|---|
| use_raptor | boolean | Indicates whether to use the Raptor parser. |
GraphRag
Configuration options for the GraphRag parsing engine/module, which appears to add graph-based semantic parsing features.
| Property | Type | Description |
|---|---|---|
| community | boolean | Optional. Enables community detection features during parsing. |
| entity_types | string[] | Optional. Specifies which entity types to recognize in the document. |
| method | string | Optional. Parsing method or algorithm name to use within GraphRag. |
| resolution | boolean | Optional. Enables resolution features (e.g., entity resolution). |
| use_graphrag | boolean | Optional. Flag to enable use of the GraphRag parser. |
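The parser configuration types described in this section can be sketched as TypeScript interfaces, with property names and types taken from the class diagram later in this document. The helper enabledModules is hypothetical and not part of document.ts; it only illustrates how the optional raptor and graphrag sub-configurations might be checked.

```typescript
// Shapes reconstructed from the class diagram in this document.
interface Raptor {
  use_raptor: boolean;
}

interface GraphRag {
  community?: boolean;
  entity_types?: string[];
  method?: string;
  resolution?: boolean;
  use_graphrag?: boolean;
}

interface IParserConfig {
  delimiter?: string;
  html4excel?: boolean;
  layout_recognize?: boolean;
  pages: any[];
  raptor?: Raptor;
  graphrag?: GraphRag;
}

// Hypothetical helper (not in document.ts): lists the optional modules
// that a given configuration switches on.
function enabledModules(config: IParserConfig): string[] {
  const modules: string[] = [];
  if (config.raptor?.use_raptor) modules.push("raptor");
  if (config.graphrag?.use_graphrag) modules.push("graphrag");
  return modules;
}

const sampleConfig: IParserConfig = {
  pages: [1, 2, 3],
  delimiter: "\n",
  raptor: { use_raptor: true },
  graphrag: { use_graphrag: false, entity_types: ["person", "organization"] },
};

const activeModules = enabledModules(sampleConfig); // ["raptor"]
```

Because raptor and graphrag are optional, optional chaining keeps the check safe when either sub-configuration is missing.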
IDocumentInfoFilter
Defines filters used to query or categorize documents based on their processing run status and file suffix.
Property | Type | Description |
|---|---|---|
|
| A map from run status codes to their counts or weights. |
|
| A map from file suffixes/extensions (e.g., "pdf", "docx") to counts or weights. |
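As an illustration of the filter shape, the sketch below tallies a list of documents into run_status and suffix maps. It assumes the map values are counts (the description above leaves open whether they are counts or weights) and that run statuses are numeric codes. The DocLike type and the helpers suffixOf and buildFilter are hypothetical; in practice the filter data may well be produced elsewhere, for example by a server.

```typescript
// Minimal document shape for this sketch; the real IDocumentInfo has more fields.
interface DocLike {
  name: string;
  run: number; // assumption: run status represented as a numeric code
}

interface IDocumentInfoFilter {
  run_status: Record<number, number>;
  suffix: Record<string, number>;
}

// Hypothetical helper: returns the lower-cased suffix without the dot,
// or "" when the file name has no extension.
function suffixOf(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  return dot > 0 && dot < fileName.length - 1
    ? fileName.slice(dot + 1).toLowerCase()
    : "";
}

// Hypothetical helper: counts documents per run status and per suffix.
function buildFilter(docs: DocLike[]): IDocumentInfoFilter {
  const filter: IDocumentInfoFilter = { run_status: {}, suffix: {} };
  for (const doc of docs) {
    filter.run_status[doc.run] = (filter.run_status[doc.run] ?? 0) + 1;
    const suffix = suffixOf(doc.name);
    if (suffix) {
      filter.suffix[suffix] = (filter.suffix[suffix] ?? 0) + 1;
    }
  }
  return filter;
}

const sampleFilter = buildFilter([
  { name: "plan.pdf", run: 3 },
  { name: "notes.PDF", run: 1 },
  { name: "spec.docx", run: 3 },
]);
// sampleFilter.run_status -> { 1: 1, 3: 2 }
// sampleFilter.suffix     -> { pdf: 2, docx: 1 }
```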
Implementation Details and Algorithms
The file primarily defines data shapes rather than implementing algorithms.
The parser_config property (typed as IParserConfig) supports plug-in configurations for different parsing modules (Raptor and GraphRag), indicating extensible parsing strategies.
Usage of optional properties (e.g., process_begin_at, meta_fields) allows flexible data representation depending on document state or available metadata.
The run property uses an imported enum RunningStatus, which likely provides a standardized set of statuses like Running, Completed, Failed, etc., to track document processing lifecycle.
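The points about optional properties and the run enum can be sketched as follows. The RunningStatus enum here is a local stand-in whose member names and values are assumptions, not the definitions from @/constants/knowledge, and describeProgress is a hypothetical helper that is not part of document.ts.

```typescript
// Stand-in for the imported enum; member names and values are assumptions
// made for this sketch only.
enum RunningStatus {
  Running = 1,
  Completed = 2,
  Failed = 3,
}

// Only the fields this sketch needs; the full interface has many more.
interface DocProgress {
  run: RunningStatus;
  progress: number;
  progress_msg: string;
  process_begin_at?: string;
  meta_fields?: Record<string, any>;
}

// Hypothetical helper: builds a one-line status summary and falls back
// to defaults when the optional fields are absent.
function describeProgress(doc: DocProgress): string {
  const startedAt = doc.process_begin_at ?? "an unknown time";
  const metaCount = Object.keys(doc.meta_fields ?? {}).length;
  switch (doc.run) {
    case RunningStatus.Running:
      return `running (${doc.progress}) since ${startedAt}, ${metaCount} meta field(s)`;
    case RunningStatus.Completed:
      return `completed: ${doc.progress_msg}`;
    case RunningStatus.Failed:
      return `failed: ${doc.progress_msg}`;
    default:
      return "unknown status";
  }
}

const progressSummary = describeProgress({
  run: RunningStatus.Running,
  progress: 40,
  progress_msg: "Parsing",
});
// progressSummary === "running (40) since an unknown time, 0 meta field(s)"
```

Switching on the enum rather than on raw values keeps status handling in one place, so a change to the status set only needs to be reflected where the enum is defined and consumed.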
Interaction with Other Parts of the System
The RunningStatus enum imported from @/constants/knowledge shows this file depends on a centralized constants module, ensuring consistent usage of document run states.
Interfaces here are likely consumed by:
Document management services that handle storing, updating, and querying document metadata.
Processing pipelines that parse documents according to the IParserConfig and update progress/status fields in IDocumentInfo.
UI components that display document lists, statuses, and metadata, using these interfaces for type safety.
Filtering modules or search functionality that utilize IDocumentInfoFilter to enable users to filter documents by status or file type.
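To make the processing-pipeline case concrete, here is a minimal sketch of how a consumer might update the progress fields without mutating the original object. The withProgress helper and the ProgressFields type are hypothetical and do not appear in document.ts; only the field names progress, progress_msg, and process_duration come from IDocumentInfo.

```typescript
// Subset of IDocumentInfo fields that a processing step would update.
interface ProgressFields {
  progress: number;
  progress_msg: string;
  process_duration: number;
}

// Hypothetical helper: returns a copy of the document with updated
// progress fields, leaving every other field untouched.
function withProgress<T extends ProgressFields>(
  doc: T,
  progress: number,
  message: string,
  elapsed: number,
): T {
  return {
    ...doc,
    progress,
    progress_msg: message,
    process_duration: elapsed,
  };
}

const queuedDoc = {
  id: "doc_001",
  progress: 0,
  progress_msg: "",
  process_duration: 0,
};

const halfwayDoc = withProgress(queuedDoc, 50, "Parsing page 2 of 4", 600);
// halfwayDoc keeps id "doc_001"; queuedDoc is unchanged.
```

Returning a new object rather than mutating in place is one design choice among several; it tends to suit UI code that relies on reference changes to detect updates.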
Visual Diagram
classDiagram
class IDocumentInfo {
+chunk_num: number
+create_date: string
+create_time: number
+created_by: string
+id: string
+kb_id: string
+location: string
+name: string
+parser_config: IParserConfig
+parser_id: string
+process_begin_at?: string
+process_duration: number
+progress: number
+progress_msg: string
+run: RunningStatus
+size: number
+source_type: string
+status: string
+thumbnail: string
+token_num: number
+type: string
+update_date: string
+update_time: number
+meta_fields?: Record<string, any>
}
class IParserConfig {
+delimiter?: string
+html4excel?: boolean
+layout_recognize?: boolean
+pages: any[]
+raptor?: Raptor
+graphrag?: GraphRag
}
class Raptor {
+use_raptor: boolean
}
class GraphRag {
+community?: boolean
+entity_types?: string[]
+method?: string
+resolution?: boolean
+use_graphrag?: boolean
}
class IDocumentInfoFilter {
+run_status: Record<number, number>
+suffix: Record<string, number>
}
IDocumentInfo --> IParserConfig : has
IParserConfig --> Raptor : optional
IParserConfig --> GraphRag : optional
Summary
The document.ts file is a foundational part of the system's type definitions for handling documents. It carefully models document metadata, parsing configurations, and filtering criteria, enabling consistent use of document data throughout the application layers, from back-end processing to front-end display.
By defining clear interfaces, this file supports extensible parsing strategies and flexible document processing tracking, ensuring the system can grow and adapt to different document types and parsing requirements.