dataset-util.ts
Overview
The dataset-util.ts file provides utility functions related to document parsers and data grouping within the context of knowledge processing or document management systems. Specifically, it:
Defines helper functions to identify specific types of document parsers.
Provides a generic utility to group a list of objects by specified fields, returning a summarized count per group.
Defines a FilterType type to represent grouped data with an identifier, label, and count.
This utility is likely used in higher-level components or services that manage document ingestion, classification, or filtering based on parser types or grouped metadata.
Detailed Explanation
Imports
import { DocumentParserType } from '@/constants/knowledge';
Imports the DocumentParserType enum or constant set from a centralized constants module, which defines different parser types such as KnowledgeGraph and Naive.
Functions
isKnowledgeGraphParser
export function isKnowledgeGraphParser(parserId: DocumentParserType): boolean
Purpose: Checks if the given parserId corresponds to the KnowledgeGraph parser.
Parameters:
parserId (DocumentParserType): The parser identifier to check.
Returns:
true if parserId is KnowledgeGraph, otherwise false.
Usage Example:
if (isKnowledgeGraphParser(currentParser)) {
  // Execute logic specific to KnowledgeGraph parser
}
isNaiveParser
export function isNaiveParser(parserId: DocumentParserType): boolean
Purpose: Checks if the given parserId corresponds to the Naive parser.
Parameters:
parserId (DocumentParserType): The parser identifier to check.
Returns:
true if parserId is Naive, otherwise false.
Usage Example:
if (isNaiveParser(currentParser)) {
  // Execute logic specific to Naive parser
}
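The two predicates described above can be sketched as follows. This is a minimal illustration, not the file's actual source: the local DocumentParserType enum is a hypothetical stand-in for the one exported by @/constants/knowledge, and its string values are assumptions.

```typescript
// Hypothetical stand-in for the enum imported from '@/constants/knowledge'.
// The real member names and string values may differ.
enum DocumentParserType {
  Naive = 'naive',
  KnowledgeGraph = 'knowledge_graph',
}

// Returns true only when the parser id is the KnowledgeGraph member.
function isKnowledgeGraphParser(parserId: DocumentParserType): boolean {
  return parserId === DocumentParserType.KnowledgeGraph;
}

// Returns true only when the parser id is the Naive member.
function isNaiveParser(parserId: DocumentParserType): boolean {
  return parserId === DocumentParserType.Naive;
}

console.log(isNaiveParser(DocumentParserType.Naive)); // true
console.log(isKnowledgeGraphParser(DocumentParserType.Naive)); // false
```

Wrapping the comparison in a named function keeps call sites readable and means only this file needs to change if the constant is renamed.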
Type Definitions
FilterType
export type FilterType = {
  id: string;
  label: string;
  count: number;
};
Represents an aggregated filter group.
Properties:
id: Unique identifier for the group.
label: Human-readable label/name for the group.
count: Number of items aggregated under this group.
Generic Function: groupListByType
export function groupListByType<T extends Record<string, any>>(
  list: T[],
  idField: string,
  labelField: string,
): FilterType[]
Purpose: Groups a list of objects by the values in the specified idField and labelField properties, returning an array of FilterType objects that summarize the groups and count the number of items in each.
Parameters:
list (T[]): An array of objects to be grouped. The generic type T extends a record with string keys.
idField (string): The key in the object whose value will be used as the group's unique identifier.
labelField (string): The key in the object whose value will serve as the group's human-readable label.
Returns: An array of FilterType objects representing groups and their counts.
Implementation Details:
Iterates through each item in the input list.
Checks if a group with the current item's idField value exists.
If it exists, increments the group's count.
Otherwise, creates a new group with count initialized to 1.
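The steps above can be sketched as follows. This is an illustrative reconstruction from the description, not the file's actual source: it uses a plain loop with Array.find, whereas the real implementation may be written differently (for example with reduce).

```typescript
type FilterType = {
  id: string;
  label: string;
  count: number;
};

function groupListByType<T extends Record<string, any>>(
  list: T[],
  idField: string,
  labelField: string,
): FilterType[] {
  const groups: FilterType[] = [];
  for (const item of list) {
    const id = item[idField];
    // Linear search through the groups collected so far.
    const existing = groups.find((group) => group.id === id);
    if (existing) {
      existing.count += 1;
    } else {
      // First item seen for this id: its label becomes the group's label.
      groups.push({ id, label: item[labelField], count: 1 });
    }
  }
  return groups;
}
```

In this sketch, groups appear in the order their ids are first encountered, and a group's label comes from the first item with that id.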
Usage Example:
interface FileItem {
  typeId: string;
  typeName: string;
  // other properties
}
const files: FileItem[] = [
  { typeId: 'pdf', typeName: 'PDF Document' },
  { typeId: 'pdf', typeName: 'PDF Document' },
  { typeId: 'doc', typeName: 'Word Document' },
];
const grouped = groupListByType(files, 'typeId', 'typeName');
// Result:
// [
//   { id: 'pdf', label: 'PDF Document', count: 2 },
//   { id: 'doc', label: 'Word Document', count: 1 }
// ]
Important Implementation Details
The grouping function uses a simple linear search (Array.find) within the accumulating array to check for existing groups. This is suitable for small to moderately sized lists, but each lookup costs time proportional to the number of groups found so far. For larger datasets, a Map or plain object keyed by id reduces each lookup from O(n) to O(1).
The two parser identification functions rely on strict equality checks against imported constants, which keeps parser type definitions centrally managed. The DocumentParserType parameter type restricts callers to known parser values.
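A Map-based variant of the grouping could look like the sketch below. The name groupListByTypeWithMap is hypothetical and does not exist in the file; it only illustrates the optimization suggested above.

```typescript
type FilterType = {
  id: string;
  label: string;
  count: number;
};

// Hypothetical alternative to groupListByType that indexes groups by id.
function groupListByTypeWithMap<T extends Record<string, any>>(
  list: T[],
  idField: string,
  labelField: string,
): FilterType[] {
  const groups = new Map<string, FilterType>();
  for (const item of list) {
    const id = item[idField];
    const existing = groups.get(id); // constant-time lookup on average
    if (existing) {
      existing.count += 1;
    } else {
      groups.set(id, { id, label: item[labelField], count: 1 });
    }
  }
  // Map iterates in insertion order, so groups keep first-seen order.
  return Array.from(groups.values());
}

const sample = [
  { typeId: 'pdf', typeName: 'PDF Document' },
  { typeId: 'doc', typeName: 'Word Document' },
  { typeId: 'pdf', typeName: 'PDF Document' },
];
console.log(groupListByTypeWithMap(sample, 'typeId', 'typeName'));
// [
//   { id: 'pdf', label: 'PDF Document', count: 2 },
//   { id: 'doc', label: 'Word Document', count: 1 }
// ]
```

Because Map preserves insertion order, the output order matches what a linear-search version would produce, so the two are interchangeable for callers that only read the resulting array.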
Interaction with Other System Parts
Constants Reference: The file depends on DocumentParserType from @/constants/knowledge, a module that centralizes parser type definitions. This promotes consistency across the application when referring to parser types.
Utility Usage: These utilities would be used by modules handling document ingestion, filtering, and UI components that display grouped data or parser-specific logic.
FilterType: The FilterType output from groupListByType is likely consumed by UI filter components or analytics modules to present grouped summaries.
Mermaid Flowchart Diagram
flowchart TD
A[Input: list of objects] --> B[groupListByType]
B --> C[Check if group with idField exists]
C -- Yes --> D[Increment count]
C -- No --> E[Add new FilterType with count=1]
D & E --> F[Return array of FilterType]
subgraph Parser Identification
G["isKnowledgeGraphParser(parserId)"] --> H[Returns boolean]
I["isNaiveParser(parserId)"] --> J[Returns boolean]
end
Summary
The dataset-util.ts file provides focused utility functions for:
Identifying specific document parser types.
Grouping generic lists by specified fields with counts, returning a standardized filter structure.
Its simplicity and generic design make it a reusable utility in document processing pipelines and user interface components dealing with filters and parsers.