dataset-util.ts

Overview

The dataset-util.ts file provides utility functions for identifying document parsers and for grouping data, for use in knowledge-processing or document-management code. Specifically, it:

- Defines helper functions to identify specific types of document parsers.
- Provides a generic utility to group a list of objects by specified fields, returning a summarized count per group.
- Defines a FilterType type to represent grouped data with an identifier, label, and count.

This utility is likely used in higher-level components or services that manage document ingestion, classification, or filtering based on parser types or grouped metadata.

Detailed Explanation

Imports

import { DocumentParserType } from '@/constants/knowledge';

Imports the DocumentParserType enum or constant set from a centralized constants module, which defines different parser types such as KnowledgeGraph and Naive.

Functions

`isKnowledgeGraphParser`

export function isKnowledgeGraphParser(parserId: DocumentParserType): boolean

Purpose: Checks if the given parserId corresponds to the KnowledgeGraph parser.
Parameters:
- parserId (DocumentParserType): The parser identifier to check.
Returns: true if parserId is KnowledgeGraph, otherwise false.
Usage Example:

if (isKnowledgeGraphParser(currentParser)) {
  // Execute logic specific to KnowledgeGraph parser
}

`isNaiveParser`

export function isNaiveParser(parserId: DocumentParserType): boolean

Purpose: Checks if the given parserId corresponds to the Naive parser.
Parameters:
- parserId (DocumentParserType): The parser identifier to check.
Returns: true if parserId is Naive, otherwise false.
Usage Example:

if (isNaiveParser(currentParser)) {
  // Execute logic specific to Naive parser
}
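
As a rough sketch of what the two predicates might look like, assuming DocumentParserType is a string enum with KnowledgeGraph and Naive members. The enum below is a hypothetical stand-in with made-up values (the real definitions live in @/constants/knowledge), and describeParser is an illustrative helper that is not part of the file.

```typescript
// Hypothetical stand-in for the enum exported by '@/constants/knowledge'.
// The member values are assumptions made for illustration only.
enum DocumentParserType {
  Naive = 'naive',
  KnowledgeGraph = 'knowledge_graph',
}

// True only when the identifier is the KnowledgeGraph parser.
function isKnowledgeGraphParser(parserId: DocumentParserType): boolean {
  return parserId === DocumentParserType.KnowledgeGraph;
}

// True only when the identifier is the Naive parser.
function isNaiveParser(parserId: DocumentParserType): boolean {
  return parserId === DocumentParserType.Naive;
}

// Illustrative helper (not in the original file): branch on the parser type.
function describeParser(parserId: DocumentParserType): string {
  if (isKnowledgeGraphParser(parserId)) return 'knowledge graph';
  if (isNaiveParser(parserId)) return 'naive';
  return 'other';
}
```

Wrapping the comparison in a named predicate keeps call sites readable and means a renamed enum member only has to be updated in one place.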

Type Definitions

`FilterType`

export type FilterType = {
  id: string;
  label: string;
  count: number;
};

Represents an aggregated filter group.
Properties:
- id: Unique identifier for the group.
- label: Human-readable label/name for the group.
- count: Number of items aggregated under this group.

Generic Function: `groupListByType`

export function groupListByType<T extends Record<string, any>>(
  list: T[],
  idField: string,
  labelField: string,
): FilterType[]

Purpose: Groups a list of objects by the value of the property named by idField and returns an array of FilterType objects, one per distinct value. Each group's label is read from the property named by labelField on the first item seen for that group, and count is the number of items in the group.
Parameters:
- list (T[]): An array of objects to be grouped. The generic type T extends a record with string keys.
- idField (string): The key in the object whose value will be used as the group's unique identifier.
- labelField (string): The key in the object whose value will serve as the group's human-readable label.
Returns: An array of FilterType objects representing groups and their counts.
Implementation Details:
- Iterates through each item in the input list.
- Checks if a group with the current item's idField value exists.
- If it exists, increments the group's count.
- Otherwise, creates a new group with count initialized to 1.
Usage Example:

interface FileItem {
  typeId: string;
  typeName: string;
  // other properties
}

const files: FileItem[] = [
  { typeId: 'pdf', typeName: 'PDF Document' },
  { typeId: 'pdf', typeName: 'PDF Document' },
  { typeId: 'doc', typeName: 'Word Document' },
];

const grouped = groupListByType(files, 'typeId', 'typeName');
// Result:
// [
//   { id: 'pdf', label: 'PDF Document', count: 2 },
//   { id: 'doc', label: 'Word Document', count: 1 }
// ]
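
The steps listed under Implementation Details can be sketched as follows. This is a minimal reconstruction from the description above, not necessarily the file's exact source; the signature and the FilterType shape are taken from this document, while the local variable names are assumptions.

```typescript
type FilterType = {
  id: string;
  label: string;
  count: number;
};

function groupListByType<T extends Record<string, any>>(
  list: T[],
  idField: string,
  labelField: string,
): FilterType[] {
  const groups: FilterType[] = [];

  for (const item of list) {
    const id = item[idField];
    // Linear search for a group that already uses this id.
    const existing = groups.find((group) => group.id === id);

    if (existing) {
      existing.count += 1;
    } else {
      // First item with this id: its labelField value becomes the label.
      groups.push({ id, label: item[labelField], count: 1 });
    }
  }

  return groups;
}
```

Because new groups are appended as they are first encountered, the output order follows the first appearance of each id in the input list.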

Important Implementation Details

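The grouping function uses a linear search (Array.find) over the accumulating array to check for an existing group. Each lookup is therefore O(g) in the number of groups g found so far, which makes the whole pass O(n·g) for n items. This is fine for small to moderately sized lists; for larger datasets, keying the groups in a Map or plain object reduces each lookup to O(1) on average and the whole pass to O(n).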
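The two parser identification functions compare their argument against the imported constants with strict equality (===). Typing the parameter as DocumentParserType provides the compile-time safety, and keeping the constants in one module means parser identifiers are managed centrally.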
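
The Map-based optimization mentioned above could look like the sketch below. The name groupListByTypeWithMap is hypothetical and the function is not part of dataset-util.ts; it keeps the documented signature and FilterType shape and only changes how existing groups are looked up.

```typescript
type FilterType = {
  id: string;
  label: string;
  count: number;
};

// Hypothetical variant: same inputs and output as groupListByType,
// but groups are indexed by id in a Map instead of searched with Array.find.
function groupListByTypeWithMap<T extends Record<string, any>>(
  list: T[],
  idField: string,
  labelField: string,
): FilterType[] {
  const groups = new Map<string, FilterType>();

  for (const item of list) {
    const id = item[idField];
    const existing = groups.get(id); // average O(1) lookup

    if (existing) {
      existing.count += 1;
    } else {
      groups.set(id, { id, label: item[labelField], count: 1 });
    }
  }

  // A Map iterates in insertion order, so groups keep first-seen order.
  return Array.from(groups.values());
}
```

Since a Map preserves insertion order, this variant should return groups in the same order as the Array.find approach, so callers would not need to change.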

Interaction with Other System Parts

- Constants Reference: The file depends on DocumentParserType from @/constants/knowledge, a module that centralizes parser type definitions. This promotes consistency across the application when referring to parser types.
- Utility Usage: These utilities would be used by modules handling document ingestion, filtering, and UI components that display grouped data or parser-specific logic.
- FilterType: The FilterType output from groupListByType is likely consumed by UI filter components or analytics modules to present grouped summaries.

Mermaid Flowchart Diagram

flowchart TD
    A[Input: list of objects] --> B[groupListByType]
    B --> C[Check if group with idField exists]
    C -- Yes --> D[Increment count]
    C -- No --> E[Add new FilterType with count=1]
    D & E --> F[Return array of FilterType]

    subgraph Parser Identification
        G["isKnowledgeGraphParser(parserId)"] --> H[Returns boolean]
        I["isNaiveParser(parserId)"] --> J[Returns boolean]
    end

Summary

The dataset-util.ts file provides focused utility functions for:

- Identifying specific document parser types.
- Grouping generic lists by specified fields with counts, returning a standardized filter structure.

Its simplicity and generic design make it a reusable utility in document processing pipelines and user interface components dealing with filters and parsers.