categorize.py
Overview
The categorize.py file defines a component within the InfiniFlow system responsible for classifying user input queries into predefined categories using a large language model (LLM). This component leverages prompt engineering and example-based learning to assign a single, most appropriate category to a given input query, facilitating downstream workflows based on the classification result.
The file primarily contains two classes:
CategorizeParam: Handles parameter definitions, validation, and prompt construction for the categorization task.
Categorize: Implements the LLM-based categorization logic, invoking the model and processing its output to determine the relevant category and next steps.
This component is designed to be flexible and extensible: users define the categories and can supply descriptions and example queries to guide the classification.
Classes and Methods
Class: CategorizeParam
Inherits from: LLMParam
Defines and manages the parameters required by the Categorize component.
Attributes
category_description (dict):
Maps category names (strings) to their metadata, containing:
    description (optional str): Explanation of the category.
    examples (list[str]): Example queries demonstrating the category.
    to (list): Identifiers of the next components or actions after categorization.
query (str):
The key identifying the query input, default "sys.query".
message_history_window_size (int):
Number of recent messages from the chat history to consider for context, default 1.
sys_prompt (str):
The system prompt dynamically generated based on the current category definitions and examples.
Methods
__init__()
Initializes default values and updates the system prompt.
check()
Validates the parameters:
    Ensures message_history_window_size is a positive integer.
    Ensures category_description is not empty.
    Validates that no category names are empty.
    Confirms each category has a non-empty "to" field.
Raises ValueError if any validation fails.
get_input_form() -> dict[str, dict]
Returns the input form schema for user interfaces or API usage.
Returns: { "query": { "type": "line", "name": "Query" } }
update_prompt()
Dynamically constructs the sys_prompt string used to instruct the LLM.
It includes:
    A list of all category names.
    Descriptions of categories if provided.
    Example user queries mapped to categories (newlines inside examples are replaced with spaces).
The prompt instructs the model to classify a question exclusively into one category or "Other" if none fits.
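The prompt assembly described for update_prompt() can be illustrated with a small standalone function. This is a minimal sketch, not the component's actual code: the function name build_sys_prompt and the exact prompt wording are assumptions made for illustration, and only the structure (category names, optional descriptions, flattened examples, an "Other" fallback) follows the description above.

```python
def build_sys_prompt(category_description: dict) -> str:
    """Assemble a classification prompt from category metadata.

    Hypothetical helper: the wording of the prompt is illustrative only.
    """
    names = list(category_description.keys())
    lines = [
        "You are a classifier. Assign the user's question to exactly one category.",
        "Categories: " + ", ".join(names),
    ]

    # Include descriptions only for categories that provide one.
    described = [
        f"- {name}: {meta['description']}"
        for name, meta in category_description.items()
        if meta.get("description")
    ]
    if described:
        lines.append("Category descriptions:")
        lines.extend(described)

    # Include examples, replacing newlines so each example stays on one line.
    examples = []
    for name, meta in category_description.items():
        for example in meta.get("examples", []):
            flat = example.replace("\n", " ")
            examples.append(f"- Question: {flat} -> Category: {name}")
    if examples:
        lines.append("Examples:")
        lines.extend(examples)

    lines.append('Reply with the category name only, or "Other" if none fits.')
    return "\n".join(lines)
```

Keeping each example on a single line means that a multi-line example cannot be mistaken for several separate entries when the model reads the prompt.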
Usage Example
# Assumes CategorizeParam has already been imported from categorize.py.
param = CategorizeParam()
param.category_description = {
    "Billing": {
        "description": "Questions related to billing and payments.",
        "examples": ["How do I update my payment method?", "Why was I charged twice?"],
        "to": ["billing_handler"]
    },
    "Technical": {
        "description": "Technical support and troubleshooting.",
        "examples": ["My app crashes on startup.", "How to reset my password?"],
        "to": ["tech_support_handler"]
    }
}
param.message_history_window_size = 3
param.check()          # raises ValueError if the configuration is invalid
param.update_prompt()  # rebuilds sys_prompt from the categories above
print(param.sys_prompt)
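The rules that check() enforces can also be expressed as a standalone validator, which makes it easier to see which configurations are rejected. The sketch below is an assumption-laden illustration: the function name validate_categorize_params and the error messages are hypothetical, and only the four rules themselves come from the method description.

```python
def validate_categorize_params(category_description: dict,
                               message_history_window_size: int) -> None:
    """Raise ValueError if the configuration breaks one of the documented rules.

    Hypothetical stand-in for CategorizeParam.check(); messages are illustrative.
    """
    # Rule 1: the history window must be a positive integer.
    if not isinstance(message_history_window_size, int) or message_history_window_size <= 0:
        raise ValueError("message_history_window_size must be a positive integer")

    # Rule 2: at least one category must be defined.
    if not category_description:
        raise ValueError("category_description must not be empty")

    for name, meta in category_description.items():
        # Rule 3: category names must not be empty.
        if not name:
            raise ValueError("category names must not be empty")
        # Rule 4: every category needs at least one routing target.
        if not meta.get("to"):
            raise ValueError(f"category {name!r} needs a non-empty 'to' field")
```

A configuration such as {"Billing": {"to": []}} would be rejected by the last rule, because the component would have nowhere to route a query classified as Billing.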
Class: Categorize
Inherits from: LLM, ABC (Abstract Base Class)
Defines the categorization component that uses an LLM chat model to classify inputs.
Class Attributes
component_name (str):
Identifier name of the component, set to "Categorize".
Methods
_invoke(self, **kwargs)
Decorator: @timeout (limits execution time; the default is 10 minutes unless the environment variable COMPONENT_EXEC_TIMEOUT overrides it)
This is the main execution method called by the framework to perform categorization.
Process:
    Retrieves recent message history from the component's canvas based on message_history_window_size.
    Updates the last user message with the input query from kwargs or the canvas variable.
    Updates the prompt in CategorizeParam based on current category definitions.
    Instantiates an LLM chat model using LLMBundle.
    Constructs a formatted prompt string showing recent user/assistant conversation.
    Sends the prompt to the chat model and receives a classification answer.
    Logs the input and output for debugging.
    Checks for error prefixes in the answer, raising exceptions if found.
    Counts occurrences of each category name in the answer (case-insensitive).
    Selects the category with the highest count; if no category name appears in the answer, defaults to the last category's "to" IDs.
    Sets the output variables:
        "category_name": The chosen category name.
        "_next": List of next component IDs to route processing.
Parameters:
Accepts keyword arguments and expects the query under the key "sys.query"; if it is absent, the canvas variable is used.
Returns:
None (outputs are set on the component canvas).
thoughts(self) -> str
Returns a string representation of the current categorization options formatted as a question, useful for logging or debugging.
Example output:
Which should it falls into `Billing`, `Technical`? ...
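The counting and routing steps at the end of _invoke can be sketched in isolation. The function name pick_category is hypothetical, and because this document does not say which category name is reported when nothing matches, the sketch returns None for the name in that case; only the case-insensitive counting and the fallback to the last category's "to" IDs follow the description above.

```python
from typing import Optional


def pick_category(answer: str, category_description: dict) -> tuple[Optional[str], list]:
    """Return (category_name, next_component_ids) for an LLM answer.

    Hypothetical helper illustrating the count-and-select step.
    """
    lowered = answer.lower()

    # Count case-insensitive mentions of every configured category name.
    counts = {name: lowered.count(name.lower()) for name in category_description}

    if any(counts.values()):
        # The most frequently mentioned category wins (first one on a tie).
        best = max(counts, key=counts.get)
        return best, category_description[best]["to"]

    # No category mentioned: fall back to the last category's routing targets.
    # Assumption: the reported name is left as None in this sketch.
    last = list(category_description.values())[-1]
    return None, last["to"]


categories = {
    "Billing": {"to": ["billing_handler"]},
    "Technical": {"to": ["tech_support_handler"]},
}
print(pick_category("Category: technical", categories))
```

Because the fallback routes to the last category, placing a catch-all category last in category_description gives unmatched answers a sensible destination.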
Important Implementation Details
Prompt Engineering:
The component builds a detailed system prompt including category descriptions and examples to guide the LLM towards accurate classification.
Category Selection Algorithm:
The output from the LLM is scanned for mentions of category names, and the category mentioned most often is selected. This yields a single category even if the model's response names several.
Timeout Handling:
The _invoke method is wrapped with a timeout decorator that aborts execution if it exceeds a configurable time limit (default 10 minutes).
Integration with Canvas:
The component interacts with a canvas object (not defined here) to retrieve message history, read input variables, and set outputs, which lets it plug into larger workflows.
Error Detection:
The system checks for error prefixes in the LLM response indicating failure cases and raises exceptions accordingly.
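The timeout behaviour can be illustrated with a generic decorator. The real timeout helper lives in api.utils.api_utils and its implementation is not shown in this document, so the following is only one possible approach, sketched with the standard library. It assumes that COMPONENT_EXEC_TIMEOUT holds a number of seconds, and the names DEFAULT_TIMEOUT, slow_step, and fast_step are hypothetical.

```python
import functools
import os
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Assumption: the environment variable is expressed in seconds; 600 s = 10 minutes.
DEFAULT_TIMEOUT = float(os.environ.get("COMPONENT_EXEC_TIMEOUT", 600))


def timeout(seconds: float = DEFAULT_TIMEOUT):
    """Raise TimeoutError if the wrapped call takes longer than `seconds`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Run the call in a worker thread and wait for a bounded time.
            pool = ThreadPoolExecutor(max_workers=1)
            future = pool.submit(func, *args, **kwargs)
            try:
                return future.result(timeout=seconds)
            except FutureTimeout:
                raise TimeoutError(f"{func.__name__} exceeded {seconds} seconds") from None
            finally:
                # Do not block on the worker; it keeps running in the background.
                pool.shutdown(wait=False)
        return wrapper
    return decorator


@timeout(0.05)
def slow_step():
    time.sleep(0.3)
    return "done"


@timeout(1)
def fast_step():
    return 42
```

A thread-based limit like this stops the caller from waiting but does not cancel the underlying work, which is a known trade-off of this approach.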
Interaction with Other System Components
Uses LLMBundle from api.db.services.llm_service to interact with the underlying LLM infrastructure.
Extends LLMParam and LLM from agent.component.llm, indicating it is part of a modular LLM-driven agent architecture.
Utilizes utility functions like timeout from api.utils.api_utils for execution control.
Reads environment variables for timeout configuration.
Outputs routing information (_next) used by the system to determine subsequent component execution paths based on the classification.
Visual Diagram
classDiagram
    class CategorizeParam {
        +category_description: dict
        +query: str
        +message_history_window_size: int
        +sys_prompt: str
        +__init__()
        +check()
        +get_input_form() dict
        +update_prompt()
    }
    class Categorize {
        +component_name: str = "Categorize"
        +_invoke(**kwargs)
        +thoughts() str
    }
    class LLMParam
    class LLM
    CategorizeParam --|> LLMParam
    Categorize --|> LLM
    Categorize --|> ABC
    Categorize ..> CategorizeParam : uses
Summary
The categorize.py file implements a flexible LLM-powered categorization component within InfiniFlow, allowing dynamic category definitions enriched with descriptions and examples. It tightly integrates with the system's LLM framework and workflow canvas to classify user queries and route subsequent processing steps.
The design emphasizes prompt engineering, model interaction, and robust output parsing to ensure accurate and actionable category assignments.