llm.py
Overview
The llm.py file is a core component module of the InfiniFlow system that provides interaction with large language models (LLMs). It defines the LLM component and its parameter class, LLMParam, which let the system generate text completions or structured outputs by querying configured LLM backends. The module encapsulates prompt preparation, streaming and non-streaming generation, error handling, and integration with tenant-specific LLM configurations and multi-modal inputs (e.g., images). Within the agent framework, it is the component through which a pipeline obtains language understanding, reasoning, and response generation.
Classes and Their Detailed Descriptions
1. LLMParam (inherits from ComponentParamBase)
Purpose
Encapsulates all configuration parameters for the LLM component, including LLM model identifiers, prompt templates, generation parameters, and output formatting preferences.
Attributes
llm_id (str): Identifier for the LLM model to be used.
sys_prompt (str): System-level prompt template string.
prompts (list[dict]): List of prompt dicts, each with the keys "role" and "content", providing the conversation context.
max_tokens (int): Maximum token limit for generation.
temperature (float): Sampling temperature controlling randomness.
top_p (float): Nucleus sampling parameter.
presence_penalty (float): Penalty applied to tokens that have already appeared, which encourages the model to introduce new topics.
frequency_penalty (float): Penalty that grows with how often a token has already appeared, which discourages repetition.
output_structure (dict or None): Optional JSON schema that the output must follow.
cite (bool): Whether to include citation prompts referencing document chunks.
visual_files_var (str or None): Variable name for input images to support multi-modal LLMs.
Methods
check() -> None
Validates the parameter values (e.g., numeric ranges, non-empty required fields). Raises exceptions if invalid.
gen_conf() -> dict
Generates a dictionary of generation parameters (max_tokens, temperature, etc.) based on enabled flags and non-zero values.
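The filtering that gen_conf() is described as doing can be sketched as follows. This is a minimal illustration, not the module's code: GenParamsSketch is a hypothetical stand-in for the generation fields of LLMParam, and it models only the "non-zero values" rule (the enabled flags mentioned above are omitted because their names are not given here).

```python
from dataclasses import dataclass


@dataclass
class GenParamsSketch:
    """Hypothetical stand-in for the generation fields of LLMParam."""

    max_tokens: int = 0
    temperature: float = 0.0
    top_p: float = 0.0
    presence_penalty: float = 0.0
    frequency_penalty: float = 0.0

    def gen_conf(self) -> dict:
        # Collect every generation parameter, then keep only those that
        # were set to a non-zero value so that unset fields fall back to
        # whatever defaults the backend applies.
        candidates = {
            "max_tokens": self.max_tokens,
            "temperature": self.temperature,
            "top_p": self.top_p,
            "presence_penalty": self.presence_penalty,
            "frequency_penalty": self.frequency_penalty,
        }
        return {name: value for name, value in candidates.items() if value}


params = GenParamsSketch(max_tokens=150, temperature=0.7)
print(params.gen_conf())
```

Dropping unset keys, rather than sending zeros, keeps the request from overriding backend defaults with values the user never chose.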
Usage Example
param = LLMParam()
param.llm_id = "gpt-4"
param.sys_prompt = "You are a helpful assistant."
param.prompts = [{"role": "user", "content": "Hello, world!"}]
param.max_tokens = 150
param.temperature = 0.7
param.check() # validates parameters
conf = param.gen_conf() # get generation config dictionary
2. LLM (inherits from ComponentBase)
Purpose
Main component class representing an LLM interaction node within the agent pipeline. Manages prompt preparation, generation calls (both streaming and batch), parsing, and output formatting. Supports multi-modal inputs and structured output enforcement.
Attributes
component_name (str): Fixed name "LLM" representing the component.
chat_mdl (LLMBundle): Backend LLM communication client initialized with tenant and model info.
imgs (list[str]): Base64-encoded image strings if visual inputs are provided.
Constructor
def __init__(self, canvas, id, param: ComponentParamBase)
Initializes the component with references to the canvas (execution environment), component ID, and parameters. Sets up the LLM client bundle and initializes image list.
Public Methods
get_input_form() -> dict[str, dict]
Returns the form schema for user inputs extracted from the system prompt and user prompts.
get_input_elements() -> dict[str, Any]
Parses the system prompt and user prompts to identify input variables.
set_debug_inputs(inputs: dict[str, dict]) -> None
Allows setting debug inputs that override normal input extraction.
add2system_prompt(txt: str) -> None
Appends additional text to the system prompt.
add_memory(user: str, assist: str, func_name: str, params: dict, results: str, user_defined_prompt: dict = {}) -> None
Adds a summarized memory entry of the interaction for later retrieval or context.
thoughts() -> str
Returns a formatted string representing the LLM's current "thinking" state based on the last user message.
Core Internal Methods and Workflows
_prepare_prompt_variables() -> tuple[str, list[dict], dict]
Gathers and formats variables for prompt templates.
Filters and processes visual inputs (images) if specified.
Combines system prompt and user prompts, applies string formatting substitutions.
Extracts special tagged prompt sections (e.g., <TASK_ANALYSIS>) and removes them from the main prompt.
Adds citation prompts if enabled and relevant references exist.
Returns a tuple of (final_system_prompt, messages_list, user_defined_prompt_dict).
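The placeholder scanning and substitution described for _prepare_prompt_variables can be sketched as below. The {name} placeholder syntax is taken from the usage example in this document ("{user_query}"); the helper names find_placeholders and fill_template, and the choice to leave unknown placeholders untouched, are assumptions made for illustration only.

```python
import re

# Assumed placeholder syntax: a word-like name wrapped in curly braces.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def find_placeholders(template: str) -> list[str]:
    """Return placeholder names in order of first appearance, without duplicates."""
    seen: list[str] = []
    for name in PLACEHOLDER.findall(template):
        if name not in seen:
            seen.append(name)
    return seen


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each known placeholder with its runtime value.

    Placeholders with no matching value are left as-is so that a missing
    input is visible in the final prompt instead of silently disappearing.
    """
    return PLACEHOLDER.sub(
        lambda match: str(values.get(match.group(1), match.group(0))),
        template,
    )


template = "Answer in {lang}: {user_query}"
print(find_placeholders(template))
print(fill_template(template, {"user_query": "Explain quantum computing."}))
```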
_extract_prompts(sys_prompt: str) -> tuple[dict, str]
Uses regex to find and extract special prompt tags such as <TASK_ANALYSIS>, <PLAN_GENERATION>, etc.
Returns a dict of extracted prompt contents keyed by lowercase tag name and the cleaned system prompt.
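A regex-based extraction of this kind might look like the sketch below. It assumes that any upper-case tag pair counts as a special section; the real method may accept only a fixed set of tag names, and the function name extract_tagged_sections is hypothetical.

```python
import re

# Assumption: a special section is any <UPPER_CASE>...</UPPER_CASE> pair.
# The backreference \1 makes the closing tag match the opening one.
TAG_BLOCK = re.compile(r"<([A-Z_]+)>(.*?)</\1>", re.DOTALL)


def extract_tagged_sections(sys_prompt: str) -> tuple[dict[str, str], str]:
    """Return (sections keyed by lowercase tag name, prompt with sections removed)."""
    extracted: dict[str, str] = {}

    def _collect(match: re.Match) -> str:
        extracted[match.group(1).lower()] = match.group(2).strip()
        return ""  # drop the whole tagged block from the main prompt

    cleaned = TAG_BLOCK.sub(_collect, sys_prompt).strip()
    return extracted, cleaned


sections, cleaned = extract_tagged_sections(
    "You are a helpful assistant. "
    "<TASK_ANALYSIS>Analyze the input carefully.</TASK_ANALYSIS>"
)
print(sections)
print(cleaned)
```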
_generate(msg: list[dict], **kwargs) -> str
Performs synchronous generation by invoking the LLM backend.
Supports optional image input for multi-modal generation.
Returns the generated text response.
_generate_streamly(msg: list[dict], **kwargs) -> Generator[str, None, None]
Performs streaming generation, yielding incremental text chunks.
Implements filtering to handle special <think> tags embedded in LLM outputs.
Supports multi-modal image input as well.
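One way to separate <think> content from answer text in a chunked stream is sketched below. It is an illustration under stated assumptions rather than the logic of _generate_streamly: the function name split_think and the ("think" | "answer", text) output shape are hypothetical, and the sketch assumes a tag may be split across two chunks, so it holds back any tail that could be the beginning of a tag.

```python
from typing import Iterable, Iterator

OPEN, CLOSE = "<think>", "</think>"


def split_think(chunks: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield ("think" | "answer", text) pairs from a stream of text chunks."""
    buffer = ""
    in_think = False
    for chunk in chunks:
        buffer += chunk
        # Consume every complete tag currently in the buffer.
        while True:
            tag = CLOSE if in_think else OPEN
            idx = buffer.find(tag)
            if idx == -1:
                break
            if idx > 0:
                yield ("think" if in_think else "answer", buffer[:idx])
            buffer = buffer[idx + len(tag):]
            in_think = not in_think
        # Hold back a tail that could be the start of the next expected tag.
        tag = CLOSE if in_think else OPEN
        keep = 0
        for n in range(min(len(tag) - 1, len(buffer)), 0, -1):
            if tag.startswith(buffer[-n:]):
                keep = n
                break
        emit = buffer[: len(buffer) - keep]
        if emit:
            yield ("think" if in_think else "answer", emit)
        buffer = buffer[len(buffer) - keep:]
    # Flush whatever is left once the stream ends.
    if buffer:
        yield ("think" if in_think else "answer", buffer)


for kind, text in split_think(["<thi", "nk>plan", "</think>Hel", "lo"]):
    print(kind, repr(text))
```

Tagging each piece, instead of discarding the reasoning text, leaves the caller free to hide it, log it, or render it separately.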
_invoke(**kwargs) -> None
Main execution method decorated with a timeout.
Prepares prompts and message history.
Supports output structured as JSON and retries generation on errors.
If output is structured, attempts to parse and set structured output.
If downstream components expect streaming output, sets output as a partial streaming generator function.
Handles error logging and fallback outputs.
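The retry behaviour for structured output can be sketched as a small loop. Several things here are assumptions: the function name generate_structured, the max_attempts default, and the substring check for "ERROR" are illustrative, and the sketch parses with the standard json module to stay self-contained, whereas this document states that the component uses the json_repair package to fix malformed responses.

```python
import json
from typing import Callable, Optional


def generate_structured(
    generate: Callable[[], str],
    max_attempts: int = 3,
) -> tuple[Optional[dict], Optional[str]]:
    """Call generate() until it returns a JSON object or attempts run out.

    Returns (parsed_output, None) on success or (None, last_error) on failure.
    """
    error: Optional[str] = "max_attempts must be at least 1"
    for _ in range(max_attempts):
        raw = generate()
        # Simplified error check: treat any reply containing the error
        # token as a failed attempt and try again.
        if "ERROR" in raw:
            error = raw
            continue
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as exc:
            error = f"unparsable JSON: {exc}"
            continue
        if isinstance(parsed, dict):
            return parsed, None
        error = "JSON output was not an object"
    return None, error


# Simulated backend: fails twice, then returns valid JSON.
replies = iter(["**ERROR** timeout", "{not json", '{"answer": "42", "sources": []}'])
print(generate_structured(lambda: next(replies)))
```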
_stream_output(prompt: str, msg: list[dict]) -> Generator[str, None, None]
Helper for streaming output generation.
Yields incremental output from _generate_streamly.
Handles error cases gracefully.
Important Implementation Details & Algorithms
Prompt Variable Extraction and Formatting:
Inputs are extracted from system and user prompt templates by scanning for variable placeholders. These are dynamically replaced with runtime values, supporting flexible prompt customization.
Special Prompt Tag Extraction:
Uses regex to parse and remove special embedded prompt sections such as <TASK_ANALYSIS> and <CITATION_GUIDELINES>. These can be used for more granular control over multi-step reasoning or citation policy.
Streaming Generation with Think Tags:
The streaming generator parses incremental LLM output to detect and handle <think> and </think> tags, allowing the system to model internal thought or reasoning steps distinctly from user-visible answers.
Retry Logic with JSON Output Parsing:
When structured JSON output is requested, the component retries generation up to a configured maximum if the LLM returns unparsable JSON or error tokens ("ERROR"). Uses the json_repair package to attempt to fix malformed JSON responses.
Multi-modal Image Support:
If a visual files variable is provided, the component filters inputs for valid base64-encoded images and switches the LLM client to an image-to-text capable model type for enriched multimodal interactions.
Integration with Tenant and LLM Services:
The component queries tenant-specific LLM configurations using TenantLLMService and manages LLM types with LLMType enums, ensuring consistent backend selection per user context.
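The image filtering mentioned under Multi-modal Image Support could be sketched as follows. The exact validity rule is not stated in this document, so the sketch assumes images arrive as data URIs of the form "data:image/<type>;base64,<payload>" and checks that the payload decodes; the helper names is_image_data_uri and filter_images are hypothetical.

```python
import base64
import binascii


def is_image_data_uri(value: object) -> bool:
    """Return True if value looks like a base64-encoded image data URI."""
    if not isinstance(value, str) or not value.startswith("data:image/"):
        return False
    header, sep, payload = value.partition(",")
    if not sep or not header.endswith(";base64") or not payload:
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


def filter_images(values: list) -> list[str]:
    """Keep only the entries that pass the data-URI check, preserving order."""
    return [value for value in values if is_image_data_uri(value)]


sample = "data:image/png;base64," + base64.b64encode(b"\x89PNG").decode()
print(filter_images([sample, "plain text", None]))
```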
Interaction with Other System Parts
Canvas (self._canvas):
Acts as the execution context/environment managing variables, memory, component graph, and message history.
LLM Backend (LLMBundle):
Client interface wrapping calls to the underlying LLM service providers, supporting chat and streaming APIs.
TenantLLMService:
Provides tenant-aware mapping from LLM IDs to LLM types and configurations.
RAG Prompt Utilities:
Functions like message_fit_in and citation_prompt are used for prompt length management and citation insertion.
Agent Component Framework:
Inherits ComponentBase and ComponentParamBase as part of the modular agent architecture.
Memory System:
Summarizes tool calls and adds them to long-term memory via the canvas.
Example Usage Scenario
Suppose a developer wants to instantiate an LLM component to answer user queries with GPT-4 using a custom system prompt and structured JSON output.
param = LLMParam()
param.llm_id = "gpt-4"
param.sys_prompt = "You are a helpful assistant. <TASK_ANALYSIS>Analyze the input carefully.</TASK_ANALYSIS>"
param.prompts = [{"role": "user", "content": "{user_query}"}]
param.max_tokens = 200
param.temperature = 0.5
param.output_structure = {"answer": "", "sources": []}
param.check()
llm_component = LLM(canvas, "llm1", param)
llm_component.set_debug_inputs({"user_query": {"value": "Explain quantum computing."}})
llm_component._invoke()
result = llm_component.get_output("structured_content")
print(result)
Visual Diagram
classDiagram
class LLMParam {
+llm_id: str
+sys_prompt: str
+prompts: list
+max_tokens: int
+temperature: float
+top_p: float
+presence_penalty: float
+frequency_penalty: float
+output_structure: dict
+cite: bool
+visual_files_var: str
+check()
+gen_conf() dict
}
class LLM {
+component_name: str
-chat_mdl: LLMBundle
-imgs: list
+__init__(canvas, id, param)
+get_input_form() dict
+get_input_elements() dict
+set_debug_inputs(inputs)
+add2system_prompt(txt)
-_prepare_prompt_variables() tuple
-_extract_prompts(sys_prompt) tuple
-_generate(msg, kwargs) str
-_generate_streamly(msg, kwargs) Generator
-_invoke(kwargs)
-_stream_output(prompt, msg) Generator
+add_memory(user, assist, func_name, params, results, user_defined_prompt)
+thoughts() str
}
ComponentParamBase <|-- LLMParam
ComponentBase <|-- LLM
LLM --> LLMParam : configured by
Summary
The llm.py file implements the LLM integration component of the agent framework. It handles prompt management and generation calls to tenant-configured LLM backends, supports structured and streaming outputs, and manages multi-modal inputs. Built-in error handling and flexible output formatting make it suitable for multi-turn conversational and reasoning tasks. The component works closely with the canvas execution environment, tenant services, and prompt utilities to deliver LLM functionality in InfiniFlow's pipeline.