retrieval_categorize_and_generate.json
Overview
This JSON file defines a modular conversational AI pipeline consisting of components that sequentially process user queries related to product information. Its primary purpose is to:
Start a conversation with a greeting.
Categorize user questions based on content (product-related or not).
Retrieve relevant knowledge base data for product-related queries.
Generate detailed answers using an AI agent.
Provide fallback messages for uncategorized or unsupported queries.
The file outlines components, their parameters, and how data flows between them in a directed acyclic graph (DAG)-style pipeline, enabling flexible orchestration of complex AI workflows.
Components and Structure
The system is composed of six main components connected in a pipeline:
| Component Name | Role | Upstream | Downstream |
|---|---|---|---|
| Begin | Entry point; sends greeting message | None | Categorize:0 |
| Categorize:0 | Classifies query as product-related or other | Begin | Retrieval:0 (for product-related), Message:0 (for others) |
| Retrieval:0 | Retrieves relevant knowledge base content | Categorize:0 | Generate:0 |
| Generate:0 | Generates detailed answer using LLM | Retrieval:0 | Message:1 |
| Message:0 | Sends fallback message for non-product queries | Categorize:0 | None |
| Message:1 | Sends generated answer message | Generate:0 | None |
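The links in the table can be checked mechanically. The following is a minimal sketch, assuming each component is represented as a dictionary entry with `upstream` and `downstream` lists; the `PIPELINE` layout and both function names are illustrative and are not taken from the JSON file itself.

```python
from collections import deque

# Edges transcribed from the component table; IDs use the lowercase form
# seen in the usage examples (e.g. "categorize:0"). The dict layout is an
# assumption made for this sketch, not the file's actual schema.
PIPELINE = {
    "begin":        {"upstream": [],               "downstream": ["categorize:0"]},
    "categorize:0": {"upstream": ["begin"],        "downstream": ["retrieval:0", "message:0"]},
    "retrieval:0":  {"upstream": ["categorize:0"], "downstream": ["generate:0"]},
    "generate:0":   {"upstream": ["retrieval:0"],  "downstream": ["message:1"]},
    "message:0":    {"upstream": ["categorize:0"], "downstream": []},
    "message:1":    {"upstream": ["generate:0"],   "downstream": []},
}


def find_link_errors(pipeline):
    """Return a list of problems; an empty list means every link is mirrored."""
    errors = []
    for name, node in pipeline.items():
        for child in node["downstream"]:
            if child not in pipeline:
                errors.append(f"{name} -> {child}: unknown component")
            elif name not in pipeline[child]["upstream"]:
                errors.append(f"{name} -> {child}: missing upstream back-reference")
        for parent in node["upstream"]:
            if parent not in pipeline:
                errors.append(f"{parent} -> {name}: unknown component")
            elif name not in pipeline[parent]["downstream"]:
                errors.append(f"{parent} -> {name}: missing downstream reference")
    return errors


def topological_order(pipeline):
    """Kahn's algorithm; raises ValueError if the graph contains a cycle."""
    pending = {name: len(node["upstream"]) for name, node in pipeline.items()}
    ready = deque(name for name, count in pending.items() if count == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for child in pipeline[name]["downstream"]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    if len(order) != len(pipeline):
        raise ValueError("pipeline contains a cycle")
    return order
```

Calling `find_link_errors(PIPELINE)` returns an empty list for the table as written, and `topological_order(PIPELINE)` starts at `begin`, which is consistent with the DAG description.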
Detailed Component Descriptions
1. Begin
Type: Component / Message Sender
Purpose: Initiates the conversation with a fixed greeting.
Parameters:
prologue (string): Greeting text. Example: "Hi there!"
Upstream: None
Downstream:
categorize:0
Usage Example:
{ "component_name": "Begin", "params": { "prologue": "Hi there!" } }
Description: This component acts as a conversation starter by sending a simple greeting message downstream to the categorization component.
2. Categorize:0
Type: Categorization Component
Purpose: Classifies user input into two categories: product_related or others.
Parameters:
llm_id (string): Identifier of the language model used for classification (e.g., "deepseek-chat").
category_description (object): Defines categories with descriptions, examples, and routing destinations.
  product_related: Questions about product usage, appearance, or functionality.
  others: Questions unrelated to the product.
Upstream:
begin
Downstream:
retrieval:0 if product-related
message:0 if others
Routing Logic:
If the input is classified as product_related, forward it to the retrieval component for a knowledge base search.
Otherwise, forward it to the message component for a fallback reply.
Usage Example:
{ "component_name": "Categorize", "params": { "llm_id": "deepseek-chat", "category_description": { "product_related": { "description": "The question is about the product usage, appearance and how it works.", "to": ["retrieval:0"] }, "others": { "description": "The question is not about the product usage, appearance and how it works.", "to": ["message:0"] } } } }
Implementation Detail: Utilizes an LLM as a classifier to decide query category and route accordingly.
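The routing step can be illustrated with a small lookup over the `to` arrays. This is a sketch only: the LLM call is left out, the `route` function is hypothetical, and falling back to `others` when the classifier returns an unknown label is an assumption rather than documented behavior.

```python
# category_description as shown in the usage example, with descriptions shortened.
CATEGORY_DESCRIPTION = {
    "product_related": {
        "description": "The question is about the product usage, appearance and how it works.",
        "to": ["retrieval:0"],
    },
    "others": {
        "description": "The question is not about the product usage, appearance and how it works.",
        "to": ["message:0"],
    },
}


def route(label, category_description, default="others"):
    """Map a classifier label to the downstream component IDs listed under "to".

    Unknown labels fall back to `default`; this fallback is an assumption
    made for the sketch.
    """
    entry = category_description.get(label.strip().lower())
    if entry is None:
        entry = category_description[default]
    return entry["to"]
```

For example, `route("product_related", CATEGORY_DESCRIPTION)` yields the retrieval branch, while any label outside the two configured categories lands on the fallback message.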
3. Message:0 (Fallback Message)
Type: Message Component
Purpose: Provides a default response when the question is outside product scope.
Parameters:
content (array of strings): One or more predefined replies. Example: ["Sorry, I don't know. I'm an AI bot."]
Upstream:
categorize:0
Downstream: None
Usage Example:
{ "component_name": "Message", "params": { "content": ["Sorry, I don't know. I'm an AI bot."] } }
Description: This component returns a polite refusal or fallback message when the categorizer determines the query is unrelated to product information.
4. Retrieval:0
Type: Retrieval / Search Component
Purpose: Fetches the most relevant knowledge base documents matching the query.
Parameters:
similarity_threshold (float): Minimum similarity score to consider (e.g., 0.2).
keywords_similarity_weight (float): Weight for keyword similarity in scoring (e.g., 0.3).
top_n (int): Number of top results to return (e.g., 6).
top_k (int): Max number of candidates to consider (e.g., 1024).
rerank_id (string): Optional reranking model id (empty string if unused).
empty_response (string): Message if no results found.
kb_ids (array of string): Knowledge base identifiers to search within.
Upstream:
categorize:0
Downstream:
generate:0
Usage Example:
{ "component_name": "Retrieval", "params": { "similarity_threshold": 0.2, "keywords_similarity_weight": 0.3, "top_n": 6, "top_k": 1024, "rerank_id": "", "empty_response": "Nothing found in dataset", "kb_ids": ["1a3d1d7afb0611ef9866047c16ec874f"] } }
Implementation Detail:
Combines semantic similarity and keyword matching in a weighted score to retrieve relevant documents.
Supports reranking, but it is currently disabled (rerank_id is empty).
Returns the top-N documents to the next component.
Note: The knowledge base is identified by an ID string and likely stored externally.
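To make the interaction of the retrieval parameters concrete, here is a rough sketch of how they might combine. The linear blend in `hybrid_score` is an assumption (the file only names the weight, not the formula), and the candidate fields `vector_sim`, `keyword_sim`, and `text` are hypothetical.

```python
def hybrid_score(vector_sim, keyword_sim, keywords_similarity_weight=0.3):
    """Blend keyword and vector similarity; the linear form is assumed."""
    w = keywords_similarity_weight
    return w * keyword_sim + (1.0 - w) * vector_sim


def select_chunks(candidates, similarity_threshold=0.2,
                  keywords_similarity_weight=0.3, top_n=6, top_k=1024):
    """Score at most top_k candidates, drop low scores, return the best top_n."""
    pool = candidates[:top_k]  # cap the candidate pool
    scored = [
        (hybrid_score(c["vector_sim"], c["keyword_sim"], keywords_similarity_weight), c["text"])
        for c in pool
    ]
    kept = [item for item in scored if item[0] >= similarity_threshold]
    kept.sort(key=lambda item: item[0], reverse=True)  # highest score first
    return kept[:top_n]


# Made-up candidates for illustration only.
candidates = [
    {"text": "a", "vector_sim": 0.9, "keyword_sim": 0.5},
    {"text": "b", "vector_sim": 0.1, "keyword_sim": 0.2},
    {"text": "c", "vector_sim": 0.4, "keyword_sim": 1.0},
]
selected = select_chunks(candidates)
```

With the example values, candidate `b` scores below the 0.2 threshold and is dropped, so only `a` and `c` are passed on. An empty result is the case in which the configured `empty_response` would apply.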
5. Generate:0
Type: AI Agent / Language Model Component
Purpose: Generates a detailed response by summarizing retrieved knowledge base content.
Parameters:
llm_id (string): Identifier of the language model used (e.g., "deepseek-chat").
sys_prompt (string): System prompt guiding the assistant to summarize knowledge base content, instructing fallback phrasing if no relevant content, and to consider chat history.
temperature (float): Controls response randomness (e.g., 0.2).
Upstream:
retrieval:0
Downstream:
message:1
Usage Example:
{ "component_name": "Agent", "params": { "llm_id": "deepseek-chat", "sys_prompt": "You are an intelligent assistant. Please summarize the content of the knowledge base to answer the question...{retrieval:0@formalized_content}...", "temperature": 0.2 } }
Implementation Detail:
Constructs an answer from the retrieved KB content, which is injected through the {retrieval:0@formalized_content} placeholder.
Enforces inclusion of a fallback sentence if no relevant information is found.
Incorporates chat history for context.
Note: This component leverages an LLM prompt engineering approach to generate detailed, context-aware replies.
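The placeholder mechanism can be sketched as a simple template substitution. This is an illustration under assumptions: the `render` helper, the regular expression, and the `outputs` dictionary shape are hypothetical, and leaving unresolved placeholders untouched is a choice made for the sketch rather than documented behavior.

```python
import re

# Matches placeholders of the form {component:index@field},
# e.g. {retrieval:0@formalized_content}.
PLACEHOLDER = re.compile(r"\{([a-z_]+:\d+)@([a-z_]+)\}")


def render(template, outputs):
    """Replace each placeholder with the named field of the named component."""
    def substitute(match):
        component_id, field = match.group(1), match.group(2)
        try:
            return str(outputs[component_id][field])
        except KeyError:
            return match.group(0)  # leave unresolved placeholders as-is
    return PLACEHOLDER.sub(substitute, template)


# Made-up retrieval output for illustration only.
outputs = {"retrieval:0": {"formalized_content": "The lamp charges over USB-C."}}
prompt = render("Knowledge base: {retrieval:0@formalized_content}", outputs)
```

The same helper would cover the `{generate:0@content}` placeholder used by the final message component.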
6. Message:1 (Generated Response)
Type: Message Component
Purpose: Sends the AI-generated answer back to the user.
Parameters:
content (array of strings): Contains placeholders for the generated output (e.g., ["{generate:0@content}"]).
Upstream:
generate:0
Downstream: None
Usage Example:
{ "component_name": "Message", "params": { "content": ["{generate:0@content}"] } }
Description: This component acts as the final step, delivering the AI-generated answer as output.
Workflow Summary
Begin sends a greeting.
Categorize classifies the user query.
If product-related:
Retrieval fetches relevant KB data.
Generate creates a detailed answer using the KB and chat history.
Message (1) sends the generated answer.
If not product-related:
Message (0) sends a fallback response.
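The two branches of the workflow can be traced with a few lines of code. The lookup tables and the `trace` function below are hypothetical and simply mirror the summary; they are not read from the JSON file.

```python
# First hop after categorization, keyed by category.
BRANCH = {"product_related": "retrieval:0", "others": "message:0"}
# Remaining single-successor hops on the product-related path.
NEXT_STEP = {"retrieval:0": "generate:0", "generate:0": "message:1"}


def trace(category):
    """Return the ordered list of component IDs visited for a category."""
    path = ["begin", "categorize:0"]
    current = BRANCH[category]
    while current is not None:
        path.append(current)
        current = NEXT_STEP.get(current)  # None once a terminal message is reached
    return path
```

Tracing `product_related` visits five components and ends at `message:1`, while `others` ends at `message:0` right after categorization.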
Interaction with Other System Parts
Knowledge Base (KB):
The retrieval:0 component uses an external knowledge base identified by "1a3d1d7afb0611ef9866047c16ec874f". This KB stores documents indexed for semantic and keyword search.
Language Models:
The categorize:0 and generate:0 components rely on a language model identified as "deepseek-chat". This suggests integration with an LLM backend service for classification and generation tasks.
User Input & System Globals:
The file defines global variables like sys.query, sys.user_id, and sys.conversation_turns, which provide context to components, likely injected at runtime.
Pipeline Orchestration:
The DAG structure formed by the upstream and downstream arrays governs the flow of data and control, enabling modular and extensible conversational AI pipelines.
Important Implementation Details
Categorization routing: Uses LLM-based classification to decide which path to take, enabling context-aware routing of queries.
Hybrid retrieval scoring: Combines semantic similarity with a weighted keyword-match score (keywords_similarity_weight, 0.3 in this file), so that both paraphrased and exact-term matches can surface in document search.
Prompt engineering: The system prompt in generate:0 explicitly instructs the agent to handle irrelevant KB content gracefully, improving robustness.
Placeholder usage: Components pass data using structured placeholder syntax, e.g., {generate:0@content}, facilitating dynamic data injection.
Extensibility: The modular design allows easy replacement or addition of components like new message handlers or retrieval strategies.
Visual Diagram
flowchart TD
Begin["Begin\n(prologue: 'Hi there!')"] --> Categorize["Categorize:0\n(llm_id: deepseek-chat)"]
Categorize -->|product_related| Retrieval["Retrieval:0\n(sim_threshold:0.2,\nkeywords_weight:0.3)"]
Categorize -->|others| Msg0["Message:0\n('Sorry, I don't know. I'm an AI bot.')"]
Retrieval --> Generate["Generate:0\n(llm_id: deepseek-chat,\ntemperature: 0.2)"]
Generate --> Msg1["Message:1\n('{generate:0@content}')"]
Diagram Explanation:
The flowchart shows the pipeline from the initial greeting, through categorization, branching to either retrieval and generation or fallback messaging, and finally outputting the response.
Summary
This file defines a conversational AI pipeline for product-related Q&A that:
Starts with a greeting.
Categorizes queries.
Retrieves knowledge base documents.
Generates detailed AI answers.
Handles out-of-scope queries gracefully.
Its modular design supports maintenance, extensibility, and integration with external knowledge bases and large language models. The explicit data flow makes it clear how each user query is routed: product-related questions are answered from retrieved knowledge base content, and all others receive the fallback message.