web_search_assistant.json
Overview
web_search_assistant.json defines a chat assistant template that answers user queries by combining information from a knowledge base with web search results, so that answers can draw on both curated documents and current web content. The template orchestrates several specialized agents and components that handle question refinement, web search, knowledge base retrieval, and answer generation.
The file is a JSON object that describes the components, their configurations, and their interconnections in a DSL (Domain-Specific Language) framework, so the assistant runs as a conversational system with a layered processing workflow.
Key goals:
Refine ambiguous or incomplete user queries to align with terminology in the knowledge base.
Extract keywords and perform multi-source web searches to gather fresh information.
Retrieve relevant content from configured knowledge bases.
Organize and synthesize information from all sources into a final answer in markdown format.
Provide a seamless conversational experience starting from a friendly greeting.
Components, Functions, and Methods
The file is declarative: it describes components (agents, messages, retrieval modules) rather than traditional classes or functions. Each major component is explained below, with its parameters and its role in the system.
1. Begin Component (begin)
Type: Begin
Purpose: Entry point of the conversation.
Parameters:
enablePrologue: Enables the initial greeting.
prologue: Greeting message to the user.
mode: Conversational mode.
Functionality: Starts the interaction with the prompt "Hi! I'm your web search assistant. What do you want to search today?"
Downstream: Passes user input to Agent:ThreePathsDecide for question refinement.
2. Agent:ThreePathsDecide (Question Refinement Agent)
Type: Agent
Purpose: Refines user queries to make them more specific and aligned with knowledge base terminology.
Parameters:
llm_id: "deepseek-chat@DeepSeek" (the language model powering this agent).
max_retries, max_rounds, max_tokens: Controls for LLM interaction.
temperature: 0.1 (low randomness for stable output).
prompts: Takes the raw user query {sys.query} as input.
sys_prompt: Instructions for rewriting ambiguous or incomplete questions. For example:
User: What's RAGFlow?
Assistant: RAGFlow is xxx.
User: How to deloy it?
Refine it: How to deploy RAGFlow?
Upstream: begin
Downstream: Agent:WildGoatsRule and Retrieval:WarmTimesRun
Output: A refined question string used by downstream components.
3. Agent:WildGoatsRule (Search-Driven Information Agent)
Type: Agent
Purpose: Extracts keywords from the refined question, performs web searches, and answers based on the search results.
Parameters:
llm_id: "deepseek-chat@DeepSeek"
max_retries, max_rounds: 3 retries and 2 rounds to allow iterative searching.
temperature: 0.1
tools: Integrates multiple search APIs/tools:
TavilySearch
TavilyExtract
Google Search API
Bing Web Search API
DuckDuckGo
Wikipedia API
Workflow Instructions:
Extract 3 specific keywords (nouns/proper nouns).
Use these keywords to search on integrated search engines.
Answer solely from the retrieved information, citing sources.
Avoid guessing or fabricating answers.
Never show keywords in final answers.
Upstream: Agent:ThreePathsDecide
Downstream: Agent:SmartSchoolsCross
Output: Web search result content for further processing.
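The search-and-cite workflow described for this agent can be sketched in a few lines. This is only an illustration under stated assumptions, not the template's implementation: in the template the steps are carried out by the LLM and its configured tools, whereas here each search tool is modeled as a plain callable. The names `gather_sources`, `format_context`, `fake_engine_a`, `fake_engine_b`, the hit shape, and the example keywords are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

# A search tool is modeled as a callable: query string in, list of hits out.
# Each hit is a dict with "title", "url", and "snippet" keys (a hypothetical shape).
SearchTool = Callable[[str], Iterable[Dict[str, str]]]


def gather_sources(keywords: List[str], tools: List[SearchTool]) -> List[Dict[str, str]]:
    """Query every tool with the joined keywords and drop duplicate URLs."""
    query = " ".join(keywords)
    seen_urls = set()
    sources = []
    for tool in tools:
        for hit in tool(query):
            if hit["url"] not in seen_urls:
                seen_urls.add(hit["url"])
                sources.append(hit)
    return sources


def format_context(sources: List[Dict[str, str]]) -> str:
    """Number the sources so an answer can cite them as [Source N]."""
    return "\n".join(
        f"[Source {i}] {s['title']}: {s['snippet']} ({s['url']})"
        for i, s in enumerate(sources, start=1)
    )


# Stand-in tools for illustration only; real tools would call external APIs.
def fake_engine_a(query: str):
    return [{"title": "Deploy guide", "url": "https://example.com/deploy", "snippet": query}]


def fake_engine_b(query: str):
    return [
        {"title": "Deploy guide", "url": "https://example.com/deploy", "snippet": query},
        {"title": "FAQ", "url": "https://example.com/faq", "snippet": "Common questions"},
    ]


if __name__ == "__main__":
    found = gather_sources(["RAGFlow", "deployment", "Docker"], [fake_engine_a, fake_engine_b])
    print(format_context(found))
```

Deduplicating by URL before numbering keeps each citation label pointing at a single source even when several engines return the same page.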
4. Retrieval:WarmTimesRun (Knowledge Base Retrieval Component)
Type: Retrieval
Purpose: Retrieves relevant information from configured knowledge bases based on the refined question.
Parameters:
query: {Agent:ThreePathsDecide@content} (refined question).
top_k: 1024 (number of candidate documents to consider).
top_n: 8 (number of top results to return).
similarity_threshold: 0.2 (minimum similarity for retrieval).
Other parameters control language cross-matching, keyword similarity weighting, and use of knowledge graphs.
Upstream: Agent:ThreePathsDecide
Downstream: Agent:SmartSchoolsCross
Output: Formalized content from knowledge base documents.
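One plausible reading of how top_k, similarity_threshold, and top_n interact is sketched below. It is an assumption about the semantics, not the framework's retrieval code: top_k caps the candidate pool, similarity_threshold discards weak matches, and top_n limits what is returned. The function name `select_chunks` and the (chunk, score) input shape are hypothetical.

```python
from typing import List, Tuple


def select_chunks(
    scored_chunks: List[Tuple[str, float]],
    top_k: int = 1024,
    top_n: int = 8,
    similarity_threshold: float = 0.2,
) -> List[Tuple[str, float]]:
    """Filter (chunk, similarity) pairs the way the three parameters suggest."""
    # Rank by similarity, highest first.
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    # top_k: keep only the best candidates for further consideration.
    candidates = ranked[:top_k]
    # similarity_threshold: drop candidates that are too dissimilar.
    relevant = [pair for pair in candidates if pair[1] >= similarity_threshold]
    # top_n: return at most this many results downstream.
    return relevant[:top_n]


if __name__ == "__main__":
    chunks = [("a", 0.9), ("b", 0.1), ("c", 0.5), ("d", 0.3)]
    print(select_chunks(chunks, top_n=2))
```

With the defaults from the template, a chunk scoring below 0.2 is never returned, and no more than 8 chunks reach the answer organizer.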
5. Agent:SmartSchoolsCross (Answer Organizer Agent)
Type: Agent
Purpose: Combines the user query, the refined question, the web search results, and the knowledge base retrieval results to generate the final answer.
Parameters:
llm_id: "deepseek-chat@DeepSeek"
max_retries: 3
max_rounds: 1
temperature: 0.1
prompts: Combines multiple inputs into a single prompt template:
User's query: {sys.query}
Refined question: {Agent:ThreePathsDecide@content}
Web search result: {Agent:WildGoatsRule@content}
Retrieval result: {Retrieval:WarmTimesRun@content}
sys_prompt: Instructions to act as "Answer Organizer," generating answers in markdown format without fabricating information.
Upstream: Agent:WildGoatsRule, Retrieval:WarmTimesRun
Downstream: Message:ShaggyRingsCrash
Output: Final answer content.
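The prompt template relies on placeholders such as {sys.query} and {Agent:ThreePathsDecide@content} being replaced with upstream outputs. A minimal sketch of that substitution follows, assuming a simple "replace known references, leave unknown ones untouched" policy; the framework's actual resolver is not shown in this document, and `render_prompt` is a hypothetical helper. The template below is shortened to three of the inputs.

```python
import re

# Matches any {...} reference that contains no nested braces.
_REFERENCE = re.compile(r"\{([^{}]+)\}")


def render_prompt(template: str, values: dict) -> str:
    """Replace each {name} reference with its value; leave unknown names as-is."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)

    return _REFERENCE.sub(substitute, template)


TEMPLATE = (
    "User's query: {sys.query}\n"
    "Refined question: {Agent:ThreePathsDecide@content}\n"
    "Web search result: {Agent:WildGoatsRule@content}"
)

if __name__ == "__main__":
    outputs = {
        "sys.query": "How to deploy it?",
        "Agent:ThreePathsDecide@content": "How to deploy RAGFlow?",
    }
    print(render_prompt(TEMPLATE, outputs))
```

Leaving unresolved references in place makes a missing upstream output visible in the rendered prompt instead of silently producing an empty field.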
6. Message:ShaggyRingsCrash (Message Component)
Type: Message
Purpose: Delivers the final formatted answer to the user interface.
Parameters:
content: Output from Agent:SmartSchoolsCross.
Upstream: Agent:SmartSchoolsCross
Downstream: None (terminal node).
Important Implementation Details and Algorithms
Multi-Agent Pipeline: The system uses a pipeline of specialized agents, each with an explicit role:
Question Refinement: Improves input clarity and knowledge base alignment.
Search-Driven Agent: Extracts keywords and performs multi-source web searches with strict rules on keyword selection and answer generation.
Knowledge Base Retrieval: Performs semantic search on configured corpora to extract relevant documents.
Answer Organizer: Synthesizes all input data streams into a coherent final answer.
Keyword Extraction Algorithm: The Search Agent extracts exactly three keywords that are:
Most specific nouns or proper nouns.
Core concepts.
Unbiased.
These rules are part of the agent's system prompt, which also tells the agent never to show the keywords in the final answer.
Search Tools Integration: Multiple search APIs are integrated with configurable parameters such as API keys, language, country, and result limits to maximize coverage and reliability.
Error Handling and Retries: Agents have max_retries and delay_after_error parameters for recovering from failed calls.
LLM Prompt Design: System prompts enforce role-specific behavior and output format (e.g., markdown, citation style).
Message History Window: Agents keep a message history window (size 12) to maintain conversational context.
Output Format: Answers are in markdown format and include citations using [Source #] notation.
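The retry and history-window settings can be illustrated with a short sketch. It assumes that max_retries counts additional attempts after the first failure and that delay_after_error is a pause in seconds between attempts; neither semantic is confirmed by this document, the 2.0 default is a placeholder, and `call_with_retries` and `make_history` are hypothetical helpers.

```python
import time
from collections import deque


def call_with_retries(fn, max_retries=3, delay_after_error=2.0, sleep=time.sleep):
    """Call fn(); on failure wait delay_after_error seconds and try again.

    Assumption: max_retries is the number of extra attempts after the first.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            if attempt < max_retries:
                sleep(delay_after_error)
    raise last_error


def make_history(window_size=12):
    """A bounded history: appending beyond window_size drops the oldest message."""
    return deque(maxlen=window_size)


if __name__ == "__main__":
    history = make_history()
    for turn in range(20):
        history.append(f"message {turn}")
    print(len(history), history[0])
```

Passing `sleep` as a parameter keeps the helper testable without real delays.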
Interaction with Other Parts of the System/Application
User Input: The conversation starts with the begin component prompting the user.
Conversation State: The system uses global variables such as sys.query (user query), sys.conversation_turns, and sys.user_id to track session state.
Knowledge Base Configuration: The Retrieval component depends on knowledge base IDs (kb_ids), which must be set up separately in the system for retrieval to function.
External APIs: The web search agent relies on external services (Google, Bing, DuckDuckGo, Wikipedia, TavilySearch) that need network access and, for some providers, API keys.
Output Delivery: The final answer is sent downstream to a Message component that displays the answer to the user, completing the request cycle.
Visual/UX Layer: The JSON includes position data and node types for visual graph editors or UI rendering tools that allow visualization and editing of the assistant's workflow.
Usage Example
User Query: "你好" (Hello)
Workflow:
The begin component receives the query and forwards it.
Agent:ThreePathsDecide refines the question (e.g., disambiguates or expands).
The refined question goes to:
Agent:WildGoatsRule, which extracts keywords and runs a web search.
Retrieval:WarmTimesRun, which performs knowledge base retrieval.
Both results go to Agent:SmartSchoolsCross, which synthesizes the information into a final markdown answer.
The answer is sent to Message:ShaggyRingsCrash and displayed to the user.
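The order in which the components can run follows from their upstream/downstream links. The sketch below derives one valid order with a topological sort; the `DOWNSTREAM` dictionary is a hand-written summary of the links described in this document rather than the JSON file's actual layout, and `execution_order` is a hypothetical helper. It uses the standard-library graphlib module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Downstream links as described in this document (not the file's literal structure).
DOWNSTREAM = {
    "begin": ["Agent:ThreePathsDecide"],
    "Agent:ThreePathsDecide": ["Agent:WildGoatsRule", "Retrieval:WarmTimesRun"],
    "Agent:WildGoatsRule": ["Agent:SmartSchoolsCross"],
    "Retrieval:WarmTimesRun": ["Agent:SmartSchoolsCross"],
    "Agent:SmartSchoolsCross": ["Message:ShaggyRingsCrash"],
    "Message:ShaggyRingsCrash": [],
}


def execution_order(downstream):
    """Return one order in which every node runs after all of its upstream nodes."""
    # TopologicalSorter expects node -> predecessors, so invert the edges first.
    upstream = {node: set() for node in downstream}
    for node, targets in downstream.items():
        for target in targets:
            upstream.setdefault(target, set()).add(node)
    return list(TopologicalSorter(upstream).static_order())


if __name__ == "__main__":
    for step, name in enumerate(execution_order(DOWNSTREAM), start=1):
        print(step, name)
```

The two middle branches have no dependency on each other, so any order between them is valid and they could run in parallel.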
Visual Diagram
flowchart TD
Begin["Begin\n(Start Conversation)"]
RefineQ["Agent:ThreePathsDecide\n(Question Refinement)"]
SearchAgent["Agent:WildGoatsRule\n(Search-Driven Agent)"]
Retrieval["Retrieval:WarmTimesRun\n(Knowledge Base Retrieval)"]
AnswerOrg["Agent:SmartSchoolsCross\n(Answer Organizer)"]
Message["Message:ShaggyRingsCrash\n(Display Answer)"]
Begin --> RefineQ
RefineQ --> SearchAgent
RefineQ --> Retrieval
SearchAgent --> AnswerOrg
Retrieval --> AnswerOrg
AnswerOrg --> Message
Diagram Explanation:
The user starts interaction at Begin.
The query is refined by the question refinement agent (Agent:ThreePathsDecide).
The refined question branches to two parallel processes:
Search Agent conducts keyword extraction and web search.
Retrieval fetches knowledge base documents.
Both results feed into the Answer Organizer, which synthesizes and formats the final response.
The Message component outputs the answer to the user.
Summary
web_search_assistant.json is a configuration file that defines a multi-agent chat assistant. It refines the user's query, runs web searches and knowledge base retrieval on the refined question, and organizes both sets of results into a cited markdown answer. Its modular components, well-defined roles, and configurable search tools make it a reusable template for web-enhanced conversational AI applications.