sql_assistant.json

Overview

This JSON file defines the configuration and workflow for SQL Assistant, an AI-powered tool designed to convert natural language questions from business users into executable SQL queries. The assistant leverages a large language model (LLM) to interpret plain-English queries, generate syntactically correct MySQL statements, execute them on a connected database, and return results quickly.

The SQL Assistant is structured as a directed graph of components, each responsible for a specific task in the pipeline — from retrieving contextual knowledge, generating SQL, executing it, to presenting the results. This configuration appears intended for a workflow orchestration system or an AI platform that supports multi-component interaction.

Components and Workflow

The file's core is its DSL (Domain Specific Language) describing components (nodes) and their data flow (edges). Below are detailed explanations of each main component, their parameters, and how they fit into the overall workflow.

1. Begin

Component Name: Begin
Purpose: Entry point of the conversation/workflow. It initializes the session with a prologue message to greet users and prompt their input.
Parameters:
- enablePrologue (boolean): Enables the initial greeting message.
- mode (string): Set to "conversational", indicating interaction style.
- prologue (string): The greeting text: "Hi! I'm your SQL assistant. What can I do for you?"
Usage: Starts the pipeline by capturing user input (sys.query) and triggers subsequent retrieval components.

2. Retrieval Components

These components query knowledge bases (KBs) to provide context for the SQL generation.

Common Parameters:
- kb_ids: List of knowledge base IDs to query.
- query: Bound to sys.query, the user's natural language input.
- top_k: Maximum number of documents to retrieve (1024).
- top_n: Number of top documents to use (8).
- similarity_threshold: Minimum similarity score (0.2).
- keywords_similarity_weight: Weighting for keyword similarity (0.7).
- outputs.formalized_content: Output string containing retrieved textual content.
Individual Retrievals:
1. Schema Retrieval
  - ID: Retrieval:HappyTiesFilm
  - Description: Retrieves formalized database schema information to inform the SQL generation.
  - KB ID: "ed31364c727211f0bdb2bafe6e7908e6"
2. Question to SQL Samples
  - ID: Retrieval:SmartNewsHammer
  - Description: Retrieves examples of natural language questions mapped to SQL statements to guide the model.
  - KB ID: "0f968106727311f08357bafe6e7908e6"
3. Database Description
  - ID: Retrieval:SweetDancersAppear
  - Description: Provides descriptive information about tables and fields in the database.
  - KB ID: "4ad1f9d0727311f0827dbafe6e7908e6"
Usage: These retrievals feed their outputs into the SQL generation agent to give it context on schema, examples, and descriptions.

3. Agent: SQL Generator

Component Name: Agent
ID: Agent:WickedGoatsDivide
Purpose: The core text-to-SQL AI model that converts the user query and contextual knowledge into a syntactically valid MySQL query.
Parameters:
- llm_id: "qwen-max@Tongyi-Qianwen" — the identifier for the language model used.
- max_tokens: 256 — maximum tokens output for SQL.
- max_retries: 3 — retries on failure.
- max_rounds: 5 — maximum dialogue rounds.
- message_history_window_size: 12 — context window size.
- temperature: 0.1 — controls randomness (disabled here).
- frequency_penalty: 0.7
- presence_penalty: 0.4
- prompts: Contains a user role prompt embedding:
  - User's natural language query.
  - Schema information from the Retrieval:HappyTiesFilm.
  - Sample Q&A from Retrieval:SmartNewsHammer.
  - Database description from Retrieval:SweetDancersAppear.
- sys_prompt: A detailed system prompt instructing the agent to:
  - Produce only a single, syntactically correct MySQL query.
  - Avoid any code fences, commentary, or explanations.
  - Follow examples that showcase typical SQL queries.
Outputs:
- content (string): The generated SQL statement.
Behavior:
- Receives combined inputs from retrieval nodes.
- Generates SQL query based on user input and retrieved context.
Usage Example:
- Input: "Show me last quarter’s top 10 products by revenue"
- Output: A MySQL query selecting top 10 products by revenue for last quarter.

4. ExeSQL: SQL Executor

Component Name: ExeSQL
ID: ExeSQL:TiredShirtsPull
Purpose: Executes the SQL statement generated by the Agent against a MySQL database and returns the results.
Parameters:
- db_type: "mysql"
- host, database, username, password, port: Connection details (some fields empty or masked).
- max_records: 1024 — maximum number of records to return.
- sql: Bound dynamically to the Agent's output (Agent:WickedGoatsDivide@content).
Outputs:
- formalized_content: String representation of query results.
- json: An array of objects representing query results as JSON.
Usage:
- Takes the SQL query from the Agent.
- Connects to the configured MySQL database.
- Runs the query and captures results.

5. Message: Result Presenter

Component Name: Message
ID: Message:ShaggyMasksAttend
Purpose: Delivers the results of the executed SQL query back to the user.
Parameters:
- content: Uses the formalized content output from ExeSQL:TiredShirtsPull.
Usage:
- Presents the query results as the final output in the conversational flow.

Workflow Summary

User Input: The user initiates the conversation at the Begin node, entering a natural language query.
Knowledge Retrieval: Three Retrieval components fetch:
- Database schema.
- Sample Q&A for question-to-SQL mapping.
- Database/table descriptions.
SQL Generation: The Agent receives the user query plus all retrieval outputs and generates a precise MySQL query.
SQL Execution: The ExeSQL component runs the generated SQL against the connected database.
Result Display: The Message component presents the query results back to the user.

Important Implementation Details

Prompt Engineering: The Agent uses a carefully crafted system prompt to enforce rules such as outputting only the SQL statement without comments or code fences.
Retrieval-Based Context: Multiple retrievals supply relevant knowledge bases to aid the language model's understanding of schema and examples, improving SQL accuracy.
Error Handling: Agent supports retry logic (max_retries) and rate limiting (delay_after_error) to handle failures gracefully.
Security: Database credentials are embedded in the configuration but appear masked or placeholders in this file. Password is visible here but should be managed securely in real deployments.
Output Types: The system supports both string and JSON output formats for query results, facilitating flexible downstream consumption.

Interaction with Other System Parts

Knowledge Bases: The retrieval components depend on external knowledge bases identified by kb_ids which contain schema, samples, and descriptions. These KBs need to be maintained and updated independently.
Large Language Model Service: The Agent connects to a remote LLM (qwen-max@Tongyi-Qianwen) for text-to-SQL conversion — this requires network connectivity and credentials.
Database Server: The ExeSQL component must connect to a MySQL database server with proper credentials and network access.
User Interface: While not in this file, the system expects a frontend or conversational UI to interact with the Begin and Message components, presenting prompts and results to users.

Visual Diagram: Component Flowchart

flowchart LR
    Begin["Begin\n(User Input)"]
    RetrievalSchema["Retrieval: Schema"]
    RetrievalSamples["Retrieval: Question to SQL Samples"]
    RetrievalDesc["Retrieval: Database Description"]
    Agent["Agent: SQL Generator\n(LLM Text-to-SQL)"]
    ExeSQL["ExeSQL: SQL Executor\n(MySQL DB)"]
    Message["Message: Result Presenter"]

    Begin --> RetrievalSchema
    Begin --> RetrievalSamples
    Begin --> RetrievalDesc

    RetrievalSchema --> Agent
    RetrievalSamples --> Agent
    RetrievalDesc --> Agent

    Agent --> ExeSQL
    ExeSQL --> Message

Summary

The sql_assistant.json file defines a modular, knowledge-augmented AI workflow that enables natural language to SQL translation with immediate execution and result display. It orchestrates multiple retrievals, a powerful language model, and SQL execution components in a streamlined flow designed for conversational business intelligence applications.

This file is a blueprint for deploying an interactive SQL assistant that democratizes data querying for non-technical users by bridging natural language understanding and database querying capabilities.