query_analyze_prompt.py

Overview

The query_analyze_prompt.py file contains prompt templates designed for natural language processing tasks focused on keyword extraction from user queries. These prompts are intended for use with language models or AI assistants to guide them in identifying and categorizing keywords in input queries, with a particular emphasis on separating answer-type keywords, high-level concepts, and low-level details.

The file defines a dictionary, PROMPTS, that holds instruction-and-example prompt templates asking the model to return extracted keywords as JSON, together with a list of reusable examples. Keeping these prompts in one place gives the rest of the system a single, consistent way to instruct language models on how to extract and categorize query keywords.

Detailed Explanation

`PROMPTS` Dictionary

The core data structure in this file is the PROMPTS dictionary. Each key names a usage scenario, and each value is either a prompt template (a string) or, in the case of keywords_extraction_examples, a list of example strings.
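To make the shape of the dictionary concrete, here is an illustrative skeleton. The keys match those documented below, but the template bodies are short stand-ins written for this sketch, not the actual prompt text in query_analyze_prompt.py.

```python
from __future__ import annotations

from typing import Union

# Illustrative skeleton: template bodies are shortened stand-ins,
# NOT the real prompt wording from query_analyze_prompt.py.
PROMPTS: dict[str, Union[str, list[str]]] = {}

# Template filled at runtime via the {query} and {TYPE_POOL} placeholders.
PROMPTS["minirag_query2kwd"] = (
    "Identify answer-type keywords and entities in the query.\n"
    "Answer type pool: {TYPE_POOL}\n"
    "Query: {query}\n"
)

# Template filled at runtime via the {examples} and {query} placeholders.
PROMPTS["keywords_extraction"] = (
    "Extract high-level and low-level keywords from the query.\n"
    "Examples:\n{examples}\n"
    "Query: {query}\n"
)

# Not a template: a list of example strings injected into {examples}.
PROMPTS["keywords_extraction_examples"] = [
    "Example 1: (query and JSON output)",
    "Example 2: (query and JSON output)",
]

for key, value in PROMPTS.items():
    print(key, type(value).__name__)
```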

1. `minirag_query2kwd` (Prompt Template)

Purpose:
This prompt instructs the assistant to identify two types of keywords from a user's query:

Answer-type keywords: Categories from a predefined "answer type pool" dictionary that describe the kind of answer the query expects.
Entities from the query: Specific, concrete entities or details mentioned in the query.

Format:
The output is expected in JSON with two keys:

"answer_type_keywords": A list of at most 3 answer types, ordered from most to least likely.
"entities_from_query": A list of specific entities or details extracted from the query.
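Because the format constrains both the keys and the length of answer_type_keywords, a caller may want to check a reply before using it. The following is a minimal sketch of such a check; validate_query2kwd_output is a hypothetical helper, not part of query_analyze_prompt.py.

```python
import json


def validate_query2kwd_output(raw: str) -> dict:
    """Parse a model reply and check it against the documented format.

    Hypothetical helper; not defined in query_analyze_prompt.py.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Both keys must be present and hold lists of strings.
    for key in ("answer_type_keywords", "entities_from_query"):
        value = data.get(key)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{key!r} must be a list of strings")
    # The prompt asks for at most 3 answer types.
    if len(data["answer_type_keywords"]) > 3:
        raise ValueError("answer_type_keywords may contain at most 3 entries")
    return data


example = (
    '{"answer_type_keywords": ["DATE AND TIME", "ORGANIZATION", "PLAN"], '
    '"entities_from_query": ["SpaceX", "Rocket launch"]}'
)
print(validate_query2kwd_output(example)["answer_type_keywords"])
```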

Key Features:

Provides a role and goal to contextualize the assistant’s task.
Includes detailed instructions on output format.
Contains multiple rich examples to illustrate how to map queries to answer types and entities.
Uses the placeholders {query} and {TYPE_POOL} for runtime data injection.
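Filling the placeholders can be done with Python's str.format. The sketch below uses a hypothetical, heavily shortened template and a made-up answer type pool; serializing the pool with json.dumps is an assumption about how TYPE_POOL is rendered, not something the file confirms. It also shows one pitfall: any literal JSON braces in a format template must be doubled ({{ and }}) so that str.format does not treat them as placeholders.

```python
import json

# Hypothetical, shortened template. Literal JSON braces are doubled so that
# str.format leaves them alone; {query} and {TYPE_POOL} are the only fields.
TEMPLATE = (
    "Answer type pool: {TYPE_POOL}\n"
    'Output format: {{"answer_type_keywords": [], "entities_from_query": []}}\n'
    "Query: {query}\n"
    "Output:"
)

# Hypothetical pool: each answer type maps to a few sample values.
type_pool = {
    "DATE AND TIME": ["2021-05-01", "last summer"],
    "ORGANIZATION": ["SpaceX", "UNESCO"],
}

prompt = TEMPLATE.format(
    query="When was SpaceX's first rocket launch?",
    TYPE_POOL=json.dumps(type_pool),  # assumption: pool rendered as JSON text
)
print(prompt)
```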

Usage Example:

Suppose the input query is:
"When was SpaceX's first rocket launch?"

And the answer type pool is provided as a dictionary mapping each answer type to sample examples.

The assistant should output:

```json
{
  "answer_type_keywords": ["DATE AND TIME", "ORGANIZATION", "PLAN"],
  "entities_from_query": ["SpaceX", "Rocket launch", "Aerospace", "Power Recovery"]
}
```
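In practice a model may surround the JSON with extra prose. One simple, hedged way to recover the object is to slice from the first "{" to the last "}" before parsing; extract_json_object is a hypothetical helper, and this approach assumes the reply contains exactly one JSON object and no stray braces in the surrounding text.

```python
import json


def extract_json_object(reply: str) -> dict:
    """Return the JSON object spanning the first '{' to the last '}'.

    Hypothetical helper; assumes the reply holds exactly one JSON object.
    """
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start : end + 1])


# A reply in which the expected output is wrapped in prose.
reply = (
    "Here is the JSON:\n"
    '{"answer_type_keywords": ["DATE AND TIME", "ORGANIZATION", "PLAN"], '
    '"entities_from_query": ["SpaceX", "Rocket launch", "Aerospace", "Power Recovery"]}\n'
    "Let me know if you need anything else."
)
result = extract_json_object(reply)
print(result["answer_type_keywords"])  # ['DATE AND TIME', 'ORGANIZATION', 'PLAN']
```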

2. `keywords_extraction` (Prompt Template)

Purpose:
This prompt instructs the assistant to extract two categories of keywords from the query:

High-level keywords: Overarching concepts or themes.
Low-level keywords: Specific entities or concrete details.

Format:
Output JSON with two keys:

"high_level_keywords"
"low_level_keywords"
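Downstream code may prefer a typed object over a raw dictionary. The dataclass below is a minimal sketch of that idea; KeywordExtraction and its from_json method are hypothetical names, and treating missing keys as empty lists is a design choice made for this example.

```python
import json
from dataclasses import dataclass


@dataclass
class KeywordExtraction:
    """Hypothetical typed view of a keywords_extraction reply."""

    high_level_keywords: list[str]
    low_level_keywords: list[str]

    @classmethod
    def from_json(cls, raw: str) -> "KeywordExtraction":
        data = json.loads(raw)
        # Missing keys fall back to empty lists instead of raising.
        return cls(
            high_level_keywords=list(data.get("high_level_keywords", [])),
            low_level_keywords=list(data.get("low_level_keywords", [])),
        )


raw = (
    '{"high_level_keywords": ["Education", "Poverty reduction", "Socioeconomic development"], '
    '"low_level_keywords": ["School access", "Literacy rates", "Job training", "Income inequality"]}'
)
parsed = KeywordExtraction.from_json(raw)
print(parsed.high_level_keywords)
```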

Key Features:

Uses a role and goal description similar to that of minirag_query2kwd, but focuses on thematic versus specific keywords rather than answer types.
Accepts a placeholder {examples} to inject example outputs dynamically.
Also uses {query} placeholder for runtime query injection.

Usage Example:

For the query:
"What is the role of education in reducing poverty?"

The expected output could be:

```json
{
  "high_level_keywords": ["Education", "Poverty reduction", "Socioeconomic development"],
  "low_level_keywords": ["School access", "Literacy rates", "Job training", "Income inequality"]
}
```

3. `keywords_extraction_examples` (List of Strings)

Purpose:
A collection of example outputs demonstrating the expected JSON format and keyword classification for the keywords_extraction prompt.

Contained Examples:

Example queries with corresponding high-level and low-level keyword outputs.
Intended to be injected dynamically into the keywords_extraction prompt under the {examples} placeholder.

Usage:
This list enables reuse of examples without cluttering the main prompt, facilitating maintainable and modular prompt construction.
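The following sketch shows one way the example list could be joined and injected into the {examples} placeholder. The template and example strings are hypothetical stand-ins, and joining with a newline is an assumption. Because the joined examples are passed as a format argument rather than being part of the template, the JSON braces inside them are inserted verbatim and need no escaping.

```python
import json

# Hypothetical stand-ins: the actual template and example strings in
# query_analyze_prompt.py are longer and worded differently.
KEYWORDS_EXTRACTION = "Examples:\n{examples}\n\nQuery: {query}\nOutput:"

KEYWORDS_EXTRACTION_EXAMPLES = [
    "Query: What is the role of education in reducing poverty?\nOutput: "
    + json.dumps(
        {
            "high_level_keywords": ["Education", "Poverty reduction"],
            "low_level_keywords": ["School access", "Literacy rates"],
        }
    ),
]


def build_keywords_prompt(query: str, examples: list[str]) -> str:
    """Join the examples and fill both placeholders of the template."""
    # Examples are a format *argument*, so their braces need no escaping.
    return KEYWORDS_EXTRACTION.format(examples="\n".join(examples), query=query)


prompt = build_keywords_prompt(
    "How does climate change affect agriculture?", KEYWORDS_EXTRACTION_EXAMPLES
)
print(prompt)
```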

Important Implementation Details

Prompt Engineering: The file is focused on prompt engineering for Large Language Models (LLMs), using detailed instructions and example-driven learning to steer model output.
JSON Output: The prompts instruct the model to answer in JSON, so its responses can be parsed and passed to downstream systems.
Separation of Keyword Types: The prompts emphasize differentiating types of keywords based on their semantic roles (e.g., answer type, entities, high-level concepts, low-level details).
Example-Driven Approach: Examples drawn from different query domains illustrate each keyword classification, with the aim of making the model's output more accurate and more consistent with the requested format.
Placeholders for Runtime Data: Prompts include placeholders ({query}, {TYPE_POOL}, {examples}) to be replaced dynamically when generating prompts for specific queries.
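Since every template relies on named placeholders, a caller can discover them programmatically and fail early when a value is missing. This sketch uses the standard library's string.Formatter().parse; template_fields and render are hypothetical helper names, not functions from query_analyze_prompt.py.

```python
from string import Formatter


def template_fields(template: str) -> set[str]:
    """Return the named replacement fields of a str.format template."""
    # parse() yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for pure literal chunks, including escaped braces.
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render(template: str, **values: str) -> str:
    """Fill a template, raising KeyError if any placeholder lacks a value."""
    missing = template_fields(template) - set(values)
    if missing:
        raise KeyError(f"missing placeholder values: {sorted(missing)}")
    return template.format(**values)


template = 'Format: {{"high_level_keywords": []}}\nExamples: {examples}\nQuery: {query}'
print(sorted(template_fields(template)))  # ['examples', 'query']
```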

Interaction with Other System Components

Prompt Usage: This file is primarily consumed by components that generate prompts for LLMs. It is likely imported by modules responsible for query analysis, natural language understanding, or information retrieval.
Integration with Language Models: Once their placeholders are filled, the prompt templates are sent to language model APIs or frameworks that perform query keyword extraction.
Downstream Processing: The JSON outputs generated by these prompts are parsed and utilized by other system parts, such as knowledge retrieval engines, search indexers, or response generation modules.
Extensibility: The separation of prompt templates and examples supports easy extension or customization for different domains or tasks.

Mermaid Diagram

Below is a flowchart showing the entries of the PROMPTS dictionary and the placeholders or relationships associated with each:

```mermaid
flowchart TD
    A[PROMPTS Dictionary] --> B["minirag_query2kwd"]
    A --> C["keywords_extraction"]
    A --> D["keywords_extraction_examples"]

    B --> E["Uses placeholders: {query}, {TYPE_POOL}"]
    C --> F["Uses placeholders: {query}, {examples}"]
    D --> G["Provides example outputs for 'keywords_extraction'"]
```

Summary

query_analyze_prompt.py is a utility file containing prompt templates for keyword extraction tasks.
It defines detailed instructions and examples to guide AI assistants in extracting answer-type keywords, high-level keywords, and low-level keywords from queries.
The prompts produce structured JSON outputs, facilitating integration with other systems.
The file supplies the prompt text that query analysis pipelines send to language models.
Its modular design with placeholders and example lists supports flexible and maintainable prompt construction.
