graph_prompt.py

Overview

graph_prompt.py is a configuration and prompt definition file designed to support entity and relationship extraction from text documents for Knowledge Graph or Retrieval-Augmented Generation (RAG) systems. It primarily contains a collection of structured prompt templates and example data that guide language models or extraction algorithms to identify entities, their attributes, and relationships from unstructured text.

The file defines:

Constants for default delimiters and entity types.
Detailed natural language prompt templates for entity extraction tasks.
Multiple example prompts demonstrating the expected input-output format.
Supplementary prompt templates for summarization, continuation, and validation of entity extraction.
Response templates designed for downstream applications like RAG systems or keyword extraction.

This file does not contain executable logic or functions but serves as a centralized repository of prompt templates and examples facilitating consistent and accurate information extraction workflows.

Detailed Explanation of Contents

Constants

PROMPTS (dict[str, Any])
A dictionary storing all prompt templates, default configurations, and example data.
Default keys within PROMPTS:
- "DEFAULT_LANGUAGE": Default language for output, set to "English".
- "DEFAULT_TUPLE_DELIMITER": The delimiter string <|> used to separate tuple components in outputs.
- "DEFAULT_RECORD_DELIMITER": The delimiter string ## used to separate records/entities/relationships.
- "DEFAULT_COMPLETION_DELIMITER": The delimiter string <|COMPLETE|> marking the end of extraction output.
- "DEFAULT_ENTITY_TYPES": A list of entity types commonly used: ["organization", "person", "geo", "event", "category"].
- "DEFAULT_USER_PROMPT": A placeholder string "n/a" for user prompt scenarios.

Key Prompt Templates

1. `entity_extraction`

A comprehensive natural language prompt guiding the extraction of entities and relationships from text. It instructs the extraction system to:

Identify entities of specified types in the input text.
Extract entity attributes (name, type, description) strictly based on text content.
Identify and describe relationships between entities, including strength and keywords.
Extract high-level content keywords summarizing the document.
Format output using specified tuple and record delimiters.
Output in the designated language.
Include an example usage section and real data section placeholders for dynamic content insertion.

Parameters (placeholders expected to be formatted at runtime):

{language}: Language to use for output.
{entity_types}: List of entity types to extract.
{tuple_delimiter}: String delimiter for tuple fields.
{record_delimiter}: String delimiter between records.
{completion_delimiter}: String marking completion of output.
{examples}: Example extraction outputs.
{input_text}: The text document to be processed.

Usage example snippet:

prompt_text = PROMPTS["entity_extraction"].format(
    language="English",
    entity_types="person, organization, location",
    tuple_delimiter=PROMPTS["DEFAULT_TUPLE_DELIMITER"],
    record_delimiter=PROMPTS["DEFAULT_RECORD_DELIMITER"],
    completion_delimiter=PROMPTS["DEFAULT_COMPLETION_DELIMITER"],
    examples=PROMPTS["entity_extraction_examples"][0],
    input_text="Sample input text here..."
)

2. `entity_extraction_examples`

A list of string examples demonstrating expected entity and relationship extraction output formats given sample texts. Each example includes:

Entity types expected.
Input text snippet.
Output formatted as tuples joined by delimiters, illustrating how entities and relationships should be extracted and described.

These examples serve as:

Training or prompt engineering references.
Templates for validating model output conformity.
Documentation for users on expected input/output format.

3. `summarize_entity_descriptions`

A prompt template for summarizing or consolidating multiple entity descriptions into a single coherent summary, resolving contradictions if any.

Parameters:

{language}: Output language.
{entity_name}: Target entity or group of entities.
{description_list}: List of corresponding descriptions to summarize.

Usage context:
Useful for merging fragmented or multiple partial entity descriptions into one comprehensive summary.

4. `entity_continue_extraction`

A prompt for continuing entity and relationship extraction when the initial extraction missed some entities. It repeats the original extraction instructions but focuses on finding only the missing entities and relationships.

Usage context:
Used in iterative extraction workflows to improve recall by prompting the model to find overlooked entities.

5. `entity_if_loop_extraction`

A short prompt asking the model to answer "YES" or "NO" about whether any entities are still missing after extraction.

Usage context:
Acts as a validation or stopping criterion in iterative extraction loops.

6. `fail_response`

A fixed prompt string for cases where the system cannot answer a query due to lack of context or capability.

7. `rag_response`

A detailed prompt template for generating responses in Retrieval-Augmented Generation (RAG) systems. It instructs the assistant to:

Use provided knowledge graph and document chunk JSON data.
Respond concisely and accurately according to the knowledge base.
Follow strict formatting, language, citation, and response length guidelines.
Incorporate conversation history and user prompt context.

Parameters:

{history}: Conversation history.
{context_data}: Knowledge graph and document chunks.
{response_type}: Desired response format/length.
{user_prompt}: User's additional prompt/context.

8. `keywords_extraction`

A prompt template instructing the system to extract high-level and low-level keywords from a user query for use in RAG systems to improve document retrieval relevance.

Output format: Must be a valid JSON object containing:

{
  "high_level_keywords": [ ... ],
  "low_level_keywords": [ ... ]
}

with strict rules about content and formatting.

9. `keywords_extraction_examples`

A list of example keyword extraction inputs and outputs demonstrating the expected JSON format and keyword types.

10. `naive_rag_response`

A prompt template similar to rag_response but focused solely on document chunks (no knowledge graph) for generating concise answers.

Important Implementation Details and Algorithms

Prompt Engineering Approach:
This file exemplifies advanced prompt engineering techniques for entity extraction and summarization tasks. It uses explicit instructions, output format specifications with delimiters, and examples to guide LLMs or extraction systems.
Delimiter-Based Structured Output:
The use of tuple and record delimiters (<|>, ##) enables the extraction outputs to be parsed easily into structured data entities and relationships, crucial for downstream graph construction.
Iterative Extraction Support:
By providing prompts like entity_continue_extraction and entity_if_loop_extraction, the system supports iterative refinement, mitigating missed entities or relationships through looped extraction.
Multi-Purpose Prompt Templates:
The file supports multiple use cases including extraction, summarization, keyword extraction, and response generation within a RAG context.
Example-Driven Instruction:
Including detailed, realistic examples helps improve model understanding and output quality.

Interaction with Other Parts of the System

This file is intended to be imported and used by:
- Entity extraction modules that prepare prompts to send to language models or extraction engines.
- RAG systems that generate answers based on extracted knowledge graphs and document chunks.
- Parsing utilities that interpret the structured output delimited strings into data structures.
- Iterative extraction controllers that manage repeated extraction cycles based on feedback prompts.
It acts as a central prompt repository, enabling consistent prompt usage across the system.
It does not perform extraction or response generation itself; rather, it supports those components by providing them with well-formatted prompt templates and examples.

Visual Diagram: Class and Data Structure Overview

Since this file primarily contains prompt templates stored in a dictionary without classes or functions, the most appropriate visualization is a flowchart showing the main prompts and their relationships in the extraction and response workflows.

flowchart TD
    A[PROMPTS dictionary] --> B["entity_extraction"]
    A --> C["entity_extraction_examples"]
    A --> D["summarize_entity_descriptions"]
    A --> E["entity_continue_extraction"]
    A --> F["entity_if_loop_extraction"]
    A --> G["fail_response"]
    A --> H["rag_response"]
    A --> I["keywords_extraction"]
    A --> J["keywords_extraction_examples"]
    A --> K["naive_rag_response"]

    B --> C  %% Examples used in entity_extraction
    E --> B  %% Continuation prompt relates to main extraction
    F --> E  %% Loop check relates to continuation prompt
    H --> I  %% RAG response uses keywords extraction indirectly

Summary

graph_prompt.py is a foundational prompt configuration file that empowers entity and relationship extraction pipelines, knowledge graph construction, and RAG-based question answering systems through a carefully curated set of natural language prompts and examples. It standardizes how extraction instructions are conveyed to language models and how output should be formatted, facilitating robust and interpretable downstream processing.

graph_prompt.py

Overview

Detailed Explanation of Contents

Constants

Key Prompt Templates

1. entity_extraction

2. entity_extraction_examples

3. summarize_entity_descriptions

4. entity_continue_extraction

5. entity_if_loop_extraction

6. fail_response

7. rag_response

8. keywords_extraction

9. keywords_extraction_examples

10. naive_rag_response