graph_prompt.py

Overview

graph_prompt.py is a utility module designed to provide detailed prompt templates for the extraction and summarization of entities and their relationships from textual documents. Its primary purpose is to guide a large language model (LLM) or any natural language processing (NLP) system in systematically identifying entities, classifying them by type, extracting descriptive attributes, and mapping relationships between entities based on contextual text input.

This file contains carefully crafted multi-step instructions encapsulated in prompt strings that structure how an LLM should analyze input text and output structured data representing entities and relationships. These prompts include examples and formatting rules to ensure consistent and comprehensive extraction.

In addition to the main extraction prompt, the file offers supplementary prompts to guide iterative refinement and summarization of entity descriptions, facilitating improved accuracy and completeness in graph extraction tasks.

Contents and Detailed Explanation

Constants

`GRAPH_EXTRACTION_PROMPT`

Type: str
Description:
This is the primary prompt template used to instruct the model to extract a knowledge graph from textual input. It defines a goal, a detailed multi-step extraction procedure, output formatting rules, and provides concrete examples to illustrate expected behavior.
Functionality:
The prompt guides the following workflow:
1. Entity Identification: Extract entities of specified types from the text, capturing name, type, and a comprehensive description.
2. Relationship Identification: Identify clearly related entity pairs and describe the relationship, including a strength score.
3. Output Formatting: Return a concatenated list of entity and relationship tuples using specified delimiters.
4. Completion: Signal the end of output with a completion delimiter.
Placeholders:
The prompt includes placeholders to be replaced at runtime:
- {entity_types}: List of entity types to search for (e.g., person, technology).
- {tuple_delimiter}: Delimiter string used to separate elements within a tuple.
- {record_delimiter}: Delimiter string used to separate each entity or relationship record in the output.
- {completion_delimiter}: String to signal the end of the output.
- {entity_types}, {input_text}: Used in the final "Real Data" section to indicate actual inputs.
Examples:
Three detailed examples are embedded in the prompt to demonstrate expected output formats and entity/relationship extraction logic for different texts and entity type sets.

Usage Example (Pseudocode):

prompt = GRAPH_EXTRACTION_PROMPT.format(
    entity_types="person, technology, mission, organization, location",
    tuple_delimiter='|',
    record_delimiter='\n',
    completion_delimiter='END'
)
# Append the user's text input and send the prompt to the language model
response = llm.generate(prompt + user_text)
# Parse the response using the specified delimiters

`CONTINUE_PROMPT`

Type: str
Description:
This prompt is intended for iterative refinement of the entity extraction process. It is used to instruct the model to add entities that were missed during the initial extraction, maintaining the same tuple format.
Usage:
Typically appended after the first extraction if a completeness check indicates missing entities.

`LOOP_PROMPT`

Type: str
Description:
A simple yes/no prompt asking if there are still entities missing after the current extraction iteration. The model is expected to respond with a single letter (Y or N).
Usage:
Used to control iterative extraction loops, deciding whether to continue prompting for more entities or stop.

`SUMMARIZE_DESCRIPTIONS_PROMPT`

Type: str
Description:
A prompt template for generating a consolidated summary of multiple descriptions for one or more entities. It instructs the model to merge all given descriptions into a single coherent narrative, resolving contradictions if any, and writing in third person.
Placeholders:
- {language}: Output language for the summary.
- {entity_name}: Name(s) of the entity or group of entities.
- {description_list}: List of textual descriptions to be summarized.

Usage Example (Pseudocode):

summary_prompt = SUMMARIZE_DESCRIPTIONS_PROMPT.format(
    language="English",
    entity_name="Alex",
    description_list=[
        "Alex is a character who experiences frustration.",
        "Alex is observant of the dynamics among other characters."
    ]
)
summary = llm.generate(summary_prompt)
# summary will be a coherent merged description of Alex

Implementation Details

The file does not contain executable code, classes, or functions but serves as a library of prompt templates for use in a larger NLP or LLM-based system.
The prompts are designed to be parameterized at runtime with actual entity types, delimiters, and input text, supporting flexible integration.
The complex examples embedded within GRAPH_EXTRACTION_PROMPT serve as in-context few-shot examples, which greatly improve the quality of output when passed to a language model.
The approach relies on structured prompt engineering to transform unstructured text into structured graph data, which can then be parsed and used for downstream applications such as knowledge graph construction, entity-relation extraction, or semantic search.

Interaction with Other Parts of the System

This file is likely used by a graph extraction or knowledge graph generation module that:
1. Collects raw textual data.
2. Uses these prompts to invoke an LLM (e.g., OpenAI GPT, Azure OpenAI, or other) to extract entities and relationships.
3. Parses the structured output (using the provided delimiters) into internal data structures representing nodes and edges of a graph.
4. Uses the continuation and loop prompts to iteratively enhance extraction completeness.
5. Applies the summarization prompt to aggregate and refine entity descriptions.
The output structured data can be consumed by graph databases, visualization tools, or further NLP pipelines.
The file references GraphRAG, indicating it is inspired by or compatible with Microsoft's GraphRAG framework for retrieval-augmented generation.

Visual Diagram

flowchart TD
    A[graph_prompt.py]
    A --> B[GRAPH_EXTRACTION_PROMPT]
    A --> C[CONTINUE_PROMPT]
    A --> D[LOOP_PROMPT]
    A --> E[SUMMARIZE_DESCRIPTIONS_PROMPT]

    B --> F["Entity Extraction Instructions"]
    B --> G["Relationship Extraction Instructions"]
    B --> H["Output Formatting"]
    B --> I["Examples"]

    E --> J["Input: Entities & Descriptions"]
    E --> K["Output: Consolidated Summary"]

    style A fill:#f9f,stroke:#333,stroke-width:2px
    style B fill:#bbf,stroke:#333
    style C fill:#bbf,stroke:#333
    style D fill:#bbf,stroke:#333
    style E fill:#bbf,stroke:#333

Summary

graph_prompt.py is an essential prompt resource file designed for natural language understanding systems focused on graph extraction from text. It encapsulates sophisticated prompt engineering patterns with clear formatting rules, iterative refinement mechanisms, and summarization capabilities. It serves as an interface layer between raw textual inputs and structured graph outputs, enabling the construction of entity-relationship graphs that can fuel knowledge-driven applications.

End of documentation