graph_extractor.py


Overview

The graph_extractor.py module extracts unipartite graphs from text using large language model (LLM)-based entity extraction. It prompts an LLM iteratively to identify entities and their relationships within text chunks, then builds a NetworkX graph from the extracted data.
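
As a rough illustration of the step from raw LLM output to graph data, the sketch below parses delimited records into node and edge dictionaries. The record format, the delimiters, and the parse_records helper are all assumptions made for this example and are not taken from the module; plain dictionaries stand in for the NetworkX graph so the sketch has no dependencies.

```python
import re

# Hypothetical record format, assumed for illustration only:
#   ("entity"|NAME|TYPE) and ("relationship"|SOURCE|TARGET|WEIGHT), joined by "##"
RECORD_DELIMITER = "##"
FIELD_DELIMITER = "|"


def parse_records(llm_output: str):
    """Split raw LLM output into node and edge dictionaries."""
    nodes: dict[str, dict] = {}
    edges: dict[tuple[str, str], dict] = {}
    for raw in llm_output.split(RECORD_DELIMITER):
        match = re.search(r"\((.*)\)", raw.strip())
        if match is None:
            continue  # skip text that is not a parenthesised record
        fields = [f.strip().strip('"') for f in match.group(1).split(FIELD_DELIMITER)]
        if fields[0] == "entity" and len(fields) >= 3:
            nodes[fields[1].upper()] = {"entity_type": fields[2].upper()}
        elif fields[0] == "relationship" and len(fields) >= 4:
            # Sort the endpoints so (A, B) and (B, A) map to one undirected edge.
            source, target = sorted((fields[1].upper(), fields[2].upper()))
            edges[(source, target)] = {"weight": float(fields[3])}
    return nodes, edges


sample = '("entity"|Alice|person)##("entity"|Acme|organization)##("relationship"|Alice|Acme|2.0)'
nodes, edges = parse_records(sample)
print(nodes)
print(edges)
```

With NetworkX available, the same dictionaries could be loaded into an nx.Graph through add_node and add_edge, passing each attribute dictionary as keyword arguments.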

The module is part of a larger system for information retrieval and graph-based knowledge representation. It interfaces with the graphrag project and with related components such as the Extractor base class and the LLM invocation wrappers.


Classes

GraphExtractionResult

A simple data container, declared with Python's dataclass decorator, that holds the extracted graph together with its source documents.

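from dataclasses import dataclass
from typing import Any

import networkx as nx


@dataclass
class GraphExtractionResult:
    output: nx.Graph  # graph built from the extracted entities and relationships
    source_docs: dict[Any, Any]  # source documents the graph was extracted from

Example usage (my_graph and my_docs are placeholders for an existing graph and document mapping):

result = GraphExtractionResult(output=my_graph, source_docs=my_docs)
print(result.output.nodes)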

GraphExtractor(Extractor)

Extends the Extractor base class to extract graph structures from input text using an LLM.

Initialization

def __init__(
    self,
    llm_invoker: CompletionLLM,
    language: str | None = "English",
    entity_types: list[str] | None = None,
    example_number: int = 2,
    max_gleanings: int | None = None,
)
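
The max_gleanings parameter, together with the _continue_prompt and _if_loop_prompt attributes listed in the class diagram, suggests an iterative "gleaning" loop: after a first extraction pass, the LLM is asked again for anything it missed, up to max_gleanings times. The following is a minimal sketch of that control flow under stated assumptions. FakeLLM, extract_with_gleanings, the chat method, and the prompt wording are all hypothetical and are not the module's actual code.

```python
import asyncio

CONTINUE_PROMPT = "Some entities were missed. Add them below."  # hypothetical wording
IF_LOOP_PROMPT = "Are there still entities to add? Answer Y or N."  # hypothetical wording


class FakeLLM:
    """Stand-in for a CompletionLLM; replays canned responses in order."""

    def __init__(self, responses):
        self._responses = list(responses)
        self.calls = 0

    async def chat(self, prompt: str) -> str:
        self.calls += 1
        return self._responses.pop(0)


async def extract_with_gleanings(llm, text: str, max_gleanings: int) -> str:
    """Run one extraction pass, then up to max_gleanings follow-up passes."""
    results = await llm.chat(f"Extract entities from: {text}")
    for i in range(max_gleanings):
        results += await llm.chat(CONTINUE_PROMPT)
        if i >= max_gleanings - 1:
            break  # budget used up, so there is no need to ask about continuing
        answer = await llm.chat(IF_LOOP_PROMPT)
        if answer.strip().upper() != "Y":
            break  # the model reports nothing left to extract
    return results


llm = FakeLLM(["A;", "B;", "Y", "C;", "N"])
combined = asyncio.run(extract_with_gleanings(llm, "some text", max_gleanings=3))
print(combined)
```

Capping the loop with a fixed budget keeps the number of LLM calls bounded even if the model keeps answering that more entities remain.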

Method: _process_single_content

async def _process_single_content(self, chunk_key_dp: tuple[str, str], chunk_seq: int, num_chunks: int, out_results)

Example usage:

# Assuming async context and an instance `extractor` of GraphExtractor
results = []
await extractor._process_single_content(("chunk1", "Some text data..."), 1, 5, results)
print(results)
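
Because _process_single_content is a coroutine that writes into a caller-supplied out_results list, several chunks can presumably be processed concurrently. The sketch below shows one way to do that with asyncio.gather. StubExtractor and process_all are hypothetical stand-ins that only mirror the method signature; the word-count "result" and the 1-based chunk_seq are assumptions for illustration.

```python
import asyncio


class StubExtractor:
    """Hypothetical stand-in mirroring the _process_single_content signature."""

    async def _process_single_content(self, chunk_key_dp, chunk_seq, num_chunks, out_results):
        chunk_key, text = chunk_key_dp
        await asyncio.sleep(0)  # yield control, as a real LLM call would
        # Placeholder result: the chunk key and its word count.
        out_results.append((chunk_key, len(text.split())))


async def process_all(extractor, chunks):
    """Process every chunk concurrently and return the shared result list."""
    out_results = []
    tasks = [
        extractor._process_single_content(chunk, seq, len(chunks), out_results)
        for seq, chunk in enumerate(chunks, start=1)  # 1-based index is an assumption
    ]
    await asyncio.gather(*tasks)
    return out_results


chunks = [("chunk1", "Alice works at Acme"), ("chunk2", "Bob knows Alice")]
results = asyncio.run(process_all(StubExtractor(), chunks))
print(sorted(results))
```

Results are sorted before printing because concurrent tasks may append to the shared list in any order.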

Important Implementation Details


Interactions with Other System Components


Mermaid Class Diagram

classDiagram
    class GraphExtractionResult {
        +output: nx.Graph
        +source_docs: dict[Any, Any]
    }

    class GraphExtractor {
        -_max_gleanings: int
        -_example_number: int
        -_entity_extract_prompt: str
        -_context_base: dict
        -_continue_prompt: str
        -_if_loop_prompt: str
        -_left_token_count: float
        +__init__(llm_invoker, language, entity_types, example_number, max_gleanings)
        +_process_single_content(chunk_key_dp, chunk_seq, num_chunks, out_results)
    }

    GraphExtractor --|> Extractor
    GraphExtractor ..> CompletionLLM : uses
    GraphExtractor ..> "networkx.Graph" : creates

Summary

The graph_extractor.py file converts raw text into structured graph representations through LLM prompting and iterative refinement. Its asynchronous design and modular prompts allow extraction workflows to be scaled and customized, which makes it suitable for knowledge graph construction, entity-relationship extraction, and downstream graph analytics in the broader graphrag ecosystem.