graph_extractor.py


Overview

The graph_extractor.py file provides an implementation for extracting unipartite graphs from textual content using large language models (LLMs). It defines classes and methods that convert raw input text into graph representations, identifying entities as nodes and their relationships as edges. This extraction is performed via iterative prompting and completion cycles with an LLM, allowing the system to "glean" multiple entity relations from a single text chunk.

This file is part of a broader system that leverages LLMs for information extraction tasks, specifically focusing on graph structure extraction from unstructured text. It integrates with other components such as prompt templates, tokenization utilities, and LLM invocation abstractions to achieve its functionality.


Classes and Functions

GraphExtractionResult

@dataclass
class GraphExtractionResult:
    output: nx.Graph
    source_docs: dict[Any, Any]

GraphExtractor

class GraphExtractor(Extractor):
    ...

__init__

def __init__(
    self,
    llm_invoker: CompletionLLM,
    language: str | None = "English",
    entity_types: list[str] | None = None,
    tuple_delimiter_key: str | None = None,
    record_delimiter_key: str | None = None,
    input_text_key: str | None = None,
    entity_types_key: str | None = None,
    completion_delimiter_key: str | None = None,
    join_descriptions=True,
    max_gleanings: int | None = None,
    on_error: ErrorHandlerFn | None = None,
):

_process_single_content

async def _process_single_content(self, chunk_key_dp: tuple[str, str], chunk_seq: int, num_chunks: int, out_results):

Important Implementation Details


Interaction with Other System Components


Visual Diagram: Class Diagram for GraphExtractor and GraphExtractionResult

classDiagram
    class GraphExtractionResult {
        +output: nx.Graph
        +source_docs: dict[Any, Any]
    }

    class GraphExtractor {
        -_llm: CompletionLLM
        -_join_descriptions: bool
        -_tuple_delimiter_key: str
        -_record_delimiter_key: str
        -_entity_types_key: str
        -_input_text_key: str
        -_completion_delimiter_key: str
        -_entity_name_key: str
        -_input_descriptions_key: str
        -_extraction_prompt: str
        -_summarization_prompt: str
        -_loop_args: dict[str, Any]
        -_max_gleanings: int
        -_on_error: ErrorHandlerFn
        +__init__(llm_invoker, language, entity_types, tuple_delimiter_key, record_delimiter_key, input_text_key, entity_types_key, completion_delimiter_key, join_descriptions, max_gleanings, on_error)
        +_process_single_content(chunk_key_dp, chunk_seq, num_chunks, out_results)
    }

    GraphExtractor --|> Extractor

Summary

The graph_extractor.py file implements a specialized extractor that converts text into graph structures using iterative prompting of language models. It manages prompt customization, multi-turn extraction loops, token counting, and error handling. Results are returned as NetworkX graphs annotated with source document metadata. This component fits into a larger system for advanced information extraction leveraging LLMs and prompt engineering.


End of Documentation for graph_extractor.py