community_reports_extractor.py

Overview

The community_reports_extractor.py file defines an extractor that generates narrative and structured reports about the communities detected in a graph. It runs the Leiden algorithm on a NetworkX graph, collects the entity and relationship data for each community, and uses a large language model (LLM) to write a report covering each community's characteristics, findings, and rating.

The file is part of a graph analysis system that combines community detection with natural language generation to produce both human-readable and machine-readable community insights. It depends on the graphrag module for graph operations and community detection, and on a chat-based LLM interface for report generation.

Classes and Functions

`CommunityReportsResult`

A simple data class representing the output of the community reports extraction process.

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| `output` | `list[str]` | List of textual community reports. |
| `structured_output` | `list[dict]` | List of structured JSON-like community data. |
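
For orientation, a data class with these two attributes might be declared roughly as follows. This is a minimal sketch based only on the attribute table; the real class may carry defaults or extra fields.

```python
from dataclasses import dataclass


@dataclass
class CommunityReportsResult:
    """Sketch of the result container described in the attribute table."""

    output: list              # textual (markdown) community reports
    structured_output: list   # one JSON-like dict per community


result = CommunityReportsResult(
    output=["# Community 1 Report\n..."],
    structured_output=[{"title": "Community 1 Report"}],
)
```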

Usage

result = CommunityReportsResult(
    output=["# Community 1 Report\n...", "# Community 2 Report\n..."],
    structured_output=[{...}, {...}]
)

`CommunityReportsExtractor`

An extractor class derived from Extractor that generates community reports by analyzing a graph's community structure and invoking a large language model for text generation.

Initialization

def __init__(
    self,
    llm_invoker: CompletionLLM,
    max_report_length: int | None = None,
)

Parameters

llm_invoker (CompletionLLM): An instance of a chat-based large language model interface used to generate text.
max_report_length (int | None, optional): Maximum length of the generated report in tokens. When omitted or None, a limit of 1500 is used.

Description

Initializes the extractor with the provided LLM invoker and sets the extraction prompt and maximum report length.

Calling the Extractor Instance

async def __call__(self, graph: nx.Graph, callback: Callable | None = None) -> CommunityReportsResult:

Parameters

graph (nx.Graph): The NetworkX graph object representing entities and their relationships.
callback (Callable | None, optional): Optional function that receives progress updates. Each message is passed as the keyword argument msg.

Returns

CommunityReportsResult: Contains both textual and structured outputs for all detected communities.

Description

Graph Preparation: Assigns a "rank" property to each node based on its degree.
Community Detection: Runs the Leiden algorithm on the graph to detect communities at multiple levels.
Community Report Extraction: For each community:
- Skips communities with fewer than 2 nodes.
- Constructs dataframes for entities and relationships, including their descriptions.
- Creates a prompt by injecting CSV representations of the entities and relationships.
- Calls the LLM asynchronously, under rate limiting and timeout constraints, to generate a JSON-formatted report.
- Parses the generated JSON and validates the required fields: title, summary, findings, rating, and rating_explanation.
- Annotates the graph with community info.
- Accumulates textual and structured reports.
- Invokes the callback with progress updates.
Concurrency and Timeouts: Uses a Trio nursery and timeout decorators to run LLM calls in parallel and to bound how long each call may take.
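
The preparation steps above (ranking nodes by degree, skipping small communities, rendering entities and relationships as CSV, and injecting them into a prompt) can be sketched with the standard library alone. This is an illustration under assumptions, not the extractor's code: plain dicts and tuples stand in for the NetworkX graph, the csv module stands in for dataframes, and the column names, the `{entity_df}` placeholder, and all function names are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Optional


def assign_ranks(nodes: dict, edges: list) -> None:
    """Store each node's degree (number of incident edges) as its "rank"."""
    degree = Counter()
    for source, target, _description in edges:
        degree[source] += 1
        degree[target] += 1
    for name, attrs in nodes.items():
        attrs["rank"] = degree[name]


def _to_csv(header: list, rows: list) -> str:
    """Render a header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def community_tables(members: list, nodes: dict, edges: list) -> Optional[tuple]:
    """Return (entities_csv, relations_csv), or None for communities with < 2 nodes."""
    if len(members) < 2:
        return None
    member_set = set(members)
    entity_rows = [
        [i, name, nodes[name].get("description", "")]
        for i, name in enumerate(members)
    ]
    # Keep only relationships whose endpoints both belong to this community.
    inside = [e for e in edges if e[0] in member_set and e[1] in member_set]
    relation_rows = [[i, s, t, d] for i, (s, t, d) in enumerate(inside)]
    return (
        _to_csv(["id", "entity", "description"], entity_rows),
        _to_csv(["id", "source", "target", "description"], relation_rows),
    )


def build_prompt(template: str, variables: dict) -> str:
    """Replace each {name} placeholder in the template with its value."""
    for name, value in variables.items():
        template = template.replace("{" + name + "}", value)
    return template
```

Returning None for a small community lets the caller skip it without building a prompt or spending an LLM call.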

Usage Example

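# Run inside an async function; my_llm and graph are supplied by the caller.
extractor = CommunityReportsExtractor(llm_invoker=my_llm)
# The callback is called with the keyword argument msg, which print() does not accept directly.
result = await extractor(graph, callback=lambda msg: print(msg))
print(result.output)  # List of textual reports
print(result.structured_output)  # List of dict reports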

Internal Method: `_get_text_output`

def _get_text_output(self, parsed_output: dict) -> str:

Parameters

parsed_output (dict): A parsed JSON-like dictionary representing a single community report.

Returns

str: A formatted markdown string representing the community report.

Description

Generates a human-readable markdown report from the structured community report dictionary. The report includes:

- Title as a primary header.
- Summary as introductory text.
- Each finding as a sub-header with explanation text.

Usage Example

text_report = extractor._get_text_output(parsed_output)
print(text_report)

Sample Output:

# Community Title

Summary text describing the community.

## Finding 1 Summary

Finding 1 explanation.

## Finding 2 Summary

Finding 2 explanation.
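
A function that produces the layout above could look like the following sketch. It assumes each entry in findings is a dict with summary and explanation keys, which matches the sample output but is not confirmed by the source; the standalone function name is illustrative.

```python
def get_text_output(parsed_output: dict) -> str:
    """Render a parsed community report as markdown (illustrative sketch)."""
    title = parsed_output.get("title", "Report")
    summary = parsed_output.get("summary", "")
    findings = parsed_output.get("findings", [])

    # One second-level section per finding: heading, blank line, explanation.
    sections = [
        f"## {finding.get('summary', '')}\n\n{finding.get('explanation', '')}"
        for finding in findings
    ]
    return "\n\n".join([f"# {title}", summary, *sections])


sample = {
    "title": "Community Title",
    "summary": "Summary text describing the community.",
    "findings": [
        {"summary": "Finding 1 Summary", "explanation": "Finding 1 explanation."},
        {"summary": "Finding 2 Summary", "explanation": "Finding 2 explanation."},
    ],
}
text_report = get_text_output(sample)
```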

Important Implementation Details and Algorithms

Community Detection: Uses the Leiden algorithm (leiden.run) to detect communities within the graph. Leiden refines the Louvain method and guarantees that the communities it returns are internally connected.
Data Preparation: Entities and relationships within a community are extracted and converted to CSV format so they can be fed into the LLM prompt template.
Prompt Engineering: Employs a predefined prompt template (COMMUNITY_REPORT_PROMPT) in which variable replacements inject the CSV data. This structured prompt guides the LLM to produce JSON-formatted reports.
Asynchronous Processing: The extraction process is asynchronous, so multiple community reports can be generated in parallel, with concurrency managed by Trio's nursery.
Timeouts and Rate Limiting: Incorporates timeouts (a timeout decorator and trio.move_on_after) and a semaphore-like chat_limiter to prevent excessive resource consumption and to avoid waiting indefinitely on LLM responses.
Response Cleanup and Validation: Uses regex substitutions to clean up the LLM-generated response before parsing it as JSON, then verifies the presence and types of the required keys.
Graph Annotation: Calls add_community_info2graph to annotate the original graph nodes with community-related metadata (e.g., community title).
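
The cleanup and validation step can be illustrated with a small sketch. The specific cleanup shown (stripping a surrounding code fence) and the expected types per key are assumptions made for the example; the source only states that regex substitutions are applied and that key presence and types are checked. The helper names are hypothetical and do not reproduce dict_has_keys_with_types.

```python
import json
import re
from typing import Optional

FENCE = "`" * 3  # a markdown code fence, built here to keep this example readable

# Required keys from the report schema; the types are assumed for illustration.
REQUIRED_FIELDS = {
    "title": str,
    "summary": str,
    "findings": list,
    "rating": (int, float),
    "rating_explanation": str,
}


def clean_response(raw: str) -> str:
    """Strip an optional surrounding code fence from the LLM reply."""
    text = raw.strip()
    text = re.sub(r"^" + re.escape(FENCE) + r"(?:json)?\s*", "", text)
    text = re.sub(r"\s*" + re.escape(FENCE) + r"$", "", text)
    return text


def parse_report(raw: str) -> Optional[dict]:
    """Return the parsed report, or None if it is not valid JSON or fails validation."""
    try:
        data = json.loads(clean_response(raw))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for key, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(key), expected_type):
            return None
    return data
```

Rejecting a malformed reply by returning None keeps one bad LLM response from aborting the reports for the remaining communities.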

Interaction with Other Components

LLM Interface (CompletionLLM): The extractor depends on an LLM chat model interface to generate community reports from structured data prompts.
Graph Utilities (graphrag.general.leiden): Uses the Leiden algorithm implementation from the graphrag library for community detection.
Graph Annotation (add_community_info2graph): Updates the input graph with community metadata after report generation.
Utility Functions:
- perform_variable_replacements to generate prompts dynamically.
- dict_has_keys_with_types to validate JSON structure.
- num_tokens_from_string to track token usage.
- chat_limiter to limit concurrent LLM calls.
Asynchronous Concurrency (trio): Manages async execution of community report extraction tasks.
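
The way these pieces cooperate at run time (one task per community, a limiter on concurrent LLM calls, and a per-call timeout) can be sketched as below. Note the substitution: the extractor uses Trio, while this example uses asyncio from the standard library so that it runs without extra dependencies. asyncio.gather plays the role of the nursery, asyncio.Semaphore the role of chat_limiter, and asyncio.wait_for the role of the timeout. The fake_llm_report coroutine and all other names are hypothetical.

```python
import asyncio


async def fake_llm_report(community_id: int, delay: float) -> dict:
    """Hypothetical stand-in for the real LLM call; sleeps to simulate latency."""
    await asyncio.sleep(delay)
    return {"community": community_id, "title": f"Community {community_id}"}


async def extract_one(community_id, delay, limiter, timeout, results) -> None:
    """Generate one report under the concurrency limit, giving up after the timeout."""
    async with limiter:
        try:
            report = await asyncio.wait_for(
                fake_llm_report(community_id, delay), timeout
            )
        except asyncio.TimeoutError:
            return  # drop this community instead of blocking the others
    results.append(report)


async def extract_all(delays: dict, max_concurrent: int = 2, timeout: float = 0.2) -> list:
    """Run one task per community and collect the reports that finished in time."""
    limiter = asyncio.Semaphore(max_concurrent)
    results = []
    await asyncio.gather(
        *(
            extract_one(cid, delay, limiter, timeout, results)
            for cid, delay in delays.items()
        )
    )
    return sorted(results, key=lambda report: report["community"])
```

For example, `asyncio.run(extract_all({0: 0.01, 1: 0.01, 2: 1.0}))` returns reports for communities 0 and 1 only, because the simulated call for community 2 exceeds the timeout.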

Visual Diagram

classDiagram
    class CommunityReportsExtractor {
        -_extraction_prompt: str
        -_output_formatter_prompt: str
        -_max_report_length: int
        +__init__(llm_invoker: CompletionLLM, max_report_length: int | None)
        +__call__(graph: nx.Graph, callback: Callable | None) async CommunityReportsResult
        -_get_text_output(parsed_output: dict) str
    }

    class CommunityReportsResult {
        +output: list~str~
        +structured_output: list~dict~
    }

    CommunityReportsExtractor --> CommunityReportsResult : returns
    CommunityReportsExtractor ..> CompletionLLM : uses
    CommunityReportsExtractor ..> nx.Graph : input
    CommunityReportsExtractor ..> leiden : uses community detection
    CommunityReportsExtractor ..> add_community_info2graph : annotates graph

Summary

This file encapsulates the logic for detecting communities within a graph and generating both textual and structured reports describing these communities. It integrates advanced graph analysis, asynchronous programming, and LLM-based natural language generation into a coherent extractor component. The CommunityReportsExtractor class is designed for use in larger graph analysis or knowledge extraction pipelines where automated, detailed community insights are needed.
