community_reports_extractor.py
Overview
The community_reports_extractor.py file defines a specialized extractor for generating descriptive and structured reports about communities detected within a graph. Leveraging the Leiden algorithm for community detection, it processes a NetworkX graph, extracts entity and relationship data for each community, and utilizes a large language model (LLM) to generate narrative reports summarizing the community characteristics, findings, and ratings.
This file is part of a graph analysis system that combines community detection with LLM-based text generation to produce both human-readable and machine-readable community insights. It depends on external modules such as graphrag for graph operations and community detection, and on a chat-based LLM interface for report generation.
Classes and Functions
CommunityReportsResult
A simple data class representing the output of the community reports extraction process.
Attributes
| Attribute | Type | Description |
|---|---|---|
| output | list[str] | List of textual community reports. |
| structured_output | list[dict] | List of structured JSON-like community data. |
Usage
result = CommunityReportsResult(
output=["# Community 1 Report\n...", "# Community 2 Report\n..."],
structured_output=[{...}, {...}]
)
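For orientation, a container of this shape could be declared as a plain dataclass. This is a minimal sketch assuming only the two attributes listed above; the real class may carry extra fields or defaults that are not documented here.

```python
from dataclasses import dataclass


@dataclass
class CommunityReportsResult:
    """Illustrative stand-in for the result container described above."""

    output: list[str]              # one markdown report per community
    structured_output: list[dict]  # the parsed JSON behind each report


result = CommunityReportsResult(
    output=["# Community 1 Report\n..."],
    structured_output=[{"title": "Community 1 Report"}],
)
print(len(result.output), len(result.structured_output))
```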
CommunityReportsExtractor
An extractor class derived from Extractor that generates community reports by analyzing a graph's community structure and invoking a large language model for text generation.
Initialization
def __init__(
self,
llm_invoker: CompletionLLM,
max_report_length: int | None = None,
)
Parameters
llm_invoker (CompletionLLM): An instance of a chat-based large language model interface used to generate text.
max_report_length (int | None, optional): Maximum length of the generated report in tokens. Defaults to 1500 if not provided.
Description
Initializes the extractor with the provided LLM invoker and sets the extraction prompt and maximum report length.
Calling the Extractor Instance
async def __call__(self, graph: nx.Graph, callback: Callable | None = None) -> CommunityReportsResult:
Parameters
graph (nx.Graph): The NetworkX graph object representing entities and their relationships.
callback (Callable | None, optional): Optional callback function to receive progress updates. The callback receives each message as the keyword argument msg.
Returns
CommunityReportsResult: Contains both textual and structured outputs for all detected communities.
Description
Graph Preparation
Assigns a "rank" property to each node based on its degree.
Community Detection
Runs the Leiden algorithm on the graph to detect communities at multiple levels.
Community Report Extraction
For each community:
Skips communities with fewer than 2 nodes.
Constructs dataframes for entities and relationships, including their descriptions.
Creates a prompt by injecting CSV representations of the entities and relationships.
Calls the LLM asynchronously under rate limiting and timeout constraints to generate a JSON-formatted report.
Parses and validates the generated JSON for the required fields: title, summary, findings, rating, and rating_explanation.
Annotates the graph with community info.
Accumulates textual and structured reports.
Invokes the callback with progress updates.
Concurrency and Timeouts
Uses Trio async nursery and timeout decorators to parallelize and manage long-running LLM calls safely.
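The pattern of bounded concurrency plus a per-call timeout can be sketched as follows. The source uses Trio; this sketch swaps in asyncio from the standard library so that it runs without extra dependencies. The names fake_llm_call, extract_one, and extract_all are hypothetical, and recording None for a timed-out call is an assumption, not confirmed behavior of the extractor.

```python
import asyncio


async def fake_llm_call(community_id: int, delay: float) -> str:
    """Hypothetical stand-in for a single LLM request."""
    await asyncio.sleep(delay)
    return f"report-{community_id}"


async def extract_one(community_id, delay, limiter, timeout, results):
    async with limiter:  # cap the number of in-flight LLM calls
        try:
            results[community_id] = await asyncio.wait_for(
                fake_llm_call(community_id, delay), timeout
            )
        except asyncio.TimeoutError:
            # Assumption: a call that exceeds the timeout is abandoned.
            results[community_id] = None


async def extract_all(delays, max_concurrency=2, timeout=0.2):
    limiter = asyncio.Semaphore(max_concurrency)
    results = {}
    # gather() plays the role of the nursery: it waits for every task.
    await asyncio.gather(
        *(extract_one(i, d, limiter, timeout, results) for i, d in enumerate(delays))
    )
    return results


# The second call sleeps longer than the timeout and is dropped.
reports = asyncio.run(extract_all([0.01, 0.5, 0.01]))
print(reports)
```

In Trio the same shape would use a nursery in place of gather and trio.move_on_after in place of wait_for.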
Usage Example
extractor = CommunityReportsExtractor(llm_invoker=my_llm)
result = await extractor(graph, callback=print)
print(result.output) # List of textual reports
print(result.structured_output) # List of dict reports
Internal Method: _get_text_output
def _get_text_output(self, parsed_output: dict) -> str:
Parameters
parsed_output(dict): A parsed JSON-like dictionary representing a single community report.
Returns
str: A formatted markdown string representing the community report.
Description
Generates a human-readable markdown report from the structured community report dictionary. The report includes:
Title as a primary header.
Summary as introductory text.
Each finding as a sub-header with explanation text.
Usage Example
text_report = extractor._get_text_output(parsed_output)
print(text_report)
Sample Output:
# Community Title
Summary text describing the community.
## Finding 1 Summary
Finding 1 explanation.
## Finding 2 Summary
Finding 2 explanation.
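A formatter producing this layout might look like the sketch below. It is a standalone function, not the method itself, and it assumes that each finding is a dict with summary and explanation keys and that sections are separated by blank lines; neither detail is confirmed by the source.

```python
def get_text_output(parsed_output: dict) -> str:
    """Render one community report dict as markdown (illustrative sketch)."""
    title = parsed_output.get("title", "Report")
    summary = parsed_output.get("summary", "")
    # Assumed finding shape: {"summary": ..., "explanation": ...}
    sections = [
        f"## {finding['summary']}\n\n{finding['explanation']}"
        for finding in parsed_output.get("findings", [])
    ]
    return "\n\n".join([f"# {title}", summary, *sections])


report = {
    "title": "Community Title",
    "summary": "Summary text describing the community.",
    "findings": [
        {"summary": "Finding 1 Summary", "explanation": "Finding 1 explanation."},
    ],
}
print(get_text_output(report))
```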
Important Implementation Details and Algorithms
Community Detection: Utilizes the Leiden algorithm (leiden.run) to detect communities within the graph. Leiden is a refinement of the Louvain method that runs quickly on large graphs and guarantees well-connected communities.
Data Preparation: Entities and relationships within a community are extracted and converted to CSV format to be fed into the prompt template for the LLM.
Prompt Engineering: Employs a predefined prompt template (COMMUNITY_REPORT_PROMPT) in which variable replacements inject the CSV data. This structured prompt guides the LLM to produce JSON-formatted reports.
Asynchronous Processing: The extraction process is asynchronous, allowing multiple community reports to be generated in parallel, with concurrency managed via Trio's nursery.
Timeouts and Rate Limiting: Incorporates timeouts (the timeout decorator and trio.move_on_after) and a semaphore-like chat_limiter to prevent excessive resource consumption or indefinite waits on LLM responses.
Response Cleanup and Validation: Uses regex substitutions to clean up the LLM-generated response before parsing it as JSON, then verifies the presence and types of the required keys.
Graph Annotation: Calls add_community_info2graph to annotate the original graph nodes with community-related metadata (e.g., community title).
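The cleanup and validation step can be illustrated with the standard library alone. In this sketch, has_keys_with_types is a hypothetical stand-in for dict_has_keys_with_types, the fence-stripping regex is only one plausible cleanup, and the field types are assumptions: the source names the required keys but not their types.

```python
import json
import re
from typing import Optional

FENCE = "`" * 3  # a markdown code fence, built indirectly to keep this block tidy

# Assumed types for the required keys.
REQUIRED_FIELDS = [
    ("title", str),
    ("summary", str),
    ("findings", list),
    ("rating", (int, float)),
    ("rating_explanation", str),
]


def has_keys_with_types(data: dict, expected: list) -> bool:
    """Hypothetical stand-in for dict_has_keys_with_types."""
    return all(key in data and isinstance(data[key], typ) for key, typ in expected)


def parse_report(raw: str) -> Optional[dict]:
    """Strip a surrounding code fence, parse JSON, and validate the fields."""
    pattern = rf"^\s*{FENCE}(?:json)?\s*|\s*{FENCE}\s*$"
    cleaned = re.sub(pattern, "", raw.strip())
    try:
        parsed = json.loads(cleaned)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict) or not has_keys_with_types(parsed, REQUIRED_FIELDS):
        return None
    return parsed


raw = (
    FENCE + "json\n"
    '{"title": "T", "summary": "S", "findings": [], '
    '"rating": 7.5, "rating_explanation": "E"}\n' + FENCE
)
print(parse_report(raw))
```

Returning None on failure keeps a malformed response from aborting the whole run; whether the extractor skips, retries, or raises in that case is not stated in the source.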
Interaction with Other Components
LLM Interface (CompletionLLM): The extractor depends on an LLM chat model interface to generate community reports from structured data prompts.
Graph Utilities (graphrag.general.leiden): Uses the Leiden algorithm implementation from the graphrag library for community detection.
Graph Annotation (add_community_info2graph): Updates the input graph with community metadata after report generation.
Utility Functions: perform_variable_replacements generates prompts dynamically; dict_has_keys_with_types validates JSON structure; num_tokens_from_string tracks token usage; chat_limiter limits concurrent LLM calls.
Asynchronous Concurrency (trio): Manages async execution of community report extraction tasks.
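To make the prompt-building utilities concrete, the sketch below serializes entity and relationship rows to CSV and substitutes them into a template. The extractor itself builds dataframes; here the standard csv module is swapped in to avoid extra dependencies. PROMPT_TEMPLATE, its placeholder names, the column names, and replace_variables (a stand-in for perform_variable_replacements) are all hypothetical.

```python
import csv
import io

# Hypothetical template; the real COMMUNITY_REPORT_PROMPT is not reproduced here.
PROMPT_TEMPLATE = "Entities:\n{entity_df}\nRelationships:\n{relation_df}\n"


def to_csv(rows: list, columns: list) -> str:
    """Serialize a list of row dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def replace_variables(template: str, variables: dict) -> str:
    """Hypothetical stand-in for perform_variable_replacements."""
    for name, value in variables.items():
        template = template.replace("{" + name + "}", value)
    return template


entities = [
    {"id": 0, "entity": "ALICE", "description": "Engineer"},
    {"id": 1, "entity": "BOB", "description": "Manager"},
]
relations = [
    {"id": 0, "source": "ALICE", "target": "BOB", "description": "Reports to"},
]

prompt = replace_variables(
    PROMPT_TEMPLATE,
    {
        "entity_df": to_csv(entities, ["id", "entity", "description"]),
        "relation_df": to_csv(relations, ["id", "source", "target", "description"]),
    },
)
print(prompt)
```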
Visual Diagram
classDiagram
class CommunityReportsExtractor {
-_extraction_prompt: str
-_output_formatter_prompt: str
-_max_report_length: int
+__init__(llm_invoker: CompletionLLM, max_report_length: int | None)
+__call__(graph: nx.Graph, callback: Callable | None) async CommunityReportsResult
-_get_text_output(parsed_output: dict) str
}
class CommunityReportsResult {
+output: list~str~
+structured_output: list~dict~
}
CommunityReportsExtractor --> CommunityReportsResult : returns
CommunityReportsExtractor ..> CompletionLLM : uses
CommunityReportsExtractor ..> nx.Graph : input
CommunityReportsExtractor ..> leiden : uses community detection
CommunityReportsExtractor ..> add_community_info2graph : annotates graph
Summary
This file encapsulates the logic for detecting communities within a graph and generating both textual and structured reports describing these communities. It integrates advanced graph analysis, asynchronous programming, and LLM-based natural language generation into a coherent extractor component. The CommunityReportsExtractor class is designed for use in larger graph analysis or knowledge extraction pipelines where automated, detailed community insights are needed.