community_reports_extractor.py


Overview

The community_reports_extractor.py file defines an extractor that generates descriptive, structured reports about the communities detected in a graph. It runs the Leiden algorithm on a NetworkX graph, collects entity and relationship data for each community, and calls a large language model (LLM) to produce a report covering the community's summary, findings, and rating.

The file belongs to a graph analysis system that combines community detection with LLM text generation to produce community insights in both human-readable and machine-readable form. It depends on graphrag for graph operations and community detection, and on a chat-based LLM interface for report generation.


Classes and Functions

CommunityReportsResult

A simple data class representing the output of the community reports extraction process.

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| output | list[str] | List of textual community reports. |
| structured_output | list[dict] | List of structured JSON-like community data. |

Usage

result = CommunityReportsResult(
    output=["# Community 1 Report\n...", "# Community 2 Report\n..."],
    structured_output=[{...}, {...}]
)
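
A container like this could be declared as a standard dataclass. The sketch below is only an illustration: the two field names and types come from the attribute list above, while the empty-list defaults are an assumption and may not match the actual definition.

```python
from dataclasses import dataclass, field


@dataclass
class CommunityReportsResult:
    """Illustrative stand-in for the result container described above."""

    # Text of each generated community report.
    # Assumption: defaults to an empty list.
    output: list[str] = field(default_factory=list)
    # Parsed JSON-like dictionary for each generated report.
    # Assumption: defaults to an empty list.
    structured_output: list[dict] = field(default_factory=list)


result = CommunityReportsResult(
    output=["# Community 1 Report\n..."],
    structured_output=[{"title": "Community 1 Report"}],
)
```

Using `default_factory` rather than a literal `[]` keeps instances from sharing one mutable list.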

CommunityReportsExtractor

An extractor class derived from Extractor that generates community reports by analyzing a graph's community structure and invoking a large language model for text generation.

Initialization

def __init__(
    self,
    llm_invoker: CompletionLLM,
    max_report_length: int | None = None,
)
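Parameters

  • llm_invoker (CompletionLLM): the LLM interface used to generate reports.

  • max_report_length (int | None): maximum report length; defaults to None.

Description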

Initializes the extractor with the provided LLM invoker and sets the extraction prompt and maximum report length.


Calling the Extractor Instance

async def __call__(self, graph: nx.Graph, callback: Callable | None = None) -> CommunityReportsResult:
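Parameters

  • graph (nx.Graph): the NetworkX graph to analyze.

  • callback (Callable | None): optional function invoked with progress updates; defaults to None.

Returns

  • CommunityReportsResult: the textual and structured reports.

Description

Calling the instance runs the following steps: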
  1. Graph Preparation
    Assigns a "rank" property to each node based on its degree.

  2. Community Detection
    Runs the Leiden algorithm on the graph to detect communities at multiple levels.

  3. Community Report Extraction
    For each community:

    • Skips communities with fewer than 2 nodes.

    • Constructs dataframes for entities and relationships including descriptions.

    • Creates a prompt by injecting CSV representations of entities and relations.

    • Calls the LLM asynchronously under rate limiting and timeout constraints to generate a JSON-formatted report.

    • Parses and validates the generated JSON for required fields: title, summary, findings, rating, and rating_explanation.

    • Annotates the graph with community info.

    • Accumulates textual and structured reports.

    • Invokes the callback with progress updates.

  4. Concurrency and Timeouts
    Uses Trio async nursery and timeout decorators to parallelize and manage long-running LLM calls safely.
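
The prompt-building and validation parts of step 3 can be sketched with the standard library alone. This is a hedged illustration, not the extractor's code: it uses the `csv` module in place of dataframes, and `PROMPT_TEMPLATE`, the column names, and the helpers `rows_to_csv`, `build_prompt`, and `parse_report` are all hypothetical. Only the five required field names come from the description above.

```python
import csv
import io
import json
from typing import Optional

# Required report fields, as listed in step 3.
REQUIRED_FIELDS = ("title", "summary", "findings", "rating", "rating_explanation")

# Hypothetical template; the real extraction prompt is not shown in this document.
PROMPT_TEMPLATE = "Entities:\n{entity_csv}\nRelationships:\n{relation_csv}\n"


def rows_to_csv(header: list, rows: list) -> str:
    """Serialize a header row plus data rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def build_prompt(entities: list, relations: list) -> str:
    """Inject CSV representations of entities and relations into the template."""
    # Column names are assumptions made for this sketch.
    entity_csv = rows_to_csv(["id", "entity", "description"], entities)
    relation_csv = rows_to_csv(["id", "source", "target", "description"], relations)
    return PROMPT_TEMPLATE.format(entity_csv=entity_csv, relation_csv=relation_csv)


def parse_report(raw: str) -> Optional[dict]:
    """Return the parsed report, or None if it is invalid JSON or lacks a required field."""
    try:
        report = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(report, dict):
        return None
    if any(name not in report for name in REQUIRED_FIELDS):
        return None
    return report
```

Returning None for a malformed response lets the caller skip that community without aborting the whole run; whether the actual extractor skips, retries, or raises is not stated here.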

Usage Example
extractor = CommunityReportsExtractor(llm_invoker=my_llm)
result = await extractor(graph, callback=print)
print(result.output)  # List of textual reports
print(result.structured_output)  # List of dict reports

Internal Method: _get_text_output

def _get_text_output(self, parsed_output: dict) -> str:
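Parameters

  • parsed_output (dict): the structured community report.

Returns

  • str: the report rendered as Markdown text.

Description

Generates a human-readable Markdown report from the structured community report dictionary. The report includes the community title as a top-level heading, the summary text, and one second-level section per finding, with the finding's summary as the heading followed by its explanation.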

Usage Example
text_report = extractor._get_text_output(parsed_output)
print(text_report)

Sample Output:

# Community Title

Summary text describing the community.

## Finding 1 Summary

Finding 1 explanation.

## Finding 2 Summary

Finding 2 explanation.
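
A rendering function that produces the layout shown in the sample output could look like the sketch below. It is a minimal illustration under assumptions: the standalone function `get_text_output` is hypothetical, and the per-finding keys `summary` and `explanation` are inferred from the sample output rather than confirmed by the source.

```python
def get_text_output(parsed_output: dict) -> str:
    """Render a structured report dictionary as Markdown text."""
    title = parsed_output["title"]
    summary = parsed_output["summary"]
    findings = parsed_output["findings"]

    # One second-level section per finding.
    # Assumption: each finding is a dict with "summary" and "explanation" keys.
    sections = [
        f"## {finding['summary']}\n\n{finding['explanation']}"
        for finding in findings
    ]
    # Blank lines separate the title, the summary, and each finding section.
    return "\n\n".join([f"# {title}", summary, *sections])


report = {
    "title": "Community Title",
    "summary": "Summary text describing the community.",
    "findings": [
        {"summary": "Finding 1 Summary", "explanation": "Finding 1 explanation."},
    ],
}
text = get_text_output(report)
```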

Important Implementation Details and Algorithms


Interaction with Other Components


Visual Diagram

classDiagram
    class CommunityReportsExtractor {
        -_extraction_prompt: str
        -_output_formatter_prompt: str
        -_max_report_length: int
        +__init__(llm_invoker: CompletionLLM, max_report_length: int | None)
        +__call__(graph: nx.Graph, callback: Callable | None) async CommunityReportsResult
        -_get_text_output(parsed_output: dict) str
    }

    class CommunityReportsResult {
        +output: list~str~
        +structured_output: list~dict~
    }

    CommunityReportsExtractor --> CommunityReportsResult : returns
    CommunityReportsExtractor ..> CompletionLLM : uses
    CommunityReportsExtractor ..> nx.Graph : input
    CommunityReportsExtractor ..> leiden : uses community detection
    CommunityReportsExtractor ..> add_community_info2graph : annotates graph

Summary

This file encapsulates the logic for detecting communities in a graph and generating both textual and structured reports that describe them. It combines graph analysis, asynchronous programming, and LLM-based text generation in a single extractor component. The CommunityReportsExtractor class is intended for larger graph analysis or knowledge extraction pipelines that need automated, detailed community insights.