leiden.py


Overview

The leiden.py module performs community detection on graphs using the Leiden algorithm. It aims for stable, reproducible community partitions by normalizing node names, stabilizing node and edge ordering, and optionally restricting analysis to the largest connected component (LCC). It uses the graspologic library’s Leiden implementation together with NetworkX graph structures to compute hierarchical partitions of a graph into communities.

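Key functionalities include node-name normalization, stable node and edge ordering, extraction of a stabilized largest connected component, hierarchical Leiden partitioning, and annotation of graph nodes with community membership.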

This module is intended for use in graph analysis pipelines where reproducible, hierarchical community detection is required, such as social network analysis, bioinformatics, and other domains involving complex networks.


Classes and Functions

1. _stabilize_graph(graph: nx.Graph) -> nx.Graph

Purpose:
Create a stable and reproducible ordering of nodes and edges in a graph, ensuring that undirected graphs always have edges represented in a canonical source-target order. This avoids inconsistencies when consuming graph data downstream.

Parameters:

Returns:

Usage Example:

stable_graph = _stabilize_graph(my_graph)
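
A minimal sketch of the stabilization described under Purpose, assuming string node names so that nodes and edges can be sorted. The name stabilize_sketch is hypothetical, and this is an illustration of the idea rather than the module's actual implementation.

```python
import networkx as nx


def stabilize_sketch(graph: nx.Graph) -> nx.Graph:
    """Hypothetical helper: rebuild a graph with sorted nodes and edges."""
    fixed = nx.DiGraph() if graph.is_directed() else nx.Graph()

    # Insert nodes in sorted order so iteration no longer depends on
    # the order in which the original graph was built.
    fixed.add_nodes_from(sorted(graph.nodes(data=True), key=lambda item: item[0]))

    edges = list(graph.edges(data=True))
    if not graph.is_directed():
        # Undirected edges get a canonical (smaller, larger) endpoint order.
        edges = [(min(u, v), max(u, v), data) for u, v, data in edges]

    # Sort edges by their endpoints only; attribute dicts are not comparable.
    fixed.add_edges_from(sorted(edges, key=lambda e: (e[0], e[1])))
    return fixed


g = nx.Graph()
g.add_edge("B", "A", weight=2.0)
g.add_edge("C", "A", weight=1.0)

stable = stabilize_sketch(g)
print(list(stable.nodes()))  # ['A', 'B', 'C']
print(list(stable.edges()))  # [('A', 'B'), ('A', 'C')]
```

Two graphs with the same nodes and edges but different insertion order produce the same node and edge sequence after this step, which is what makes downstream results repeatable.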

Implementation Details:


2. normalize_node_names(graph: nx.Graph | nx.DiGraph) -> nx.Graph | nx.DiGraph

Purpose:
Normalize node names in the graph by converting them to uppercase, stripping whitespace, and unescaping HTML entities. This ensures consistent naming conventions for nodes.

Parameters:

Returns:

Usage Example:

normalized_graph = normalize_node_names(my_graph)
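
The three normalization steps can be sketched as follows. The name normalize_names_sketch is hypothetical, and the order of the steps is an assumption: unescaping first is used here because HTML entity names are case-sensitive, so uppercasing before unescaping could leave some entities unresolved.

```python
import html

import networkx as nx


def normalize_names_sketch(graph):
    """Hypothetical helper: unescape HTML entities, uppercase, strip whitespace."""
    mapping = {
        node: html.unescape(node).upper().strip()
        for node in graph.nodes()
    }
    # relabel_nodes returns a new graph by default; node and edge
    # attributes are carried over to the renamed nodes.
    return nx.relabel_nodes(graph, mapping)


g = nx.Graph()
g.add_edge(" alice ", "r&amp;d team", weight=1.0)

normalized = normalize_names_sketch(g)
print(sorted(normalized.nodes()))  # ['ALICE', 'R&D TEAM']
```

Note that two distinct nodes whose names normalize to the same string are merged by the relabeling, which may or may not be desirable for a given dataset.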

Implementation Details:


3. stable_largest_connected_component(graph: nx.Graph) -> nx.Graph

Purpose:
Extract the largest connected component (LCC) of an undirected graph, then normalize and stabilize it to ensure reproducible ordering.

Parameters:

Returns:

Usage Example:

lcc_graph = stable_largest_connected_component(my_graph)
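
As a rough illustration of the extract, normalize, stabilize sequence, the sketch below uses only NetworkX. The name stable_lcc_sketch is hypothetical, the component is selected with nx.connected_components (the module itself may rely on a different helper), and string node names are assumed.

```python
import html

import networkx as nx


def stable_lcc_sketch(graph: nx.Graph) -> nx.Graph:
    """Hypothetical helper: largest component, normalized names, sorted order."""
    # 1. Keep only the connected component with the most nodes.
    largest_nodes = max(nx.connected_components(graph), key=len)
    lcc = graph.subgraph(largest_nodes).copy()

    # 2. Normalize node names (step order is an assumption).
    lcc = nx.relabel_nodes(
        lcc, {n: html.unescape(n).upper().strip() for n in lcc.nodes()}
    )

    # 3. Rebuild with sorted nodes and canonically ordered, sorted edges.
    stable = nx.Graph()
    stable.add_nodes_from(sorted(lcc.nodes(data=True), key=lambda item: item[0]))
    edges = [(min(u, v), max(u, v), d) for u, v, d in lcc.edges(data=True)]
    stable.add_edges_from(sorted(edges, key=lambda e: (e[0], e[1])))
    return stable


g = nx.Graph()
g.add_edges_from([("b", "a"), ("c", "b")])  # three-node component
g.add_edge("x", "y")                        # smaller component, dropped

lcc = stable_lcc_sketch(g)
print(list(lcc.nodes()))  # ['A', 'B', 'C']
print(list(lcc.edges()))  # [('A', 'B'), ('B', 'C')]
```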

Implementation Details:


4. _compute_leiden_communities(graph: nx.Graph | nx.DiGraph, max_cluster_size: int, use_lcc: bool, seed=0xDEADBEEF) -> dict[int, dict[str, int]]

Purpose:
Compute hierarchical Leiden community partitions on the given graph, optionally restricting to the LCC and using a random seed for reproducibility.

Parameters:

Returns:

Usage Example:

communities = _compute_leiden_communities(graph, max_cluster_size=10, use_lcc=True, seed=42)
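
The return type dict[int, dict[str, int]] reads as level -> node -> cluster id. The sketch below shows how a flat list of per-node assignments could be folded into that shape. PartitionRecord and group_by_level are hypothetical stand-ins used for illustration; the records are hand-written, not output from an actual Leiden run.

```python
from typing import NamedTuple


class PartitionRecord(NamedTuple):
    """Hypothetical stand-in for one (node, cluster, level) assignment."""
    node: str
    cluster: int
    level: int


def group_by_level(records: list[PartitionRecord]) -> dict[int, dict[str, int]]:
    """Fold flat assignments into a level -> node -> cluster mapping."""
    results: dict[int, dict[str, int]] = {}
    for record in records:
        results.setdefault(record.level, {})[record.node] = record.cluster
    return results


# Hand-written example data: three nodes at level 0, two refined at level 1.
records = [
    PartitionRecord("A", 0, 0),
    PartitionRecord("B", 0, 0),
    PartitionRecord("C", 1, 0),
    PartitionRecord("A", 2, 1),
    PartitionRecord("B", 3, 1),
]

communities = group_by_level(records)
print(communities[0])  # {'A': 0, 'B': 0, 'C': 1}
print(communities[1])  # {'A': 2, 'B': 3}
```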

Implementation Details:


5. run(graph: nx.Graph, args: dict[str, Any]) -> dict[int, dict[str, dict]]

Purpose:
Primary entry point to run Leiden community detection with configurable parameters and return a structured mapping of detected communities, including normalized weights and node memberships per community and level.

Parameters:

Returns:

Usage Example:

args = {"max_cluster_size": 15, "use_lcc": True, "verbose": True}
results = run(my_graph, args)
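
To make the shape of the result more concrete, the sketch below builds one level of a community mapping from a node -> cluster assignment. Everything specific here is an assumption for illustration: the function name group_communities, the "nodes" and "weight" keys, the per-node weight input, and normalizing by the largest community weight.

```python
def group_communities(
    node_to_cluster: dict[str, int],
    node_weight: dict[str, float],
) -> dict[str, dict]:
    """Hypothetical helper: group nodes by cluster and normalize weights."""
    result: dict[str, dict] = {}
    for node, cluster in node_to_cluster.items():
        entry = result.setdefault(str(cluster), {"weight": 0.0, "nodes": []})
        entry["nodes"].append(node)
        # Nodes without an explicit weight contribute 1.0 (assumption).
        entry["weight"] += node_weight.get(node, 1.0)

    # Scale weights so the heaviest community has weight 1.0.
    max_weight = max((e["weight"] for e in result.values()), default=0.0)
    if max_weight > 0:
        for entry in result.values():
            entry["weight"] /= max_weight
    return result


level0 = group_communities(
    {"A": 0, "B": 0, "C": 1},
    {"A": 2.0, "B": 2.0, "C": 1.0},
)
print(level0)
# {'0': {'weight': 1.0, 'nodes': ['A', 'B']}, '1': {'weight': 0.25, 'nodes': ['C']}}
```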

Implementation Details:


6. add_community_info2graph(graph: nx.Graph, nodes: list[str], community_title)

Purpose:
Annotate nodes in the input graph with community membership information by appending community titles to a node attribute.

Parameters:

Returns:

Usage Example:

add_community_info2graph(my_graph, ["node1", "node5"], "CommunityA")
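
A minimal sketch of this kind of in-place annotation. The function name annotate_sketch and the attribute name "communities" are assumptions made for the example; the source does not name the attribute.

```python
import networkx as nx


def annotate_sketch(graph: nx.Graph, nodes: list, community_title, attr: str = "communities") -> None:
    """Hypothetical helper: append a community title to each listed node."""
    for node in nodes:
        # Create the list on first use, then append; the graph is mutated in place.
        graph.nodes[node].setdefault(attr, []).append(community_title)


g = nx.Graph()
g.add_nodes_from(["node1", "node5"])

annotate_sketch(g, ["node1", "node5"], "CommunityA")
annotate_sketch(g, ["node1"], "CommunityB")

print(g.nodes["node1"]["communities"])  # ['CommunityA', 'CommunityB']
print(g.nodes["node5"]["communities"])  # ['CommunityA']
```

Because titles are appended rather than overwritten, a node that belongs to communities at several hierarchy levels accumulates one entry per call.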

Implementation Details:


Important Implementation Details and Algorithms


Interaction with Other Parts of the System


Visual Diagram: Class & Function Structure of leiden.py

flowchart TD
    A[(Graph Input)] --> B[_stabilize_graph]
    A --> C[normalize_node_names]
    B --> D[stable_largest_connected_component]
    C --> D
    D --> E[_compute_leiden_communities]
    A --> E
    E --> F[run]
    F --> G[add_community_info2graph]

    subgraph Graph Normalization & Stabilization
        B
        C
        D
    end

    subgraph Leiden Community Detection
        E
        F
    end

    subgraph Graph Annotation
        G
    end

Summary

The leiden.py module is a utility for stable and hierarchical community detection in graphs using the Leiden algorithm. It ensures reproducibility through graph stabilization and node normalization, supports configurable parameters for clustering, and outputs detailed, weighted community structures. It integrates closely with the graspologic library and NetworkX, making it suitable for advanced network analysis workflows requiring robust community detection and annotation.
