leiden.py
Overview
The leiden.py module provides functionality to perform community detection on graphs using the Leiden algorithm, a popular method for detecting communities in networks. It focuses on producing stable, reproducible community partitions by normalizing node names, stabilizing graph orderings, and optionally restricting analysis to the largest connected component (LCC). The module integrates with the graspologic library’s Leiden implementation and NetworkX graph structures to compute hierarchical partitions of graphs into communities.
Key functionalities include:
Stabilizing graph representations to ensure consistent ordering of nodes and edges.
Normalizing node names for uniformity.
Extracting the largest connected component stably.
Computing hierarchical Leiden community partitions with configurable parameters.
Post-processing results to include node weights and normalized community weights.
Annotating graphs with community membership information.
This module is intended for use in graph analysis pipelines where reproducible, hierarchical community detection is required, such as social network analysis, bioinformatics, and other domains involving complex networks.
Classes and Functions
1. _stabilize_graph(graph: nx.Graph) -> nx.Graph
Purpose:
Create a stable and reproducible ordering of nodes and edges in a graph, ensuring that undirected graphs always have edges represented in a canonical source-target order. This avoids inconsistencies when consuming graph data downstream.
Parameters:
graph(nx.Graph): Input NetworkX graph (directed or undirected).
Returns:
nx.Graph: A new graph with nodes and edges sorted in a stable order.
Usage Example:
stable_graph = _stabilize_graph(my_graph)
Implementation Details:
Nodes are sorted lexicographically by their identifiers.
For undirected graphs, edges are ordered so that
(min(node1, node2), max(node1, node2))is always the edge representation.Edges are then sorted lexicographically by source-target key strings.
Returns a new graph of the same type (directed or undirected).
2. normalize_node_names(graph: nx.Graph | nx.DiGraph) -> nx.Graph | nx.DiGraph
Purpose:
Normalize node names in the graph by converting them to uppercase, stripping whitespace, and unescaping HTML entities. This ensures consistent naming conventions for nodes.
Parameters:
graph(nx.Graphornx.DiGraph): The input graph whose node names will be normalized.
Returns:
nx.Graphornx.DiGraph: A graph with normalized node names.
Usage Example:
normalized_graph = normalize_node_names(my_graph)
Implementation Details:
Uses a mapping to relabel nodes.
Applies
str.upper(),str.strip(), andhtml.unescape()on each node name.
3. stable_largest_connected_component(graph: nx.Graph) -> nx.Graph
Purpose:
Extract the largest connected component (LCC) of an undirected graph, then normalize and stabilize it to ensure reproducible ordering.
Parameters:
graph(nx.Graph): Input undirected graph.
Returns:
nx.Graph: The largest connected component graph with normalized nodes and stable ordering.
Usage Example:
lcc_graph = stable_largest_connected_component(my_graph)
Implementation Details:
Uses
largest_connected_componentfromgraspologic.utils.Normalizes node names.
Stabilizes graph ordering with
_stabilize_graph.
4. `_compute_leiden_communities(
graph: nx.Graph | nx.DiGraph,
max_cluster_size: int,
use_lcc: bool,
seed=0xDEADBEEF,
) -> dict[int, dict[str, int]]`
Purpose:
Compute hierarchical Leiden community partitions on the given graph, optionally restricting to the LCC and using a random seed for reproducibility.
Parameters:
graph(nx.Graphornx.DiGraph): Input graph.max_cluster_size(int): Maximum allowed cluster size during the Leiden partitioning.use_lcc(bool): Whether to restrict community detection to the largest connected component.seed(int, optional): Random seed to ensure deterministic results. Default0xDEADBEEF.
Returns:
dict[int, dict[str, int]]: Dictionary mapping hierarchy levels to node-community assignments.
Format:{ level: { node_id: cluster_id, ... }, ... }
Usage Example:
communities = _compute_leiden_communities(graph, max_cluster_size=10, use_lcc=True, seed=42)
Implementation Details:
Returns empty results if the graph is empty.
If
use_lcc, extracts stable LCC before clustering.Calls
hierarchical_leidenfromgraspologic.partition.Collects community cluster assignments grouped by hierarchy level.
5. run(graph: nx.Graph, args: dict[str, Any]) -> dict[int, dict[str, dict]]
Purpose:
Primary entry point to run Leiden community detection with configurable parameters and return a structured mapping of detected communities, including normalized weights and node memberships per community and level.
Parameters:
graph(nx.Graph): Input graph to analyze.args(dict[str, Any]): Dictionary of parameters including:"max_cluster_size"(int, optional): Maximum cluster size (default 12)."use_lcc"(bool, optional): Whether to restrict to LCC (default True)."seed"(int, optional): Random seed (default 0xDEADBEEF)."verbose"(bool, optional): Enable debug logging."levels"(list[int], optional): Specific hierarchy levels to include; defaults to all detected levels.
Returns:
dict[int, dict[str, dict]]: Hierarchical community structure where keys are levels and values are dictionaries mapping community IDs to:"nodes": List of node IDs in community."weight": Normalized community weight (based on node rank and weight attributes).
Usage Example:
args = {"max_cluster_size": 15, "use_lcc": True, "verbose": True}
results = run(my_graph, args)
Implementation Details:
Validates non-empty graph.
Calls
_compute_leiden_communitiesinternally.For each hierarchy level:
Aggregates nodes into communities.
Calculates community weights by summing node
rank * weight.Normalizes weights relative to max community weight.
Logs warnings for nodes missing in the graph.
6. add_community_info2graph(graph: nx.Graph, nodes: list[str], community_title)
Purpose:
Annotate nodes in the input graph with community membership information by appending community titles to a node attribute.
Parameters:
graph(nx.Graph): Graph whose nodes are annotated.nodes(list[str]): List of node IDs to annotate.community_title(Any): Community label/title to append.
Returns:
None(modifies graph in place).
Usage Example:
add_community_info2graph(my_graph, ["node1", "node5"], "CommunityA")
Implementation Details:
Checks if
"communities"attribute exists for each node; if not, initializes as empty list.Appends new community title and removes duplicates.
Important Implementation Details and Algorithms
Leiden Algorithm Integration: The module leverages the
hierarchical_leidenfunction from thegraspologic.partitionpackage, which implements the Leiden community detection algorithm. This algorithm improves upon the Louvain method by guaranteeing well-connected communities and better partition quality.Graph Stabilization: Sorting nodes and edges ensures deterministic outputs and repeatability of results, which is critical for downstream consumers that rely on consistent graph traversals or node orderings.
Normalization of Node Names: Node names are normalized by uppercasing and unescaping HTML entities to avoid mismatches due to casing or encoding variations.
Hierarchical Community Detection: The Leiden algorithm here operates hierarchically, producing multiple levels of community partitions. Each level corresponds to a resolution or granularity of clustering.
Community Weight Calculation: Communities are attributed a "weight" computed as a sum of node-level weights (
rank * weight), then normalized relative to the max community weight at that level, supporting weighted analyses.
Interaction with Other Parts of the System
graspologic.partition.hierarchical_leiden: Core Leiden clustering is delegated to this function, which must be available in the environment. The module depends on it for community detection computations.
graspologic.utils.largest_connected_component: Used to extract the LCC for stable community detection when requested.
NetworkX (
nx): All graph manipulations use NetworkX graph objects, making this module compatible with any NetworkX-based graph pipeline.Logging: The module uses Python's standard
loggingfor debug and warning messages.Downstream Usage: The data structures returned by
run()(nested dictionaries keyed by hierarchy level and community IDs) are designed for easy consumption by visualization tools, further analytics, or integration into graph annotation pipelines.
Visual Diagram: Class & Function Structure of leiden.py
flowchart TD
A[(Graph Input)] --> B[_stabilize_graph]
A --> C[normalize_node_names]
B --> D[stable_largest_connected_component]
C --> D
D --> E[_compute_leiden_communities]
A --> E
E --> F[run]
F --> G[add_community_info2graph]
subgraph Graph Normalization & Stabilization
B
C
D
end
subgraph Leiden Community Detection
E
F
end
subgraph Graph Annotation
G
end
Summary
The leiden.py module is a utility for stable and hierarchical community detection in graphs using the Leiden algorithm. It ensures reproducibility through graph stabilization and node normalization, supports configurable parameters for clustering, and outputs detailed, weighted community structures. It integrates closely with the graspologic library and NetworkX, making it suitable for advanced network analysis workflows requiring robust community detection and annotation.
If you require usage examples integrating this module into a larger pipeline or further explanation of the Leiden algorithm itself, please let me know!