mind_map_extractor.py
Overview
mind_map_extractor.py defines an extractor that converts text into structured mind map representations. It sends text sections to a large language model (LLM) asynchronously, parses the LLM's Markdown-formatted output into a hierarchical, JSON-like structure, and merges the partial mind maps into a single, unified mind map.
This module is part of the InfiniFlow system, likely within a retrieval-augmented generation (RAG) pipeline, where it leverages LLMs to extract structured knowledge (mind maps) from unstructured or semi-structured text inputs.
Classes and Functions
MindMapResult
@dataclass
class MindMapResult:
    output: dict
Purpose:
Simple data container class representing the mind map extraction result as a dictionary.
Attributes:
output (dict): The extracted mind map, structured as a nested dictionary of nodes and their children.
Usage Example:
result = MindMapResult(output={"id": "root", "children": []})
print(result.output)
MindMapExtractor
class MindMapExtractor(Extractor):
Inherits from:
Extractor (from graphrag.general.extractor)
Purpose:
Core class responsible for extracting mind maps from input text by querying an LLM, parsing the result, and converting it into a structured graph.
Attributes:
| Attribute | Type | Description |
|---|---|---|
| _llm | CompletionLLM | The language model invoker for extraction. |
| _input_text_key | str | Key name under which input text is passed into the prompt. Defaults to "input_text". |
| _mind_map_prompt | str | Prompt template for mind map extraction. Defaults to MIND_MAP_EXTRACTION_PROMPT. |
| _on_error | ErrorHandlerFn | Callback for error handling during processing. |
__init__
def __init__(
    self,
    llm_invoker: CompletionLLM,
    prompt: str | None = None,
    input_text_key: str | None = None,
    on_error: ErrorHandlerFn | None = None,
):
Description:
Initializes the MindMapExtractor instance.
Parameters:
llm_invoker (CompletionLLM): The LLM client to invoke for extraction.
prompt (str | None): Optional custom prompt template; defaults to MIND_MAP_EXTRACTION_PROMPT.
input_text_key (str | None): Optional key name for the input text in the prompt variables; defaults to "input_text".
on_error (ErrorHandlerFn | None): Optional error handler function.
Returns:
None
Usage Example:
extractor = MindMapExtractor(llm_invoker=my_llm, prompt=my_prompt)
_key
def _key(self, k: str) -> str:
Description:
Cleans a string key by removing asterisks (*), which may be markup artifacts.
Parameters:
k (str): Input key string.
Returns:
Cleaned string without asterisks.
Example:
_key("**ExampleKey*") returns "ExampleKey"
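The cleaning step can be sketched as a standalone function. This is a minimal illustration, not the module's actual code: the name strip_asterisks is hypothetical, and using a regular expression is an assumption (a plain str.replace would work equally well).

```python
import re


def strip_asterisks(k: str) -> str:
    """Remove every asterisk from a key, e.g. leftover Markdown bold markers."""
    return re.sub(r"\*+", "", k)


print(strip_asterisks("**ExampleKey*"))  # ExampleKey
```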
_be_children
def _be_children(self, obj: dict | list | str, keyset: set) -> list:
Description:
Recursively converts nested dictionary or list structures into a list of mind map nodes, each represented as a dictionary with "id" and "children" keys.
Parameters:
obj (dict | list | str): The input object representing child nodes.
keyset (set): Set of keys already processed, used to avoid duplicates.
Returns:
List of dictionaries representing child nodes in mind map format.
Usage:
Used internally to build hierarchical mind map trees.
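A rough sketch of this kind of recursive conversion follows. The function name to_children and the exact duplicate-handling rules are assumptions made for illustration; the real method may treat edge cases differently.

```python
def to_children(obj, keyset: set) -> list:
    """Convert a nested dict/list/str into a list of {"id", "children"} nodes."""
    if isinstance(obj, str):
        obj = [obj]
    nodes = []
    if isinstance(obj, list):
        for item in obj:
            if isinstance(item, dict):
                # A dict inside a list contributes its own keyed nodes
                nodes.extend(to_children(item, keyset))
            elif isinstance(item, str) and item and item not in keyset:
                keyset.add(item)
                nodes.append({"id": item, "children": []})
        return nodes
    if not isinstance(obj, dict):
        return nodes  # unsupported types yield no children
    for key, value in obj.items():
        if key in keyset:
            continue  # skip ids that already appear elsewhere in the map
        keyset.add(key)
        nodes.append({"id": key, "children": to_children(value, keyset)})
    return nodes


tree = {"Neural networks": {"Layers": ["Convolution", "Pooling"], "Training": "Backpropagation"}}
children = to_children(tree, set())
print(children[0]["id"])  # Neural networks
```

Sharing one keyset across the whole recursion means an id is emitted only once, wherever it is first encountered.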
__call__
async def __call__(
    self, sections: list[str], prompt_variables: dict[str, Any] | None = None
) -> MindMapResult:
Description:
Asynchronously processes multiple text sections to extract and combine mind map structures.
Parameters:
sections (list[str]): List of text sections to extract mind maps from.
prompt_variables (dict[str, Any] | None): Optional dictionary of variables to substitute into the prompt.
Returns:
MindMapResult containing a unified, JSON-like mind map dictionary.
Process Details:
Calculates token limits based on the LLM's max token size.
Groups sections into batches to avoid exceeding token limits.
Launches concurrent asynchronous tasks (via trio) to process each batch with _process_document.
Merges partial mind maps using _merge.
Constructs a root node with aggregated children.
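The batching step above can be sketched as a greedy packer. Everything here is an assumption for illustration: batch_sections and word_count are hypothetical names, a whitespace word count stands in for num_tokens_from_string, and the actual budget calculation in the module is not shown in this document.

```python
from typing import Callable


def batch_sections(
    sections: list[str], token_budget: int, count_tokens: Callable[[str], int]
) -> list[str]:
    """Greedily pack consecutive sections into batches that fit token_budget."""
    batches: list[str] = []
    current: list[str] = []
    used = 0
    for section in sections:
        cost = count_tokens(section)
        # Flush the current batch if adding this section would overflow it
        if current and used + cost > token_budget:
            batches.append("".join(current))
            current, used = [], 0
        current.append(section)
        used += cost
    if current:
        batches.append("".join(current))
    return batches


def word_count(text: str) -> int:
    """Stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


batches = batch_sections(["a b c ", "d e ", "f g h i "], token_budget=5, count_tokens=word_count)
print(batches)  # ['a b c d e ', 'f g h i ']
```

In this sketch a single section larger than the budget still becomes its own batch rather than being split.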
Example Usage:
result = await mind_map_extractor(sections=["Text part 1", "Text part 2"])
print(result.output)
_merge
def _merge(self, d1: dict, d2: dict) -> dict:
Description:
Recursively merges two dictionaries representing mind map fragments.
Lists are concatenated, nested dictionaries are merged recursively, and scalar values are replaced.
Parameters:
d1 (dict): Source dictionary to merge from.
d2 (dict): Destination dictionary to merge into.
Returns:
The merged dictionary (d2).
Algorithm Details:
For matching keys with dict values, merges recursively.
For matching keys with list values, extends the destination list.
Otherwise, overwrites destination value with source.
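The three rules above translate fairly directly into code. The following is a minimal sketch under the stated rules, with a hypothetical function name (merge); it is not copied from the module.

```python
def merge(d1: dict, d2: dict) -> dict:
    """Merge d1 (source) into d2 (destination) in place and return d2."""
    for key, value in d1.items():
        if key in d2 and isinstance(value, dict) and isinstance(d2[key], dict):
            merge(value, d2[key])  # both sides are dicts: recurse
        elif key in d2 and isinstance(value, list) and isinstance(d2[key], list):
            d2[key].extend(value)  # both sides are lists: concatenate
        else:
            d2[key] = value  # new key or mismatched types: source wins
    return d2


src = {"A": {"x": [1]}, "B": [2], "C": "new"}
dst = {"A": {"x": [0], "y": []}, "B": [1], "C": "old"}
print(merge(src, dst))
# {'A': {'x': [0, 1], 'y': []}, 'B': [1, 2], 'C': 'new'}
```

Because the destination is mutated in place, callers that need the original d2 afterwards should pass a copy.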
_list_to_kv
def _list_to_kv(self, data: dict) -> dict:
Description:
Converts nested lists in the extracted JSON into key-value mappings where applicable, making the structure more uniform.
Parameters:
data (dict): Nested dictionary potentially containing lists.
Returns:
Modified dictionary with lists converted to key-value pairs where possible.
Details:
Detects patterns where lists appear as [key, [value]] and converts them to {key: value}.
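One possible reading of that pattern is sketched below. The names list_to_kv and is_kv_list are hypothetical, and two choices are assumptions rather than documented behavior: a list is converted only when it consists entirely of alternating string/list pairs, and a single-element inner list is unwrapped to its one value.

```python
def is_kv_list(items: list) -> bool:
    """True if items alternates str keys and list values: [k1, [..], k2, [..]]."""
    return (
        len(items) > 0
        and len(items) % 2 == 0
        and all(isinstance(k, str) for k in items[0::2])
        and all(isinstance(v, list) for v in items[1::2])
    )


def list_to_kv(data: dict) -> dict:
    """Rewrite [key, [value]] lists as {key: value} mappings, recursively."""
    for key, value in data.items():
        if isinstance(value, dict):
            list_to_kv(value)
        elif isinstance(value, list) and is_kv_list(value):
            # Unwrap single-element inner lists; keep longer ones as lists
            data[key] = {
                k: (v[0] if len(v) == 1 else v)
                for k, v in zip(value[0::2], value[1::2])
            }
    return data


parsed = {
    "Topic": ["Layers", ["Convolution"], "Training", ["SGD", "Adam"]],
    "Notes": ["a", "b"],
}
print(list_to_kv(parsed))
# {'Topic': {'Layers': 'Convolution', 'Training': ['SGD', 'Adam']}, 'Notes': ['a', 'b']}
```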
_todict
def _todict(self, layer: collections.OrderedDict) -> dict:
Description:
Recursively converts an OrderedDict (or nested OrderedDicts) into plain dictionaries, then applies _list_to_kv for further normalization.
Parameters:
layer (OrderedDict): Input structure from the markdown_to_json parser.
Returns:
Normalized dictionary representing parsed mind map structure.
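The OrderedDict-to-dict part of this step can be illustrated as follows. The function name to_plain_dict is hypothetical, and the sketch covers only the type conversion; the follow-up _list_to_kv normalization is left out.

```python
from collections import OrderedDict


def to_plain_dict(layer):
    """Recursively replace OrderedDict (and other dict) instances with plain dicts."""
    if isinstance(layer, dict):  # OrderedDict is a dict subclass
        return {key: to_plain_dict(value) for key, value in layer.items()}
    if isinstance(layer, list):
        return [to_plain_dict(item) for item in layer]
    return layer


nested = OrderedDict([("Root", OrderedDict([("Child", ["leaf"])]))])
plain = to_plain_dict(nested)
print(type(plain["Root"]).__name__)  # dict
```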
_process_document
async def _process_document(
    self, text: str, prompt_variables: dict[str, str], out_res: list
) -> str:
Description:
Asynchronously processes a single text document: applies the prompt, sends it to the LLM, parses the Markdown output, and appends the result to the shared output list.
Parameters:
text (str): Text input to extract the mind map from.
prompt_variables (dict[str, str]): Variables for prompt substitution.
out_res (list): Mutable list to which the parsed mind map dictionary is appended.
Returns:
The raw string response from the LLM (the method is used mainly for its side effect on out_res).
Implementation Details:
Performs prompt variable substitution.
Uses a concurrency limiter (chat_limiter) to limit the number of concurrent LLM invocations.
Cleans code block Markdown markers from the response before parsing.
Parses the LLM Markdown response into JSON using markdown_to_json.
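The fence-cleaning step might look like the sketch below. The function name strip_code_fences and the exact regular expression are assumptions; the module's own pattern is not shown in this document. The fence marker is built from single backticks so the example can be embedded in Markdown safely.

```python
import re

FENCE = "`" * 3  # a Markdown code-fence marker
FENCE_LINE = re.compile(r"^[ \t]*`{3}[^\n]*\n?", re.MULTILINE)


def strip_code_fences(response: str) -> str:
    """Drop fence lines (with any language tag) and keep the enclosed body."""
    return FENCE_LINE.sub("", response).strip()


raw = FENCE + "markdown\n# Neural networks\n- Layers\n" + FENCE
print(strip_code_fences(raw))
```

The cleaned text is what would then be handed to the Markdown parser.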
Important Implementation Details
Asynchronous Processing with Trio:
The extractor uses trio to run multiple extraction tasks concurrently, improving throughput when processing multiple text sections.
Prompt Variable Replacement:
The prompt is dynamically customized with input text and other variables using perform_variable_replacements, enabling flexible prompt templates.
Markdown to JSON Parsing:
The LLM is expected to respond with a Markdown structure representing the mind map. This response is parsed into JSON using the markdown_to_json library, then normalized.
Token Length Management:
To avoid exceeding the LLM's token limit, input sections are grouped into batches based on estimated token counts (num_tokens_from_string).
Merging Mind Map Fragments:
Partial mind maps from each batch are recursively merged to produce a single, unified mind map output.
Error Handling:
Custom error handling can be injected via the on_error callback.
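The pattern of bounded concurrent tasks appending to a shared list can be sketched as follows. Note the substitution: this example uses the standard library's asyncio and asyncio.Semaphore instead of trio and chat_limiter, purely so it runs without extra dependencies. The names fake_llm, process, and run_all are hypothetical, and fake_llm is a stand-in that makes no real LLM call.

```python
import asyncio


async def fake_llm(text: str) -> str:
    """Hypothetical stand-in for an LLM call; returns a one-line Markdown heading."""
    await asyncio.sleep(0)
    return "# " + text.upper()


async def process(text: str, limiter: asyncio.Semaphore, out_res: list) -> None:
    # The semaphore caps how many LLM calls are in flight at once
    async with limiter:
        response = await fake_llm(text)
    out_res.append(response)


async def run_all(batches: list[str], max_concurrent: int = 2) -> list:
    limiter = asyncio.Semaphore(max_concurrent)
    out_res: list = []  # shared output list, filled as tasks finish
    await asyncio.gather(*(process(b, limiter, out_res) for b in batches))
    return out_res


results = asyncio.run(run_all(["intro", "layers", "summary"]))
print(sorted(results))  # ['# INTRO', '# LAYERS', '# SUMMARY']
```

Completion order is not guaranteed with concurrent tasks, which is one reason a merge step that does not depend on ordering is useful afterwards.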
Interaction with Other System Components
LLM Invocation:
Uses CompletionLLM (abstracted language model interface from rag.llm.chat_model) to generate mind map outputs.
Prompt Template:
Uses MIND_MAP_EXTRACTION_PROMPT from graphrag.general.mind_map_prompt as the default prompt for extraction.
Utilities:
Employs utilities such as perform_variable_replacements, chat_limiter (for concurrency control), and num_tokens_from_string from rag.utils.
Markdown Parsing:
Uses markdown_to_json to convert the LLM's Markdown output into structured data.
Base Class:
Inherits from Extractor, implying conformity to a shared interface or behavior expected by the system's extraction framework.
Example Usage
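import trio
from rag.llm.chat_model import Base as CompletionLLM
# Assumed import path, inferred from this module's file name and its sibling modules
from graphrag.general.mind_map_extractor import MindMapExtractor

async def main(llm: CompletionLLM) -> None:
    mind_map_extractor = MindMapExtractor(llm_invoker=llm)
    sections = [
        "Introduction to neural networks...",
        "Details about convolutional layers...",
        "Summary and future directions.",
    ]
    result = await mind_map_extractor(sections)
    print(result.output)

# Assume llm is a CompletionLLM instance already configured.
# The extractor schedules its tasks with trio, so it is run under trio here.
trio.run(main, llm)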
Visual Diagram: Class Diagram of MindMapExtractor
classDiagram
class MindMapResult {
+output: dict
}
class MindMapExtractor {
- _llm: CompletionLLM
- _input_text_key: str
- _mind_map_prompt: str
- _on_error: ErrorHandlerFn
+ __init__(llm_invoker, prompt=None, input_text_key=None, on_error=None)
- _key(k: str) str
- _be_children(obj: dict|list|str, keyset: set) list
+ __call__(sections: list[str], prompt_variables: dict[str, Any]|None) async MindMapResult
- _merge(d1: dict, d2: dict) dict
- _list_to_kv(data: dict) dict
- _todict(layer: OrderedDict) dict
- _process_document(text: str, prompt_variables: dict[str, str], out_res: list) async str
}
MindMapExtractor --> MindMapResult : returns
MindMapExtractor ..> CompletionLLM : uses
MindMapExtractor ..> ErrorHandlerFn : uses
MindMapExtractor ..> MIND_MAP_EXTRACTION_PROMPT : uses
MindMapExtractor ..> markdown_to_json : uses
Summary
mind_map_extractor.py encapsulates an asynchronous, LLM-backed extraction mechanism that converts input text into structured mind maps. It manages prompt customization, token budgeting, concurrent processing, response parsing, and merging to provide a coherent mind map representation suitable for downstream graph-based reasoning or visualization.
This module integrates tightly with the larger InfiniFlow framework, particularly the general extractor interface, prompt templates, and the RAG LLM client, ensuring modularity and extensibility within the system's knowledge extraction pipeline.