mind_map_extractor.py

Overview

mind_map_extractor.py defines a specialized extractor module used to convert textual input into structured mind map representations. It primarily interacts with a language model (LLM) to process text sections asynchronously, parse the LLM's Markdown-formatted output into hierarchical JSON structures, and merge multiple partial mind maps into a cohesive unipartite mind graph.

This module is part of the InfiniFlow system, likely within a retrieval-augmented generation (RAG) pipeline, where it leverages LLMs to extract structured knowledge (mind maps) from unstructured or semi-structured text inputs.

Classes and Functions

`MindMapResult`

@dataclass
class MindMapResult:
    output: dict

Purpose:
Simple data container class representing the mind map extraction result as a dictionary.
Attributes:
- output (dict): The extracted mind map structured as a nested dictionary representing nodes and their children.
Usage Example:

result = MindMapResult(output={"id": "root", "children": []})
print(result.output)
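To make the nested shape concrete, the following minimal sketch walks such a dictionary and renders it as an indented outline. The `render_outline` helper and the sample data are hypothetical and not part of the module; the dataclass is re-declared locally only so the snippet runs on its own.

```python
from dataclasses import dataclass


@dataclass
class MindMapResult:
    # Local stand-in mirroring the dataclass documented above.
    output: dict


def render_outline(node: dict, depth: int = 0) -> list[str]:
    """Return one indented line per node, depth-first (hypothetical helper)."""
    lines = ["  " * depth + node["id"]]
    for child in node.get("children", []):
        lines.extend(render_outline(child, depth + 1))
    return lines


# Illustrative data only; real output depends on the LLM response.
result = MindMapResult(
    output={
        "id": "root",
        "children": [
            {"id": "Topic A", "children": [{"id": "Detail 1", "children": []}]},
            {"id": "Topic B", "children": []},
        ],
    }
)
print("\n".join(render_outline(result.output)))
```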

`MindMapExtractor`

class MindMapExtractor(Extractor):

Inherits from: Extractor (from graphrag.general.extractor)
Purpose:
Core class responsible for extracting mind maps from input text by querying an LLM, parsing the result, and converting it into a structured graph.
Attributes:

| Attribute | Type | Description |
|---|---|---|
| `_llm` | `CompletionLLM` | The language model invoker for extraction. |
| `_input_text_key` | `str` | Key name under which input text is passed into the prompt. Defaults to `"input_text"`. |
| `_mind_map_prompt` | `str` | Prompt template for mind map extraction. Defaults to `MIND_MAP_EXTRACTION_PROMPT`. |
| `_on_error` | `ErrorHandlerFn` | Callback for error handling during processing. |

`__init__`

def __init__(
        self,
        llm_invoker: CompletionLLM,
        prompt: str | None = None,
        input_text_key: str | None = None,
        on_error: ErrorHandlerFn | None = None,
):

Description:
Initializes the MindMapExtractor instance.
Parameters:
- llm_invoker (CompletionLLM): The LLM client to invoke for extraction.
- prompt (str | None): Optional custom prompt template; defaults to a predefined constant.
- input_text_key (str | None): Optional key name for input text in prompt variables; defaults to "input_text".
- on_error (ErrorHandlerFn | None): Optional error handler function.
Returns:
None
Usage Example:

extractor = MindMapExtractor(llm_invoker=my_llm, prompt=my_prompt)

`_key`

def _key(self, k: str) -> str:

Description:
Cleans a string key by removing asterisks (*), which may be markup artifacts.
Parameters:
- k (str): Input key string.
Returns:
Cleaned string without asterisks.
Example:
_key("**ExampleKey*") returns "ExampleKey"
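The cleaning step can be sketched in a couple of lines. This is a stand-in, not the method itself: the name `strip_asterisks` is hypothetical, and this document does not say whether the real `_key` uses a regular expression or plain string replacement.

```python
import re


def strip_asterisks(k: str) -> str:
    """Remove every run of asterisks from a key (stand-in for _key)."""
    return re.sub(r"\*+", "", k)


print(strip_asterisks("**ExampleKey*"))  # prints ExampleKey
```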

`_be_children`

def _be_children(self, obj: dict | list | str, keyset: set) -> list:

Description:
Recursively converts nested dictionary or list structures into a list of mind map nodes, each represented as a dictionary with "id" and "children" keys.
Parameters:
- obj (dict | list | str): The input object representing child nodes.
- keyset (set): Set of keys already processed to avoid duplicates.
Returns:
List of dictionaries representing children nodes in mind map format.
Usage:
Used internally to build hierarchical mind map trees.
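The recursion described above might look roughly like the sketch below. It rests on several assumptions that this document does not spell out: a bare string is treated as a single leaf, each non-empty list item becomes a leaf, and a dictionary key already present in `keyset` is skipped. The function name `be_children` and the sample fragment are illustrative only.

```python
def be_children(obj, keyset: set) -> list:
    """Turn a str, list, or dict into a list of {"id", "children"} nodes."""
    if isinstance(obj, str):
        obj = [obj]  # assumed: a bare string becomes a single leaf
    if isinstance(obj, list):
        # Assumed: each non-empty list item becomes a leaf node.
        return [{"id": item, "children": []} for item in obj if item]
    nodes = []
    for key, value in obj.items():
        # Assumed: a key that was already emitted is skipped.
        if key and key not in keyset:
            keyset.add(key)
            nodes.append({"id": key, "children": be_children(value, keyset)})
    return nodes


# Illustrative fragment, shaped like a parsed and normalized LLM response.
fragment = {"Layers": {"Convolution": ["kernels", "stride"], "Pooling": "max pooling"}}
tree = {"id": "root", "children": be_children(fragment, set())}
print(tree)
```

Sharing one `keyset` across the whole recursion means a key is emitted only the first time it is encountered, wherever it sits in the tree.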

`__call__`

async def __call__(
        self, sections: list[str], prompt_variables: dict[str, Any] | None = None
) -> MindMapResult:

Description:
Asynchronously processes multiple text sections to extract and combine mind map structures.
Parameters:
- sections (list[str]): List of textual sections to extract mind maps from.
- prompt_variables (dict[str, Any] | None): Optional dictionary of variables to replace in the prompt.
Returns:
MindMapResult containing a unified mind map JSON-like dictionary.
Process Details:
- Calculates token limits based on the LLM's max token size.
- Groups sections into batches to avoid exceeding token limits.
- Launches concurrent asynchronous tasks (via trio) to process each batch with _process_document.
- Merges partial mind maps using _merge.
- Constructs a root node with aggregated children.
Example Usage:

# Must run inside an async function driven by trio (for example via trio.run).
result = await mind_map_extractor(sections=["Text part 1", "Text part 2"])
print(result.output)
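The batching step in the process details can be illustrated with a greedy grouping sketch. Everything here is an assumption made for illustration: `count_tokens` is a crude whitespace-based stand-in for `num_tokens_from_string`, `token_budget` is a hypothetical parameter, and how the module derives its limit from the LLM's maximum token size is not specified in this document.

```python
def count_tokens(text: str) -> int:
    """Crude whitespace token estimate (hypothetical stand-in for a tokenizer)."""
    return len(text.split())


def batch_sections(sections: list[str], token_budget: int) -> list[str]:
    """Greedily join consecutive sections until the budget would be reached."""
    batches, current, used = [], [], 0
    for section in sections:
        n = count_tokens(section)
        # Flush the current batch before adding a section that would reach the budget.
        if current and used + n >= token_budget:
            batches.append("".join(current))
            current, used = [], 0
        current.append(section)
        used += n
    if current:
        batches.append("".join(current))  # flush whatever is left
    return batches


sections = ["one two three ", "four five ", "six seven eight nine ", "ten "]
print(batch_sections(sections, token_budget=6))
```

In this sketch a single section larger than the budget still forms its own batch rather than being split, so section boundaries are preserved.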

`_merge`

def _merge(self, d1: dict, d2: dict) -> dict:

Description:
Recursively merges two dictionaries representing mind map fragments.
Lists are concatenated, nested dictionaries merged recursively, and scalar values replaced.
Parameters:
- d1 (dict): Source dictionary to merge from.
- d2 (dict): Destination dictionary to merge into.
Returns:
The merged dictionary (d2).
Algorithm Details:
- For matching keys with dict values, merges recursively.
- For matching keys with list values, extends the destination list.
- Otherwise, overwrites destination value with source.
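The three rules above translate into a short recursive function. The sketch below is a stand-in under stated assumptions: the name `merge` and the sample fragments are hypothetical, copying keys that exist only in the source is assumed, and folding the fragments with `functools.reduce` is just one plausible way to combine more than two of them.

```python
from functools import reduce


def merge(d1: dict, d2: dict) -> dict:
    """Merge d1 into d2 in place and return d2."""
    for key, value in d1.items():
        if key in d2:
            if isinstance(value, dict) and isinstance(d2[key], dict):
                merge(value, d2[key])  # recurse into nested dicts
            elif isinstance(value, list) and isinstance(d2[key], list):
                d2[key].extend(value)  # concatenate lists
            else:
                d2[key] = value  # source value overwrites destination
        else:
            d2[key] = value  # assumed: keys missing from d2 are copied over
    return d2


fragments = [
    {"Networks": {"Layers": ["dense"], "Depth": "shallow"}},
    {"Networks": {"Layers": ["conv"], "Depth": "deep"}, "Training": {}},
]
merged = reduce(merge, fragments)
print(merged)
```

Because the destination is mutated in place, callers that need the original fragments afterwards would have to copy them first.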

`_list_to_kv`

def _list_to_kv(self, data: dict) -> dict:

Description:
Converts nested lists in the extracted JSON into key-value mappings where applicable, improving structure uniformity.
Parameters:
- data (dict): Nested dictionary potentially containing lists.
Returns:
Modified dictionary with lists converted to key-value pairs where possible.
Details:
- Detects patterns where lists appear as [key, [value]] and converts to {key: value}.
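A rough sketch of that pattern detection follows. It makes assumptions this document does not confirm: a nested list is read as the value of the item immediately before it, only the first element of the nested list is kept, and lists that contain no such pair are left unchanged. The name `list_to_kv` and the sample data are illustrative.

```python
def list_to_kv(data: dict) -> dict:
    """Rewrite list values shaped like [key, [value], ...] as {key: value}."""
    for key, value in data.items():
        if isinstance(value, dict):
            list_to_kv(value)  # recurse into nested mappings
        elif isinstance(value, list):
            converted = {}
            for i, item in enumerate(value):
                # Assumed: a nested list is the value of the preceding item.
                if isinstance(item, list) and item and i > 0:
                    converted[value[i - 1]] = item[0]
            if converted:  # assumed: lists without the pattern stay as they are
                data[key] = converted
    return data


parsed = {"Optimizers": ["SGD", ["momentum"], "Adam", ["adaptive rates"]]}
print(list_to_kv(parsed))
```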

`_todict`

def _todict(self, layer: collections.OrderedDict) -> dict:

Description:
Recursively converts an OrderedDict (or nested OrderedDicts) into plain dictionaries, then applies _list_to_kv for further normalization.
Parameters:
- layer (OrderedDict): Input structure from markdown_to_json parser.
Returns:
Normalized dictionary representing parsed mind map structure.
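The conversion half of this method can be sketched as below. Only the OrderedDict-to-dict step is shown; the follow-up `_list_to_kv` normalization is deliberately left out, and the name `to_plain_dict` and the sample input are hypothetical.

```python
from collections import OrderedDict


def to_plain_dict(layer):
    """Recursively replace OrderedDict instances with plain dicts."""
    if isinstance(layer, dict):  # OrderedDict is a dict subclass
        return {key: to_plain_dict(value) for key, value in layer.items()}
    return layer  # lists and scalars are returned unchanged


# Illustrative input, shaped like nested parser output.
parsed = OrderedDict(
    [("Neural Networks", OrderedDict([("Layers", ["dense", "conv"])]))]
)
plain = to_plain_dict(parsed)
print(type(plain["Neural Networks"]).__name__, plain)
```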

`_process_document`

async def _process_document(
        self, text: str, prompt_variables: dict[str, str], out_res: list
) -> str:

Description:
Asynchronously processes a single text document: applies the prompt, sends it to the LLM, parses the Markdown output, and appends the result to the shared output list.
Parameters:
- text (str): Text input to extract the mind map from.
- prompt_variables (dict[str, str]): Variables for prompt substitution.
- out_res (list): Mutable list to append the parsed mind map dictionary.
Returns:
The raw string response from the LLM (though mainly used for side effects).
Implementation Details:
- Performs prompt variable substitution.
- Uses a concurrency limiter (chat_limiter) to control LLM invocation rate.
- Parses the LLM Markdown response into JSON using markdown_to_json.
- Cleans code block markdown markers from the response before parsing.

Important Implementation Details

Asynchronous Processing with Trio:
The extractor uses trio to run multiple extraction tasks concurrently, improving throughput when processing multiple text sections.
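The pattern of rate-limited concurrent tasks appending to a shared list can be illustrated with the standard library. Note the substitution: `asyncio` and `asyncio.Semaphore` stand in here for trio and `chat_limiter` so the sketch has no third-party dependency, and the function names, the fake LLM call, and the concurrency limit are all hypothetical.

```python
import asyncio


async def process_batch(text: str, limiter: asyncio.Semaphore, out_res: list) -> None:
    """Pretend to call an LLM, then append the result to the shared list."""
    async with limiter:  # bounds how many calls are in flight at once
        await asyncio.sleep(0)  # placeholder for the real LLM round trip
        out_res.append({"batch": text.upper()})


async def run_all(batches: list[str], max_concurrent: int = 2) -> list:
    limiter = asyncio.Semaphore(max_concurrent)
    out_res: list = []
    # gather waits for every task to finish before returning.
    await asyncio.gather(*(process_batch(b, limiter, out_res) for b in batches))
    return out_res


results = asyncio.run(run_all(["part one", "part two", "part three"]))
print(len(results))
```

Results arrive in completion order rather than submission order, which is one reason the later merge step should not depend on fragment ordering.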
Prompt Variable Replacement:
The prompt is dynamically customized with input text and other variables using perform_variable_replacements, enabling flexible prompt templates.
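As an illustration of this substitution, the sketch below fills placeholders in a template. It is not `perform_variable_replacements` itself: the `{name}` placeholder syntax, the `replace_variables` helper, the template text, and the `language` variable are all assumptions made for the example.

```python
def replace_variables(template: str, variables: dict) -> str:
    """Substitute each {name} placeholder with its value (hypothetical stand-in)."""
    result = template
    for name, value in variables.items():
        result = result.replace("{" + name + "}", str(value))
    return result


# Hypothetical template; the real default is MIND_MAP_EXTRACTION_PROMPT.
template = "Summarize as a mind map.\n\n-TEXT-\n{input_text}\n\nLanguage: {language}"
prompt_variables = {"language": "English"}
# The input text is added under the configured key alongside caller-supplied variables.
variables = {**prompt_variables, "input_text": "Neural networks are..."}
print(replace_variables(template, variables))
```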
Markdown to JSON Parsing:
The LLM is expected to respond with a Markdown structure representing the mind map. This response is parsed into JSON using the markdown_to_json library, then normalized.
Token Length Management:
To prevent exceeding the LLM's token limit, input sections are grouped into batches based on estimated token counts (num_tokens_from_string).
Merging Mind Map Fragments:
Partial mind maps from each chunk are recursively merged to produce a single, unified mind map output.
Error Handling:
Custom error handling can be injected via the on_error callback.

Interaction with Other System Components

LLM Invocation:
Uses CompletionLLM (abstracted language model interface from rag.llm.chat_model) to generate mind map outputs.
Prompt Template:
Utilizes MIND_MAP_EXTRACTION_PROMPT from graphrag.general.mind_map_prompt as the default prompt for extraction.
Utilities:
Employs utility functions such as perform_variable_replacements and chat_limiter (for concurrency control), along with num_tokens_from_string from rag.utils.
Markdown Parsing:
Uses markdown_to_json to convert LLM's Markdown output into structured data.
Base Class:
Inherits from Extractor, implying conformity to a shared interface or behavior expected by the system's extraction framework.

Example Usage

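import trio

from rag.llm.chat_model import Base as CompletionLLM

# Assume llm is a CompletionLLM instance already configured
# and MindMapExtractor has been imported from this module.
mind_map_extractor = MindMapExtractor(llm_invoker=llm)

sections = [
    "Introduction to neural networks...",
    "Details about convolutional layers...",
    "Summary and future directions."
]


async def main():
    # __call__ is a coroutine, so it must be awaited inside an async function.
    result = await mind_map_extractor(sections)
    print(result.output)


# The extractor schedules its tasks with trio, so run it under trio's event loop.
trio.run(main)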

Visual Diagram: Class Diagram of `MindMapExtractor`

classDiagram
    class MindMapResult {
        +output: dict
    }

    class MindMapExtractor {
        - _llm: CompletionLLM
        - _input_text_key: str
        - _mind_map_prompt: str
        - _on_error: ErrorHandlerFn

        + __init__(llm_invoker, prompt=None, input_text_key=None, on_error=None)
        - _key(k: str) str
        - _be_children(obj: dict|list|str, keyset: set) list
        + __call__(sections: list[str], prompt_variables: dict[str, Any]|None) async MindMapResult
        - _merge(d1: dict, d2: dict) dict
        - _list_to_kv(data: dict) dict
        - _todict(layer: OrderedDict) dict
        - _process_document(text: str, prompt_variables: dict[str, str], out_res: list) async str
    }

    MindMapExtractor --> MindMapResult : returns
    MindMapExtractor ..> CompletionLLM : uses
    MindMapExtractor ..> ErrorHandlerFn : uses
    MindMapExtractor ..> MIND_MAP_EXTRACTION_PROMPT : uses
    MindMapExtractor ..> markdown_to_json : uses

Summary

mind_map_extractor.py encapsulates an asynchronous, LLM-backed extraction mechanism that converts input text into structured mind maps. It manages prompt customization, token budgeting, concurrent processing, response parsing, and merging to provide a coherent mind map representation suitable for downstream graph-based reasoning or visualization.

This module integrates tightly with the larger InfiniFlow framework, particularly the general extractor interface, prompt templates, and the RAG LLM client, ensuring modularity and extensibility within the system's knowledge extraction pipeline.