retrieval.py


Overview

retrieval.py defines the Retrieval tool component, which searches datasets (knowledge bases) for content relevant to a query. It queries indexed datasets using similarity-based retrieval, with optional reranking and optional knowledge graph (KG) augmentation.

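This file is part of the InfiniFlow project and integrates with knowledge base services, embedding models, and language model bundles to execute semantic search queries. Its configurable options, listed under RetrievalParam in the class diagram, include a similarity threshold, a keyword similarity weight, top-N and top-K limits, an optional reranking model, a fallback empty response, and knowledge graph retrieval.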


Classes

RetrievalParam

Defines the parameters for the Retrieval tool component, including metadata, configuration options, and validation logic.

Attributes

Methods

Usage Example

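# Assumes RetrievalParam has been imported from retrieval.py.
params = RetrievalParam()
params.similarity_threshold = 0.3  # override the default threshold
params.check()                     # run the parameter validation
input_form = params.get_input_form()
print(input_form)
# Output: {'query': {'name': 'Query', 'type': 'line'}}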
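
To make the role of check() and get_input_form() more concrete, here is a minimal sketch of a parameter class with the same attribute names. It is not the project's implementation: the class name RetrievalParamSketch, every default value, and the validation ranges are assumptions chosen for illustration. Only the attribute names and the input form are taken from this document.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalParamSketch:
    """Hypothetical stand-in for RetrievalParam (not the project's class)."""

    similarity_threshold: float = 0.2        # assumed default
    keywords_similarity_weight: float = 0.5  # assumed default
    top_n: int = 8                           # assumed default
    top_k: int = 1024                        # assumed default
    kb_ids: list = field(default_factory=list)
    rerank_id: str = ""
    empty_response: str = ""
    use_kg: bool = False
    cross_languages: list = field(default_factory=list)

    def check(self) -> None:
        """Raise ValueError when a setting is outside its assumed valid range."""
        if not 0.0 <= self.similarity_threshold <= 1.0:
            raise ValueError("similarity_threshold must be in [0, 1]")
        if not 0.0 <= self.keywords_similarity_weight <= 1.0:
            raise ValueError("keywords_similarity_weight must be in [0, 1]")
        if self.top_n <= 0:
            raise ValueError("top_n must be a positive integer")
        if self.top_k <= 0:
            raise ValueError("top_k must be a positive integer")

    def get_input_form(self) -> dict:
        """Describe the single input the tool expects: a one-line query."""
        return {"query": {"name": "Query", "type": "line"}}
```

Validating in one place before the tool runs means a misconfigured threshold fails early with a clear message instead of silently returning no results.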

Retrieval

The core retrieval tool implementing search logic over knowledge bases by leveraging embedding models, reranking, and knowledge graph retrieval.

This class inherits from ToolBase and ABC (abstract base class) to integrate with the InfiniFlow agent framework.

Class Attributes

Methods


_invoke(self, **kwargs)

Primary method invoked to perform retrieval based on the input query.

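# Illustrative only: assumes Retrieval has been imported from retrieval.py.
# _invoke is private by convention, and the constructor may require
# framework-supplied arguments that are not shown here.
retrieval_tool = Retrieval()
retrieval_tool._param.kb_ids = ["my_knowledgebase"]  # knowledge bases to search
retrieval_tool._param.top_n = 5
result = retrieval_tool._invoke(query="Explain semantic search")
print(result)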
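
The parameter names suggest how a query might be turned into results: blend keyword and vector similarity, drop weak matches, keep the best few, and fall back to empty_response when nothing qualifies. The sketch below illustrates that idea only. The Chunk type, the select_chunks and retrieve helpers, the linear weighting formula, and the default values are all assumptions, not the logic in retrieval.py.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical search hit with two precomputed similarity scores."""
    text: str
    vector_similarity: float   # e.g. cosine similarity to the query embedding
    keyword_similarity: float  # e.g. a normalized term-overlap score


def select_chunks(chunks, similarity_threshold=0.2,
                  keywords_similarity_weight=0.3, top_n=5):
    """Blend both scores, drop weak matches, and return the best top_n chunks."""
    w = keywords_similarity_weight
    scored = [
        (w * c.keyword_similarity + (1.0 - w) * c.vector_similarity, c)
        for c in chunks
    ]
    # Keep only chunks whose blended score reaches the threshold.
    kept = [(score, c) for score, c in scored if score >= similarity_threshold]
    # Highest blended score first.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept[:top_n]]


def retrieve(chunks, empty_response="", **settings):
    """Join the selected chunk texts, or return empty_response if none qualify."""
    selected = select_chunks(chunks, **settings)
    if not selected:
        return empty_response
    return "\n".join(c.text for c in selected)
```

With this weighting, a chunk that matches strongly on only one signal can still pass the threshold, which is the usual motivation for hybrid scoring.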

thoughts(self) -> str

Returns an introspective string describing the retrieval intent.

print(retrieval_tool.thoughts())
# Output:
# Keywords: Explain semantic search
# Looking for the most relevant articles.

Important Implementation Details and Algorithms


Interaction with Other System Components


Mermaid Class Diagram

classDiagram
    class RetrievalParam {
        +meta: ToolMeta
        +function_name: str
        +description: str
        +similarity_threshold: float
        +keywords_similarity_weight: float
        +top_n: int
        +top_k: int
        +kb_ids: list~str~
        +kb_vars: list
        +rerank_id: str
        +empty_response: str
        +use_kg: bool
        +cross_languages: list
        +__init__()
        +check()
        +get_input_form() dict~str, dict~
    }

    class Retrieval {
        +component_name: str
        +_invoke(kwargs)
        +thoughts() str
    }

    ToolBase <|-- Retrieval
    ABC <|-- Retrieval
    Retrieval --> RetrievalParam : _param
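
The relationships described for Retrieval (it inherits from ToolBase and ABC, and reads its settings from _param) can be mirrored in Python roughly as follows. This is a structural sketch only: the ToolBase placeholder, RetrievalParamStub, RetrievalSketch, the constructor signature, the component_name value, and the method bodies are all assumptions, not the framework's actual code.

```python
from abc import ABC


class ToolBase:
    """Hypothetical placeholder for the framework's tool base class."""

    def __init__(self, param):
        # The tool keeps its parameter object and reads settings from it.
        self._param = param


class RetrievalParamStub:
    """Hypothetical placeholder for RetrievalParam with two of its fields."""

    def __init__(self):
        self.kb_ids = []
        self.top_n = 5


class RetrievalSketch(ToolBase, ABC):
    """Inherits from ToolBase and ABC; composes a parameter object."""

    component_name = "Retrieval"  # assumed value

    def __init__(self, param):
        super().__init__(param)
        self._last_query = ""

    def _invoke(self, **kwargs):
        # Record the query; a real tool would run the search here.
        self._last_query = kwargs.get("query", "")
        return {"kb_ids": list(self._param.kb_ids), "query": self._last_query}

    def thoughts(self) -> str:
        return (f"Keywords: {self._last_query}\n"
                "Looking for the most relevant articles.")
```

The parameter object is held by composition rather than inherited from, so the same tool class can be reconfigured without changing its type hierarchy.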

Summary

retrieval.py provides a configurable retrieval tool for semantic search over knowledge bases within the InfiniFlow platform. It uses embeddings, optional reranking, and optional knowledge graph data to return relevant, formalized content for a user query. The file contains the parameter definitions, input validation, core retrieval logic, and integration with external services and settings, so the tool can be configured for different datasets and retrieval strategies.