search.py


Overview

search.py implements a knowledge graph search utility within the InfiniFlow system, providing semantic retrieval capabilities over entities and relations stored in a knowledge base (KB). It extends a generic search Dealer class to support:

This file primarily facilitates answering user questions by semantically searching knowledge graph content and returning structured, scored entity and relation data, enriched with descriptions and community reports.


Classes and Functions

Class: KGSearch

KGSearch extends the Dealer class from rag.nlp.search to implement knowledge graph-specific search and retrieval functionalities.

Methods


_chat(self, llm_bdl, system, history, gen_conf) -> str

Interact with a language model chat interface with caching support.


query_rewrite(self, llm, question: str, idxnms: list, kb_ids: list) -> (list, list)

Rewrite a user question into keywords and entity types using an LLM prompt.


_ent_info_from_(self, es_res: dict, sim_thr: float=0.3) -> dict

Extract entity information from Elasticsearch (or similar) search results.


_relation_info_from_(self, es_res: dict, sim_thr: float=0.3) -> dict

Extract relation information from search results.


get_relevant_ents_by_keywords(self, keywords: list, filters: dict, idxnms: list, kb_ids: list, emb_mdl, sim_thr=0.3, N=56) -> dict

Retrieve relevant entities matching input keywords.


get_relevant_relations_by_txt(self, txt: str, filters: dict, idxnms: list, kb_ids: list, emb_mdl, sim_thr=0.3, N=56) -> dict

Retrieve relevant relations based on input text.


get_relevant_ents_by_types(self, types: list, filters: dict, idxnms: list, kb_ids: list, N=56) -> dict

Retrieve relevant entities filtered by entity types.


retrieval(self, question: str, tenant_ids: str | list[str], kb_ids: list[str], emb_mdl, llm, max_token: int=8196, ent_topn: int=6, rel_topn: int=6, comm_topn: int=1, ent_sim_threshold: float=0.3, rel_sim_threshold: float=0.3, **kwargs) -> dict

Main method performing full knowledge graph retrieval workflow for a question.


_community_retrieval_(self, entities: list, condition: dict, kb_ids: list, idxnms: list, topn: int, max_token: int) -> str

Retrieve community reports related to entities.


Implementation Details


Interactions with Other Components

The file acts as a core semantic search module connecting natural language queries to knowledge graph data and serving enriched contextual results.


Command Line Interface

When run as a script, the file parses CLI arguments for tenant ID, knowledge base ID, and question, then performs a retrieval using KGSearch and prints the result. This enables quick testing or integration in pipelines.


Visual Diagram

classDiagram
    class KGSearch {
        +_chat(llm_bdl, system, history, gen_conf) str
        +query_rewrite(llm, question, idxnms, kb_ids) (list, list)
        +_ent_info_from_(es_res, sim_thr=0.3) dict
        +_relation_info_from_(es_res, sim_thr=0.3) dict
        +get_relevant_ents_by_keywords(keywords, filters, idxnms, kb_ids, emb_mdl, sim_thr=0.3, N=56) dict
        +get_relevant_relations_by_txt(txt, filters, idxnms, kb_ids, emb_mdl, sim_thr=0.3, N=56) dict
        +get_relevant_ents_by_types(types, filters, idxnms, kb_ids, N=56) dict
        +retrieval(question, tenant_ids, kb_ids, emb_mdl, llm, max_token=8196, ent_topn=6, rel_topn=6, comm_topn=1, ent_sim_threshold=0.3, rel_sim_threshold=0.3, **kwargs) dict
        +_community_retrieval_(entities, condition, kb_ids, idxnms, topn, max_token) str
    }
    KGSearch --|> Dealer

Summary

The search.py file provides a sophisticated knowledge graph search engine that leverages LLMs for query understanding, embedding models for semantic retrieval, and multi-hop neighborhood information to return relevant entities, relations, and community insights. It forms a crucial backend component for knowledge-driven question answering and information discovery within the InfiniFlow system.