dify_retrieval.py


Overview

The dify_retrieval.py file implements an API endpoint for knowledge retrieval queries within the InfiniFlow system. It exposes a POST endpoint, /dify/retrieval, that accepts a query, a knowledge base identifier, and retrieval parameters. It returns relevant document chunks ranked by semantic similarity.

The endpoint filters candidate documents by metadata conditions, discards results below a similarity threshold, and can optionally add knowledge graph (KG) retrieval results. It uses an embedding model and the ranking utilities to return matching content snippets from the specified knowledge base. Every request is scoped to a single tenant, which supports question answering and information retrieval in multi-tenant deployments.


Detailed Description

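API Endpoint: /dify/retrieval

Method: POST. Each request must carry a valid API key, which is enforced by the apikey_required decorator. The validate_request decorator checks that the required fields are present in the JSON body (presumably knowledge_id and query).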
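The request contract can be sketched in plain Python. This is a minimal sketch and not the actual validate_request decorator. RetrievalRequest and parse_request are hypothetical names. Several details are assumptions: that knowledge_id and query are the only required fields, and the defaults for score_threshold and top_k.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Assumption: only these two fields are mandatory.
REQUIRED_FIELDS = ("knowledge_id", "query")


@dataclass
class RetrievalRequest:
    knowledge_id: str
    query: str
    use_kg: bool = False
    score_threshold: float = 0.0  # hypothetical default
    top_k: int = 10  # hypothetical default
    metadata_condition: Dict[str, Any] = field(default_factory=dict)


def parse_request(body: Dict[str, Any]) -> RetrievalRequest:
    """Reject bodies missing required fields, then read optional settings with defaults."""
    missing = [name for name in REQUIRED_FIELDS if name not in body]
    if missing:
        raise ValueError("required argument(s) missing: " + ", ".join(missing))
    setting = body.get("retrieval_setting") or {}
    return RetrievalRequest(
        knowledge_id=str(body["knowledge_id"]),
        query=str(body["query"]),
        use_kg=bool(body.get("use_kg", False)),
        score_threshold=float(setting.get("score_threshold", 0.0)),
        top_k=int(setting.get("top_k", 10)),
        metadata_condition=body.get("metadata_condition") or {},
    )
```

The nested retrieval_setting object is flattened into plain attributes, so later steps never need to handle a missing sub-dictionary.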


Functions and Methods

retrieval(tenant_id)

The main view function handling the /dify/retrieval POST requests.

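Parameters:

  tenant_id: the ID of the calling tenant. Callers do not send it in the request body. The apikey_required decorator supplies it, presumably by resolving the API key.

Request body fields (JSON), as used in the usage example below:

  knowledge_id: ID of the knowledge base to search.
  query: the natural-language query.
  use_kg: optional flag that enables knowledge graph retrieval.
  retrieval_setting.score_threshold: minimum similarity score a chunk needs in order to be returned.
  retrieval_setting.top_k: maximum number of chunks to return.
  metadata_condition: optional metadata filter that restricts which documents are searched.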
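A stand-in decorator can illustrate how tenant_id reaches the view. The names apikey_required_sketch, API_TOKENS, and InvalidApiKey are hypothetical. Reading the key from an "Authorization: Bearer" header is also an assumption: it is the scheme Dify's external knowledge API uses, and this sketch does not describe the real apikey_required. Headers are passed explicitly so the sketch runs without a web framework.

```python
from functools import wraps
from typing import Any, Callable, Dict

# Hypothetical token store mapping API keys to tenant IDs.
API_TOKENS: Dict[str, str] = {"secret-key": "tenant-42"}


class InvalidApiKey(Exception):
    """Raised when the Authorization header carries no known key."""


def apikey_required_sketch(view: Callable[..., Any]) -> Callable[..., Any]:
    """Resolve the caller's tenant from a Bearer token and pass it to the view."""

    @wraps(view)
    def wrapper(headers: Dict[str, str], *args: Any, **kwargs: Any) -> Any:
        # "Bearer secret-key" -> ("Bearer", " ", "secret-key")
        scheme, _, token = headers.get("Authorization", "").partition(" ")
        tenant_id = API_TOKENS.get(token) if scheme == "Bearer" else None
        if tenant_id is None:
            raise InvalidApiKey("API key is invalid")
        # The view receives only the resolved tenant, never the raw key.
        return view(*args, tenant_id=tenant_id, **kwargs)

    return wrapper


@apikey_required_sketch
def retrieval(tenant_id: str) -> Dict[str, str]:
    return {"tenant_id": tenant_id}
```

With this design, the view function can trust its tenant_id argument because authentication has already succeeded before the view runs.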

Workflow:

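  1. Extracts and validates the request JSON.

  2. Retrieves metadata for the documents in the knowledge base (DocumentService.get_meta_by_kbs).

  3. Fetches the knowledge base object by ID (KnowledgebaseService.get_by_id).

  4. Builds an embedding model bundle (LLMBundle) for vector-based retrieval.

  5. Converts the metadata_condition filters (convert_conditions) and applies them (meta_filter) to narrow the set of candidate document IDs.

  6. Calls the retrieval engine (settings.retrievaler.retrieval) to get document chunks ranked by semantic similarity to the query.

  7. If use_kg is true, also runs knowledge graph retrieval (settings.kg_retrievaler.retrieval) and merges its result into the chunk list.

  8. Looks up each chunk's source document (DocumentService.get_by_id) and builds a response list with the chunk content, score, document title, and metadata.

  9. Returns the results as JSON, or an error response built with build_error_result if a step fails.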
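The workflow steps can be condensed into a single function, with the backend services replaced by in-memory stand-ins. This is a sketch under stated assumptions:

  run_retrieval and its parameters are hypothetical.
  Metadata filtering is simplified to exact key/value matching.
  The embedding step (4) is assumed to happen inside the injected search function.
  The KG chunk is simply placed at the front of the list.

```python
from typing import Any, Callable, Dict, List, Optional

Chunk = Dict[str, Any]
# Stand-in for settings.retrievaler.retrieval: (query, doc_ids, threshold, top_k) -> chunks
SearchFn = Callable[[str, List[str], float, int], List[Chunk]]


def run_retrieval(
    body: Dict[str, Any],
    known_kbs: Dict[str, Dict[str, Dict[str, Any]]],  # kb_id -> {doc_id: metadata}
    search: SearchFn,
    kg_search: Optional[Callable[[str], Optional[Chunk]]] = None,
) -> Dict[str, Any]:
    # 1. Validate the request JSON (required fields are an assumption).
    missing = [f for f in ("knowledge_id", "query") if f not in body]
    if missing:
        return {"error": "missing: " + ", ".join(missing)}
    kb_id, query = body["knowledge_id"], body["query"]
    setting = body.get("retrieval_setting") or {}
    # 2-3. Fetch the knowledge base and its per-document metadata.
    doc_metas = known_kbs.get(kb_id)
    if doc_metas is None:
        return {"error": "knowledge base not found"}
    # 5. Keep documents whose metadata matches every key/value condition
    #    (a simplification of convert_conditions + meta_filter).
    cond = body.get("metadata_condition") or {}
    doc_ids = [d for d, meta in doc_metas.items()
               if all(meta.get(k) == v for k, v in cond.items())]
    # 4 + 6. Ranked search restricted to the surviving document IDs.
    chunks = search(query, doc_ids,
                    float(setting.get("score_threshold", 0.0)),
                    int(setting.get("top_k", 10)))
    # 7. Optionally put a knowledge-graph chunk at the front of the list.
    if body.get("use_kg") and kg_search is not None:
        kg_chunk = kg_search(query)
        if kg_chunk:
            chunks = [kg_chunk] + chunks
    # 8-9. One record per chunk, enriched with its source document's metadata.
    records = [
        {
            "content": c["content"],
            "score": c["score"],
            "title": c["title"],
            "metadata": {**doc_metas.get(c.get("doc_id"), {}), "doc_id": c.get("doc_id")},
        }
        for c in chunks
    ]
    return {"records": records}
```

Passing the search and KG functions in as arguments keeps the orchestration testable without a database or model. A toy search callable that returns one chunk per surviving document ID is enough to exercise every step.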

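Returns:

  On success, a JSON response with one record per retrieved chunk. Each record holds the chunk content, its score, the title of the source document, and that document's metadata. Dify's external knowledge API expects this list under a records key. On failure, the function returns an error response built with build_error_result.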
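TypedDict classes can describe the success payload. The class names and the runtime check is_retrieval_response are hypothetical. The field names follow the content, score, title, and metadata fields described in the workflow, and the records key is assumed from Dify's external knowledge API.

```python
from typing import Any, Dict, List, TypedDict


class Record(TypedDict):
    content: str  # chunk text
    score: float  # similarity score
    title: str  # title of the source document
    metadata: Dict[str, Any]  # metadata of the source document


class RetrievalResponse(TypedDict):
    records: List[Record]


def is_retrieval_response(obj: Any) -> bool:
    """Runtime shape check, since TypedDict is not enforced at runtime."""
    if not isinstance(obj, dict) or not isinstance(obj.get("records"), list):
        return False
    return all(
        isinstance(r, dict)
        and isinstance(r.get("content"), str)
        and isinstance(r.get("score"), (int, float))
        and isinstance(r.get("title"), str)
        and isinstance(r.get("metadata"), dict)
        for r in obj["records"]
    )
```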

Usage Example:

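import requests

url = "https://api.example.com/dify/retrieval"  # placeholder host
payload = {
    "knowledge_id": "kb123",
    "query": "What is the refund policy?",
    "use_kg": True,
    "retrieval_setting": {
        "score_threshold": 0.5,  # drop chunks scoring below 0.5
        "top_k": 10              # return at most 10 chunks
    },
    # Illustrative filter; the exact shape expected by convert_conditions is not documented here.
    "metadata_condition": {
        "category": "policy"
    }
}
headers = {
    # Assumes the key is sent as a Bearer token, as Dify's external knowledge API does.
    "Authorization": "Bearer your_api_key_here"
}

response = requests.post(url, json=payload, headers=headers, timeout=30)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body
print(response.json())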

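Important Implementation Details and Algorithms

Metadata filtering happens before ranking. convert_conditions and meta_filter reduce the set of candidate document IDs, and only those documents are passed to the retrieval engine. The engine then ranks chunks by similarity to the query, using score_threshold as the minimum accepted score and top_k as the maximum number of chunks. When use_kg is true, the knowledge graph result is merged into the same result list, so clients receive a single list regardless of which retriever produced each entry.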
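Toy embeddings can show the threshold and top-k rules. The actual scoring is performed by settings.retrievaler.retrieval and is not described in this file, so this sketch substitutes plain cosine similarity. The helper names cosine and rank_chunks are assumptions, as is the inclusive threshold comparison.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_chunks(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],  # (chunk_id, embedding)
    score_threshold: float,
    top_k: int,
) -> List[Tuple[str, float]]:
    """Score every chunk, drop those below the threshold, keep the best top_k."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunks]
    # Inclusive comparison: a score equal to the threshold is kept.
    kept = [(cid, s) for cid, s in scored if s >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```

The threshold filter runs before truncation, so top_k never admits a chunk that fails the threshold. As a result, a request may return fewer than top_k chunks.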


Interaction with Other System Components

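This file is the controller layer between HTTP clients and the backend retrieval services. For each request, it coordinates the following services and then combines their output into a single JSON result:

  DocumentService: document metadata and lookup.
  KnowledgebaseService: knowledge base lookup.
  LLMBundle: the embedding model.
  settings.retrievaler: ranked chunk retrieval.
  settings.kg_retrievaler: knowledge graph retrieval, used only when use_kg is set.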


Class & Function Diagram

flowchart TD
    A["POST /dify/retrieval Endpoint"]
    A -->|Requires API Key| B["apikey_required Decorator"]
    A -->|Validates JSON Fields| C["validate_request Decorator"]

    A --> D["retrieval(tenant_id) Function"]
    D --> E["DocumentService.get_meta_by_kbs(kb_id)"]
    D --> F["KnowledgebaseService.get_by_id(kb_id)"]
    D --> G["LLMBundle Embedding Model"]
    D --> H["meta_filter + convert_conditions(metadata_condition)"]
    D --> I["settings.retrievaler.retrieval(query, ...)"]
    D --> J["Optional: settings.kg_retrievaler.retrieval(...) if use_kg"]
    D --> K["DocumentService.get_by_id(doc_id)"]
    D --> L["build_error_result on errors"]

    subgraph RF [Retrieval Flow]
        E --> H --> I --> J --> K
    end

Summary

The dify_retrieval.py file handles knowledge retrieval requests for the InfiniFlow application through a single API endpoint. It combines metadata filtering, vector similarity ranking, and optional knowledge graph augmentation to return relevant document chunks from tenant-specific knowledge bases. Input validation, ranking, and knowledge graph lookups are delegated to backend services and utility functions, and the endpoint returns structured JSON that client applications and other system modules can consume directly.