autokeyword_autoquestion.mdx

Overview

The autokeyword_autoquestion.mdx file is a RAGFlow documentation page describing two features: Auto-keyword and Auto-question. Both use a chat model to generate keywords or questions from each text chunk during document indexing, which improves retrieval accuracy and relevance in search and FAQ scenarios.

Specifically, this page explains:

The purpose and benefit of enabling auto-keyword and auto-question generation.
How these features work in conjunction with chunking methods.
Recommended settings and usage guidelines based on chunk sizes.
Community tips and typical use cases for adjusting these feature parameters.
Warnings about increased resource usage when enabling these features.

This documentation is intended for users configuring knowledge base ingestion and retrieval settings, helping them understand how to fine-tune auto-generation parameters for optimal search results.

Detailed Explanation of Concepts and Features

Auto-keyword

Definition:
Auto-keyword is a feature in RAGFlow that automatically generates keywords or synonyms from each chunk of text in the knowledge base using a chat model. These generated keywords serve as an additional layer of metadata to help correct errors and improve the accuracy of document retrieval.

Functionality:

When enabled, the system sends each chunk of text to the chat model.
The model returns a specified number of keywords per chunk.
These keywords augment the chunk’s metadata, making search matching more robust.

Configuration:

Controlled via a slider on the Configuration page under Page rank.
Values range from 0 (disabled) to a maximum of 30.
Recommended values depend on chunk size; for chunks of about 1,000 characters, 3–5 keywords are suggested.
Non-integer inputs are rounded down to the nearest integer.
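
The range and rounding rules above can be sketched as a small helper. This is an illustrative sketch, not RAGFlow's code: the function name normalize_slider_value is hypothetical, and clamping out-of-range input to the bounds is an assumption, since this page only states the allowed range and the round-down rule.

```python
import math

AUTO_KEYWORD_MAX = 30  # upper bound stated for the Auto-keyword slider


def normalize_slider_value(value: float, maximum: int) -> int:
    """Round a slider input down to an integer and keep it within [0, maximum].

    Clamping to the bounds is an assumption made for this sketch.
    """
    floored = math.floor(value)
    return max(0, min(floored, maximum))


print(normalize_slider_value(4.9, AUTO_KEYWORD_MAX))  # 4
print(normalize_slider_value(42, AUTO_KEYWORD_MAX))   # 30
print(normalize_slider_value(0, AUTO_KEYWORD_MAX))    # 0 (feature disabled)
```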

Important Notes:

Increasing the value increases indexing time and token usage.
Larger chunk sizes can justify higher values.
Marginal benefit diminishes as the value increases.

Usage Example:
If your knowledge base chunks average 1,000 characters and you want to enhance retrieval, set Auto-keyword to 4. This setting will generate 4 keywords per chunk during indexing.
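
The indexing-cost warning can be made concrete with a rough estimate of the added chat-model calls. The sketch below assumes one call per chunk per enabled feature, which matches how this page describes the process; RAGFlow's actual batching may differ, and the function name extra_chat_calls is hypothetical.

```python
import math


def extra_chat_calls(total_chars: int, chunk_chars: int,
                     auto_keyword: int, auto_question: int) -> int:
    """Estimate the chat-model calls added when indexing one document."""
    chunks = math.ceil(total_chars / chunk_chars)
    # A slider value of 0 disables the feature, so it adds no calls.
    enabled_features = int(auto_keyword > 0) + int(auto_question > 0)
    return chunks * enabled_features


# A 50,000-character document split into chunks of about 1,000 characters.
print(extra_chat_calls(50_000, 1_000, auto_keyword=4, auto_question=0))  # 50
print(extra_chat_calls(50_000, 1_000, auto_keyword=4, auto_question=2))  # 100
```

Under this assumption the slider value changes how much text each call returns rather than how many calls are made, so higher values mainly add output tokens.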

Auto-question

Definition:
Auto-question leverages a chat model to generate natural language questions (e.g., who, what, why) from each chunk. These questions improve matching in FAQ-like scenarios by anticipating user inquiries.

Functionality:

For each chunk, the chat model generates a specified number of relevant questions.
These questions help the retrieval system better understand and respond to user queries, especially in domains like product manuals or policy documents.

Configuration:

Available as a slider under Page rank on the Configuration page.
Values range from 0 (disabled) to a maximum of 10.
The recommended value is 1 or 2 for chunks of about 1,000 characters.
Non-integer inputs are rounded down to the nearest integer.

Important Notes:

As with Auto-keyword, increasing the value raises indexing time and token usage.
The marginal benefit diminishes as the value increases.
Ideal for FAQ or troubleshooting document retrieval.

Usage Example:
For a technical support knowledge base with chunks of about 1,000 characters, setting Auto-question to 2 generates two questions per chunk during indexing, which helps user queries match the right chunks.

Implementation Details

Both features rely on a chat model API to process each chunk of text.
The chunking method splits documents into manageable pieces (~1,000 characters recommended).
For each chunk, the system calls the chat model to generate keywords, questions, or both, depending on which features are enabled and their slider values.
Generated keywords/questions are stored as metadata alongside the chunk for enhanced retrieval.
The system warns users that enabling these features will increase the time and cost of document indexing due to API calls and token usage.
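
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not RAGFlow's implementation: the Chunk class, the enrich_chunk function, the prompt wording, and the fake_chat stand-in are all hypothetical, and a real deployment would call an actual chat model instead.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: any function that takes a prompt and returns model text.
ChatModel = Callable[[str], str]


@dataclass
class Chunk:
    text: str
    keywords: List[str] = field(default_factory=list)
    questions: List[str] = field(default_factory=list)


def enrich_chunk(chunk: Chunk, chat: ChatModel,
                 auto_keyword: int, auto_question: int) -> Chunk:
    """Attach generated keywords and questions; a value of 0 skips the feature."""
    if auto_keyword > 0:
        reply = chat(f"List {auto_keyword} keywords, comma-separated:\n{chunk.text}")
        parsed = [k.strip() for k in reply.split(",") if k.strip()]
        chunk.keywords = parsed[:auto_keyword]
    if auto_question > 0:
        reply = chat(f"Write {auto_question} questions, one per line:\n{chunk.text}")
        parsed = [q.strip() for q in reply.splitlines() if q.strip()]
        chunk.questions = parsed[:auto_question]
    return chunk


def fake_chat(prompt: str) -> str:
    """Stand-in for a real chat model so the sketch runs offline."""
    if prompt.startswith("List"):
        return "refund, return policy, warranty, shipping, invoice"
    return "How do I request a refund?\nWhat does the warranty cover?\nWho pays shipping?"


chunk = enrich_chunk(Chunk("Refunds are issued within 30 days of purchase."),
                     fake_chat, auto_keyword=4, auto_question=2)
print(chunk.keywords)   # ['refund', 'return policy', 'warranty', 'shipping']
print(chunk.questions)  # ['How do I request a refund?', 'What does the warranty cover?']
```

Truncating the parsed reply to the slider value guards against a model that returns more items than requested.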

Interaction With Other System Components

Chunking Module: Both features are tightly coupled with the chunking method chosen in the knowledge base ingestion pipeline. Chunk size affects the recommended Auto-keyword and Auto-question settings.
Configuration Interface: The sliders for these features are exposed on the Configuration page under Page rank, allowing users to adjust settings dynamically.
Chat Model API: Both features depend on external or integrated chat model APIs that generate keywords/questions on demand.
Retrieval Engine: The enriched metadata (keywords and questions) improves the downstream retrieval engine's ability to match user queries to relevant chunks.
Knowledge Base Indexing: Auto-keyword and Auto-question augment the indexing process, adding an "additional layer" of information to chunks.
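
To illustrate why the extra metadata helps matching, here is a toy example that counts shared words between a query and a chunk. It is only a sketch: the token-overlap score is a deliberately naive stand-in and not RAGFlow's retrieval algorithm, and the keyword and question values are invented sample outputs.

```python
import re
from typing import Iterable, Set


def tokens(text: str) -> Set[str]:
    """Lowercase word tokens; a deliberately naive tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, fields: Iterable[str]) -> int:
    """Count query tokens that appear in any of the given text fields."""
    indexed: Set[str] = set()
    for text in fields:
        indexed |= tokens(text)
    return len(tokens(query) & indexed)


chunk_text = "Hold the power button for ten seconds to restart the device."
keywords = ["reboot", "reset", "frozen screen"]   # invented Auto-keyword output
questions = ["How do I reboot a frozen device?"]  # invented Auto-question output

query = "how to reboot frozen screen"
print(overlap_score(query, [chunk_text]))                         # 1
print(overlap_score(query, [chunk_text, *keywords, *questions]))  # 5
```

The chunk text never uses the word "reboot", so the query barely matches it on its own; the generated synonyms and question supply the vocabulary a user is likely to type.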

Community Tips and Use Cases

The file includes a community-sourced table relating typical document types and volumes with recommended Auto-keyword and Auto-question slider values. This helps users new to the feature choose sensible starting points based on their document corpus.

Use Case / Scenario	Document Size/Length	Auto-keyword (0–30)	Auto-question (0–10)
Employee handbook (internal process guidance)	Small (<10 pages)	0	0
Customer service FAQs	Medium (10–100 pages)	3–7	1–3
Technical whitepapers	Large (>100 pages)	2–4	1–2
Contracts / Legal retrieval	Large (>50 pages)	2–5	0–1
Multi-repository layered documents	Many	Adjust as appropriate	Adjust as appropriate
Social media comment pool (multilingual, short texts)	Very large	8–12	0
Operational logs for troubleshooting	Very large	3–6	0
Marketing asset library (multilingual)	Medium	6–10	1–2
Training courses / eBooks	Large	2–5	1–2
Maintenance manual (equipment diagrams + steps)	Medium	3–7	1–2
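
For scripted setups, a few rows of the table above can be encoded as a lookup. This is a sketch only: the dictionary and function names are hypothetical, and choosing the low end of each range as the starting value is an assumption, made because higher values cost more tokens while their marginal benefit diminishes.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]  # (low, high) slider range

# Ranges copied from three rows of the community table.
STARTING_POINTS: Dict[str, Dict[str, Range]] = {
    "Customer service FAQs": {"auto_keyword": (3, 7), "auto_question": (1, 3)},
    "Technical whitepapers": {"auto_keyword": (2, 4), "auto_question": (1, 2)},
    "Operational logs for troubleshooting": {"auto_keyword": (3, 6), "auto_question": (0, 0)},
}


def starting_values(use_case: str) -> Dict[str, int]:
    """Pick the low end of each recommended range as a conservative start."""
    ranges = STARTING_POINTS[use_case]
    return {name: low for name, (low, _high) in ranges.items()}


print(starting_values("Customer service FAQs"))
# {'auto_keyword': 3, 'auto_question': 1}
```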

Visual Diagram

Below is a flowchart illustrating the main functions and their relationships in the auto-keyword and auto-question generation workflow during knowledge base document ingestion:

flowchart TD
    A[Document ingestion] --> B[Chunking Method]
    B --> C{Auto-keyword enabled?}
    C -- Yes --> D[Send chunk to Chat Model for Keywords]
    C -- No --> E[Skip Keyword Generation]
    B --> F{Auto-question enabled?}
    F -- Yes --> G[Send chunk to Chat Model for Questions]
    F -- No --> H[Skip Question Generation]
    D --> I[Store Keywords with Chunk Metadata]
    G --> J[Store Questions with Chunk Metadata]
    I --> K[Index Chunk with Enriched Metadata]
    J --> K
    E --> K
    H --> K
    K --> L[Enhanced Retrieval Engine]

Summary

This file documents the Auto-keyword and Auto-question features that enhance retrieval by generating auxiliary keywords and questions from text chunks.
Both features use a chat model for generation, increasing indexing time and token usage.
Proper configuration depends on chunk size and document type.
The enriched metadata improves the knowledge base’s ability to match user queries.
The file includes community-recommended settings and important usage warnings.

This documentation assists users, administrators, and developers in understanding and configuring the auto-generation capabilities within RAGFlow’s knowledge base indexing process.