autokeyword_autoquestion.mdx


Overview

The autokeyword_autoquestion.mdx file is a documentation page describing two RAGFlow features: Auto-keyword and Auto-question. Both use a chat model to generate keywords or questions from each text chunk during document indexing, with the aim of improving retrieval accuracy and relevance in search and FAQ scenarios.

Specifically, this page explains:

This documentation is intended for users configuring knowledge base ingestion and retrieval settings, helping them understand how to fine-tune auto-generation parameters for optimal search results.


Detailed Explanation of Concepts and Features

Auto-keyword

Definition:
Auto-keyword is a feature in RAGFlow that automatically generates keywords or synonyms from each chunk of text in the knowledge base using a chat model. These generated keywords serve as an additional layer of metadata to help correct errors and improve the accuracy of document retrieval.

Functionality:

Configuration:

Important Notes:

Usage Example:
If your knowledge base chunks average 1,000 characters and you want to enhance retrieval, set Auto-keyword to 4. This setting will generate 4 keywords per chunk during indexing.


Auto-question

Definition:
Auto-question leverages a chat model to generate natural language questions (e.g., who, what, why) from each chunk. These questions improve matching in FAQ-like scenarios by anticipating user inquiries.

Functionality:

Configuration:

Important Notes:

Usage Example:
For a technical support knowledge base whose chunks average roughly 1,000 characters, setting Auto-question to 2 generates two questions per chunk during indexing, which can improve matching against user queries.
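The two usage examples can be made concrete with a small sketch that validates the slider values and estimates how many items a given setting produces. The ranges (0–30 and 0–10) are taken from the column headers of the community table on this page. The class `AutoGenSettings`, its field names, and the `generated_items` helper are hypothetical names chosen for illustration, not part of any RAGFlow API.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Slider ranges as stated in the community table's column headers.
AUTO_KEYWORD_RANGE: Tuple[int, int] = (0, 30)
AUTO_QUESTION_RANGE: Tuple[int, int] = (0, 10)


def _check(name: str, value: int, bounds: Tuple[int, int]) -> None:
    """Raise ValueError unless value is an int inside the inclusive bounds."""
    low, high = bounds
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{name} must be an integer, got {value!r}")
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


@dataclass(frozen=True)
class AutoGenSettings:
    """Hypothetical container for the two slider values (not a RAGFlow type)."""

    auto_keywords: int = 0   # 0 disables keyword generation
    auto_questions: int = 0  # 0 disables question generation

    def __post_init__(self) -> None:
        _check("auto_keywords", self.auto_keywords, AUTO_KEYWORD_RANGE)
        _check("auto_questions", self.auto_questions, AUTO_QUESTION_RANGE)

    def generated_items(self, chunk_count: int) -> Dict[str, int]:
        """Nominal totals, assuming the model returns exactly N items per chunk."""
        return {
            "keywords": chunk_count * self.auto_keywords,
            "questions": chunk_count * self.auto_questions,
        }


if __name__ == "__main__":
    settings = AutoGenSettings(auto_keywords=4, auto_questions=2)
    print(settings.generated_items(chunk_count=250))
```

Because both values multiply with the number of chunks, even small slider increases translate into a large number of generated items on a big corpus, which is worth keeping in mind when choosing a starting value.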


Implementation Details


Interaction With Other System Components


Community Tips and Use Cases

The file includes a community-sourced table that maps typical document types and sizes to recommended Auto-keyword and Auto-question slider values. It gives users new to these features a sensible starting point for their own document corpus.

| Use Case / Scenario | Document Size/Length | Auto-keyword (0–30) | Auto-question (0–10) |
|---|---|---|---|
| Employee handbook (internal process guidance) | Small (<10 pages) | 0 | 0 |
| Customer service FAQs | Medium (10–100 pages) | 3–7 | 1–3 |
| Technical whitepapers | Large (>100 pages) | 2–4 | 1–2 |
| Contracts / Legal retrieval | Large (>50 pages) | 2–5 | 0–1 |
| Multi-repository layered documents | Many | Adjust as appropriate | Adjust as appropriate |
| Social media comment pool (multilingual, short texts) | Very large | 8–12 | 0 |
| Operational logs for troubleshooting | Very large | 3–6 | 0 |
| Marketing asset library (multilingual) | Medium | 6–10 | 1–2 |
| Training courses / eBooks | Large | 2–5 | 1–2 |
| Maintenance manual (equipment diagrams + steps) | Medium | 3–7 | 1–2 |
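As an illustration, the community table above can be encoded as a lookup that checks whether a chosen pair of slider values falls inside the suggested ranges. This is only a sketch: the dictionary `RECOMMENDATIONS` and the function `matches_recommendation` are hypothetical names, and "Adjust as appropriate" is represented as `None`, meaning no constraint.

```python
from typing import Dict, Optional, Tuple

# (low, high) inclusive; None stands for "Adjust as appropriate".
Range = Optional[Tuple[int, int]]

# Use case -> (auto-keyword range, auto-question range), copied from the table.
RECOMMENDATIONS: Dict[str, Tuple[Range, Range]] = {
    "Employee handbook": ((0, 0), (0, 0)),
    "Customer service FAQs": ((3, 7), (1, 3)),
    "Technical whitepapers": ((2, 4), (1, 2)),
    "Contracts / Legal retrieval": ((2, 5), (0, 1)),
    "Multi-repository layered documents": (None, None),
    "Social media comment pool": ((8, 12), (0, 0)),
    "Operational logs for troubleshooting": ((3, 6), (0, 0)),
    "Marketing asset library": ((6, 10), (1, 2)),
    "Training courses / eBooks": ((2, 5), (1, 2)),
    "Maintenance manual": ((3, 7), (1, 2)),
}


def _in_range(value: int, bounds: Range) -> bool:
    """True when value lies in bounds, or when no bounds are given."""
    return bounds is None or bounds[0] <= value <= bounds[1]


def matches_recommendation(use_case: str, auto_keywords: int, auto_questions: int) -> bool:
    """Check a pair of slider values against the community suggestion."""
    keyword_range, question_range = RECOMMENDATIONS[use_case]
    return _in_range(auto_keywords, keyword_range) and _in_range(auto_questions, question_range)


if __name__ == "__main__":
    print(matches_recommendation("Customer service FAQs", 5, 2))
    print(matches_recommendation("Operational logs for troubleshooting", 4, 1))
```

The values are community suggestions rather than hard limits, so a check like this is best treated as a hint when picking initial settings.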


Visual Diagram

The flowchart below shows how auto-keyword and auto-question generation fit into the document ingestion workflow of a knowledge base:

flowchart TD
    A[Document ingestion] --> B[Chunking Method]
    B --> C{Auto-keyword enabled?}
    C -- Yes --> D[Send chunk to Chat Model for Keywords]
    C -- No --> E[Skip Keyword Generation]
    B --> F{Auto-question enabled?}
    F -- Yes --> G[Send chunk to Chat Model for Questions]
    F -- No --> H[Skip Question Generation]
    D --> I[Store Keywords with Chunk Metadata]
    G --> J[Store Questions with Chunk Metadata]
    I --> K[Index Chunk with Enriched Metadata]
    J --> K
    E --> K
    H --> K
    K --> L[Enhanced Retrieval Engine]
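
The flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not RAGFlow's implementation: the chat model is modeled as any callable that maps a prompt string to a reply string, the prompts are invented for the example, and `EnrichedChunk`, `enrich_chunk`, and `fake_chat` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumption: a chat model is any callable taking a prompt and returning text.
ChatModel = Callable[[str], str]


@dataclass
class EnrichedChunk:
    """A chunk plus the generated metadata stored alongside it."""

    text: str
    keywords: List[str] = field(default_factory=list)
    questions: List[str] = field(default_factory=list)


def _split_lines(reply: str, limit: int) -> List[str]:
    """Keep at most `limit` non-empty lines from the model reply."""
    items = [line.strip() for line in reply.splitlines() if line.strip()]
    return items[:limit]


def enrich_chunk(text: str, chat: ChatModel,
                 auto_keywords: int = 0, auto_questions: int = 0) -> EnrichedChunk:
    """Follow the flowchart: each generation step runs only if its value is > 0."""
    chunk = EnrichedChunk(text=text)
    if auto_keywords > 0:  # "Auto-keyword enabled?" branch
        reply = chat(f"List {auto_keywords} keywords, one per line, for:\n{text}")
        chunk.keywords = _split_lines(reply, auto_keywords)
    if auto_questions > 0:  # "Auto-question enabled?" branch
        reply = chat(f"Write {auto_questions} questions, one per line, about:\n{text}")
        chunk.questions = _split_lines(reply, auto_questions)
    return chunk  # ready to be indexed with its enriched metadata


def fake_chat(prompt: str) -> str:
    """Stand-in for a real chat model: returns the requested number of lines."""
    count = int(prompt.splitlines()[0].split()[1])
    return "\n".join(f"item {i + 1}" for i in range(count))


if __name__ == "__main__":
    print(enrich_chunk("RAGFlow indexes chunks.", fake_chat, auto_keywords=3))
```

Passing the chat model in as a callable keeps the sketch testable without network access. With both values at 0, no model call is made, which mirrors the two "Skip" branches in the flowchart.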

Summary


This documentation assists users, administrators, and developers in understanding and configuring the auto-generation capabilities within RAGFlow’s knowledge base indexing process.