quickstart.mdx

Overview

The quickstart.mdx file serves as the official Quick Start Guide for RAGFlow, an open-source Retrieval-Augmented Generation (RAG) engine designed for deep document understanding and truthful question-answering backed by citations. This document provides step-by-step instructions for new users to:

Start a local RAGFlow server using Docker,
Create and configure knowledge bases,
Upload and parse files into datasets,
Intervene and customize file parsing,
Set up an AI chat assistant based on created datasets.

It also includes important prerequisites, environment configurations, and notes on interacting with LLMs (Large Language Models), making it a comprehensive onboarding resource for users deploying RAGFlow for the first time.

File Structure and Content Explanation

The file is written in MDX, a format that extends Markdown with JSX components and is commonly used by documentation sites such as those built with Docusaurus. It combines static content with interactive UI components and code blocks. The key elements in this file include:

Markdown elements: headings, bullet lists, important notes, warnings, and code blocks.
Custom components:
- Tabs and TabItem for OS-specific instructions,
- APITable for displaying image tags and descriptions in a table.

Major Sections

Introduction
- Brief description of RAGFlow and its capabilities.
- Outline of what the quick start covers.
- Important notes about platform support.
Prerequisites
- Hardware and software requirements.
- Links and tips for installing Docker.
- Optional: gVisor for sandboxed code execution.
Start up the server
- Detailed instructions to configure the system setting vm.max_map_count across Linux, macOS, and Windows.
- Steps to clone the repository and check out the specific release tag.
- Commands to start RAGFlow server using Docker Compose (CPU and GPU modes).
- Table describing different Docker image tags and their properties.
- Instructions to verify server startup by examining logs.
- How to access the running server via browser.
Configure LLMs
- Explanation that RAGFlow requires LLM integration.
- Guide to configure LLM providers and models within RAGFlow UI.
- Notes on subsidiary models.
Create your first knowledge base
- How to create and configure a knowledge base.
- Supported file types.
- Chunking templates and embedding model selection.
- File upload and parsing initiation.
- Links to troubleshooting FAQs for parsing issues.
Intervene with file parsing
- Methods to visualize and manually edit document chunks.
- Adding keywords or questions to chunks to improve retrieval ranking.
- How to run retrieval tests to verify chunking.
Set up an AI chat
- Instructions to create a chat assistant linked to knowledge bases.
- Configuration options for assistant behavior, prompt engine, and model settings.
- Notes on response handling to avoid hallucinations.
- Screenshot showing the chat interface.
Additional Notes
- Links to further API documentation for advanced integrations (HTTP and Python APIs).

Components and Their Usage

Tabs & TabItem

Used to present OS-specific instructions for setting vm.max_map_count.
Props:
- defaultValue: default active tab.
- values: array of objects with label and value for each tab.
Usage example:

<Tabs defaultValue="linux" values={[{label: 'Linux', value: 'linux'}, ...]}>
  <TabItem value="linux">
    {/* Linux instructions */}
  </TabItem>
  ...
</Tabs>

APITable

Custom component to render API or image tag tables.
Wraps Markdown table syntax.
Usage example:

<APITable>
| Image tag | Description |
|--------|-------------|
| v0.20.5 | Stable release |
</APITable>

Important Implementation Details

System Configuration Dependency: The file stresses the importance of setting vm.max_map_count to at least 262144 to avoid Elasticsearch connection errors. It provides OS-specific, detailed instructions to ensure this setting is applied and persistent.
Docker-Based Deployment: The quickstart relies on Docker Compose to run the RAGFlow system. It distinguishes between CPU and GPU deployment options, with clear commands and environment variable notes.
Embedding Model Consistency: Once an embedding model has been selected for a knowledge base and used to parse files, it cannot be changed. All chunks in a knowledge base must be embedded by the same model so that their vectors remain comparable in a single embedding space.
File Parsing & Chunking: Users upload files which are parsed into chunks using chunk templates. The chunking results can be inspected and manually modified to improve retrieval effectiveness.
LLM Integration: RAGFlow itself is a RAG engine and requires integration with LLM providers (via API keys) for generating grounded answers.
UI-Centric Workflow: The guide revolves around interactions within the RAGFlow web UI, guiding users through dataset creation, file upload, parsing, chunk intervention, and chat assistant creation.
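
The embedding-model constraint described above can be illustrated with a minimal Python sketch. It assumes two hypothetical embedding models, "model_a" (3 dimensions) and "model_b" (4 dimensions), represented by hard-coded vectors rather than real model calls; none of these names or values come from RAGFlow. The point is only that a similarity score is meaningful when the query and the chunk were embedded by the same model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors from the same embedding space."""
    if len(a) != len(b):
        raise ValueError(
            f"dimension mismatch: {len(a)} vs {len(b)}; "
            "vectors must come from the same embedding model"
        )
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


# Hypothetical stored chunk vector, produced by "model_a" (3 dimensions).
chunk_vec_model_a = [0.2, 0.7, 0.1]
# Query embedded with the same hypothetical model: comparison is well defined.
query_vec_model_a = [0.2, 0.7, 0.1]
# Query embedded with a different hypothetical model (4 dimensions).
query_vec_model_b = [0.5, 0.1, 0.3, 0.9]

same_model_score = cosine_similarity(chunk_vec_model_a, query_vec_model_a)

try:
    cosine_similarity(chunk_vec_model_a, query_vec_model_b)
    mixed_models_ok = True
except ValueError:
    mixed_models_ok = False
```

Even when two models happen to share a dimension, their vector spaces are unrelated, so a score computed across them would still be meaningless. The dimension check only catches the most obvious case.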

Interactions with Other System Components

Docker Images: The guide lists the available RAGFlow Docker image tags (v0.20.5, v0.20.5-slim, nightly, nightly-slim) and describes their properties so that users can choose the image that fits their deployment.
Elasticsearch / Infinity: The file names Elasticsearch or Infinity as the underlying recall engine. The vm.max_map_count setting matters because Elasticsearch depends on it.
LLM Providers: Users add external LLM providers and their API keys in the RAGFlow UI so that the RAG engine can generate answers.
File Parsing Pipeline: Uploaded files are processed into datasets and chunked for retrieval. This file describes how to manage that pipeline.
APIs: Although not detailed here, the file links to HTTP and Python API documentation for integrating RAGFlow programmatically.

Usage Examples

Starting the RAGFlow Server on Linux (CPU)

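# Raise the limit on memory map areas (resets on reboot unless made persistent)
sudo sysctl -w vm.max_map_count=262144

# Clone the repository and check out the release tag
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/docker
git checkout -f v0.20.5

# Start the server in the background
docker compose -f docker-compose.yml up -d

# Follow the server logs to confirm startup
docker logs -f ragflow-server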
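
Before running the commands above, the vm.max_map_count requirement can also be checked programmatically. The following is a minimal sketch, assuming a Linux host where the setting is exposed at /proc/sys/vm/max_map_count; the function names are illustrative and not part of RAGFlow.

```python
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144  # minimum stated in the quickstart
PROC_PATH = Path("/proc/sys/vm/max_map_count")  # Linux only


def meets_requirement(raw_value: str, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """Return True if the setting's text value is at least the required value."""
    return int(raw_value.strip()) >= required


def check_host() -> str:
    """Describe this host's setting, or report that it cannot be read."""
    try:
        raw = PROC_PATH.read_text()
    except OSError:
        return "cannot read vm.max_map_count here (not a Linux host?)"
    status = "ok" if meets_requirement(raw) else "too low"
    return f"vm.max_map_count={raw.strip()} ({status})"


if __name__ == "__main__":
    print(check_host())
```

If the reported value is too low, the sysctl command shown above raises it for the current boot.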

Creating a Knowledge Base and Uploading Files

Navigate in UI: Dataset tab → Create dataset → Name dataset → Select embedding model and chunk template.
Upload local files → Click play button to parse.
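
To make the idea of parsing a file "into chunks" concrete, here is a deliberately naive Python sketch that splits text into fixed-size character windows with overlap. This is an assumption made for illustration only: it is not one of RAGFlow's chunk templates, and the function name and parameters are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


example_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
```

Each chunk repeats the last character of the previous one, so content near a boundary appears in both neighbours. Inspecting chunks in the UI serves the same purpose as inspecting example_chunks here: confirming that the split points make sense for retrieval.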

Adding Keywords to a Chunk

Open parsed file → Hover over chunk → Double-click text → Add keywords/questions → Save.
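
Why added keywords can improve a chunk's ranking is easiest to see with a toy scorer. The sketch below counts query terms that appear in either the chunk text or its added keywords. It is an illustrative assumption, not RAGFlow's actual ranking logic, and every name and string in it is hypothetical.

```python
from typing import Optional


def term_overlap_score(
    query: str, chunk_text: str, keywords: Optional[list[str]] = None
) -> int:
    """Count query terms found in the chunk text or in its added keywords."""
    query_terms = set(query.lower().split())
    chunk_terms = set(chunk_text.lower().split())
    keyword_terms = {k.lower() for k in (keywords or [])}
    return len(query_terms & (chunk_terms | keyword_terms))


query = "reset admin password"
chunk = "To restore access, open the account settings page."

# The chunk is relevant but shares no literal terms with the query.
score_before = term_overlap_score(query, chunk)
# After adding keywords to the chunk, two query terms match.
score_after = term_overlap_score(query, chunk, keywords=["password", "reset"])
```

The chunk's content is unchanged; only the extra terms make it easier to match against queries phrased differently from the original text.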

Setting up AI Chat Assistant

Chat tab → Create an assistant → Name assistant → Select knowledge bases → Configure prompt and model settings → Start chatting.

Visual Diagram

flowchart TD
    subgraph setup [Server Setup]
        A[Start RAGFlow Server] --> B[Configure vm.max_map_count]
        B --> C[Clone GitHub Repo]
        C --> D[Run Docker Compose]
        D --> E[Access RAGFlow UI]
    end

    subgraph kb [Knowledge Base Workflow]
        G[Create Knowledge Base] --> H[Upload Files]
        H --> I[Parse Files into Chunks]
        I --> J[Intervene & Edit Chunks]
        J --> K[Run Retrieval Tests]
    end

    subgraph chat [Chat Setup]
        L[Create AI Chat Assistant] --> M[Select Knowledge Bases]
        L --> N[Configure Prompt & Model]
        L --> O[Start Chatting]
    end

    E --> F[Configure LLM Providers]
    E --> G
    E --> L

Summary

The quickstart.mdx file is an onboarding guide that walks users through configuring system prerequisites, deploying RAGFlow with Docker, configuring LLMs, creating knowledge bases, parsing documents, and setting up AI chat assistants. It combines static Markdown with JSX components (Tabs, TabItem, APITable) and shell commands. It is the starting point for new users who want to get RAGFlow running and try its core features.