
· 3 min read

Background

From version 0.22.0, RAGFlow no longer ships a full Docker image with built-in embedding models. Previously, some users relied on the bundled embedding models in that image to build their datasets.

After upgrading to 0.22.0, those models are no longer available, which leads to several issues: the embedding model originally used by a dataset is missing; you cannot add new documents; retrieval in the dataset stops working properly; and you cannot switch to a new embedding model because of constraints in the old logic. To address these post-upgrade compatibility problems, we introduced important improvements in version 0.22.1.

0.22.1 capabilities

Datasets containing parsed data can now switch embedding models

Starting from RAGFlow 0.22.1, a safer, automated embedding compatibility check allows users to switch embedding models on datasets that already contain data. To ensure the new embedding model does not disrupt the original vector space structure, RAGFlow performs the following checks:

  1. Sample extraction: Randomly selects a few chunks (e.g., 5–10) from the current dataset as representative samples.
  2. Re-encoding: Generates new vectors for the sampled chunks using the new embedding model chosen by the user.
  3. Similarity calculation: For each chunk, calculates the cosine similarity between new and old vectors.
  4. Switch decision: If the average similarity is 0.9 or above, the new and old models are deemed sufficiently consistent in vector space, and the switch is allowed. If below 0.9, the model switch request is denied.

Why use a 0.9 threshold?

The threshold is set to 0.9 because models with the same name from different providers can have minor version differences, and RAGFlow's embeddings also vary with strategies and parameters, so a new model can never perfectly reproduce the old embedding environment. These small differences still usually yield an average similarity above 0.9, which makes 0.9 a practical cut-off for models that are safe to swap. By contrast, embeddings from completely different model families (for example, MiniLM to BGE‑M3) tend to sit around 0.3–0.6 in similarity, so they fall below the threshold and are correctly blocked, preventing a scrambled vector space.
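Conceptually, the check boils down to a few lines. The sketch below illustrates the idea rather than RAGFlow's internal implementation; the embed_new callable, the stored old_vectors, and the sample size are assumptions for demonstration:

import random
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def can_switch_embedding(chunks, old_vectors, embed_new, sample_size=10, threshold=0.9):
    """Re-encode a random sample of chunks with the candidate model and compare
    against the stored vectors; allow the switch only if the average cosine
    similarity reaches the threshold (assumes both models share a dimension)."""
    ids = random.sample(range(len(chunks)), min(sample_size, len(chunks)))
    sims = [cosine_similarity(old_vectors[i], embed_new(chunks[i])) for i in ids]
    return sum(sims) / len(sims) >= threshold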

How to switch embedding model

  1. In the model settings interface, configure a new model to replace the unusable default model.

  2. Navigate to the dataset's Configuration page, select the same model name from the new provider, and wait for the model switch to complete.

If switching the embedding model fails, an error message will appear.

  3. Enter Retrieval testing to assess the new embedding model yourself.

Functions that rely on dataset retrieval, such as chat apps, now work properly again.

Our future releases will feature more sophisticated upgrade tools and automation, simplifying migration from older versions and reducing the maintenance burden for users.

· 2 min read

1. Create a Google Cloud Project

You can either create a dedicated project for RAGFlow or use an existing Google Cloud project.

Steps:

  1. Open the project creation page: https://console.cloud.google.com/projectcreate
  2. Select External as the Audience.
  3. Click Create.

2. Configure the OAuth Consent Screen

  1. Go to APIs & Services → OAuth consent screen.
  2. Ensure User Type = External.
  3. Add your test users under Test Users by entering their email addresses.

3. Create OAuth Client Credentials

  1. Navigate to: https://console.cloud.google.com/auth/clients
  2. Create a Web Application.
  3. Enter a name for the client.
  4. Add the following Authorized Redirect URIs:
     http://localhost:9380/v1/connector/google-drive/oauth/web/callback

If using Docker deployment, set the Authorized JavaScript origin to:

http://localhost:80

If running from source, set the Authorized JavaScript origin to:

http://localhost:9222

  5. After saving, click Download JSON. This file will later be uploaded into RAGFlow.

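For reference, the downloaded credentials file is a small JSON document shaped roughly as follows; the values shown here are placeholders, and the exact set of fields depends on your client configuration:

{
  "web": {
    "client_id": "YOUR_CLIENT_ID.apps.googleusercontent.com",
    "project_id": "your-project-id",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "client_secret": "YOUR_CLIENT_SECRET",
    "redirect_uris": ["http://localhost:9380/v1/connector/google-drive/oauth/web/callback"],
    "javascript_origins": ["http://localhost:80"]
  }
}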


4. Add Scopes

  1. Open Data Access → Add or remove scopes

  2. Paste and add the following entries:

https://www.googleapis.com/auth/drive.readonly
https://www.googleapis.com/auth/drive.metadata.readonly
https://www.googleapis.com/auth/admin.directory.group.readonly
https://www.googleapis.com/auth/admin.directory.user.readonly

  3. Update and Save changes.



5. Enable Required APIs

Navigate to the Google API Library:
https://console.cloud.google.com/apis/library

Enable the following APIs:

  • Google Drive API
  • Admin SDK API
  • Google Sheets API
  • Google Docs API



6. Add Google Drive As a Data Source in RAGFlow

  1. Go to Data Sources inside RAGFlow

  2. Select Google Drive

  3. Upload the previously downloaded JSON credentials.

  4. Enter the shared Google Drive folder link (it begins with https://drive.google.com/drive).

  5. Click Authorize with Google. A browser window will appear. Click Continue, then Select All → Continue. Once authorization succeeds, select OK to add the data source.

· 6 min read

0.22 Highlights

Building a RAGFlow dataset involves three main steps: file upload, parsing, and chunking. Version 0.21.0 made the parsing and chunking stages more flexible with the Ingestion pipeline.

This 0.22.0 release focuses on the data upload step to help developers build datasets faster.

We also added these key improvements:

  • The Parser component in the Ingestion pipeline now offers more model choices for better file parsing.
  • We optimized the Agent's Retrieval and Await response components.
  • A new Admin UI gives you a clearer and easier way to manage the system.

Support for Rich External Data Sources

The new Data Sources module lets you connect external data to a Dataset. You can now sync files from different places directly into RAGFlow.

Use the "Data Sources" menu in your personal center to add and set up sources like Confluence, AWS S3, Google Drive, Discord, and Notion. This lets you manage all your data in one place and sync it automatically.

Example: S3 Configuration

  1. Make sure you have an S3 storage bucket in your AWS account.

  2. Add your S3 details to the S3 data source form.

  3. After you add it, click the settings icon to see the data source details.

  4. If you set "Refresh Freq" to "1", the system will check for new files every minute.
  5. RAGFlow watches your specified S3 Bucket (like ragflow-bucket). If it finds new files, it immediately starts syncing them.
  6. After syncing, it waits one minute before checking again. Use the "Pause" button to turn this automatic refresh on or off anytime.

Linking Data Sources to a dataset

  1. Create a new dataset (for example, TEST_S3).
  2. Click Configuration and go to the bottom of the page.
  3. Click Link Data Source and pick the data source you want (like S3).

After you link successfully, you'll see three icons:

  • Rebuild: Click this to delete all files and logs in the dataset and import everything again.
  • Settings: Check the sync logs here.
  • Unlink: This disconnects the data source. It keeps all the files already in the dataset but stops new syncs.

Status Messages in Logs:

  • Scheduled: The task is in line, waiting for its next turn to check for files.
  • Running: The system is moving files right now.
  • Success: It finished checking for new files.
  • Failed: The upload didn't work. Check the error message for details.
  • Cancel: You paused the transfer.

You can link multiple data sources to one dataset, and one data source can feed into many datasets.

Enhanced Parser

MinerU

RAGFlow now works with MinerU 2.6.3 as another option for parsing PDFs. It supports several backends, including pipeline, vlm-transformers, vlm-vllm-engine, and vlm-http-client.

The idea is simple: RAGFlow asks MinerU to parse a file, reads the results, and adds them to your dataset.

Key Environment Variables:

| Variable | Explanation | Default | Example |
|---|---|---|---|
| MINERU_EXECUTABLE | Path to MinerU on your computer | mineru | MINERU_EXECUTABLE=/home/ragflow/uv_tools/.venv/bin/mineru |
| MINERU_DELETE_OUTPUT | Keep or delete MinerU's output files | 1 (delete) | MINERU_DELETE_OUTPUT=0 (keep) |
| MINERU_OUTPUT_DIR | Where to put MinerU's output | System temp folder | MINERU_OUTPUT_DIR=/home/ragflow/mineru/output |
| MINERU_BACKEND | Which MinerU backend to use | pipeline | MINERU_BACKEND=vlm-transformers |

Backend-specific settings:

  • If you use the vlm-http-client backend, set the server address with MINERU_SERVER_URL.
  • To connect to a remote MinerU parser, use MINERU_APISERVER to give its address.

How to Start:

  1. From Source: Install MinerU by itself (its dependencies can conflict with RAGFlow's). Then set the environment variables and start the RAGFlow server.
  2. Using Docker: Set USE_MINERU=true in docker/.env and restart your containers.
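For the Docker route, a minimal docker/.env excerpt might look like the following; the backend choice and paths are illustrative and should be adjusted to your environment:

# docker/.env (excerpt)
USE_MINERU=true
MINERU_BACKEND=vlm-transformers          # default is pipeline
MINERU_DELETE_OUTPUT=0                   # keep MinerU's output files for inspection
MINERU_OUTPUT_DIR=/home/ragflow/mineru/output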

Docling

RAGFlow also supports Docling as another PDF parser. It works the same way as MinerU.

Docling finds the text, formulas, tables, and images in a document. RAGFlow then uses what Docling finds.

What Docling Can Do:

  1. Pull out text (paragraphs, headings, lists).
  2. Extract math formulas.
  3. Identify tables and images (and save them).
  4. Mark where everything is located.

Starting Up: Set USE_DOCLING=true in docker/.env and restart your containers.

Agent Optimizations

Retrieval Now Uses Metadata

You can now add tags (metadata) to files in your dataset. During retrieval, the Agent can use these tags to filter results, so it only looks at specific files instead of the whole library.

Example: Imagine a dataset full of AI papers. Some are about AI agents, others are about evaluating AI. If you want a Q&A assistant that only answers evaluation questions, you can add a tag like "Topic": "Evaluation" to the right papers. When the Agent retrieves information, it will filter for just those tagged files.
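As a purely hypothetical illustration, the metadata attached to an evaluation paper and the filter the Retrieval component applies might look like this (the field names are examples, not a fixed schema):

# Metadata stored with a document (illustrative)
{"Topic": "Evaluation", "Year": "2024"}

# Filter used at retrieval time (illustrative): only chunks from documents
# whose metadata matches are considered.
{"Topic": "Evaluation"}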

Before, this only worked in the Chat app. Now the Agent's Retrieval component can do it too.

Agent Teamwork Gets Better

You can now use an upstream Agent's output in the Await Response component's message.

The Old Way: The message in the Await Response component was always static text.

The New Way: You can insert dynamic content from earlier in the workflow, like a plan from a Planning Agent.

This is great for "Deep Research" agents or any time you need a human to check the work before continuing. It's also a key part of future improvements to the Ingestion Pipeline.

You can find this use case as a ready-to-use template in the agent template library.

Admin UI

This version adds a new Admin UI, a visual dashboard made for system administrators.

It takes jobs you used to do with commands and puts them in a simple interface, making management much easier.

See System Status Instantly

The Service Status dashboard shows the health of all core services. It lists their name, type, host, port, and status. If something goes wrong (like Elasticsearch times out), you can find the problem fast and copy the address to test it, without logging into different servers.

The UI also shows service details. You can see detailed logs and connection info (like database passwords) without ever touching a server command line. This makes fixing problems quicker and keeps the system more transparent and secure.

Manage Users Easily

The User Management section lets you create, enable, disable, reset passwords for, and delete users. You can quickly find users by email or nickname and see what datasets and agents they own.

Finale

RAGFlow 0.21.0 gave you a powerful Ingestion pipeline for your data. Now, RAGFlow 0.22.0 connects to all the places your data lives. Together, they help you break down "data silos" and gather everything in one spot to power your LLMs.

We also improved how Agents and people work together. Now you can step into the Agent's workflow and guide it, working as a team to get better, more accurate results than full automation alone.

We will keep adding more data sources, better parsers, and smarter pipelines to make RAGFlow the best data foundation for your LLM applications.

GitHub: https://github.com/infiniflow/ragflow

Reference:

  1. https://ragflow.io/docs/dev/faq#how-to-use-mineru-to-parse-pdf-documents

· 16 min read

In the day-to-day work of a financial institution's investment research department, analysts deal with a vast amount of industry and company analysis reports, third-party research data, and real-time market updates, drawn from diverse and scattered sources. Their job is to quickly turn this information into clear investment recommendations: which stocks to buy, how to adjust portfolio allocations, or where an industry is heading next. We therefore developed the "Intelligent Investment Research Assistant" to help analysts organize information quickly. It automatically gathers company data, integrates financial indicators, and compiles research report viewpoints, so analysts can decide within minutes whether a stock is worth buying instead of sifting through piles of material, leaving their time for genuine investment decision-making. To achieve this goal, we designed a comprehensive technical process.

The technical solution revolves around a core business process:

When an analyst poses a question, the system identifies the company name or abbreviation in it and, with the help of a search engine, retrieves the corresponding stock code. If identification fails, the system simply returns a prompt message. Once the stock code is obtained, the system retrieves the company's core financial indicators from data interfaces, organizes and formats the data, and generates a clear financial table. Building on this, the intelligent analysis further integrates research report information: on one hand, it gathers the latest authoritative research reports and market viewpoints; on the other hand, it retrieves relevant research report content from the internal knowledge base. Finally, the organized financial data and research report information are combined into a comprehensive response, so analysts can quickly review key indicators and core viewpoints.

The workflow after orchestration is as follows:

This case utilizes RAGFlow to implement a complete workflow, ranging from stock code extraction, to the generation of company financial statements, and finally to the integration and output of research report information.

The following sections will provide a detailed introduction to the implementation process of this solution.

1. Preparing the Dataset

1.1 Create a dataset

The dataset required for this example can be downloaded from Hugging Face Datasets.

Create an "Internal Stock Research Report" dataset and import the corresponding dataset documents.

1.2 Parse documents

For the documents in the "Internal Stock Research Report" dataset, we have selected the parsing and slicing method called Paper.

Research report documents typically include modules such as abstracts, core viewpoints, thematic analyses, financial forecast tables, and risk warnings. The overall structure follows a more thesis-like logical progression rather than a strictly hierarchical table of contents. If sliced based on the lowest-level headings, it can easily disrupt the coherence between paragraphs and tables.

Therefore, RAGFlow is better suited to adopt the "Paper" slicing approach, using chapters or logical paragraphs as the fundamental units. This approach not only preserves the integrity of the research report's structure but also facilitates the model's quick location of key information during retrieval.

The preview of the sliced financial report is as follows:

2. Building the Intelligent Agent

2.1 Create an application.

After successful creation, the system will automatically generate a "Start" node on the canvas.

In the "Start" node, you can set the initial greeting of the assistant, for example: "Hello! I'm your stock research assistant."

2.2 Build the function of "Extract Stock Codes"

2.2.1 Agent extracts stock codes

Use an Agent node and attach a TavilySearch tool to identify stock names or abbreviations from the user's natural language input and return a unique standard stock code. When no match is found, uniformly output "Not Found."

In financial scenarios, users' natural language is often ambiguous. For example:

  • "Help me check the research report on Apple Inc."
  • "How is NVIDIA's financial performance?"
  • "What's the situation with the Shanghai Composite Index today?"

These requests all contain stock-related information, but the system can only further query financial reports, research reports, or market data after accurately identifying the stock code.

This is why we need an Agent with the function of "extracting stock codes."

Below is the system prompt for this Agent:

<role> 

Your responsibility is: to identify and extract the stock name or abbreviation from the user's natural language query and return the corresponding unique stock code.

</role>



<rules>

1. Only one result is allowed:
   - If a stock is identified → return the corresponding stock code only;
   - If no stock is identified → return “Not Found” only.

2. **Do not** output any extra words, punctuation, explanations, prefixes, suffixes, or newline prompts.

3. The output must strictly follow the <response_format>.

</rules>


<response_format>
Output only the stock code (e.g., AAPL or 600519)
Or output “Not Found”
</response_format>


<response_examples>
User input: “Please check the research report for Apple Inc.” → Output: AAPL
User input: “How is the financial performance of Moutai?” → Output: 600519
User input: “How is the Shanghai Composite Index performing today?” → Output: Not Found
</response_examples>


<tools>
- Tavily Search: You may use this tool to query when you're uncertain about the stock code.
- If you're confident, there's no need to use the tool.
</tools>



<Strict Output Requirements>
- Only output the result; no explanations, prompts, or instructions allowed.
- The output can only be the stock code or “Not Found,” otherwise it will be considered an incorrect answer.
</Strict Output Requirements>

2.2.2 Conditional node for identifying stock codes

Use a conditional node to evaluate the output result of the previous Agent node and guide the process flow based on different outcomes:

  • If the output is a stock code: It indicates successful identification of the stock, and the process will proceed to the "Case1" branch.
  • If the output contains "Not Found": It indicates that no valid stock name was identified from the user's input, and the process will proceed to the "Else" branch, where it will execute a node for replying with an irrelevant message, outputting "Your query is not supported."

2.3 Build the "Company Financial Statements" feature

The data for this feature is sourced from financial data provided by Yahoo Finance. By calling this API, we obtain core financial data for specified stocks, including operating revenue, net profit, etc., which drives the generation of the "Company Financial Statements."

2.3.1 Yahoo Finance Tools: Request for Financial Data

By using the "Yahoo Finance Tools" node, select "Balance sheet" and pass the stockCode output by the upstream Agent as a parameter. This allows you to fetch the core financial indicators of the corresponding company.

The returned results contain key data such as total assets, total equity, and tangible book value, which are used to generate the "Company Financial Statements" feature.

2.3.2 Financial table generation by Code node

Utilize the Code node to perform field mapping and numerical formatting on the financial data returned by Yahoo Finance Tools through Python scripts, ultimately generating a Markdown table with bilingual indicator comparisons, enabling a clear and intuitive display of the "Company Financial Statements."

Code:

import re

def format_number(value: str) -> str:
    """Convert scientific notation or floating-point numbers to comma-separated numbers"""
    try:
        num = float(value)
        if num.is_integer():
            return f"{int(num):,}"  # If it's an integer, format without decimal places
        else:
            return f"{num:,.2f}"  # Otherwise, keep two decimal places and add commas
    except:
        return value  # Return the original value if it's not a number (e.g., — or empty)

def extract_md_table_single_column(input_text: str) -> str:
    # Use English indicators directly
    indicators = [
        "Total Assets", "Total Equity", "Tangible Book Value", "Total Debt",
        "Net Debt", "Cash And Cash Equivalents", "Working Capital",
        "Long Term Debt", "Common Stock Equity", "Ordinary Shares Number"
    ]

    # Core indicators and their corresponding units
    unit_map = {
        "Total Assets": "USD",
        "Total Equity": "USD",
        "Tangible Book Value": "USD",
        "Total Debt": "USD",
        "Net Debt": "USD",
        "Cash And Cash Equivalents": "USD",
        "Working Capital": "USD",
        "Long Term Debt": "USD",
        "Common Stock Equity": "USD",
        "Ordinary Shares Number": "Shares"
    }

    lines = input_text.splitlines()

    # Automatically detect the date column, keeping only the first one
    date_pattern = r"\d{4}-\d{2}-\d{2}"
    header_line = ""
    for line in lines:
        if re.search(date_pattern, line):
            header_line = line
            break

    if not header_line:
        raise ValueError("Date column header row not found")

    dates = re.findall(date_pattern, header_line)
    first_date = dates[0]  # Keep only the first date
    header = f"| Indicator | {first_date} |"
    divider = "|------------------------|------------|"

    rows = []
    for ind in indicators:
        unit = unit_map.get(ind, "")
        display_ind = f"{ind} ({unit})" if unit else ind

        found = False
        for line in lines:
            if ind in line:
                # Match numbers and possible units
                pattern = r"(nan|[0-9\.]+(?:[eE][+-]?\d+)?)"
                values = re.findall(pattern, line)
                # Replace 'nan' with '—' and format the number
                first_value = values[0].strip() if values and values[0].strip().lower() != "nan" else "—"
                first_value = format_number(first_value) if first_value != "—" else "—"
                rows.append(f"| {display_ind} | {first_value} |")
                found = True
                break
        if not found:
            rows.append(f"| {display_ind} | — |")

    md_table = "\n".join([header, divider] + rows)
    return md_table

def main(input_text: str):
    return extract_md_table_single_column(input_text)
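
For example, given a simplified snippet of the kind of text Yahoo Finance Tools might return (the real response layout may differ), the node renders a one-column Markdown table:

sample = """\
              2024-09-30   2023-09-30
Total Assets  3.65e+11     3.53e+11
Total Debt    1.19e+11     1.24e+11
"""
print(main(sample))
# | Indicator | 2024-09-30 |
# |------------------------|------------|
# | Total Assets (USD) | 365,000,000,000 |
# | Total Equity (USD) | — |
# ... indicators absent from the input are shown as "—"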

We have also received feedback from users who would prefer not to extract JSON fields through code, and we will gradually provide no-code solutions in future versions.

2.4 Build the "Research Report Information Extraction" function

Utilize an information extraction agent, which, based on the stockCode, calls the AlphaVantage API to extract the latest authoritative research reports and insights. Meanwhile, it invokes the internal research report retrieval agent to obtain the full text of the complete research reports. Finally, it outputs the two parts of content separately in a fixed structure, thereby achieving an efficient information extraction function.

System prompt:

<role> 

You are the information extraction agent. You understand the user’s query and delegate tasks to alphavantage and the internal research report retrieval agent.

</role>

<requirements>

1. Based on the stock code output by the "Extract Stock Code" agent, call alphavantage's EARNINGS_CALL_TRANSCRIPT to retrieve the latest information that can be used in a research report, and store all publicly available key details.


2. Call the "Internal Research Report Retrieval Agent" and save the full text of the research report output.

3. Output the content retrieved from alphavantage and the Internal Research Report Retrieval Agent in full.

</requirements>


<report_structure_requirements>
The output must be divided into two sections:
#1. Title: “alphavantage”
Directly output the content collected from alphavantage without any additional processing.
#2. Title: "Internal Research Report Retrieval Agent"
Directly output the content provided by the Internal Research Report Retrieval Agent.
</report_structure_requirements>

2.4.1 Configure the MCP tool

Add the MCP tool:

Add the MCP tool under the agent and select the required method, such as "EARNINGS_CALL_TRANSCRIPT".

2.4.2 Internal Research Report Retrieval Agent

The key focus in constructing the internal research report retrieval agent lies in accurately identifying the company or stock code in user queries. It then invokes the Retrieval tool to search for research reports from the dataset and outputs the full text, ensuring that information such as data, viewpoints, conclusions, tables, and risk warnings is not omitted. This enables high-fidelity extraction of research report content.

System Prompt:

<Task Objective> 

Read user input → Identify the involved company/stock (supports abbreviations, full names, codes, and aliases) → Retrieve the most relevant research reports from the dataset → Output the full text of the research report, retaining the original format, data, chart descriptions, and risk warnings.

</Task Objective>



<Execution Rules>

1. Exact Match: Prioritize exact matches of company full names and stock codes.

2. Content Fidelity: Fully retain the research report text stored in the dataset without deletion, modification, or omission of paragraphs.

3. Original Data: Retain table data, dates, units, etc., in their original form.

4. Complete Viewpoints: Include investment logic, financial analysis, industry comparisons, earnings forecasts, valuation methods, risk warnings, etc.

5. Merging Multiple Reports: If there are multiple relevant research reports, output them in reverse chronological order.

6. No Results Feedback: If no matching reports are found, output “No related research reports available in the dataset.”



</Execution Rules>

2.5 Add a Research Report Generation Agent

The research report generation agent automatically extracts and structurally organizes financial information, producing professional content for investment bank analysts that preserves differentiated viewpoints and can be used directly in investment research reports.

<role> 

You are a senior investment banking (IB) analyst with years of experience in capital market research. You excel at writing investment research reports covering publicly listed companies, industries, and macroeconomics. You possess strong financial analysis skills and industry insights, combining quantitative and qualitative analysis to provide high-value references for investment decisions.

**You are able to retain and present differentiated viewpoints from various reports and sources in your research, and when discrepancies arise, you do not merge them into a single conclusion. Instead, you compare and analyze the differences.**


</role>




<input>

You will receive financial information extracted by the information extraction agent.

</input>


<core_task>
Based on the content returned by the information extraction agent (no fabrication of data), write a professional, complete, and structured investment research report. The report must be logically rigorous, clearly organized, and use professional language, suitable for reference by fund managers, institutional investors, and other professional readers.
When there are differences in analysis or forecasts between different reports or institutions, you must list and identify the sources in the report. You should not select only one viewpoint. You need to point out the differences, their possible causes, and their impact on investment judgments.
</core_task>


<report_structure_requirements>
##1. Summary
Provide a concise overview of the company’s core business, recent performance, industry positioning, and major investment highlights.
Summarize key conclusions in 3-5 sentences.
Highlight any discrepancies in core conclusions and briefly describe the differing viewpoints and areas of disagreement.
##2. Company Overview
Describe the company's main business, core products/services, market share, competitive advantages, and business model.
Highlight any differences in the description of the company’s market position or competitive advantages from different sources. Present and compare these differences.
##3. Recent Financial Performance
Summarize key metrics from the latest financial report (e.g., revenue, net profit, gross margin, EPS).
Highlight the drivers behind the trends and compare the differential analyses from different reports. Present this comparison in a table.
##4. Industry Trends & Opportunities
Overview of industry development trends, market size, and major drivers.
If different sources provide differing forecasts for industry growth rates, technological trends, or competitive landscape, list these and provide background information. Present this comparison in a table.
##5. Investment Recommendation
Provide a clear investment recommendation based on the analysis above (e.g., "Buy/Hold/Neutral/Sell"), presented in a table.
Include investment ratings or recommendations from all sources, with the source and date clearly noted.
If you provide a combined recommendation based on different viewpoints, clearly explain the reasoning behind this integration.
##6. Appendix & References
List the data sources, analysis methods, important formulas, or chart descriptions used.
All references must come from the information extraction agent and the company financial data table provided, or publicly noted sources.
For differentiated viewpoints, provide full citation information (author, institution, date) and present this in a table.
</report_structure_requirements>


<output_requirements>
Language Style: Financial, professional, precise, and analytical.
Viewpoint Retention: When there are multiple viewpoints and conclusions, all must be retained and compared. You cannot choose only one.
Citations: When specific data or viewpoints are referenced, include the source in parentheses (e.g., Source: Morgan Stanley Research, 2024-05-07).
Facts: All data and conclusions must come from the information extraction agent or their noted legitimate sources. No fabrication is allowed.
Readability: Use short paragraphs and bullet points to make it easy for professional readers to grasp key information and see the differences in viewpoints.
</output_requirements>


<output_goal>
Generate a complete investment research report that meets investment banking industry standards, which can be directly used for institutional investment internal reference, while faithfully retaining differentiated viewpoints from various reports and providing the corresponding analysis.
</output_goal>



<heading_format_requirements>
All section headings in the investment research report must be formatted as N. Section Title (e.g., 1. Summary, 2. Company Overview), where:
The heading number is followed by a period and the section title.
The entire heading (number, period, and title) is rendered in bold text (e.g., using <b> in HTML or equivalent bold formatting, without relying on Markdown ** syntax).
Do not use ##, **, or any other prefix before the heading number.
Apply this format consistently to all section headings (Summary, Company Overview, Recent Financial Performance, Industry Trends & Opportunities, Investment Recommendation, Appendix & References).
</heading_format_requirements>

2.6 Add a Reply Message Node

The reply message node is used to output the "financial statements" and "research report content" that are the final outputs of the workflow.

2.7 Save and Test

Click "Save" - "Run" - and view the execution results. The entire process takes approximately 5 minutes to run. Execution Results:

Log: The entire process took approximately 5 minutes to run.

Summary and Outlook

This case study has constructed a complete workflow for stock research reports using RAGFlow, encompassing three core steps:

  1. Utilizing an Agent node to extract stock codes from user inputs.
  2. Acquiring and formatting company financial data through Yahoo Finance tools and Code nodes to generate clear financial statements.
  3. Invoking information extraction agents and an internal research report retrieval agent, and using a research report generation agent to output the latest research report insights and the full text of complete research reports, respectively.

The entire process achieves automated handling from stock code identification to the integration of financial and research report information.

We see several directions for further development: more data sources can be incorporated to make the analysis more comprehensive, and a code-free method for data processing would lower the barrier to entry. The system could also analyze multiple companies within the same industry, track industry trends, and even cover a wider range of investment instruments such as futures and funds, helping analysts build better portfolios. As these features are gradually implemented, the intelligent investment research assistant will not only help analysts make quicker judgments but also establish an efficient, reusable research methodology, enabling the team to consistently produce high-quality analytical outputs.

· 3 min read

The release of GitHub’s 2025 Octoverse report marks a pivotal moment for the open source ecosystem—and for projects like RAGFlow, which has emerged as one of the fastest-growing open source projects by contributors this year. With a remarkable 2,596% year-over-year growth in contributor engagement, RAGFlow isn’t just gaining traction—it’s defining the next wave of AI-powered development.

The Rise of Retrieval-Augmented Generation in Production

As the Octoverse report highlights, AI is no longer experimental—it’s foundational. More than 4.3 million AI-related repositories now exist on GitHub, and over 1.1 million public repos import LLM SDKs, a 178% YoY increase. In this context, RAGFlow’s rapid adoption signals a clear shift: developers are moving beyond prototyping and into production-grade AI workflows.

RAGFlow—an end-to-end retrieval-augmented generation engine with built-in agent capabilities—is perfectly positioned to meet this demand. It enables developers to build scalable, context-aware AI applications that are both powerful and practical. As the report notes, “AI infrastructure is emerging as a major magnet” for open source contributions, and RAGFlow sits squarely at the intersection of AI infrastructure and real-world usability.

Why RAGFlow Resonates in the AI Era

Several trends highlighted in the Octoverse report align closely with RAGFlow’s design and mission:

  • From Notebooks to Production: The report notes a shift from Jupyter Notebooks (+75% YoY) to Python codebases, signaling that AI projects are maturing. RAGFlow supports this transition by offering a structured, reproducible framework for deploying RAG systems in production.
  • Agentic Workflows Are Going Mainstream: With the launch of GitHub Copilot coding agent and the rise of AI-assisted development, developers are increasingly relying on tools that automate complex tasks. RAGFlow’s built-in agent capabilities allow teams to automate retrieval, reasoning, and response generation—key components of modern AI apps.
  • Security and Scalability Are Top of Mind: The report also highlights a 172% YoY increase in Broken Access Control vulnerabilities, underscoring the need for secure-by-design AI systems. RAGFlow’s focus on enterprise-ready deployment helps teams address these challenges head-on.

A Project in Active Development

RAGFlow's evolution mirrors a deliberate journey—from solving foundational RAG challenges to shaping the next generation of enterprise AI infrastructure.

The project first made its mark by systematically addressing core RAG limitations through integrated technological innovation. With features such as deep document understanding for parsing complex formats, hybrid retrieval that blends multiple search strategies, and built-in advanced tools like GraphRAG and RAPTOR, RAGFlow established itself as an end-to-end solution that dramatically enhances retrieval accuracy and reasoning performance.

Now, building on this robust technical foundation, RAGFlow is steering toward a bolder vision: to become the superior context engine for enterprise-grade Agents. Evolving from a specialized RAG engine into a unified, resilient context layer, RAGFlow is positioning itself as the essential data foundation for LLMs in the enterprise—enabling Agents of any kind to access rich, precise, and secure context, ensuring reliable and effective operation across all tasks.


RAGFlow is an open source retrieval-augmented generation engine designed for building production-ready AI applications. To learn more or contribute, visit the RAGFlow GitHub repository.

This post was inspired by insights from the GitHub Octoverse 2025 Report. Special thanks to the GitHub team for amplifying the voices of open source builders everywhere.


· 18 min read

Since its open-source release, RAGFlow has consistently garnered widespread attention from the community. Its core module, DeepDoc, leverages built-in document parsing models to provide intelligent document-chunking capabilities tailored to multiple business scenarios, ensuring that RAGFlow can deliver accurate and high-quality answers during both the retrieval and generation phases. Currently, RAGFlow comes pre-integrated with over a dozen chunking templates, covering various business scenarios and file types.

However, as RAGFlow becomes more widely adopted in production environments, the original dozen-plus fixed chunking methods have struggled to keep pace with the complex and diverse data sources, document structures, and file types encountered. Specific challenges include:

  • The need to flexibly configure different parsing and sharding strategies based on specific business scenarios to accommodate varied document structures and content logic.
  • Document parsing and ingestion involve not only segmenting unstructured data into text blocks but also encompass a series of critical preprocessing steps to bridge the "semantic gap" during RAG retrieval. This often requires leveraging models to enrich raw content with semantic information such as summaries, keywords, and hierarchical structures.
  • In addition to locally uploaded files, a significant amount of data, files, and knowledge originate from various sources, including cloud drives and online services.
  • With the maturation of multimodal vision-language models (VLMs), models like MinerU and Docling, which excel in parsing documents with complex layouts, tables, and mixed text-image arrangements, have emerged. These models demonstrate unique advantages across various application scenarios.

To address these challenges, RAGFlow 0.21.0 has introduced a groundbreaking Ingestion pipeline. This pipeline restructures the cleaning process for unstructured data, allowing users to construct customized data-processing pipelines tailored to specific business needs and enabling precise parsing of heterogeneous documents.

The Ingestion Pipeline is essentially a visual ETL process tailored for unstructured data. Built upon an Agent foundation, it restructures a typical RAG data ingestion workflow—which usually encompasses key stages such as document parsing, text chunking, vectorization, and index construction—into three distinct phases: Parser, Transformer, and Indexer. These phases correspond to document parsing, data transformation, and index construction, respectively.

  • Document Parsing: As a critical step in data cleaning, this module integrates multiple parsing models, with DeepDoc being a representative example. It transforms raw unstructured data into semi-structured content, laying the groundwork for subsequent processing.
  • Data Transformation: Currently offering two core types of operators, including Chunker and Transformer, this phase aims to further process the cleaned data into formats suitable for various index access methods, thereby ensuring high-quality recall performance.
  • Index Construction: Responsible for the final data write-in. RAGFlow inherently adopts a multi-path recall architecture to guarantee retrieval effectiveness. Consequently, the Indexer incorporates multiple indexing methods, allowing users to configure them flexibly.

Below, we will demonstrate the construction and use of the Ingestion Pipeline through a specific example.

First, click on "Create agent" on the "Agent" page. You can choose "Create from blank" to create an Ingestion Pipeline from scratch:

Alternatively, you can select "Create from template" to utilize a pre-configured Ingestion pipeline template:

Next, we will begin to arrange various operators required for the Pipeline. When creating from scratch, only the Begin and Parser operators will be displayed on the initial canvas. Subsequently, you can drag and connect additional operators with different functions from the right side of the existing operators.

First, it is necessary to configure the Parser operator.

Parser

The Parser operator is responsible for reading and parsing documents: identifying their layouts, extracting structural and textual information from them, and ultimately obtaining structured document data.

This represents a "high-fidelity, structured" extraction strategy. The Parser intelligently adapts to and preserves the original characteristics of different files, whether it's the hierarchical outline of a Word document, the row-and-column layout of a spreadsheet, or the complex layout of a scanned PDF. It not only extracts the main text but also fully retains auxiliary information such as titles, tables, headers, and footers, transforming them into appropriate data forms, which will be detailed below. This structured differentiation is crucial, providing the necessary foundation for subsequent refined processing.

Currently, the Parser operator supports input from 8 major categories encompassing 23 file types, summarized as follows:

When in use, simply click "Add Parser" within the Parser node and select the desired file category (such as PDF, Image, or PPT). When the Ingestion pipeline is running, the Parser node will automatically identify the input file and route it to the corresponding parser for parsing.

Here, we provide further explanations for the parsers of several common file categories:

  • For PDF files, RAGFlow offers multiple parsing model options, with a unified output in JSON format:

    1. Default DeepDoc: This is RAGFlow's built-in document understanding model, capable of recognizing layout, columns, and tables. It is suitable for processing scanned documents or those with complex formatting.
    2. MinerU: Currently an outstanding document parsing model in the industry. Besides parsing complex document content and layouts, MinerU also provides excellent parsing for complex file elements such as mathematical formulas.
    3. Naive: A pure text extraction method without using any models. It is suitable for documents with no complex structure or non-textual elements.

  • For Image files, the system will by default invoke OCR to extract text from the image. Additionally, users can also configure VLMs (Vision Language Models) that support visual recognition to process them.

  • For Audio files, it is necessary to configure a model that supports speech-to-text conversion. The Parser will then extract the textual content from the Audio. Users can configure the API keys of model providers that support this type of parsing on the "Model provider" page of the homepage. After that, they can return to the Parser node and select it from the dropdown menu. This "configure first, then select" logic also applies to PDF, Image, and Video files.

  • For Video files, it is necessary to configure a large model that supports multimodal recognition. The Parser will invoke this model to conduct a comprehensive analysis of the video and output the results in text format.

  • When parsing Email files, RAGFlow provides Field options, allowing users to select only the desired fields, such as "subject" and "body." The Parser will then precisely extract the textual content of these fields.

  • The Spreadsheet parser will output the file in HTML format, preserving its row and column structure intact to ensure that the tabular data remains clear and readable after conversion.
  • Files of Word and PPT types will be parsed and output in JSON format. For Word files, the original hierarchical structure of the document, such as titles, paragraphs, lists, headers, and footers, will be retained. For PPT files, the content will be extracted page by page, distinguishing between the title, main text, and notes of each slide.
  • The Text & Markup category will automatically strip formatting tags from files such as HTML and MD (Markdown), outputting only the cleanest text content.

Chunker

The Chunker node is responsible for dividing the documents output by upstream nodes into Chunk segments. Chunk is a concept introduced by RAG technology, representing the unit of recall. Users can choose whether to add a Chunker node based on their needs, but it is generally recommended to use it for two main reasons:

  1. If the entire document is used as the unit for recall, the data passed to the large model during the final generation phase may exceed the context window limit.
  2. In typical RAG systems, vector search serves as an important recall method. However, vectors inherently have issues with inaccurate semantic representation. For example, users can choose to convert a single sentence into a vector or the entire document into a vector. The former loses global semantic information, while the latter loses local information. Therefore, selecting an appropriate segment length to achieve a relatively good balance when represented by a single vector is an essential technical approach.

In practical engineering systems, how the Chunk segmentation results are determined often significantly impacts the recall quality of RAG. If content containing the answer is split across different Chunks, and the Retrieval phase cannot guarantee that all these Chunks are recalled, it can lead to inaccurate answer generation and hallucinations. Therefore, the Ingestion pipeline introduces the Chunker node, allowing users to slice text more flexibly.

The current system has two built-in segmentation methods: segmentation by text tokens and segmentation by titles.

Segmentation by tokens is the most common approach. Users can customize the size of each segment, with a default setting of 512 tokens, which represents a balance optimized for retrieval effectiveness and model compatibility. When setting the segment size, trade-offs must be considered: if the segment token count remains too large, portions exceeding the model's limit will still be discarded; if set too small, it may result in excessive segmentation of coherent semantics in the original text, disrupting context and affecting retrieval effectiveness.

To address this, the Chunker operator provides a segment overlap feature, which allows the end portion of the previous segment to be duplicated as the beginning of the next segment, thereby enhancing semantic continuity. Users can increase the "Overlapped percent" to improve the correlation between segments.
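Conceptually, token-based segmentation with overlap works like the sketch below. It is a simplification that splits on whitespace tokens instead of the tokenizer RAGFlow actually uses, but it shows how the overlap percentage links consecutive segments:

def chunk_by_tokens(text, chunk_size=512, overlap_percent=0.2):
    """Split text into fixed-size token windows, repeating the tail of each
    window at the head of the next to preserve semantic continuity."""
    tokens = text.split()                               # simplification: whitespace tokens
    step = max(1, int(chunk_size * (1 - overlap_percent)))
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):           # the last window reached the end
            break
    return chunks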

Additionally, users can further optimize segmentation rules by defining "Separators." The system defaults to using \n (newline characters) as separators, meaning the segmenter will prioritize attempting to split along natural paragraphs rather than abruptly truncating sentences in the middle, as shown in the following figure.

If the document itself has a clear chapter structure, segmenting the text by tokens may not be the optimal choice. In such cases, the Title option in the Chunker can be selected to slice the document according to its headings. This method is primarily suitable for documents with layouts such as technical manuals, academic papers, and legal clauses. We can customize expressions for headings at different levels in the Title node. For example, the expression for a first-level heading (H1) can be set as ^#[^#], and for a second-level heading (H2) as ^##[^#]. Based on these expressions, the system will strictly slice the document according to the predefined chapter structure, ensuring that each segment represents a structurally complete "chapter" or "subsection." Users can also freely add or reduce heading levels in the configuration to match the actual structure of the document, as shown in the following figure.

Note: In the current RAGFlow v0.21 version, if both the Token and Title options of the Chunker are configured simultaneously, please ensure that the Title node is connected after the Token node. Otherwise, if the Title node is directly connected to the Parser node, format errors may occur for files of types Email, Image, Spreadsheet, and Text&Markup. These limitations will be optimized in subsequent versions.
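To make the heading expressions above concrete, the sketch below groups lines into sections whenever an H1 or H2 pattern matches. It is only an approximation of the Title chunker's behavior, which additionally tracks heading levels and nesting:

import re

H1 = re.compile(r"^#[^#]")      # first-level heading, e.g. "# Introduction"
H2 = re.compile(r"^##[^#]")     # second-level heading, e.g. "## Background"

def split_by_headings(markdown_text):
    """Start a new section at every line that matches an H1 or H2 expression."""
    sections, current = [], []
    for line in markdown_text.splitlines():
        if (H1.match(line) or H2.match(line)) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return sections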

Transformer

The Transformer operator is responsible for transforming textual content. Simply resolving the accuracy of text parsing and segmentation does not guarantee the final retrieval accuracy. This is because there is always a so-called "semantic gap" between a user's query and the documents containing the answers. By using the Transformer operator, users can leverage large models to extract information from the input document content, thereby bridging the "semantic gap."

The current Transformer operator supports functions such as summary generation, keyword generation, question generation, and metadata generation. Users can choose to incorporate this operator into their pipeline to supplement the original data with these contents, thereby enhancing the final retrieval accuracy. Similar to other scenarios involving large models, the Transformer node also offers three modes for the large model: Improvise, Precise, and Balance.

  • Improvise (Improvisational Mode) encourages the model to exhibit greater creativity and associative thinking, making it well-suited for generating diverse Questions.
  • Precise (Precise Mode) strictly constrains the model to ensure its output is highly faithful to the original text, making it suitable for generating Summaries or extracting Keywords.
  • Balance (Balanced Mode) strikes a balance between the two, making it applicable to most scenarios.

Users can select one of these three styles or achieve finer control by adjusting parameters such as Temperature and Top P on their own.

The Transformer node can generate four types of content: Summary, Keywords, Questions, and Metadata. RAGFlow also makes the prompts for each type of content openly accessible, which means users can enrich and personalize text processing by modifying the system prompts.

If multiple functions need to be implemented, such as summarizing content and extracting keywords simultaneously, users are required to configure a separate Transformer node for each function and connect them in series within the Pipeline. In other words, a Transformer node can be directly connected after the Parser to process the entire document (e.g., generating a full-text summary), or it can be connected after the Chunker to process each text segment (e.g., generating questions for each Chunk). Additionally, a Transformer node can be connected after another Transformer node to perform complex content extraction and generation in a cascaded manner.

Please note: The Transformer node does not automatically acquire content from its preceding nodes. The actual source of information it processes entirely depends on the variables referenced in the User prompt. In the User prompt, variables output by upstream nodes must be manually selected and referenced by entering the / symbol.

For example, in a Parser - Chunker - Transformer pipeline, even though the Transformer is visually connected after the Chunker, if the variable referenced in the User prompt is the output from the Parser node, then the Transformer will actually process the entire original document rather than the chunks generated by the Chunker.

Similarly, when users choose to connect multiple Transformer nodes in series (e.g., the first one generates a Summary, and the second one generates Keywords), if the second Transformer references the Summary variable generated by the first Transformer, it will process this Summary as the "new original text" for further processing, rather than handling the text chunks from the more upstream Chunker.

Indexer

The preceding Parser, Chunker, and Transformer nodes handle data inflow, segmentation, and optimization. However, the final execution unit in the Pipeline is the Indexer node, which is responsible for writing the processed data into the Retrieval engine (RAGFlow currently supports Infinity, Elasticsearch, and OpenSearch).

The core capability of the Retrieval engine is to establish various types of indexes for data, thereby providing search capabilities, including vector indexing, full-text indexing, and future tensor indexing capabilities, among others. In other words, it is the ultimate embodiment of "Retrieval" in the term RAG.

Due to the varying capabilities of different types of indexes, the Indexer simultaneously offers options for creating different indexes. Specifically, the Search method option within the Indexer node determines how user data can be "found."

Full-text refers to traditional "keyword search," which is an essential option for precise recall, such as searching for a specific product number, person's name, or code. Embedding, on the other hand, represents modern AI-driven "semantic search." Users can ask questions in natural language, and the system can "understand" the meaning of the question and retrieve the most relevant document chunks in terms of content. Enabling both simultaneously for hybrid search is the default option in RAGFlow. Effective multi-channel recall can balance precision and semantics, maximizing the discovery of text segments where answers reside.
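As a rough illustration of why hybrid search is the default, the two signals can be blended with a weighted sum of normalized scores. This is a generic sketch, not RAGFlow's actual fusion formula, and the weight value is an assumption:

def hybrid_score(keyword_score, vector_score, keyword_weight=0.3):
    """Blend a normalized full-text (keyword) score with a semantic (embedding)
    similarity score; both inputs are assumed to lie in [0, 1]."""
    return keyword_weight * keyword_score + (1 - keyword_weight) * vector_score

# An exact product-number match scores high on keywords even when the embedding
# similarity is mediocre, so the chunk still ranks near the top (≈ 0.565 here).
print(hybrid_score(keyword_score=0.95, vector_score=0.40))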

Note: There is no need to select a specific Embedding model within the Indexer node, as it will automatically invoke the embedding model set when creating the knowledge base. Additionally, the Chat, Search, and Agent functionalities in RAGFlow support cross-knowledge base retrieval, meaning a single question can be searched across multiple knowledge bases simultaneously. However, to enable this feature, a prerequisite must be met: all simultaneously selected knowledge bases must utilize the same Embedding model. This is because different embedding models convert the same text into completely different and incompatible vectors, preventing cross-base retrieval.

Furthermore, finer retrieval settings can be achieved through the Filename embedding weight and Field options. The Filename embedding weight is a fine-tuning slider that allows users to consider the "filename" of a document as part of the semantic information and assign it a specific weight.

The Field option, on the other hand, determines the specific content to be indexed and the retrieval strategy. Currently, three distinct strategy options are provided:

  • Processed Text: This is the default option and the most intuitive. It means that the Indexer will index the processed text chunks from preceding nodes.
  • Questions: If a Transformer node is used before the Indexer to generate "potential questions that each text chunk can answer," these Questions can be indexed here. In many scenarios, matching "user questions" with "pre-generated questions" yields significantly higher similarity than matching "questions" with "answers" (i.e., the original text), effectively improving retrieval accuracy.
  • Augmented Context: This involves using Summaries instead of the original text for retrieval. It is suitable for scenarios requiring quick broad topic matching without being distracted by the details of the original text.

Link the Ingestion Pipeline to the Knowledge Base

After constructing an Ingestion pipeline, the next step is to associate it with the corresponding knowledge base. On the page for creating a knowledge base, locate and click the "Choose pipeline" option under the "Ingestion pipeline" tab. Subsequently, select the already created Pipeline from the dropdown menu to establish the association. Once set, this Pipeline will become the default file parsing process for this knowledge base.

For an already created knowledge base, users can enter its "Ingestion pipeline" module at any time to readjust and reselect the associated parsing process.

If users wish to adjust the Ingestion pipeline for a single file, they can also do so by clicking on the Parse location to make adjustments.

Finally, make adjustments and save the updates in the pop-up window.

Logs

The execution of an Ingestion Pipeline may take a considerable amount of time, making observability an indispensable capability. To this end, RAGFlow provides a log panel for the Ingestion pipeline, which records the full-chain logs for each file parsing operation. For files parsed through the Ingestion Pipeline, users can delve into the operational details of each processing node. This offers comprehensive data support for subsequent issue auditing, process debugging, and performance insights.

The following image is an example diagram of step logs.

Case Reference

When creating a Pipeline, you can select the "Chunk Summary" template as the foundation for construction, or start from any of the other templates instead.

The orchestration design of the "Chunk Summary" template is as follows:

Next, switch the large model in the Transformer node to the desired model and set the "Result destination" field to "Metadata." This configuration means that the processing results (such as summaries) of the large model on text chunks will be directly stored in the file's metadata, providing capability support for subsequent precise retrieval and filtering.
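
For illustration, after the Transformer runs, a chunk's metadata might end up looking roughly like this (the field names and values below are hypothetical, not RAGFlow's exact schema):

chunk_metadata = {
    "summary": "Q3 revenue grew 12% year over year, driven by overseas sales.",  # written by the Transformer
    "keywords": ["revenue", "growth", "overseas"],
    "source_file": "2024_Q3_report.pdf",
}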

Click Run in the top right corner and upload a file to test the pipeline:

You can click View result to check the test run results:

Summary

The above content provides a comprehensive introduction to the usage methods and core capabilities of the current Ingestion pipeline. Looking ahead, RAGFlow will continue to evolve in data import and parsing, including the following specific enhancements:

  1. Expanding Data Source Support: In addition to local file uploads, we will gradually integrate various data sources such as S3, cloud storage, email, online notes, and more. Through automatic and manual synchronization mechanisms, we will achieve seamless data import into knowledge bases, automatically applying the Ingestion pipeline to complete document parsing.
  2. Enhancing Parsing Capabilities: We will integrate more document parsing models, such as Docling, into the Parser operator to cover parsing needs across different vertical domains, comprehensively improving the quality and accuracy of document information extraction.
  3. Opening Up Custom Slicing Functionality: In addition to built-in Chunker types, we will gradually open up underlying slicing capabilities, allowing users to write custom slicing logic for more flexible and controllable text organization.
  4. Strengthening the Extensibility of Semantic Enhancement: Building on existing capabilities for summarization, keyword extraction, and question generation, we will further support customizable semantic enhancement strategies in a programmable manner, providing users with more technical means to optimize retrieval and ranking.

Through these enhancements, RAGFlow will continue to improve retrieval accuracy and provide robust, high-quality context to large models.

Finally, we appreciate your attention and support. We welcome you to star RAGFlow on GitHub and join us in witnessing the continuous evolution of large model technology!

GitHub: https://github.com/infiniflow/ragflow

· 13 min read

To meet growing requirements for system operation and maintenance monitoring as well as user account management, and to address practical issues RAGFlow users currently face, such as being unable to recover lost passwords on their own and lacking effective control over account statuses, RAGFlow has officially launched a professional back-end management command-line tool.

This tool is based on a clear client-server architecture. By adopting the design philosophy of functional separation, it constructs an efficient and reliable system management channel, enabling administrators to achieve system recovery and permission control at the fundamental level.

The specific architectural design is illustrated in the following diagram:

Through this tool, RAGFlow users can gain a one-stop overview of the operational status of the RAGFlow Server as well as components such as MySQL, Elasticsearch, Redis, MinIO, and Infinity. It also supports comprehensive user lifecycle management, including account creation, status control, password reset, and data cleanup.

Start the service

If you deploy RAGFlow via Docker, modify the docker/docker-compose.yml file and add the following parameter to the service startup command:

command:
- --enable-adminserver

If you are deploying RAGFlow from the source code, you can directly execute the following command to start the management server:

python3 admin/server/admin_server.py

After the server starts, it listens on port 9381 by default, waiting for client connections. If you need to use a different port, please modify the ADMIN_SVR_HTTP_PORT configuration item in the docker/.env file.

Install the client and connect to the management service

It is recommended to use pip to install the specified version of the client:

pip install ragflow-cli==0.21.0

Version 0.21.0 is the current latest version. Please ensure that the version of ragflow-cli matches the version of your RAGFlow server to avoid compatibility issues. Then connect to the management service:

ragflow-cli -h 127.0.0.1 -p 9381

Here, -h specifies the server IP, and -p specifies the server port. If the server is deployed at a different address or port, please adjust the parameters accordingly.

For the first login, enter the default administrator password admin. After successfully logging in, it is recommended to promptly change the default password in the command-line tool to enhance security.

Command Usage Guide

By entering the following commands through the client, you can conveniently manage users and monitor the operational status of the service.

Interactive Feature Description

  • Supports using arrow keys to move the cursor and review historical commands.
  • Pressing Ctrl+C allows you to terminate the current interaction at any time.
  • If you need to copy content, please avoid using Ctrl+C to prevent accidentally interrupting the process.

Command Format Specifications

  • All commands are case-insensitive and must end with a semicolon ;.
  • Text parameters such as usernames and passwords need to be enclosed in single quotes ' or double quotes ".
  • Special characters like \, ', and " are prohibited in passwords.

Service Management Commands

LIST SERVICES;

  • List the operational status of RAGFlow backend services and all associated middleware.

Usage Example

admin> list services;
Listing all services
+-------------------------------------------------------------------------------------------+-----------+----+---------------+-------+----------------+---------+
| extra | host | id | name | port | service_type | status |
+-------------------------------------------------------------------------------------------+-----------+----+---------------+-------+----------------+---------+
| {} | 0.0.0.0 | 0 | ragflow_0 | 9380 | ragflow_server | Timeout |
| {'meta_type': 'mysql', 'password': 'infini_rag_flow', 'username': 'root'} | localhost | 1 | mysql | 5455 | meta_data | Alive |
| {'password': 'infini_rag_flow', 'store_type': 'minio', 'user': 'rag_flow'} | localhost | 2 | minio | 9000 | file_store | Alive |
| {'password': 'infini_rag_flow', 'retrieval_type': 'elasticsearch', 'username': 'elastic'} | localhost | 3 | elasticsearch | 1200 | retrieval | Alive |
| {'db_name': 'default_db', 'retrieval_type': 'infinity'} | localhost | 4 | infinity | 23817 | retrieval | Timeout |
| {'database': 1, 'mq_type': 'redis', 'password': 'infini_rag_flow'} | localhost | 5 | redis | 6379 | message_queue | Alive |
+-------------------------------------------------------------------------------------------+-----------+----+---------------+-------+----------------+---------+

The output lists each service's host, port, type, and current status (Alive or Timeout).

SHOW SERVICE <id>;

  • Query the detailed operational status of the specified service.
  • <id>: The service ID shown in the output of LIST SERVICES;.

Usage Example:

  1. Query the RAGFlow backend service:
admin> show service 0;
Showing service: 0
Service ragflow_0 is alive. Detail:
Confirm elapsed: 4.1 ms.

The response indicates that the RAGFlow backend service is online, with a response time of 4.1 milliseconds.

  2. Query the MySQL service:
admin> show service 1;
Showing service: 1
Service mysql is alive. Detail:
+---------+----------+------------------+------+------------------+------------------------+-------+-----------------+
| command | db | host | id | info | state | time | user |
+---------+----------+------------------+------+------------------+------------------------+-------+-----------------+
| Daemon | None | localhost | 5 | None | Waiting on empty queue | 86356 | event_scheduler |
| Sleep | rag_flow | 172.18.0.6:56788 | 8790 | None | | 2 | root |
| Sleep | rag_flow | 172.18.0.6:53482 | 8791 | None | | 73 | root |
| Query | rag_flow | 172.18.0.6:56794 | 8795 | SHOW PROCESSLIST | init | 0 | root |
+---------+----------+------------------+------+------------------+------------------------+-------+-----------------+

The response indicates that the MySQL service is online, with the current connection and execution status shown in the table above.

  3. Query the MinIO service:
admin> show service 2;
Showing service: 2
Service minio is alive. Detail:
Confirm elapsed: 2.3 ms.

The response indicates that the MinIO service is online, with a response time of 2.3 milliseconds.

  4. Query the Elasticsearch service:
admin> show service 3;
Showing service: 3
Service elasticsearch is alive. Detail:
+----------------+------+--------------+---------+----------------+--------------+---------------+--------------+------------------------------+----------------------------+-----------------+-------+---------------+---------+-------------+---------------------+--------+------------+--------------------+
| cluster_name | docs | docs_deleted | indices | indices_shards | jvm_heap_max | jvm_heap_used | jvm_versions | mappings_deduplicated_fields | mappings_deduplicated_size | mappings_fields | nodes | nodes_version | os_mem | os_mem_used | os_mem_used_percent | status | store_size | total_dataset_size |
+----------------+------+--------------+---------+----------------+--------------+---------------+--------------+------------------------------+----------------------------+-----------------+-------+---------------+---------+-------------+---------------------+--------+------------+--------------------+
| docker-cluster | 8 | 0 | 1 | 2 | 3.76 GB | 2.15 GB | 21.0.1+12-29 | 17 | 757 B | 17 | 1 | ['8.11.3'] | 7.52 GB | 2.30 GB | 31 | green | 226 KB | 226 KB |
+----------------+------+--------------+---------+----------------+--------------+---------------+--------------+------------------------------+----------------------------+-----------------+-------+---------------+---------+-------------+---------------------+--------+------------+--------------------+

The response indicates that the Elasticsearch cluster is operating normally, with specific metrics including document count, index status, memory usage, etc.

  5. Query the Infinity service:
admin> show service 4;
Showing service: 4
Fail to show service, code: 500, message: Infinity is not in use.

The response indicates that Infinity is not currently in use by the RAGFlow system.

admin> show service 4;
Showing service: 4
Service infinity is alive. Detail:
+-------+--------+----------+
| error | status | type |
+-------+--------+----------+
| | green | infinity |
+-------+--------+----------+

After enabling Infinity and querying again, the response indicates that the Infinity service is online and in good condition.

  6. Query the Redis service:
admin> show service 5;
Showing service: 5
Service redis is alive. Detail:
+-----------------+-------------------+---------------------------+-------------------------+---------------+-------------+--------------------------+---------------------+-------------+
| blocked_clients | connected_clients | instantaneous_ops_per_sec | mem_fragmentation_ratio | redis_version | server_mode | total_commands_processed | total_system_memory | used_memory |
+-----------------+-------------------+---------------------------+-------------------------+---------------+-------------+--------------------------+---------------------+-------------+
| 0 | 3 | 10 | 3.03 | 7.2.4 | standalone | 404098 | 30.84G | 1.29M |
+-----------------+-------------------+---------------------------+-------------------------+---------------+-------------+--------------------------+---------------------+-------------+

The response indicates that the Redis service is online, with the version number, deployment mode, and resource usage shown in the table above.

User Management Commands

LIST USERS;

  • List all users in the RAGFlow system.

Usage Example:

admin> list users;
Listing all users
+-------------------------------+----------------------+-----------+----------+
| create_date | email | is_active | nickname |
+-------------------------------+----------------------+-----------+----------+
| Mon, 13 Oct 2025 15:58:42 GMT | admin@ragflow.io | 1 | admin |
| Mon, 13 Oct 2025 15:54:34 GMT | lynn_inf@hotmail.com | 1 | Lynn |
+-------------------------------+----------------------+-----------+----------+

The response indicates that there are currently two users in the system, both of whom are enabled.

Among them, admin@ragflow.io is the administrator account, which is automatically created during the initial system startup.

SHOW USER <username>;

  • Query detailed user information by email.
  • <username>: The user's email address, which must be enclosed in single quotes ' or double quotes ".

Usage Example:

  1. Query the administrator user
admin> show user "admin@ragflow.io";
Showing user: admin@ragflow.io
+-------------------------------+------------------+-----------+--------------+------------------+--------------+----------+-----------------+---------------+--------+-------------------------------+
| create_date | email | is_active | is_anonymous | is_authenticated | is_superuser | language | last_login_time | login_channel | status | update_date |
+-------------------------------+------------------+-----------+--------------+------------------+--------------+----------+-----------------+---------------+--------+-------------------------------+
| Mon, 13 Oct 2025 15:58:42 GMT | admin@ragflow.io | 1 | 0 | 1 | True | English | None | None | 1 | Mon, 13 Oct 2025 15:58:42 GMT |
+-------------------------------+------------------+-----------+--------------+------------------+--------------+----------+-----------------+---------------+--------+-------------------------------+

The response indicates that admin@ragflow.io is a super administrator and is currently enabled.

  2. Query a regular user
admin> show user "lynn_inf@hotmail.com";
Showing user: lynn_inf@hotmail.com
+-------------------------------+----------------------+-----------+--------------+------------------+--------------+----------+-------------------------------+---------------+--------+-------------------------------+
| create_date | email | is_active | is_anonymous | is_authenticated | is_superuser | language | last_login_time | login_channel | status | update_date |
+-------------------------------+----------------------+-----------+--------------+------------------+--------------+----------+-------------------------------+---------------+--------+-------------------------------+
| Mon, 13 Oct 2025 15:54:34 GMT | lynn_inf@hotmail.com | 1 | 0 | 1 | False | English | Mon, 13 Oct 2025 15:54:33 GMT | password | 1 | Mon, 13 Oct 2025 17:24:09 GMT |
+-------------------------------+----------------------+-----------+--------------+------------------+--------------+----------+-------------------------------+---------------+--------+-------------------------------+

The response indicates that lynn_inf@hotmail.com is a regular user who logs in via password; the last login time and other details are shown in the table above.

CREATE USER <username> <password>;

  • Create a new user.
  • <username>: The user's email address, which must comply with standard email format.
  • <password>: The user's password, which must not contain special characters such as \, ', or ".

Usage Example:

admin> create user "example@ragflow.io" "psw";
Create user: example@ragflow.io, password: psw, role: user
+----------------------------------+--------------------+----------------------------------+--------------+---------------+----------+
| access_token | email | id | is_superuser | login_channel | nickname |
+----------------------------------+--------------------+----------------------------------+--------------+---------------+----------+
| be74d786a9b911f0a726d68c95a0776b | example@ragflow.io | be74d6b4a9b911f0a726d68c95a0776b | False | password | |
+----------------------------------+--------------------+----------------------------------+--------------+---------------+----------+

A regular user has been successfully created. Personal information such as nickname and avatar can be set by the user themselves after logging in and accessing the profile page.

ALTER USER PASSWORD <username> <new_password>;

  • Change the user's password.
  • <username>: User email address
  • <new_password>: The new password (must not be the same as the old password and must not contain special characters)

Usage Example:

admin> alter user password "example@ragflow.io" "psw";
Alter user: example@ragflow.io, password: psw
Same password, no need to update!

When the new password is the same as the old password, the system prompts that no change is needed.

admin> alter user password "example@ragflow.io" "new psw";
Alter user: example@ragflow.io, password: new psw
Password updated successfully!

The password has been updated successfully. The user can log in with the new password thereafter.

ALTER USER ACTIVE <username> <on/off>;

  • Enable or disable a user.
  • <username>: User email address
  • <on/off>: Enabled or disabled status

Usage Example:

admin> alter user active "example@ragflow.io" off;
Alter user example@ragflow.io activate status, turn off.
Turn off user activate status successfully!

The user has been successfully disabled. Only users in a disabled state can be deleted.

DROP USER <username>;

  • Delete the user and all their associated data
  • <username>: User email address

Important Notes:

  • Only disabled users can be deleted.
  • Before proceeding, ensure that all necessary data such as knowledge bases and files that need to be retained have been transferred to other users.
  • This operation will permanently delete the following user data:

All knowledge bases created by the user, uploaded files, and configured agents, as well as files uploaded by the user in others' knowledge bases, will be permanently deleted. This operation is irreversible, so please proceed with extreme caution.

  • The deletion command is idempotent. If the system fails or the operation is interrupted during the deletion process, the command can be re-executed after troubleshooting to continue deleting the remaining data.

Usage Example:

  1. User Successfully Deleted
admin> drop user "example@ragflow.io";
Drop user: example@ragflow.io
Successfully deleted user. Details:
Start to delete owned tenant.
- Deleted 2 tenant-LLM records.
- Deleted 0 langfuse records.
- Deleted 1 tenant.
- Deleted 1 user-tenant records.
- Deleted 1 user.
Delete done!

The response indicates that the user has been successfully deleted, and it lists detailed steps for data cleanup.

  2. Deleting the Super Administrator (Prohibited Operation)
admin> drop user "admin@ragflow.io";
Drop user: admin@ragflow.io
Fail to drop user, code: -1, message: Can't delete the super user.

The response indicates that the deletion failed. The super administrator account is protected and cannot be deleted, even if it is in a disabled state.

Data and Agent Management Commands

LIST DATASETS OF <username>;

  • List all knowledge bases of the specified user
  • <username>: User email address

Usage Example:

admin> list datasets of "lynn_inf@hotmail.com";
Listing all datasets of user: lynn_inf@hotmail.com
+-----------+-------------------------------+---------+----------+-----------------+------------+--------+-----------+-------------------------------+
| chunk_num | create_date | doc_num | language | name | permission | status | token_num | update_date |
+-----------+-------------------------------+---------+----------+-----------------+------------+--------+-----------+-------------------------------+
| 8 | Mon, 13 Oct 2025 15:56:43 GMT | 1 | English | primary_dataset | me | 1 | 3296 | Mon, 13 Oct 2025 15:57:54 GMT |
+-----------+-------------------------------+---------+----------+-----------------+------------+--------+-----------+-------------------------------+

The response shows that the user has one private knowledge base, with detailed information such as the number of documents and segments displayed in the table above.

LIST AGENTS OF <username>;

  • List all Agents of the specified user
  • <username>: User email address

Usage Example:

admin> list agents of "lynn_inf@hotmail.com";
Listing all agents of user: lynn_inf@hotmail.com
+-----------------+-------------+------------+----------------+
| canvas_category | canvas_type | permission | title |
+-----------------+-------------+------------+----------------+
| agent_canvas | None | me | finance_helper |
+-----------------+-------------+------------+----------------+

The response indicates that the user has one private Agent, with detailed information shown in the table above.

Other commands

  • ? or \help

Display help information.

  • \q or \quit

Exit the client.

Follow-up plan

We are always committed to enhancing the system management experience and overall security. Building on its existing robust features, the RAGFlow back-end management tool will continue to evolve. In addition to the current efficient and flexible command-line interface, we are soon launching a professional system management UI, enabling administrators to perform all operational and maintenance tasks in a more secure and intuitive graphical environment.

To strengthen permission control, the system status information currently visible in the ordinary user interface will be removed. After the future launch of the professional management UI, access to the core operational status of the system will be restricted to administrators only. This will address the current issue of excessive permission exposure and further reinforce the system's security boundaries.

In addition, we will also roll out more fine-grained management features sequentially, including:

  • Fine-grained control over Datasets and Agents
  • User Team collaboration management mechanisms
  • Enhanced system monitoring and auditing capabilities

These improvements will establish a more comprehensive enterprise-level management ecosystem, providing administrators with a more all-encompassing and convenient system control experience.

· 10 min read

Introduction

In RAG (Retrieval-Augmented Generation) and LLM (Large Language Model) Memory, vector retrieval is widely employed. Among various options, graph-based indexing has become the most common choice due to its high accuracy and performance, with the HNSW (Hierarchical Navigable Small World) index being the most representative one [1][2].

However, during our practical application of HNSW in RAGFlow, we encountered the following two major bottlenecks:

  1. As the data scale continues to grow, the memory consumption of vector data becomes highly significant. For instance, one billion 1024-dimensional floating-point vectors require approximately 4 TB of memory (a quick back-of-the-envelope check follows this list).

  2. When constructing an HNSW index on complex datasets, there is a bottleneck in retrieval accuracy. After reaching a certain threshold, it becomes difficult to further improve accuracy solely by adjusting parameters.
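
A quick back-of-the-envelope check of the first point, assuming plain float32 storage and ignoring the graph structure itself:

n_vectors = 1_000_000_000        # one billion vectors
dims = 1024                      # dimensions per vector
bytes_per_float = 4              # float32
total_bytes = n_vectors * dims * bytes_per_float
print(total_bytes / 1024 ** 4)   # ~3.7 TiB, i.e. roughly 4 TB of raw vector data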

To address this, Infinity has implemented a variety of improved algorithms based on HNSW. Users can select different indexing schemes by adjusting the index parameters of HNSW. Each HNSW index variant possesses distinct characteristics and is suitable for different scenarios, allowing users to construct corresponding index structures based on their actual needs.

Introduction to Indexing Schemes

The original HNSW, as a commonly used graph-based index, exhibits excellent performance.

Its structure consists of two parts:

The original vector data, and a graph structure built jointly from a skip list and adjacency lists. Taking the Python interface as an example, the index can be constructed and used in the following manner:

## Create index
table_obj.create_index(
    "hnsw_index",
    index.IndexInfo(
        "embedding",
        index.IndexType.Hnsw, {
            "m": 16,
            "ef_construction": 200,
            "metric": "l2"
        },
    )
)
## Vector retrieval
query_builder.match_dense('embedding', [1.0, 2.0, 3.0], 'float', 'l2', 10, {'ef': 200})

To address the issues of high memory consumption and accuracy bottlenecks, Infinity provides the following solutions:

  1. Introduce LVQ and RaBitQ quantization methods to reduce the memory overhead of original vectors during graph search processes.
  2. Introduce the LSG strategy to optimize the graph index structure of HNSW, enhancing its accuracy threshold and query efficiency.

To facilitate users in testing the performance of different indexes locally, Infinity provides a benchmark script. You can follow the tutorial provided by Infinity on GitHub to set up the environment and prepare the dataset, and then test different indexing schemes using the benchmark.

######################   compile benchmark  ######################
cmake --build cmake-build-release --target hnsw_benchmark

###################### build index & execute query ######################
# mode : build, query
# benchmark_type : sift, gist, msmarco
# build_type : plain, lvq, crabitq, lsg, lvq_lsg, crabitq_lsg
##############################################################
benchmark_type=sift
build_type=plain
./cmake-build-release/benchmark/local_infinity/hnsw_benchmark --mode=build --benchmark_type=$benchmark_type --build_type=$build_type --thread_n=8
./cmake-build-release/benchmark/local_infinity/hnsw_benchmark --mode=query --benchmark_type=$benchmark_type --build_type=$build_type --thread_n=8 --topk=10

Among them, the original HNSW corresponds to the parameter build_type=plain. This post runs a unified test of query performance across all index variants, using the following experimental environment:

  1. OS: Ubuntu 24.04 LTS (Noble Numbat)
  2. CPU: 13th Gen Intel(R) Core(TM) i5-13400
  3. RAM: 64G

The CPU provides 16 hardware threads. To align with the actual device environments of most users, the parallel computing parameter in the benchmark is uniformly set to 8 threads.

Solution 1: Original HNSW + LVQ Quantizer (HnswLvq)

LVQ is a scalar quantization method that compresses each 32-bit floating-point number in the original vectors into an 8-bit integer [3], thereby reducing memory usage to one-fourth of that of the original vectors.

Compared to simple scalar quantization methods (such as mean scalar quantization), LVQ reduces errors by statistically analyzing the residuals of each vector, effectively minimizing information loss in distance calculations for quantized vectors. Consequently, LVQ can accurately estimate the distances between original vectors with only approximately 30% of the original memory footprint.
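
The core idea can be sketched as follows: each vector is compressed to 8-bit codes plus a small amount of per-vector correction data. This is an illustrative toy version only; Infinity's actual LVQ implementation uses more careful residual statistics.

import numpy as np

def quantize_8bit(vec: np.ndarray):
    """Toy per-vector scalar quantization: float32 -> uint8 codes plus (offset, scale)."""
    lo, hi = float(vec.min()), float(vec.max())
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # constant vector; avoid division by zero
    codes = np.round((vec - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize_8bit(codes: np.ndarray, lo: float, scale: float) -> np.ndarray:
    """Approximate reconstruction used for distance estimation."""
    return codes.astype(np.float32) * scale + lo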

In the HNSW algorithm, original vectors are utilized for distance calculations, which enables LVQ to be directly integrated with HNSW. We refer to this combined approach as HnswLvq. In Infinity, users can enable LVQ encoding by setting the parameter "encode": "lvq":

## Create index
table_obj.create_index(
    "hnsw_index",
    index.IndexInfo(
        "embedding",
        index.IndexType.Hnsw, {
            "m": 16,
            "ef_construction": 200,
            "metric": "l2",
            "encode": "lvq"
        },
    )
)
## Vector retrieval
query_builder.match_dense('embedding', [1.0, 2.0, 3.0], 'float', 'l2', 10, {'ef': 200})

The graph structure of HnswLvq remains consistent with that of the original HNSW, with the key difference being that it uses quantized vectors to perform all distance calculations within the graph. Through this improvement, HnswLvq outperforms the original HNSW index in terms of both index construction and query efficiency.

The improvement in construction efficiency stems from the shorter length of quantized vectors, which results in reduced time for distance calculations using SIMD instructions. The enhancement in query efficiency is attributed to the computational acceleration achieved through quantization, which outweighs the negative impact caused by the loss of precision.

In summary, HnswLvq significantly reduces memory usage while maintaining excellent query performance. We recommend that users adopt it as the primary index in most scenarios. To replicate this experiment, users can set the parameter build_type=lvq in the benchmark. The specific experimental results are compared alongside the RaBitQ quantizer scheme in Solution 2.

Solution 2: Original HNSW + RaBitQ Quantizer (HnswRabitq)

RaBitQ is a binary scalar quantization method that shares a similar core idea with LVQ, both aiming to replace the 32-bit floating-point numbers in original vectors with fewer encoded bits. The difference lies in that RaBitQ employs binary scalar quantization, representing each floating-point number with just 1 bit, thereby achieving an extremely high compression ratio.

However, this extreme compression also leads to more significant information loss in the vectors, resulting in a decline in the accuracy of distance estimation. To mitigate this issue, RaBitQ applies a rotation matrix to the dataset during preprocessing and retains more residual information, thereby reducing errors in distance calculations to a certain extent [4].

Nevertheless, binary quantization has obvious limitations in terms of precision, showing a substantial gap compared to LVQ. Indexes built directly using RaBitQ encoding exhibit poor query performance.

Therefore, the HnswRabitq scheme implemented in Infinity involves first constructing an original HNSW index for the dataset and then converting it into an HnswRabitq index through the compress_to_rabitq parameter in the optimize method.

During the query process, the system initially uses quantized vectors for preliminary retrieval and then re-ranks the ef candidate results specified by the user using the original vectors.

## Create index
table_obj.create_index(
    "hnsw_index",
    index.IndexInfo(
        "embedding",
        index.IndexType.Hnsw, {
            "m": 16,
            "ef_construction": 200,
            "metric": "l2"
        },
    )
)
## Construct RaBitQ coding
table_obj.optimize("hnsw_index", {"compress_to_rabitq": "true"})
## Vector retrieval
query_builder.match_dense('embedding', [1.0, 2.0, 3.0], 'float', 'l2', 10, {'ef': 200})
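
To make the two-stage query concrete, the toy sketch below encodes vectors to 1 bit per dimension, does a coarse search on the binary codes, and then re-ranks the ef candidates with exact distances. It is purely illustrative: real RaBitQ additionally applies a random rotation and residual correction, and the coarse search in Infinity runs over the HNSW graph rather than by brute force.

import numpy as np

def binary_encode(vecs: np.ndarray) -> np.ndarray:
    """1 bit per dimension: keep only the sign of each component."""
    return vecs > 0

def search_with_rerank(query: np.ndarray, vecs: np.ndarray, codes: np.ndarray,
                       ef: int = 200, topk: int = 10) -> np.ndarray:
    """Coarse candidate generation on binary codes, exact re-ranking on float vectors."""
    q_bits = query > 0
    coarse = (codes == q_bits).sum(axis=1)                     # matching bits per stored vector
    candidates = np.argsort(-coarse)[:ef]                      # keep the ef most promising candidates
    exact = np.linalg.norm(vecs[candidates] - query, axis=1)   # exact L2 only for the candidates
    return candidates[np.argsort(exact)[:topk]]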

Compared to LVQ, RaBitQ can further reduce the memory footprint of encoded vectors by nearly 70%. On some datasets, the query efficiency of HnswRabitq even surpasses that of HnswLvq due to the higher efficiency of distance calculations after binary quantization.

However, it should be noted that on certain datasets (such as sift1M), the quantization process may lead to significant precision loss, making such datasets unsuitable for using HnswRabitq.

In summary, if a user's dataset is not sensitive to quantization errors, adopting the HnswRabitq index can significantly reduce memory overhead while still maintaining relatively good query performance.

In such scenarios, it is recommended to prioritize the use of the HnswRabitq index. Users can replicate the aforementioned experiments by setting the benchmark parameter build_type=crabitq.

Solution 3: LSG Graph Construction Strategy

LSG (Local Scaling Graph) is an improved graph construction strategy based on graph indexing algorithms such as HNSW and DiskANN [5].

This strategy scales the distance (e.g., L2 distance, inner product distance, etc.) between any two vectors by statistically analyzing the local information—neighborhood radius—of each vector in the dataset. The scaled distance is referred to as the LS distance.

During the graph indexing construction process, LSG uniformly replaces the original distance metric with the LS distance, effectively performing a "local scaling" of the original metric space. Through theoretical proofs and experiments, the paper demonstrates that constructing a graph index in this scaled space can achieve superior query performance in the original space.

LSG optimizes the HNSW index in multiple ways. When the accuracy requirement is relatively lenient (< 99%), LSG exhibits higher QPS (Queries Per Second) compared to the original HNSW index.

In high-precision scenarios (> 99%), LSG enhances the quality of the graph index, enabling HNSW to surpass its original accuracy limit and achieve retrieval accuracy that is difficult for the original HNSW index to attain. These improvements translate into faster response times and more precise query results for users in real-world applications of RAGFlow.

In Infinity, LSG is provided as an optional parameter for HNSW. Users can enable this graph construction strategy by setting build_type=lsg, and we refer to the corresponding index as HnswLsg.

## Create index
table_obj.create_index(
    "hnsw_index",
    index.IndexInfo(
        "embedding",
        index.IndexType.Hnsw, {
            "m": 16,
            "ef_construction": 200,
            "metric": "l2",
            "build_type": "lsg"
        },
    )
)
## Vector retrieval
query_builder.match_dense('embedding', [1.0, 2.0, 3.0], 'float', 'l2', 10, {'ef': 200})

LSG essentially alters the metric space during the index construction process. Therefore, it can not only be applied to the original HNSW but also be combined with quantization methods (such as LVQ or RaBitQ) to form variant indexes like HnswLvqLsg or HnswRabitqLsg. The usage of the user interface remains consistent with that of HnswLvq and HnswRabitq.

LSG can enhance the performance of the vast majority of graph indexes and datasets, but at the cost of additional computation of local information—neighborhood radius—during the graph construction phase, which thus increases the construction time to a certain extent. For example, on the sift1M dataset, the construction time of HnswLsg is approximately 1.2 times that of the original HNSW.

In summary, if users are not sensitive to index construction time, they can confidently enable the LSG option, as it can steadily improve query performance. Users can replicate the aforementioned experiments by setting the benchmark parameter to build_type=[lsg/lvq_lsg/crabitq_lsg].

Index Performance Evaluation

To evaluate the performance of various indexes in Infinity, we selected three representative datasets as benchmarks, including the widely used sift and gist datasets in vector index evaluations.

Given that Infinity is frequently used in conjunction with RAGFlow in current scenarios, the retrieval effectiveness on RAG-type datasets is particularly crucial for users assessing index performance.

Therefore, we also incorporated the msmarco dataset. This dataset was generated by encoding the TREC-RAG 2024 corpus using the Cohere Embed English v3 model, comprising embedded vectors for 113.5 million text passages, as well as embedded vectors corresponding to 1,677 query instructions from TREC-Deep Learning 2021-2023.

From the test results of each dataset, it can be observed that in most cases, HnswRabitqLsg achieves the best overall performance. For instance, on the msmarco dataset in RAG scenarios, RaBitQ achieves a 90% reduction in memory usage while delivering query performance that is 5 times that of the original HNSW at a 99% recall rate.

Based on the above experimental results, we offer the following practical recommendations for Infinity users:

  1. The original HNSW can attain a higher accuracy ceiling compared to HnswLvq and HnswRabitq. If users have extremely high accuracy requirements, this strategy should be prioritized.
  2. Within the allowable accuracy range, HnswLvq can be confidently selected for most datasets. For datasets that are less susceptible to quantization effects, HnswRabitq is generally a better choice.
  3. The LSG strategy enhances performance across all index variants. If users are not sensitive to index construction time, it is recommended to enable this option in all scenarios to improve query efficiency. Additionally, due to its algorithmic characteristics, LSG can significantly raise the accuracy ceiling. Therefore, if the usage scenario demands extremely high accuracy (>99.9%), enabling LSG is strongly recommended to optimize index performance.

Infinity continues to iterate and improve. We welcome ongoing attention and valuable feedback and suggestions from everyone.

· 10 min read

RAGFlow 0.21.0 officially released

This release shifts focus from enhancing online Agent capabilities to strengthening the data foundation, prioritising usability and dialogue quality from the ground up. Directly addressing common RAG pain points—from data preparation to long-document understanding—version 0.21.0 brings crucial upgrades: a flexible, orchestratable Ingestion Pipeline, long-context RAG to close semantic gaps in complex files, and a new admin CLI for smoother operations. Taken together, these elements establish RAGFlow’s refreshed data-pipeline core, providing a more solid foundation for building robust and effective RAG applications.

Orchestratable Ingestion Pipeline

If earlier Agents primarily tackled the orchestration of online data—as seen in Workflow and Agentic Workflow—the Ingestion Pipeline mirrors this capability by applying the same technical architecture to orchestrate offline data ingestion. Its introduction enables users to construct highly customized RAG data pipelines within a unified framework. This not only streamlines bespoke development but also more fully embodies the "Flow" in RAGFlow.

A typical RAG ingestion process involves key stages such as document parsing, text chunking, vectorization, and index building. When RAGFlow first launched in April 2024, it already incorporated an advanced toolchain, including the DeepDoc-based parsing engine and a templated chunking mechanism. These state-of-the-art solutions were foundational to its early adoption.

However, with rapid industry evolution and deeper practical application, we have observed new trends and demands:

  • The rise of Vision Language Models (VLMs): Increasingly mature VLMs have driven a wave of fine-tuned document parsing models. These offer significantly improved accuracy for unstructured documents with complex layouts or mixed text and images.
  • Demand for flexible chunking: Users now seek more customized chunking strategies. Faced with diverse knowledge-base scenarios, RAGFlow's original built-in chunking templates have proved insufficient for covering all niche cases, which can impact the accuracy of final Q&A outcomes.

To this end, RAGFlow 0.21.0 formally introduces the Ingestion Pipeline, featuring core capabilities including:

  • Orchestratable Data Ingestion: Building on the underlying Agent framework, users can create varied data ingestion pipelines. Each pipeline may apply different strategies to connect a data source to the final index, turning the previous built-in data-writing process into a user-customizable workflow. This provides more flexible ingestion aligned with specific business logic.
  • Decoupling of Upload and Cleansing: The architecture separates data upload from cleansing, establishing standard interfaces for future batch data sources and a solid foundation for expanding data preprocessing workflows.
  • Refactored Parser: The Parser component has been redesigned for extensibility, laying groundwork for integrating advanced document-parsing models beyond DeepDoc.
  • Customizable Chunking Interface: By decoupling the chunking step, users can plug in custom chunkers to better suit the segmentation needs of different knowledge structures.
  • Optimized Efficiency for Complex RAG: The execution of IO/compute-intensive tasks, such as GraphRAG and RAPTOR, has been overhauled. In the pre-pipeline architecture, processing each new document triggered a full compute cycle, resulting in slow performance. The new pipeline enables batch execution, significantly improving data throughput and overall efficiency.

If ETL/ELT represents the standard pipeline for processing structured data in the modern data stack—with tools like dbt and Fivetran providing unified and flexible data integration solutions for data warehouses and data lakes—then RAGFlow's Ingestion Pipeline is positioned to become the equivalent infrastructure for unstructured data. The following diagram illustrates this architectural analogy:

Image

Specifically, while the Extract phase in ETL/ELT is responsible for pulling data from diverse sources, the RAGFlow Ingestion Pipeline augments this with a dedicated Parsing stage to extract information from unstructured data. This stage integrates multiple parsing models, led by DeepDoc, to convert multimodal documents (for example, text and images) into a unimodal representation suitable for processing.

In the Transform phase, where traditional ETL/ELT focuses on data cleansing and business logic, RAGFlow instead constructs a series of LLM-centric Agent components. These are optimized to address semantic gaps in retrieval, with a core mission that can be summarized as: to enhance recall and ranking accuracy.

For data loading, ETL/ELT writes results to a data warehouse or data lake, while RAGFlow uses an Indexer component to build the processed content into a retrieval-optimised index format. This reflects the RAG engine’s hybrid retrieval architecture, which must support full-text, vector, and future tensor-based retrieval to ensure optimal recall.

Thus, the modern data stack serves business analytics for structured data, whereas a RAG engine with an Ingestion Pipeline specializes in the intelligent retrieval of unstructured data—providing high-quality context for LLMs. Each occupies an equivalent ecological niche in its domain.

Regarding processing structured data, this is not the RAG engine’s core duty. It is handled by a Context Layer built atop the engine. This layer leverages the MCP (Model Context Protocol)—described as “TCP/IP for the AI era”—and accompanying Context Engineering to automate the population of all context types. This is a key focus area for RAGFlow’s next development phase.

Below is a preliminary look at the Ingestion Pipeline in v0.21.0; a more detailed guide will follow. We have introduced components for parsing, chunking, and other unstructured data processing tasks into the Agent Canvas, enabling users to freely orchestrate their parsing workflows.

Orchestrating an Ingestion Pipeline automates the process of parsing files and chunking them by length. It then leverages a large language model to generate summaries, keywords, questions, and even metadata. Previously, this metadata had to be entered manually. Now, a single configuration dramatically reduces maintenance overhead.

Furthermore, the pipeline process is fully observable, recording and displaying complete processing logs for each file.

Image

The implementation of the Ingestion Pipeline in version 0.21.0 is a foundational step. In the next release, we plan to significantly enhance it by:

  • Adding support for more data sources.
  • Providing a wider selection of Parsers.
  • Introducing more flexible Transformer components to facilitate orchestration of a richer set of semantic-enhancement templates.

Long-context RAG

As we enter 2025, Retrieval-Augmented Generation (RAG) faces notable challenges driven by two main factors.

Fundamental limitations of traditional RAG

Traditional RAG architectures often fail to guarantee strong dialogue performance because they rely on a retrieval mechanism built around text chunks as the primary unit. This makes them highly sensitive to chunk quality and can yield degraded results due to insufficient context. For example:

  • If a coherent semantic unit is split across chunks, retrieval can be incomplete.
  • If a chunk lacks global context, the information presented to the LLM is weakened.

While strategies such as automatically detecting section headers and attaching them to chunks can help with global semantics, they are constrained by header-identification accuracy and the header’s own completeness.

Cost-efficiency concerns with advanced pre-processing techniques

Modern pre-processing methods—GraphRAG, RAPTOR, and Context Retrieval—aim to inject additional semantic information into raw data to boost search hit rates and accuracy for complex queries. They, however, share issues of high cost and unpredictable effectiveness.

  • GraphRAG: This approach often consumes many times more tokens than the original text, and the automatically generated knowledge graphs are frequently unsatisfactory. Its effectiveness in complex multi-hop reasoning is limited by uncontrollable reasoning paths. As a supplementary retrieval outside the original chunks, the knowledge graph also loses some granular context from the source.
  • RAPTOR: This technique produces clustered summaries that are recalled as independent chunks but naturally lack the detail of the source text, reintroducing the problem of insufficient context.

Context Retrieval: This method enriches original chunks with extra semantics such as keywords or potential questions. It presents a clear trade-off:

  • The more effective option queries the LLM multiple times per chunk, using both full text and the current chunk for context, improving performance but driving token costs several times higher than the original text.
  • The cheaper option generates semantic information based only on the current chunk, saving costs but providing limited global context and modest performance gains.

The last few years have seen the emergence of new RAG schemes:

  • Complete abandonment of retrieval: some approaches have the LLM read documents directly, splitting them into chunks according to the context window and performing multi-stage searches. First, the LLM decides which global document is relevant, then which chunks, and finally loads those chunks to answer. While this avoids recall inaccuracies, it harms response latency, concurrency, and large-scale data handling, making practical deployment difficult.
  • Abandoning embedding or indexing in favour of tools like grep: this evolves RAG into Agentic RAG. As applications grow more complex and user queries diversify, combining RAG with agents is increasingly inevitable, since only LLMs can translate raw inquiries into structured retrieval commands. In RAGFlow, this capability has long been realized. Abandoning indexing to use grep is a compromise for simplifying agent development in personal or small-scale contexts; in enterprise settings, a powerful retrieval engine remains essential.
  • Long-Context RAG: introduced in version 0.21.0 as part of the same family as GraphRAG, RAPTOR and Context Retrieval, this approach uses LLMs to enrich raw text semantics to boost recall while retaining indexing and search. Retrieval remains central. Long-context RAG mirrors how people consult information: identify relevant chapters via the table of contents, then locate exact pages for detail. During indexing, the LLM extracts and attaches chapter information to each chunk to provide global context; during retrieval, it finds matching chunks and uses the table-of-contents structure to fill in gaps from chunk fragmentation.
  • Current experience and future direction: users can try Long-Context RAG via the “TOC extraction” (Table of Contents) feature, though it is in beta. The next release will add an Ingestion Pipeline. A key path to improving RAG lies in using LLMs to enrich content semantics without discarding retrieval altogether. Consequently, a flexible pipeline that lets users assemble LLM-based content-transformation components is an important direction for enhancing RAG retrieval quality.

Backend management CLI

RAGFlow’s progression has shifted from core module development to strengthening administrative and operational capabilities.

  • In earlier versions, while parsing and retrieval-augmented generation improved, system administration lagged. Administrators could not modify passwords or delete accounts, complicating deployment and maintenance.
  • With RAGFlow 0.21.0, fundamental system management is markedly improved. A new command-line administration tool provides a central, convenient interface for administrators. Core capabilities include:
    • Service lifecycle management: monitoring built-in RAGFlow services for greater operational flexibility.
    • Comprehensive user management:
      • Create new registered users.
      • Directly modify login passwords.
      • Delete user accounts.
      • Enable or disable accounts.
      • View details of all registered users.
    • Resource overview: listing knowledge bases and Agents created under registered users for system-wide monitoring.

This upgrade underlines RAGFlow’s commitment to robust functionality and foundational administrative strength essential for enterprise use. Looking ahead, the team plans an enterprise-grade web administration panel and accompanying user interface to streamline management, boost efficiency, and enhance the end-user experience, supporting greater maturity and stability.

Finale

RAGFlow 0.21.0 marks a significant milestone, building on prior progress and outlining future developments. It introduces the first integration of Retrieval (RAG) with orchestration (Flow), forming an intelligent engine to support the LLM context layer, underpinned by unstructured data ELT and a robust RAG capability set.

From the user-empowered Ingestion Pipeline to long-context RAG that mitigates semantic fragmentation, and the management backend that ensures reliable operation, every new feature is designed to make the RAG system smarter, more flexible, and enterprise-ready. This is not merely a feature tally but an architectural evolution, establishing a solid foundation for future growth.

Our ongoing focus remains the LLM context layer: building a powerful, reliable data foundation for LLMs and effectively serving all Agents. This remains RAGFlow’s core aim.

We invite you to continue following and starring our project as we grow together.

GitHub: https://github.com/infiniflow/ragflow

· 7 min read

Currently, e-commerce retail platforms extensively use intelligent customer service systems to manage a wide range of user enquiries. However, traditional intelligent customer service often struggles to meet users’ increasingly complex and varied needs. For example, customers may require detailed comparisons of functionalities between different product models before making a purchase; they might be unable to use certain features due to losing the instruction manual; or, in the case of home products, they may need to arrange an on-site installation appointment through customer service.

To address these challenges, we have identified several common demand scenarios, including queries about functional differences between product models, requests for usage assistance, and scheduling of on-site installation services. Building on the recently launched Agent framework of RAGFlow, this blog presents an approach for the automatic identification and branch-specific handling of user enquiries, achieved by integrating workflow orchestration with large language models.

The workflow is orchestrated as follows:

Image

The following sections offer a detailed explanation of the implementation process for this solution.

1. Prepare datasets

1.1 Create datasets

You can download the sample datasets from Hugging Face Datasets.

Create the "Product Information" and "User Guide" knowledge bases and upload the relevant dataset documents.

1.2 Parse documents

For documents in the 'Product Information' and 'User Guide' knowledge bases, we choose to use Manual chunking.

Image

Product manuals are often richly illustrated with a combination of text and images, containing extensive information and complex structures. Relying solely on text length for segmentation risks compromising the integrity of the content. RAGFlow assumes such documents follow a hierarchical structure and therefore uses the "smallest heading" as the basic unit of segmentation, ensuring each section of text and its accompanying graphics remain intact within a single chunk. A preview of the user manual following segmentation is shown below:

Image

2. Build workflow

2.1 Create an app

Upon successful creation, the system will automatically generate a Begin component on the canvas.

Image

In the Begin component, the opening greeting message for customer service can be configured, for example:

Hi! I'm your assistant. 

Image

2.2 Add a Categorize component

The Categorize component uses a Large Language Model (LLM) for intent recognition. It classifies user inputs and routes them to the appropriate processing workflows based on the category’s name, description, and provided examples.

Image

2.3 Build a product feature comparison workflow

The Retrieval component connects to the "Product Information" knowledge base to fetch content relevant to the user’s query, which is then passed to the Agent component to generate a response.

Image

Add a Retrieval component named "Feature Comparison Knowledge Base" and link it to the "Product Information" knowledge base.

Image

Add an Agent component after the Retrieval component, name it "Feature Comparison Agent," and configure the System Prompt as follows:

## Role
You are a product specification comparison assistant.
## Goal
Help the user compare two or more products based on their features and specifications. Provide clear, accurate, and concise comparisons to assist the user in making an informed decision.
---
## Instructions
- Start by confirming the product models or options the user wants to compare.
- If the user has not specified the models, politely ask for them.
- Present the comparison in a structured way (e.g., bullet points or a table format if supported).
- Highlight key differences such as size, capacity, performance, energy efficiency, and price if available.
- Maintain a neutral and professional tone without suggesting unnecessary upselling.
---

Configure User prompt

User's query is /(Begin Input) sys.query 

Schema is /(Feature Comparison Knowledge Base) formalized_content

After configuring the Agent component, the result is as follows:

Image

2.4 Build a product user guide workflow

The Retrieval component queries the "User Guide" knowledge base for content relevant to the user’s question, then passes the results to the Agent component to formulate a response.

Image

Add a Retrieval component named "Usage Guide Knowledge Base" and link it to the "User Guide" knowledge base.

Image

Add an Agent component after the Retrieval component, name it "Usage Guide Agent," and configure its System Prompt as follows:

## Role
You are a product usage guide assistant.
## Goal
Provide clear, step-by-step instructions to help the user set up, operate, and maintain their product. Answer questions about functions, settings, and troubleshooting.
---
## Instructions
- If the user asks about setup, provide easy-to-follow installation or configuration steps.
- If the user asks about a feature, explain its purpose and how to activate it.
- For troubleshooting, suggest common solutions first, then guide through advanced checks if needed.
- Keep the response simple, clear, and actionable for a non-technical user.
---

Write user prompt

User's query is /(Begin Input) sys.query 

Schema is /(Usage Guide Knowledge Base) formalized_content

After configuring the Agent component, the result is as follows:

Image

2.5 Build an installation booking assistant

The Agent engages in a multi-turn dialogue with the user to collect three key pieces of information: contact number, installation time, and installation address. Create an Agent component named "Installation Booking Agent" and configure its System Prompt as follows:

# Role
You are an Installation Booking Assistant.
## Goal
Collect the following three pieces of information from the user
1. Contact Number
2. Preferred Installation Time
3. Installation Address
Once all three are collected, confirm the information and inform the user that a technician will contact them later by phone.
## Instructions
1. **Check if all three details** (Contact Number, Preferred Installation Time, Installation Address) have been provided.
2. **If some details are missing**, acknowledge the ones provided and only ask for the missing information.
3. Do **not repeat** the full request once some details are already known.
4. Once all three details are collected, summarize and confirm them with the user.

Write user prompt

User's query is /(Begin Input) sys.query 

After configuring the Agent component, the result is as follows:

Image

If user information needs to be registered, an HTTP Request component can be connected after the Agent component to transmit the data to platforms such as Google Sheets or Notion. Developers may implement this according to their specific requirements; this blog article does not cover implementation details.
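
While full integration details are out of scope here, the rough shape of such a request might look like the sketch below; the endpoint URL and field names are placeholders, not a RAGFlow or Google/Notion API.

import requests

booking = {
    "contact_number": "+1-555-0100",              # placeholder values for illustration
    "installation_time": "2025-11-03 10:00",
    "installation_address": "123 Example Street",
}

# Hypothetical webhook; point this at your own Google Sheets / Notion integration.
resp = requests.post("https://example.com/webhooks/installation-booking",
                     json=booking, timeout=10)
resp.raise_for_status()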

Image

2.6 Add a reply message component

For these three workflows, a single Message component is used to receive the output from the Agent components, which then displays the processed results to the user.

Image

2.7 Save and test

Click Save → Run → View Execution Result. When inquiring about product models and features, the system correctly returns a comparison:

Image

When asked about usage instructions, the system provides accurate guidance:

Image

When scheduling an installation, the system collects and confirms all necessary information:

Image

Summary

This use case can also be implemented using an Agent-based workflow, which offers the advantage of flexibly handling complex problems. However, since Agents actively engage in planning and reflection, they often significantly increase response times, leading to a diminished customer experience. As such, this approach is not well suited to scenarios like e-commerce after-sales customer service, where high responsiveness and relatively straightforward tasks are required. For applications involving complex issues, we have previously shared the Deep Research multi-agent framework. Related templates are available in our template library.

Image

The customer service workflow presented in this article is designed for e-commerce, yet this domain offers many more scenarios suitable for workflow automation—such as user review analysis and personalized email campaigns—which have not been covered here. By following the practical guidelines provided, you can also easily adapt this approach to other customer service contexts. We encourage you to build such applications using RAGFlow. Reinventing customer service with large language models moves support beyond “mechanical responses,” elevating capabilities from mere “retrieval and matching” to “cognitive reasoning.” Through deep understanding and real-time knowledge generation, it delivers an unprecedented experience that truly “understands human language,” thereby redefining the upper limits of intelligent service and transforming support into a core value engine for businesses.