Version: v0.17.1

Start AI chat

Initiate an AI-powered chat with a configured chat assistant.

Knowledge base, hallucination-free chat, and file management are the three pillars of RAGFlow. Chats in RAGFlow are based on one or more knowledge bases. Once you have created your knowledge base, finished file parsing, and run a retrieval test, you can go ahead and start an AI conversation.

Start an AI chat

You start an AI conversation by creating an assistant.

Click the Chat tab at the top center of the page > Create an assistant to show the Chat Configuration dialogue for your next dialogue.

RAGFlow offers you the flexibility of choosing a different chat model for each dialogue, while allowing you to set the default models in System Model Settings.
Update Assistant Setting:
- Assistant name is the name of your chat assistant. Each assistant corresponds to a dialogue with a unique combination of knowledge bases, prompts, hybrid search configurations, and large model settings.
- Empty response:
  - If you wish to confine RAGFlow's answers to your knowledge bases, enter a response here. Then, whenever RAGFlow fails to retrieve an answer, it uniformly responds with what you set here.
  - If you wish RAGFlow to improvise when it doesn't retrieve an answer from your knowledge bases, leave it blank. Note that this may give rise to hallucinations.
- Show quote: This is a key feature of RAGFlow and enabled by default. RAGFlow does not work like a black box. Instead, it clearly shows the sources of information that its responses are based on.
- Select the corresponding knowledge bases. You can select one or multiple knowledge bases, but ensure that they all use the same embedding model; otherwise, an error occurs.
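The Empty response and knowledge base rules above can be sketched in a few lines of Python. This is an illustration of the described behavior only, not RAGFlow's implementation; the function names `respond` and `check_same_embedding_model` and the `generate` callback are hypothetical.

```python
from typing import Callable, Optional, Sequence


def respond(
    question: str,
    chunks: Sequence[str],
    empty_response: Optional[str],
    generate: Callable[[str, Sequence[str]], str],
) -> str:
    """Return the fixed Empty response when nothing was retrieved, if one is set."""
    if not chunks and empty_response:
        # Empty response is configured: answer uniformly and skip the LLM.
        return empty_response
    # Either chunks were retrieved, or Empty response is blank and the LLM
    # is allowed to improvise (which may give rise to hallucinations).
    return generate(question, chunks)


def check_same_embedding_model(embedding_models: Sequence[str]) -> None:
    """Raise if the selected knowledge bases mix embedding models."""
    distinct = sorted(set(embedding_models))
    if len(distinct) > 1:
        raise ValueError(f"Knowledge bases use different embedding models: {distinct}")


# Hypothetical stand-in for the chat model call.
def fake_llm(question: str, chunks: Sequence[str]) -> str:
    return f"LLM answer based on {len(chunks)} chunk(s)"


print(respond("What is RAGFlow?", [], "Sorry, nothing relevant was found.", fake_llm))
print(respond("What is RAGFlow?", [], "", fake_llm))
```

The first call returns the configured fallback text; the second falls through to the model because Empty response is blank.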
Update Prompt Engine:
- In System, you fill in the prompts for your LLM. You can also leave the default prompt as-is to begin with.
- Similarity threshold sets the similarity "bar" for each chunk of text. The default is 0.2. Text chunks with similarity scores below this threshold are filtered out of the final response.
- Keyword similarity weight is set to 0.7 by default. RAGFlow uses a hybrid score system to evaluate the relevance of different text chunks. This value sets the weight assigned to the keyword similarity component in the hybrid score.
  - If Rerank model is left empty, the hybrid score system uses keyword similarity and vector similarity, and the default weight assigned to the vector similarity component is 1-0.7=0.3.
  - If Rerank model is selected, the hybrid score system uses keyword similarity and reranker score, and the default weight assigned to the reranker score is 1-0.7=0.3.
- Top N determines the maximum number of chunks to feed to the LLM. In other words, even if more chunks are retrieved, only the top N chunks are provided as input.
- Multi-turn optimization enhances user queries using existing context in a multi-round conversation. It is enabled by default. When enabled, it will consume additional LLM tokens and significantly increase the time to generate answers.
- Rerank model sets the reranker model to use. It is left empty by default.
  - If Rerank model is left empty, the hybrid score system uses keyword similarity and vector similarity, and the default weight assigned to the vector similarity component is 1-0.7=0.3.
  - If Rerank model is selected, the hybrid score system uses keyword similarity and reranker score, and the default weight assigned to the reranker score is 1-0.7=0.3.
- Variable refers to the variables (keys) to be used in the system prompt. {knowledge} is a reserved variable. Click Add to add more variables for the system prompt.
  - If you are uncertain about the logic behind Variable, leave it as-is.
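To make the Prompt Engine settings concrete, here is a minimal Python sketch of how they could fit together: compute the hybrid score, drop chunks below the similarity threshold, keep the top N, and substitute the result for {knowledge}. It is not RAGFlow's actual code. The `Chunk` class and all function names are hypothetical, and applying the threshold to the hybrid score is an assumption. For example, with the default weight a chunk with keyword similarity 0.5 and vector similarity 0.9 scores 0.7 × 0.5 + 0.3 × 0.9 = 0.62.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    name: str
    text: str
    keyword_similarity: float
    # Vector similarity when Rerank model is empty;
    # reranker score when a rerank model is selected.
    secondary_score: float


def hybrid_score(chunk: Chunk, keyword_weight: float = 0.7) -> float:
    """Weighted sum: keyword part gets keyword_weight, the rest gets 1 - keyword_weight."""
    return (keyword_weight * chunk.keyword_similarity
            + (1.0 - keyword_weight) * chunk.secondary_score)


def select_chunks(
    chunks: List[Chunk],
    top_n: int,
    similarity_threshold: float = 0.2,
    keyword_weight: float = 0.7,
) -> List[Chunk]:
    """Drop chunks below the threshold (assumed to apply to the hybrid score),
    then keep at most top_n of the highest-scoring ones."""
    scored = [(hybrid_score(c, keyword_weight), c) for c in chunks]
    kept = [(s, c) for s, c in scored if s >= similarity_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept[:top_n]]


def fill_system_prompt(template: str, chunks: List[Chunk]) -> str:
    """Substitute the reserved {knowledge} variable with the selected chunk texts."""
    return template.replace("{knowledge}", "\n".join(c.text for c in chunks))


chunks = [
    Chunk("A", "text of chunk A", keyword_similarity=0.5, secondary_score=0.9),
    Chunk("B", "text of chunk B", keyword_similarity=0.1, secondary_score=0.3),
    Chunk("C", "text of chunk C", keyword_similarity=0.9, secondary_score=0.4),
    Chunk("D", "text of chunk D", keyword_similarity=0.3, secondary_score=0.5),
]

selected = select_chunks(chunks, top_n=2)
print([c.name for c in selected])  # ['C', 'A']
print(fill_system_prompt("Answer using only this knowledge:\n{knowledge}", selected))
```

Chunk B scores about 0.16 and falls below the 0.2 bar; chunk D passes the bar but is cut by Top N, so only C and A reach the prompt.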
Update Model Setting:
- In Model, you select the chat model. Though you have selected the default chat model in System Model Settings, RAGFlow allows you to choose an alternative chat model for your dialogue.
- Preset configurations refers to how much the LLM improvises. Each preset (Improvise, Precise, or Balance) corresponds to a unique combination of Temperature, Top P, Presence penalty, and Frequency penalty.
- Temperature: The level of randomness in the LLM's predictions. The higher the value, the more creative the LLM is.
- Top P is also known as "nucleus sampling": the LLM samples only from the smallest set of candidate tokens whose cumulative probability reaches P.
- Max Tokens: The maximum length of the LLM's responses. Note that the responses may be curtailed if this value is set too low.
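For readers unfamiliar with Temperature and Top P, the following sketch shows the general technique on a toy vocabulary. It illustrates the standard definitions of temperature scaling and nucleus sampling, not the internals of RAGFlow or of any particular model provider; the function names and the example logits are made up for illustration.

```python
import math
from typing import Dict


def apply_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Softmax over logits divided by temperature.

    Lower temperature sharpens the distribution; higher temperature flattens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the smallest set of most-probable tokens whose cumulative
    probability reaches top_p, then renormalize."""
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


logits = {"cat": 2.0, "dog": 1.0, "fish": 0.0}  # toy values
precise = apply_temperature(logits, temperature=0.5)
creative = apply_temperature(logits, temperature=1.5)
print(precise["cat"] > creative["cat"])  # True: low temperature favors the top token

print(sorted(top_p_filter({"a": 0.5, "b": 0.3, "c": 0.2}, top_p=0.7)))  # ['a', 'b']
```

A low temperature combined with a low Top P keeps the model close to its most likely continuation, while higher values widen the pool of tokens it may choose from.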
Now, let's start the show:

NOTE

Click the light bulb icon above the answer to view the expanded system prompt:

The light bulb icon is available only for the current dialogue.

Scroll down the expanded prompt to view the time consumed for each task:

Update settings of an existing chat assistant

Hover over the chat assistant you want to update > Edit to show the chat configuration dialogue.

Integrate chat capabilities into your application or webpage

RAGFlow offers HTTP and Python APIs for you to integrate RAGFlow's capabilities into your applications. Read the HTTP API and Python API references for more information.
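As a rough sketch of what an HTTP integration might look like, the snippet below builds (but does not send) a request that asks a chat assistant a question. The endpoint path `/api/v1/chats/{chat_id}/completions`, the body fields, and the bearer-token header are assumptions that should be checked against the HTTP API reference for your RAGFlow version; the host, API key, and chat ID are placeholders.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(
    base_url: str,
    api_key: str,
    chat_id: str,
    question: str,
    session_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST request for a chat assistant (nothing is sent here)."""
    # ASSUMPTION: endpoint path and field names; verify in the HTTP API reference.
    url = f"{base_url.rstrip('/')}/api/v1/chats/{chat_id}/completions"
    body = {"question": question, "stream": False}
    if session_id:
        body["session_id"] = session_id
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # API key acquired beforehand
        },
        method="POST",
    )


# Placeholder values for illustration only.
request = build_chat_request(
    base_url="http://localhost:9380",
    api_key="YOUR_API_KEY",
    chat_id="YOUR_CHAT_ID",
    question="What does the knowledge base say about onboarding?",
)
print(request.get_method(), request.full_url)
```

To actually send the request against a running server, pass it to `urllib.request.urlopen(request)` and decode the JSON response.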

You can use an iframe to embed the created chat assistant into a third-party webpage:

Before proceeding, you must acquire an API key; otherwise, an error message appears.
Hover over the intended chat assistant > Edit to show the iframe window.
Copy the iframe and embed it into a specific location on your webpage.
