Start AI chat
Initiate an AI-powered chat with a configured chat assistant.
Knowledge base, hallucination-free chat, and file management are the three pillars of RAGFlow. Chats in RAGFlow are based on a particular knowledge base or multiple knowledge bases. Once you have created your knowledge base, finished file parsing, and run a retrieval test, you can go ahead and start an AI conversation.
Start an AI chat
You start an AI conversation by creating an assistant.
- Click the Chat tab at the top center of the page > Create an assistant to open the Chat Configuration dialogue for your next dialogue.
RAGFlow offers you the flexibility of choosing a different chat model for each dialogue, while allowing you to set the default models in System Model Settings.
- Update Assistant Setting:
  - Assistant name is the name of your chat assistant. Each assistant corresponds to a dialogue with a unique combination of knowledge bases, prompts, hybrid search configurations, and large model settings.
  - Empty response:
    - If you wish to confine RAGFlow's answers to your knowledge bases, enter a response here. Then, whenever RAGFlow fails to retrieve an answer, it replies with exactly what you set here.
    - If you wish RAGFlow to improvise when it doesn't retrieve an answer from your knowledge bases, leave it blank. Note that this may give rise to hallucinations.
  - Show quote: This is a key feature of RAGFlow and is enabled by default. RAGFlow does not work like a black box. Instead, it clearly shows the sources of information that its responses are based on.
  - Select the corresponding knowledge bases. You can select one or multiple knowledge bases, but ensure that they all use the same embedding model; otherwise, an error occurs.
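The two rules above (the Empty response fallback and the same-embedding-model requirement) can be sketched as follows. This is an illustration of the described behavior only, not RAGFlow's code: the function names and the `embedding_model` dictionary key are hypothetical.

```python
from typing import Dict, List, Optional


def check_same_embedding_model(knowledge_bases: List[Dict[str, str]]) -> None:
    """Raise if the selected knowledge bases do not share one embedding model.

    Each knowledge base is a dict with a hypothetical "embedding_model" key.
    """
    models = {kb["embedding_model"] for kb in knowledge_bases}
    if len(models) > 1:
        raise ValueError(
            "Selected knowledge bases use different embedding models: "
            + ", ".join(sorted(models))
        )


def reply_when_nothing_retrieved(empty_response: str) -> Optional[str]:
    """Decide what happens when retrieval finds no answer.

    Returns the configured fixed reply, or None when Empty response is blank,
    meaning the LLM is left to improvise (and may hallucinate).
    """
    return empty_response if empty_response.strip() else None
```

A non-blank Empty response therefore acts as a guardrail: the assistant gives the same fixed reply whenever retrieval comes back empty.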
- Update Prompt Engine:
  - In System, fill in the prompts for your LLM. You can also leave the default prompt as-is to begin with.
  - Similarity threshold sets the similarity "bar" for each chunk of text. The default is 0.2. Text chunks with lower similarity scores are filtered out of the final response.
  - Keyword similarity weight is set to 0.7 by default. RAGFlow uses a hybrid score system to evaluate the relevance of different text chunks. This value sets the weight assigned to the keyword similarity component in the hybrid score.
    - If Rerank model is left empty, the hybrid score system uses keyword similarity and vector similarity, and the default weight assigned to the vector similarity component is 1 - 0.7 = 0.3.
    - If Rerank model is selected, the hybrid score system uses keyword similarity and reranker score, and the default weight assigned to the reranker score is 1 - 0.7 = 0.3.
  - Top N determines the maximum number of chunks to feed to the LLM. In other words, even if more chunks are retrieved, only the top N chunks are provided as input.
  - Multi-turn optimization enhances user queries using existing context in a multi-round conversation. It is enabled by default. When enabled, it consumes additional LLM tokens and significantly increases the time to generate answers.
  - Rerank model sets the reranker model to use. It is left empty by default. Whether a rerank model is selected determines the second component of the hybrid score, as described under Keyword similarity weight.
  - Variable refers to the variables (keys) to be used in the system prompt. {knowledge} is a reserved variable. Click Add to add more variables for the system prompt.
    - If you are uncertain about the logic behind Variable, leave it as-is.
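To make the interaction between Similarity threshold, Keyword similarity weight, and Top N concrete, here is a minimal sketch. It assumes that the threshold is applied to the hybrid score and that both similarity components lie in the range 0 to 1. The `Chunk` class and `select_chunks` function are hypothetical names for illustration, not RAGFlow's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    text: str
    keyword_sim: float   # keyword similarity, assumed to be in [0, 1]
    semantic_sim: float  # vector similarity, or reranker score if a rerank model is selected


def select_chunks(
    chunks: List[Chunk],
    top_n: int,
    similarity_threshold: float = 0.2,
    keyword_weight: float = 0.7,
) -> List[Tuple[float, Chunk]]:
    """Score chunks with a weighted hybrid score, filter, and keep the top N."""
    scored = []
    for chunk in chunks:
        # The second component gets the remaining weight: 1 - 0.7 = 0.3 by default.
        score = (keyword_weight * chunk.keyword_sim
                 + (1 - keyword_weight) * chunk.semantic_sim)
        if score >= similarity_threshold:  # assumption: threshold applies to the hybrid score
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


chunks = [Chunk("a", 0.9, 0.5), Chunk("b", 0.1, 0.2), Chunk("c", 0.4, 0.8)]
for score, chunk in select_chunks(chunks, top_n=2):
    print(f"{chunk.text} {score:.2f}")  # prints "a 0.78", then "c 0.52"
```

Chunk "b" scores 0.7 × 0.1 + 0.3 × 0.2 = 0.13, which is below the 0.2 threshold, so it is dropped before Top N is applied.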
- Update Model Setting:
  - In Model, select the chat model. Even though you have selected a default chat model in System Model Settings, RAGFlow allows you to choose an alternative chat model for your dialogue.
  - Preset configurations sets how freely the LLM improvises. Each preset (Improvise, Precise, or Balance) corresponds to a unique combination of Temperature, Top P, Presence penalty, and Frequency penalty.
  - Temperature: The level of randomness in the LLM's predictions. The higher the value, the more creative the LLM is.
  - Top P is also known as "nucleus sampling": the LLM samples only from the smallest set of candidate tokens whose cumulative probability reaches P.
  - Max Tokens: The maximum length of the LLM's responses. Note that the responses may be curtailed if this value is set too low.
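For readers unfamiliar with Temperature and Top P, the following generic sketch shows what the two parameters do to a next-token distribution. It is a textbook illustration with hypothetical function names. It does not reflect how RAGFlow or any particular model provider implements sampling.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities; a higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs: List[float], top_p: float) -> List[int]:
    """Return indices of the smallest set of tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


logits = [2.0, 1.0, 0.0]
print(softmax_with_temperature(logits, 0.5))  # sharper: the top token dominates
print(softmax_with_temperature(logits, 2.0))  # flatter: more randomness
print(nucleus([0.5, 0.3, 0.15, 0.05], top_p=0.7))  # [0, 1]
```

A lower Temperature or a lower Top P both narrow the pool of likely tokens, which is why presets aimed at precise answers tend to lower them.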
- Now, let's start the show:
  - Click the light bulb icon above the answer to view the expanded system prompt. The light bulb icon is available only for the current dialogue.
  - Scroll down the expanded prompt to view the time consumed for each task.
Update settings of an existing chat assistant
Hover over the chat assistant you want to change > Edit to open the Chat Configuration dialogue.
Integrate chat capabilities into your application or webpage
RAGFlow offers HTTP and Python APIs for you to integrate RAGFlow's capabilities into your applications. Refer to the HTTP API and Python API references for more information.
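As a rough sketch of an HTTP integration, the snippet below builds (but does not send) a request that asks a chat assistant a question. The endpoint path `/api/v1/chats/{chat_id}/completions`, the `Bearer` authorization header, and the `question`, `stream`, and `session_id` fields are assumptions here; check them against the HTTP API reference for your RAGFlow version. The helper name `build_chat_request` and the example host are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(
    base_url: str,
    api_key: str,
    chat_id: str,
    question: str,
    session_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST request for a chat completion (endpoint and fields are assumed)."""
    url = f"{base_url.rstrip('/')}/api/v1/chats/{chat_id}/completions"
    payload = {"question": question, "stream": False}
    if session_id:
        payload["session_id"] = session_id  # continue an existing conversation
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder values; replace them with your own server address, API key, and assistant ID.
request = build_chat_request("http://ragflow.example", "YOUR_API_KEY", "CHAT_ID", "What is RAGFlow?")
# To actually send it:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before pointing the code at a live server.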
You can use an iframe to embed the created chat assistant into a third-party webpage:
- Before proceeding, you must acquire an API key; otherwise, an error message appears.
- Hover over an intended chat assistant > Edit to show the iframe window.
- Copy the iframe and embed it into a specific location on your webpage.