Accelerate document indexing and question answering
A checklist to speed up document parsing and question answering.
Some settings can consume a significant amount of time. If you often find document parsing and question answering slow, work through the following checklist:
1. Accelerate document indexing
- Use a GPU to reduce embedding time.
- On the configuration page of your knowledge base, toggle off Use RAPTOR to enhance retrieval.
- The Knowledge Graph chunk method (GraphRAG) is time-consuming; select a different chunk method if indexing speed matters.
- Disable Auto-keyword and Auto-question on the configuration page of your knowledge base, as both depend on the LLM.
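To see why Auto-keyword and Auto-question slow indexing down, note that each feature issues roughly one extra LLM request per chunk, so the added cost scales with document size. The sketch below is illustrative only; `count_indexing_llm_calls` is a hypothetical helper, not part of any real API.

```python
# Hypothetical sketch: per-chunk LLM features multiply indexing cost.
# This models the call count only; it is not RAGFlow code.

def count_indexing_llm_calls(num_chunks: int,
                             auto_keyword: bool,
                             auto_question: bool) -> int:
    """Return how many extra LLM calls indexing issues for these chunks."""
    calls_per_chunk = int(auto_keyword) + int(auto_question)
    return num_chunks * calls_per_chunk

# With both features on, a 500-chunk document triggers 1000 extra LLM calls;
# with both off, embedding is the only per-chunk cost.
print(count_indexing_llm_calls(500, True, True))    # 1000
print(count_indexing_llm_calls(500, False, False))  # 0
```

Under this model, disabling both toggles removes every per-chunk LLM round trip, which is why it is listed here as an indexing speed-up.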
2. Accelerate question answering
- In the Prompt Engine tab of your Chat Configuration dialogue, disabling Multi-turn optimization will reduce the time required to get an answer from the LLM.
- In the Prompt Engine tab of your Chat Configuration dialogue, leaving the Rerank model field empty will significantly decrease retrieval time.
- In the Assistant Setting tab of your Chat Configuration dialogue, disabling Keyword analysis will reduce the time to get an answer from the LLM.
- When chatting with your chat assistant, click the light bulb icon above the current dialogue, then scroll down in the popup window to view the time taken for each task:

| Item name | Description |
| --- | --- |
| Total | Total time spent on this conversation round, including chunk retrieval and answer generation. |
| Check LLM | Time to validate the specified LLM. |
| Create retriever | Time to create a chunk retriever. |
| Bind embedding | Time to initialize an embedding model instance. |
| Bind LLM | Time to initialize an LLM instance. |
| Tune question | Time to optimize the user query using the context of the multi-turn conversation. |
| Bind reranker | Time to initialize a reranker model instance for chunk retrieval. |
| Generate keywords | Time to extract keywords from the user query. |
| Retrieval | Time to retrieve the chunks. |
| Generate answer | Time to generate the answer. |
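A breakdown like the one above can be produced by timing each pipeline stage separately and summing the results. The sketch below uses only the Python standard library; the stage names mirror the table, but the `timed` helper and the `sleep` placeholders are illustrative stand-ins, not RAGFlow internals.

```python
# Hypothetical per-stage timing sketch: wrap each stage in a context
# manager, record its duration, and report a breakdown like the popup's.
import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def timed(stage: str):
    """Record the wall-clock duration of the enclosed block under `stage`."""
    start = time.perf_counter()
    yield
    timings[stage] = time.perf_counter() - start

with timed("Tune question"):
    time.sleep(0.01)   # stand-in for the multi-turn query-rewrite LLM call
with timed("Retrieval"):
    time.sleep(0.02)   # stand-in for chunk retrieval (plus optional rerank)
with timed("Generate answer"):
    time.sleep(0.03)   # stand-in for answer generation

timings["Total"] = sum(timings.values())
for stage, seconds in timings.items():
    print(f"{stage}: {seconds * 1000:.1f} ms")
```

Reading the breakdown this way makes the earlier tips concrete: an empty Rerank model field shrinks the Retrieval row, while disabling Multi-turn optimization removes most of the Tune question row.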