Version: DEV

Auto-keyword Auto-question

Use a chat model to generate keywords and questions from the original chunks.


When selecting a chunking method, you can also enable auto-keyword or auto-question generation to improve retrieval hit rates. This feature uses a chat model to produce a specified number of keywords and questions from each created chunk, adding a layer of higher-level information on top of the original content.

NOTE

Enabling this feature increases document indexing time, as all created chunks will be sent to the chat model for keyword or question generation.
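
To get a rough sense of the added indexing cost, the sketch below counts extra chat-model requests. It assumes one request per chunk for each enabled feature, which this page does not confirm; the function name `estimate_extra_llm_calls` is hypothetical and used only for illustration.

```python
def estimate_extra_llm_calls(num_chunks: int, auto_keyword: int, auto_question: int) -> int:
    """Estimate extra chat-model requests made during indexing.

    Assumption (not confirmed by this page): one request per chunk for each
    enabled feature, regardless of how many keywords or questions are requested.
    """
    enabled_features = int(auto_keyword > 0) + int(auto_question > 0)
    return num_chunks * enabled_features


# 1,200 chunks with both features enabled.
print(estimate_extra_llm_calls(1200, auto_keyword=5, auto_question=2))  # 2400
# Both features disabled: no extra requests.
print(estimate_extra_llm_calls(1200, auto_keyword=0, auto_question=0))  # 0
```

Under this assumption, the cost scales with the number of chunks rather than with the slider values, so the chunking method you choose affects indexing time as much as the settings themselves.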

  • Auto-keyword

    • Definition: The number of additional keywords the LLM generates for each chunk. These keywords supply synonyms for content that tokenizes poorly or mixes languages, which improves recall for full-text or hybrid retrieval. They can also be used to correct bad cases (known retrieval failures). Setting the value to 0 disables the feature and can significantly accelerate parsing.
    • Common Values:
      • 0 = disabled;
      • 3–5 = recommended (a chunk of over a thousand characters may need more);
      • 30 = maximum. Note that the marginal benefit decreases as the number increases.
  • Auto-question

    • Definition: The number of potential FAQ-style questions the LLM generates for each chunk, so that retrieval matches align more closely with real user queries (Who/What/Why). It can also be used to correct bad cases (known retrieval failures).
    • Common Values:
      • 0 = disabled;
      • 1–2 = commonly used (a chunk of thousands of characters may need more);
      • 30 = maximum (to avoid generating too many at once).
    • Typical Use Cases: Scenarios requiring FAQ retrieval, such as product manuals, policy documents, etc.
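
The recall effect described above can be illustrated with a toy full-text match. This is a minimal sketch, not the product's retrieval logic: `tokenize` and `matches` are hypothetical helpers, and the keyword list stands in for what a chat model might return with Auto-keyword set to 3.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; a stand-in for a real full-text analyzer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(query: str, chunk_text: str, extra_terms: list[str]) -> bool:
    """True if every query token appears in the chunk text or its attached terms."""
    indexed = tokenize(chunk_text)
    for term in extra_terms:
        indexed |= tokenize(term)
    return tokenize(query) <= indexed


chunk = "Restart the k8s pod after rotating the cert."
# Hypothetical chat-model output for this chunk (illustrative only).
generated_keywords = ["Kubernetes", "certificate", "TLS"]

print(matches("kubernetes certificate", chunk, []))                  # False
print(matches("kubernetes certificate", chunk, generated_keywords))  # True
```

The chunk uses abbreviations ("k8s", "cert") that a user query is unlikely to contain; the generated keywords bridge that gap. Generated questions play a similar role for queries phrased as full sentences.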

Configuration

On the Configuration page of your knowledge base, you will find the Auto-keyword and Auto-question sliders under Page rank.

NOTE

The Auto-keyword and Auto-question values must be integers. If you enter a non-integer value, say 1.7, it is rounded down to the nearest integer, which in this case is 1.
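
The rounding rule in the note can be sketched as follows. The function name `normalize_setting` is hypothetical, and rejecting out-of-range values with an error is an assumption; this page only states the 0–30 range and the round-down behavior.

```python
import math

MAX_VALUE = 30  # upper limit stated on this page


def normalize_setting(value: float) -> int:
    """Round a slider value down to an integer and check the 0-30 range."""
    rounded = math.floor(value)
    # Assumption: out-of-range values are rejected rather than clamped.
    if not 0 <= rounded <= MAX_VALUE:
        raise ValueError(f"value must be between 0 and {MAX_VALUE}, got {value}")
    return rounded


print(normalize_setting(1.7))  # 1
print(normalize_setting(5))    # 5
```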

Best practices

If you are uncertain how to set auto-keyword or auto-question values, here are some best practices gathered from our community:

| Use cases or typical scenarios | Document volume/length | Auto-keyword (0–30) | Auto-question (0–30) |
|---|---|---|---|
| 1. Internal Process Guidance for Employee Handbook | Small, under 10 pages | 0 | 0 |
| 2. Customer Service FAQ Hot Questions | Medium, 10–100 pages | 3–7 | 1–3 |
| 3. Technical Whitepapers: Development Standards, Protocol Explanations | Large, over 100 pages | 2–4 | 1–2 |
| 5. Multi-repository Layered New Documents + Old Archive | Many | Adjust as appropriate | Adjust as appropriate |
| 6. Social Media Comment Pool: Multilingual & Mixed Spelling | Very large volume of short text | 8–12 | 0 |
| 7. Operational Logs for DevOps Troubleshooting | Very large volume of short text | 3–6 | 0 |
| 8. Marketing Asset Library: Multilingual Product Descriptions | Medium | 6–10 | 1–2 |
| 9. Training Courseware / eBooks | Large | 2–5 | 1–2 |
| 10. Maintenance Manual: Equipment Diagrams + Steps | Medium | 3–7 | 1–2 |