Extract table of contents
Extract a table of contents (TOC) from documents to give RAG long-range context and improve retrieval.
During indexing, this technique uses an LLM to extract and generate chapter information, which is added to each chunk so that every chunk carries sufficient global context. At retrieval time, it starts from the chunks matched by search, then uses the TOC structure to supplement them with related chunks that the search missed. This mitigates the problems caused by chunk fragmentation and insufficient context, improving answer quality.
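The two stages described above can be illustrated with a minimal sketch. It is not the product's actual implementation: the `Chunk` type, the `chapter` field, the `add_chapter_context` and `supplement_by_toc` helpers, and the `max_extra` limit are all hypothetical names chosen for illustration, and the chapter labels are assumed to have already been produced by the LLM during indexing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: int
    chapter: str  # hypothetical field: chapter title assigned at indexing time
    text: str


def add_chapter_context(chunk: Chunk) -> str:
    """Indexing side: prefix the chunk text with its chapter title."""
    return f"[{chunk.chapter}] {chunk.text}"


def supplement_by_toc(matched_ids, chunks, max_extra=2):
    """Retrieval side: add unmatched chunks from the chapters of matched chunks.

    For each matched chunk, up to `max_extra` additional chunks from the same
    chapter are selected. The result is returned in document order.
    """
    by_id = {c.chunk_id: c for c in chunks}
    selected = set(matched_ids)
    for cid in matched_ids:
        chapter = by_id[cid].chapter
        extra = 0
        for c in chunks:
            if extra >= max_extra:
                break
            if c.chapter == chapter and c.chunk_id not in selected:
                selected.add(c.chunk_id)
                extra += 1
    return [c for c in chunks if c.chunk_id in selected]


# Illustrative data only.
chunks = [
    Chunk(0, "1 Installation", "Install the package."),
    Chunk(1, "1 Installation", "Set the API key."),
    Chunk(2, "2 Usage", "Call the client."),
    Chunk(3, "2 Usage", "Pass extra options."),
    Chunk(4, "3 FAQ", "Common questions."),
]

# Suppose search matched only chunk 1; the TOC pulls in chunk 0 as well.
context = supplement_by_toc([1], chunks)
prompt_parts = [add_chapter_context(c) for c in context]
```

In this sketch the supplement step is bounded by `max_extra` per matched chunk, since every extra chunk adds tokens to the prompt.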
WARNING
Enabling TOC extraction requires significant memory, computational resources, and tokens.