Python API Reference
A complete reference for RAGFlow's Python APIs. Before proceeding, please ensure you have your RAGFlow API key ready for authentication.
Dataset Management
Create dataset
RAGFlow.create_dataset(
name: str,
avatar: str = "",
description: str = "",
embedding_model: str = "BAAI/bge-zh-v1.5",
language: str = "English",
permission: str = "me",
chunk_method: str = "naive",
parser_config: DataSet.ParserConfig = None
) -> DataSet
Creates a dataset.
Parameters
name: str, Required
The unique name of the dataset to create. It must adhere to the following requirements:
- Permitted characters include:
  - English letters (a-z, A-Z)
  - Digits (0-9)
  - "_" (underscore)
- Must begin with an English letter or underscore.
- Maximum 65,535 characters.
- Case-insensitive.
avatar: str
Base64 encoding of the avatar. Defaults to "".
description: str
A brief description of the dataset to create. Defaults to "".
embedding_model: str
The name of the embedding model to use. Defaults to "BAAI/bge-zh-v1.5".
language: str
The language setting of the dataset to create. Available options:
"English"
(default)"Chinese"
permission: str
Specifies who can access the dataset to create. Available options:
- "me": (Default) Only you can manage the dataset.
- "team": All team members can manage the dataset.
chunk_method: str
The chunking method of the dataset to create. Available options:
- "naive": General (default)
- "manual": Manual
- "qa": Q&A
- "table": Table
- "paper": Paper
- "book": Book
- "laws": Laws
- "presentation": Presentation
- "picture": Picture
- "one": One
- "knowledge_graph": Knowledge Graph
  Ensure your LLM is properly configured on the Settings page before selecting this. Please also note that Knowledge Graph consumes a large number of tokens!
- "email": Email
parser_config: DataSet.ParserConfig
The parser configuration of the dataset. A ParserConfig object's attributes vary based on the selected chunk_method:
- chunk_method="naive": {"chunk_token_num":128,"delimiter":"\\n!?;。;!?","html4excel":False,"layout_recognize":True,"raptor":{"user_raptor":False}}
- chunk_method="qa": {"raptor": {"user_raptor": False}}
- chunk_method="manual": {"raptor": {"user_raptor": False}}
- chunk_method="table": None
- chunk_method="paper": {"raptor": {"user_raptor": False}}
- chunk_method="book": {"raptor": {"user_raptor": False}}
- chunk_method="laws": {"raptor": {"user_raptor": False}}
- chunk_method="picture": None
- chunk_method="presentation": {"raptor": {"user_raptor": False}}
- chunk_method="one": None
- chunk_method="knowledge_graph": {"chunk_token_num":128,"delimiter":"\\n!?;。;!?","entity_types":["organization","person","location","event","time"]}
- chunk_method="email": None
Returns
- Success: A DataSet object.
- Failure: Exception
Examples
from ragflow import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.create_dataset(name="kb_1")
Delete datasets
RAGFlow.delete_datasets(ids: list[str] = None)
Deletes datasets by ID.
Parameters
ids: list[str], Required
The IDs of the datasets to delete. Defaults to None. If it is not specified, all datasets will be deleted.
Returns
- Success: No value is returned.
- Failure: Exception
Examples
rag_object.delete_datasets(ids=["id_1","id_2"])
List datasets
RAGFlow.list_datasets(
page: int = 1,
page_size: int = 30,
orderby: str = "create_time",
desc: bool = True,
id: str = None,
name: str = None
) -> list[DataSet]
Lists datasets.
Parameters
page: int
Specifies the page on which the datasets will be displayed. Defaults to 1.
page_size: int
The number of datasets on each page. Defaults to 30.
orderby: str
The field by which datasets should be sorted. Available options:
- "create_time" (default)
- "update_time"
desc: bool
Indicates whether the retrieved datasets should be sorted in descending order. Defaults to True.
id: str
The ID of the dataset to retrieve. Defaults to None.
name: str
The name of the dataset to retrieve. Defaults to None.
Returns
- Success: A list of DataSet objects.
- Failure: Exception
Examples
List all datasets
for dataset in rag_object.list_datasets():
print(dataset)
Retrieve a dataset by ID
dataset = rag_object.list_datasets(id="id_1")
print(dataset[0])
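Page through all datasets
A sketch that walks every page, assuming an empty result marks the last page:
page = 1
while True:
    batch = rag_object.list_datasets(page=page, page_size=30)
    if not batch:
        break  # no more pages
    for dataset in batch:
        print(dataset.name)
    page += 1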
Update dataset
DataSet.update(update_message: dict)
Updates configurations for the current dataset.
Parameters
update_message: dict[str, str|int], Required
A dictionary representing the attributes to update, with the following keys:
- "name": str The revised name of the dataset.
- "embedding_model": str The updated embedding model name.
  Ensure that "chunk_count" is 0 before updating "embedding_model".
- "chunk_method": str The chunking method for the dataset. Available options:
  - "naive": General
  - "manual": Manual
  - "qa": Q&A
  - "table": Table
  - "paper": Paper
  - "book": Book
  - "laws": Laws
  - "presentation": Presentation
  - "picture": Picture
  - "one": One
  - "email": Email
  - "knowledge_graph": Knowledge Graph
    Ensure your LLM is properly configured on the Settings page before selecting this. Please also note that Knowledge Graph consumes a large number of tokens!
Returns
- Success: No value is returned.
- Failure: Exception
Examples
from ragflow import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(name="kb_name")[0]
dataset.update({"embedding_model":"BAAI/bge-zh-v1.5", "chunk_method":"manual"})
File Management within Dataset
Upload documents
DataSet.upload_documents(document_list: list[dict])
Uploads documents to the current dataset.
Parameters
document_list: list[dict], Required
A list of dictionaries representing the documents to upload, each containing the following keys:
- "display_name": (Optional) The file name to display in the dataset.
- "blob": (Optional) The binary content of the file to upload.
Returns
- Success: No value is returned.
- Failure: Exception
Examples
dataset = rag_object.create_dataset(name="kb_name")
dataset.upload_documents([{"display_name": "1.txt", "blob": "<BINARY_CONTENT_OF_THE_DOC>"}, {"display_name": "2.pdf", "blob": "<BINARY_CONTENT_OF_THE_DOC>"}])
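In practice, "blob" is usually read from a local file rather than given inline:
# Read the file in binary mode and upload it under its display name.
with open("1.txt", "rb") as f:
    dataset.upload_documents([{"display_name": "1.txt", "blob": f.read()}])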
Update document
Document.update(update_message: dict)
Updates configurations for the current document.
Parameters
update_message: dict[str, str|dict[]], Required
A dictionary representing the attributes to update, with the following keys:
- "display_name": str The name of the document to update.
- "chunk_method": str The parsing method to apply to the document. Available options:
  - "naive": General
  - "manual": Manual
  - "qa": Q&A
  - "table": Table
  - "paper": Paper
  - "book": Book
  - "laws": Laws
  - "presentation": Presentation
  - "picture": Picture
  - "one": One
  - "knowledge_graph": Knowledge Graph
    Ensure your LLM is properly configured on the Settings page before selecting this. Please also note that Knowledge Graph consumes a large number of tokens!
  - "email": Email
- "parser_config": dict[str, Any] The parsing configuration for the document. Its attributes vary based on the selected "chunk_method":
  - "chunk_method"="naive": {"chunk_token_num":128,"delimiter":"\\n!?;。;!?","html4excel":False,"layout_recognize":True,"raptor":{"user_raptor":False}}
  - "chunk_method"="qa": {"raptor": {"user_raptor": False}}
  - "chunk_method"="manual": {"raptor": {"user_raptor": False}}
  - "chunk_method"="table": None
  - "chunk_method"="paper": {"raptor": {"user_raptor": False}}
  - "chunk_method"="book": {"raptor": {"user_raptor": False}}
  - "chunk_method"="laws": {"raptor": {"user_raptor": False}}
  - "chunk_method"="presentation": {"raptor": {"user_raptor": False}}
  - "chunk_method"="picture": None
  - "chunk_method"="one": None
  - "chunk_method"="knowledge_graph": {"chunk_token_num":128,"delimiter":"\\n!?;。;!?","entity_types":["organization","person","location","event","time"]}
  - "chunk_method"="email": None
Returns
- Success: No value is returned.
- Failure: Exception
Examples
from ragflow import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(id="id")
dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
doc.update({"parser_config": {"chunk_token_num": 256}, "chunk_method": "manual"})
Download document
Document.download() -> bytes
Downloads the current document.
Returns
The downloaded document in bytes.
Examples
from ragflow import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(id="id")
dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
open("~/ragflow.txt", "wb+").write(doc.download())
print(doc)
List documents
DataSet.list_documents(id: str = None, keywords: str = None, page: int = 1, page_size: int = 30, orderby: str = "create_time", desc: bool = True) -> list[Document]
Lists documents in the current dataset.
Parameters
id: str
The ID of the document to retrieve. Defaults to None.
keywords: str
The keywords used to match document titles. Defaults to None.
page: int
Specifies the page on which the documents will be displayed. Defaults to 1.
page_size: int
The maximum number of documents on each page. Defaults to 30.
orderby: str
The field by which documents should be sorted. Available options:
- "create_time" (default)
- "update_time"
desc: bool
Indicates whether the retrieved documents should be sorted in descending order. Defaults to True.
Returns
- Success: A list of Document objects.
- Failure: Exception
A Document object contains the following attributes:
- id: The document ID. Defaults to "".
- name: The document name. Defaults to "".
- thumbnail: The thumbnail image of the document. Defaults to None.
- dataset_id: The dataset ID associated with the document. Defaults to None.
- chunk_method: The chunk method name. Defaults to "naive".
- source_type: The source type of the document. Defaults to "local".
- type: Type or category of the document. Defaults to "". Reserved for future use.
- created_by: str The creator of the document. Defaults to "".
- size: int The document size in bytes. Defaults to 0.
- token_count: int The number of tokens in the document. Defaults to 0.
- chunk_count: int The number of chunks in the document. Defaults to 0.
- progress: float The current processing progress as a percentage. Defaults to 0.0.
- progress_msg: str A message indicating the current progress status. Defaults to "".
- process_begin_at: datetime The start time of document processing. Defaults to None.
- process_duation: float Duration of the processing in seconds. Defaults to 0.0.
- run: str The document's processing status:
  - "UNSTART" (default)
  - "RUNNING"
  - "CANCEL"
  - "DONE"
  - "FAIL"
- status: str Reserved for future use.
- parser_config: ParserConfig Configuration object for the parser. Its attributes vary based on the selected chunk_method:
  - chunk_method="naive": {"chunk_token_num":128,"delimiter":"\\n!?;。;!?","html4excel":False,"layout_recognize":True,"raptor":{"user_raptor":False}}
  - chunk_method="qa": {"raptor": {"user_raptor": False}}
  - chunk_method="manual": {"raptor": {"user_raptor": False}}
  - chunk_method="table": None
  - chunk_method="paper": {"raptor": {"user_raptor": False}}
  - chunk_method="book": {"raptor": {"user_raptor": False}}
  - chunk_method="laws": {"raptor": {"user_raptor": False}}
  - chunk_method="presentation": {"raptor": {"user_raptor": False}}
  - chunk_method="picture": None
  - chunk_method="one": None
  - chunk_method="knowledge_graph": {"chunk_token_num":128,"delimiter":"\\n!?;。;!?","entity_types":["organization","person","location","event","time"]}
  - chunk_method="email": None
Examples
from ragflow import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.create_dataset(name="kb_1")
filename1 = "ragflow.txt"
blob = open(filename1, "rb").read()
dataset.upload_documents([{"display_name": filename1, "blob": blob}])
for doc in dataset.list_documents(keywords="rag", page=1, page_size=12):
    print(doc)
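A sketch that pages through every document in the dataset, assuming an empty page marks the end:
page = 1
while True:
    docs = dataset.list_documents(page=page, page_size=30)
    if not docs:
        break  # no more pages
    for doc in docs:
        print(doc.name, doc.run)
    page += 1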
Delete documents
DataSet.delete_documents(ids: list[str] = None)
Deletes documents by ID.
Parameters
ids: list[str]
The IDs of the documents to delete. Defaults to None. If it is not specified, all documents in the dataset will be deleted.
Returns
- Success: No value is returned.
- Failure: Exception
Examples
from ragflow import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(name="kb_1")
dataset = dataset[0]
dataset.delete_documents(ids=["id_1","id_2"])
Parse documents
DataSet.async_parse_documents(document_ids: list[str]) -> None
Parses documents in the current dataset.
Parameters
document_ids: list[str], Required
The IDs of the documents to parse.
Returns
- Success: No value is returned.
- Failure: Exception
Examples
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.create_dataset(name="dataset_name")
documents = [
    {'display_name': 'test1.txt', 'blob': open('./test_data/test1.txt', "rb").read()},
    {'display_name': 'test2.txt', 'blob': open('./test_data/test2.txt', "rb").read()},
    {'display_name': 'test3.txt', 'blob': open('./test_data/test3.txt', "rb").read()}
]
dataset.upload_documents(documents)
documents = dataset.list_documents(keywords="test")
ids = []
for document in documents:
    ids.append(document.id)
dataset.async_parse_documents(ids)
print("Async bulk parsing initiated.")
Stop parsing documents
DataSet.async_cancel_parse_documents(document_ids: list[str]) -> None
Stops parsing specified documents.
Parameters
document_ids: list[str], Required
The IDs of the documents for which parsing should be stopped.
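Returns
- Success: No value is returned.
- Failure: Exception
Examples
Continuing the parse example above, cancel the in-flight jobs by passing the same document IDs:
dataset.async_cancel_parse_documents(ids)
print("Async bulk parsing cancelled.")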