Python API
A complete reference for RAGFlow's Python APIs. Before proceeding, please ensure you have your RAGFlow API key ready for authentication.
Run the following command to install the Python SDK:
pip install ragflow-sdk
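Once installed, instantiate the client with your API key and server address, then make any call to verify the connection. A minimal sketch, assuming the server listens on the default port 9380:

from ragflow_sdk import RAGFlow

# Connect to your RAGFlow server.
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")

# A cheap call to confirm the key and URL are valid.
for dataset in rag_object.list_datasets(page=1, page_size=1):
    print(dataset)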
ERROR CODES
| Code | Message | Description |
| ---- | ------- | ----------- |
| 400 | Bad Request | Invalid request parameters |
| 401 | Unauthorized | Unauthorized access |
| 403 | Forbidden | Access denied |
| 404 | Not Found | Resource not found |
| 500 | Internal Server Error | Server internal error |
| 1001 | Invalid Chunk ID | The specified chunk ID is invalid |
| 1002 | Chunk Update Failed | The chunk could not be updated |
OpenAI-Compatible API
Create chat completion
Creates a model response for the given historical chat conversation, using an OpenAI-compatible API.
Parameters
model: str, Required
The model used to generate the response. The server will parse this automatically, so you can set it to any value for now.

messages: list[object], Required
A list of historical chat messages used to generate the response. It must contain at least one message with the user role.

stream: bool
Whether to receive the response as a stream. Set this to false explicitly if you prefer to receive the entire response in one go instead of as a stream.
Returns
- Success: A response message in the same shape as an OpenAI chat completion.
- Failure: Exception
Examples
from openai import OpenAI

model = "model"
client = OpenAI(
    api_key="ragflow-api-key",
    base_url="http://ragflow_address/api/v1/chats_openai/<chat_id>",
)

stream = True
completion = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
    stream=stream,
)

if stream:
    for chunk in completion:
        print(chunk)
else:
    print(completion.choices[0].message.content)
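When streaming, each chunk carries only an incremental delta rather than the full message. To assemble the reply text, accumulate the deltas instead of printing raw chunks; a minimal sketch using the standard OpenAI SDK response shape (content can be None on the final chunk, hence the guard):

full_reply = ""
for chunk in completion:
    # Each streamed chunk holds a partial delta; content may be None.
    delta = chunk.choices[0].delta.content
    if delta:
        full_reply += delta
print(full_reply)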
DATASET MANAGEMENT
Create dataset
RAGFlow.create_dataset(
name: str,
avatar: Optional[str] = None,
description: Optional[str] = None,
embedding_model: Optional[str] = "BAAI/bge-large-zh-v1.5@BAAI",
permission: str = "me",
chunk_method: str = "naive",
parser_config: DataSet.ParserConfig = None
) -> DataSet
Creates a dataset.
Parameters
name: str, Required
The unique name of the dataset to create. It must adhere to the following requirements:
- Maximum 128 characters.
- Case-insensitive.
avatar: str
Base64 encoding of the avatar. Defaults to None.

description: str
A brief description of the dataset to create. Defaults to None.

embedding_model: str
The name of the embedding model to use. Defaults to "BAAI/bge-large-zh-v1.5@BAAI".

permission: str
Specifies who can access the dataset to create. Available options:
- "me": (Default) Only you can manage the dataset.
- "team": All team members can manage the dataset.
chunk_method: str
The chunking method of the dataset to create. Available options:
- "naive": General (default)
- "manual": Manual
- "qa": Q&A
- "table": Table
- "paper": Paper
- "book": Book
- "laws": Laws
- "presentation": Presentation
- "picture": Picture
- "one": One
- "email": Email
parser_config: DataSet.ParserConfig
The parser configuration of the dataset. A ParserConfig object's attributes vary based on the selected chunk_method:
- chunk_method="naive": {"chunk_token_num":128,"delimiter":"\\n","html4excel":False,"layout_recognize":True,"raptor":{"use_raptor":False}}
- chunk_method="qa": {"raptor": {"use_raptor": False}}
- chunk_method="manual": {"raptor": {"use_raptor": False}}
- chunk_method="table": None
- chunk_method="paper": {"raptor": {"use_raptor": False}}
- chunk_method="book": {"raptor": {"use_raptor": False}}
- chunk_method="laws": {"raptor": {"use_raptor": False}}
- chunk_method="picture": None
- chunk_method="presentation": {"raptor": {"use_raptor": False}}
- chunk_method="one": None
- chunk_method="knowledge-graph": {"chunk_token_num":128,"delimiter":"\\n","entity_types":["organization","person","location","event","time"]}
- chunk_method="email": None
Returns
- Success: A DataSet object.
- Failure: Exception
Examples
from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.create_dataset(name="kb_1")
Delete datasets
RAGFlow.delete_datasets(ids: list[str] | None = None)
Deletes datasets by ID.
Parameters
ids: list[str] or None, Required
The IDs of the datasets to delete. Defaults to None.
- If None, all datasets will be deleted.
- If an array of IDs, only the specified datasets will be deleted.
- If an empty array, no datasets will be deleted.
Returns
- Success: No value is returned.
- Failure: Exception
Examples
rag_object.delete_datasets(ids=["d94a8dc02c9711f0930f7fbc369eab6d","e94a8dc02c9711f0930f7fbc369eab6e"])
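Because ids=None removes every dataset, be explicit about which of the three cases above you intend. As a sketch:

# Delete two specific datasets by ID.
rag_object.delete_datasets(ids=["d94a8dc02c9711f0930f7fbc369eab6d", "e94a8dc02c9711f0930f7fbc369eab6e"])

# An empty list deletes nothing.
rag_object.delete_datasets(ids=[])

# None deletes ALL datasets -- use with care.
rag_object.delete_datasets(ids=None)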
List datasets
RAGFlow.list_datasets(
page: int = 1,
page_size: int = 30,
orderby: str = "create_time",
desc: bool = True,
id: str = None,
name: str = None
) -> list[DataSet]
Lists datasets.
Parameters
page: int
Specifies the page on which the datasets will be displayed. Defaults to 1.

page_size: int
The number of datasets on each page. Defaults to 30.

orderby: str
The field by which datasets should be sorted. Available options:
- "create_time" (default)
- "update_time"

desc: bool
Indicates whether the retrieved datasets should be sorted in descending order. Defaults to True.

id: str
The ID of the dataset to retrieve. Defaults to None.

name: str
The name of the dataset to retrieve. Defaults to None.
Returns
- Success: A list of DataSet objects.
- Failure: Exception.
Examples
List all datasets
for dataset in rag_object.list_datasets():
print(dataset)
Retrieve a dataset by ID
dataset = rag_object.list_datasets(id="id_1")
print(dataset[0])
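Paginate through all datasets
To walk a large workspace, increase page until a page comes back short of page_size. A minimal sketch:

page = 1
page_size = 30
while True:
    batch = rag_object.list_datasets(page=page, page_size=page_size)
    for dataset in batch:
        print(dataset)
    if len(batch) < page_size:
        break  # last page reached
    page += 1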
Update dataset
DataSet.update(update_message: dict)
Updates configurations for the current dataset.
Parameters
update_message: dict[str, str|int], Required
A dictionary representing the attributes to update, with the following keys:

- "name": str The revised name of the dataset.
  - Basic Multilingual Plane (BMP) only
  - Maximum 128 characters
  - Case-insensitive
- "avatar": str The updated base64 encoding of the avatar.
  - Maximum 65535 characters
- "embedding_model": str The updated embedding model name.
  - Ensure that "chunk_count" is 0 before updating "embedding_model".
  - Maximum 255 characters
  - Must follow model_name@model_factory format
- "permission": str The updated dataset permission. Available options:
  - "me": (Default) Only you can manage the dataset.
  - "team": All team members can manage the dataset.
- "pagerank": int The page rank of the dataset; refer to Set page rank.
  - Default: 0
  - Minimum: 0
  - Maximum: 100
- "chunk_method": enum<string> The chunking method for the dataset. Available options:
  - "naive": General (default)
  - "book": Book
  - "email": Email
  - "laws": Laws
  - "manual": Manual
  - "one": One
  - "paper": Paper
  - "picture": Picture
  - "presentation": Presentation
  - "qa": Q&A
  - "table": Table
  - "tag": Tag
Returns
- Success: No value is returned.
- Failure: Exception
Examples
from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(name="kb_name")
dataset = dataset[0]
dataset.update({"embedding_model":"BAAI/bge-zh-v1.5", "chunk_method":"manual"})
FILE MANAGEMENT WITHIN DATASET
Upload documents
DataSet.upload_documents(document_list: list[dict])
Uploads documents to the current dataset.
Parameters
document_list: list[dict], Required
A list of dictionaries representing the documents to upload, each containing the following keys:
- "display_name": (Optional) The file name to display in the dataset.
- "blob": (Optional) The binary content of the file to upload.
Returns
- Success: No value is returned.
- Failure: Exception
Examples
dataset = rag_object.create_dataset(name="kb_name")
dataset.upload_documents([{"display_name": "1.txt", "blob": "<BINARY_CONTENT_OF_THE_DOC>"}, {"display_name": "2.pdf", "blob": "<BINARY_CONTENT_OF_THE_DOC>"}])
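In practice, blob is the raw bytes of a local file. A minimal sketch:

# Read a local file as bytes and upload it under a display name.
with open("ragflow.pdf", "rb") as f:
    blob = f.read()
dataset.upload_documents([{"display_name": "ragflow.pdf", "blob": blob}])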
Update document
Document.update(update_message:dict)
Updates configurations for the current document.
Parameters
update_message: dict[str, str|dict[]], Required
A dictionary representing the attributes to update, with the following keys:

- "display_name": str The name of the document to update.
- "meta_fields": dict[str, Any] The meta fields of the document.
- "chunk_method": str The parsing method to apply to the document. Available options:
  - "naive": General
  - "manual": Manual
  - "qa": Q&A
  - "table": Table
  - "paper": Paper
  - "book": Book
  - "laws": Laws
  - "presentation": Presentation
  - "picture": Picture
  - "one": One
  - "email": Email
- "parser_config": dict[str, Any] The parsing configuration for the document. Its attributes vary based on the selected "chunk_method":
  - chunk_method="naive": {"chunk_token_num":128,"delimiter":"\\n","html4excel":False,"layout_recognize":True,"raptor":{"use_raptor":False}}
  - chunk_method="qa": {"raptor": {"use_raptor": False}}
  - chunk_method="manual": {"raptor": {"use_raptor": False}}
  - chunk_method="table": None
  - chunk_method="paper": {"raptor": {"use_raptor": False}}
  - chunk_method="book": {"raptor": {"use_raptor": False}}
  - chunk_method="laws": {"raptor": {"use_raptor": False}}
  - chunk_method="presentation": {"raptor": {"use_raptor": False}}
  - chunk_method="picture": None
  - chunk_method="one": None
  - chunk_method="knowledge-graph": {"chunk_token_num":128,"delimiter":"\\n","entity_types":["organization","person","location","event","time"]}
  - chunk_method="email": None
Returns
- Success: No value is returned.
- Failure: Exception
Examples
from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(id='id')
dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
doc.update({"parser_config": {"chunk_token_num": 256}, "chunk_method": "manual"})
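"meta_fields" is updated in the same call shape; the keys inside it are free-form. A short sketch with illustrative metadata keys:

# "author" and "year" are example metadata keys, not a fixed schema.
doc.update({"meta_fields": {"author": "Jane Doe", "year": 2024}})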
Download document
Document.download() -> bytes
Downloads the current document.
Returns
The downloaded document in bytes.
Examples
from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(id="id")
dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
open("~/ragflow.txt", "wb+").write(doc.download())
print(doc)
List documents
DataSet.list_documents(id: str = None, keywords: str = None, page: int = 1, page_size: int = 30, orderby: str = "create_time", desc: bool = True) -> list[Document]
Lists documents in the current dataset.
Parameters
id: str
The ID of the document to retrieve. Defaults to None.

keywords: str
The keywords used to match document titles. Defaults to None.

page: int
Specifies the page on which the documents will be displayed. Defaults to 1.

page_size: int
The maximum number of documents on each page. Defaults to 30.

orderby: str
The field by which documents should be sorted. Available options:
- "create_time" (default)
- "update_time"

desc: bool
Indicates whether the retrieved documents should be sorted in descending order. Defaults to True.
Returns
- Success: A list of Document objects.
- Failure: Exception.

A Document object contains the following attributes:

- id: The document ID. Defaults to "".
- name: The document name. Defaults to "".
- thumbnail: The thumbnail image of the document. Defaults to None.
- dataset_id: The dataset ID associated with the document. Defaults to None.
- chunk_method: The chunking method name. Defaults to "naive".
- source_type: The source type of the document. Defaults to "local".
- type: Type or category of the document. Defaults to "". Reserved for future use.
- created_by: str The creator of the document. Defaults to "".
- size: int The document size in bytes. Defaults to 0.
- token_count: int The number of tokens in the document. Defaults to 0.
- chunk_count: int The number of chunks in the document. Defaults to 0.
- progress: float The current processing progress as a percentage. Defaults to 0.0.
- progress_msg: str A message indicating the current progress status. Defaults to "".
- process_begin_at: datetime The start time of document processing. Defaults to None.
- process_duation: float Duration of the processing in seconds. Defaults to 0.0.
- run: str The document's processing status:
  - "UNSTART" (default)
  - "RUNNING"
  - "CANCEL"
  - "DONE"
  - "FAIL"
- status: str Reserved for future use.
- parser_config: ParserConfig Configuration object for the parser. Its attributes vary based on the selected chunk_method:
  - chunk_method="naive": {"chunk_token_num":128,"delimiter":"\\n","html4excel":False,"layout_recognize":True,"raptor":{"use_raptor":False}}
  - chunk_method="qa": {"raptor": {"use_raptor": False}}
  - chunk_method="manual": {"raptor": {"use_raptor": False}}
  - chunk_method="table": None
  - chunk_method="paper": {"raptor": {"use_raptor": False}}
  - chunk_method="book": {"raptor": {"use_raptor": False}}
  - chunk_method="laws": {"raptor": {"use_raptor": False}}
  - chunk_method="presentation": {"raptor": {"use_raptor": False}}
  - chunk_method="picture": None
  - chunk_method="one": None
  - chunk_method="email": None
Examples
from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.create_dataset(name="kb_1")
filename1 = "~/ragflow.txt"
blob = open(filename1 , "rb").read()
dataset.upload_documents([{"name":filename1,"blob":blob}])
for doc in dataset.list_documents(keywords="rag", page=0, page_size=12):
print(doc)
Delete documents
DataSet.delete_documents(ids: list[str] = None)
Deletes documents by ID.
Parameters
ids: list[str]
The IDs of the documents to delete. Defaults to None. If not specified, all documents in the dataset will be deleted.
Returns
- Success: No value is returned.
- Failure: Exception
Examples
from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(name="kb_1")
dataset = dataset[0]
dataset.delete_documents(ids=["id_1","id_2"])
Parse documents
DataSet.async_parse_documents(document_ids: list[str]) -> None
Parses documents in the current dataset.
Parameters
document_ids: list[str], Required
The IDs of the documents to parse.
Returns
- Success: No value is returned.
- Failure: Exception
Examples
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.create_dataset(name="dataset_name")
documents = [
{'display_name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()},
{'display_name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()},
{'display_name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()}
]
dataset.upload_documents(documents)
documents = dataset.list_documents(keywords="test")
ids = []
for document in documents:
ids.append(document.id)
dataset.async_parse_documents(ids)
print("Async bulk parsing initiated.")
Stop parsing documents
DataSet.async_cancel_parse_documents(document_ids: list[str]) -> None
Stops parsing specified documents.
Parameters
document_ids: list[str], Required
The IDs of the documents for which parsing should be stopped.
Returns
- Success: No value is returned.
- Failure: Exception
Examples
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.create_dataset(name="dataset_name")
documents = [
{'display_name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()},
{'display_name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()},
{'display_name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()}
]
dataset.upload_documents(documents)
documents = dataset.list_documents(keywords="test")
ids = []
for document in documents:
ids.append(document.id)
dataset.async_parse_documents(ids)
print("Async bulk parsing initiated.")
dataset.async_cancel_parse_documents(ids)
print("Async bulk parsing cancelled.")
CHUNK MANAGEMENT WITHIN DATASET
Add chunk
Document.add_chunk(content: str, important_keywords: list[str] = []) -> Chunk
Adds a chunk to the current document.
Parameters
content: str, Required
The text content of the chunk.
important_keywords: list[str]
The key terms or phrases to tag with the chunk.
Returns
- Success: A Chunk object.
- Failure: Exception.
A Chunk object contains the following attributes:

- id: str The chunk ID.
- content: str The text content of the chunk.
- important_keywords: list[str] A list of key terms or phrases tagged with the chunk.
- create_time: str The time when the chunk was created (added to the document).
- create_timestamp: float The timestamp representing the creation time of the chunk, expressed in seconds since January 1, 1970.
- dataset_id: str The ID of the associated dataset.
- document_name: str The name of the associated document.
- document_id: str The ID of the associated document.
- available: bool The chunk's availability status in the dataset. Value options:
  - False: Unavailable
  - True: Available (default)
Examples
from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
datasets = rag_object.list_datasets(id="123")
dataset = datasets[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
chunk = doc.add_chunk(content="xxxxxxx")
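Attaching important_keywords at creation time tags the chunk with terms for retrieval to match on. A short sketch with illustrative content:

chunk = doc.add_chunk(
    content="RAGFlow is an open-source RAG engine.",
    important_keywords=["RAGFlow", "RAG engine"],
)
print(chunk.id)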
List chunks
Document.list_chunks(keywords: str = None, page: int = 1, page_size: int = 30, id: str = None) -> list[Chunk]
Lists chunks in the current document.
Parameters
keywords: str
The keywords used to match chunk content. Defaults to None.

page: int
Specifies the page on which the chunks will be displayed. Defaults to 1.

page_size: int
The maximum number of chunks on each page. Defaults to 30.

id: str
The ID of the chunk to retrieve. Defaults to None.
Returns
- Success: A list of Chunk objects.
- Failure: Exception.
Examples
from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets("123")
dataset = dataset[0]
docs = dataset.list_documents(keywords="test", page=1, page_size=12)
for chunk in docs[0].list_chunks(keywords="rag", page=1, page_size=12):
print(chunk)