Python API
A complete reference for RAGFlow's Python APIs. Before proceeding, please ensure you have your RAGFlow API key ready for authentication.
Run the following command to install the Python SDK:

```bash
pip install ragflow-sdk
```
ERROR CODES

| Code | Message | Description |
|---|---|---|
| 400 | Bad Request | Invalid request parameters |
| 401 | Unauthorized | Unauthorized access |
| 403 | Forbidden | Access denied |
| 404 | Not Found | Resource not found |
| 500 | Internal Server Error | Server internal error |
| 1001 | Invalid Chunk ID | Invalid chunk ID |
| 1002 | Chunk Update Failed | Chunk update failed |
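The SDK surfaces failures as raised Exceptions whose messages include these codes. As an illustrative pattern (the `describe_error` helper is hypothetical, not part of the SDK), you could map codes to the descriptions above when logging:

```python
# Map of RAGFlow error codes to their descriptions (from the table above).
ERROR_DESCRIPTIONS = {
    400: "Invalid request parameters",
    401: "Unauthorized access",
    403: "Access denied",
    404: "Resource not found",
    500: "Server internal error",
    1001: "Invalid chunk ID",
    1002: "Chunk update failed",
}

def describe_error(code: int) -> str:
    """Return a human-readable description for a RAGFlow error code."""
    return ERROR_DESCRIPTIONS.get(code, f"Unknown error code: {code}")
```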
OpenAI-Compatible API
Create chat completion
Creates a model response for a given historical chat conversation through an OpenAI-compatible API.
Parameters
model: str, Required
The model used to generate the response. The server will parse this automatically, so you can set it to any value for now.

messages: list[object], Required
A list of historical chat messages used to generate the response. It must contain at least one message with the user role.

stream: boolean
Whether to receive the response as a stream. Set this to false explicitly if you prefer to receive the entire response in one go instead of as a stream.
Returns
- Success: A response message in the OpenAI format.
- Failure: Exception
Examples
```python
from openai import OpenAI

model = "model"
client = OpenAI(
    api_key="ragflow-api-key",
    base_url="http://ragflow_address/api/v1/chats_openai/<chat_id>",
)

stream = True
completion = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
    stream=stream,
)

if stream:
    for chunk in completion:
        print(chunk)
else:
    print(completion.choices[0].message.content)
```
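When streaming, each chunk carries an incremental delta rather than the full message. A minimal sketch of stitching the streamed deltas back into the complete reply, assuming the standard OpenAI chunk shape (`chunk.choices[0].delta.content`); the `collect_stream` helper is illustrative, not part of either SDK:

```python
def collect_stream(chunks) -> str:
    """Concatenate the incremental delta contents of a streamed completion."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        # The final chunk may carry no content; skip empty deltas.
        if getattr(delta, "content", None):
            parts.append(delta.content)
    return "".join(parts)
```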
DATASET MANAGEMENT
Create dataset
```python
RAGFlow.create_dataset(
    name: str,
    avatar: str = "",
    description: str = "",
    embedding_model: str = "BAAI/bge-large-zh-v1.5",
    permission: str = "me",
    chunk_method: str = "naive",
    parser_config: DataSet.ParserConfig = None
) -> DataSet
```
Creates a dataset.
Parameters
name: str, Required
The unique name of the dataset to create. It must adhere to the following requirements:
- Maximum 65,535 characters.
- Case-insensitive.

avatar: str
Base64 encoding of the avatar. Defaults to "".

description: str
A brief description of the dataset to create. Defaults to "".

permission: str
Specifies who can access the dataset to create. Available options:
- "me": (Default) Only you can manage the dataset.
- "team": All team members can manage the dataset.

chunk_method: str
The chunking method of the dataset to create. Available options:
- "naive": General (default)
- "manual": Manual
- "qa": Q&A
- "table": Table
- "paper": Paper
- "book": Book
- "laws": Laws
- "presentation": Presentation
- "picture": Picture
- "one": One
- "knowledge_graph": Knowledge Graph. Ensure your LLM is properly configured on the Settings page before selecting this, and note that the Knowledge Graph method consumes a large number of tokens.
- "email": Email
parser_config
The parser configuration of the dataset. A ParserConfig object's attributes vary based on the selected chunk_method:
- chunk_method="naive": {"chunk_token_num":128,"delimiter":"\\n!?;。;!?","html4excel":False,"layout_recognize":True,"raptor":{"user_raptor":False}}
- chunk_method="qa": {"raptor": {"user_raptor": False}}
- chunk_method="manual": {"raptor": {"user_raptor": False}}
- chunk_method="table": None
- chunk_method="paper": {"raptor": {"user_raptor": False}}
- chunk_method="book": {"raptor": {"user_raptor": False}}
- chunk_method="laws": {"raptor": {"user_raptor": False}}
- chunk_method="picture": None
- chunk_method="presentation": {"raptor": {"user_raptor": False}}
- chunk_method="one": None
- chunk_method="knowledge_graph": {"chunk_token_num":128,"delimiter":"\\n!?;。;!?","entity_types":["organization","person","location","event","time"]}
- chunk_method="email": None
Returns
- Success: A DataSet object.
- Failure: Exception
Examples
```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.create_dataset(name="kb_1")
```
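As a sketch, the default "naive" parser configuration documented above can be expressed as a plain dictionary and passed via parser_config. Whether a bare dict is accepted or must be wrapped in DataSet.ParserConfig may depend on your SDK version, so treat this as illustrative:

```python
# Default parser configuration for chunk_method="naive", per the table above.
naive_parser_config = {
    "chunk_token_num": 128,            # target tokens per chunk
    "delimiter": "\\n!?;。;!?",        # characters that end a chunk
    "html4excel": False,               # whether to convert Excel sheets to HTML
    "layout_recognize": True,          # run layout recognition on PDFs
    "raptor": {"user_raptor": False},  # RAPTOR summarization off by default
}

# Against a live server (hypothetical dataset name):
# dataset = rag_object.create_dataset(
#     name="kb_naive", chunk_method="naive", parser_config=naive_parser_config
# )
```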
Delete datasets
```python
RAGFlow.delete_datasets(ids: list[str] = None)
```
Deletes datasets by ID.
Parameters
ids: list[str], Required
The IDs of the datasets to delete. Defaults to None. If none are specified, all datasets will be deleted.
Returns
- Success: No value is returned.
- Failure:
Exception
Examples
```python
rag_object.delete_datasets(ids=["id_1", "id_2"])
```
List datasets
```python
RAGFlow.list_datasets(
    page: int = 1,
    page_size: int = 30,
    orderby: str = "create_time",
    desc: bool = True,
    id: str = None,
    name: str = None
) -> list[DataSet]
```
Lists datasets.
Parameters
page: int
Specifies the page on which the datasets will be displayed. Defaults to 1.

page_size: int
The number of datasets on each page. Defaults to 30.

orderby: str
The field by which datasets should be sorted. Available options:
- "create_time" (default)
- "update_time"

desc: bool
Indicates whether the retrieved datasets should be sorted in descending order. Defaults to True.

id: str
The ID of the dataset to retrieve. Defaults to None.

name: str
The name of the dataset to retrieve. Defaults to None.
Returns
- Success: A list of DataSet objects.
- Failure: Exception
Examples
List all datasets

```python
for dataset in rag_object.list_datasets():
    print(dataset)
```

Retrieve a dataset by ID

```python
dataset = rag_object.list_datasets(id="id_1")
print(dataset[0])
```
Update dataset
```python
DataSet.update(update_message: dict)
```
Updates configurations for the current dataset.
Parameters
update_message: dict[str, str|int], Required
A dictionary representing the attributes to update, with the following keys:
- "name": str The revised name of the dataset.
- "embedding_model": str The updated embedding model name. Ensure that "chunk_count" is 0 before updating "embedding_model".
- "chunk_method": str The chunking method for the dataset. Available options:
  - "naive": General
  - "manual": Manual
  - "qa": Q&A
  - "table": Table
  - "paper": Paper
  - "book": Book
  - "laws": Laws
  - "presentation": Presentation
  - "picture": Picture
  - "one": One
  - "email": Email
Returns
- Success: No value is returned.
- Failure:
Exception
Examples
```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
# list_datasets() returns a list; take the first match before updating.
dataset = rag_object.list_datasets(name="kb_name")
dataset = dataset[0]
dataset.update({"embedding_model": "BAAI/bge-zh-v1.5", "chunk_method": "manual"})
```
FILE MANAGEMENT WITHIN DATASET
Upload documents
```python
DataSet.upload_documents(document_list: list[dict])
```
Uploads documents to the current dataset.
Parameters
document_list: list[dict], Required
A list of dictionaries representing the documents to upload, each containing the following keys:
- "display_name": (Optional) The file name to display in the dataset.
- "blob": (Optional) The binary content of the file to upload.
Returns
- Success: No value is returned.
- Failure:
Exception
Examples
```python
dataset = rag_object.create_dataset(name="kb_name")
dataset.upload_documents([
    {"display_name": "1.txt", "blob": "<BINARY_CONTENT_OF_THE_DOC>"},
    {"display_name": "2.pdf", "blob": "<BINARY_CONTENT_OF_THE_DOC>"},
])
```
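The "blob" value is the raw bytes of the file. A small sketch of building document_list from files on disk (the helper name and file paths are hypothetical):

```python
from pathlib import Path

def build_document_list(paths):
    """Read each file and pair its bytes with a display name for upload."""
    return [
        {"display_name": Path(p).name, "blob": Path(p).read_bytes()}
        for p in paths
    ]

# Against a live server:
# dataset.upload_documents(build_document_list(["./1.txt", "./2.pdf"]))
```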
Update document
```python
Document.update(update_message: dict)
```
Updates configurations for the current document.
Parameters
update_message: dict[str, str|dict[]], Required
A dictionary representing the attributes to update, with the following keys:
- "display_name": str The name of the document to update.
- "meta_fields": dict[str, Any] The meta fields of the document.
- "chunk_method": str The parsing method to apply to the document. Available options:
  - "naive": General
  - "manual": Manual
  - "qa": Q&A
  - "table": Table
  - "paper": Paper
  - "book": Book
  - "laws": Laws
  - "presentation": Presentation
  - "picture": Picture
  - "one": One
  - "knowledge_graph": Knowledge Graph. Ensure your LLM is properly configured on the Settings page before selecting this, and note that the Knowledge Graph method consumes a large number of tokens.
  - "email": Email
- "parser_config": dict[str, Any] The parsing configuration for the document. Its attributes vary based on the selected "chunk_method":
  - chunk_method="naive": {"chunk_token_num":128,"delimiter":"\\n!?;。;!?","html4excel":False,"layout_recognize":True,"raptor":{"user_raptor":False}}
  - chunk_method="qa": {"raptor": {"user_raptor": False}}
  - chunk_method="manual": {"raptor": {"user_raptor": False}}
  - chunk_method="table": None
  - chunk_method="paper": {"raptor": {"user_raptor": False}}
  - chunk_method="book": {"raptor": {"user_raptor": False}}
  - chunk_method="laws": {"raptor": {"user_raptor": False}}
  - chunk_method="presentation": {"raptor": {"user_raptor": False}}
  - chunk_method="picture": None
  - chunk_method="one": None
  - chunk_method="knowledge_graph": {"chunk_token_num":128,"delimiter":"\\n!?;。;!?","entity_types":["organization","person","location","event","time"]}
  - chunk_method="email": None
Returns
- Success: No value is returned.
- Failure:
Exception
Examples
```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(id="id")
dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
# update_message is a single dict, not a list of dicts.
doc.update({"parser_config": {"chunk_token_num": 256}, "chunk_method": "manual"})
```
Download document
```python
Document.download() -> bytes
```
Downloads the current document.
Returns
The downloaded document in bytes.
Examples
```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(id="id")
dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
# Use an explicit path: open() does not expand "~".
with open("./ragflow.txt", "wb") as f:
    f.write(doc.download())
print(doc)
```
List documents
```python
Dataset.list_documents(id: str = None, keywords: str = None, page: int = 1, page_size: int = 30, orderby: str = "create_time", desc: bool = True) -> list[Document]
```
Lists documents in the current dataset.
Parameters
id: str
The ID of the document to retrieve. Defaults to None.

keywords: str
The keywords used to match document titles. Defaults to None.

page: int
Specifies the page on which the documents will be displayed. Defaults to 1.

page_size: int
The maximum number of documents on each page. Defaults to 30.

orderby: str
The field by which documents should be sorted. Available options:
- "create_time" (default)
- "update_time"

desc: bool
Indicates whether the retrieved documents should be sorted in descending order. Defaults to True.
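Once a dataset grows past page_size documents, listing requires walking multiple pages. A hedged sketch of a generic pagination loop; the `iter_all` helper is hypothetical, and it takes any paged fetch function so its logic can be shown without a live server:

```python
def iter_all(fetch, page_size=30):
    """Yield every item from a paged listing API, requesting successive pages
    until a page comes back smaller than page_size (i.e. the last one)."""
    page = 1
    while True:
        batch = fetch(page=page, page_size=page_size)
        yield from batch
        if len(batch) < page_size:
            break
        page += 1

# Against a live server:
# for doc in iter_all(dataset.list_documents, page_size=30):
#     print(doc)
```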