HTTP API Reference
A complete reference for RAGFlow's RESTful API. Before proceeding, please ensure you have your RAGFlow API key ready for authentication.
Dataset Management
Create dataset
POST /api/v1/datasets
Creates a dataset.
Request
- Method: POST
- URL:
/api/v1/datasets
- Headers:
'Content-Type: application/json'
'Authorization: Bearer <YOUR_API_KEY>'
- Body:
  - "name": string
  - "avatar": string
  - "description": string
  - "language": string
  - "embedding_model": string
  - "permission": string
  - "chunk_method": string
  - "parser_config": object
Request example
curl --request POST \
--url http://{address}/api/v1/datasets \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"name": "test_1"
}'
Request parameters
- "name": (Body parameter), string, Required
  The unique name of the dataset to create. It must adhere to the following requirements:
  - Permitted characters include:
    - English letters (a-z, A-Z)
    - Digits (0-9)
    - "_" (underscore)
  - Must begin with an English letter or underscore.
  - Maximum 65,535 characters.
  - Case-insensitive.
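The naming rules above can be checked client-side before calling the API. A minimal sketch (`is_valid_dataset_name` is a hypothetical helper, not part of RAGFlow; the server remains the source of truth):

```python
import re

# Mirrors the documented rules: letters, digits, and underscores only,
# starting with a letter or underscore, at most 65,535 characters.
# Note that names are case-insensitive on the server side, so "Test"
# and "test" refer to the same dataset.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_valid_dataset_name(name: str) -> bool:
    return bool(name) and len(name) <= 65535 and _NAME_RE.fullmatch(name) is not None
```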
- "avatar": (Body parameter), string
  Base64 encoding of the avatar.
- "description": (Body parameter), string
  A brief description of the dataset to create.
- "language": (Body parameter), string
  The language setting of the dataset to create. Available options:
  - "English" (default)
  - "Chinese"
- "embedding_model": (Body parameter), string
  The name of the embedding model to use. For example: "BAAI/bge-zh-v1.5"
- "permission": (Body parameter), string
  Specifies who can access the dataset to create. Available options:
  - "me": (Default) Only you can manage the dataset.
  - "team": All team members can manage the dataset.
- "chunk_method": (Body parameter), enum<string>
  The chunking method of the dataset to create. Available options:
  - "naive": General (default)
  - "manual": Manual
  - "qa": Q&A
  - "table": Table
  - "paper": Paper
  - "book": Book
  - "laws": Laws
  - "presentation": Presentation
  - "picture": Picture
  - "one": One
  - "knowledge_graph": Knowledge Graph
    Ensure your LLM is properly configured on the Settings page before selecting this. Please also note that Knowledge Graph consumes a large number of tokens!
  - "email": Email
- "parser_config": (Body parameter), object
  The configuration settings for the dataset parser. The attributes in this JSON object vary with the selected "chunk_method":
  - If "chunk_method" is "naive", the "parser_config" object contains the following attributes:
    - "chunk_token_count": Defaults to 128.
    - "layout_recognize": Defaults to true.
    - "html4excel": Indicates whether to convert Excel documents into HTML format. Defaults to false.
    - "delimiter": Defaults to "\n!?。;!?".
    - "task_page_size": Defaults to 12. For PDF only.
    - "raptor": Raptor-specific settings. Defaults to: {"use_raptor": false}.
  - If "chunk_method" is "qa", "manual", "paper", "book", "laws", or "presentation", the "parser_config" object contains the following attribute:
    - "raptor": Raptor-specific settings. Defaults to: {"use_raptor": false}.
  - If "chunk_method" is "table", "picture", "one", or "email", "parser_config" is an empty JSON object.
  - If "chunk_method" is "knowledge_graph", the "parser_config" object contains the following attributes:
    - "chunk_token_count": Defaults to 128.
    - "delimiter": Defaults to "\n!?。;!?".
    - "entity_types": Defaults to ["organization","person","location","event","time"]
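The per-"chunk_method" defaults above can be summarized in code. The following is an illustrative sketch (`default_parser_config` is a hypothetical helper, not part of RAGFlow; key names and defaults are taken from the list above):

```python
# Builds the documented default "parser_config" for a given chunk_method.
def default_parser_config(chunk_method: str) -> dict:
    raptor = {"raptor": {"use_raptor": False}}
    if chunk_method == "naive":
        return {
            "chunk_token_count": 128,
            "layout_recognize": True,
            "html4excel": False,
            "delimiter": "\n!?。;!?",
            "task_page_size": 12,  # For PDF only
            **raptor,
        }
    if chunk_method in ("qa", "manual", "paper", "book", "laws", "presentation"):
        return dict(raptor)
    if chunk_method in ("table", "picture", "one", "email"):
        return {}  # empty JSON object
    if chunk_method == "knowledge_graph":
        return {
            "chunk_token_count": 128,
            "delimiter": "\n!?。;!?",
            "entity_types": ["organization", "person", "location", "event", "time"],
        }
    raise ValueError(f"unknown chunk_method: {chunk_method}")
```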
Response
Success:
{
"code": 0,
"data": {
"avatar": null,
"chunk_count": 0,
"chunk_method": "naive",
"create_date": "Thu, 24 Oct 2024 09:14:07 GMT",
"create_time": 1729761247434,
"created_by": "69736c5e723611efb51b0242ac120007",
"description": null,
"document_count": 0,
"embedding_model": "BAAI/bge-large-zh-v1.5",
"id": "527fa74891e811ef9c650242ac120006",
"language": "English",
"name": "test_1",
"parser_config": {
"chunk_token_num": 128,
"delimiter": "\\n!?;。;!?",
"html4excel": false,
"layout_recognize": true,
"raptor": {
"user_raptor": false
}
},
"permission": "me",
"similarity_threshold": 0.2,
"status": "1",
"tenant_id": "69736c5e723611efb51b0242ac120007",
"token_num": 0,
"update_date": "Thu, 24 Oct 2024 09:14:07 GMT",
"update_time": 1729761247434,
"vector_similarity_weight": 0.3
}
}
Failure:
{
"code": 102,
"message": "Duplicated knowledgebase name in creating dataset."
}
Delete datasets
DELETE /api/v1/datasets
Deletes datasets by ID.
Request
- Method: DELETE
- URL:
/api/v1/datasets
- Headers:
'Content-Type: application/json'
'Authorization: Bearer <YOUR_API_KEY>'
- Body:
  - "ids": list[string]
Request example
curl --request DELETE \
--url http://{address}/api/v1/datasets \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"ids": ["test_1", "test_2"]
}'
Request parameters
- "ids": (Body parameter), list[string]
  The IDs of the datasets to delete. If it is not specified, all datasets will be deleted.
Response
Success:
{
"code": 0
}
Failure:
{
"code": 102,
"message": "You don't own the dataset."
}
Update dataset
PUT /api/v1/datasets/{dataset_id}
Updates configurations for a specified dataset.
Request
- Method: PUT
- URL:
/api/v1/datasets/{dataset_id}
- Headers:
'Content-Type: application/json'
'Authorization: Bearer <YOUR_API_KEY>'
- Body:
  - "name": string
  - "embedding_model": string
  - "chunk_method": enum<string>
Request example
curl --request PUT \
--url http://{address}/api/v1/datasets/{dataset_id} \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '
{
"name": "updated_dataset"
}'
Request parameters
- dataset_id: (Path parameter)
  The ID of the dataset to update.
- "name": (Body parameter), string
  The revised name of the dataset.
- "embedding_model": (Body parameter), string
  The updated embedding model name.
  - Ensure that "chunk_count" is 0 before updating "embedding_model".
- "chunk_method": (Body parameter), enum<string>
  The chunking method for the dataset. Available options:
  - "naive": General
  - "manual": Manual
  - "qa": Q&A
  - "table": Table
  - "paper": Paper
  - "book": Book
  - "laws": Laws
  - "presentation": Presentation
  - "picture": Picture
  - "one": One
  - "email": Email
  - "knowledge_graph": Knowledge Graph
    Ensure your LLM is properly configured on the Settings page before selecting this. Please also note that Knowledge Graph consumes a large number of tokens!
Response
Success:
{
"code": 0
}
Failure:
{
"code": 102,
"message": "Can't change tenant_id."
}
List datasets
GET /api/v1/datasets?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id}
Lists datasets.
Request
- Method: GET
- URL:
/api/v1/datasets?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id}
- Headers:
'Authorization: Bearer <YOUR_API_KEY>'
Request example
curl --request GET \
--url 'http://{address}/api/v1/datasets?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id}' \
--header 'Authorization: Bearer <YOUR_API_KEY>'
Request parameters
- page: (Filter parameter)
  Specifies the page on which the datasets will be displayed. Defaults to 1.
- page_size: (Filter parameter)
  The number of datasets on each page. Defaults to 30.
- orderby: (Filter parameter)
  The field by which datasets should be sorted. Available options:
  - create_time (default)
  - update_time
- desc: (Filter parameter)
  Indicates whether the retrieved datasets should be sorted in descending order. Defaults to true.
- name: (Filter parameter)
  The name of the dataset to retrieve.
- id: (Filter parameter)
  The ID of the dataset to retrieve.
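Assembling the query string from these filter parameters can be sketched as follows (`list_datasets_url` is a hypothetical helper, not part of RAGFlow; the parameter names and defaults come from the list above):

```python
from urllib.parse import urlencode

# Builds the list-datasets URL, starting from the documented defaults
# and overriding them with any caller-supplied filters.
def list_datasets_url(address: str, **filters) -> str:
    params = {"page": 1, "page_size": 30, "orderby": "create_time", "desc": "true"}
    params.update(filters)  # e.g. name="mysql", id="6e211ee0723611efa10a0242ac120007"
    return f"http://{address}/api/v1/datasets?{urlencode(params)}"
```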
Response
Success:
{
"code": 0,
"data": [
{
"avatar": "",
"chunk_count": 59,
"create_date": "Sat, 14 Sep 2024 01:12:37 GMT",
"create_time": 1726276357324,
"created_by": "69736c5e723611efb51b0242ac120007",
"description": null,
"document_count": 1,
"embedding_model": "BAAI/bge-large-zh-v1.5",
"id": "6e211ee0723611efa10a0242ac120007",
"language": "English",
"name": "mysql",
"chunk_method": "knowledge_graph",
"parser_config": {
"chunk_token_num": 8192,
"delimiter": "\\n!?;。;!?",
"entity_types": [
"organization",
"person",
"location",
"event",
"time"
]
},
"permission": "me",
"similarity_threshold": 0.2,
"status": "1",
"tenant_id": "69736c5e723611efb51b0242ac120007",
"token_num": 12744,
"update_date": "Thu, 10 Oct 2024 04:07:23 GMT",
"update_time": 1728533243536,
"vector_similarity_weight": 0.3
}
]
}
Failure:
{
"code": 102,
"message": "The dataset doesn't exist"
}
File Management within Dataset
Upload documents
POST /api/v1/datasets/{dataset_id}/documents
Uploads documents to a specified dataset.
Request
- Method: POST
- URL:
/api/v1/datasets/{dataset_id}/documents
- Headers:
'Content-Type: multipart/form-data'
'Authorization: Bearer <YOUR_API_KEY>'
- Form:
'file=@{FILE_PATH}'
Request example
curl --request POST \
--url http://{address}/api/v1/datasets/{dataset_id}/documents \
--header 'Content-Type: multipart/form-data' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--form 'file=@./test1.txt' \
--form 'file=@./test2.pdf'
Request parameters
- dataset_id: (Path parameter)
  The ID of the dataset to which the documents will be uploaded.
- 'file': (Body parameter)
  A document to upload.
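As the curl example shows, each document is sent as a separate multipart part under the repeated form field name "file". A sketch of building that form in Python (`build_upload_form` is a hypothetical helper, not part of RAGFlow; with the third-party requests library, the resulting list could be passed as the `files=` argument):

```python
from pathlib import Path

# Reads each file and pairs it with the form field name "file",
# mirroring the repeated --form 'file=@...' flags in the curl example.
def build_upload_form(paths):
    return [("file", (Path(p).name, Path(p).read_bytes())) for p in paths]
```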
Response
Success:
{
"code": 0,
"data": [
{
"chunk_method": "naive",
"created_by": "69736c5e723611efb51b0242ac120007",
"dataset_id": "527fa74891e811ef9c650242ac120006",
"id": "b330ec2e91ec11efbc510242ac120004",
"location": "1.txt",
"name": "1.txt",
"parser_config": {
"chunk_token_num": 128,
"delimiter": "\\n!?;。;!?",
"html4excel": false,
"layout_recognize": true,
"raptor": {
"user_raptor": false
}
},
"run": "UNSTART",
"size": 17966,
"thumbnail": "",
"type": "doc"
}
]
}
Failure:
{
"code": 101,
"message": "No file part!"
}
Update document
PUT /api/v1/datasets/{dataset_id}/documents/{document_id}
Updates configurations for a specified document.
Request
- Method: PUT
- URL:
/api/v1/datasets/{dataset_id}/documents/{document_id}
- Headers:
'Content-Type: application/json'
'Authorization: Bearer <YOUR_API_KEY>'
- Body:
  - "name": string
  - "chunk_method": string
  - "parser_config": object
Request example
curl --request PUT \
--url http://{address}/api/v1/datasets/{dataset_id}/documents/{document_id} \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--data '
{
"name": "manual.txt",
"chunk_method": "manual",
"parser_config": {"chunk_token_count": 128}
}'
Request parameters
- dataset_id: (Path parameter)
  The ID of the associated dataset.
- document_id: (Path parameter)
  The ID of the document to update.
- "name": (Body parameter), string
  The revised name of the document.
- "chunk_method": (Body parameter), string
  The parsing method to apply to the document:
  - "naive": General
  - "manual": Manual
  - "qa": Q&A
  - "table": Table
  - "paper": Paper
  - "book": Book
  - "laws": Laws
  - "presentation": Presentation
  - "picture": Picture
  - "one": One
  - "knowledge_graph": Knowledge Graph
    Ensure your LLM is properly configured on the Settings page before selecting this. Please also note that Knowledge Graph consumes a large number of tokens!
  - "email": Email
- "parser_config": (Body parameter), object
  The configuration settings for the dataset parser. The attributes in this JSON object vary with the selected "chunk_method":
  - If "chunk_method" is "naive", the "parser_config" object contains the following attributes:
    - "chunk_token_count": Defaults to 128.
    - "layout_recognize": Defaults to true.
    - "html4excel": Indicates whether to convert Excel documents into HTML format. Defaults to false.
    - "delimiter": Defaults to "\n!?。;!?".
    - "task_page_size": Defaults to 12. For PDF only.
    - "raptor": Raptor-specific settings. Defaults to: {"use_raptor": false}.
  - If "chunk_method" is "qa", "manual", "paper", "book", "laws", or "presentation", the "parser_config" object contains the following attribute:
    - "raptor": Raptor-specific settings. Defaults to: {"use_raptor": false}.
  - If "chunk_method" is "table", "picture", "one", or "email", "parser_config" is an empty JSON object.
  - If "chunk_method" is "knowledge_graph", the "parser_config" object contains the following attributes:
    - "chunk_token_count": Defaults to 128.
    - "delimiter": Defaults to "\n!?。;!?".
    - "entity_types": Defaults to ["organization","person","location","event","time"]
Response
Success:
{
"code": 0
}
Failure:
{
"code": 102,
"message": "The dataset does not have the document."
}
Download document
GET /api/v1/datasets/{dataset_id}/documents/{document_id}
Downloads a document from a specified dataset.