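This topic describes how to run a grouped similarity search in a Collection through the HTTP API.
Prerequisites
- A Cluster has been created: see Create a Cluster.
- An API-KEY has been obtained: see API-KEY Management.
Method and URL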
HTTP
POST https://{Endpoint}/v1/collections/{CollectionName}/query_group_by
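Examples
Note
- Replace YOUR_API_KEY in the examples with your api-key and YOUR_CLUSTER_ENDPOINT with your Cluster Endpoint; otherwise the code will not run.
- These examples require a Collection named group_by_demo that has been created in advance, with some data inserted, as described in Grouped Vector Search.
Grouped similarity search by vector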
Shell
curl -XPOST \
-H 'dashvector-auth-token: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"vector": [0.1, 0.2, 0.3, 0.4],
"group_by_field": "document_id",
"group_topk": 1,
"group_count": 3,
"include_vector": true
}' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
Sample output
{
"code": 0,
"request_id": "d6df634a-683d-445e-abe0-d547091d6b3a",
"message": "Success",
"output": [
{
"docs": [
{
"id": "4",
"vector": [
0.621783971786499,
0.5220040082931519,
0.8403469920158386,
0.995602011680603
],
"fields": {
"document_id": "paper-02",
"content": "xxxD",
"chunk_id": 2
},
"score": 0.028402328
}
],
"group_id": "paper-02"
},
{
"docs": [
{
"id": "1",
"vector": [
0.26870301365852356,
0.8718249797821045,
0.6066280007362366,
0.6342290043830872
],
"fields": {
"document_id": "paper-01",
"content": "xxxA",
"chunk_id": 1
},
"score": 0.08141637
}
],
"group_id": "paper-01"
},
{
"docs": [
{
"id": "6",
"vector": [
0.661965012550354,
0.730430006980896,
0.6105219721794128,
0.22164000570774078
],
"fields": {
"document_id": "paper-03",
"content": "xxxF",
"chunk_id": 1
},
"score": 0.2513085
}
],
"group_id": "paper-03"
}
]
}
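The grouped response nests each hit under its group, so client code usually walks `output`, then `docs`. The following is a minimal sketch that assumes the response has the shape of the sample output above. The helper name `first_doc_per_group` is hypothetical, and the embedded sample is a trimmed copy of that output with vectors and fields omitted.

```python
import json


def first_doc_per_group(response: dict) -> dict:
    """Map each group_id to the (id, score) of the first doc listed in that group."""
    result = {}
    for group in response.get("output", []):
        docs = group.get("docs", [])
        if docs:
            first = docs[0]
            result[group.get("group_id")] = (first["id"], first["score"])
    return result


# Trimmed copy of the sample output (vectors and fields omitted for brevity).
sample = json.loads("""
{
  "code": 0,
  "message": "Success",
  "output": [
    {"docs": [{"id": "4", "score": 0.028402328}], "group_id": "paper-02"},
    {"docs": [{"id": "1", "score": 0.08141637}], "group_id": "paper-01"},
    {"docs": [{"id": "6", "score": 0.2513085}], "group_id": "paper-03"}
  ]
}
""")

summary = first_doc_per_group(sample)
for group_id, (doc_id, score) in summary.items():
    print(group_id, doc_id, score)
```

With group_topk set to 1, as in the request above, each group carries a single doc, so the first doc is the only one.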
Grouped similarity search by primary key (using the vector stored under that key)
Shell
curl -XPOST \
-H 'dashvector-auth-token: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"id": "1",
"group_by_field": "document_id",
"group_topk": 1,
"group_count": 3,
"include_vector": true
}' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
Grouped similarity search with a filter condition
Shell
curl -XPOST \
-H 'dashvector-auth-token: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"vector": [0.1, 0.2, 0.3, 0.4],
"filter": "chunk_id > 1",
"group_by_field": "document_id",
"group_topk": 1,
"group_count": 3,
"include_vector": true
}' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
Grouped vector search with a Sparse Vector
Shell
curl -XPOST \
-H 'dashvector-auth-token: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"vector": [0.1, 0.2, 0.3, 0.4],
"sparse_vector":{"1":0.4, "10000":0.6, "222222":0.8},
"group_by_field": "document_id",
"group_topk": 1,
"group_count": 3,
"include_vector": true
}' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
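In the request above, the sparse vector is a JSON object whose keys are dimension indices written as strings. When the sparse vector is held in memory with integer indices, it has to be converted before sending. The sketch below shows one way to do that; the helper name `build_sparse_group_query` is hypothetical, and the default values simply mirror the curl example.

```python
import json


def build_sparse_group_query(dense, sparse, group_by_field,
                             group_count=3, group_topk=1):
    """Serialize a dense + sparse grouped query body as a JSON string."""
    body = {
        "vector": list(dense),
        # JSON object keys must be strings, so integer indices are converted here.
        "sparse_vector": {str(index): float(weight)
                          for index, weight in sparse.items()},
        "group_by_field": group_by_field,
        "group_topk": group_topk,
        "group_count": group_count,
    }
    return json.dumps(body)


payload = build_sparse_group_query(
    dense=[0.1, 0.2, 0.3, 0.4],
    sparse={1: 0.4, 10000: 0.6, 222222: 0.8},
    group_by_field="document_id",
)
print(payload)
```

The resulting string can be passed as the `-d` argument of the curl command or as the body of any HTTP client.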
Grouped search using one vector of a multi-vector Collection
Shell
curl -XPOST \
-H 'dashvector-auth-token: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"vector": [0.1, 0.2, 0.3, 0.4],
"group_by_field": "author",
"group_topk": 1,
"group_count": 3,
"include_vector": true,
"vector_field": "title"
}' https://YOUR_CLUSTER_ENDPOINT/v1/collections/multi_vector_demo/query_group_by
# example output
#{
# "code": 0,
# "request_id": "b6f4997e-97e0-4d9b-9d3f-0659f4499305",
# "message": "Success",
# "output": [
# {
# "docs": [
# {
# "id": "2",
# "vectors": {
# "title": [
# 0.10000000149011612,
# 0.20000000298023224,
# 0.30000001192092896,
# 0.4000000059604645
# ]
# },
# "fields": {
# "author": "zhangsan"
# },
# "score": 0.0
# }
# ],
# "group_id": "zhangsan"
# },
# {
# "docs": [
# {
# "id": "1",
# "vectors": {
# "title": [
# 0.30000001192092896,
# 0.4000000059604645,
# 0.5,
# 0.6000000238418579
# ],
# "content": [
# 0.30000001192092896,
# 0.4000000059604645,
# 0.5,
# 0.6000000238418579,
# 0.699999988079071,
# 0.800000011920929
# ]
# },
# "fields": {
# "author": null
# },
# "score": 0.16000001
# }
# ]
# }
# ]
#}
#
Request parameters
Note
Use either vector or id, and make sure that one of the two is not empty.
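| Parameter | Location | Type | Required | Description |
| --- | --- | --- | --- | --- |
| {Endpoint} | path | str | Yes | Endpoint of the Cluster; available on the Cluster details page in the console |
| {CollectionName} | path | str | Yes | Collection name |
| dashvector-auth-token | header | str | Yes | api-key |
| group_by_field | body | str | Yes | Field whose values are used to group the search results. Schema-free fields are not currently supported. |
| group_count | body | int | No | Maximum number of groups to return. Best-effort parameter: the search can usually return group_count groups. |
| group_topk | body | int | No | Number of similar results to return per group. Best-effort parameter, with lower priority than group_count. |
| vector | body | array | No | Vector data |
| sparse_vector | body | dict | No | Sparse vector |
| id | body | str | No | Primary key; the search uses the vector stored under this primary key |
| filter | body | str | No | Filter condition; must conform to SQL WHERE clause syntax |
| include_vector | body | bool | No | Whether to return vector data. Default: false |
| output_fields | body | array | No | List of field names to return. Default: all fields |
| vector_field | body | str | No | Runs the grouped search using one vector of a multi-vector Collection |
| partition | body | str | No | Partition name |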
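The request parameters map onto a plain HTTP request: path segments for the endpoint and Collection, one header for the api-key, and a JSON body for everything else. The following is a minimal sketch using only the Python standard library. The function name `build_group_by_request` is hypothetical, and rejecting a request that supplies both vector and id is this sketch's reading of the note on those two parameters, not documented service behavior.

```python
import json
import urllib.request
from urllib.parse import quote


def build_group_by_request(endpoint, collection, api_key, group_by_field,
                           vector=None, doc_id=None, **optional):
    """Build (but do not send) a POST request for query_group_by."""
    # Assumption: exactly one of vector / id should be supplied.
    if (vector is None) == (doc_id is None):
        raise ValueError("provide exactly one of vector or id")
    body = {"group_by_field": group_by_field}
    if vector is not None:
        body["vector"] = list(vector)
    else:
        body["id"] = doc_id
    # Optional body parameters such as group_count, group_topk, filter.
    body.update({k: v for k, v in optional.items() if v is not None})
    url = (f"https://{endpoint}/v1/collections/"
           f"{quote(collection, safe='')}/query_group_by")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "dashvector-auth-token": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_group_by_request(
    "YOUR_CLUSTER_ENDPOINT", "group_by_demo", "YOUR_API_KEY",
    group_by_field="document_id", doc_id="1",
    group_count=3, group_topk=1,
)
# To send it: urllib.request.urlopen(request). Omitted so the sketch runs offline.
print(request.get_method(), request.full_url)
```

Keeping construction separate from sending makes it easy to inspect the body before it goes over the network.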
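Response parameters
| Field | Type | Description | Example |
| --- | --- | --- | --- |
| code | int | Return code; see the status code reference | 0 |
| message | str | Return message | success |
| request_id | str | Unique request ID | 19215409-ea66-4db9-8764-26ce2eb5bb99 |
| output | array | Grouped similarity search results, a list of Groups | |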
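Because every response carries code, message, and request_id alongside output, a caller can check the code before touching the groups. The sketch below assumes that a code of 0 means success, as in the sample outputs in this topic. The helper name `unwrap_output` and the failing response are hypothetical and used only for illustration.

```python
def unwrap_output(response: dict) -> list:
    """Return the Group list, or raise if the response reports a failure."""
    code = response.get("code")
    if code != 0:  # assumption: 0 is the only success code
        raise RuntimeError(
            f"request {response.get('request_id')} failed "
            f"with code {code}: {response.get('message')}"
        )
    return response.get("output", [])


ok = {
    "code": 0,
    "message": "Success",
    "request_id": "19215409-ea66-4db9-8764-26ce2eb5bb99",
    "output": [{"docs": [], "group_id": "paper-01"}],
}
groups = unwrap_output(ok)
print(len(groups))

# Hypothetical failure payload; the code value is made up for this sketch.
failed = {"code": -1, "message": "example failure", "request_id": "example-id"}
try:
    unwrap_output(failed)
except RuntimeError as exc:
    error_text = str(exc)
    print(error_text)
```

Including request_id in the error message keeps it available for troubleshooting.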