How to Run a Grouped Doc Search via the HTTP API

This article describes how to run a grouped similarity search in a Collection through the HTTP API.

Prerequisites

A Cluster has been created: see Create a Cluster.
An API-KEY has been obtained: see API-KEY Management.

Method and URL

HTTP

POST https://{Endpoint}/v1/collections/{CollectionName}/query_group_by
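
The same request can be assembled outside of curl. The following is a minimal sketch in Python using only the standard library; the helper name build_group_by_request is hypothetical and not part of any SDK, and the request is built but deliberately not sent.

```python
import json
import urllib.request


def build_group_by_request(endpoint, collection, api_key, body):
    """Build (but do not send) a POST request to the query_group_by URL."""
    url = f"https://{endpoint}/v1/collections/{collection}/query_group_by"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "dashvector-auth-token": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_group_by_request(
    "YOUR_CLUSTER_ENDPOINT",
    "group_by_demo",
    "YOUR_API_KEY",
    {"vector": [0.1, 0.2, 0.3, 0.4], "group_by_field": "document_id"},
)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen and decoding the response body with json.load would send it, provided the placeholders have been replaced with real values.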

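Examples

Note

To run the examples, replace YOUR_API_KEY with your api-key and YOUR_CLUSTER_ENDPOINT with your Cluster Endpoint.
The examples assume that a Collection named group_by_demo has already been created and populated with some data, as described in Grouped Vector Search.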

Grouped similarity search by vector

Shell

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by

Example output

{
    "code": 0,
    "request_id": "d6df634a-683d-445e-abe0-d547091d6b3a",
    "message": "Success",
    "output": [
        {
            "docs": [
                {
                    "id": "4",
                    "vector": [
                        0.621783971786499,
                        0.5220040082931519,
                        0.8403469920158386,
                        0.995602011680603
                    ],
                    "fields": {
                        "document_id": "paper-02",
                        "content": "xxxD",
                        "chunk_id": 2
                    },
                    "score": 0.028402328
                }
            ],
            "group_id": "paper-02"
        },
        {
            "docs": [
                {
                    "id": "1",
                    "vector": [
                        0.26870301365852356,
                        0.8718249797821045,
                        0.6066280007362366,
                        0.6342290043830872
                    ],
                    "fields": {
                        "document_id": "paper-01",
                        "content": "xxxA",
                        "chunk_id": 1
                    },
                    "score": 0.08141637
                }
            ],
            "group_id": "paper-01"
        },
        {
            "docs": [
                {
                    "id": "6",
                    "vector": [
                        0.661965012550354,
                        0.730430006980896,
                        0.6105219721794128,
                        0.22164000570774078
                    ],
                    "fields": {
                        "document_id": "paper-03",
                        "content": "xxxF",
                        "chunk_id": 1
                    },
                    "score": 0.2513085
                }
            ],
            "group_id": "paper-03"
        }
    ]
}
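
To show how this response shape can be consumed, here is a small Python sketch that flattens it into one entry per group. The sample is a trimmed copy of the output above with the vectors left out, and the helper name docs_by_group is hypothetical.

```python
import json

# Trimmed copy of the example output above (vectors omitted for brevity).
sample = json.loads("""
{
  "code": 0,
  "message": "Success",
  "output": [
    {"group_id": "paper-02",
     "docs": [{"id": "4", "fields": {"document_id": "paper-02", "chunk_id": 2},
               "score": 0.028402328}]},
    {"group_id": "paper-01",
     "docs": [{"id": "1", "fields": {"document_id": "paper-01", "chunk_id": 1},
               "score": 0.08141637}]},
    {"group_id": "paper-03",
     "docs": [{"id": "6", "fields": {"document_id": "paper-03", "chunk_id": 1},
               "score": 0.2513085}]}
  ]
}
""")


def docs_by_group(response):
    """Map each group_id to a list of (doc id, score) pairs, in response order."""
    return {
        group["group_id"]: [(doc["id"], doc["score"]) for doc in group["docs"]]
        for group in response["output"]
    }


grouped = docs_by_group(sample)
for group_id, docs in grouped.items():
    print(group_id, docs)
```

With group_topk set to 1, as in the request above, each group carries a single Doc.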

Grouped similarity search by primary key (using its corresponding vector)

Shell

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "1",
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by

Grouped similarity search with a filter condition

Shell

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "filter": "chunk_id > 1",
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by

Grouped vector search with a Sparse Vector

Shell

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "sparse_vector":{"1":0.4, "10000":0.6, "222222":0.8},
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
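
In the request above, sparse_vector is a JSON object whose keys are indices written as strings. If the sparse vector is held as parallel index and value lists, a conversion along the following lines may help. This is only a sketch: the helper name to_sparse_vector and the choice to reject negative indices are assumptions, not documented behavior.

```python
import json


def to_sparse_vector(indices, values):
    """Turn parallel index/value lists into the {"index": value} JSON shape."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    if any(i < 0 for i in indices):
        # Assumption for this sketch: indices are non-negative integers.
        raise ValueError("indices must be non-negative")
    return {str(i): float(v) for i, v in zip(indices, values)}


sparse = to_sparse_vector([1, 10000, 222222], [0.4, 0.6, 0.8])
print(json.dumps({"sparse_vector": sparse}))
```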

Grouped search using one vector of a multi-vector Collection

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "group_by_field": "author",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true,
    "vector_field": "title"
}' https://YOUR_CLUSTER_ENDPOINT/v1/collections/multi_vector_demo/query_group_by

# example output
#{
#    "code": 0,
#    "request_id": "b6f4997e-97e0-4d9b-9d3f-0659f4499305",
#    "message": "Success",
#    "output": [
#        {
#            "docs": [
#                {
#                    "id": "2",
#                    "vectors": {
#                        "title": [
#                            0.10000000149011612,
#                            0.20000000298023224,
#                            0.30000001192092896,
#                            0.4000000059604645
#                        ]
#                    },
#                    "fields": {
#                        "author": "zhangsan"
#                    },
#                    "score": 0.0
#                }
#            ],
#            "group_id": "zhangsan"
#        },
#        {
#            "docs": [
#                {
#                    "id": "1",
#                    "vectors": {
#                        "title": [
#                            0.30000001192092896,
#                            0.4000000059604645,
#                            0.5,
#                            0.6000000238418579
#                        ],
#                        "content": [
#                            0.30000001192092896,
#                            0.4000000059604645,
#                            0.5,
#                            0.6000000238418579,
#                            0.699999988079071,
#                            0.800000011920929
#                        ]
#                    },
#                    "fields": {
#                        "author": null
#                    },
#                    "score": 0.16000001
#                }
#            ]
#        }
#    ]
#}
#
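
Two details of this output differ from the single-vector case: each Doc carries a vectors object keyed by vector name, and the second group has no group_id (its author field is null). The Python sketch below, run against a trimmed copy of the output above, reads both defensively; the helper name describe_groups is hypothetical.

```python
import json

# Trimmed copy of the example output above; the second group has no "group_id".
sample = json.loads("""
{
  "code": 0,
  "output": [
    {"group_id": "zhangsan",
     "docs": [{"id": "2",
               "vectors": {"title": [0.1, 0.2, 0.3, 0.4]},
               "fields": {"author": "zhangsan"}, "score": 0.0}]},
    {"docs": [{"id": "1",
               "vectors": {"title": [0.3, 0.4, 0.5, 0.6],
                           "content": [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]},
               "fields": {"author": null}, "score": 0.16000001}]}
  ]
}
""")


def describe_groups(response):
    """Return (group_id or None, doc id, {vector name: dimension}) per Doc."""
    rows = []
    for group in response["output"]:
        group_id = group.get("group_id")  # may be absent, as in the sample
        for doc in group["docs"]:
            dims = {name: len(vec) for name, vec in doc.get("vectors", {}).items()}
            rows.append((group_id, doc["id"], dims))
    return rows


rows = describe_groups(sample)
for row in rows:
    print(row)
```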

Request parameters

Note

Exactly one of vector and id must be provided, and it must not be empty.
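
This rule can be checked on the client before a request is sent. The following Python sketch shows one way to do it; check_query_target is a hypothetical helper, and treating an empty list or empty string as "not provided" is this sketch's reading of the rule.

```python
def check_query_target(body):
    """Raise ValueError unless exactly one of 'vector' and 'id' is non-empty."""
    has_vector = bool(body.get("vector"))
    has_id = bool(body.get("id"))
    if has_vector == has_id:
        raise ValueError("provide exactly one of 'vector' or 'id'")


# Both of these pass the check.
check_query_target({"vector": [0.1, 0.2, 0.3, 0.4], "group_by_field": "document_id"})
check_query_target({"id": "1", "group_by_field": "document_id"})
```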

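Parameter	Location	Type	Required	Description
{Endpoint}	path	str	Yes	Endpoint of the Cluster; shown on the Cluster details page in the console
{CollectionName}	path	str	Yes	Collection name
dashvector-auth-token	header	str	Yes	api-key
group_by_field	body	str	Yes	Field whose value is used to group the results; schema-free fields are not currently supported
group_count	body	int	No	Maximum number of groups to return. Best-effort: group_count groups are usually, but not always, returned.
group_topk	body	int	No	Number of similarity results to return per group. Best-effort, with lower priority than group_count.
vector	body	array	No	Vector data
sparse_vector	body	dict	No	Sparse vector
id	body	str	No	Primary key; the search uses the vector stored under this primary key
filter	body	str	No	Filter condition; must follow SQL WHERE clause syntax
include_vector	body	bool	No	Whether to return vector data; default false
output_fields	body	array	No	Names of the fields to return; all fields are returned by default
vector_field	body	str	No	Name of the vector to use when running the grouped search on a multi-vector Collection
partition	body	str	No	Partition name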
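
Since only group_by_field is required in the body, a request builder can leave out any optional parameter that was not set. The Python sketch below illustrates the idea; build_group_by_body, doc_id and filter_expr are hypothetical names chosen here (the latter two avoid shadowing Python built-ins) and map to the id and filter parameters in the table.

```python
import json


def build_group_by_body(group_by_field, *, vector=None, doc_id=None,
                        sparse_vector=None, group_count=None, group_topk=None,
                        filter_expr=None, include_vector=None,
                        output_fields=None, vector_field=None, partition=None):
    """Assemble the JSON body, omitting optional parameters left as None."""
    optional = {
        "vector": vector,
        "id": doc_id,
        "sparse_vector": sparse_vector,
        "group_count": group_count,
        "group_topk": group_topk,
        "filter": filter_expr,
        "include_vector": include_vector,
        "output_fields": output_fields,
        "vector_field": vector_field,
        "partition": partition,
    }
    body = {"group_by_field": group_by_field}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


body = build_group_by_body(
    "document_id", vector=[0.1, 0.2, 0.3, 0.4], group_count=3, group_topk=1
)
print(json.dumps(body))
```

Omitted parameters are left for the service to default, rather than being sent as null.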

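Response fields

Field	Type	Description	Example
code	int	Return code; see the status code reference	0
message	str	Return message	Success
request_id	str	Unique request id	19215409-ea66-4db9-8764-26ce2eb5bb99
output	array	Grouped similarity search results: a list of Groups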
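
A caller will usually want to check code before reading output. The sketch below shows one way to do that in Python. It assumes, based on the examples in this article, that code 0 indicates success; the exception class, the helper name and the failing response are all hypothetical and only illustrate the control flow.

```python
class QueryGroupByError(RuntimeError):
    """Raised when the response carries a code other than 0."""


def unwrap_output(response):
    """Return the list of Groups, or raise if code is not 0."""
    if response.get("code") != 0:
        raise QueryGroupByError(
            f"code={response.get('code')} message={response.get('message')} "
            f"request_id={response.get('request_id')}"
        )
    return response.get("output", [])


ok = {"code": 0, "message": "Success", "request_id": "r-1", "output": []}
groups = unwrap_output(ok)

# Hypothetical failure payload, made up for illustration only.
failed = {"code": -1, "message": "hypothetical failure", "request_id": "r-2"}
try:
    unwrap_output(failed)
except QueryGroupByError as exc:
    print("query failed:", exc)
```

Including request_id in the error message keeps the identifier available when a failed call needs to be reported.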
