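Feature overview
In database operations, when you create the Schema (structure) of a Collection, you must specify the name and data type of each field. At initial Schema design time, because business requirements keep evolving and the data model keeps developing, some fields cannot be fully foreseen. Dynamic scalar fields let the database adapt to changes in data structure without modifying the original Schema, which improves the flexibility and adaptability of the database.
After dynamic scalar fields are enabled, every scalar field in the collection automatically gets a Filter index to speed up queries. You can also choose not to index specific fields, to save storage space or reduce index maintenance overhead, which allows more flexible, finer-grained database management.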
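The indexing rule described above can be illustrated with a small local sketch. It does not call the SDK: `FilterAllSketch` and `indexedFields` are hypothetical names introduced only for illustration, and the field names are borrowed from the usage example later on this page. The sketch assumes the feature is enabled and simply removes the excluded fields from the set of scalar fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FilterAllSketch {
    // Hypothetical helper (not part of the SDK): given the scalar fields seen in
    // written documents and the list of fields excluded from indexing, return
    // the fields that would receive a Filter index when the feature is enabled.
    static List<String> indexedFields(List<String> scalarFields, List<String> fieldsWithoutIndex) {
        Set<String> excluded = new HashSet<>(fieldsWithoutIndex);
        List<String> indexed = new ArrayList<>();
        for (String field : scalarFields) {
            if (!excluded.contains(field)) {
                indexed.add(field);
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<String> scalarFields = Arrays.asList("bookName", "author", "array_test", "test1", "test2");
        List<String> excluded = Arrays.asList("test1", "test2");
        // prints [bookName, author, array_test]
        System.out.println(indexedFields(scalarFields, excluded));
    }
}
```

In the real service this decision is made on the server side; the sketch only mirrors the outcome that the usage example reports through describeCollection.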
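How to enable
When creating a collection, you can enable full indexing of scalar fields through a control parameter and explicitly specify which fields do not need an index. The following table shows how to enable the feature in the Python, Java, and Go SDKs.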
SDK | Parameter that enables scalar fields | How to enable | Create collection |
Python SDK | filter_index_config | | |
Java SDK | FilterIndexConfig | | |
Go SDK | FilterIndexConfig | | |
Usage example
Step 1: Create a database
import com.tencent.tcvectordb.client.RPCVectorDBClient;
import com.tencent.tcvectordb.client.VectorDBClient;
import com.tencent.tcvectordb.model.*;
import com.tencent.tcvectordb.model.Collection;
import com.tencent.tcvectordb.model.param.collection.*;
import com.tencent.tcvectordb.model.param.database.ConnectParam;
import com.tencent.tcvectordb.model.param.dml.*;
import com.tencent.tcvectordb.model.param.entity.AffectRes;
import com.tencent.tcvectordb.model.param.entity.BaseRes;
import com.tencent.tcvectordb.model.param.enums.ReadConsistencyEnum;
import com.tencent.tcvectordb.utils.JsonUtils;
import java.util.*;

public class VectorDBExample {
    public static void main(String[] args) {
        // Create the VectorDB client
        ConnectParam connectParam = ConnectParam.newBuilder()
                .withUrl("http://10.0.X.X:80")
                .withUsername("root")
                .withKey("eC4bLRy2va******************************")
                .withTimeout(30)
                .build();
        VectorDBClient client = new RPCVectorDBClient(connectParam, ReadConsistencyEnum.EVENTUAL_CONSISTENCY);

        // Create the database
        Database db = client.createDatabase("db-test");
    }
}
Step 2: Create a collection with dynamic scalar fields enabled
// Initialize the collection parameters; withFilterIndexConfig enables dynamic scalar fields.
// In this example the collection enables dynamic scalar fields and specifies that the two fields
// "test1" and "test2" get no Filter index; all other fields get a Filter index by default.
CreateCollectionParam collectionParam = CreateCollectionParam.newBuilder()
        .withName("book-vector")
        .withShardNum(1)
        .withReplicaNum(1)
        .withDescription("this is a java sdk test")
        .addField(new FilterIndex("id", FieldType.String, IndexType.PRIMARY_KEY))
        .addField(new VectorIndex("vector", 3, IndexType.HNSW,
                MetricType.COSINE, new HNSWParams(16, 200)))
        .withFilterIndexConfig(FilterIndexConfig.newBuilder()
                .withFilterAll(true)
                .withFieldWithoutFilterIndex(Arrays.asList("test1", "test2"))
                .withMaxStrLen(64)
                .build())
        .build();
Collection collection = db.createCollection(collectionParam);
Step 3: Insert data
Write vector data with scalar fields, then query the index structure of the collection.
List<Document> documentList = new ArrayList<>(Arrays.asList(
        Document.newBuilder()
                .withId("0001")
                .withVector(Arrays.asList(0.2123, 0.21, 0.213))
                .addDocField(new DocField("bookName", "西游记"))
                .addDocField(new DocField("author", "吴承恩"))
                .addDocField(new DocField("array_test", Arrays.asList("1", "2", "3")))
                .addDocField(new DocField("test1", 28))
                .build(),
        Document.newBuilder()
                .withId("0002")
                .withVector(Arrays.asList(0.2123, 0.22, 0.213))
                .addDocField(new DocField("bookName", "西游记"))
                .addDocField(new DocField("author", "吴承恩"))
                .addDocField(new DocField("array_test", Arrays.asList("4", "5", "6")))
                .addDocField(new DocField("test2", 25))
                .build(),
        Document.newBuilder()
                .withId("0003")
                .withVector(Arrays.asList(0.2123, 0.23, 0.213))
                .addDocField(new DocField("bookName", "三国演义"))
                .addDocField(new DocField("author", "罗贯中"))
                .addDocField(new DocField("array_test", Arrays.asList("7", "8", "9")))
                .build(),
        Document.newBuilder()
                .withId("0004")
                .withVector(Arrays.asList(0.2123, 0.24, 0.213))
                .addDocField(new DocField("bookName", "三国演义"))
                .addDocField(new DocField("author", "罗贯中"))
                .addDocField(new DocField("array_test", Arrays.asList("10", "11", "12")))
                .addDocField(new DocField("test1", 23))
                .build(),
        Document.newBuilder()
                .withId("0005")
                .withVector(Arrays.asList(0.2123, 0.25, 0.213))
                .addDocField(new DocField("bookName", "三国演义"))
                .addDocField(new DocField("author", "罗贯中"))
                .build()));
System.out.println("---------------------- upsert ----------------------");
InsertParam insertParam = InsertParam.newBuilder().withDocuments(documentList).build();
AffectRes affectRes = client.upsert("db-test", "book-vector", insertParam);
System.out.println(JsonUtils.toJsonString(affectRes));

// Query the structure of the collection
Database database = client.database("db-test");
Collection collectionInfo = database.describeCollection("book-vector");
System.out.println("\tres: " + collectionInfo.toString());
Query the collection structure with describeCollection. As shown below, the scalar fields bookName, author, and array_test have automatically received Filter indexes, while the designated fields test1 and test2 have no index.
{
  "database": "db-test",
  "collection": "book-vector",
  "replicaNum": 1,
  "shardNum": 1,
  "description": "this is a java sdk test",
  "indexes": [
    {"fieldName": "array_test", "fieldType": "array", "indexType": "filter", "fieldElementType": "string"},
    {"fieldName": "author", "fieldType": "string", "indexType": "filter"},
    {"fieldName": "id", "fieldType": "string", "indexType": "primaryKey"},
    {"fieldName": "bookName", "fieldType": "string", "indexType": "filter"},
    {
      "fieldName": "vector",
      "fieldType": "vector",
      "indexType": "HNSW",
      "metricType": "COSINE",
      "params": {"efConstruction": 200, "M": 16},
      "dimension": 3
    }
  ],
  "createTime": "2024-12-19 17:07:53",
  "documentCount": 0,
  "indexStatus": {"status": "ready"},
  "alias": [],
  "filterIndexConfig": {
    "filterAll": true,
    "fieldsWithoutIndex": ["test1", "test2"],
    "maxStrLen": 32
  }
}
Step 4: Use dynamic scalar fields in a similarity search
// Build the Filter expression from scalar fields
Filter filterParam = new Filter("bookName=\"三国演义\"")
        .and(Filter.exclude("array_test", Arrays.asList("7")));
System.out.println("---------------------- search ----------------------");
// Set the search parameters
SearchByVectorParam searchByVectorParam = SearchByVectorParam.newBuilder()
        .addVector(Arrays.asList(0.2123, 0.23, 0.213))
        // With an HNSW index, the ef parameter must be specified. A larger ef gives higher recall but slows the search.
        .withParams(new HNSWSearchParams(100))
        // The K value of Top K
        .withLimit(10)
        // Filter the returned results
        .withFilter(filterParam)
        .build();
// Print the similarity search results. The result is a two-dimensional list: each element is one
// group of results, corresponding in order to the vectors specified in the search.
List<List<Document>> svDocs = client.search("db-test", "book-vector", searchByVectorParam);
int i = 0;
for (List<Document> docs : svDocs) {
    System.out.println("\tres: " + i);
    i++;
    for (Document doc : docs) {
        System.out.println("\tres: " + doc.toString());
    }
}
The similarity search results are shown below. Data can be filtered by the dynamically written scalar fields, so only results that satisfy the filter condition are returned.
res: 0
res: {"id":"0004","score":0.9997869729995728,"bookName":"三国演义","author":"罗贯中","array_test":["10","11","12"]}
res: {"id":"0005","score":0.9991745948791504,"bookName":"三国演义","author":"罗贯中"}
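To see why only documents 0004 and 0005 come back, the filter condition can be replayed locally against the five documents from Step 3. This is a minimal sketch, not the service's implementation: `FilterSemanticsSketch`, `matches`, `doc`, and `matchingIds` are hypothetical names, vector scoring is ignored, and the sketch assumes that a document without the array_test field passes the exclude condition, which is consistent with 0005 appearing in the output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FilterSemanticsSketch {
    // Local stand-in for: bookName="三国演义" and array_test excludes "7".
    static boolean matches(Map<String, Object> doc) {
        if (!"三国演义".equals(doc.get("bookName"))) {
            return false;
        }
        Object arrayTest = doc.get("array_test");
        if (arrayTest instanceof List) {
            return !((List<?>) arrayTest).contains("7");
        }
        // Assumption: a document that lacks the field passes the exclude condition.
        return true;
    }

    // Builds a document with only the fields the filter looks at.
    static Map<String, Object> doc(String id, String bookName, List<String> arrayTest) {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("id", id);
        d.put("bookName", bookName);
        if (arrayTest != null) {
            d.put("array_test", arrayTest);
        }
        return d;
    }

    // Returns the ids of the Step 3 documents that satisfy the filter.
    static List<String> matchingIds() {
        List<Map<String, Object>> docs = Arrays.asList(
                doc("0001", "西游记", Arrays.asList("1", "2", "3")),
                doc("0002", "西游记", Arrays.asList("4", "5", "6")),
                doc("0003", "三国演义", Arrays.asList("7", "8", "9")),
                doc("0004", "三国演义", Arrays.asList("10", "11", "12")),
                doc("0005", "三国演义", null));
        List<String> ids = new ArrayList<>();
        for (Map<String, Object> d : docs) {
            if (matches(d)) {
                ids.add((String) d.get("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // prints [0004, 0005]
        System.out.println(matchingIds());
    }
}
```

Documents 0001 and 0002 fail the bookName condition, and 0003 is dropped because its array_test contains "7", even though its vector is identical to the query vector.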