Lucene is a high-performance, extensible information-retrieval library that supports search over both document metadata and document content. With Lucene itself, or with mature products built on top of it such as Nutch, Solr and Elasticsearch, users can quickly build search services such as file search or web search. In the Lucene overview article we briefly introduced its core storage files; this article focuses on one of them, the numeric index (Point index), analyzing its file structure and its read and write flows.
In early versions Lucene had no dedicated field type for numeric values, so numbers were stored as strings like every other field, and the inverted index was implemented with a Trie-Tree data structure. That kind of index is friendly to exact term queries, which return results quickly, but it is inefficient for range queries over numeric values. Since numeric fields are typically used in range comparisons, Lucene 6.0 introduced a new index structure dedicated to numeric types, the BKD-Tree, to improve the performance of range queries. Because this index structure was first applied to geo-coordinate scenarios, the resulting index is called the Point index.
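From the application side, the Point index is what backs the XxxPoint field classes (IntPoint, LongPoint, FloatPoint, DoublePoint and so on). As a minimal, hedged sketch (assuming a Lucene 6.x classpath; the field name "price", the example prices and the in-memory RAMDirectory are purely for illustration), indexing an IntPoint and running a range query against it looks roughly like this:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class PointIndexDemo {
  public static void main(String[] args) throws Exception {
    Directory dir = new RAMDirectory();   // in-memory directory, for demonstration only
    try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      for (int price : new int[] {15, 42, 87, 250}) {
        Document doc = new Document();
        doc.add(new IntPoint("price", price));     // indexed into the BKD-Tree (Point index)
        doc.add(new StoredField("price", price));  // keeps the original value retrievable
        writer.addDocument(doc);
      }
    }
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      // The range query is answered by the Point index rather than the inverted index
      Query query = IntPoint.newRangeQuery("price", 10, 100);
      System.out.println("hits: " + searcher.search(query, 10).totalHits);   // 3 documents match
    }
  }
}

IndexWriter buffers these point values in memory and turns them into the on-disk .dim/.dii structures described below when the segment is flushed.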
The Point index is implemented with a BKD-Tree (Block K-Dimension Balanced Tree). Similar in spirit to an LSM-Tree, a BKD-Tree is a collection of KDB-Trees (K-Dimension Balanced Trees). A Lucene index consists of multiple segments, and inside each segment the index of each numeric field is one KDB-Tree; during segment merges, several KDB-Trees are merged into a single larger KDB-Tree.
A KDB-Tree is essentially a special multi-dimensional B+Tree. Unlike a conventional B+Tree, which orders data along a single dimension, a KDB-Tree keeps splitting the data along multiple dimensions to build the whole tree. Using the conventional construction procedure, take the set of two-dimensional points A(2,3), B(5,4), C(9,6), D(4,7), E(8,1) and F(7,2), and assume each leaf node holds at most 2 points; an order-2 KDB-Tree is then built by repeatedly splitting this set (a runnable sketch of the same construction, together with the range query discussed below, follows the query example).
An important step in building a KDB-Tree is choosing the dimension to split on. Common strategies are either to cycle through the dimensions round-robin by tree depth, or to pick, at every split, the dimension whose values currently have the largest spread (maximum minus minimum); Lucene's BKDWriter essentially takes the latter approach, with an extra heuristic based on how often each dimension has already been split along the current path (the parentSplits array seen later in this article).
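As a standalone illustration of the largest-spread strategy (a minimal sketch, not Lucene's BKDWriter.split(); it works on plain double[] points rather than packed byte values), choosing the split dimension could look like this:

public class SplitDimChooser {
  static int chooseSplitDim(double[][] points, int numDims) {
    int bestDim = 0;
    double bestSpread = -1;
    for (int dim = 0; dim < numDims; dim++) {
      double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
      for (double[] p : points) {
        min = Math.min(min, p[dim]);
        max = Math.max(max, p[dim]);
      }
      if (max - min > bestSpread) {   // keep the dimension whose values are spread the widest
        bestSpread = max - min;
        bestDim = dim;
      }
    }
    return bestDim;
  }

  public static void main(String[] args) {
    double[][] pts = {{2, 3}, {5, 4}, {9, 6}, {4, 7}, {8, 1}, {7, 2}};
    // x spans [2,9] (spread 7) and y spans [1,7] (spread 6), so the first split is on x
    System.out.println(chooseSplitDim(pts, 2));   // prints 0
  }
}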
The above is only a brief introduction intended to give readers an intuitive picture of the BKD-Tree. For more detail on BKD-Trees and KDB-Trees, refer to the corresponding papers. Since Lucene does not draw a clear conceptual distinction between BKD-Tree and KDB-Tree, the rest of this article uses the term BKD-Tree throughout, consistent with the source code.
The numeric index relies on the BKD-Tree to accelerate range queries; the basic idea is to use the BKD-Tree the same way a B+Tree index is used. Continuing with the example from section 2.1, suppose we want all points with x ∈ [3, 6) and y ∈ [2, 5): starting from the root, we only descend into subtrees whose value ranges can overlap the query range, and finally filter the points in the leaf nodes we reach. The sketch below walks through exactly this query.
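The following is a minimal, self-contained sketch (plain Java, not Lucene code) that builds a tiny two-dimensional tree over the six example points with at most two points per leaf and answers the range query above. For simplicity it chooses the split dimension round-robin rather than by largest spread, and it keeps points as double[] pairs instead of packed bytes:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KdTreeRangeQuerySketch {
  static final int MAX_POINTS_PER_LEAF = 2;     // as in the example above

  static class Node {
    int splitDim;                               // inner node: dimension this node splits on
    double splitValue;                          // inner node: split value (the median)
    Node left, right;
    List<double[]> points;                      // leaf node: the points it holds
  }

  static Node build(List<double[]> pts, int depth) {
    Node n = new Node();
    if (pts.size() <= MAX_POINTS_PER_LEAF) {    // small enough: make a leaf
      n.points = pts;
      return n;
    }
    final int dim = depth % 2;                  // round-robin split-dimension choice
    pts.sort((a, b) -> Double.compare(a[dim], b[dim]));
    int mid = pts.size() / 2;
    n.splitDim = dim;
    n.splitValue = pts.get(mid)[dim];
    n.left = build(new ArrayList<>(pts.subList(0, mid)), depth + 1);            // values <= splitValue
    n.right = build(new ArrayList<>(pts.subList(mid, pts.size())), depth + 1);  // values >= splitValue
    return n;
  }

  // Collect all points inside [lower, upper) per dimension, visiting only subtrees that can match.
  static void rangeQuery(Node n, double[] lower, double[] upper, List<double[]> hits) {
    if (n.points != null) {                     // leaf: filter each point against the query box
      for (double[] p : n.points) {
        if (p[0] >= lower[0] && p[0] < upper[0] && p[1] >= lower[1] && p[1] < upper[1]) {
          hits.add(p);
        }
      }
      return;
    }
    // Prune: skip a child whose value range cannot overlap the query box on the split dimension.
    if (lower[n.splitDim] <= n.splitValue) {
      rangeQuery(n.left, lower, upper, hits);
    }
    if (upper[n.splitDim] > n.splitValue) {
      rangeQuery(n.right, lower, upper, hits);
    }
  }

  public static void main(String[] args) {
    List<double[]> pts = new ArrayList<>(Arrays.asList(
        new double[]{2, 3}, new double[]{5, 4}, new double[]{9, 6},   // A, B, C
        new double[]{4, 7}, new double[]{8, 1}, new double[]{7, 2})); // D, E, F
    Node root = build(pts, 0);
    List<double[]> hits = new ArrayList<>();
    rangeQuery(root, new double[]{3, 2}, new double[]{6, 5}, hits);   // x in [3,6), y in [2,5)
    hits.forEach(p -> System.out.println(Arrays.toString(p)));        // only B(5,4) matches
  }
}

Running it prints only [5.0, 4.0], i.e. B(5,4): the first split on x prunes the whole right subtree (points with x ≥ 7), and the leaf filters reject A, D and the rest.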
Lucene's Point index is implemented as BKD-Trees: the Point index of each field is an independent BKD-Tree, persisted on disk in the .dim and .dii files. The overall file structure is shown below:
Note: green arrows denote the expansion of a data structure, red arrows denote file offsets (pointers).
When a conditional query runs against a field, Lucene can first obtain the offset of that field's Point index (BKD-Tree) from the .dii file, then locate the tree's non-leaf part (the packed index) in the .dim file, walk the BKD-Tree according to the split-dimension information to find the leaf nodes that satisfy the condition, and finally read those leaf nodes and filter their contents to obtain the final set of doc IDs.
This section only sketches the overall structure of the Point index so that readers can form a first impression; the sections below describe in detail how the Point index is built and queried, following the read and write flows.
The core of reading and writing the Point index is building and querying the BKD-Tree. The BKD-Tree is a multi-dimensional balanced tree; in practice the most common Lucene cases are one dimension (e.g. integer fields) and two dimensions (e.g. geo-coordinate fields). Below we take the two-dimensional case as an example and, together with the file structure described in section 3, walk through the read and write flows of the Point index in the multi-dimensional case.
As we know, when Lucene handles a write request it first pre-processes the incoming data and buffers it in memory, then periodically flushes the buffer to disk to produce a segment. The numeric index, as part of the core storage, follows the same flow. Let us go through it step by step:
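// DefaultIndexingChain.processDocument(): iterate over every field of the document being indexed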
for (IndexableField field : docState.doc) {
fieldCount = processField(field, fieldGen, fieldCount);
}
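// DefaultIndexingChain.processField(): a field whose type declares point dimensions is routed to the point-indexing path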
if (fieldType.pointDimensionCount() != 0) {
if (fp == null) {
fp = getOrAddField(fieldName, fieldType, false);
}
indexPoint(fp, field);
}
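// DefaultIndexingChain.indexPoint(): hand the doc ID and the packed (byte-encoded) point value to the field's PointValuesWriter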
fp.pointValuesWriter.addPackedValue(docState.docID, field.binaryValue());
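// PointValuesWriter.addPackedValue(): buffer the encoded bytes in memory and remember the doc ID they belong to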
bytes.append(value);
docIDs[numPoints] = docID;
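// DefaultIndexingChain.flush(): when the in-memory buffer is flushed into a segment, the buffered point values are written out as part of it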
public Sorter.DocMap flush(SegmentWriteState state) throws IOException, AbortingException {
……
writePoints(state, sortMap);
……
}
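// DefaultIndexingChain.writePoints(): walk all buffered fields and flush each field's PointValuesWriter through the codec's PointsWriter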
for (int i = 0; i < fieldHash.length; i++) {
  PerField perField = fieldHash[i];
  while (perField != null) {
    if (perField.pointValuesWriter != null) {
      ……
      if (pointsWriter == null) {
        pointsWriter = fmt.fieldsWriter(state);
      }
      perField.pointValuesWriter.flush(state, sortMap, pointsWriter);
      perField.pointValuesWriter = null;
    }
    ……
    perField = perField.next;
  }
}
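// Lucene60PointsWriter: create the .dim data file and write its header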
String dataFileName = IndexFileNames.segmentFileName(writeState.segmentInfo.name,
writeState.segmentSuffix,
Lucene60PointsFormat.DATA_EXTENSION);
dataOut = writeState.directory.createOutput(dataFileName, writeState.context);
boolean success = false;
try {
CodecUtil.writeIndexHeader(dataOut,
Lucene60PointsFormat.DATA_CODEC_NAME,
Lucene60PointsFormat.DATA_VERSION_CURRENT,
writeState.segmentInfo.getId(),
writeState.segmentSuffix);
success = true;
}
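// PointValuesWriter.flush(): expose the buffered points as a MutablePointsReader, then let the codec's PointsWriter (backed by a BKDWriter) write the field into .dim; the returned file pointer marks where the field's BKD index starts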
MutablePointsReader values = new MutablePointsReader() {
……
}
……
writer.writeField(fieldInfo, values);
final long fp = writer.writeField(dataOut, fieldInfo.name, (MutablePointsReader) values);
if (fp != -1) {
indexFPs.put(fieldInfo.name, fp);
}
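// BKDWriter: work out how many leaf blocks are needed so that no leaf holds more than maxPointsInLeafNode points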
long innerNodeCount = 1;
while (countPerLeaf > maxPointsInLeafNode) {
countPerLeaf = (countPerLeaf+1)/2;
innerNodeCount *= 2;
}
int numLeaves = Math.toIntExact(innerNodeCount);
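// BKDWriter: one pass over all point values to compute the per-dimension minimum and maximum (minPackedValue / maxPackedValue)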
Arrays.fill(minPackedValue, (byte) 0xff);
Arrays.fill(maxPackedValue, (byte) 0);
for (int i = 0; i < Math.toIntExact(pointCount); ++i) {
  reader.getValue(i, scratchBytesRef1);
  for (int dim = 0; dim < numDims; dim++) {
    int offset = dim * bytesPerDim;
    if (StringHelper.compare(bytesPerDim, scratchBytesRef1.bytes, scratchBytesRef1.offset + offset, minPackedValue, offset) < 0) {
      System.arraycopy(scratchBytesRef1.bytes, scratchBytesRef1.offset + offset, minPackedValue, offset, bytesPerDim);
    }
    if (StringHelper.compare(bytesPerDim, scratchBytesRef1.bytes, scratchBytesRef1.offset + offset, maxPackedValue, offset) > 0) {
      System.arraycopy(scratchBytesRef1.bytes, scratchBytesRef1.offset + offset, maxPackedValue, offset, bytesPerDim);
    }
  }
}
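// BKDWriter: recursively build the tree, then append the packed index (the non-leaf structure) after the leaf blocks and return its start offset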
final int[] parentSplits = new int[numDims];
build(1, numLeaves, reader, 0, Math.toIntExact(pointCount), out,
minPackedValue, maxPackedValue, parentSplits,
splitPackedValues, leafBlockFPs,
new int[maxPointsInLeafNode]);
long indexFP = out.getFilePointer();
writeIndex(out, Math.toIntExact(countPerLeaf), leafBlockFPs, splitPackedValues);
return indexFP;
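// Lucene60PointsWriter.finish(): write the .dii index file, which maps each field number to the file pointer of its BKD index inside the .dim file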
CodecUtil.writeIndexHeader(indexOut,
Lucene60PointsFormat.META_CODEC_NAME,
Lucene60PointsFormat.INDEX_VERSION_CURRENT,
writeState.segmentInfo.getId(),
writeState.segmentSuffix);
int count = indexFPs.size();
indexOut.writeVInt(count);
for(Map.Entry<String,Long> ent : indexFPs.entrySet()) {
FieldInfo fieldInfo = writeState.fieldInfos.fieldInfo(ent.getKey());
if (fieldInfo == null) {
throw new IllegalStateException("wrote field=\"" + ent.getKey() + "\" but that field doesn't exist in FieldInfos");
}
indexOut.writeVInt(fieldInfo.number);
indexOut.writeVLong(ent.getValue());
}
CodecUtil.writeFooter(indexOut);
Building the BKD-Tree is a recursive process. Starting from the root node, a suitable dimension is chosen and the point set is split again and again until enough leaf nodes have been produced that each leaf holds no more than 1024 point values. For a leaf node, the points it contains are organized and written into the .dim file, and the starting file position of that leaf block is recorded. The detailed construction flow is as follows:
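// BKDWriter.build(), inner-node branch: choose the split dimension, partition the points around the median, and record the split dimension and value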
final int splitDim = split(minPackedValue, maxPackedValue, parentSplits);
final int mid = (from + to + 1) >>> 1;
int commonPrefixLen = bytesPerDim;
for (int i = 0; i < bytesPerDim; ++i) {
if (minPackedValue[splitDim * bytesPerDim + i] != maxPackedValue[splitDim * bytesPerDim + i]) {
commonPrefixLen = i;
break;
}
}
MutablePointsReaderUtils.partition(maxDoc, splitDim, bytesPerDim, commonPrefixLen,
reader, from, to, mid, scratchBytesRef1, scratchBytesRef2);
final int address = nodeID * (1+bytesPerDim);
splitPackedValues[address] = (byte) splitDim;
reader.getValue(mid, scratchBytesRef1);
System.arraycopy(scratchBytesRef1.bytes,
scratchBytesRef1.offset + splitDim * bytesPerDim,
splitPackedValues, address + 1, bytesPerDim);
byte[] minSplitPackedValue = Arrays.copyOf(minPackedValue, packedBytesLength);
byte[] maxSplitPackedValue = Arrays.copyOf(maxPackedValue, packedBytesLength);
System.arraycopy(scratchBytesRef1.bytes,
scratchBytesRef1.offset + splitDim * bytesPerDim,
minSplitPackedValue, splitDim * bytesPerDim, bytesPerDim);
System.arraycopy(scratchBytesRef1.bytes,
scratchBytesRef1.offset + splitDim * bytesPerDim,
maxSplitPackedValue, splitDim * bytesPerDim, bytesPerDim);
// recursively build the left and right subtrees
parentSplits[splitDim]++;
build(nodeID * 2, leafNodeOffset, reader, from, mid, out,
minPackedValue, maxSplitPackedValue, parentSplits,
splitPackedValues, leafBlockFPs, spareDocIds);
build(nodeID * 2 + 1, leafNodeOffset, reader, mid, to, out,
minSplitPackedValue, maxPackedValue, parentSplits,
splitPackedValues, leafBlockFPs, spareDocIds);
parentSplits[splitDim]--;
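// BKDWriter.build(), leaf-node branch: compute the per-dimension common prefixes, sort by one dimension, then write the doc IDs, common prefixes and packed values of the leaf block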
Arrays.fill(commonPrefixLengths, bytesPerDim);
reader.getValue(from, scratchBytesRef1);
for (int i = from + 1; i < to; ++i) {
  reader.getValue(i, scratchBytesRef2);
  for (int dim = 0; dim < numDims; dim++) {
    final int offset = dim * bytesPerDim;
    for (int j = 0; j < commonPrefixLengths[dim]; j++) {
      if (scratchBytesRef1.bytes[scratchBytesRef1.offset + offset + j] != scratchBytesRef2.bytes[scratchBytesRef2.offset + offset + j]) {
        commonPrefixLengths[dim] = j;
        break;
      }
    }
  }
}
MutablePointsReaderUtils.sortByDim(sortedDim, bytesPerDim, commonPrefixLengths,
reader, from, to, scratchBytesRef1, scratchBytesRef2);
int[] docIDs = spareDocIds;
for (int i = from; i < to; ++i) {
docIDs[i - from] = reader.getDocID(i);
}
writeLeafBlockDocs(scratchOut, docIDs, 0, count);
reader.getValue(from, scratchBytesRef1);
System.arraycopy(scratchBytesRef1.bytes, scratchBytesRef1.offset, scratch1, 0, packedBytesLength);
writeCommonPrefixes(scratchOut, commonPrefixLengths, scratch1);
writeLeafBlockPackedValues(scratchOut, commonPrefixLengths, count, sortedDim, packedValues);
writeIndex(out, Math.toIntExact(countPerLeaf), leafBlockFPs, splitPackedValues);
Common numeric types in Lucene include Int, Long, Float and Double. When an exact-match or range query on a numeric field is filtered through the Point index, the set of doc IDs satisfying the query is obtained via the following flow:
The resulting Scorer object holds the set of doc IDs that satisfy the query. Taking the Int type as an example, and following the flow shown in the figure above, here is how a query obtains its result set from the Point index:
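// IntPoint.newExactQuery(): an exact-match query is simply a range query whose lower and upper bounds are equal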
public static Query newExactQuery(String field, int value) {
return newRangeQuery(field, value, value);
}
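// IntPoint.newRangeQuery(): the bounds are encoded into packed byte form and wrapped in a PointRangeQuery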
return new PointRangeQuery(field, pack(lowerValue).bytes, pack(upperValue).bytes, lowerValue.length) {
@Override
protected String toString(int dimension, byte[] value) {
return Integer.toString(decodeDimension(value, 0));
}
};
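// PointRangeQuery's weight/scorer: intersect the query range with the field's point values and wrap the collected doc IDs in a ConstantScoreScorer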
values.intersect(field, visitor);
DocIdSetIterator iterator = result.build().iterator();
return new ConstantScoreScorer(weight, score(), iterator);
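// Lucene60PointsReader.intersect(): look up the BKDReader of the queried field and run the intersection with the visitor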
BKDReader bkdReader = getBKDReader(fieldName);
……
bkdReader.intersect(visitor);
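// BKDReader.intersect(): compare the query range with the value range of the current cell (subtree) and act on the relation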
Relation r = state.visitor.compare(cellMinPacked, cellMaxPacked);
if (r == Relation.CELL_OUTSIDE_QUERY) {
  // 1. the cell's value range does not intersect the query range: no document can match, so this subtree is pruned away
} else if (r == Relation.CELL_INSIDE_QUERY) {
  // 2. the query range fully contains the cell's value range: every document in the subtree matches, add them all to the result set
  addAll(state, false);
} else if (state.index.isLeafNode()) {
  // 3. the query range overlaps the cell's value range and the current node is a leaf: read the leaf contents and filter them
  if (state.index.nodeExists()) {
    int count = readDocIDs(state.in, state.index.getLeafBlockFP(), state.scratchDocIDs);
    visitDocValues(state.commonPrefixLengths, state.scratchPackedValue, state.in, state.scratchDocIDs, count, state.visitor);
  }
} else {
  // 4. the query range overlaps the cell's value range and the current node is an inner node: recurse into its subtrees
}
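// Inner-node case: derive the value ranges of the left and right children from the split dimension and split value, then recurse into each side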
int splitDim = state.index.getSplitDim();
byte[] splitPackedValue = state.index.getSplitPackedValue();
BytesRef splitDimValue = state.index.getSplitDimValue();
// 1. compute the value range of the left subtree, then recurse into it
System.arraycopy(cellMaxPacked, 0, splitPackedValue, 0, packedBytesLength);
System.arraycopy(splitDimValue.bytes, splitDimValue.offset, splitPackedValue, splitDim*bytesPerDim, bytesPerDim);
state.index.pushLeft();
intersect(state, cellMinPacked, splitPackedValue);
state.index.pop();
……
// 2. compute the value range of the right subtree, then recurse into it
System.arraycopy(cellMinPacked, 0, splitPackedValue, 0, packedBytesLength);
System.arraycopy(splitDimValue.bytes, splitDimValue.offset, splitPackedValue, splitDim*bytesPerDim, bytesPerDim);
state.index.pushRight();
intersect(state, splitPackedValue, cellMaxPacked);
state.index.pop();
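// BKDReader.readDocIDs(): seek to the leaf block and read its doc IDs (older formats store raw 32-bit ints, newer ones use a compressed encoding)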
in.seek(blockFP);
// How many points are stored in this leaf cell:
int count = in.readVInt();
if (version < BKDWriter.VERSION_COMPRESSED_DOC_IDS) {
DocIdsWriter.readInts32(in, count, docIDs);
} else {
DocIdsWriter.readInts(in, count, docIDs);
}
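// PointRangeQuery's IntersectVisitor.visit(): compare every dimension of the point against the query bounds and collect the doc ID only if all dimensions fall inside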
public void visit(int docID, byte[] packedValue) {
  for (int dim = 0; dim < numDims; dim++) {
    int offset = dim * bytesPerDim;
    if (StringHelper.compare(bytesPerDim, packedValue, offset, lowerPoint, offset) < 0) {
      return; // the point value is below the query's lower bound (lowerPoint) on this dimension
    }
    if (StringHelper.compare(bytesPerDim, packedValue, offset, upperPoint, offset) > 0) {
      return; // the point value is above the query's upper bound (upperPoint) on this dimension
    }
  }
  adder.add(docID);
}
This completes the walkthrough of the Point index read and write flows; a few special points in the process deserve some additional explanation:
This article has introduced the basic concepts of the Point index and its underlying storage structure, with a detailed walkthrough of the Point write and query flows. Lucene's overall write/query flow and other core data structures such as the term index, row storage and column storage are not covered here; they will be discussed in detail in later articles.
Original-work statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
If there is any infringement, please contact cloudcommunity@tencent.com for removal.