index_prefixes Prefix Matching in Elasticsearch


保持热爱奔赴山海

Last modified: 2025-06-08 09:18:10


This article is part of the column: Database Related

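In Elasticsearch you sometimes need prefix search, which is a form of partial (approximate) matching.

The prefix query returns the documents whose specified field contains a term that starts with the given prefix.

For example: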

PUT test-index


PUT test-index/_doc/1
{
  "full_name": "wangwu"
}

PUT test-index/_doc/22
{
  "full_name": "li"
}

PUT test-index/_doc/111
{
  "full_name": "wusun"
}

PUT test-index/_doc/1111
{
  "full_name": "a"
}

GET test-index/_search
{
  "query": {
    "prefix": {
      "full_name": "wus"
    }
  }
}

The result:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test-index",
        "_type" : "_doc",
        "_id" : "111",
        "_score" : 1.0,
        "_source" : {
          "full_name" : "wusun"
        }
      }
    ]
  }
}

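How a prefix query executes:

1. Scan the inverted index and find the first term that starts with wus.

2. Collect the IDs of the documents associated with that term.

3. Move on to the next term in the inverted index.

4. If that term also starts with wus, go back to step 2 and repeat; otherwise the scan ends.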
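
The steps above can be sketched in Python as a scan over a sorted term dictionary. This is a simplified model, not Lucene's actual implementation: the `inverted_index` dict, the `prefix_scan` function, and the use of `bisect` to locate the first candidate term are all assumptions made for illustration.

```python
from bisect import bisect_left


def prefix_scan(inverted_index, prefix):
    """Collect the doc IDs of all terms starting with `prefix` (toy model)."""
    terms = sorted(inverted_index)        # term dictionary in sorted order
    pos = bisect_left(terms, prefix)      # step 1: first term >= prefix
    doc_ids = set()
    # steps 2-4: walk forward while the current term still has the prefix
    while pos < len(terms) and terms[pos].startswith(prefix):
        doc_ids.update(inverted_index[terms[pos]])
        pos += 1
    return sorted(doc_ids)


# Toy index mirroring the four sample documents (term -> doc IDs)
inverted_index = {
    "wangwu": ["1"],
    "li": ["22"],
    "wusun": ["111"],
    "a": ["1111"],
}

print(prefix_scan(inverted_index, "wus"))  # ['111']
print(prefix_scan(inverted_index, "w"))    # ['1', '111']
```

In this model the work done is proportional to the number of terms that share the prefix, which is why a short prefix over a large index gets expensive.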

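When the index holds only a few documents, this approach works fine. With many documents, however, a prefix query can become slow.

To address this, newer versions of Elasticsearch introduced index_prefixes, which trades space for time by storing the prefixes of the chosen field ahead of time.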

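index_prefixes basics:

The index_prefixes parameter enables indexing of term prefixes to speed up prefix searches. It accepts the following optional settings:

min_chars: the minimum prefix length to index (inclusive). Must be greater than 0; defaults to 2.

max_chars: the maximum prefix length to index (inclusive). Must be less than 20; defaults to 5.

index_prefixes can be thought of as an extra index built on top of the existing one: an additional inverted index is created for the prefixes of each term. This speeds up prefix searches at the cost of extra storage, so it is a space-for-time trade-off.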
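
To make the space-for-time trade-off concrete, the sketch below lists the prefixes that would be stored for one term under given min_chars and max_chars settings. It only models the idea; the function name `indexed_prefixes` is hypothetical, the check that min_chars does not exceed max_chars is an assumption, and the way Elasticsearch actually stores these prefixes on disk is not shown.

```python
def indexed_prefixes(term, min_chars=2, max_chars=5):
    """Return the prefixes of `term` with length in [min_chars, max_chars]."""
    # Bounds taken from the text; min_chars <= max_chars is an assumption.
    if min_chars <= 0 or max_chars >= 20 or min_chars > max_chars:
        raise ValueError("expected 0 < min_chars <= max_chars < 20")
    upper = min(len(term), max_chars)
    return [term[:n] for n in range(min_chars, upper + 1)]


print(indexed_prefixes("wusun"))          # ['wu', 'wus', 'wusu', 'wusun']
print(indexed_prefixes("wangwu", 3, 10))  # ['wan', 'wang', 'wangw', 'wangwu']
print(indexed_prefixes("li", 3, 10))      # []
```

Each term contributes up to max_chars - min_chars + 1 extra entries, which is where the additional storage comes from.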

Here is a concrete example:

# Create the index; here min_chars is changed from its default to 3 and max_chars is set to 10
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "full_name": {
        "type": "text",
        "index_prefixes": {
          "min_chars" : 3,
          "max_chars" : 10
        }
      }
    }
  }
}

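The index_prefixes parameter in the mapping above tells Elasticsearch to create a subfield "._index_prefix", which is used to run fast prefix queries.
When highlighting, add the "._index_prefix" subfield to the matched_fields parameter
so that the main field is highlighted based on the matches found in the prefix field.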



PUT my-index-000001/_doc/1
{
  "full_name": "wangwu"
}

PUT my-index-000001/_doc/22
{
  "full_name": "li"
}

PUT my-index-000001/_doc/111
{
  "full_name": "wusun"
}

PUT my-index-000001/_doc/1111
{
  "full_name": "a"
}




GET my-index-000001/_search
{
  "query": {
    "prefix": {
      "full_name": {
        "value": "wus"
      }
    }
  },
  "highlight": {
    "fields": {
      "full_name": {
        "matched_fields": ["full_name._index_prefix"]
      }
    }
  }
}

The query result:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "111",
        "_score" : 1.0,
        "_source" : {
          "full_name" : "wusun"
        }
      }
    ]
  }
}


# Try a query string that is not a prefix of any term
GET my-index-000001/_search
{
  "query": {
    "prefix": {
      "full_name": {
        "value": "sun"
      }
    }
  },
  "highlight": {
    "fields": {
      "full_name": {
        "matched_fields": ["full_name._index_prefix"]
      }
    }
  }
}

The result is empty:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}




# Now query with only the first 2 characters; this one returns an error
GET my-index-000001/_search
{
  "query": {
    "prefix": {
      "full_name": {
        "value": "wu"
      }
    }
  },
  "highlight": {
    "fields": {
      "full_name": {
        "matched_fields": ["full_name._index_prefix"]
      }
    }
  }
}

This query fails with the following error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "query_shard_exception",
        "reason" : "failed to create query: Cannot invoke \"Object.hashCode()\" because \"this.rewriteMethod\" is null",
        "index_uuid" : "CwAqAEaHRRqH1_92Egyrlw",
        "index" : "my-index-000001"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "my-index-000001",
        "node" : "CMjzwiULR0GBf50Wn7-FiQ",
        "reason" : {
          "type" : "query_shard_exception",
          "reason" : "failed to create query: Cannot invoke \"Object.hashCode()\" because \"this.rewriteMethod\" is null",
          "index_uuid" : "CwAqAEaHRRqH1_92Egyrlw",
          "index" : "my-index-000001",
          "caused_by" : {
            "type" : "null_pointer_exception",
            "reason" : "Cannot invoke \"Object.hashCode()\" because \"this.rewriteMethod\" is null"
          }
        }
      }
    ]
  },
  "status" : 400
}

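As the error shows, on the cluster used here the query string "wu" (2 characters) is shorter than the min_chars of 3 defined in the mapping, and the query could not be served through index_prefixes.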
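
One way to avoid sending such a request is to check the length on the client before building the query. The helper below is a hypothetical sketch: `build_prefix_query` and its `min_chars` argument are not part of any Elasticsearch client. It only produces the same request body used in the queries above, and the caller has to keep `min_chars` in sync with the mapping.

```python
import json


def build_prefix_query(field, value, min_chars):
    """Build a prefix-query body, rejecting values shorter than min_chars."""
    if len(value) < min_chars:
        raise ValueError(
            f"prefix {value!r} has {len(value)} chars; "
            f"the mapping indexes prefixes from {min_chars} chars"
        )
    return {"query": {"prefix": {field: {"value": value}}}}


# Long enough: prints the JSON body for the search request
print(json.dumps(build_prefix_query("full_name", "wus", min_chars=3)))

# Too short: rejected before any request is sent
try:
    build_prefix_query("full_name", "wu", min_chars=3)
except ValueError as err:
    print("rejected:", err)
```

Whether to reject short input or fall back to another query type is an application decision; this sketch simply rejects it.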

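Original-content statement: this article is published by the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, contact cloudcommunity@tencent.com to have it removed.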

elasticsearch
