Understanding index storage step by step: inverted index, doc_values, store, _source

Original

周银辉

Modified 2024-07-24 07:40:30

Question 1: How can we verify which data is actually stored when a field is indexed?

text fields: stored by default in the inverted index and in _source
Most non-text fields (such as keyword): stored by default in the inverted index, in _source and in doc_values
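To make these defaults concrete, the mapping below writes them out explicitly. It is a sketch: the index name message_defaults is hypothetical, and every parameter is simply set to its default value, so it should behave like a mapping that declares only the types. There is no doc_values line for msg1 because text fields do not support doc_values (column-style access to text requires fielddata, see Question 2).

```
# Hypothetical index; each parameter is set to its default value
PUT message_defaults
{
  "mappings": {
    "_source": {
      "enabled": true
    },
    "properties": {
      "msg1": {
        "type": "text",
        "index": true,
        "store": false
      },
      "msg2": {
        "type": "keyword",
        "index": true,
        "doc_values": true,
        "store": false
      }
    }
  }
}
```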

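How can we verify that these structures are really stored? We can switch each option on or off with mapping parameters and then check, by running queries, which operations still work.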

In the examples below, we use one text field (msg1) and one keyword field (msg2) to verify both statements above.

1. enabled: false

(1) text field: not stored in the inverted index or in doc_values; stored only in _source.

(2) keyword field: not stored in the inverted index or in doc_values; stored only in _source.

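We verify the inverted index by searching the field's content, verify doc_values by accessing the doc object in a script, and verify _source by reading it through params (or ctx).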

DELETE message
## 1. Create the mapping
PUT message
{
  "mappings": {
    "enabled": false,
    "properties": {
      "msg1": {
        "type": "text"
      },
      "msg2": {
        "type": "keyword"
      }
    }
  }
}

GET message/_doc/1
GET message/_mapping
## 2. Insert several documents
PUT message/_doc/1
{
  "msg1":"学习好",
  "msg2":"大家好"
}

PUT message/_doc/2
{
  "msg1":"学习好",
  "msg2":"身体好"
}

PUT message/_doc/3
{
  "msg1":"学习好",
  "msg2":"你好"
}

POST message/_refresh
GET message/_doc/1
## 3. Searching finds nothing, because no inverted index was built
GET message/_search
{
  "query": {
    "match": {
      "msg1": "学习"
    }
  }
}

GET message/_search
{
  "query": {
    "match": {
      "msg2": "你好"
    }
  }
}

## 4. Aggregation does not work (the terms aggregation returns no buckets), because no doc_values are stored
GET message/_search
{
  "size": 0, 
  "aggs": {
    "msg2_group": {
      "terms": {
        "field": "msg2"
      }
    }
  }
}
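## 5. Reading from doc_values fails (the script throws an error), because no doc_values are stored. For msg1 this would fail even without enabled: false, since text fields never have doc_values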
GET message/_search
{
 "query": {
   "script_score": {
     "script": {
       "lang": "painless",
       "source": "doc['msg1'].value.length()"
     },
     "query": {
       "match_all": {}
     }
   }
 }
}

GET message/_search
{
 "query": {
   "script_score": {
     "script": {
       "lang": "painless",
       "source": "doc['msg2'].value.length()"
     },
     "query": {
       "match_all": {}
     }
   }
 }
}
## 6. The value can still be read; here it comes from _source
GET message/_search
{
 "query": {
   "script_score": {
     "script": {
       "lang": "painless",
       "source": "params['_source']['msg1'].length()"
     },
     "query": {
       "match_all": {}
     }
   }
 }
}
GET message/_search
{
 "query": {
   "script_score": {
     "script": {
       "lang": "painless",
       "source": "params['_source']['msg2'].length()"
     },
     "query": {
       "match_all": {}
     }
   }
 }
}

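What if I only want to disable individual fields instead of the whole mapping? That works too, as shown below. Note that enabled can only be set on the top-level mapping and on object fields: a field declared with just enabled: false is treated as an object whose content is not parsed at all, so enabled cannot be combined with "type": "text" or "type": "keyword".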

DELETE message
PUT message
{
  "mappings": {
    "properties": {
      "msg1": {
        "enabled": false
      },
      "msg2": {
        "enabled": false
      }
    }
  }
}
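As a sketch of the per-field case, the variant below disables only msg2 and leaves msg1 as a normal text field. It assumes the three documents inserted earlier are inserted again after the index is recreated; the expected results are stated in the comments and are not copied from a real run.

```
# Only msg2 is disabled; msg1 stays a normal text field
DELETE message
PUT message
{
  "mappings": {
    "properties": {
      "msg1": {
        "type": "text"
      },
      "msg2": {
        "enabled": false
      }
    }
  }
}

# (re-insert the three documents here)

# msg1 has an inverted index again, so this is expected to return hits
GET message/_search
{
  "query": {
    "match": {
      "msg1": "学习"
    }
  }
}

# msg2 is not parsed, so this is expected to return no hits,
# although msg2 is still returned in each document's _source
GET message/_search
{
  "query": {
    "match": {
      "msg2": "你好"
    }
  }
}
```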

2. index: false

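This option skips building the inverted index but still stores doc_values and _source. The same scripts as above can be used to test it, after recreating the index and re-inserting the documents.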

DELETE message
PUT message
{
  "mappings": {
    "properties": {
      "msg1": {
        "type": "text"
      },
      "msg2": {
        "type": "keyword",
        "index":false
      }
    }
  }
}
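The probes for index: false could look like the following, assuming the index was recreated with the mapping above and the three documents re-inserted. The expected outcomes in the comments follow this section's description and are not captured output. The search result is version-dependent: older versions reject a search on a non-indexed field, while some newer versions can still answer keyword queries from doc_values, only more slowly.

```
# Search on msg2: no inverted index, so this is expected to be rejected
GET message/_search
{
  "query": {
    "match": {
      "msg2": "你好"
    }
  }
}

# Aggregation on msg2 still works, because doc_values are kept
GET message/_search
{
  "size": 0,
  "aggs": {
    "msg2_group": {
      "terms": {
        "field": "msg2"
      }
    }
  }
}

# Reading msg2 from doc_values in a script also works
GET message/_search
{
  "query": {
    "script_score": {
      "script": {
        "lang": "painless",
        "source": "doc['msg2'].value.length()"
      },
      "query": {
        "match_all": {}
      }
    }
  }
}
```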

Question 2: How can I aggregate on a text field?

# Enable fielddata: true on the text field. Fielddata is loaded into JVM heap memory, so this increases heap usage
DELETE message
PUT message
{
  "mappings": {
    "properties": {
      "msg1": {
        "type":"text",
        "fielddata":true
      },
      "msg2": {
        "enabled": false
      }
    }
  }
}
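To check the fielddata mapping, re-insert the three documents and run a terms aggregation on msg1; the field capabilities API should also report msg1 as aggregatable. Keep in mind that fielddata aggregates analyzed terms, not whole values: with the default standard analyzer, Chinese text is split into single characters, so buckets such as 学, 习 and 好 are expected rather than 学习好.

```
# Terms aggregation on the text field msg1 (possible only because of fielddata)
GET message/_search
{
  "size": 0,
  "aggs": {
    "msg1_group": {
      "terms": {
        "field": "msg1"
      }
    }
  }
}

# msg1 should now be reported with "aggregatable": true
GET message/_field_caps?fields=msg1
```

If whole-value buckets are what you need, a common alternative is a keyword sub-field declared through fields (a name like msg1.keyword is only an example), which relies on doc_values instead of heap-resident fielddata.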

Question 3: What is store for?

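We already have the inverted index, the doc_values column store for sorting and aggregations, and the original document in _source. So why would we still need stored fields (store)?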

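One scenario: a single document is very large, for example it has a title, a description and a body, but normally we only want to query and display the title and description and do not need the whole _source. In that case we can disable _source and store each field individually with store: true. Note that disabling _source also disables features that rely on it, such as the update and reindex APIs. Test it with the following script:
DELETE message
PUT message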
{
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "msg1": {
        "type": "text",
        "store": true
      },
      "msg2": {
        "type":"keyword", 
        "store": true
      }
    }
  }
}


PUT message/_doc/1
{
  "msg1":"学习好",
  "msg2":"大家好"
}

PUT message/_doc/2
{
  "msg1":"学习好",
  "msg2":"身体好"
}

PUT message/_doc/3
{
  "msg1":"学习好",
  "msg2":"你好"
}

POST message/_refresh
GET message/_doc/1

GET message/_search
{
  "query": {
    "match": {
      "msg1": "学习"
    }
  }
}

GET message/_search
{
  "query": {
    "match": {
      "msg2": "你好"
    }
  }
}

GET message/_search
{
  "stored_fields": ["msg1","msg2"], 
  "query": {
    "match": {
      "msg1": "学习"
    }
  }
}
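Disabling _source is not the only way to handle the title/description/body case. Below is a sketch of another option, assuming a hypothetical index named article with those three fields: keep _source enabled, mark only title and description with store: true, and request just those stored fields in the search. The same stored_fields parameter also works on the get API, shown at the end for the message index above.

```
# Hypothetical index: _source stays enabled, title and description are also stored
PUT article
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "store": true
      },
      "description": {
        "type": "text",
        "store": true
      },
      "body": {
        "type": "text"
      }
    }
  }
}

# Return only the stored fields and leave _source out of the response
GET article/_search
{
  "_source": false,
  "stored_fields": ["title", "description"],
  "query": {
    "match": {
      "body": "学习"
    }
  }
}

# Single-document variant for the message index
GET message/_doc/1?stored_fields=msg1,msg2
```

The trade-off of this variant: stored fields take extra disk space, but because _source is kept, updates and reindexing still work.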

Conclusion:

1. Know where the data is stored and how to verify it experimentally.

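2. Be clear about which problem each storage option solves: the inverted index (index) serves search; doc_values serve sorting, aggregations and doc[...] access in scripts; _source returns the original document; and store lets individual fields be retrieved without reading _source.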

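Originality statement: this article is published on the Tencent Cloud Developer Community with the author's authorization. Reproduction without permission is prohibited.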

In case of infringement, please contact cloudcommunity@tencent.com for removal.

Elasticsearch Service