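Elasticsearch is currently one of the most popular technology stacks in the big-data space. Tencent Cloud Elasticsearch Service (ES) is a highly available, scalable, fully managed cloud service built on the open-source Elasticsearch search engine. Its high-availability design lets businesses store important data in Tencent Cloud ES with confidence.
Understanding how ES manages indices helps you play to its strengths and avoid its pitfalls. When performance problems appear, the cause can usually be traced back to how the data is indexed and how many shards the cluster holds. If you do not make a good choice at the start, performance problems are likely to surface as the data volume grows, and the more data the cluster holds, the harder the problem is to correct. This article explains how to manage index capacity in ES so that you can get it right from the beginning and avoid leaving traps for later.
This article introduces 3 ways to manage index capacity. Together they show how index capacity management in ES has evolved:
Syntax for an index name that carries a date: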
<static_name{date_math_expr{date_format|time_zone}}>
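See the official documentation: Date math support in index names
The date format follows Java date-format patterns:
yyyy: year
MM: month
dd: day of month
hh: hour, 12-hour clock (1-12)
HH: hour, 24-hour clock (0-23)
mm: minute
ss: second
S: millisecond
E: day of week
D: day of year
F: week within the month (days elapsed in the month divided by 7)
w: week of year
W: week of month (calculated from the calendar)
a: AM/PM marker
k: like HH, but a 24-hour clock counted 1-24
K: like hh, but a 12-hour clock counted 0-11
z: time zone
For example: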
<logs-{now{yyyyMMddHH|+08:00}}-000001>
When you use it in a request, the index name must be URL-encoded first:
PUT /%3Cmylogs-%7Bnow%7ByyyyMMddHH%7C%2B08%3A00%7D%7D-000001%3E
{
"aliases": {
"mylogs-read-alias": {}
}
}
Result:
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "mylogs-2020061518-000001"
}
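The encoded path used in the request above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `encode_index_name` is an assumption made for illustration and is not part of any Elasticsearch client.

```python
from urllib.parse import quote


def encode_index_name(name):
    """Percent-encode a date math index name for use in a request path."""
    # safe="" makes quote() escape every reserved character, including "/".
    return quote(name, safe="")


encoded = encode_index_name("<mylogs-{now{yyyyMMddHH|+08:00}}-000001>")
print("/" + encoded)
# /%3Cmylogs-%7Bnow%7ByyyyMMddHH%7C%2B08%3A00%7D%7D-000001%3E
```

Letters, digits, and the hyphen are left as they are; only the angle brackets, braces, pipe, plus sign, and colon are escaped.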
When writing data, the index name must also carry the date:
POST /%3Cmylogs-%7Bnow%7ByyyyMMddHH%7C%2B08%3A00%7D%7D-000001%3E/_doc
{"name":"xxx"}
Result:
{
"_index" : "mylogs-2020061518-000001",
"_type" : "_doc",
"_id" : "VNZut3IBgpLCCHbxDzDB",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
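The date expression is resolved when the request is processed, so the same encoded path lands in a different index each hour. The sketch below mimics that resolution locally for this one pattern only (yyyyMMddHH with a whole-hour offset). The function `resolve_hourly_index_name` is a hypothetical helper, not an Elasticsearch API.

```python
from datetime import datetime, timedelta, timezone


def resolve_hourly_index_name(prefix, suffix, now_utc, utc_offset_hours):
    """Resolve <prefix-{now{yyyyMMddHH|+HH:00}}-suffix> for a given instant."""
    local = now_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    # %Y%m%d%H is the strftime equivalent of the Java pattern yyyyMMddHH.
    return f"{prefix}-{local.strftime('%Y%m%d%H')}-{suffix}"


now = datetime(2020, 6, 15, 10, 30, tzinfo=timezone.utc)
print(resolve_hourly_index_name("mylogs", "000001", now, 8))
# mylogs-2020061518-000001
```

10:30 UTC is 18:30 at +08:00, which is why the index in the result above carries the hour 18.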
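Because the data is spread across multiple indices, a query has to cover every index that matches. You can do this in any of the following ways: list the indices explicitly, use a wildcard, or query through the read alias.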
GET /mylogs-2020061518-000001,mylogs-2020061519-000001/_search
{"query":{"match_all":{}}}
GET /mylogs-*/_search
{
"query": {
"match_all": {}
}
}
Result:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "mylogs-2020061518-000001",
"_type" : "_doc",
"_id" : "VNZut3IBgpLCCHbxDzDB",
"_score" : 1.0,
"_source" : {
"name" : "xxx"
}
}
]
}
}
GET /mylogs-read-alias/_search
{
"query": {
"match_all": {}
}
}
The result is the same as above.
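When you list indices explicitly, the comma-separated path has to be built from the time range being queried. The following sketch generates it for hourly indices. It assumes the timestamps are already expressed in the same time zone as the index names (+08:00 here), and `hourly_index_names` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def hourly_index_names(start, end, prefix="mylogs", suffix="000001"):
    """List one index name per hour from start to end, inclusive."""
    names = []
    # Truncate to the top of the hour so a partial first hour is still covered.
    current = start.replace(minute=0, second=0, microsecond=0)
    while current <= end:
        names.append(f"{prefix}-{current.strftime('%Y%m%d%H')}-{suffix}")
        current += timedelta(hours=1)
    return names


names = hourly_index_names(datetime(2020, 6, 15, 18, 20), datetime(2020, 6, 15, 19, 5))
print("/" + ",".join(names) + "/_search")
# /mylogs-2020061518-000001,mylogs-2020061519-000001/_search
```

For long ranges the path grows quickly, which is one reason the wildcard and alias forms are usually more convenient.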
The advantage of this approach is that it is intuitive: the index name alone tells you how old the data is. The drawbacks are:
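Rollover works by having an alias point to the real index. When the index it points to meets a condition (document count, age, or index size), the alias is updated to point to a new index.
Note: the index name must match the pattern ^.*-\d+$, for example myro-000001. The numeric suffix is zero-padded to 6 digits by default.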
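The naming rule can be illustrated with a short sketch that derives the next index name from the current one. It is a local approximation of the behavior described above, not the code Elasticsearch runs, and `next_rollover_name` is a hypothetical helper.

```python
import re


def next_rollover_name(index_name):
    """Increment the numeric suffix and zero-pad it to 6 digits."""
    match = re.fullmatch(r"(.*)-(\d+)", index_name)
    if match is None:
        raise ValueError(f"{index_name!r} does not end with '-' and a number")
    base, counter = match.groups()
    return f"{base}-{int(counter) + 1:06d}"


print(next_rollover_name("myro-000001"))
# myro-000002
```

A name without a numeric suffix raises an error in this sketch, mirroring the requirement that such names cannot be rolled over without supplying the new name explicitly.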
PUT myro-000001
{
"aliases": {
"myro_write_alias":{}
}
}
Use bulk to write 3 records in one request:
POST /myro_write_alias/_bulk?refresh=true
{"create":{}}
{"name":"xxx"}
{"create":{}}
{"name":"xxx"}
{"create":{}}
{"name":"xxx"}
Result:
{
"took" : 37,
"errors" : false,
"items" : [
{
"create" : {
"_index" : "myro-000001",
"_type" : "_doc",
"_id" : "wVvFtnIBUTVfQxRWwXyM",
"_version" : 1,
"result" : "created",
"forced_refresh" : true,
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
},
{
"create" : {
"_index" : "myro-000001",
"_type" : "_doc",
"_id" : "wlvFtnIBUTVfQxRWwXyM",
"_version" : 1,
"result" : "created",
"forced_refresh" : true,
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1,
"status" : 201
}
},
{
"create" : {
"_index" : "myro-000001",
"_type" : "_doc",
"_id" : "w1vFtnIBUTVfQxRWwXyM",
"_version" : 1,
"result" : "created",
"forced_refresh" : true,
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 2,
"_primary_term" : 1,
"status" : 201
}
}
]
}
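All the records were written to the index myro-000001.
The 3 rollover conditions are alternatives (logical OR): a rollover happens as soon as any one of them is met.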
POST /myro_write_alias/_rollover
{
"conditions": {
"max_age": "7d",
"max_docs": 3,
"max_size": "5gb"
}
}
Result:
{
"acknowledged" : true,
"shards_acknowledged" : true,
"old_index" : "myro-000001",
"new_index" : "myro-000002",
"rolled_over" : true,
"dry_run" : false,
"conditions" : {
"[max_docs: 3]" : true,
"[max_size: 5gb]" : false,
"[max_age: 7d]" : false
}
}
Looking at the result:
"new_index" : "myro-000002"
"[max_docs: 3]" : true,
The result shows that the condition "[max_docs: 3]" : true was met, so a rollover took place and the alias now points to the new index myro-000002.
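The OR relationship between the conditions can be sketched as follows. The function and its parameters are hypothetical and simplified (age in days, size in GB); the index statistics passed in are made-up values chosen to reproduce the outcome above.

```python
def evaluate_rollover(age_days, docs, size_gb, max_age_days, max_docs, max_size_gb):
    """Return (rolled_over, per-condition results) for a simplified rollover check."""
    results = {
        # A condition counts as met once the index reaches the threshold.
        "max_age": age_days >= max_age_days,
        "max_docs": docs >= max_docs,
        "max_size": size_gb >= max_size_gb,
    }
    # Any single condition being met is enough to roll over.
    return any(results.values()), results


rolled_over, results = evaluate_rollover(
    age_days=0, docs=3, size_gb=0.001,
    max_age_days=7, max_docs=3, max_size_gb=5,
)
print(rolled_over, results)
# True {'max_age': False, 'max_docs': True, 'max_size': False}
```

Only max_docs is satisfied here, yet the rollover still happens, which matches the conditions block in the response.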
Write one more record:
POST /myro_write_alias/_doc
{"name":"xxx"}
It was written to the new index, as expected:
{
"_index" : "myro-000002",
"_type" : "_doc",
"_id" : "BdbMtnIBgpLCCHbxhihi",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
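ES has kept iterating on index management. Version 6.7 introduced Index Lifecycle Management (ILM), which is currently the most complete index management approach officially provided. The lifecycle divides an index's life into four phases:
These 4 phases describe an index's path from creation to deletion: Hot -> Warm -> Cold -> Delete. Only the Hot phase is required; the other 3 are optional and depend on business needs.
Usage typically involves the following steps:
This step is usually done in Kibana, and the ES request is exported when needed.
For example, take the following policy.
The exported request is as follows: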
PUT _ilm/policy/myes-lifecycle
{
"policy": {
"phases": {
"hot": {
"min_age": "0ms",
"actions": {
"rollover": {
"max_age": "30d",
"max_size": "50gb",
"max_docs": 2
},
"set_priority": {
"priority": 100
}
}
}
}
}
}
The ES request is as follows:
PUT /_template/myes_template
{
"index_patterns": [
"myes-*"
],
"aliases": {
"myes_reade_alias": {}
},
"settings": {
"index": {
"lifecycle": {
"name": "myes-lifecycle",
"rollover_alias": "myes_write_alias"
},
"refresh_interval": "30s",
"number_of_shards": "12",
"number_of_replicas": "1"
}
},
"mappings": {
"properties": {
"name": {
"type": "keyword"
}
}
}
}
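A template only takes effect for indices whose names match one of its index_patterns. As a rough local check, the wildcard match can be approximated with Python's `fnmatch` module. This is a sketch under the assumption that the pattern uses only `*` wildcards; `template_applies` is a hypothetical helper, not an Elasticsearch API.

```python
from fnmatch import fnmatchcase


def template_applies(index_name, index_patterns):
    """Return True if the index name matches any of the template's patterns."""
    return any(fnmatchcase(index_name, pattern) for pattern in index_patterns)


print(template_applies("myes-testindex-000001", ["myes-*"]))
# True
print(template_applies("myro-000001", ["myes-*"]))
# False
```

An index created under a name that does not match would not pick up the lifecycle policy, the rollover alias setting, or the mappings from this template.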
ES request:
PUT /myes-testindex-000001
{
"aliases": {
"myes_write_alias":{}
}
}
GET /myes-testindex-000001
{}
Result:
{
"myes-testindex-000001" : {
"aliases" : {
"myes_reade_alias" : { },
"myes_write_alias" : { }
},
"mappings" : {
"dynamic_templates" : [
{
"message_full" : {
"match" : "message_full",
"mapping" : {
"fields" : {
"keyword" : {
"ignore_above" : 2048,
"type" : "keyword"
}
},
"type" : "text"
}
}
},
{
"message" : {
"match" : "message",
"mapping" : {
"type" : "text"
}
}
},
{
"strings" : {
"match_mapping_type" : "string",
"mapping" : {
"type" : "keyword"
}
}
}
],
"properties" : {
"name" : {
"type" : "keyword"
}
}
},
"settings" : {
"index" : {
"lifecycle" : {
"name" : "myes-lifecycle",
"rollover_alias" : "myes_write_alias"
},
"refresh_interval" : "30s",
"number_of_shards" : "12",
"translog" : {
"sync_interval" : "5s",
"durability" : "async"
},
"provided_name" : "myes-testindex-000001",
"max_result_window" : "65536",
"creation_date" : "1592222799955",
"unassigned" : {
"node_left" : {
"delayed_timeout" : "5m"
}
},
"priority" : "100",
"number_of_replicas" : "1",
"uuid" : "tPwDbkuvRjKtRHiL4fKcPA",
"version" : {
"created" : "7050199"
}
}
}
}
}
POST /myes_write_alias/_bulk?refresh=true
{"create":{}}
{"name":"xxx"}
{"create":{}}
{"name":"xxx"}
{"create":{}}
{"name":"xxx"}
Result:
{
"took" : 18,
"errors" : false,
"items" : [
{
"create" : {
"_index" : "myes-testindex-000001",
"_type" : "_doc",
"_id" : "jF3it3IBUTVfQxRW1Xys",
"_version" : 1,
"result" : "created",
"forced_refresh" : true,
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
},
{
"create" : {
"_index" : "myes-testindex-000001",
"_type" : "_doc",
"_id" : "jV3it3IBUTVfQxRW1Xys",
"_version" : 1,
"result" : "created",
"forced_refresh" : true,
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
},
{
"create" : {
"_index" : "myes-testindex-000001",
"_type" : "_doc",
"_id" : "jl3it3IBUTVfQxRW1Xys",
"_version" : 1,
"result" : "created",
"forced_refresh" : true,
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
}
]
}
Run the statement above again to write 3 more records. The new data is all written to myes-testindex-000002, as expected.
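⚠ Note:
If you follow these steps and no automatic rollover happens, so that data is still written to myes-testindex-000001, you need to configure the interval at which Lifecycle checks for rollover; see below.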
{
"took" : 17,
"errors" : false,
"items" : [
{
"create" : {
"_index" : "myes-testindex-000002",
"_type" : "_doc",
"_id" : "yl0JuHIBUTVfQxRWvsv5",
"_version" : 1,
"result" : "created",
"forced_refresh" : true,
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
},
{
"create" : {
"_index" : "myes-testindex-000002",
"_type" : "_doc",
"_id" : "y10JuHIBUTVfQxRWvsv5",
"_version" : 1,
"result" : "created",
"forced_refresh" : true,
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
},
{
"create" : {
"_index" : "myes-testindex-000002",
"_type" : "_doc",
"_id" : "zF0JuHIBUTVfQxRWvsv5",
"_version" : 1,
"result" : "created",
"forced_refresh" : true,
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
}
]
}
Modify the Lifecycle configuration:
PUT _cluster/settings
{
"transient": {
"indices.lifecycle.poll_interval": "3s"
}
}
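The effect of the poll interval can be seen in a small simulation. ILM evaluates rollover conditions periodically (indices.lifecycle.poll_interval defaults to 10 minutes), not on every write, so documents can keep landing in the old index after max_docs is exceeded. The model below is deliberately simplified and is only an assumption-laden sketch: it treats a rollover as instantaneous at the first poll tick where the condition holds, and `simulate_ilm_rollover` is a hypothetical function.

```python
def simulate_ilm_rollover(write_times, poll_interval, max_docs):
    """Return the index generation (1, 2, ...) that each write lands in."""
    generation = 1
    docs_in_current = 0
    next_poll = poll_interval
    placements = []
    for t in sorted(write_times):
        # Run every poll tick that fires at or before this write.
        while next_poll <= t:
            if docs_in_current >= max_docs:
                generation += 1
                docs_in_current = 0
            next_poll += poll_interval
        placements.append(generation)
        docs_in_current += 1
    return placements


# Times are in seconds; max_docs=2 as in the policy above.
print(simulate_ilm_rollover([0, 1, 5, 6], poll_interval=600, max_docs=2))
# [1, 1, 1, 1]
print(simulate_ilm_rollover([0, 1, 5, 6], poll_interval=3, max_docs=2))
# [1, 1, 2, 2]
```

With the 600-second interval all four writes stay in the first index; with a 3-second interval the rollover is noticed between the second and third write. A very short interval such as 3s is convenient for reproducing the steps in a test, while a production cluster would more likely keep a longer one.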
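Original-content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
For infringement concerns, contact cloudcommunity@tencent.com to request removal.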