
Tencent Cloud Elasticsearch Service: Common COS Snapshot Restore Problems and Solutions

Original article by zjiekou, published 2023-12-06. From the column: 大数据zjiekou

1. Introduction

If you run a self-built ES cluster on Tencent Cloud, or an ES cluster purchased from another cloud vendor, and want to migrate it to Tencent Cloud ES (this applies to most ordinary index migrations), choose the migration plan that fits your business. If the business can stop serving, or at least pause writes, any of the following methods will work:

  1. COS snapshot
  2. logstash
  3. elasticsearch-dump

For concrete walkthroughs, see the official documentation: https://cloud.tencent.com/document/product/845/35568
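At a high level, the COS snapshot route boils down to four API calls against the two clusters. The sketch below only builds the request sequence without executing it; the repository and snapshot names (`my_cos_backup`, `snapshot_1`) are placeholders, not values from any particular setup:

```python
# Sketch of the COS-snapshot migration flow as (method, path) pairs.
# Repository and snapshot names are illustrative placeholders.

def migration_requests(repo: str, snapshot: str) -> list[tuple[str, str]]:
    """Return the ES API calls, in order, for a COS snapshot migration."""
    return [
        # 1. Register the COS repository on the source cluster
        ("PUT", f"/_snapshot/{repo}"),
        # 2. Take a snapshot of the source cluster into COS
        ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true"),
        # 3. Register the same COS repository on the target cluster
        ("PUT", f"/_snapshot/{repo}"),
        # 4. Restore the snapshot on the target cluster
        ("POST", f"/_snapshot/{repo}/{snapshot}/_restore"),
    ]
```

Most of the errors below happen at step 1 (repository registration) or step 4 (restore).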

2. Common Error Scenarios

2.1 Creating a repository fails with "my_cos_backup path is not accessible on master node"

{
  "error" : {
    "root_cause" : [
      {
        "type" : "repository_verification_exception",
        "reason" : "[my_cos_backup] path  is not accessible on master node"
      }
    ],
    "type" : "repository_verification_exception",
    "reason" : "[my_cos_backup] path  is not accessible on master node",
    "caused_by" : {
      "type" : "i_o_exception",
      "reason" : "Exception when write blob master.dat",
      "caused_by" : {
        "type" : "cos_service_exception",
        "reason" : "cos_service_exception: The specified bucket does not exist. (Status Code: 404; Error Code: NoSuchBucket; Request ID: NjUzYzkwYmRfMzAxNzUyMWVfMjJmYmNfYTJkOGY1Ng==); Trace ID: OGVmYzZiMmQzYjA2OWNhODk0NTRkMTBiOWVmMDAxODc1NGE1MWY0MzY2NTg1MzM1OTY3MDliYzY2YTQ0ZThhMDFhOWZlZTQxMzRkMTQ2NGM4MmFlZDk1MTQzM2UyMTll"
      }
    }
  },
  "status" : 500
}

Cause: the bucket or app_id parameter is wrong. Customers frequently copy the full bucket name including the appid suffix into bucket; filling app_id with the account UIN causes the same failure.

Solution:

bucket: the COS bucket name, without the appId suffix

app_id: the Tencent Cloud account APPID
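A quick sanity check can catch the bucket mistake before the repository is registered. The helper below is a hypothetical pre-flight check, not part of the COS plugin: it flags a bucket name that still carries a numeric `-<APPID>` suffix (the plugin appends the APPID itself) and a non-numeric app_id. Note that an APPID and a UIN are both numeric, so that particular mix-up cannot be detected mechanically:

```python
import re

def check_repo_params(bucket: str, app_id: str) -> list[str]:
    """Return a list of likely configuration problems (hypothetical pre-flight check)."""
    problems = []
    # The plugin appends the APPID itself, so the bucket name must NOT include it.
    if re.search(r"-\d+$", bucket):
        problems.append(
            f"bucket '{bucket}' appears to include the APPID suffix; "
            "use the bare bucket name"
        )
    if not app_id.isdigit():
        problems.append(f"app_id '{app_id}' should be the numeric account APPID")
    return problems
```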

2.2 Cluster red after COS snapshot restore: "explanation": "node does not match index setting index.routing.allocation.require filters temperature:\"hot\""

Cause: this typically happens when a customer restores a snapshot taken on a cluster with hot nodes onto a cluster with warm nodes. The restore fails on the target cluster because the disk attributes differ: the snapshot was taken on an SSD-backed cluster, whose data nodes default to the hot attribute, while the target cluster uses premium cloud disks, whose data nodes default to the warm attribute. Restoring from a hot-node cluster to a warm-node cluster triggers this conflict.

Solution:

First delete the previously restored indices, then add the following parameter to the restore command:

POST _snapshot/cos_backup/snapshot_name/_restore
{
  "indices": "*,-.monitoring*,-.security*,-.kibana*",
  "ignore_unavailable": true,
  "ignore_index_settings": [
    "index.routing.allocation.require.temperature"
  ]
}
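The same request body can be assembled programmatically. The sketch below merely illustrates the shape of the restore request shown above; the helper name is ours, not an official API:

```python
import json

def build_restore_body(ignore_settings=None,
                       indices="*,-.monitoring*,-.security*,-.kibana*"):
    """Build a _restore request body, dropping the given per-index settings."""
    body = {"indices": indices, "ignore_unavailable": True}
    if ignore_settings:
        # Settings listed here are stripped from the restored indices,
        # e.g. the temperature allocation filter that caused the red cluster.
        body["ignore_index_settings"] = list(ignore_settings)
    return json.dumps(body)
```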

2.3 Creating a repository on ES 8.8.1 fails with "No region defined for cos repository"

Full response: {"error":{"root_cause":[{"type":"repository_exception","reason":"my_cos_repository No region defined for cos repository"}],"type":"repository_exception","reason":"my_cos_repository failed to create repository","caused_by":{"type":"repository_exception","reason":"my_cos_repository No region defined for cos repository"}},"status":500}

The request that triggered the error (old-style flat settings):
curl -u 'elastic:xxxx' -X PUT 'http://xxxxx:9200/_snapshot/my_cos_repository' -H "Content-Type: application/json" -d ' 
{ 
  "type": "cos", 
  "settings": { 
    "bucket": "xxx", 
    "region": "ap-shanghai", 
    "access_key_id": "XXX", 
    "access_key_secret": "XXX", 
    "base_path": "/", 
    "app_id": "xxxx" 
  } 
} 
' 

Solution:

On 8.8.1 the repository must be created with the client settings nested under cos.client:

PUT _snapshot/my_cos_backup
{
  "type": "cos",
  "settings": {
    "compress": true,
    "chunk_size": "500mb",
    "cos": {
      "client": {
        "app_id": "xxxx",
        "access_key_id": "xxxx",
        "access_key_secret": "xxxx",
        "bucket": "xxxx",
        "region": "ap-guangzhou",
        "base_path": "/"
      }
    }
  }
}
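The difference is purely structural: the client credentials moved from the top level of "settings" into a nested cos.client object. A small converter (illustrative only, not part of any tooling) makes the mapping from the pre-8.x flat layout to the 8.8.1 layout explicit:

```python
# Map the flat (pre-8.x) repository settings to the nested 8.8.1 layout.
# The keys below are the ones used in the examples above; anything else
# (e.g. compress, chunk_size) stays at the top level of "settings".

CLIENT_KEYS = {"app_id", "access_key_id", "access_key_secret",
               "bucket", "region", "base_path"}

def to_881_settings(flat: dict) -> dict:
    """Return a repository creation body in the nested cos.client format."""
    client = {k: v for k, v in flat.items() if k in CLIENT_KEYS}
    other = {k: v for k, v in flat.items() if k not in CLIENT_KEYS}
    return {"type": "cos", "settings": {**other, "cos": {"client": client}}}
```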

2.4 Snapshot restore fails with a .kibana alias conflict

{
  "statusCode": 400,
  "error": "Bad Request",
  "message": "Alias [.kibana] has more than one indices associated with it [[.kibana_2_backup, .kibana_1]], can't execute a single index op: [illegal_argument_exception] Alias [.kibana] has more than one indices associated with it [[.kibana_2_backup, .kibana_1]], can't execute a single index op"
}

Cause: restoring the ES backup also brought over .kibana_1 and .kibana_2 from the source cluster, and .kibana_2 carries the alias .kibana, which conflicts.

Solution:

First remove the .kibana alias from .kibana_2:

POST _aliases
{
  "actions": [
    {
      "remove": {
        "index": ".kibana_2",
        "alias": ".kibana"
      }
    }
  ]
}

Then disable automatic index creation:

PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": false
  }
}

Once both steps are done, run the restore again.

2.5 Incremental restore on a multi-availability-zone cluster fails

The allocation explanation (the same deciders repeat for each node):
{
  "unassigned_info": {
    "reason": "EXISTING_INDEX_RESTORED",
    "details": "restore_source[my_cos_backup/snapshot_2]"
  },
  "node_allocation_decisions": [
    {
      "deciders": [
        {
          "explanation": "there are too many copies of the shard allocated to nodes with attribute [set], there are [3] total configured shard copies for this shard id and [3] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
        }
      ]
    },
    {
      "deciders": [
        {
          "explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[sel_pitem_his][2], node[Tg4tV6mcT22SO_0ZCfWHWA], [R], s[STARTED], a[id=X7H-v4fRQTaWnOPsu3j7KA]]"
        },
        {
          "explanation": "there are too many copies of the shard allocated to nodes with attribute [set], there are [3] total configured shard copies for this shard id and [3] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
        }
      ]
    }
  ]
}

(The same two explanations repeat for the remaining nodes and are omitted here.)

Solution:

Option 1

Work out the replica counts from the cluster topology.

Replica counts that need no adjustment must satisfy both conditions:

ceil((replicas + 1) / (zones + 1)) = ceil((replicas + 1) / zones)

replicas <= total nodes - 1

For example, with 2 availability zones and 3 nodes per zone, the valid replicas values are 0, 1 and 3.

Then list the replica counts of all indices in the cluster; if any index does not match one of these values, the process is suspended.
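The rule above is easy to script. The helper below (an illustration, not part of any product tooling) enumerates the replica counts that satisfy both conditions for a given zone and node count:

```python
import math

def valid_replicas(zones: int, total_nodes: int) -> list[int]:
    """Replica counts needing no adjustment: values r with
    ceil((r+1)/(zones+1)) == ceil((r+1)/zones) and r <= total_nodes - 1."""
    return [
        r for r in range(total_nodes)  # r ranges over 0 .. total_nodes - 1
        if math.ceil((r + 1) / (zones + 1)) == math.ceil((r + 1) / zones)
    ]
```

For the 2-zone, 6-node example above this yields 0, 1 and 3, matching the values in the text.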

Option 2

From another angle: the shard migration fails because of the set attribute in cluster.routing.allocation.awareness.attributes. After the workflow reaches the checkScaleInCvmCluster step, run:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "ip"
  },
  "transient": {
    "cluster.routing.allocation.awareness.attributes": "ip"
  }
}

After running the command above, the stuck shards migrate automatically and the nodes in the old availability zone go offline on their own; "cluster.routing.allocation.awareness.attributes": "ip" is then automatically restored to "cluster.routing.allocation.awareness.attributes": "set,ip".

Original-work statement: this article is published on the Tencent Cloud Developer Community with the author's authorization; reproduction without permission is prohibited.

For infringement concerns, contact cloudcommunity@tencent.com for removal.

