The resharding feature introduced in MongoDB 5.0

Original

保持热爱奔赴山海

Last modified 2024-10-03 21:47:00


Part of the column: Database topics

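Before MongoDB 5.0, resharding a collection was cumbersome. The usual approach was to deploy a new cluster and then sync the data into it, a full copy followed by incremental changes, using the new sharding scheme.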

Starting with MongoDB 5.0, resharding can be done with the reshardCollection command. Several prerequisites apply, however:

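Disk space: make sure the available space is at least 1.2 times the size of the collection you want to reshard. For a 1 TB collection, you need at least 1.2 TB of free disk space.
I/O utilization should be below 50%.
CPU load should be below 80%.
Another important requirement is that you must deploy changes to your application's queries. For the database to perform best during resharding, queries must filter on both the current shard key and the new shard key. Only after resharding has finished can you remove the filters on the old shard key from your queries. [dev team]
Your application must tolerate a window of about two seconds during which the collection being resharded blocks writes. While writes are blocked, application latency increases. [dev team]
You must rewrite your application's queries to use both the current shard key and the new shard key. [dev team must cooperate on this change]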
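The resource thresholds above can be turned into a simple pre-flight check. The following is a minimal sketch, not an official tool: the function name `checkReshardPrerequisites` and its parameters are hypothetical, and the caller is assumed to have collected the collection size, free disk space, and utilization figures from its own monitoring.

```javascript
// Hypothetical helper: the thresholds mirror the prerequisites listed above.
// All inputs are assumed to come from your own monitoring.
function checkReshardPrerequisites({ collectionBytes, freeDiskBytes, ioUtilization, cpuUtilization }) {
  // 1.2x the collection size, computed with integer arithmetic to avoid float drift.
  const requiredDiskBytes = Math.ceil((collectionBytes * 12) / 10);
  const problems = [];
  if (freeDiskBytes < requiredDiskBytes) {
    problems.push(`need ${requiredDiskBytes} bytes free, have ${freeDiskBytes}`);
  }
  if (ioUtilization >= 0.5) problems.push("I/O utilization is not below 50%");
  if (cpuUtilization >= 0.8) problems.push("CPU load is not below 80%");
  return { ok: problems.length === 0, requiredDiskBytes, problems };
}

const TB = 1e12; // decimal terabyte, an assumption for this example
const result = checkReshardPrerequisites({
  collectionBytes: 1 * TB,
  freeDiskBytes: 1.5 * TB,
  ioUtilization: 0.35,
  cpuUtilization: 0.6,
});
console.log(result.ok, result.requiredDiskBytes);
```

For a 1 TB collection the sketch requires 1.2 TB free, matching the example in the list.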

There are further limitations to consider:

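Resharding is not allowed while an index build is in progress.
If a query does not include both the current and the new shard key, some operations return an error: deleteOne(), findAndModify(), updateOne()… see the manual for the full list.
You can reshard only one collection at a time.
You cannot run addShard(), removeShard(), dropDatabase(), or db.createCollection() while resharding is in progress.
The new shard key cannot have a uniqueness constraint.
Resharding a collection that has a uniqueness constraint is not supported.
If the _id values are not globally unique, the resharding operation fails.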

Running any of the commands above while resharding is in progress causes the resharding operation to fail.
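To illustrate the requirement that queries filter on both the current and the new shard key, here is a small sketch that reports which shard key fields a query filter is missing. The helper `missingShardKeyFields` and the example keys (`customerId`, `region`, `orderId`) are hypothetical and do not come from the experiment below. The check only looks at top-level fields of the filter, which is a simplification.

```javascript
// Hypothetical helper: lists shard key fields that a query filter does not mention.
// Only top-level filter fields are inspected (a simplification).
function missingShardKeyFields(filter, currentKey, newKey) {
  const required = new Set([...Object.keys(currentKey), ...Object.keys(newKey)]);
  return [...required].filter((field) => !(field in filter));
}

// Hypothetical collection sharded on customerId, being resharded to { region, orderId }.
const currentKey = { customerId: "hashed" };
const newKey = { region: 1, orderId: 1 };

const incomplete = missingShardKeyFields({ customerId: 42 }, currentKey, newKey);
const complete = missingShardKeyFields({ customerId: 42, region: "eu", orderId: 7 }, currentKey, newKey);

console.log(incomplete); // fields still to add before resharding
console.log(complete);   // empty when the filter covers both keys
```

A check like this could run in application tests before resharding starts, so that filters covering only the old key are caught early.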

Experiment

Check the current data distribution across shards

Suppose newusers is a collection that is already hash-sharded on name. Its data and shard layout are as follows:

[direct: mongos] user_center> db.newusers.find()
[
  { _id: ObjectId('66fb732f0666502468838740'), name: 'li si' },
  { _id: ObjectId('66fb73330666502468838741'), name: 'wang wu' },
  { _id: ObjectId('66fb79ab0666502468838743'), name: 'li si' },
  { _id: ObjectId('66fb79ab0666502468838744'), name: 'wang wu' },
  { _id: ObjectId('66fb79ac0666502468838746'), name: 'li si' },
  { _id: ObjectId('66fb79ac0666502468838747'), name: 'wang wu' },
  { _id: ObjectId('66fe883d1ae87558c2838726'), name: 'zhangmazi' },
  { _id: ObjectId('66fb7326066650246883873f'), name: 'zhangsan' },
  { _id: ObjectId('66fb79ab0666502468838742'), name: 'zhangsan' },
  { _id: ObjectId('66fb79ac0666502468838745'), name: 'zhangsan' }
]

[direct: mongos] user_center> sh.status()
... remaining output omitted ...
  {
    database: {
      _id: 'user_center',
      primary: 'shard01',
      partitioned: false,
      version: {
        uuid: UUID('e440a9b2-97f1-462f-b35a-af48a1fae2d8'),
        timestamp: Timestamp({ t: 1727754530, i: 1 }),
        lastMod: 1
      }
    },
    collections: {
      'user_center.newusers': {
        shardKey: { name: 'hashed' },  -- note that the key is hashed here
        unique: false,
        balancing: true,
        chunkMetadata: [
          { shard: 'shard01', nChunks: 1 },
          { shard: 'shard02', nChunks: 1 }
        ],
        chunks: [
          { min: { name: MinKey() }, max: { name: Long('0') }, 'on shard': 'shard02', 'last modified': Timestamp({ t: 1, i: 5 }) },
          { min: { name: Long('0') }, max: { name: MaxKey() }, 'on shard': 'shard01', 'last modified': Timestamp({ t: 1, i: 4 }) }
        ],
        tags: []
      }
    }
  }
]

[direct: mongos] user_center> db.newusers.getShardDistribution()
Shard shard01 at shard01/localhost:27018
{
  data: '243B',
  docs: 6,
  chunks: 1,
  'estimated data per chunk': '243B',
  'estimated docs per chunk': 6
}
---
Shard shard02 at shard02/localhost:27019
{
  data: '114B',
  docs: 3,  -- shard02 holds 3 documents
  chunks: 1,
  'estimated data per chunk': '114B',
  'estimated docs per chunk': 3
}
---
Totals
{
  data: '357B',
  docs: 9,  -- 9 documents in total (6 on shard01, 3 on shard02)
  chunks: 2,
  'Shard shard01': [
    '68.06 % data',
    '66.66 % docs in cluster',
    '40B avg obj size on shard'
  ],
  'Shard shard02': [
    '31.93 % data',
    '33.33 % docs in cluster',
    '38B avg obj size on shard'
  ]
}

Run the reshardCollection command

Reshard the newusers collection with a range-based shard key on name:

[direct: mongos] user_center> db.adminCommand({reshardCollection: "user_center.newusers",key: { name: 1 }})
The command blocks here until the resharding operation finishes, and then the console prints the following:
{
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1727957043, i: 55 }),
    signature: {
      hash: Binary.createFromBase64('AAAAAAAAAAAAAAAAAAAAAAAAAAA=', 0),
      keyId: Long('0')
    }
  },
  operationTime: Timestamp({ t: 1727957043, i: 55 })
}

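While the reshardCollection command is running, you can open another session to mongos and use the following command to check the progress and estimated duration of the resharding: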

[direct: mongos] user_center> db.getSiblingDB("admin").aggregate([ { $currentOp: { allUsers: true, localOps: false } }, { $match: { type: "op", "originatingCommand.reshardCollection": "user_center.newusers" } }])
[
  {
    shard: 'shard01',
    totalCopyTimeElapsedSecs: Long('287'),
    totalApplyTimeElapsedSecs: Long('0'),
    totalCriticalSectionTimeElapsedSecs: Long('0'),
    oplogEntriesFetched: Long('572'),
    oplogEntriesApplied: Long('0'),
    insertsApplied: Long('0'),
    updatesApplied: Long('0'),
    deletesApplied: Long('0'),
    type: 'op',
    desc: 'ReshardingMetricsRecipientService 125605c4-1c6e-43e3-beeb-e98e9fd81466',
    op: 'command',
    ns: 'user_center.newusers',
    originatingCommand: {
      reshardCollection: 'user_center.newusers',
      key: { name: 1 },
      unique: 'false',
      collation: { locale: 'simple' }
    },
    totalOperationTimeElapsedSecs: Long('287'),
    recipientState: 'cloning',
    remainingOperationTimeEstimatedSecs: Long('543'),
    approxDocumentsToCopy: Long('4'),
    approxBytesToCopy: Long('178'),
    bytesCopied: Long('123'),
    countWritesToStashCollections: Long('0'),
    documentsCopied: Long('3')
  },
  {
    shard: 'shard01',
    totalCopyTimeElapsedSecs: Long('0'),
    totalApplyTimeElapsedSecs: Long('0'),
    totalCriticalSectionTimeElapsedSecs: Long('0'),
    type: 'op',
    desc: 'ReshardingMetricsDonorService 125605c4-1c6e-43e3-beeb-e98e9fd81466',
    op: 'command',
    ns: 'user_center.newusers',
    originatingCommand: {
      reshardCollection: 'user_center.newusers',
      key: { name: 1 },
      unique: 'false',
      collation: { locale: 'simple' }
    },
    totalOperationTimeElapsedSecs: Long('287'),
    donorState: 'donating-initial-data',
    countWritesDuringCriticalSection: Long('0'),
    countReadsDuringCriticalSection: Long('0')
  },
  {
    shard: 'shard02',
    totalCopyTimeElapsedSecs: Long('287'),
    totalApplyTimeElapsedSecs: Long('0'),
    totalCriticalSectionTimeElapsedSecs: Long('0'),
    oplogEntriesFetched: Long('574'),
    oplogEntriesApplied: Long('0'),
    insertsApplied: Long('0'),
    updatesApplied: Long('0'),
    deletesApplied: Long('0'),
    type: 'op',
    desc: 'ReshardingMetricsRecipientService 125605c4-1c6e-43e3-beeb-e98e9fd81466',
    op: 'command',
    ns: 'user_center.newusers',
    originatingCommand: {
      reshardCollection: 'user_center.newusers',
      key: { name: 1 },
      unique: 'false',
      collation: { locale: 'simple' }
    },
    totalOperationTimeElapsedSecs: Long('287'),
    recipientState: 'cloning',
    remainingOperationTimeEstimatedSecs: Long('149'),
    approxDocumentsToCopy: Long('4'),
    approxBytesToCopy: Long('178'),
    bytesCopied: Long('234'),
    countWritesToStashCollections: Long('0'),
    documentsCopied: Long('6')
  },
  {
    shard: 'shard02',
    totalCopyTimeElapsedSecs: Long('0'),
    totalApplyTimeElapsedSecs: Long('0'),
    totalCriticalSectionTimeElapsedSecs: Long('0'),
    type: 'op',
    desc: 'ReshardingMetricsDonorService 125605c4-1c6e-43e3-beeb-e98e9fd81466',
    op: 'command',
    ns: 'user_center.newusers',
    originatingCommand: {
      reshardCollection: 'user_center.newusers',
      key: { name: 1 },
      unique: 'false',
      collation: { locale: 'simple' }
    },
    totalOperationTimeElapsedSecs: Long('287'),
    donorState: 'donating-initial-data',
    countWritesDuringCriticalSection: Long('0'),
    countReadsDuringCriticalSection: Long('0')
  }
]

Note these fields in the output above:

totalOperationTimeElapsedSecs

remainingOperationTimeEstimatedSecs
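The two fields can be combined into a rough per-shard progress figure. The sketch below is an illustration under assumptions: `summarizeReshardProgress` is a hypothetical helper, the percentage is a simple time-based estimate rather than an official metric, and plain numbers stand in for the Long values that mongosh returns (those would need converting first). The sample values are copied from the output above.

```javascript
// Hypothetical helper: summarizes recipient-side progress from $currentOp results.
// Only recipient entries carry remainingOperationTimeEstimatedSecs in the output above.
function summarizeReshardProgress(ops) {
  return ops
    .filter((op) => op.recipientState !== undefined)
    .map((op) => {
      const elapsed = op.totalOperationTimeElapsedSecs;
      const remaining = op.remainingOperationTimeEstimatedSecs;
      return {
        shard: op.shard,
        state: op.recipientState,
        elapsedSecs: elapsed,
        remainingSecs: remaining,
        // Rough time-based estimate, not a metric reported by MongoDB.
        percentDone: Math.round((elapsed / (elapsed + remaining)) * 100),
      };
    });
}

// Values taken from the output above, with Long values written as plain numbers.
const ops = [
  { shard: "shard01", recipientState: "cloning", totalOperationTimeElapsedSecs: 287, remainingOperationTimeEstimatedSecs: 543 },
  { shard: "shard01", donorState: "donating-initial-data", totalOperationTimeElapsedSecs: 287 },
  { shard: "shard02", recipientState: "cloning", totalOperationTimeElapsedSecs: 287, remainingOperationTimeEstimatedSecs: 149 },
];

const progress = summarizeReshardProgress(ops);
console.log(progress);
```

With these inputs the estimate comes out at about 35% for shard01 and 66% for shard02.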

In addition, while resharding is in progress we can try inserting a document through mongos. The insert succeeds without any problem.

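This step takes a long time. On a local test cluster built with mlaunch, resharding just 10 documents took quite a while. The mongos log from the operation, shown below, indicates that it took about 5 minutes. My test machine may be slow, but a more likely cause is the reshardingMinimumOperationDurationMillis server parameter, which sets a minimum duration for resharding operations and defaults to 5 minutes.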

{
    "t": {
        "$date": "2024-10-03T20:04:03.404+08:00"
    },
    "s": "I",
    "c": "RESHARD",
    "id": 7763800,
    "ctx": "ReshardingCoordinatorService-6",
    "msg": "Resharding complete",
    "attr": {
        "info": {
            "uuid": {
                "uuid": {
                    "$uuid": "125605c4-1c6e-43e3-beeb-e98e9fd81466"
                }
            },
            "status": "success",
            "statistics": {
                "ns": "user_center.newusers",
                "sourceUUID": {
                    "uuid": {
                        "$uuid": "619f86f5-451b-4135-bf2c-fd0f43b1cefc"
                    }
                },
                "newUUID": {
                    "uuid": {
                        "$uuid": "125605c4-1c6e-43e3-beeb-e98e9fd81466"
                    }
                },
                "newShardKey": {
                    "name": 1
                },
                "startTime": {
                    "$date": "2024-10-03T11:59:01.321Z"
                },
                "endTime": {
                    "$date": "2024-10-03T12:04:03.403Z"
                },
                "totalCopyTimeElapsedSecs": 300,
                "totalApplyTimeElapsedSecs": 0,
                "totalCriticalSectionTimeElapsedSecs": 1
            }
        }
    }
}
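The total duration can be read off the startTime and endTime fields of this log entry. The following is a small sketch: `reshardDurationSecs` is a hypothetical helper, and the object below reproduces only the fields of the log entry that the calculation needs.

```javascript
// Subset of the "Resharding complete" log entry shown above.
const logEntry = {
  msg: "Resharding complete",
  attr: {
    info: {
      status: "success",
      statistics: {
        startTime: { $date: "2024-10-03T11:59:01.321Z" },
        endTime: { $date: "2024-10-03T12:04:03.403Z" },
        totalCopyTimeElapsedSecs: 300,
        totalCriticalSectionTimeElapsedSecs: 1,
      },
    },
  },
};

// Hypothetical helper: wall-clock duration of the resharding run, in seconds.
function reshardDurationSecs(entry) {
  const stats = entry.attr.info.statistics;
  const startMs = Date.parse(stats.startTime.$date);
  const endMs = Date.parse(stats.endTime.$date);
  return (endMs - startMs) / 1000;
}

const durationSecs = reshardDurationSecs(logEntry);
console.log(durationSecs);
```

The result is roughly 302 seconds, of which 300 seconds are reported as copy time and 1 second as critical-section time.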

Check the data distribution again

[direct: mongos] user_center> sh.status()

  {
    database: {
      _id: 'user_center',
      primary: 'shard01',
      partitioned: false,
      version: {
        uuid: UUID('e440a9b2-97f1-462f-b35a-af48a1fae2d8'),
        timestamp: Timestamp({ t: 1727754530, i: 1 }),
        lastMod: 1
      }
    },
    collections: {
      'user_center.newusers': {
        shardKey: { name: 1 },   -- note the change here
        unique: false,
        balancing: true,
        chunkMetadata: [
          { shard: 'shard01', nChunks: 1 },
          { shard: 'shard02', nChunks: 1 }
        ],
        chunks: [
          { min: { name: MinKey() }, max: { name: 'zhangsan' }, 'on shard': 'shard02', 'last modified': Timestamp({ t: 1, i: 0 }) },
          { min: { name: 'zhangsan' }, max: { name: MaxKey() }, 'on shard': 'shard01', 'last modified': Timestamp({ t: 1, i: 5 }) }
        ],
        tags: []
      }
    }
  }
]

[direct: mongos] user_center> db.newusers.getShardDistribution()
Shard shard02 at shard02/localhost:27019
{
  data: '276B',
  docs: 7,  -- shard02 holds 7 documents
  chunks: 1,
  'estimated data per chunk': '276B',
  'estimated docs per chunk': 7
}
---
Shard shard01 at shard01/localhost:27018
{
  data: '123B',
  docs: 3,  -- shard01 holds 3 documents
  chunks: 1,
  'estimated data per chunk': '123B',
  'estimated docs per chunk': 3
}
---
Totals
{
  data: '399B',
  docs: 10,
  chunks: 2,
  'Shard shard02': [
    '69.17 % data',
    '70 % docs in cluster',
    '39B avg obj size on shard'
  ],
  'Shard shard01': [
    '30.82 % data',
    '30 % docs in cluster',
    '41B avg obj size on shard'
  ]
}
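The 7/3 split follows directly from the two chunk ranges in the sh.status() output. The sketch below replays the routing in plain JavaScript under a few assumptions: `shardFor`, `MIN_KEY`, and `MAX_KEY` are hypothetical stand-ins, chunk ranges are treated as inclusive at min and exclusive at max, and JavaScript string comparison is used in place of MongoDB's simple collation, which agrees for these ASCII names.

```javascript
// Stand-ins for MinKey() and MaxKey() (assumption for this sketch).
const MIN_KEY = Symbol("MinKey");
const MAX_KEY = Symbol("MaxKey");

// Chunk ranges copied from the sh.status() output after resharding.
const chunks = [
  { min: MIN_KEY, max: "zhangsan", shard: "shard02" },
  { min: "zhangsan", max: MAX_KEY, shard: "shard01" },
];

// Hypothetical helper: finds the chunk whose range [min, max) contains the name.
function shardFor(name, chunkList) {
  const chunk = chunkList.find(
    (c) => (c.min === MIN_KEY || c.min <= name) && (c.max === MAX_KEY || name < c.max)
  );
  return chunk.shard;
}

// The ten name values from the db.newusers.find() output.
const names = [
  "li si", "wang wu", "li si", "wang wu", "li si", "wang wu",
  "zhangmazi", "zhangsan", "zhangsan", "zhangsan",
];

const counts = {};
for (const name of names) {
  const shard = shardFor(name, chunks);
  counts[shard] = (counts[shard] || 0) + 1;
}
console.log(counts); // { shard02: 7, shard01: 3 }
```

Every name that sorts before 'zhangsan', including 'zhangmazi', falls into the first chunk on shard02, while the three 'zhangsan' documents land on shard01.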

After the migration completes, check the data on shard01 and shard02 again: