Before MongoDB 5.0, resharding a collection was a fairly involved process: you typically deployed a brand-new cluster and then synced the data over (a full copy followed by incremental sync) according to the new shard key.
Starting with MongoDB 5.0, resharding can be done with the reshardCollection command. Note, however, that the operation comes with several prerequisites as well as additional restrictions: certain commands are not allowed against the collection while it is being resharded, and issuing any of them will cause the resharding operation to fail. The full lists are in the official documentation linked at the end of this article.
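One prerequisite that is easy to verify up front is the cluster's featureCompatibilityVersion, which must be at least 5.0 for reshardCollection to be available. A minimal check from mongos (a sketch, not part of the original walkthrough):
[direct: mongos] user_center> db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 })
If this reports an older version, it can be raised with the setFeatureCompatibilityVersion admin command once every component of the cluster has been upgraded.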
Assume newusers is a collection that is already hash-sharded on name; its data and current shard distribution look like this:
[direct: mongos] user_center> db.newusers.find()
[
{ _id: ObjectId('66fb732f0666502468838740'), name: 'li si' },
{ _id: ObjectId('66fb73330666502468838741'), name: 'wang wu' },
{ _id: ObjectId('66fb79ab0666502468838743'), name: 'li si' },
{ _id: ObjectId('66fb79ab0666502468838744'), name: 'wang wu' },
{ _id: ObjectId('66fb79ac0666502468838746'), name: 'li si' },
{ _id: ObjectId('66fb79ac0666502468838747'), name: 'wang wu' },
{ _id: ObjectId('66fe883d1ae87558c2838726'), name: 'zhangmazi' },
{ _id: ObjectId('66fb7326066650246883873f'), name: 'zhangsan' },
{ _id: ObjectId('66fb79ab0666502468838742'), name: 'zhangsan' },
{ _id: ObjectId('66fb79ac0666502468838745'), name: 'zhangsan' }
]
[direct: mongos] user_center> sh.status()
... remaining output omitted ...
{
database: {
_id: 'user_center',
primary: 'shard01',
partitioned: false,
version: {
uuid: UUID('e440a9b2-97f1-462f-b35a-af48a1fae2d8'),
timestamp: Timestamp({ t: 1727754530, i: 1 }),
lastMod: 1
}
},
collections: {
'user_center.newusers': {
shardKey: { name: 'hashed' }, -- note: the shard key here is hashed
unique: false,
balancing: true,
chunkMetadata: [
{ shard: 'shard01', nChunks: 1 },
{ shard: 'shard02', nChunks: 1 }
],
chunks: [
{ min: { name: MinKey() }, max: { name: Long('0') }, 'on shard': 'shard02', 'last modified': Timestamp({ t: 1, i: 5 }) },
{ min: { name: Long('0') }, max: { name: MaxKey() }, 'on shard': 'shard01', 'last modified': Timestamp({ t: 1, i: 4 }) }
],
tags: []
}
}
}
]
[direct: mongos] user_center> db.newusers.getShardDistribution()
Shard shard01 at shard01/localhost:27018
{
data: '243B',
docs: 6,
chunks: 1,
'estimated data per chunk': '243B',
'estimated docs per chunk': 6
}
---
Shard shard02 at shard02/localhost:27019
{
data: '114B',
docs: 3, -- shard02 holds 3 documents
chunks: 1,
'estimated data per chunk': '114B',
'estimated docs per chunk': 3
}
---
Totals
{
data: '357B',
docs: 9, -- 9 documents in the cluster in total
chunks: 2,
'Shard shard01': [
'68.06 % data',
'66.66 % docs in cluster',
'40B avg obj size on shard'
],
'Shard shard02': [
'31.93 % data',
'33.33 % docs in cluster',
'38B avg obj size on shard'
]
}
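For reference, the hashed sharding shown above would typically have been set up with something along these lines (a sketch; the original setup commands are not part of this walkthrough):
[direct: mongos] user_center> sh.enableSharding("user_center")   // may be implicit on newer versions
[direct: mongos] user_center> sh.shardCollection("user_center.newusers", { name: "hashed" })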
Now reshard the newusers data on name using ranged sharding:
[direct: mongos] user_center> db.adminCommand({reshardCollection: "user_center.newusers",key: { name: 1 }})
The command blocks until the resharding operation finishes, at which point the shell prints the following:
{
ok: 1,
'$clusterTime': {
clusterTime: Timestamp({ t: 1727957043, i: 55 }),
signature: {
hash: Binary.createFromBase64('AAAAAAAAAAAAAAAAAAAAAAAAAAA=', 0),
keyId: Long('0')
}
},
operationTime: Timestamp({ t: 1727957043, i: 55 })
}
While reshardCollection is running, you can open another window, connect to mongos, and use the following command to estimate the progress and remaining time of the resharding operation:
[direct: mongos] user_center> db.getSiblingDB("admin").aggregate([ { $currentOp: { allUsers: true, localOps: false } }, { $match: { type: "op", "originatingCommand.reshardCollection": "user_center.newusers" } }])
[
{
shard: 'shard01',
totalCopyTimeElapsedSecs: Long('287'),
totalApplyTimeElapsedSecs: Long('0'),
totalCriticalSectionTimeElapsedSecs: Long('0'),
oplogEntriesFetched: Long('572'),
oplogEntriesApplied: Long('0'),
insertsApplied: Long('0'),
updatesApplied: Long('0'),
deletesApplied: Long('0'),
type: 'op',
desc: 'ReshardingMetricsRecipientService 125605c4-1c6e-43e3-beeb-e98e9fd81466',
op: 'command',
ns: 'user_center.newusers',
originatingCommand: {
reshardCollection: 'user_center.newusers',
key: { name: 1 },
unique: 'false',
collation: { locale: 'simple' }
},
totalOperationTimeElapsedSecs: Long('287'),
recipientState: 'cloning',
remainingOperationTimeEstimatedSecs: Long('543'),
approxDocumentsToCopy: Long('4'),
approxBytesToCopy: Long('178'),
bytesCopied: Long('123'),
countWritesToStashCollections: Long('0'),
documentsCopied: Long('3')
},
{
shard: 'shard01',
totalCopyTimeElapsedSecs: Long('0'),
totalApplyTimeElapsedSecs: Long('0'),
totalCriticalSectionTimeElapsedSecs: Long('0'),
type: 'op',
desc: 'ReshardingMetricsDonorService 125605c4-1c6e-43e3-beeb-e98e9fd81466',
op: 'command',
ns: 'user_center.newusers',
originatingCommand: {
reshardCollection: 'user_center.newusers',
key: { name: 1 },
unique: 'false',
collation: { locale: 'simple' }
},
totalOperationTimeElapsedSecs: Long('287'),
donorState: 'donating-initial-data',
countWritesDuringCriticalSection: Long('0'),
countReadsDuringCriticalSection: Long('0')
},
{
shard: 'shard02',
totalCopyTimeElapsedSecs: Long('287'),
totalApplyTimeElapsedSecs: Long('0'),
totalCriticalSectionTimeElapsedSecs: Long('0'),
oplogEntriesFetched: Long('574'),
oplogEntriesApplied: Long('0'),
insertsApplied: Long('0'),
updatesApplied: Long('0'),
deletesApplied: Long('0'),
type: 'op',
desc: 'ReshardingMetricsRecipientService 125605c4-1c6e-43e3-beeb-e98e9fd81466',
op: 'command',
ns: 'user_center.newusers',
originatingCommand: {
reshardCollection: 'user_center.newusers',
key: { name: 1 },
unique: 'false',
collation: { locale: 'simple' }
},
totalOperationTimeElapsedSecs: Long('287'),
recipientState: 'cloning',
remainingOperationTimeEstimatedSecs: Long('149'),
approxDocumentsToCopy: Long('4'),
approxBytesToCopy: Long('178'),
bytesCopied: Long('234'),
countWritesToStashCollections: Long('0'),
documentsCopied: Long('6')
},
{
shard: 'shard02',
totalCopyTimeElapsedSecs: Long('0'),
totalApplyTimeElapsedSecs: Long('0'),
totalCriticalSectionTimeElapsedSecs: Long('0'),
type: 'op',
desc: 'ReshardingMetricsDonorService 125605c4-1c6e-43e3-beeb-e98e9fd81466',
op: 'command',
ns: 'user_center.newusers',
originatingCommand: {
reshardCollection: 'user_center.newusers',
key: { name: 1 },
unique: 'false',
collation: { locale: 'simple' }
},
totalOperationTimeElapsedSecs: Long('287'),
donorState: 'donating-initial-data',
countWritesDuringCriticalSection: Long('0'),
countReadsDuringCriticalSection: Long('0')
}
]
Pay particular attention to two fields in the output above:
totalOperationTimeElapsedSecs -- how long the operation has been running
remainingOperationTimeEstimatedSecs -- the estimated time remaining
If you only want these progress figures, you can trim the aggregation down with a $project stage, as sketched below.
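A minimal sketch (same namespace as above) that keeps only the shard, its resharding state, and the two timing fields:
db.getSiblingDB("admin").aggregate([
  { $currentOp: { allUsers: true, localOps: false } },
  { $match: { type: "op", "originatingCommand.reshardCollection": "user_center.newusers" } },
  { $project: { shard: 1, recipientState: 1, donorState: 1,
                totalOperationTimeElapsedSecs: 1, remainingOperationTimeEstimatedSecs: 1 } }
])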
In addition, during the resharding we can try inserting a document through mongos, and the insert goes through without any problem (reads and writes remain available for almost the whole operation, apart from the brief critical section when it commits).
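This is also why the post-resharding totals below show 10 documents instead of the original 9. The exact document inserted during the test is not shown in the original output, so the one below is a hypothetical placeholder (a sketch):
[direct: mongos] user_center> db.newusers.insertOne({ name: 'zhao liu' })  // hypothetical document written while resharding is still running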
The resharding step takes quite a long time. In a local test environment built with mlaunch, resharding just 10 documents already took a surprisingly long while; the mongos log below shows it took close to 5 minutes (admittedly, my test machine is not very powerful).
{
"t": {
"$date": "2024-10-03T20:04:03.404+08:00"
},
"s": "I",
"c": "RESHARD",
"id": 7763800,
"ctx": "ReshardingCoordinatorService-6",
"msg": "Resharding complete",
"attr": {
"info": {
"uuid": {
"uuid": {
"$uuid": "125605c4-1c6e-43e3-beeb-e98e9fd81466"
}
},
"status": "success",
"statistics": {
"ns": "user_center.newusers",
"sourceUUID": {
"uuid": {
"$uuid": "619f86f5-451b-4135-bf2c-fd0f43b1cefc"
}
},
"newUUID": {
"uuid": {
"$uuid": "125605c4-1c6e-43e3-beeb-e98e9fd81466"
}
},
"newShardKey": {
"name": 1
},
"startTime": {
"$date": "2024-10-03T11:59:01.321Z"
},
"endTime": {
"$date": "2024-10-03T12:04:03.403Z"
},
"totalCopyTimeElapsedSecs": 300,
"totalApplyTimeElapsedSecs": 0,
"totalCriticalSectionTimeElapsedSecs": 1
}
}
}
}
[direct: mongos] user_center> sh.status()
... remaining output omitted ...
{
database: {
_id: 'user_center',
primary: 'shard01',
partitioned: false,
version: {
uuid: UUID('e440a9b2-97f1-462f-b35a-af48a1fae2d8'),
timestamp: Timestamp({ t: 1727754530, i: 1 }),
lastMod: 1
}
},
collections: {
'user_center.newusers': {
shardKey: { name: 1 }, -- note: the shard key is now a ranged key
unique: false,
balancing: true,
chunkMetadata: [
{ shard: 'shard01', nChunks: 1 },
{ shard: 'shard02', nChunks: 1 }
],
chunks: [
{ min: { name: MinKey() }, max: { name: 'zhangsan' }, 'on shard': 'shard02', 'last modified': Timestamp({ t: 1, i: 0 }) },
{ min: { name: 'zhangsan' }, max: { name: MaxKey() }, 'on shard': 'shard01', 'last modified': Timestamp({ t: 1, i: 5 }) }
],
tags: []
}
}
}
]
[direct: mongos] user_center> db.newusers.getShardDistribution()
Shard shard02 at shard02/localhost:27019
{
data: '276B',
docs: 7, -- shard02 holds 7 documents
chunks: 1,
'estimated data per chunk': '276B',
'estimated docs per chunk': 7
}
---
Shard shard01 at shard01/localhost:27018
{
data: '123B',
docs: 3, -- shard01 holds 3 documents
chunks: 1,
'estimated data per chunk': '123B',
'estimated docs per chunk': 3
}
---
Totals
{
data: '399B',
docs: 10,
chunks: 2,
'Shard shard02': [
'69.17 % data',
'70 % docs in cluster',
'39B avg obj size on shard'
],
'Shard shard01': [
'30.82 % data',
'30 % docs in cluster',
'41B avg obj size on shard'
]
}
After the migration completes, you can also check the data on shard01 and shard02 by connecting to each shard directly, as sketched below.
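A minimal sketch of that check, using the shard ports shown in the getShardDistribution output above (the counts in the comments are the expected values after resharding):
mongosh --port 27018 --quiet --eval 'db.getSiblingDB("user_center").newusers.countDocuments()'   # shard01, expect 3
mongosh --port 27019 --quiet --eval 'db.getSiblingDB("user_center").newusers.countDocuments()'   # shard02, expect 7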
According to a Percona blog post on this feature, resharding has a noticeable impact on CPU, I/O, and latency, so make sure to run it during off-peak hours (for large collections in production, I would personally still consider this operation risky even at off-peak times).
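If a resharding operation turns out to be too disruptive, it can be aborted as long as it has not yet reached the commit phase; a minimal sketch against the collection used in this walkthrough:
[direct: mongos] user_center> sh.abortReshardCollection("user_center.newusers")  // or: db.adminCommand({ abortReshardCollection: "user_center.newusers" })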
Attached: the PMM monitoring screenshots from the Percona blog.
Official documentation:
https://www.mongodb.com/zh-cn/docs/manual/core/sharding-reshard-a-collection/
https://www.mongodb.com/zh-cn/docs/manual/reference/command/refineCollectionShardKey/
http://www.mongodb.com/zh-cn/docs/manual/core/sharding-shard-a-collection/