
The backup is not point in time if a tailed oplog's last timestamp is before a mongodump's. #164

Description

@rob256

This only applies to sharded clusters, and it can occur frequently on small sharded clusters.

So the current process is (sketched below):
1. start tailed oplogs
2. start mongodumps of each replica set with --oplog
3. finish mongodumps
4. stop tailed oplogs
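
To make the ordering concrete, here is a minimal sketch of that sequence. The `tailers`/`dumpers` objects and their `start()`/`wait()`/`stop()` methods are placeholders for illustration, not the project's actual classes:

```python
def run_backup(tailers, dumpers):
    """Illustrative only: the tailer/dumper interfaces are assumed, not the real API."""
    for tailer in tailers:      # 1. start a tailed oplog per replica set (shards + config servers)
        tailer.start()
    for dumper in dumpers:      # 2. start mongodump --oplog against each replica set
        dumper.start()
    for dumper in dumpers:      # 3. wait for every mongodump to finish
        dumper.wait()
    for tailer in tailers:      # 4. the tailers are then stopped almost immediately
        tailer.stop()
```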

The consistent end point is then evaluated using Resolver.get_consistent_end_ts(), which looks at the tailed oplogs, and the resolver appends the oplog entries from the tailed oplogs to the mongodump oplogs.
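
As far as I can tell, the resolution step behaves roughly like the sketch below. Timestamps are shown as plain comparable values and the structures are simplified; this is my reading of the behaviour, not the project's actual code:

```python
def get_consistent_end_ts(tailer_last_ts):
    """The consistent end point is bounded by the slowest tailer (simplified sketch)."""
    # tailer_last_ts: dict of replica set name -> last tailed oplog timestamp
    return min(tailer_last_ts.values())

def resolve_oplog(dump_oplog, tailed_oplog, end_ts):
    """Append tailed entries to a mongodump oplog, but only those prior to end_ts."""
    return dump_oplog + [entry for entry in tailed_oplog if entry["ts"] < end_ts]
```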

However, in its current state, it can commonly produce backups that are not point in time.

While testing, I found that the optime on the config servers updates much less frequently. In fact, it looks like it can be as infrequent as every 10 seconds:

```
configReplSet:PRIMARY> db.oplog.rs.find({}, {'ts': 1, 'op': 1, 'o': 1}).sort({$natural:-1})
{ "ts" : Timestamp(1496305850, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:50.770Z") } } }
{ "ts" : Timestamp(1496305848, 2), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:48.833Z"), "waiting" : true } } }
{ "ts" : Timestamp(1496305848, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:48.830Z"), "up" : NumberLong(8444197), "waiting" : false } } }
{ "ts" : Timestamp(1496305839, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:39.157Z") } } }
{ "ts" : Timestamp(1496305838, 2), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:38.823Z"), "waiting" : true } } }
{ "ts" : Timestamp(1496305838, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:38.817Z"), "up" : NumberLong(8444187), "waiting" : false } } }
{ "ts" : Timestamp(1496305828, 3), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:28.961Z") } } }
{ "ts" : Timestamp(1496305828, 2), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:28.809Z"), "waiting" : true } } }
{ "ts" : Timestamp(1496305828, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:28.804Z"), "up" : NumberLong(8444177), "waiting" : false } } }
{ "ts" : Timestamp(1496305820, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:20.765Z") } } }
{ "ts" : Timestamp(1496305818, 2), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:18.800Z"), "waiting" : true } } }
{ "ts" : Timestamp(1496305818, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:18.796Z"), "up" : NumberLong(8444167), "waiting" : false } } }
{ "ts" : Timestamp(1496305809, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:09.151Z") } } }
{ "ts" : Timestamp(1496305808, 2), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:08.789Z"), "waiting" : true } } }
{ "ts" : Timestamp(1496305808, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:30:08.782Z"), "up" : NumberLong(8444157), "waiting" : false } } }
{ "ts" : Timestamp(1496305798, 3), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:29:58.955Z") } } }
{ "ts" : Timestamp(1496305798, 2), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:29:58.776Z"), "waiting" : true } } }
{ "ts" : Timestamp(1496305798, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:29:58.771Z"), "up" : NumberLong(8444147), "waiting" : false } } }
{ "ts" : Timestamp(1496305790, 1), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:29:50.764Z") } } }
{ "ts" : Timestamp(1496305788, 2), "op" : "u", "o" : { "$set" : { "ping" : ISODate("2017-06-01T08:29:48.770Z"), "waiting" : true } } }

This becomes a problem because the last mongodump to finish has an oplog end time near the current time, but the oplog tailers are killed off almost immediately after the dumps complete, which means the config server's oplog tailer may have a last_ts from up to 10 seconds earlier.
When get_consistent_end_ts() is evaluated, the ResolverThread ends up with a max_end_ts equal to the config server tailer's last_ts. Tailed oplog entries are appended to each mongodump oplog only if they are prior to this max_end_ts, yet the last mongodump's oplog already exceeds max_end_ts. This leaves one mongodump with a higher oplog end time than the others, so the backup is not point in time.
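
A made-up numeric example of the failure (seconds only, increments omitted): the config server tailer last wrote roughly 10 seconds ago, so it caps max_end_ts below the point the last mongodump had already reached:

```python
# Hypothetical timestamps illustrating the failure described above.
tailer_last_ts = {
    "shard1":        1496305858,
    "shard2":        1496305858,
    "configReplSet": 1496305850,   # config server oplog only sees a write every ~10s
}
dump_last_ts = {
    "shard1": 1496305845,          # finished earlier; tailed entries top it up to max_end_ts
    "shard2": 1496305857,          # finished last; already ahead of the config server tailer
}

max_end_ts = min(tailer_last_ts.values())                          # 1496305850
ahead = [rs for rs, ts in dump_last_ts.items() if ts > max_end_ts]
print(max_end_ts, ahead)   # ['shard2']: shard2 ends at 1496305857, shard1 only at 1496305850
```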

I tested this by inserting around 1000 documents per second through the mongos into one of the sharded collections and then comparing the documents.

I can work around this by adding a sleep after the mongodumps finish, giving the tailed oplog for the config server a chance to write its next timestamp, after which a point-in-time backup is made (sketched below).
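
The workaround is just a pause between the dumps finishing and the tailers being stopped, long enough for the config server's next periodic write to land; roughly:

```python
import time

CONFIG_PING_INTERVAL = 10   # config server oplog writes appear to arrive roughly every 10 seconds

# ...after all mongodumps have finished...
time.sleep(CONFIG_PING_INTERVAL + 5)   # let every tailer record a timestamp past the last dump
# ...then stop the tailed oplogs...
```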

The graceful fix for this would be:
Once the mongodumps finish, take the latest oplog timestamp across the mongodumps, and then ensure the oplog tailers have exceeded it before stopping them. That way, get_consistent_end_ts() would then be correct.
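
A rough sketch of that check; the tailer interface (a `last_ts()` method) and the polling loop are assumptions for illustration:

```python
import time

def wait_for_tailers(tailers, dump_end_ts, poll_interval=1.0, timeout=60.0):
    """Block until every oplog tailer has passed the newest mongodump oplog timestamp.

    tailers: objects exposing last_ts() -> comparable timestamp or None (assumed interface).
    dump_end_ts: the latest oplog timestamp across all finished mongodumps.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if all(t.last_ts() is not None and t.last_ts() >= dump_end_ts for t in tailers):
            return   # every tailer now covers the newest dump; safe to stop them
        time.sleep(poll_interval)
    raise RuntimeError("oplog tailers did not reach the mongodump end timestamp in time")
```

Only once this returns would the tailers be stopped, so get_consistent_end_ts() could never be capped below the last mongodump's oplog end.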

Please feel free to ask if you need any further information.

Thanks,
