Replay the oplog in MongoDB

Only a braindead can't find x

Only a braindead can’t find x

We had what I thought was a completely grueling homework question in M202. One person finished it very quickly and labeled the rest of us “a braindead” for not being able to do so (great encouragement right…and yes that was the grammar he used). The majority of the students seemed to have a problem with it, as I did. To be fair to the assignment, they did throw us a link to the following StackOverflow link. This is probably where those of us who are braindead failed the test.

I found that I was able to clearly understand the post and get the assignment completed after I extracted all the steps one by one. However, I kind of felt same way some others did…”why didn’t they teach us how to do this?” Seems like backup and restore are an important skill set for any DBA to have, so why wouldn’t they have made sure we knew exactly what to do, then have us repeat the process across a few different scenarios?

Anyways, because I wasn’t satisfied with the clarity of the documentation, and because I wanted to make sure I understood as much as possible, I decided it would be a good topic for a post. Here are the steps that I used in order to solve the problem and my attempt to explain them. Since the class didn’t go into detail about these kind of things (maybe the paid training does idk), I am not 100% sure, so any feedback or corrections are welcomed. It would be nice to have this kind of operation completely explained thoroughly and accurately (Please note your output will vary, I included the output for reference in case you are completely new or lost).

Lets_Go

So start up the assignment with the provided conf file. This brings up the production server on port 30001

mongod -f mongod.df30592c9ae3.conf

You will need to setup a second instance of mongod This is the instance where you will replay the operations, verify, and then use as the source for the file copy. Think about why you need to do this. If you were to just use a single instance, restore a db over it, then try to replay the dumped oplog from the same server…you will get an error stating “The oplogLimit is not newer than the last oplog entry on the server.” Makes sense right? Maybe, kinda, or not at all. The oplog in a replica set is stored at local.oplog.rs which you didn’t overwrite at all, so in this case some of us were trying replay a oplog that was no different than the oplog already running. Start the recovery instance of mongod. Observe that we are using a different port and we are not making it a member of the production servers replica set.

mongod --dbpath anemptydir/ --port 30002 --smallfiles

Restore the backupColl collection to the backupDB database on the recovery instance.

mongorestore --port 30002 --db backupDB backupDB/

connected to: 127.0.0.1:30002
2014-08-03T16:52:37.109-0700 backupDB/backupColl.bson
2014-08-03T16:52:37.109-0700 going into namespace [backupDB.backupColl]
2014-08-03T16:52:40.003-0700 Progress: 27327300/55036800 49% (bytes)
2014-08-03T16:52:43.003-0700 Progress: 51242100/55036800 93% (bytes)
604800 objects found
2014-08-03T16:52:43.416-0700 Creating index: { key: { _id: 1 }, name: "_id_", ns: "backupDB.backupColl" }

Create directories to put the oplog from the first instance. If you don’t go through the steps to create the empty directories and restore from them, you will get some error complaining about not being able to restore from a non empty directory (I can’t remember this one exactly, remember I was in a state of murderous rage during the homework, so I’m happy I was able to document this much).

mkdir oplogD

This mongodump dumps the oplog from the production server of mongod. You need to do this obviously in order to retrieve any operations that happened after your last snapshot

mongodump --port 30001 -d local -c oplog.rs -o oplogD

connected to: 127.0.0.1:30001
2014-08-03T16:34:13.890-0700 DATABASE: local to oplogD/local
2014-08-03T16:34:13.890-0700 local.oplog.rs to oplogD/local/oplog.rs.bson
2014-08-03T16:34:14.158-0700 701202 documents
2014-08-03T16:34:14.158-0700 Metadata for local.oplog.rs to oplogD/local/oplog.rs.metadata.json

Next stick the log in its own directory

mkdir oplogR
mv oplogD/local/oplog.rs.bson oplogR/oplog.bson

Using bsondump make the oplog into a human readable format. I’m not so sure this is the best way to find the transaction. If the oplog is very large, this might take a long time to complete.

bsondump oplogR/oplog.bson > oplog.readable

2014-08-03T16:35:42.012-0700 Progress: 35754619/113990182 31% (bytes)
2014-08-03T16:35:45.002-0700 Progress: 73596819/113990182 64% (bytes)
2014-08-03T16:35:48.003-0700 Progress: 107355055/113990182 94% (bytes)
701202 objects found

Next utilize cat and grep to find the offending operation. I added the -A and -B option to grep so that I could also capture the 10 lines before and after the offending transaction. I think in a real world scenario it might be good to investigate transactions around the problem time as well.

cat oplog.readable | grep -A 10 -B 10 "drop"

Some others reported querying the oplog on the production server to find the offending transaction using

db.oplog.rs.find( { "op": {$nin:["u", "i"]} } )

If the above code seems a little confusing at first, then read up on the oplog internals. The query is excluding inserts and updates. In this case we knew we were looking for a drop operation. In a real world scenario where we don’t know what happened, we would need to get a little more creative.

Once you have found the OFFENDING transaction, you will replay the log to that value. The doucumentation says Prevents mongorestore from applying oplog entries with timestamp newer than or equal to timestamp. This could be confusing, so you can also think of it as less than the oplog timestamp.

Example if the offending transaction is at 1398778745:1 then you will be replaying up to 1398778745:1, not including 1398778745:1.

Next replay the oplog on the recovery instance

mongorestore --host 127.0.0.1 --port 30002 --oplogReplay --oplogLimit 1398778745:1 ../30001/oplogR/

connected to: 127.0.0.1:30002
2014-08-03T16:54:38.018-0700 Replaying oplog
2014-08-03T16:54:41.005-0700 Progress: 8533619/113990182 7% (bytes)
2014-08-03T16:54:44.007-0700 Progress: 17050619/113990182 14% (bytes)
2014-08-03T16:54:47.003-0700 Progress: 25484119/113990182 22% (bytes)
2014-08-03T16:54:50.006-0700 Progress: 33951019/113990182 29% (bytes)
2014-08-03T16:54:53.006-0700 Progress: 42117319/113990182 36% (bytes)
2014-08-03T16:54:56.003-0700 Progress: 50433919/113990182 44% (bytes)
2014-08-03T16:54:59.004-0700 Progress: 58950919/113990182 51% (bytes)
2014-08-03T16:55:02.007-0700 Progress: 67434519/113990182 59% (bytes)
2014-08-03T16:55:05.004-0700 Progress: 75884719/113990182 66% (bytes)
2014-08-03T16:55:08.006-0700 Progress: 84435119/113990182 74% (bytes)
2014-08-03T16:55:11.005-0700 Progress: 93018919/113990182 81% (bytes)
2014-08-03T16:55:14.007-0700 Progress: 101263555/113990182 88% (bytes)
2014-08-03T16:55:17.003-0700 Progress: 106713155/113990182 93% (bytes)
2014-08-03T16:55:20.007-0700 Progress: 112097255/113990182 98% (bytes)
701202 objects found
2014-08-03T16:55:20.835-0700 Applied 701200 oplog entries out of 701201 (1 skipped).

Verify the count of the records (for the homework you should have 614800)

Now, shutdown both the production and recovery instances of mongod, and copy over the database files from the recovery instance to the production instance.

cp 30002/backuptest/backupDB.* 30001/backuptest(where backupDB.* is the name of the database we are restoring/replaying

There is another way of getting this done using mongooplog which will replay the transactions from your recovery server to target serer, see this link for the documentation http://docs.mongodb.org/manual/reference/program/mongooplog/. I will update this post once I have test it.

Start up mongod once again in production and verify

You should be good to go

Homework completed...fuck yeah!

Homework completed…fuck yeah!

The answer was given after the deadline by the instructor which was a very interesting and cooler way to handle this.

1. Shutdown the production server mongod
2. ./mongorestore –db backupDB –collection backupColl ./backupDB/backupColl.bson –dbpath ./backuptest
The reasoning behind using dbpath while the server is shutdown is to avoid thousands more operations in the oplog
3. Startup the production server
4. > use local
5. Find the offending operation, in our case a drop command: db.oplog.rs.find({“op” : “c” , “ns” : “backupDB.$cmd”, “o” : {“drop”: “backupColl”}})
A more general form of this would be
db.oplog.rs.find({"op" : "c", "ns" : "backupDB.$cmd"})
6. Modify the record in the oplog to a non op.
db.oplog.rs.update({“op”:”c”, “ns”: “backupDB.$cmd”, “o”: {“drop”: “backupColl”}}, {$set : {“op” : “n”}, $unset : {“o”: 1}})
The reason for using $unset is so that the record does not get resized
7. Make a directory for the oplog mkdir oplog-replay
8. mongodump –port 30001 -d local -c oplog.rs -o oplog-dump
Then move the oplog.rs.bson to the oplog-replay folder
9. mongorestore –port 30001 -oplogReplay ./oplog-replay which replays everything in the oplog
10. Win

3 comments

  1. Thanks to your help I was able to complete this problem. This is my second time thru M202, I dropped the previous class due to frustration with homework problems like this one. I followed your instructions explicitly and had to make these minor changes:

    cp 30002/backuptest/backupDB.* 30001/backuptest
    to
    cp anemptydir/backupDB.* backuptest

    and

    mongorestore –host 127.0.0.1 –port 30002 –oplogReplay –oplogLimit 1398778745:1 ../30001/oplogR/
    to
    mongorestore –host 127.0.0.1 –port 30002 –oplogReplay –oplogLimit 1398778745:1 oplogR

    Here are the contents of my working dir. No ../30001 or ../30002 dirs either.

    453 Jan 26 16:50 mongod.df30592c9ae3.conf
    55050240 Jan 26 16:50 backupDB.tar.e06c05d8f851
    570439680 Jan 26 16:50 backuptest.tar.2078fd1f5608
    170 Jan 27 07:52 backupDB
    102 Jan 27 07:57 oplogD
    374 Jan 27 08:29 anemptydir
    102 Jan 27 08:33 oplogR
    170676451 Jan 27 08:33 oplog.readable
    510 Jan 27 08:42 backuptest

    Everything else was spot on!!

    PasDep

  2. Very interesting and useful article. My question is how do you arrive at the last timestamp from the oplog.bson that was generated by mongodump using the –oplog option. The glitch is I want to do an incremental restore from the last timestamp under the generated oplog.bson. Once I get that then a mongorestore –oplogReplay will then be possible.

  3. Big help! Thank you so much for the post. It was much more clear than the instructions posted in the class.

    Cheers!
    Chris

Leave a Reply

Your email address will not be published. Required fields are marked *