Recover LVM Volume Groups and Logical Volumes WITHOUT Backups
I recently had a misfortune, in that somehow my volume group meta-data got corrupted, and LVM would not enable the volume group. This happened after I resized a volume, and had done a file system check before and after. So, I knew my data was still there.
I did an lvextend on my primary logical volume. Normally this is a routine task, but for some reason, things went very badly for me this time around. I did an "fsck -f" before and after extending the volume and the filesystem (with resize2fs). Everything checked out just fine, so I thought everything was done, and ready to reboot.
I proceeded to issue the three finger salute, and my system began the reboot process. Upon trying to boot up, I got errors about my root logical volume not being found. So, I booted up with a gentoo live cd again, and got the following errors...
root@Microknoppix:~# pvscan
  /dev/hda: open failed: Read-only file system
  Attempt to close device '/dev/hda' which is not open.
  Incorrect metadata area header checksum
  Incorrect metadata area header checksum
  Incorrect metadata area header checksum
  WARNING: Volume Group s is not consistent
  PV /dev/sdb5 VG bak lvm2 [32.00 GB / 0 free]
  PV /dev/sdb6 VG bak lvm2 [266.09 GB / 0 free]
  PV /dev/sda4 VG s lvm2 [207.58 GB / 88.70 GB free]
  PV /dev/sdf2 lvm2 [59.88 GB]
  PV /dev/sdf4 lvm2 [88.89 GB]
  Total: 5 [654.42 GB] / in use: 3 [505.66 GB] / in no VG: 2 [148.76 GB]
root@Microknoppix:~# vgchange -ay
  Incorrect metadata area header checksum
  Incorrect metadata area header checksum
  1 logical volume(s) in volume group "bak" now active
  Incorrect metadata area header checksum
  Incorrect metadata area header checksum
  Volume group "s" inconsistent
  Incorrect metadata area header checksum
  Incorrect metadata area header checksum
  WARNING: Inconsistent metadata found for VG s - updating to use version 19
  Incorrect metadata area header checksum
  Automatic metadata correction failed
So, what to do? I didn't have a system level backup (just data), because I hadn't gotten around to it yet, after converting from Mac OS X. Well, with Gentoo Linux, this is a bit devastating, because you have to recompile entirely from source. That can take days, and sometimes it takes months to figure out all the settings you had, if you don't have a backup. To boot, I didn't even have a backup of '/etc/', where the LVM backups are stored, DOH!!! As a result, all the LVM backups were unavailable. This is all really sad, seeing as I'm an automatic backup freak, and can't stand it when I don't have system backups.
Now, one could go searching for the beginning of their logical volume, if they wanted to, and recover just that. But, that could be very painful, especially if your LV is not contiguous; which can easily happen if you have multiple drives in your volume group, and you have been resizing your volumes a few times.
Well, as it goes, my brother suggested I boot up with a knoppix CD, to see if I could figure out how to fix the problem. I began running a few different commands. I started with pvs, which displays physical volume information.
root@Microknoppix:~# pvs -v
  Scanning for physical volume names
  Incorrect metadata area header checksum
  Incorrect metadata area header checksum
  WARNING: Volume Group s is not consistent
  Incorrect metadata area header checksum
  Incorrect metadata area header checksum
  PV         VG   Fmt  Attr PSize   PFree  DevSize PV UUID
  /dev/sda4  s    lvm2 a-   207.58G 88.70G 207.58G 8cYXSr-l35B-2HBg-V7YS-TWsb-rZ8L-C5EC7J
  /dev/sdb5  bak  lvm2 a-    32.00G     0   32.00G Vx3gVW-YNoq-xLHt-rOaJ-2HHW-qBcj-nJSfmx
  /dev/sdb6  bak  lvm2 a-   266.09G     0  266.09G o7Mi6k-lEsH-ndqb-QMxe-3t3Z-jTWS-qw9KKv
Next I dumped the output of "vgdisplay -v", and then I ran into pvck by accident. pvck checks the metadata on a physical volume, and with "-d -v" it happens to display the offsets to all of your LVM metadata backups (GREAT). I've heard that these are stored in a cycling manner, so the order in which they appear may not tell you which one is newest.
root@Microknoppix:/mnt/safe/trenta# pvck -d -v /dev/sda4
  Scanning /dev/sda4
  Incorrect metadata area header checksum
  Found label on /dev/sda4, sector 1, type=LVM2 001
  Found text metadata area: offset=4096, size=192512
  Found LVM2 metadata record at offset=26624, size=2048, offset2=0 size2=0
  Found LVM2 metadata record at offset=24576, size=2048, offset2=0 size2=0
  Found LVM2 metadata record at offset=22528, size=2048, offset2=0 size2=0
  Found LVM2 metadata record at offset=20480, size=2048, offset2=0 size2=0
  Found LVM2 metadata record at offset=17920, size=2560, offset2=0 size2=0
  Found LVM2 metadata record at offset=15360, size=2560, offset2=0 size2=0
  Found LVM2 metadata record at offset=12800, size=2560, offset2=0 size2=0
  Found LVM2 metadata record at offset=10752, size=2048, offset2=0 size2=0
  Found LVM2 metadata record at offset=8704, size=2048, offset2=0 size2=0
  Found LVM2 metadata record at offset=6656, size=2048, offset2=0 size2=0
  Found LVM2 metadata record at offset=4608, size=2048, offset2=0 size2=0
  Found LVM2 metadata record at offset=30720, size=165888, offset2=0 size2=0
  Found text metadata area: offset=96646856704, size=183296
  Incorrect metadata area header checksum
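The offsets pvck prints are decimal byte offsets from the start of the physical volume, while most hex editors display positions in hexadecimal. Here is a minimal sketch that tabulates the records from saved pvck output with both forms. The function name list_records is my own for illustration, and it assumes the "Found LVM2 metadata record" line format shown above, which may differ in other LVM2 versions.

```shell
#!/bin/sh
# list_records: read pvck output on stdin and print one line per metadata
# record as "decimal-offset hex-offset size". Hypothetical helper; it assumes
# the "Found LVM2 metadata record at offset=N, size=M," wording shown above.
list_records() {
    sed -n 's/.*Found LVM2 metadata record at offset=\([0-9]*\), size=\([0-9]*\),.*/\1 \2/p' |
    while read -r offset size; do
        printf '%d 0x%X %d\n' "$offset" "$offset" "$size"
    done
}
```

Something like "pvck -d -v /dev/sda4 2>&1 | list_records" would then print, for the last record above, the line "30720 0x7800 165888".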
At this point, I'm becoming a little more cheery; I finally have a chance at recovery, with no recompiling of my entire system. So, I check knoppix to find out if it has a hex editor, and sure enough, there is an editor called hexedit. So, I ran "hexedit /dev/sda4". I converted the offsets from the pvck output to hex, and went to those offsets by paging down in hexedit. The last record is at 30720, which equates to 0x7800. I start highlighting at that offset (Ctrl-Space), and then select the entire record up until the 0x0A0A newline bytes at the end of it. I copy (Esc-W), and then I paste to a file (Esc-Y), and call it /path/lvm-metadata-30720.txt. I do this for every record that I think might contain information I need.
In order to know whether it's relevant data or not, you have to know what the last few LVM changes you made were. Things like sizes of logical volumes, the size of the volume group, what physical volumes existed in the volume group, and things of that nature will all be helpful in recovery. For example, did you double the size of a logical volume, or reduce it, or whatever? Then, it's just a matter of diffing each version, to see when the important changes were made. My diff shows...
root@Microknoppix:/mnt/safe/trenta# diff -u lvm-metadata-26624.txt lvm-metadata-28672.txt
--- lvm-metadata-26624.txt 2009-05-25 23:45:11.000000000 +0000
+++ lvm-metadata-28672.txt 2009-05-25 23:41:22.000000000 +0000
@@ -1,6 +1,6 @@
s {
id = "HNHKOr-RpyA-uMdz-tqhN-Z673-L2ej-qXikhF"
-seqno = 18
+seqno = 19
status = ["RESIZEABLE", "READ", "WRITE"]
extent_size = 8192
max_lv = 0
@@ -64,7 +64,7 @@
segment1 {
start_extent = 0
-extent_count = 7680
+extent_count = 15360
type = "striped"
stripe_count = 1 # linear
@@ -94,7 +94,7 @@
}
}
}
-# Generated by LVM2 version 2.02.36 (2008-04-29): Mon May 25 02:42:45 2009
+# Generated by LVM2 version 2.02.36 (2008-04-29): Mon May 25 02:44:25 2009
contents = "Text Format Volume Group"
version = 1
@@ -102,5 +102,5 @@
description = ""
creation_host = "tdamac" # Linux tdamac 2.6.30-rc3-dirty #23 SMP Mon May 25 01:39:04 MDT 2009 x86_64
-creation_time = 1243240965 # Mon May 25 02:42:45 2009
+creation_time = 1243241065 # Mon May 25 02:44:25 2009
If you look carefully in my diff, you will see the seqno changed from 18 to 19, and the size of the "segment" doubled; that is what I had just done as part of my logical volume resize. The "vgchange -ay" command previously tried to restore seqno=19, and failed. As it turned out, v19 of the meta-data backups is the version I need. So, now it's just a matter of taking all of the data we have dumped, and recreating the LVM information. It may be worth doing a RAW dd dump of your partition before messing with it, but I didn't bother. I was fairly certain that my data was still there, and I knew that restoring a configuration would not wipe out the data, just the LVM meta-data.
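If you don't have a hex editor handy, the same records can be copied out with dd, using the offset and size that pvck reports. This is a rough sketch rather than what I actually ran; the function names extract_record and show_seqno are made up for illustration. It only reads from the device. Note that a record dumped this way may carry trailing junk after the closing text, so trim it in a text editor before using it.

```shell
#!/bin/sh
# extract_record SOURCE OFFSET SIZE OUTFILE
# Copy SIZE bytes starting at byte OFFSET of SOURCE (a device or an image
# file) into OUTFILE. SOURCE is only read, never written.
extract_record() {
    dd if="$1" of="$4" bs=1 skip="$2" count="$3" 2>/dev/null
}

# show_seqno FILE...
# Print "seqno<TAB>file" for each dumped record, highest seqno first, so the
# most recent metadata version is easy to spot. NUL bytes are stripped first
# because dumped records can contain them.
show_seqno() {
    for f in "$@"; do
        n=$(tr -d '\000' < "$f" |
            sed -n 's/^[[:space:]]*seqno = \([0-9]*\).*/\1/p' | head -n 1)
        printf '%s\t%s\n' "${n:-?}" "$f"
    done | sort -rn
}
```

For example, "extract_record /dev/sda4 26624 2048 lvm-metadata-26624.txt" followed by "show_seqno lvm-metadata-*.txt" would list each dump next to its seqno. Using bs=1 is slow for large sizes, but it keeps the offset arithmetic trivial.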
So, finally I completely recreated everything by doing the following. The UUID for the physical volume comes from the meta data file. It's important that you choose the correct ID, or it won't match the LVM data for the vgcfgrestore.
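To double-check which UUID to pass to pvcreate, you can pull the physical volume entries out of the dumped metadata file instead of reading it by eye. A minimal sketch follows, assuming the usual LVM2 text layout where each pvN block inside physical_volumes lists id before device. The function name list_pv_ids is hypothetical.

```shell
#!/bin/sh
# list_pv_ids FILE
# Print "pvN UUID device-hint" for every entry in the physical_volumes
# section of an LVM2 text metadata file. The volume group's own id line is
# skipped because it sits outside that section.
list_pv_ids() {
    awk '
        /^[[:space:]]*physical_volumes[[:space:]]*[{]/ { in_pvs = 1; next }
        /^[[:space:]]*logical_volumes[[:space:]]*[{]/  { in_pvs = 0 }
        in_pvs && /^[[:space:]]*pv[0-9]+[[:space:]]*[{]/ { name = $1 }
        in_pvs && $1 == "id"     { gsub(/"/, "", $3); id = $3 }
        in_pvs && $1 == "device" { gsub(/"/, "", $3); print name, id, $3 }
    ' "$1"
}
```

The device printed is only the hint stored in the metadata, so compare the UUID against what "pvs -v" reports for the partition you intend to restore.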
root@Microknoppix:~# pvcreate -ff -u 8cYXSr-l35B-2HBg-V7YS-TWsb-rZ8L-C5EC7J \
    --restorefile lvm-metadata-28672.txt /dev/sda4
root@Microknoppix:~# vgcfgrestore -f /mnt/safe/trenta/lvm-metadata-28672.txt -v s
I am pretty certain that only the second command was needed, because the physical volume meta-data was fine; it was the volume group that was messed up. But, I was hacking my way through, so I issued both commands.
WARNING, WARNING, WARNING, if you do this wrong, and use the wrong meta-data information, you may overwrite some file data somewhere else on the disk. This can happen when the extents and sizes are off, so the vgcfgrestore command restores the meta-data to the wrong parts of the disk. Be sure you are able to pick the correct backup data to restore, or you very well may lose data.
Now, after all of this, I think it may have been as easy as doing the following, but I am unsure. I think what would happen with the following command is that it would ignore the locking failure, and restore the configuration regardless.
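One way to sanity-check a candidate metadata file before restoring it is to turn its extent counts into sizes you recognise. In the LVM2 text format, extent_size is given in 512-byte sectors, so 8192 means 4 MiB per extent. The helper name extents_to_gib below is just for illustration, and the result is truncated to whole GiB.

```shell
#!/bin/sh
# extents_to_gib EXTENT_COUNT EXTENT_SIZE_IN_SECTORS
# Convert an extent count to GiB (integer division, so fractions are dropped).
# extent_size in LVM2 text metadata is counted in 512-byte sectors.
extents_to_gib() {
    echo $(( $1 * $2 * 512 / 1024 / 1024 / 1024 ))
}

extents_to_gib 7680 8192     # segment at seqno 18 -> 30
extents_to_gib 15360 8192    # segment at seqno 19 -> 60
```

In my case that matches what I remember doing: the segment went from 30 GiB to 60 GiB. If the numbers in a dump don't match your memory of the volume sizes, it is probably the wrong version to restore.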
vgchange -ay --ignorelockingfailure
Either way, this information is useful in the event that you lose meta-data due to bad blocks or whatever.
June 6th, 2009 - 21:10
This happened to me just now…thanks for the blog post – it was very helpful.
I found I *did* have to re-create the pv with pvcreate -ff – in fact the pv’s vg metadata was the problem, it seems.
your “vgchange” command suggestion at the very end was tried but didn’t help.
Thanks again for a very timely post.
June 6th, 2009 - 21:30
> your “vgchange” command suggestion at the very end
> was tried but didn’t help.
Oh, thanks for that, I wasn’t sure if that vgchange thing would work or not. I just happened to notice that in the man page after the fact.
And you’re very welcome. After 2-4 hours of being very upset, I figured I should try and spare others the “pain”, LOL.
Oh, and I noticed my pvcreate and vgcfgrestore commands were referring to different files, hehe. That would be bad if someone tried that. I will correct it.
August 4th, 2009 - 14:25
In my case the offset of the last record does not really point to metadata, and the offset of the previous record points to a nonexistent place on the disk. When I try with sdb, the size of the last record does not match. I just lost more than 600GB because something in hackintosh beta messed up my arrays. I feel like I will kill somebody for that. It’s so frustrating that words can’t describe this experience.
August 4th, 2009 - 14:31
(if you ever try hackintosh, PHYSICALLY UNPLUG ALL YOUR HARD DRIVES WITH DATA, or it can silently cross you and ruin them right after it boots up to the installer)
August 4th, 2009 - 22:13
600G, ouch, that is very painful
Don’t give up though, open the disk with a hexeditor, and see if you can find the configurations near the beginning of the disk.
October 11th, 2009 - 06:10
You, sir, are my hero. Thanks to your great blog I’ve managed to get back 3.6TB of precious data. Thank you. It worked like a charm.
October 18th, 2009 - 11:48
Thanks, you saved me too
March 23rd, 2010 - 01:31
Hi, I need some help to restore data from hardware based clones/snapshots. I am using the following command in RHEL to change the UUID of the clone: pvchange --uuid /dev/mapper/mpath2 --config 'global{activation=0}'. I want to know if there is something similar in SLES 9 onwards for deactivating device-mapper interaction and changing the LVM metadata? Thanks, Salah.
March 23rd, 2010 - 01:39
Salah,
I have no idea actually. But, you could get out a disk hex editor, and do it that way.
May 19th, 2010 - 11:58
Great article, Adams… helped me a lot in recovering a screwed-up PV here, apparently by the same cause (resizing the PV with pvresize and/or resizing the VG with “vgchange -s”). I didn’t have hexedit, so I dumped the whole PV metadata using dd if=PV of=lvm.cfg bs=1 skip=N count=M (where PV is the physical volume device, and N and M are respectively the offset and the size shown by pvck) and then edited the resulting lvm.cfg to select the valid metadata to restore. Also, I had to use BOTH commands in the end (pvcreate AND vgcfgrestore), as vgcfgrestore alone complained about a metadata checksum error and refused to run. Hope this additional info is useful for someone else.