Recover LVM Volume Groups and Logical Volumes WITHOUT Backups

I recently had the misfortune of having my volume group metadata corrupted, and LVM would not activate the volume group. Essentially, I lost my LVM volume. This happened right after I resized a volume, and I had run a file system check before and after the resize, so I knew my data was still there.

I did an lvextend on my primary logical volume. Normally this is a routine task, but for some reason, things went very badly for me this time around. I did an “fsck -f” before and after extending the volume and the filesystem (with resize2fs). Everything checked out just fine, so I thought everything was done, and ready to reboot.

I proceeded to issue the three finger salute, and my system began the reboot process. On the way back up, I got errors about my root logical volume not being found. So, I booted up with a Gentoo live CD again, and got the following errors…

root@Microknoppix:~# pvscan
/dev/hda: open failed: Read-only file system
Attempt to close device '/dev/hda' which is not open.
Incorrect metadata area header checksum
Incorrect metadata area header checksum
Incorrect metadata area header checksum
WARNING: Volume Group s is not consistent
PV /dev/sdb5   VG bak             lvm2 [32.00 GB / 0    free]
PV /dev/sdb6   VG bak             lvm2 [266.09 GB / 0    free]
PV /dev/sda4   VG s               lvm2 [207.58 GB / 88.70 GB free]
PV /dev/sdf2                      lvm2 [59.88 GB]
PV /dev/sdf4                      lvm2 [88.89 GB]
Total: 5 [654.42 GB] / in use: 3 [505.66 GB] / in no VG: 2 [148.76 GB]

root@Microknoppix:~# vgchange -ay
Incorrect metadata area header checksum
Incorrect metadata area header checksum
1 logical volume(s) in volume group "bak" now active
Incorrect metadata area header checksum
Incorrect metadata area header checksum
Volume group "s" inconsistent
Incorrect metadata area header checksum
Incorrect metadata area header checksum
WARNING: Inconsistent metadata found for VG s - updating to use version 19
Incorrect metadata area header checksum
Automatic metadata correction failed

So, what to do? I didn’t have a system level backup (just data), because I hadn’t gotten around to it yet after converting from Mac OS X. With Gentoo Linux, this is a bit devastating, because you have to recompile everything from source. That can take days, and sometimes it takes months to figure out all the settings you had, if you don’t have a backup. To boot, I didn’t even have a backup of ‘/etc/’, where the LVM backups are stored, DOH!!! As a result, all the LVM backups were unavailable. This is all really sad, seeing as I’m an automatic backup freak, and can’t stand it when I don’t have system backups.
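For anyone wanting to avoid the same hole: by default LVM keeps its text backups under /etc/lvm (in backup/ and archive/), so a copy of that directory stored somewhere off the volume group is enough to have a file for vgcfgrestore later. A minimal sketch, assuming a tar with gzip support; save_lvm_config is a hypothetical helper name, and the destination path in the usage line is only an example.

```shell
#!/bin/sh
# Hypothetical helper, not part of LVM.
# save_lvm_config DEST [SRC]
# Pack the LVM config directory (default: /etc/lvm) into a gzipped tarball.
save_lvm_config() {
    dest=$1
    src=${2:-/etc/lvm}
    # -C keeps the archive paths relative (lvm/backup/..., lvm/archive/...).
    tar -C "$(dirname "$src")" -czf "$dest" "$(basename "$src")"
}
```

Usage would look something like "save_lvm_config /mnt/usb/lvm-config.tgz", with the tarball kept on a disk that is not part of the volume group.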

Now, one could go searching for the beginning of their logical volume, if they wanted to, and recover just that. But, that could be very painful, especially if your LV is not contiguous; which can easily happen if you have multiple drives in your volume group, and you have been resizing your volumes a few times.

Well, as it goes, my brother suggested I boot up with a Knoppix CD, to see if I could figure out how to fix the problem. I began running a few different commands. I started with pvs, which displays physical volume information.

root@Microknoppix:~# pvs -v

Scanning for physical volume names
Incorrect metadata area header checksum
Incorrect metadata area header checksum
WARNING: Volume Group s is not consistent
Incorrect metadata area header checksum
Incorrect metadata area header checksum
PV         VG   Fmt  Attr PSize   PFree  DevSize PV UUID
/dev/sda4  s    lvm2 a-   207.58G 88.70G 207.58G 8cYXSr-l35B-2HBg-V7YS-TWsb-rZ8L-C5EC7J
/dev/sdb5  bak  lvm2 a-    32.00G     0   32.00G Vx3gVW-YNoq-xLHt-rOaJ-2HHW-qBcj-nJSfmx
/dev/sdb6  bak  lvm2 a-   266.09G     0  266.09G o7Mi6k-lEsH-ndqb-QMxe-3t3Z-jTWS-qw9KKv

Next I dumped the output of “vgdisplay -v”, and then I ran into pvck by accident. It checks the metadata on a physical volume and, run as shown here, happens to display the offsets of all the older copies of the LVM metadata still sitting in the metadata area (GREAT). I’ve heard that these are stored in a cycling manner, so it may not be worth paying attention to the order in which they appear.

root@Microknoppix:/mnt/safe/trenta# pvck -d -v /dev/sda4
Scanning /dev/sda4
Incorrect metadata area header checksum
Found label on /dev/sda4, sector 1, type=LVM2 001
Found text metadata area: offset=4096, size=192512
Found LVM2 metadata record at offset=26624, size=2048, offset2=0 size2=0
Found LVM2 metadata record at offset=24576, size=2048, offset2=0 size2=0
Found LVM2 metadata record at offset=22528, size=2048, offset2=0 size2=0
Found LVM2 metadata record at offset=20480, size=2048, offset2=0 size2=0
Found LVM2 metadata record at offset=17920, size=2560, offset2=0 size2=0
Found LVM2 metadata record at offset=15360, size=2560, offset2=0 size2=0
Found LVM2 metadata record at offset=12800, size=2560, offset2=0 size2=0
Found LVM2 metadata record at offset=10752, size=2048, offset2=0 size2=0
Found LVM2 metadata record at offset=8704, size=2048, offset2=0 size2=0
Found LVM2 metadata record at offset=6656, size=2048, offset2=0 size2=0
Found LVM2 metadata record at offset=4608, size=2048, offset2=0 size2=0
Found LVM2 metadata record at offset=30720, size=165888, offset2=0 size2=0
Found text metadata area: offset=96646856704, size=183296
Incorrect metadata area header checksum
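The offsets and sizes in that output are all that is needed to copy each record into its own file. A minimal sketch using sed and dd, assuming pvck output shaped like the listing above; list_records, extract_record and dump_all are hypothetical helper names, not LVM commands. The sizes pvck reports may include padding or leftovers from older records, so each dumped file should still be read by eye and trimmed where the text ends.

```shell
#!/bin/sh
# Hypothetical helpers. They only read from the device, never write to it.

# Read pvck output on stdin; print "decimal-offset hex-offset size" for
# every "Found LVM2 metadata record" line.
list_records() {
    sed -n 's/^.*LVM2 metadata record at offset=\([0-9][0-9]*\), size=\([0-9][0-9]*\).*$/\1 \2/p' |
    while read -r offset size; do
        printf '%s 0x%X %s\n' "$offset" "$offset" "$size"
    done
}

# extract_record SOURCE OFFSET SIZE OUTFILE
# Copy SIZE bytes starting at byte OFFSET of SOURCE, dropping NUL padding.
extract_record() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | tr -d '\000' > "$4"
}

# dump_all SOURCE OUTDIR   (pvck output on stdin)
# Write one OUTDIR/lvm-metadata-OFFSET.txt per record found.
dump_all() {
    list_records | while read -r offset hex size; do
        # "hex" is only read to consume the second column.
        extract_record "$1" "$offset" "$size" "$2/lvm-metadata-$offset.txt"
    done
}
```

Usage would be along the lines of "pvck -d -v /dev/sda4 2>&1 | dump_all /dev/sda4 /mnt/safe/trenta". The hex column from list_records is handy for jumping to the same place in a hex editor; 30720, for instance, comes out as 0x7800.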

At this point, I’m becoming a little more cheery; I have a chance at recovery, FINALLY; no recompiling my entire system. So, I check Knoppix to find out if it has a hex editor, and sure enough, there is an editor called hexedit. So, I ran “hexedit /dev/sda4”. I converted the offsets from the pvck output to hex, and went to those offsets by paging down in hexedit. The last record is at 30720, which equates to 0x7800. I start highlighting at that offset (Ctrl-Space), and then select the entire record up until the 0x0A0A newline bytes at the end of it. I copy (Esc-W), and then I paste to a file (Esc-Y), and call it /path/lvm-metadata-30720.txt. I do this for every record that I think might contain information I need.

In order to know whether a record is relevant or not, you have to know what the last few LVM changes you made were. Things like the sizes of logical volumes, the size of the volume group, which physical volumes existed in the volume group, and things of that nature will all be helpful in recovery. For example, did you double the size of a logical volume, or reduce it, or whatever? Then, it’s just a matter of diffing each version against the next, to see when the important changes were made. My diff shows…

root@Microknoppix:/mnt/safe/trenta# diff -u lvm-metadata-26624.txt lvm-metadata-28672.txt
--- lvm-metadata-26624.txt      2009-05-25 23:45:11.000000000 +0000
+++ lvm-metadata-28672.txt      2009-05-25 23:41:22.000000000 +0000
@@ -1,6 +1,6 @@
s {
id = "HNHKOr-RpyA-uMdz-tqhN-Z673-L2ej-qXikhF"
-seqno = 18
+seqno = 19
status = ["RESIZEABLE", "READ", "WRITE"]
extent_size = 8192
max_lv = 0
@@ -64,7 +64,7 @@

segment1 {
start_extent = 0
-extent_count = 7680
+extent_count = 15360

type = "striped"
stripe_count = 1       # linear
@@ -94,7 +94,7 @@
}
}
}
-# Generated by LVM2 version 2.02.36 (2008-04-29): Mon May 25 02:42:45 2009
+# Generated by LVM2 version 2.02.36 (2008-04-29): Mon May 25 02:44:25 2009

contents = "Text Format Volume Group"
version = 1
@@ -102,5 +102,5 @@
description = ""

creation_host = "tdamac"       # Linux tdamac 2.6.30-rc3-dirty #23 SMP Mon May 25 01:39:04 MDT 2009 x86_64
-creation_time = 1243240965     # Mon May 25 02:42:45 2009
+creation_time = 1243241065     # Mon May 25 02:44:25 2009

If you look carefully at my diff, you will see the seqno changed from 18 to 19, and the extent count of the segment doubled; that is exactly what I had just done as part of my logical volume resize. Earlier, “vgchange -ay” had tried to update the metadata to version 19 and failed. So it appears that version 19 of the metadata is the one I need. Now it’s just a matter of taking all of the data we have dumped, and recreating the LVM information. It may be worth doing a raw dd dump of your partition before messing with it, but I didn’t bother. I was fairly certain that my data was still there, and I knew that restoring a configuration would not wipe out the data, just the LVM metadata.
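With more than two or three dumps, diffing pairs by hand gets tedious. The sketch below prints the seqno and the "Generated by" timestamp of each dump so the newest version stands out. It assumes the files look like the excerpts in the diff above (a "seqno = N" line and a "# Generated by LVM2 version ..." comment); summarize_dumps is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: one summary line per metadata dump file.
# summarize_dumps FILE...
summarize_dumps() {
    for f in "$@"; do
        # First "seqno = N" line in the file.
        seqno=$(sed -n 's/^[[:space:]]*seqno = \([0-9][0-9]*\).*$/\1/p' "$f" | head -n 1)
        # Text after "# Generated by LVM2 version X (date): ".
        stamp=$(sed -n 's/^# Generated by LVM2 version [^:]*: \(.*\)$/\1/p' "$f" | head -n 1)
        printf '%s\tseqno=%s\t%s\n' "$f" "${seqno:-?}" "${stamp:-unknown}"
    done
}
```

Something like "summarize_dumps lvm-metadata-*.txt | sort -t= -k2 -n" would then list the dumps in seqno order, and only the interesting neighbours need a full diff.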

So, finally I completely recreated everything by doing the following. The UUID for the physical volume comes from the metadata file. It’s important that you choose the correct UUID, or it won’t match the physical volume listed in the metadata that vgcfgrestore restores.

root@Microknoppix:~# pvcreate -ff -u 8cYXSr-l35B-2HBg-V7YS-TWsb-rZ8L-C5EC7J \
  --restorefile lvm-metadata-28672.txt /dev/sda4
root@Microknoppix:~# vgcfgrestore -f /mnt/safe/trenta/lvm-metadata-28672.txt -v s

I am pretty certain that only the second command was needed, because the physical volume metadata was fine; it was the volume group that was messed up. But I was hacking my way through, so I issued both commands.
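A more cautious version of that sequence, as a sketch only: rehearse the restore with vgcfgrestore's --test option, do it for real, then activate the volume group, list its logical volumes, and check the file system without letting fsck change anything. The run and restore_plan helpers are hypothetical, and so is the logical volume name in the usage line. By default the script only prints the commands; nothing is executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical wrapper around real LVM commands. Prints by default.
DRY_RUN=${DRY_RUN:-1}

# Print the command when DRY_RUN=1, run it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# restore_plan VG METADATA_FILE LV
restore_plan() {
    vg=$1; file=$2; lv=$3
    run vgcfgrestore --test -f "$file" "$vg" &&  # rehearsal, no metadata written
    run vgcfgrestore -f "$file" "$vg" &&         # the actual restore
    run vgchange -ay "$vg" &&                    # activate the volume group
    run lvs "$vg" &&                             # do the LVs and sizes look right?
    run fsck -n "/dev/$vg/$lv"                   # read-only file system check
}
```

For the setup in this post the call would be something like "restore_plan s /mnt/safe/trenta/lvm-metadata-28672.txt root", where "root" stands in for whatever the logical volume is actually called.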

WARNING, WARNING, WARNING: if you do this wrong, and use the wrong metadata, you may overwrite file data somewhere else on the disk. This can happen when the extents and sizes are off, so that after the vgcfgrestore the logical volumes map to the wrong parts of the disk. Be sure you are able to pick the correct backup data to restore, or you very well may lose data.
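Given that risk, a raw copy of the partition taken before any restore gives you a way back. A minimal sketch with dd and cmp; backup_device is a hypothetical helper name, the destination has to live on a different disk with enough free space, and verifying the copy takes about as long again as making it.

```shell
#!/bin/sh
# Hypothetical helper: raw copy of a block device (or file), then verify.
# backup_device SOURCE DEST
backup_device() {
    src=$1; dest=$2
    # Copy everything in 1 MiB blocks, then compare byte for byte.
    dd if="$src" of="$dest" bs=1024k 2>/dev/null && cmp "$src" "$dest"
}
```

In my case that would have been something like "backup_device /dev/sda4 /mnt/safe/trenta/sda4.img" (the image name is made up). If a restore goes wrong, the same dd with if= and of= swapped puts the partition back the way it was.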

Now, after all of this, I think it may have been as easy as doing the following, but I am unsure. I think the following command would ignore the locking failure and restore the configuration regardless.

vgchange -ay --ignorelockingfailure

Either way, this information is useful in the event that you lose metadata due to bad blocks or whatever.

 
