Adams Bros Blog

30 May 2009

Recover LVM Volume Groups and Logical Volumes WITHOUT Backups

I recently had a misfortune, in that somehow my volume group meta-data got corrupted, and LVM would not enable the volume group. Essentially, I lost my LVM volume disk. This happened after I resized a volume, and had done a file system check before and after. So, I knew my data was still there.

I did an lvextend on my primary logical volume. Normally this is a routine task, but for some reason, things went very badly for me this time around. I did an "fsck -f" before and after extending the volume and the filesystem (with resize2fs). Everything checked out just fine, so I thought everything was done, and ready to reboot.

I proceeded to issue the three-finger salute, and my system began the reboot process. Upon trying to boot up, I got errors about my root logical volume not being found. So, I booted up with a Gentoo live CD again, and got the following errors...

root@Microknoppix:~# pvscan
/dev/hda: open failed: Read-only file system
Attempt to close device '/dev/hda' which is not open.
Incorrect metadata area header checksum
Incorrect metadata area header checksum
Incorrect metadata area header checksum
WARNING: Volume Group s is not consistent
PV /dev/sdb5   VG bak             lvm2 [32.00 GB / 0    free]
PV /dev/sdb6   VG bak             lvm2 [266.09 GB / 0    free]
PV /dev/sda4   VG s               lvm2 [207.58 GB / 88.70 GB free]
PV /dev/sdf2                      lvm2 [59.88 GB]
PV /dev/sdf4                      lvm2 [88.89 GB]
Total: 5 [654.42 GB] / in use: 3 [505.66 GB] / in no VG: 2 [148.76 GB]

root@Microknoppix:~# vgchange -ay
Incorrect metadata area header checksum
Incorrect metadata area header checksum
1 logical volume(s) in volume group "bak" now active
Incorrect metadata area header checksum
Incorrect metadata area header checksum
Volume group "s" inconsistent
Incorrect metadata area header checksum
Incorrect metadata area header checksum
WARNING: Inconsistent metadata found for VG s - updating to use version
19
Incorrect metadata area header checksum
Automatic metadata correction failed

So, what to do? I didn't have a system-level backup (just data), because I hadn't gotten around to it yet after converting from Mac OS X. Well, with Gentoo Linux, this is a bit devastating, because you have to recompile entirely from source. That can take days, and sometimes it takes months to figure out all the settings you had, if you don't have a backup. To boot, I didn't even have a backup of '/etc/', where the LVM backups are stored, DOH!!! As a result, all the LVM backups were unavailable. This is all really sad, seeing as I'm an automatic backup freak, and can't stand it when I don't have system backups.

Now, one could go searching for the beginning of their logical volume, if they wanted to, and recover just that. But, that could be very painful, especially if your LV is not contiguous; which can easily happen if you have multiple drives in your volume group, and you have been resizing your volumes a few times.

Well, as it goes, my brother suggested I boot up with a Knoppix CD, to see if I could figure out how to fix the problem. I began running a few different commands. I started with pvs, which displays physical volume information.

root@Microknoppix:~# pvs -v

Scanning for physical volume names
Incorrect metadata area header checksum
Incorrect metadata area header checksum
WARNING: Volume Group s is not consistent
Incorrect metadata area header checksum
Incorrect metadata area header checksum
PV         VG   Fmt  Attr PSize   PFree  DevSize PV UUID
/dev/sda4  s    lvm2 a-   207.58G 88.70G 207.58G 8cYXSr-l35B-2HBg-V7YS-TWsb-rZ8L-C5EC7J
/dev/sdb5  bak  lvm2 a-    32.00G     0   32.00G Vx3gVW-YNoq-xLHt-rOaJ-2HHW-qBcj-nJSfmx
/dev/sdb6  bak  lvm2 a-   266.09G     0  266.09G o7Mi6k-lEsH-ndqb-QMxe-3t3Z-jTWS-qw9KKv

Next I did a "vgdisplay -v" dump, and then I ran into pvck by accident. It displays your physical volume information, and it happens to display the offsets of all the LVM metadata backups stored on the disk (GREAT). I've heard that these are stored in a cycling manner, so it may not be worth paying attention to the order in which they appear.

root@Microknoppix:/mnt/safe/trenta# pvck -d -v /dev/sda4
Scanning /dev/sda4
Incorrect metadata area header checksum
Found label on /dev/sda4, sector 1, type=LVM2 001
Found text metadata area: offset=4096, size=192512
Found LVM2 metadata record at offset=26624, size=2048, offset2=0 size2=0
Found LVM2 metadata record at offset=24576, size=2048, offset2=0 size2=0
Found LVM2 metadata record at offset=22528, size=2048, offset2=0 size2=0
Found LVM2 metadata record at offset=20480, size=2048, offset2=0 size2=0
Found LVM2 metadata record at offset=17920, size=2560, offset2=0 size2=0
Found LVM2 metadata record at offset=15360, size=2560, offset2=0 size2=0
Found LVM2 metadata record at offset=12800, size=2560, offset2=0 size2=0
Found LVM2 metadata record at offset=10752, size=2048, offset2=0 size2=0
Found LVM2 metadata record at offset=8704, size=2048, offset2=0 size2=0
Found LVM2 metadata record at offset=6656, size=2048, offset2=0 size2=0
Found LVM2 metadata record at offset=4608, size=2048, offset2=0 size2=0
Found LVM2 metadata record at offset=30720, size=165888, offset2=0 size2=0
Found text metadata area: offset=96646856704, size=183296
Incorrect metadata area header checksum
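
pvck prints its offsets in decimal, while a hex editor addresses bytes in hex. Here is a minimal sketch for converting them in one pass, assuming the pvck output has been saved to a file and its lines look like the ones above. The function name offsets_to_hex is made up for illustration.

```shell
#!/bin/sh
# offsets_to_hex FILE
# For every "metadata record" line in saved pvck output, print:
#   <offset in decimal> <offset in hex> <size in decimal>
# Assumes lines shaped like:
#   Found LVM2 metadata record at offset=26624, size=2048, offset2=0 size2=0
offsets_to_hex() {
    sed -n 's/.*metadata record at offset=\([0-9][0-9]*\), size=\([0-9][0-9]*\).*/\1 \2/p' "$1" |
    while read -r offset size; do
        printf '%d 0x%X %d\n' "$offset" "$offset" "$size"
    done
}
```

For example, after "pvck -d -v /dev/sda4 > pvck.txt 2>&1", running "offsets_to_hex pvck.txt" would list the record at 30720 as 0x7800.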

At this point, I'm becoming a little more cheery; I have a chance at recovery, FINALLY; no recompiling my entire system. So, I check Knoppix to find out if it has a hex editor, and sure enough, there is one called hexedit. So, I ran "hexedit /dev/sda4". I converted the offsets from the pvck output to hex, and went to those offsets by paging down in hexedit. The last record is at 30720, which equates to 0x7800. I start highlighting at that offset (Ctrl-Space), and then select the entire record up to the 0x0A0A newline bytes at the end of it. I copy (Esc-W), and then I paste to a file (Esc-Y), and call it /path/lvm-metadata-30720.txt. I do this for every record that I think might contain information I need.

In order to know whether it's relevant data or not, you have to know what the last few LVM changes you made were. Things like the sizes of logical volumes, the size of the volume group, which physical volumes existed in the volume group, and things of that nature will all be helpful in recovery. For example, did you double the size of a logical volume, or reduce it, or whatever? Then, it's just a matter of diffing each version, to see when the important changes were made. My diff shows...

root@Microknoppix:/mnt/safe/trenta# diff -u lvm-metadata-26624.txt lvm-metadata-28672.txt
--- lvm-metadata-26624.txt      2009-05-25 23:45:11.000000000 +0000
+++ lvm-metadata-28672.txt      2009-05-25 23:41:22.000000000 +0000
@@ -1,6 +1,6 @@
s {
id = "HNHKOr-RpyA-uMdz-tqhN-Z673-L2ej-qXikhF"
-seqno = 18
+seqno = 19
status = ["RESIZEABLE", "READ", "WRITE"]
extent_size = 8192
max_lv = 0
@@ -64,7 +64,7 @@

segment1 {
start_extent = 0
-extent_count = 7680
+extent_count = 15360

type = "striped"
stripe_count = 1       # linear
@@ -94,7 +94,7 @@
}
}
}
-# Generated by LVM2 version 2.02.36 (2008-04-29): Mon May 25 02:42:45 2009
+# Generated by LVM2 version 2.02.36 (2008-04-29): Mon May 25 02:44:25 2009

contents = "Text Format Volume Group"
version = 1
@@ -102,5 +102,5 @@
description = ""

creation_host = "tdamac"       # Linux tdamac 2.6.30-rc3-dirty #23 SMP Mon May 25 01:39:04 MDT 2009 x86_64
-creation_time = 1243240965     # Mon May 25 02:42:45 2009
+creation_time = 1243241065     # Mon May 25 02:44:25 2009

If you look carefully at my diff, you will see that the seqno changed from 18 to 19, and the size of the "segment" doubled; that is what I had just done as part of my logical volume resize. The "vgchange -ay" command had previously tried to restore seqno=19, and failed. As it turned out, v19 of the meta-data backups is the version I need. So, now it's just a matter of taking all of the data we have dumped, and recreating the LVM information. It may be worth doing a raw dd dump of your partition before messing with it, but I didn't bother. I was fairly certain that my data was still there, and I knew that restoring a configuration would not wipe out the data, just the LVM meta-data.
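
The copy-out-and-compare steps can also be done without a hex editor. What follows is only a sketch, not what I actually ran: it uses dd with the offset and size that pvck reports, then prints the seqno found in each dumped file. The function names extract_record and list_seqno are made up for illustration.

```shell
#!/bin/sh
# extract_record DEVICE OFFSET SIZE OUTFILE
# Copy SIZE bytes starting at byte OFFSET of DEVICE into OUTFILE.
# DEVICE is only read, never written. bs=1 is slow but keeps the
# byte arithmetic obvious for records of a few KB.
extract_record() {
    dd if="$1" of="$4" bs=1 skip="$2" count="$3" 2>/dev/null
}

# list_seqno FILE...
# Print the first "seqno = N" found in each dumped record, so the
# newest and the previous versions are easy to spot before diffing.
list_seqno() {
    for f in "$@"; do
        # Dumps may carry trailing NUL bytes; drop them before matching.
        n=$(tr -d '\000' < "$f" |
            sed -n 's/^[[:space:]]*seqno = \([0-9][0-9]*\).*/\1/p' |
            head -n 1)
        printf '%s seqno=%s\n' "$f" "${n:-unknown}"
    done
}
```

With the numbers from the pvck output above, that would be something like "extract_record /dev/sda4 26624 2048 lvm-metadata-26624.txt", followed by "list_seqno lvm-metadata-*.txt".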

So, finally I completely recreated everything by doing the following.  The UUID for the physical volume comes from the meta data file.  It's important that you choose the correct ID, or it won't match the LVM data for the vgcfgrestore.

root@Microknoppix:~# pvcreate -ff -u 8cYXSr-l35B-2HBg-V7YS-TWsb-rZ8L-C5EC7J \
  --restorefile lvm-metadata-28672.txt /dev/sda4
root@Microknoppix:~# vgcfgrestore -f /mnt/safe/trenta/lvm-metadata-28672.txt -v s

I am pretty certain that only the second command was needed, because the physical volume meta-data was fine, it was the volume group that was messed up. But, I was hacking my way through, so I issued both commands.
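
Since the UUID passed to pvcreate has to match the one in the metadata file, it may help to pull it out of the file rather than copy it by eye. This is a rough sketch, assuming the dumped file follows the usual LVM2 text layout: a physical_volumes section containing pv0, pv1, ... blocks, each with an id line followed by a device line. The function name pv_ids is made up for illustration.

```shell
#!/bin/sh
# pv_ids METADATA_FILE
# Print "<pv name> <pv uuid> <device hint>" for each physical volume
# listed in the physical_volumes section of an LVM2 text metadata file.
# The volume group's own id line comes before that section and is skipped.
pv_ids() {
    awk '
        /^[[:space:]]*physical_volumes[[:space:]]*[{]/ { in_pvs = 1; next }
        /^[[:space:]]*logical_volumes[[:space:]]*[{]/  { in_pvs = 0 }
        in_pvs && /^[[:space:]]*pv[0-9]+[[:space:]]*[{]/ { name = $1; next }
        in_pvs && name != "" && $1 == "id" {
            gsub(/"/, "", $3); id = $3; next
        }
        in_pvs && name != "" && $1 == "device" {
            gsub(/"/, "", $3); print name, id, $3; name = ""
        }
    ' "$1"
}
```

Running "pv_ids lvm-metadata-28672.txt" should then print one line per physical volume, and the UUID on the line for your device is the one to hand to "pvcreate -u".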

WARNING, WARNING, WARNING, if you do this wrong, and use the wrong meta-data information, you may overwrite some file data somewhere else on the disk.  This can happen when the extents and sizes are off, so the vgcfgrestore command restores the meta-data to the wrong parts of the disk.  Be sure you are able to pick the correct backup data to restore, or you very well may lose data.
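
Given that risk, saving a raw copy of the start of the partition before running pvcreate or vgcfgrestore is cheap insurance. This is a minimal sketch, and I did not do this myself. It assumes the first metadata area sits within the first MiB of the PV, as it does in the pvck output above (offset=4096, size=192512). The function name backup_pv_start is made up for illustration.

```shell
#!/bin/sh
# backup_pv_start DEVICE OUTFILE [MIB]
# Save the first MIB mebibytes (default 1) of DEVICE to OUTFILE.
# DEVICE is only read. This covers the LVM label and the first
# metadata area, NOT any second metadata area near the end of the
# device, and not the file data itself.
backup_pv_start() {
    dd if="$1" of="$2" bs=1048576 count="${3:-1}" 2>/dev/null
}
```

Usage would be something like "backup_pv_start /dev/sda4 /mnt/safe/sda4-start.img", with the output written to a different disk. Where there is enough spare space, a full image of the partition is the safer choice.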

Now, after all of this, I think it may have been as easy as doing the following, but I am unsure.  I think the following command would ignore the locking failure, and restore the configuration regardless.

vgchange -ay --ignorelockingfailure

Either way, this information is useful in the event that you lose meta-data due to bad blocks or whatever.

Filed under: Linux, LVM

Comments (24)
  1. This happened to me just now…thanks for the blog post – it was very helpful.

    I found I *did* have to re-create the pv with pvcreate -ff – in fact the pv’s vg metadata was the problem, it seems.

    your “vgchange” command suggestion at the very end was tried but didn’t help.

    Thanks again for a very timely post.

  2. > your “vgchange” command suggestion at the very end
    > was tried but didn’t help.

    Oh, thanks for that, I wasn’t sure if that vgchange thing would work or not. I just happened to notice that in the man page after the fact.

    And you’re very welcome. After 2-4 hours of being very upset, I figured I should try and spare others the “pain”, LOL.

    Oh, and I noticed my pvcreate and vgcfgrestore commands were referring to different files, hehe. That would be bad if someone tried that. I will correct it.

  3. In my case the offset of the last record does not really point to metadata, and the offset of the previous record points to a nonexistent place on the disk. When I try with sdb, the size of the last record does not match. I just lost more than 600GB because something in the hackintosh beta messed up my arrays. I feel like I will kill somebody for that. It’s so frustrating that words can’t describe this experience.

  4. (if you ever try hackintosh, PHYSICALLY UNPLUG ALL YOUR HARD DRIVES WITH DATA, or it can silently cross you and ruin them right after it boots up to the installer)

  5. 600G, ouch, that is very painful 🙁

    Don’t give up though, open the disk with a hexeditor, and see if you can find the configurations near the beginning of the disk.

    • Thanks a lot for this post. It helped a lot. I was able to recover my lost LVM meta.

      Just to add, by the way, you can do cat to your disk like cat /dev/sdb3 and search for your text meta data.

  6. You, sir, are my hero. Thanks to your great blog I’ve managed to get back 3.6TB of precious data. Thank you. It worked like a charm.

  7. Thanks, you saved me too 🙂

  8. Hi, I need some help to restore data from hardware-based clones/snapshots. I am using the following command in RHEL to change the UUID of the clone: pvchange --uuid /dev/mapper/mpath2 --config 'global{activation=0}' . I want to know if there is something similar in SLES 9 onwards for deactivating device-mapper interaction and changing the LVM metadata? Thanks, Salah.

  9. Salah,

    I have no idea actually. But, you could get out a disk hex editor, and do it that way. 😀

  10. Great article, Adams… helped me a lot in recovering a screwed-up PV here, apparently by the same cause (resizing the PV with pvresize and/or resizing the VG with “vgchange -s”). I didn’t have hexedit, so I dumped the whole PV metadata using dd with bs=1 skip=N count=M, writing to lvm.cfg (where N and M are respectively the offset and the size shown by pvck), and then edited the resulting lvm.cfg to select the valid metadata to restore. Also, I had to use BOTH commands in the end (pvcreate AND vgcfgrestore), as vgcfgrestore alone complained about a metadata checksum error and refused to run. Hope this additional info is useful for someone else.

  11. YOU SAVED MY LIFE!
    well you know when you’re trying a better distro for your girlfriend’s notebook and the pclinuxos’ installer screws up your lvm configuration ,where there are all those pics, songs, movies, files she wants to keep, with just 1, ONE, click?
    and what if you don’t know how to fix the mess?
    now that is fixed, thanks to you, i’ve to talk to those brilliant minds who made that wonderful installer…
    good nite…

  12. Brilliant article! Saved a lot of sweat (even after sweating a ton). What I did was
    1. Get Ubuntu Live CD .iso with lvm support (the alternative version)
    2. Boot VMWare player with it (mount to cd/dvd) and f2 to bios and make it boot
    in fix mode
    3. pvck -v -d /dev/sda5 to get the offset to metadatas
    4. dd if=/dev/sda5 of=/metadata*.txt bs=1 skip= count=
    5. repeat step 4. for each metadata
    6. explore metadata with nano to see what went wrong and which was last working one
    7. vgcfgrestore -f /metadata.txt -v

    I couldn’t use hexdump so dd worked for me.
    The pvcreate didn’t work for me but apparently it wasn’t necessary.

    Big kiitos from Finland!

  13. Thanks a bunch. Really saved the day.

  14. This was exactly what I needed. Like the post above, pvcreate didn’t work for me; it said it couldn’t lock /dev/sda1 exclusively. After using hexedit to find the metadata, I ran vgcfgrestore -f file -v vgname. pvs -v now shows the proper information and after a reboot I’m back in business.

    thank you very much.

  15. Hi and thanks for your post.
    I’ve followed the instructions, using a Knoppix live CD to recover my 1 TB HDD. It went well, I could read the files, but knoppix gets very slow, and I wasn’t able
    I then reboot Knoppix, to try and copy one file on a usb drive, but now I CAN’T SEE my disk even using “fdisk -l”.

    More Details:
    Indeed, my HDD is a 1 TB HDD from an Iomega ix2 StorCenter NAS, configured in RAID 1 with another 1 TB HDD. One of the disks went down and we were not able to access the files over the network interface.
    1-I took the working HDD and connected it to a laptop using SATA-to-USB
    2-I first tried to access the HDD in degraded RAID 1, but it keeps telling “unknown file system linux_RAID_member”
    3-The “file -s” command allowed me to notice that it was a Linux LVM device.
    4- I then changed the device ID from Linux (ID 83) to Linux LVM (ID 8e)
    5- Then I follow your instruction to restore the metadata and VG configuration

    It all works fine till the problem mentioned above.

    ANY SUGGESTIONS? Thoughts?

  16. Well, really nice (and genius) post. I’ve seen that too late, but anyways I wouldn’t have had the time to dump all my existing data to another storage (and didn’t have enough spare storage to copy the data to), but I’ll remember it for the next time.
    XenServer doesn’t make backup configs (/etc/lvm/archive) of your LVM, and if you make a mistake… due to a handful of mistakes and crashed backups we lost 14 days of data (happily over Christmas, when not many people were working here). Neither the night nor the days after were very funny.

    Regards

  17. Another system’s data saved! Thank you!

  18. Accidentally deleted a Logical Volume in the Debian Installer. Chose storage/datastore instead of vgroot/datastore, no questions asked, simply deleted. Live System. No Backups, no Archives.

    Well Sir, thank you very much. You might not know how many tears you have saved on the internet with this post.

  19. YES! Thank you – I just had the same problem. I think it’s an issue with me multibooting multiple distributions and inconsistent use of lvmetad. I’d seen a few invalid “missing pv” warnings lately but things worked fine. Suddenly during a reboot, I see “removing these pesky missing PVs – you won’t be needing them right?” Funny as it included PVs I’d safely removed about a month ago. And a blank screen as it couldn’t figure out my vol group. Unfortunately I had a lot more PVs and developed a quick script to sort out all my backups. No hex editor or binary calculators are necessary with head -c/tail -c:

    # pvck -d -v /dev/sda2 > backups.txt

    # Turn each "metadata record" line into a tail/head command.
    # tail -c +N starts at byte N counting from 1, hence offset + 1.
    grep 'metadata record' backups.txt |
    while IFS=',' read -r offset size ignored
    do
    offset=${offset##*offset=}
    size=${size##*size=}
    echo "tail -c +$((offset + 1)) /dev/sda2 | head -c $size > $offset.txt"
    done > save_backups.sh

    # save_backups.sh now saves each backup as ./$offset.txt. Inspect it, then run it.

    bash ./save_backups.sh

    # END
    # This should extract all backups from your raw device using tail/head.
    # Then it's easier to dig for the correct one.

  20. Thanks, just thanks. I found a disk from 2009 with data I’d long given up hope of recovering, I got the lot back!

  21. Thanks for this discussion – reading this let me know that recovery was a possibility, thus saving my weekend. Much appreciated.

