
LVM2 On-Disk Architecture
LVM2 operates through the Linux Device Mapper (DM) framework, mapping physical disk sectors to logical extents via three abstraction layers: Physical Volumes (PV), Volume Groups (VG), and Logical Volumes (LV). Understanding where each metadata structure resides on disk is the prerequisite for any recovery.
| Layer | On-Disk Location | Recovery Implications |
|---|---|---|
| PV Label | Sector 1 (bytes 512-1023) | Contains the LABELONE magic string, PV UUID, and pointer to the VGDA. Overwriting sector 1 (via accidental pvcreate, fdisk, or mkfs) destroys the entry point to the entire LVM tree. The user data area is unaffected. |
| VGDA (Ring Buffer) | Typically sectors 8+ | Human-readable ASCII text describing the VG name, UUID, PE size (default 4MB), and the complete LV-to-PE mapping. Stored in a circular ring buffer; each VG modification writes a new copy and advances the pointer. Older copies survive in the buffer until overwritten by subsequent changes. |
| User Data Area | After PE alignment boundary | Physical extents (default 4MB each) containing the actual filesystem data. This region is untouched by PV label or VGDA corruption. The data is recoverable if the extent mapping can be reconstructed from any surviving VGDA copy. |
Key detail: The VGDA is plain ASCII text, not binary. A volume group descriptor for a 4-drive, 3-LV configuration is approximately 2-4KB of readable text. You can extract it with strings or hexdump -C and read the PV UUIDs, LV names, segment mappings, and extent allocations directly. This makes LVM metadata one of the more recoverable structures in the Linux storage stack, provided the ring buffer region has not been fully overwritten.
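As a concrete illustration, the sketch below writes a hand-made descriptor fragment into a scratch image file and then extracts it the same way you would from a real PV image. Every name, offset, and descriptor value here is fabricated for demonstration; on real media, always work against a write-blocked clone.

```shell
# Build a scratch "PV image" with fake descriptor text at sector 8,
# then recover it with dd + strings. All names and values are illustrative.
img=pv-demo.img
dd if=/dev/zero of="$img" bs=512 count=64 2>/dev/null
printf 'vg_data {\nid = "Qc2nMr-demo"\nseqno = 7\nextent_size = 8192\n}\n' |
  dd of="$img" bs=512 seek=8 conv=notrunc 2>/dev/null
# Extraction step: scan from sector 8 for readable descriptor text.
dd if="$img" bs=512 skip=8 count=8 2>/dev/null | strings -n 4
```

The same `dd ... | strings` pipeline run against a cloned PV is often the fastest way to confirm whether any descriptor copy survives in the ring buffer before reaching for LVM tools.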
How LVM Metadata Gets Destroyed
LVM metadata corruption is rarely spontaneous on healthy hardware. It results from administrative errors, hardware failures in the metadata region, or interactions between stacked storage layers.
1. Accidental pvcreate on the wrong device. pvcreate writes a new PV label at sector 1 and a blank VGDA. If the target device already contained an LVM physical volume, the original VG descriptor is overwritten. The user data extents remain intact; only the mapping metadata is lost.
2. Partition table overwrites. Running fdisk, parted, or a NAS initialization wizard on a PV member rewrites the start of the disk. GPT places its primary header at LBA 1 and a backup table at the end of the disk, so writing a GPT label to a whole-disk PV lands directly on the PV label and destroys it. MBR partition tables occupy only sector 0 and leave the LVM label intact, but some tools write additional metadata beyond sector 0.
3. Bad sectors in the metadata region. On HDDs, media degradation in sectors 1-16 (where the PV label and VGDA reside) produces I/O errors when LVM attempts to read the descriptor. The drive SMART log shows reallocated or pending sectors. The user data area may be fully intact while the small metadata region is physically unreadable.
4. NVMe FTL corruption. On NVMe SSDs, the Flash Translation Layer maps logical block addresses to physical NAND pages. A power loss during FTL journal commit can cause the controller to return UNC errors or stale data for the sectors containing the LVM metadata. The OS reports the same "volume group not found" error, but the root cause is hardware, not logical. Running vgcfgrestore on a drive with FTL corruption may write to an unmapped or mis-mapped NAND page.
5. Stacked layer desynchronization. On systems where LVM sits on top of mdadm software RAID, a RAID rebuild or resync can alter the byte layout that LVM expects. Synology SHR and QNAP QTS both layer LVM over mdadm. An interrupted RAID reshape or a failed drive replacement that triggers a partial resync can leave the mdadm array intact while the LVM metadata region contains stale or torn writes.
6. USB enclosure sector translation. External drives connected via USB bridge boards (ASMedia ASM1153E, JMicron JMS578) can perform 512-byte to 4K-sector translation. The PV label at logical sector 1 may shift to a different physical offset when the drive is connected directly via SATA. This produces a false "metadata missing" error. The metadata is intact; the block addressing has changed.
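The addressing shift behind the USB translation failure is simple arithmetic. This sketch shows why a label written under 512-byte addressing "disappears" when the same drive is read through a bridge reporting 4096-byte sectors (pure shell arithmetic; no devices are touched):

```shell
# PV label lives at LBA 1 under 512-byte logical sectors = byte offset 512.
label_byte=$((1 * 512))
# Read through a 4Kn-reporting bridge, byte 512 falls inside LBA 0,
# so a scanner that checks "sector 1" (now byte 4096) finds nothing.
lba_4kn=$((label_byte / 4096))
echo "label byte offset: $label_byte -> 4Kn LBA: $lba_4kn"
```

Comparing `blockdev --getss` output for the drive inside the enclosure versus on direct SATA confirms whether translation is the culprit before any metadata repair is attempted.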
Commands That Destroy LVM Volumes During Recovery
The standard sysadmin response to a missing volume group is to attempt restoration via LVM tools. Several commonly suggested commands convert a recoverable metadata loss into permanent data destruction.
- ✕ pvcreate on an existing PV member. pvcreate generates a new PV UUID and writes a blank VGDA. The original volume group descriptor (including all LV segment mappings) is overwritten. If the ring buffer was small enough that pvcreate's blank descriptor spans the entire VGDA region, no prior copy survives for hex-level extraction.
- ✕ vgcfgrestore with an incorrect .vg file. Restoring a VG descriptor from the wrong timestamp (e.g., before an lvextend operation) maps logical extents to physical extents that no longer match the on-disk layout. The filesystem will mount with missing or corrupt files because the extent map points to the wrong physical locations.
- ✕ mkfs on the raw PV device. Running mkfs.ext4 or mkfs.xfs directly on the physical volume device writes filesystem superblocks and inode tables across the entire block device, overwriting both the LVM metadata and the user data extents. This is irreversible for the overwritten regions.
- ✕ lvremove on an SSD with issue_discards enabled. If issue_discards = 1 is set in /etc/lvm/lvm.conf, lvremove sends TRIM/UNMAP commands to the SSD controller. The controller invalidates the FTL mapping for those blocks. Subsequent reads return zeroes. The physical NAND charge may still exist, but the controller will not serve it. Recovery is not possible after TRIM executes.
Before running any LVM command on a missing volume group: Image every physical volume member to a separate storage target using write-blocked connections. All recovery attempts must operate on images, not original media. If any underlying drive has physical faults, the imaging step captures recoverable sectors before the drive condition worsens.
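Before any lvremove on an SSD-backed VG, it is worth confirming the discard setting first. The sketch below greps a sample config file standing in for /etc/lvm/lvm.conf (the filename and contents are fabricated for the demo); on a live system, `lvmconfig devices/issue_discards` reports the effective value.

```shell
# Sample config fragment standing in for /etc/lvm/lvm.conf (assumption).
conf=lvm-sample.conf
printf 'devices {\n    issue_discards = 1\n}\n' > "$conf"

# Pull the value: line reads "issue_discards = 1", so field 3 is the value.
val=$(awk '/issue_discards/ {print $3}' "$conf")
if [ "$val" = "1" ]; then
    echo "WARNING: lvremove will TRIM freed extents -- image the drive first"
fi
```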
Standard LVM vs. LVM-Thin Provisioning
Standard LVM and LVM-thin provisioning use fundamentally different metadata structures. The recovery approach for each is distinct, and applying the wrong technique will fail silently or cause data loss.
| Property | Standard LVM | LVM-Thin (dm-thin) |
|---|---|---|
| Metadata Format | ASCII text in VGDA ring buffer | Binary B-tree in a hidden metadata LV |
| Recovery Tool | vgcfgrestore | thin_dump / thin_repair / thin_restore |
| Allocation | Static: PEs mapped to LEs at creation time | Dynamic: virtual blocks allocated on first write |
| Snapshot Metadata | COW exception table (simple, linear) | Shared B-tree with reference counts per block |
| Corruption Impact | LV mapping lost; data extents survive | Entire thin pool offline; all thin LVs inaccessible |
| Common Platforms | RHEL, CentOS, Synology SHR, QNAP QTS | Proxmox VE, oVirt, enterprise KVM hypervisors |
vgcfgrestore does not fix thin pool metadata. The thin pool's binary B-tree is stored inside a hidden logical volume, not in the VGDA. Restoring the VG descriptor via vgcfgrestore re-creates the thin pool LV definitions but does not touch the B-tree contents. If the B-tree is corrupted, the thin pool remains in needs_check state after VG restoration. You must use thin_dump to export the B-tree to XML, repair it, and thin_restore to write it back.
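For reference, thin_dump's XML output looks roughly like the fragment below. This is a hand-written, simplified sample for illustration, not output from a real pool; the superblock's transaction attribute is the anchor that repair tooling rolls back to.

```shell
# Hand-written thin_dump-style XML fragment (illustrative only).
cat > meta-demo.xml <<'EOF'
<superblock uuid="" time="3" transaction="12" data_block_size="128" nr_data_blocks="16384">
  <device dev_id="1" mapped_blocks="100" transaction="10" creation_time="0" snap_time="3">
    <range_mapping origin_begin="0" data_begin="0" length="100" time="1"/>
  </device>
</superblock>
EOF
# The superblock transaction id anchors any rollback repair:
grep -o 'transaction="[0-9]*"' meta-demo.xml | head -n 1
```

Because the dump is plain XML, inconsistencies such as devices referencing a newer transaction than the superblock can be spotted by eye before thin_restore writes anything back.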
Proxmox VE LVM-Thin Metadata Corruption
Proxmox VE uses LVM-thin provisioning as its default local storage backend. Each VM disk is a thin logical volume. The thin pool metadata LV contains the block-level allocation map for every VM on the host. When this metadata LV corrupts, every VM on the storage pool becomes inaccessible simultaneously.
1. Power loss during metadata commit. The dm-thin target writes metadata in transactions. A power failure mid-transaction leaves the B-tree in an inconsistent state. Proxmox boots with the thin pool in read-only mode or refuses to activate it. The lvs -a output shows the thin pool with the "needs_check" attribute.
2. Thin pool metadata overflow. When the metadata LV runs out of space (common when snapshots accumulate without pruning), dm-thin cannot allocate new mapping entries. Writes to any thin LV begin failing with I/O errors. If the host is not shut down cleanly at this point, the metadata B-tree can record partial allocations that leave orphaned blocks.
3. Array expansion truncating the metadata LV. Expanding the underlying physical storage (e.g., replacing drives in a hardware RAID and growing the PV) can inadvertently truncate the thin metadata LV if the LVM tools recalculate extent boundaries. The binary B-tree is severed mid-structure, and standard thin_check reports fatal errors.
Recovery approach: Image the entire block device. Extract the thin metadata LV contents by calculating its PE range from the VG descriptor. Run thin_dump to export the B-tree to XML. Identify broken transaction IDs or orphaned block references in the XML. Use thin_repair to rebuild the B-tree from the last consistent transaction. Restore via thin_restore to the metadata device on the imaged clone. If thin_dump itself fails with I/O errors, the metadata LV contains physical bad blocks and requires sector-level imaging via PC-3000 before any logical repair can proceed.
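The "calculate its PE range" step in this approach reduces to extent arithmetic on values read straight out of the VG descriptor. The numbers below come from a hypothetical descriptor and are assumptions for illustration:

```shell
# Hypothetical values taken from a VG descriptor (assumptions):
pe_start_sectors=2048               # pe_start field: data area offset in 512-byte sectors
data_start=$((pe_start_sectors * 512))
extent_size=$((4 * 1024 * 1024))    # 4 MiB physical extents (the default)
first_pe=262                        # first extent of the hidden pool_tmeta LV

# Byte offset of the metadata LV within the PV image:
byte_offset=$((data_start + first_pe * extent_size))
echo "metadata LV starts at byte $byte_offset"
# A carve might then look like (placeholder names):
#   dd if=pv-clone.img of=tmeta.img bs=4M skip=<extents> count=<length>
```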
LVM Snapshot Chain Corruption
LVM snapshots (both standard COW and thin snapshots) add another metadata dependency layer. When snapshot metadata corrupts, the corruption can cascade to the origin volume.
1. Standard LVM snapshots (COW exception table): Standard snapshots store changed blocks in a COW (Copy-on-Write) exception table. When the snapshot volume fills completely, the kernel invalidates the snapshot and marks it as "Snapshot invalid." The origin volume remains accessible, but any blocks that were in transit during the overflow event may contain inconsistent data if a write was interrupted mid-COW.
2. Thin snapshots (shared B-tree references): Thin snapshots share physical blocks with the origin via reference counting in the thin pool B-tree. A corrupted reference count can cause the thin pool to report impossible block states (negative references or blocks claimed by nonexistent thin LVs). When thin_check detects these inconsistencies, it places the pool in read-only mode. Deleting the corrupt snapshot without first repairing the B-tree can decrement reference counts for blocks still used by the origin, causing silent data loss on the primary volume.
Do not delete corrupt snapshots without verifying B-tree consistency. On thin pools, use thin_dump to export the B-tree to XML and verify reference counts before removing any snapshot. On standard LVM, invalidated snapshots are safe to remove (the exception table is already discarded by the kernel), but the origin volume should be checked for filesystem consistency before resuming writes.
How We Recover LVM Volumes
Professional LVM recovery separates the physical imaging step from the logical metadata reconstruction. We do not run LVM tools on original media under any circumstances.
1. Image all physical volume members. Each drive is connected via write-blocked interface and cloned sector-by-sector using PC-3000 or DeepSpar Disk Imager. On HDDs with bad sectors in the metadata region (sectors 1-16), PC-3000 selective head imaging and adaptive read parameter adjustment extract the metadata sectors from degraded platters. On NVMe SSDs with FTL corruption, we stabilize the controller via PCIe link negotiation before extracting the LBA range containing the VGDA.
2. Locate the VGDA on each PV image. Scan the imaged block device for the LABELONE magic string at sector 1 to find the PV label. If the label is intact, it points directly to the VGDA offset. If the label is destroyed, we scan outward from sector 8 for ASCII text matching the VG descriptor pattern (VG name, PV UUID, extent mappings). The ring buffer stores multiple historical copies; we extract all of them and compare timestamps to identify the most recent valid version.
3. Cross-reference /etc/lvm/archive if the OS drive is accessible. The archive directory contains timestamped .vg files from every VG metadata change. We compare the archive descriptor to the hex-extracted VGDA to verify consistency. If the archive is more recent, we use it. If the hex-extracted copy is more recent (the archive is on a different disk that was last booted before the corruption event), we use the extracted copy.
4. Reconstruct the VG on the imaged clone. Write the validated descriptor back to the PV image using vgcfgrestore. Activate the VG, scan for logical volumes, and mount each LV read-only. For thin pools, run thin_dump on the metadata LV, repair via thin_repair, and thin_restore before activating the thin LVs.
5. Verify filesystem integrity and extract data. Run filesystem-appropriate, read-only consistency checks (e2fsck -n for ext4, xfs_repair -n for XFS) against each LV, then extract recovered data to external storage. The original drives are returned to the client unmodified; all work was performed on images.
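The archive-versus-extracted comparison in step 3 usually comes down to the seqno field each descriptor carries: LVM increments it on every metadata change, so the higher seqno is the newer layout. A minimal sketch with two fabricated descriptor fragments:

```shell
# Two fabricated descriptor fragments (assumptions for illustration).
printf 'vg_data {\nseqno = 11\n}\n' > archive-copy.vg
printf 'vg_data {\nseqno = 12\n}\n' > extracted-copy.vg

# Higher seqno = more recent metadata; here the hex-extracted copy wins.
for f in archive-copy.vg extracted-copy.vg; do
    echo "$f: seqno $(awk '/seqno/ {print $3}' "$f")"
done
```

A seqno mismatch between the archive and the on-disk ring buffer is also a quick sanity check that an lvextend or snapshot operation happened after the last archived backup.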
Thin pool repair risk: Running thin_repair on a production metadata device can corrupt the B-tree further if the underlying storage has uncorrectable read errors. The repair tool writes corrected metadata back to the same device; if those writes land on failing sectors, the metadata damage worsens. Safe recovery images all pool member drives first, rebuilds any hardware RAID virtually, extracts the thin metadata LV from its PE range, and runs thin_dump on the cloned image to produce an XML map of all provisioned blocks. Orphaned transaction IDs are corrected in the dump before thin_restore rebuilds the metadata on the clone.
When LVM Corruption Masks a Hardware Failure
The OS reports the same error message ("Volume group not found") regardless of whether the metadata region has a logical overwrite or a physical read failure. Determining which layer has failed is the first diagnostic step.
HDD: Bad Sectors in the Metadata Region
The PV label occupies a single 512-byte sector. If that sector develops a media defect, the SMART log records a reallocated sector count increment. The firmware may silently redirect reads to a spare sector, or it may return an I/O error if the spare pool is exhausted. On drives with shingled magnetic recording (SMR), the metadata region sits within the conventional recording zone (CMR cache), so metadata corruption on SMR drives is typically a cache flush failure, not a shingle zone problem. PC-3000 reads the defect map to determine whether sector 1 has been remapped or is physically unreadable.
NVMe SSD: FTL Mapping Failure
NVMe controllers use monolithic BGA packages that abstract all NAND access behind the Flash Translation Layer. A corrupted FTL journal means the controller cannot resolve LBA-to-NAND mappings for the sectors containing LVM metadata. The drive may report the correct capacity but return UNC errors for specific LBA ranges. Standard Linux tools (pvck, pvs) will report missing metadata. Running vgcfgrestore writes new data to the drive, but the controller may route that write to an incorrect NAND page due to the damaged FTL. Stabilizing the controller firmware and extracting the raw NAND data requires PC-3000 SSD.
SATA SSD: Firmware Lock (SATAfirm S11, etc.)
Certain Phison-based SATA SSDs (PS3111, PS3110/S10) are prone to a firmware lock that makes the drive report its model string as "SATAfirm S11." The drive appears in BIOS but shows 0 bytes capacity. The OS cannot read any sector, including the LVM metadata region. This is not LVM corruption; it is a controller firmware crash. Recovery requires PC-3000 SSD to reload the firmware tables and rebuild the FTL mapping before any LVM data becomes accessible.
LVM Recovery Pricing
LVM metadata recovery pricing depends on whether the underlying storage is physically healthy or requires hardware-level intervention.
| Scenario | Price Range | What's Involved |
|---|---|---|
| Logical metadata overwrite (healthy hardware) | $250+ | VGDA extraction from ring buffer, vgcfgrestore on cloned image, LV mount and data extraction. Single drive, no physical damage. |
| Multi-drive VG with mdadm RAID substrate | $600 - $900 | Image all array members, reconstruct mdadm layer, then reconstruct LVM layer on top. Synology SHR and QNAP QTS configurations. |
| LVM-thin pool metadata repair | $900 - $1,200 | Binary B-tree extraction, thin_dump to XML, transaction ID repair, thin_restore. Proxmox VE and enterprise hypervisor configurations. |
| Hardware failure + LVM reconstruction | $1,200 - $1,500 | PC-3000 hardware imaging (head swap, firmware repair, or FTL stabilization), followed by logical LVM reconstruction on the extracted image. |
All prices subject to evaluation. No diagnostic fee. No data, no recovery fee. Multi-drive configurations priced per the total number of drives requiring imaging and the complexity of the stacked storage layers.
Frequently Asked Questions
Is vgcfgrestore safe to run on a failing drive?
I accidentally ran pvcreate on a drive that already had LVM data. Is recovery possible?
My Proxmox thin pool shows 'metadata needs_check'. Can I fix it myself?
My NAS uses both mdadm and LVM. Which layer is broken?
I deleted a logical volume on an SSD. Can the data be recovered?
Where does Linux store LVM metadata backups?
Related Recovery Services
Full RAID recovery for all levels and controllers
Linux software RAID superblock recovery
VM disk recovery from Proxmox storage
DSM volume crash and SHR failures
Missing vdevs, corrupted ZIL, import errors
Safe approach to degraded arrays
RAID pool degradation on consumer NAS devices
SAN LUN metadata and provisioning recovery
LVM volume group missing?
Free evaluation. Write-blocked imaging. VGDA ring buffer extraction and thin pool metadata repair. No data, no fee.