
Enterprise Virtualization Recovery

Proxmox VE Data Recovery

We recover KVM virtual machines and LXC containers from failed ZFS pools, degraded Ceph clusters, and corrupted Proxmox storage backends. qcow2, raw, and zvol extraction. Free evaluation. No data = no charge.

Written by Louis Rossmann
Founder & Chief Technician
Updated June 2026
13 min read

How Proxmox VE Storage Fails and How We Recover It

Proxmox VE stores virtual machines and containers on pluggable storage backends: local ZFS, Ceph (distributed), LVM-thin, NFS, or directory-based storage. When the underlying disks fail, Proxmox loses access to the storage backend and all VMs/containers on it go offline. Recovery requires imaging the physical drives, reconstructing the storage layer (ZFS pool, Ceph object store, or LVM thin pool), and extracting each VM's disk image individually.

Proxmox is increasingly popular for homelab, SMB, and enterprise deployments because it provides KVM virtualization and LXC containers on a Debian Linux base with a web GUI and no license fees. The storage flexibility is a strength for deployment but adds complexity to recovery: a Proxmox cluster might use ZFS on one node, Ceph across the cluster, and NFS for backups. Each backend has different on-disk structures and failure modes.

ZFS Pool Failures on Proxmox

ZFS is the default recommended storage backend for Proxmox local storage. Proxmox stores VM disk images as zvols and LXC containers as ZFS datasets. RAIDZ1 tolerates one drive failure per vdev; RAIDZ2 tolerates two. A second failure in a RAIDZ1 vdev, or a third in a RAIDZ2 vdev, puts the pool in a FAULTED state. We image all drives, reconstruct the vdev geometry from ZFS labels, and force-import the pool.

ZFS is the default recommended storage backend for Proxmox local storage. Proxmox creates ZFS pools during installation and stores VM disk images as zvols (block devices) and LXC containers as ZFS datasets. For detailed ZFS pool recovery procedures, see our ZFS pool recovery guide.

RAIDZ1/RAIDZ2 Vdev Failures

  • RAIDZ1 tolerates one drive failure per vdev; a second failure causes pool FAULTED state and ZFS refuses to import
  • RAIDZ2 tolerates two failures per vdev but a third renders the vdev unrecoverable through normal ZFS tools
  • ZFS stores metadata (uberblock, spacemap, dnode) redundantly by default; data blocks follow the vdev redundancy level
  • We image all drives including failed ones, reconstruct the vdev geometry from the ZFS labels (four copies per device: two at the start and two at the end), and force-import the pool from images
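
As a rough illustration of where those labels sit, the sketch below computes the byte offsets of the four vdev labels inside a drive image. It assumes the standard 256 KiB label size and that the image covers exactly the device or partition that ZFS was given; the function name is ours for illustration, not part of any ZFS tool.

```python
LABEL_SIZE = 256 * 1024      # each ZFS vdev label occupies 256 KiB
LABELS_PER_DEVICE = 4        # two at the front of the device, two at the back


def zfs_label_offsets(device_size: int) -> list[int]:
    """Return the byte offsets of the four vdev labels for a device of this size."""
    if device_size < LABELS_PER_DEVICE * LABEL_SIZE:
        raise ValueError("device too small to hold four ZFS labels")
    # ZFS rounds the usable size down to a multiple of the label size
    # before placing the two trailing labels.
    aligned = device_size - (device_size % LABEL_SIZE)
    return [0, LABEL_SIZE, aligned - 2 * LABEL_SIZE, aligned - LABEL_SIZE]
```

When the leading labels fall in a damaged region, the trailing pair gives a second place to read the same pool configuration from.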

ZFS Mirror Failures

  • Proxmox mirrors store identical copies on two (or more) drives; losing all mirrors in a vdev causes pool failure
  • Mirror vdevs are simpler to reconstruct: each drive is a standalone copy of the data; we image the healthiest mirror member first
  • If one mirror has bad sectors, we combine data from both mirrors at the block level to produce a complete image
  • Boot drives (Proxmox OS) are typically on a separate ZFS mirror; if only the boot mirror fails, VM data on the storage pool is unaffected
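
The block-level merge of two mirror members can be sketched as follows. This is a simplified illustration, not our imaging software: it assumes both images are the same length and aligned at the same offsets, and that the imager has produced a set of unreadable block numbers for each drive (`bad_a` and `bad_b` are hypothetical names).

```python
def merge_mirror_images(image_a: bytes, image_b: bytes,
                        bad_a: set[int], bad_b: set[int],
                        block_size: int = 4096) -> tuple[bytes, list[int]]:
    """Merge two mirror images, preferring A and falling back to B per block.

    Returns the merged image and the list of blocks unreadable on both members.
    """
    if len(image_a) != len(image_b):
        raise ValueError("mirror images must be the same length")
    merged = bytearray(len(image_a))          # unrecovered blocks stay zero-filled
    unrecovered: list[int] = []
    block_count = (len(image_a) + block_size - 1) // block_size
    for block in range(block_count):
        start = block * block_size
        end = min(start + block_size, len(image_a))
        if block not in bad_a:
            merged[start:end] = image_a[start:end]
        elif block not in bad_b:
            merged[start:end] = image_b[start:end]
        else:
            unrecovered.append(block)
    return bytes(merged), unrecovered
```

Only blocks that are unreadable on every mirror member end up in the unrecovered list.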

If your Proxmox node shows pool imported with errors or refuses to import entirely with I/O errors, see our ZFS pool import I/O error page for the specific failure pattern and recovery approach.

Ceph Cluster Recovery on Proxmox

Proxmox integrates Ceph for distributed storage across cluster nodes. Ceph splits VM disk images into 4MB objects and distributes them across OSDs using the CRUSH placement algorithm. When enough OSDs fail that placement groups lose all replicas, the affected RBD images become inaccessible. We image the failed OSD drives and reconstruct the object-to-PG-to-RBD mapping.

Proxmox integrates Ceph for distributed storage across cluster nodes. Ceph splits VM disk images (RBD) into 4MB objects and distributes them across OSDs using the CRUSH placement algorithm. When enough OSDs fail that placement groups (PGs) lose all replicas, the affected RBD images become inaccessible.
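
To show how a byte offset inside a VM disk maps onto those 4MB objects, here is a minimal sketch. It assumes an RBD format 2 image with the default 4 MiB object size and no custom striping, where data objects are named `rbd_data.<image id>.<object number as 16 hex digits>`; the image ID used in any call is a placeholder.

```python
OBJECT_SIZE = 4 * 1024 * 1024   # default RBD object size (4 MiB)


def rbd_object_for_offset(image_id: str, byte_offset: int,
                          object_size: int = OBJECT_SIZE) -> tuple[str, int]:
    """Return (object name, offset inside that object) for a byte offset in the image."""
    if byte_offset < 0:
        raise ValueError("offset must be non-negative")
    object_no, offset_in_object = divmod(byte_offset, object_size)
    return f"rbd_data.{image_id}.{object_no:016x}", offset_in_object
```

Reassembly runs the same mapping in reverse: each recovered object is written back at `object number × object size` in the output image.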

OSD Failure and PG Recovery

Each OSD manages objects on a local disk (typically a dedicated SSD or HDD per OSD). Ceph uses BlueStore as its default backend on Proxmox 5.x and later, storing data directly on the block device with a RocksDB metadata database on a small partition. When an OSD disk fails, Ceph marks its PGs as degraded and begins replicating data to other OSDs. If the cluster does not have enough surviving replicas to recover, PGs are marked "incomplete" or "unfound."

We image the failed OSD drives, parse the BlueStore on-disk format (or FileStore for older clusters) including the RocksDB metadata, and reconstruct the object-to-PG mapping. Combined with the CRUSH map (stored in the Ceph monitor database on the mon nodes), we can determine which objects belong to which RBD image and reassemble the virtual disks.

Monitor Database Corruption

Ceph monitors (mon) maintain the cluster map, including the CRUSH map, OSD map, and PG map. Proxmox runs monitors on each cluster node by default. If a majority of monitors lose their database (stored as a LevelDB or RocksDB instance), the cluster cannot form a quorum and all storage access stops. We extract the monitor database from each node's mon data directory and reconstruct the cluster map from the most recent consistent copy.
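
The quorum arithmetic behind that failure is simple majority. The helper names below are illustrative only, not Ceph commands.

```python
def quorum_size(monitor_count: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    if monitor_count < 1:
        raise ValueError("a cluster needs at least one monitor")
    return monitor_count // 2 + 1


def has_quorum(monitor_count: int, healthy_monitors: int) -> bool:
    """True if the healthy monitors are enough to form a quorum."""
    return healthy_monitors >= quorum_size(monitor_count)
```

A three-node cluster that loses two monitor databases has one healthy monitor against a required two, so storage access stops until a consistent map is restored.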

LVM-Thin and qcow2 Disk Image Recovery

Proxmox supports LVM-thin as a storage backend for VMs that don't require ZFS or Ceph. LVM-thin uses a thin provisioning pool on an LVM logical volume, where each VM gets a thin LV. If the thin pool metadata LV is corrupted, the entire thin pool becomes inaccessible. We parse the thin pool superblock and space maps to locate each thin LV's block mapping.

Proxmox supports LVM-thin as a storage backend for VMs that do not require ZFS checksumming or Ceph distribution. LVM-thin uses a thin provisioning pool on an LVM logical volume, where each VM gets a thin LV.

LVM-Thin Pool Corruption

If the thin pool metadata LV is corrupted (power loss during metadata commit), the entire thin pool becomes inaccessible. We parse the thin pool superblock and space maps from the raw disk image to locate each thin LV's block mapping.
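
Once a thin LV's block mapping is known, rebuilding the volume is a lookup exercise. The sketch below assumes the mapping has already been recovered as a plain dictionary from virtual block number to pool data block number; it does not parse the on-disk metadata format, and all names are hypothetical.

```python
def read_thin_lv(pool_data: bytes, mapping: dict[int, int],
                 block_size: int, virtual_blocks: int) -> bytes:
    """Rebuild a thin LV from the pool's data area using a recovered block mapping.

    Virtual blocks with no mapping were never written and are returned as zeros.
    """
    volume = bytearray(virtual_blocks * block_size)
    for virtual_block, pool_block in mapping.items():
        if not 0 <= virtual_block < virtual_blocks:
            continue                          # ignore mappings outside the LV
        source = pool_block * block_size
        chunk = pool_data[source:source + block_size]
        target = virtual_block * block_size
        # Slice by len(chunk) so a short read never changes the volume length.
        volume[target:target + len(chunk)] = chunk
    return bytes(volume)
```

Because every thin LV shares the same pool data area, a damaged mapping affects which blocks belong to which VM rather than the blocks themselves.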

qcow2 Header Corruption

Proxmox uses qcow2 format on directory-based and NFS storage. A qcow2 file has a header, two-level L1/L2 cluster mapping tables, a refcount table with its refcount blocks, and data clusters. If the header or refcount table is corrupted, qemu-img check may fail to repair it. We rebuild the L1/L2 tables from the data cluster layout.

LXC Container Rootfs

LXC containers store their rootfs as a directory, ZFS dataset, or thin LV depending on the storage backend. Recovery extracts the container rootfs from whichever backend was in use. ZFS datasets are extracted as part of the pool reconstruction; LVM-thin LVs are extracted from the thin pool metadata.

Recovery Methodology for IT Administrators

Every drive in the Proxmox node is imaged through PC-3000 with write-blocking. For ZFS, we read ZFS labels to determine pool geometry and import the pool read-only from images. For Ceph, we parse BlueStore structures and rebuild the object-to-RBD mapping. For LVM-thin, we parse thin pool metadata to recover the block allocation map for each thin LV.

If you are evaluating our capability to handle Proxmox environments, this is the procedure.

  1. Drive Imaging

    Every drive in the Proxmox node (or cluster, if Ceph) is imaged through PC-3000 with write-blocking. For drives with bad sectors, we use head maps to capture healthy sectors first, then revisit damaged areas with aggressive retry parameters. NVMe drives used as ZFS SLOG or Ceph journal/WAL devices are imaged through PCIe adapters.

  2. Storage Backend Reconstruction

    For ZFS: we read ZFS labels from each drive image to determine pool geometry (mirror, RAIDZ1/2/3), reconstruct the vdev layout, and import the pool read-only from the images. For Ceph: we parse BlueStore on-disk structures from each OSD image, extract the CRUSH map from the monitor database, and rebuild the object-to-PG-to-RBD mapping. For LVM-thin: we parse the thin pool metadata device to recover the block allocation map for each thin LV.

  3. VM and Container Extraction

    KVM VM disk images (qcow2 or raw) are extracted from the reconstructed storage backend. For qcow2 files with backing files (linked clones), we resolve the backing chain and consolidate into a standalone image. LXC container rootfs directories or datasets are extracted as tar archives. Each recovered VM/container is verified by mounting the guest filesystem read-only and checking integrity.
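
The backing-chain consolidation in step 3 follows one rule: for each cluster, the topmost layer that has written it wins. The sketch below models each layer as a dictionary of cluster index to cluster data, which is a simplification of the real qcow2 tables; the function and parameter names are hypothetical.

```python
def consolidate_chain(chain: list[dict[int, bytes]],
                      cluster_count: int, cluster_size: int) -> bytes:
    """Flatten a backing chain (base image first, top overlay last) into one image.

    Clusters present in no layer are left as zeros.
    """
    image = bytearray(cluster_count * cluster_size)
    for layer in chain:                       # later layers overwrite earlier ones
        for index, data in layer.items():
            if 0 <= index < cluster_count and len(data) == cluster_size:
                start = index * cluster_size
                image[start:start + cluster_size] = data
    return bytes(image)
```

A linked clone whose base image is missing can still yield the clusters the clone itself wrote, which is why each layer is extracted separately before the chain is flattened.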

How Do You Recover a Proxmox Backup Server Datastore?

A PBS datastore is a content-addressed chunk-store: deduplicated chunks live under .chunks/, and per-snapshot index files map each backup to its chunks. We image the datastore drives, then rebuild the index-to-chunk mapping so snapshots reassemble into VM images and file archives.

A Proxmox Backup Server datastore is a content-addressed chunk store, not a redundant array. PBS deduplicates at the application layer: each chunk is named by its SHA-256 checksum and sorted into directories under .chunks/ using a 4-hex-digit (2-byte) prefix of that checksum.

Per-snapshot index files, .fidx for fixed-size chunks of VM disk images (around 4 MiB each) and .didx for the dynamically sized chunks of pxar file archives, map a snapshot back to the chunks it references. A backup repository is a discrete store; it does not give you a degraded-array fallback the way pool redundancy does.
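
The directory layout and the fixed-index lookup described above can be sketched in a few lines. This assumes the layout as described here (a 4-hex-digit prefix directory containing a file named by the full digest) and a 4 MiB fixed chunk size; the function names are ours, not PBS APIs.

```python
from pathlib import PurePosixPath

FIXED_CHUNK_SIZE = 4 * 1024 * 1024   # assumed fixed chunk size for VM disk images


def chunk_path(digest_hex: str) -> PurePosixPath:
    """Return the datastore-relative path of a chunk from its SHA-256 hex digest."""
    digest_hex = digest_hex.lower()
    if len(digest_hex) != 64 or any(c not in "0123456789abcdef" for c in digest_hex):
        raise ValueError("expected a 64-character hex SHA-256 digest")
    return PurePosixPath(".chunks") / digest_hex[:4] / digest_hex


def fidx_chunk_index(byte_offset: int, chunk_size: int = FIXED_CHUNK_SIZE) -> int:
    """Return which entry of a fixed index covers a byte offset in the disk image."""
    return byte_offset // chunk_size
```

With a fixed index, the position of an entry alone determines which part of the VM disk it covers, so the disk can be rebuilt by walking the index in order.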

Missing or Corrupt Chunk

Each chunk is named by its SHA-256 digest and carries a CRC-32 in its blob header, so a missing or corrupt chunk breaks the chain of trust for whatever references it. If the manifest and index files survive, only the specific .fidx blocks or .didx files that point at that chunk are affected, not the whole snapshot. We carve the readable chunks and rebuild the index references around the gap.
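
To make "only the referencing blocks are affected" concrete, the sketch below takes the ordered digest list from a fixed index and the set of chunks that survived, and reports which byte ranges of the VM disk are missing. It is a simplified model with hypothetical names, and it applies to fixed-size chunks only.

```python
def damaged_ranges(index_digests: list[str], present_chunks: set[str],
                   chunk_size: int) -> list[tuple[int, int]]:
    """Return merged (start, end) byte ranges whose chunk is missing from the store."""
    ranges: list[tuple[int, int]] = []
    for position, digest in enumerate(index_digests):
        if digest in present_chunks:
            continue
        start = position * chunk_size
        end = start + chunk_size
        if ranges and ranges[-1][1] == start:
            ranges[-1] = (ranges[-1][0], end)   # extend the previous adjacent gap
        else:
            ranges.append((start, end))
    return ranges
```

Because of deduplication, one lost chunk can appear at several positions in the same disk, or in several snapshots, so the gap list is computed per index file.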

Index and Catalog Desync

When the .fidx or .didx files corrupt, the snapshot desyncs from the chunk-store and the deduplicated blobs can no longer be reassembled into contiguous VM images or pxar archives. The per-snapshot catalog index is separate: when it corrupts, browsing a restore fails even though the chunks still exist. We reconstruct the mapping from the surviving index structures rather than re-running a verify on the original media.

PBS does not keep a RAM-resident deduplication table the way ZFS does. The chunk directories are preallocated and the index mappings are on-disk, so a PBS datastore does not need a multi-gigabyte dedup table in memory to be read.

If the backing store underneath PBS is ZFS with ZFS deduplication enabled, the ZFS DDT RAM rule applies to that ZFS layer (roughly 5 GB of RAM per 1 TB of deduplicated data, and an oversized DDT hangs zpool import and kernel-panics the host). Running ZFS dedup beneath PBS is redundant: PBS already deduplicates above it, so the only thing the ZFS DDT buys you is the RAM exhaustion.
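
The RAM rule above is a rule of thumb, and the arithmetic is a single multiplication. The function below is illustrative only; actual DDT size depends on block size and how unique the data is.

```python
def estimated_ddt_ram_gb(dedup_data_tb: float, gb_per_tb: float = 5.0) -> float:
    """Rough DDT memory estimate using the ~5 GB per 1 TB rule of thumb."""
    if dedup_data_tb < 0:
        raise ValueError("data size cannot be negative")
    return dedup_data_tb * gb_per_tb
```

By this estimate, a 20 TB deduplicated pool would want on the order of 100 GB of RAM for the table alone.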

Every datastore drive is block-level imaged through PC-3000 Portable III before any reconstruction. We never delete index files, re-run a verify, or repair on the original media; deleting a .fidx or .didx file orphans the chunk-store and turns a recoverable snapshot into a manual chunk-carving job. The imaging hardware reads the sectors; the index-to-chunk reassembly runs in software against the image files.

How Do You Resolve a DRBD Split-Brain Without Losing Data?

We image both DRBD nodes before touching either one. Reconciling a split-brain forces one node to discard its diverged block history, so we preserve that discarded timeline off the original media and carve it separately rather than letting DRBD overwrite it.

DRBD is block-level network replication, and split-brain branches the block-level timeline. DRBD mirrors a block device across two nodes in single-primary or dual-primary resource roles. A split-brain happens when replication breaks and both nodes accept writes independently while disconnected, so each node now holds a different version of the same blocks. The userspace tool drbdadm status is how you identify that the nodes have diverged into a split-brain condition.

Why Reconciliation Is Destructive

Resolving a split-brain is inherently destructive: DRBD forces one node to discard its diverged block history and resync from the chosen authoritative node. Any write made to the discarded node during the split is permanently destroyed once the resync runs. That is why we image both nodes through PC-3000 Portable III first, so the timeline DRBD is about to throw away is preserved off-media and can be carved on its own.

Forum Advice That Destroys Data

Do not force primary on both nodes; that tells DRBD to overwrite the block differences and destroys the divergent timeline outright. The same applies to the PBS pattern of "just re-run the backup" or "delete the index and re-verify," which orphans the chunk-store. None of these run on original media in our workflow. We make the destructive decision against image files, after both sides are preserved.

The recovery order is fixed: image both nodes, identify the diverged condition with drbdadm status, determine which node holds the writes worth keeping, then reconstruct from the images. DRBD replication is uptime, not backup; a split-brain proves it, because both replicas can end up holding partial, conflicting copies of the same production data at the same time.
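
Once both nodes are imaged, the divergence can be located by comparing the two images block by block. The sketch below is a simplified illustration that holds both images in memory and uses hypothetical names; it compares image contents directly and does not read DRBD's own metadata.

```python
def diverged_blocks(image_a: bytes, image_b: bytes,
                    block_size: int = 4096) -> list[int]:
    """Return the block numbers where the two node images differ."""
    if len(image_a) != len(image_b):
        raise ValueError("node images must be the same length")
    differing: list[int] = []
    for start in range(0, len(image_a), block_size):
        if image_a[start:start + block_size] != image_b[start:start + block_size]:
            differing.append(start // block_size)
    return differing
```

The resulting list marks exactly the regions where a resync would overwrite one node's writes, which are the regions worth carving from the discarded side.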

Proxmox VE Recovery Pricing

Proxmox VE recovery is priced as per-drive imaging, based on each drive's condition, plus a reconstruction fee covering ZFS pool import, Ceph object reassembly, or LVM-thin parsing. The per-drive fee depends on whether the drive needs logical imaging, firmware repair, or a mechanical head swap. No data recovered means no charge.

Same transparent model as every other service: per-drive imaging based on each drive's condition, plus a $400-$800 reconstruction fee covering ZFS pool import, Ceph object reassembly, or LVM-thin parsing. No data recovered means no charge.

Service Tier | Price Range (Per Drive) | Description
Logical / Firmware Imaging | $250-$900 | Firmware module damage, SMART threshold failures, or filesystem corruption on individual drives.
Mechanical (Head Swap / Motor) | $1,200-$1,500 (50% deposit) | Donor parts consumed during transplant. SAS drives require SAS-specific donors.
Storage Reconstruction + Extraction | $400-$800 per storage backend | ZFS pool import, Ceph object reassembly, or LVM-thin parsing. Includes VM/container extraction.

No Data = No Charge: If we recover nothing from your Proxmox environment, you owe $0. Free evaluation, no obligation.

Every Proxmox case is handled in-house at the Austin, TX lab. Single location, no outsourcing, no diagnostic fees. The Proxmox recovery path differs from VMware VMFS and Hyper-V: ZFS pools, Ceph object stores, LVM-thin pools, PBS chunk-stores, and DRBD replicas each carry their own on-disk structures, so we reconstruct the specific backend rather than running a generic VM-extraction pass.

Proxmox VE Recovery: Common Questions

Can you recover a degraded ZFS pool on Proxmox VE?
Yes. Proxmox commonly uses ZFS mirrors or RAIDZ1/RAIDZ2 for local VM storage. When drives in a vdev fail, ZFS marks the pool as DEGRADED while redundancy still covers the loss, or FAULTED once it does not, and may refuse to import it. We image all member drives through PC-3000, reconstruct the ZFS pool from images, and extract the VM disk images (zvols or raw files) stored on the pool. The original drives are never modified.
How do you recover from a Ceph OSD failure in a Proxmox cluster?
Ceph distributes VM disk objects across OSDs using a CRUSH placement map. When enough OSDs fail that placement groups lose all replicas, the Ceph cluster marks those PGs as 'incomplete' and the associated RBD images become inaccessible. We image the OSD drives from each affected node, parse the OSD LevelDB/RocksDB metadata to determine object placement, and reconstruct the RBD images for each VM.
Can you recover LXC containers separately from KVM VMs?
Yes. Proxmox stores LXC container rootfs as directories or ZFS datasets on the underlying storage backend. KVM VMs use qcow2 or raw disk images on the same storage. Both are recoverable from the same pool reconstruction. We extract each container's rootfs and each VM's disk image individually, regardless of whether the storage backend is ZFS, LVM-thin, or Ceph.
My vzdump backup file is corrupted. Can you extract data from it?
vzdump writes VMA archives for KVM VMs and tar archives for LXC containers, optionally compressed with LZO, gzip, or zstd; each archive holds the guest configuration and its disk data. If the archive header is intact but the compressed stream has errors, we decompress up to the corruption point and extract whatever data is recoverable. For uncompressed vzdump backups, the disk image data can often be extracted directly even if the archive metadata is damaged.
Can you restore from a Proxmox Backup Server datastore with corrupted chunks?
Often yes. A PBS datastore is a content-addressed chunk-store, so a missing or corrupt chunk under .chunks/ only affects the specific .fidx blocks or .didx files that reference it, not the whole snapshot, as long as the manifest and index files survive. If the .fidx or .didx index files themselves corrupt, the snapshot desyncs from the chunk-store and the deduplicated blobs can no longer be reassembled into contiguous VM images or pxar archives; catalog index corruption blocks browsing a restore even when the chunks exist. We image the datastore drives through PC-3000 Portable III first, then rebuild the index-to-chunk mapping in software around the damaged areas.
How do you resolve a DRBD split-brain without losing data?
We image both DRBD nodes through PC-3000 Portable III before touching either one. Split-brain means both nodes accepted writes independently while disconnected, so each holds a different version of the same blocks. We use drbdadm status to confirm the diverged condition and determine which node holds the writes worth keeping. Reconciliation is inherently destructive: DRBD forces the other node to discard its diverged block history during resync, permanently destroying any writes made to it during the split. Imaging both sides first preserves the discarded timeline off-media so it can be carved separately.
Should I enable ZFS deduplication under Proxmox Backup Server?
No. PBS already deduplicates at the application layer through its content-addressed chunk-store, so ZFS deduplication underneath it is redundant. ZFS dedup also requires roughly 5 GB of RAM per 1 TB of deduplicated data for the DDT, and an oversized DDT hangs zpool import and kernel-panics the host. Running it beneath PBS buys you the RAM exhaustion with no additional space saving.

Ready to recover your Proxmox environment?

Free evaluation. No data = no charge. Mail-in from anywhere in the U.S.

(512) 212-9111
Mon-Fri 10am-6pm CT
No diagnostic fee
No data, no fee
4.9 stars, 1,837+ reviews