
Virtual Machine Data Recovery

When the physical storage beneath your VMs fails, the hypervisor cannot help you. RAID controller failures, SAN LUN corruption, and drive mechanical failures take entire datastores offline. We image the failed drives, reconstruct the storage array, parse the host filesystem, and extract your virtual disk files with their guest data intact.

All work is performed in-house at our Austin, TX lab using PC-3000 and DeepSpar Disk Imager. No data recovered means no charge.

Written by Louis Rossmann, Founder & Chief Technician
Updated March 30, 2026
Quick Answer

Virtual machine data recovery extracts VMDK, VHDX, and qcow2 files from failed storage beneath VMware ESXi, Microsoft Hyper-V, and Proxmox VE environments. Drives are imaged write-blocked through PC-3000, the RAID array or SAN LUN is reconstructed offline, the host filesystem is parsed, and virtual disks are extracted with their snapshot chains consolidated back to a consistent state.


How Does Virtual Machine Data Recovery Work?

Virtual machine data recovery images the failed physical drives write-blocked, reconstructs the RAID array or SAN LUN offline, parses the host filesystem (VMFS, NTFS, or ZFS), and extracts the virtual disk container files with their snapshot chains consolidated.

Virtual machine recovery is a two-layer problem. The first layer is the physical storage: the RAID arrays, SAN LUNs, or standalone drives that hold the datastore. The second layer is the virtual disk container: the VMDK, VHDX, qcow2, or raw files stored on the host filesystem. A hardware failure at the physical layer makes both layers inaccessible, but the guest data is typically intact within the container files once the underlying storage is reconstructed.

  1. Isolate and image the physical drives using PC-3000 with sector-by-sector cloning and custom read timeouts to prevent degraded heads from further damaging platters.
  2. Reconstruct the RAID array offline by parsing controller metadata (PERC, Smart Array, LSI, mdadm superblocks, ZFS labels) from drive images. Determine stripe size, parity rotation, and member ordering without touching original hardware.
  3. Parse the host filesystem (VMFS5/6, NTFS, ReFS, ext4, XFS, ZFS) from the reconstructed array to locate the virtual disk container files and their metadata.
  4. Extract the virtual disks and consolidate any snapshot deltas back into the base disk, producing a single flat image representing the VM's last consistent state.
  5. Verify guest filesystem integrity by mounting the recovered virtual disk read-only and confirming NTFS, ext4, or XFS structures are intact (a minimal signature-check sketch follows this list).
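
Step 5 can be illustrated with a short sketch. The Python below is illustrative only: the file name is hypothetical, and it assumes an MBR-partitioned guest disk with 512-byte sectors. It locates the first partition in a recovered flat image and checks its boot sector for a known filesystem signature.

```python
# Minimal sanity check for a recovered flat image: locate the first MBR
# partition and confirm its boot sector carries a known filesystem
# signature. A real verification pass mounts the image read-only and
# walks the filesystem metadata; this only confirms the boot sector.
import struct

def first_partition_offset(image_path: str) -> int:
    """Return the byte offset of the first MBR partition (512-byte sectors assumed)."""
    with open(image_path, "rb") as f:
        mbr = f.read(512)
    if mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature; image may use GPT or be damaged")
    # First partition table entry starts at offset 446; the LBA start is at +8.
    lba_start = struct.unpack_from("<I", mbr, 446 + 8)[0]
    return lba_start * 512

def guest_fs_signature(image_path: str) -> str:
    offset = first_partition_offset(image_path)
    with open(image_path, "rb") as f:
        f.seek(offset)
        boot = f.read(512)
    if boot[3:11] == b"NTFS    ":   # NTFS OEM ID at boot sector offset 3
        return "NTFS"
    if boot[82:87] == b"FAT32":     # FAT32 signature at offset 82
        return "FAT32"
    return "unknown"

print(guest_fs_signature("recovered-flat.vmdk"))
```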

Which Hypervisor Platforms Do You Support?

We recover virtual machines from VMware ESXi (VMFS datastores, VMDK containers, vSAN disk groups), Microsoft Hyper-V (VHDX files, checkpoint chains, Cluster Shared Volumes), and Proxmox VE (ZFS zvols, LVM-thin pools, Ceph RBD images). Each platform stores virtual disks in a distinct format on a distinct host filesystem; recovery procedures differ at every layer from physical imaging through guest extraction. Dedicated recovery pages cover platform-specific metadata structures and failure modes.


What Virtual Disk Formats Do You Recover?

We recover VMDK (VMware ESXi and Workstation), VHDX (Microsoft Hyper-V), qcow2 (KVM, Proxmox VE, and OpenStack), VDI (VirtualBox), and raw disk images. Each format has a distinct on-disk structure, and that structure determines what metadata must survive or be reconstructed for recovery to succeed. The underlying physical recovery process is the same regardless of container format.

VMDK (VMware ESXi / Workstation)

A VMDK consists of two files: a text-based descriptor file (.vmdk) containing geometry, adapter type, and extent references, and a flat data file (-flat.vmdk) containing the raw virtual disk contents. Monolithic flat VMDKs store everything in a single extent. Split VMDKs fragment the data into 2GB extent files (-s001.vmdk through -sNNN.vmdk).

Snapshot chains add delta disks (-delta.vmdk on VMFS5, -sesparse.vmdk on VMFS6). Each delta contains a grain directory and grain tables mapping changed blocks relative to the parent. The descriptor's CID field must match the parentCID of each child delta.

When an ESXi host crashes during a "Delete All Snapshots" operation, orphaned deltas disconnect from the .vmsd file and the CID chain breaks. We reconstruct the chain by reading grain tables, determining the actual write sequence, and recalculating CID/parentCID values.
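
The final step of that chain repair can be sketched in a few lines. This assumes the descriptors are the separate text files ESXi keeps alongside the delta extents, and that the true parent has already been established from the grain tables; the file names are hypothetical.

```python
# Hedged sketch of relinking a delta to its verified parent: rewrite the
# child descriptor's parentCID to match the parent descriptor's CID.
import re

def read_cid(descriptor_text: str) -> str:
    m = re.search(r"^CID=([0-9a-fA-F]{8})", descriptor_text, re.MULTILINE)
    if not m:
        raise ValueError("descriptor has no CID line")
    return m.group(1)

def relink_child(child_path: str, parent_path: str) -> None:
    with open(parent_path) as f:
        parent_cid = read_cid(f.read())
    with open(child_path) as f:
        child = f.read()
    child = re.sub(r"^parentCID=[0-9a-fA-F]{8}",
                   f"parentCID={parent_cid}", child, flags=re.MULTILINE)
    with open(child_path, "w") as f:
        f.write(child)

# Example: relink_child("vm-000002.vmdk", "vm-000001.vmdk")
```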

If the descriptor file is destroyed entirely, the flat file becomes headless. We locate the flat extent boundaries on the VMFS volume, calculate geometry (cylinders, heads, sectors) from the file size, and rebuild the descriptor manually.
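
A minimal sketch of that descriptor rebuild, assuming the common lsilogic geometry of 255 heads and 63 sectors per track (other adapter types use different values):

```python
# Hedged sketch of rebuilding a headless flat VMDK's descriptor from the
# flat file size alone. The geometry values assume a 255-head, 63-sector
# lsilogic layout; real cases verify against the guest partition table.
import os

def rebuild_descriptor(flat_path: str) -> str:
    sectors = os.path.getsize(flat_path) // 512
    cylinders = sectors // (255 * 63)
    return f"""# Disk DescriptorFile
version=1
CID=fffffffe
parentCID=ffffffff
createType="vmfs"

# Extent description
RW {sectors} VMFS "{os.path.basename(flat_path)}"

# The Disk Data Base
ddb.adapterType = "lsilogic"
ddb.geometry.cylinders = "{cylinders}"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
"""

with open("rebuilt.vmdk", "w") as f:
    f.write(rebuild_descriptor("vm-flat.vmdk"))
```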

VHDX (Microsoft Hyper-V)

VHDX replaced the legacy VHD format in Windows Server 2012. It supports virtual disks up to 64 TB and uses a structured layout: a file type identifier, two redundant header copies (at 64 KB and 128 KB offsets), a region table pointing to the BAT (Block Allocation Table) and metadata regions, and a replay log for crash consistency.

Fixed VHDX files pre-allocate all blocks at creation. Dynamic VHDX files allocate blocks on demand as the guest writes data. Dynamic disks are vulnerable to BAT corruption if the underlying storage disconnects mid-write, because the BAT update and the payload block write are separate I/O operations.

If the BAT points to an uninitialized block offset, the guest filesystem reads garbage. Hyper-V checkpoints create AVHDX differencing disks with their own BAT mapping changed blocks relative to the parent. A failed checkpoint merge leaves orphaned AVHDX files that the VM configuration (VMCX) no longer tracks. We parse each differencing disk's BAT, determine the correct parent-child ordering by creation timestamp, and consolidate the writes into a single base VHDX.
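
Selecting the live header copy is the first step of any VHDX triage. A minimal sketch follows; it checks signatures and sequence numbers only, where a full implementation would also validate the header's CRC32C checksum field.

```python
# Pick the active VHDX header: VHDX keeps two header copies at the 64 KB
# and 128 KB offsets, and the copy with the higher sequence number (and
# an intact "head" signature) is current.
import struct

HEADER_OFFSETS = (64 * 1024, 128 * 1024)

def active_vhdx_header(path: str) -> tuple[int, int]:
    """Return (offset, sequence_number) of the live header copy."""
    best = None
    with open(path, "rb") as f:
        if f.read(8) != b"vhdxfile":
            raise ValueError("missing VHDX file type identifier")
        for off in HEADER_OFFSETS:
            f.seek(off)
            hdr = f.read(16)
            if hdr[:4] != b"head":
                continue  # this copy is torn or overwritten
            seq = struct.unpack_from("<Q", hdr, 8)[0]
            if best is None or seq > best[1]:
                best = (off, seq)
    if best is None:
        raise ValueError("both header copies are damaged")
    return best

print(active_vhdx_header("vm.vhdx"))
```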

qcow2 (Proxmox VE / KVM / OpenStack)

qcow2 (QEMU Copy-On-Write version 2) uses a two-level reference table system: L1 entries point to L2 tables, and L2 entries point to data clusters. This indirection allows sparse allocation, internal snapshots, and backing file chains. A separate reference count table tracks cluster usage for copy-on-write operations.

Proxmox VE environments using cache=none skip the host page cache, sending writes directly to the storage backend. If the storage loses power during a metadata commit, the L1/L2 tables and reference counts can become inconsistent. This is a torn write: the data cluster was written but the L2 entry still points to the old location (or to nothing). We scan the qcow2 file for valid cluster boundaries, rebuild the L1/L2 mapping tables from discovered data clusters, and recalculate reference counts.

Backing file chains (used for Proxmox linked clones) add another failure dimension. If the base image is on a different storage backend than the overlay, a failure on either backend breaks the chain. Both the base and overlay must be recovered and reconnected for a complete VM image.
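
A hedged sketch of the first triage pass on a suspect qcow2 file: read the big-endian header fields, then flag any L1 entry whose L2 table lies past end-of-file, which is the footprint of the torn writes described above.

```python
# qcow2 L1 sanity pass: every allocated L1 entry must point to an L2
# table that fits inside the file. The offset mask follows the qcow2
# spec (bits 9-55 hold the L2 table offset; high bits are flags).
import os
import struct

def check_qcow2_l1(path: str) -> list[int]:
    size = os.path.getsize(path)
    bad = []
    with open(path, "rb") as f:
        header = f.read(72)
        if header[:4] != b"QFI\xfb":
            raise ValueError("not a qcow2 image")
        cluster_bits = struct.unpack_from(">I", header, 20)[0]
        l1_size = struct.unpack_from(">I", header, 36)[0]
        l1_offset = struct.unpack_from(">Q", header, 40)[0]
        f.seek(l1_offset)
        l1 = f.read(l1_size * 8)
    for i in range(l1_size):
        entry = struct.unpack_from(">Q", l1, i * 8)[0]
        l2_offset = entry & 0x00FFFFFFFFFFFE00
        if l2_offset and l2_offset + (1 << cluster_bits) > size:
            bad.append(i)
    return bad  # indices of L1 entries whose L2 table lies past EOF

print(check_qcow2_l1("vm-disk-0.qcow2"))
```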

Thick vs Thin Provisioned VMDK: Recovery Differences

VMware exposes three VMDK provisioning modes on VMFS datastores: eagerzeroedthick, lazyzeroedthick, and thin. Each interacts differently with the underlying storage and produces a different recovery profile when the datastore fails or the VMDK descriptor is lost.

Eagerzeroedthick (required for FT, MSCS, vSAN policies)
Allocates and zeros every VMFS block at creation. The flat extent is contiguous and fully initialized. If the descriptor is destroyed, the virtual disk geometry can be rebuilt from the flat file size alone; no sparse map is needed. Recovery is the most straightforward of the three modes.
Lazyzeroedthick (default thick mode)
Reserves VMFS blocks at creation but only zeros them on first guest write. Uninitialized regions read as whatever previously occupied that VMFS block. After a host filesystem failure, naive imaging tools cannot tell the difference between guest-application-written data and stale residue from a deleted VMDK that previously occupied the same region. We mount the recovered guest filesystem read-only and verify file integrity before declaring blocks valid.
Thin provisioned
Allocates VMFS blocks only on first guest write. Reported size on the datastore can be far smaller than provisioned virtual disk size. The descriptor stores an extent allocation map referencing the VMFS resource fork. If the descriptor or the VMFS resource metadata is corrupted, the mapping between virtual block offsets and physical VMFS extents is lost; the flat file becomes a fragmented set of allocated regions whose virtual offsets must be reconstructed by parsing guest filesystem signatures. Over-allocated thin pools that exhaust the underlying datastore also produce write failures that the guest sees as silent corruption.

For RAID-backed datastores, the provisioning mode determines how aggressively we image: thick modes can be cloned with standard sequential imaging; thin modes require sparse-aware imaging that preserves the VMFS allocation map so the on-disk extent layout can be parsed offline.


Can Ransomware-Encrypted Virtual Machines Be Recovered?

Yes, in many cases. Modern ransomware targeting hypervisors (LockBit, BlackCat/ALPHV, Royal) typically encrypts only volume headers and virtual disk descriptor files. The encryption often covers only the first few megabytes of each VMDK or VHDX file; the flat data payload holding the actual guest data remains intact and can be extracted through write-blocked imaging and hex analysis of the encryption boundaries, without running any decryptor against a live system.

Running untested decryptors on a live system risks further corruption of the VMFS heartbeat region or the VHDX replay log.

During server recovery incidents involving ransomware, we image the entire array write-blocked through PC-3000. Using hex analysis, we identify the exact encryption boundaries within each virtual disk file. The surviving, unencrypted payload blocks are extracted from the orphaned flat files and reassembled into mountable guest filesystem images.
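
The boundary-finding step can be approximated with per-block entropy analysis: ciphertext reads as near-random (close to 8 bits per byte), while surviving plaintext payload usually measures lower. The block size and threshold below are assumptions, and compressed regions can also score high, so results always get confirmed in a hex editor.

```python
# Find the approximate encryption boundary in a partially encrypted
# virtual disk by scanning per-block Shannon entropy from the start of
# the file. Illustrative sketch; the path is hypothetical.
import math

def shannon_entropy(block: bytes) -> float:
    counts = [0] * 256
    for b in block:
        counts[b] += 1
    total = len(block)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def find_boundary(path: str, block_size: int = 1024 * 1024,
                  threshold: float = 7.9) -> int:
    """Return the byte offset where per-block entropy first drops below threshold."""
    offset = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                return offset  # whole file reads as ciphertext
            if shannon_entropy(block) < threshold:
                return offset  # first block that does not look encrypted
            offset += len(block)

print(find_boundary("vm-flat.vmdk"))
```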

For dedicated ransomware recovery resources, including LockBit- and Ryuk-specific procedures, see our ransomware recovery service pages.


How Do You Recover SAN-Backed Datastores?

SAN-backed datastore recovery does not require the SAN controller; the on-disk metadata is all we need. When a controller fails, or multiple drives in a shelf fail simultaneously, every VM on every datastore hosted by that LUN goes offline. Recovery proceeds in three steps:

  1. Image individual SAS/SATA drives from the SAN shelf directly, reading independently of the failed LUN controller. For enterprise SAS drives, we use SAS HBAs with PC-3000 to handle non-standard sector sizes.
  2. Reconstruct the RAID topology using the SAN controller's proprietary on-disk metadata (EMC, NetApp WAFL, HP MSA, Dell Compellent). The controller hardware itself is not needed; a destriping sketch follows this list.
  3. Parse the host filesystem (VMFS, NTFS, or ZFS) on the reconstructed LUN to locate and extract the virtual disk container files.
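
The destriping in step 2 reduces to a loop once the geometry is known. Here is a hedged sketch, assuming a left-asymmetric RAID 5 layout (backward parity rotation, data filled in member order); real controllers vary, so every candidate layout is verified against filesystem structures before committing.

```python
# Offline RAID-5 destripe from member images, given stripe size, member
# order, and a backward parity rotation. The layout is an assumption;
# other controllers rotate parity and data differently.
import os

def destripe_raid5(member_paths, out_path, stripe=64 * 1024):
    n = len(member_paths)
    members = [open(p, "rb") for p in member_paths]
    try:
        rows = min(os.path.getsize(p) for p in member_paths) // stripe
        with open(out_path, "wb") as out:
            for row in range(rows):
                parity_disk = (n - 1) - (row % n)  # backward parity rotation
                for disk in range(n):
                    if disk == parity_disk:
                        continue  # parity chunk carries no user data
                    members[disk].seek(row * stripe)
                    out.write(members[disk].read(stripe))
    finally:
        for m in members:
            m.close()

# Example: destripe_raid5(["bay0.img", "bay1.img", "bay2.img"], "lun.img")
```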

For SAN environments using SSD caching tiers (read cache or write-back cache), the cache drive must also be imaged. Write-back cache drives may contain committed writes that never reached the capacity tier. Losing the cache drive in this scenario means losing those pending writes permanently.


How Do SSD Cache Controller Failures Affect VM Datastores?

SSD cache controller failures take entire VM datastores offline when write-back cache data not yet flushed to the capacity drives becomes inaccessible. vSAN and enterprise SAN environments often use consumer-grade SSDs as read/write cache tiers; under sustained VM random I/O, the Flash Translation Layer can overflow and corrupt, locking the controller. PC-3000 SSD interfaces with the locked controller via diagnostic mode and rebuilds the FTL from surviving NAND page metadata.

Phison S11 (PS3111) / SATAFIRM S11 Bug
The Phison S11 (PS3111) SATA controller, found in many budget NAS SSDs and cache drives, suffers an FTL overflow under heavy random writes. The drive drops offline, reports "SATAFIRM S11" as its model string, and shows 0 bytes capacity. We use PC-3000 SSD to inject a firmware loader into the controller's SRAM via Techno Mode, rebuilding the FTL from surviving NAND page metadata.
Silicon Motion SM2259XT / 0 Bytes Capacity
SM2259XT-based drives used in NAS and VM caching can suffer FTL table corruption that causes the drive to enumerate with 0 bytes or enter ROM mode. PC-3000 Portable III interfaces with the locked controller via diagnostic mode to extract the mapping tables and rebuild the logical volume.

What Happens When a Hyper-V Checkpoint Merge Fails?

When a Hyper-V checkpoint merge is interrupted by host shutdown, storage disconnection, or hypervisor crash, the AVHDX differencing disks are left orphaned and the parent VHDX is left in an indeterminate state. The VM configuration file (VMCX) still references the checkpoint chain, so simply re-attaching the parent VHDX produces inconsistent guest data. Recovery requires parsing each AVHDX BAT, ordering the deltas by creation timestamp, and consolidating the writes into a single base VHDX offline.

Hyper-V supports two virtual disk container formats. The legacy VHD format from Windows Server 2008 stores its 1024-byte footer at the end of the file; corruption of those final sectors leaves Windows unable to mount the image even when most of the data is intact. VHDX, introduced with Windows Server 2012, stores two redundant 64 KB header copies at the 64 KB and 128 KB file offsets and can survive single-header corruption.

Legacy VHD: 2 TB ceiling and footer-only metadata
VHD files use a single footer structure that contains the disk geometry, type (fixed, dynamic, differencing), and parent locator. If the final 1024 bytes of the file are zeroed by a truncation event or a partial copy, the image is unmountable. We extract the footer from a backup or rebuild it from the file size and known guest geometry before mounting. VHD also caps at 2040 GB; older Hyper-V environments that hit this limit may show silent guest filesystem corruption.
Interrupted checkpoint merge
Initiating a checkpoint deletion in Hyper-V Manager triggers a background merge that copies AVHDX delta blocks back into the parent VHDX. If the host loses power or the storage subsystem disconnects mid-merge, the AVHDX file remains on disk, the parent VHDX has been partially updated, and the VMCX configuration file may still list the checkpoint as active. Restarting Hyper-V in this state can corrupt the parent further. The safe procedure is to image both the parent and every orphaned AVHDX file write-blocked, then reconstruct the merge offline by replaying AVHDX BAT entries in chronological order.
Cluster Shared Volume (CSV) failures
CSVs allow multiple Hyper-V hosts to access the same NTFS or ReFS volume holding VHDX files. A coordinator-node failure during checkpoint operations can leave VHDX files in inconsistent states across the cluster. We image the underlying SAN LUN as a single block device, parse the CSV NTFS or ReFS structures offline, and extract each VHDX independently of cluster state.
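
A trivial sketch of establishing consolidation order for orphaned deltas from file creation timestamps (the paths are hypothetical; real cases cross-check the ordering against each AVHDX's parent locator metadata rather than trusting timestamps alone):

```python
# Order orphaned AVHDX differencing disks for an offline merge. On
# Windows, st_ctime reports creation time; on other platforms it does
# not, so this heuristic only holds on the original host's filesystem.
import glob
import os

def merge_order(datastore_dir: str) -> list[str]:
    deltas = glob.glob(os.path.join(datastore_dir, "*.avhdx"))
    # Oldest delta sits closest to the base VHDX; newest holds the last writes.
    return sorted(deltas, key=lambda p: os.stat(p).st_ctime)

for path in merge_order(r"D:\Hyper-V\Virtual Hard Disks"):
    print(path)
```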

For mixed-environment recoveries that include both Hyper-V and VMware hosts on shared server storage infrastructure, both hypervisor metadata layers are reconstructed from the same underlying drive images.


Can Deleted VMs Be Recovered from SSD Datastores?

Deleted VM recovery from SSD-backed datastores is generally impossible once TRIM or UNMAP commands have executed, because those commands instruct the storage controller to permanently erase the underlying NAND flash pages. VMFS6 enables UNMAP by default; Hyper-V passes TRIM through to physical SSDs.

VMFS6 UNMAP
Enabled by default. ESXi periodically issues SCSI UNMAP commands to the SAN for deleted blocks. Once the SSD controller processes the UNMAP, garbage collection erases the NAND pages.
Hyper-V / Windows Server
Windows Server 2016+ passes TRIM commands from the guest OS through ReFS/NTFS directly to physical SSDs. If the underlying storage is an all-flash array, TRIM propagates to every tier.
Proxmox / ZFS Autotrim
ZFS autotrim periodically discards freed blocks on SSD-backed pools. Disabling autotrim before deletion preserves the data until manual intervention.

If you suspect a VM was accidentally deleted from an SSD-backed datastore, power down the storage immediately. Every second the storage remains online gives the controller more time to execute pending TRIM operations and run garbage collection.


What Are the Physical vs. Logical Failure Domains in VM Recovery?

VM recovery splits into two failure domains that determine both the approach and the cost. Physical failures, such as drive head crashes, PCB failures, and SSD controller lockups, require write-blocked imaging and head swaps before any logical work can begin. Logical failures, such as VMFS corruption, VMDK descriptor loss, and broken snapshot chains, require metadata reconstruction and snapshot chain consolidation on clean drive images.

Unlike standard hard drive recovery, VM recovery from enterprise SAS drives adds a hardware-level complication: non-standard sector sizes that require specialized transcoding before any logical parsing can begin.

Physical (Hardware)
Failure examples: drive head crash, motor seizure, PCB failure, SAS/SATA interface fault, SSD controller failure. Recovery approach: write-blocked imaging through PC-3000, head swaps in a clean bench, firmware repair. Tools: PC-3000, DeepSpar, 0.02 µm ULPA clean bench.
Logical (Software)
Failure examples: VMFS corruption, VMDK descriptor loss, VHDX BAT damage, qcow2 L1/L2 table corruption, broken snapshot chains. Recovery approach: host filesystem parsing, virtual disk metadata reconstruction, snapshot chain consolidation. Tools: PC-3000 RAID Edition, hex analysis, custom parsing tools.

SAS Infrastructure and Non-Standard Sector Sizes

Enterprise RAID controllers (HP SmartArray, Dell PERC, LSI MegaRAID) often format SAS drives with 520-byte or 528-byte sectors instead of the standard 512 bytes. The extra 8 or 16 bytes per sector contain Data Integrity Field (DIF) parity data used by the controller for end-to-end error checking.

Standard SATA imaging tools can't read 520-byte SAS sectors. We use SAS Host Bus Adapters and PC-3000 SAS to image these drives natively, then transcode the 520-byte sectors back to 512-byte blocks by stripping the DIF data. This sector transcoding is mandatory before offline RAID reconstruction and VMFS parsing can proceed. The transcoding preserves the logical data while removing the controller-specific parity layer.
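
The transcoding itself is mechanical once the drives are imaged natively. A minimal sketch (520-byte input sectors; some controllers use 528, so the sector size is a parameter):

```python
# 520-to-512 sector transcode: keep each sector's 512 data bytes and
# drop the trailing 8-byte DIF tuple the controller appended.
def strip_dif(src_path: str, dst_path: str,
              sector_in: int = 520, payload: int = 512) -> None:
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            sector = src.read(sector_in)
            if len(sector) < sector_in:
                break  # ignore a trailing partial sector
            dst.write(sector[:payload])

strip_dif("sas-member0-520.img", "sas-member0-512.img")
```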


Should You Rebuild a RAID Array Before VM Recovery?

No. Rebuilding a degraded RAID array containing mechanically failing drives is a destructive process that frequently causes complete data loss. The controller forces a full-surface read of every surviving member to recalculate parity; if a second drive develops read errors during the rebuild, the array drops entirely and destroys all VM data. Instead:

  • Power down the server immediately
  • Label each drive with its bay position (bay 0, bay 1, etc.)
  • Ship the bare drives to our Austin, TX lab. We don't need the server chassis or controller card.
  • We image each drive individually through PC-3000, replacing heads on failed members as needed
  • The array is reconstructed virtually from the images; a geometry-validation sketch follows this list. Original drives are never written to.
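
Before any destriping, a candidate geometry can be validated against the images themselves: on a healthy RAID 5, the XOR of all members' chunks in each stripe row must be zero. A hedged sketch (the stripe size and image names are hypothetical):

```python
# Validate a candidate RAID-5 geometry: XOR every member's chunk per
# stripe row and confirm the result is all zeros. A geometry that fails
# this on clean images has the wrong stripe size, order, or a stale member.
import os

def parity_consistent(member_paths, stripe=64 * 1024, rows_to_check=256):
    members = [open(p, "rb") for p in member_paths]
    try:
        rows = min(os.path.getsize(p) for p in member_paths) // stripe
        for row in range(min(rows, rows_to_check)):
            acc = bytearray(stripe)
            for m in members:
                m.seek(row * stripe)
                for i, b in enumerate(m.read(stripe)):
                    acc[i] ^= b
            if any(acc):
                return False  # parity mismatch in this row
        return True
    finally:
        for m in members:
            m.close()

print(parity_consistent(["bay0.img", "bay1.img", "bay2.img"]))
```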

Per-drive pricing for hard drive recovery starts at $100. Head swaps on failed RAID members cost $1,200–$1,500 per drive plus donor. A +$100 rush fee moves your case to the front of the queue.


Why Do SMR Drives Fail Under VM Workloads?

Shingled Magnetic Recording (SMR) drives overlap write tracks to increase density, so random writes require read-modify-write cycles across entire write bands. VM workloads produce exactly the kind of sustained random I/O that triggers severe performance degradation. If SMR drives were used in a RAID or ZFS pool hosting VMs, rebuild times extend from hours to days.

The extended stress on surviving members during a rebuild increases the probability of cascading failures. Recovery of arrays built on SMR drives requires imaging each member with extended timeout configurations to handle the slow random-read performance inherent to SMR architectures.

Proxmox VE Storage Backends: ZFS, LVM-thin, and Ceph Recovery

Proxmox VE abstracts virtual disk storage behind a backend selector, but each backend produces fundamentally different on-disk structures, and fundamentally different recovery workflows when storage fails. ZFS pool recovery hinges on uberblock and metaslab integrity; LVM-thin recovery requires an intact metadata logical volume; Ceph RBD recovery extracts virtual disks object by object from individual OSD drives, without requiring a surviving monitor quorum.

ZFS (rpool, dpool, zvol-backed VMs)
Proxmox uses ZFS zvols (block devices carved from a ZFS pool) to back VM disks. Pool recovery requires intact uberblocks from at least one vdev label and intact metaslab allocation maps. A volblocksize set smaller than the guest filesystem cluster size produces severe write amplification under VM I/O and can exhaust the pool when transaction groups stall. Recovery imports the pool read-only into a separate analysis host and exports each zvol as a raw block device for guest-filesystem extraction. If the pool is degraded beyond ZFS tolerance, we parse the surviving vdev labels and reconstruct space maps offline.
LVM-thin pools
LVM-thin stores all chunk-allocation metadata in a small metadata logical volume separate from the data LV. If the metadata LV is corrupted or its checksum fails, every thin volume in the pool becomes inaccessible even when the data LV is fully intact. Recovery dumps the metadata LV via thin_dump, repairs the chunk tree using thin_repair against an offline copy, and rebuilds the thin volume mapping. Power loss during a thin volume snapshot create or merge is the most common failure trigger.
Ceph RBD (hyperconverged Proxmox clusters)
Ceph splits each RBD image into 4 MB objects distributed across OSDs (Object Storage Daemons) according to the CRUSH map. When the cluster loses monitor quorum or too many OSDs go offline simultaneously, Ceph refuses to serve I/O. We pull individual OSD drives, image them write-blocked, and parse each OSD's on-disk object directory (BlueStore key-value index) to extract every object belonging to the target RBD image. Object reassembly into a flat block device proceeds offline using the RBD header object to determine stripe size and object naming.

Backing file chains used by Proxmox linked clones add another failure dimension across all three backends: if the base image and overlay reside on different storage backends, a failure on either backend breaks the chain and both halves must be recovered independently.
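
For the Ceph case, the final reassembly step can be sketched once the per-object payloads have been extracted from the OSDs' BlueStore indexes into files named by object number. The naming and sizes below follow RBD defaults (rbd_data.<id>.<16-hex object number>, 4 MiB objects); missing objects correspond to never-written, thin-provisioned extents and are zero-filled.

```python
# Reassemble an RBD image from extracted per-object files. Assumes the
# image id, size, and object size were read from the RBD header object;
# the directory layout here is an assumption of this sketch.
import os

def reassemble_rbd(obj_dir: str, image_id: str, image_size: int,
                   out_path: str, obj_size: int = 4 * 1024 * 1024) -> None:
    count = (image_size + obj_size - 1) // obj_size
    with open(out_path, "wb") as out:
        for n in range(count):
            obj_path = os.path.join(obj_dir, f"rbd_data.{image_id}.{n:016x}")
            if os.path.exists(obj_path):
                with open(obj_path, "rb") as obj:
                    # Pad short objects: unwritten tails read back as zeros.
                    out.write(obj.read().ljust(obj_size, b"\x00"))
            else:
                out.write(b"\x00" * obj_size)  # never-written extent
        out.truncate(image_size)
```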

How Do NFS and iSCSI Datastore Failures Affect VMs on NAS?

NAS appliances backing VM datastores via NFS or iSCSI fail in ways that present to the hypervisor as data loss but are actually transport-layer or volume-manager problems. Hosting production virtual machines on consumer or prosumer NAS appliances introduces failure modes that do not exist on dedicated SAN storage: the volume manager, the network transport, and the underlying RAID layer each contribute distinct failure paths.

NFS stale handle after NAS reboot or volume migration
ESXi caches NFS file handles for VMDK files. If the NAS reboots or the underlying volume is recreated, those handles become stale and the hypervisor reports the datastore as inaccessible even though the data on the NAS is intact. Forcing a VM power-on against a stale handle can corrupt the VMDK descriptor. Recovery requires unmounting the datastore cleanly, re-imaging the underlying NAS volume, and re-importing the VMs against fresh handles.
iSCSI persistent reservation lockout
iSCSI LUNs use SCSI-3 persistent reservations to coordinate multi-host access. A crashed initiator can leave a reservation held against a LUN, blocking all other hosts from acquiring write access. The LUN appears online to the NAS but unmountable to every hypervisor. Clearing stranded reservations through the NAS console without first quiescing all hosts can corrupt VMFS. We image the underlying NAS volume offline and extract VMDK files without any host needing to negotiate reservations.
Consumer NAS volume manager corruption (Synology SHR, QNAP QTS, Btrfs metadata)
Synology SHR layers Btrfs over LVM over mdadm, while QNAP QTS layers ext4 over LVM over mdadm. Sustained VM random writes can corrupt Btrfs metadata trees or LVM thin pool maps in ways the NAS recovery utility cannot repair. We pull the bare drives, image each member, reconstruct the mdadm RAID, parse the LVM layer offline, and extract the VM disk container files from the recovered Btrfs or ext4 filesystem. See our NAS data recovery page for vendor-specific volume layouts.
Jumbo frame MTU mismatches that look like data loss
NFS or iSCSI traffic configured for 9000-byte jumbo frames across a switch path that fragments at 1500 bytes produces silent partial-write failures under VM workloads. The guest sees corruption that walks across files, but the underlying NAS storage is intact. Before recovery work begins on a NAS-backed VM datastore, the network MTU consistency must be verified end to end.

How Much Does Virtual Machine Data Recovery Cost?

Virtual machine recovery is priced per drive, not per VM. Each drive in the datastore follows one of five published tiers based on its physical condition, from $100 for a simple copy to $2,000 for platter damage; a three-drive RAID 5 with one failed mechanical member, for example, is billed as two tier-1 images plus one tier-4 head swap. Multi-drive arrays involve additional reconstruction work to detect RAID parameters, extract virtual disks, and consolidate snapshots. No data recovered means no charge. A +$100 rush fee moves your case to the front of the queue.

  1. Simple Copy (low complexity): $100, 3-5 business days.
     Your drive works; you just need the data moved off it. Functional drive; data transfer to new media. Rush available: +$100.

  2. File System Recovery (low complexity): From $250, 2-4 weeks.
     Your drive isn't recognized by your computer, but it's not making unusual sounds. File system corruption; accessible with professional recovery software but not by the OS. Starting price; final depends on complexity.

  3. Firmware Repair (medium complexity): $600–$900, 3-6 weeks.
     Your drive is completely inaccessible. It may be detected but shows the wrong size or won't respond. Firmware corruption: ROM, modules, or translator tables corrupted; requires PC-3000 terminal access. CMR drive: $600. SMR drive: $900.

  4. Head Swap (high complexity, most common): $1,200–$1,500 + donor, 4-8 weeks.
     Your drive is clicking, beeping, or won't spin; the internal read/write heads have failed. Head stack assembly failure; heads are transplanted from a matching donor drive on a clean bench. 50% deposit required. CMR: $1,200–$1,500 + donor. SMR: $1,500 + donor.

  5. Surface / Platter Damage (high complexity): $2,000, 4-8 weeks.
     Your drive was dropped, has visible damage, or a head crash scraped the platters. Platter scoring or contamination; requires platter cleaning and a head swap. 50% deposit required; donor parts are consumed in the repair. The most difficult recovery type.

Hardware Repair vs. Software Locks

Our "no data, no fee" policy applies to hardware recovery. We do not bill for unsuccessful physical repairs. If we replace a hard drive read/write head assembly or repair a liquid-damaged logic board to a bootable state, the hardware repair is complete and standard rates apply. If data remains inaccessible due to user-configured software locks, a forgotten passcode, or a remote wipe command, the physical repair is still billable. We cannot bypass user encryption or activation locks.

No data, no fee. Free evaluation and firm quote before any paid work. Full guarantee details. Head swap and surface damage require a 50% deposit because donor parts are consumed in the attempt.

Rush fee
+$100 rush fee to move to the front of the queue
Donor drives
Donor drives are matching drives used for parts. Typical donor cost: $50–$150 for common drives, $200–$400 for rare or high-capacity models. We source the cheapest compatible donor available.
Target drive
The destination drive we copy recovered data onto. You can supply your own or we provide one at cost plus a small markup. For larger capacities (8TB, 10TB, 16TB and above), target drives cost $400+ extra. All prices are plus applicable tax.

Data Recovery Standards & Verification

Our Austin lab operates on a transparency-first model. We use industry-standard recovery tools, including PC-3000 and DeepSpar, combined with strict environmental controls to make sure your hard drive is handled safely and properly. This approach allows us to serve clients nationwide with consistent technical standards.

Open-drive work is performed in a ULPA-filtered laminar-flow bench validated to 0.02 µm particle size, with clean-air performance verified using TSI P-Trak instrumentation.

Transparent History

Serving clients nationwide via mail-in service since 2008. Our lead engineer holds PC-3000 and HEX Akademia certifications for hard drive firmware repair and mechanical recovery.

Media Coverage

Our repair work has been covered by The Wall Street Journal and Business Insider, with CBC News reporting on our pricing transparency. Louis Rossmann has testified in Right to Repair hearings in multiple states and founded the Repair Preservation Group.

Aligned Incentives

Our "No Data, No Charge" policy means we assume the risk of the recovery attempt, not the client.

We believe in proving standards rather than just stating them. We use TSI P-Trak instrumentation to verify that clean-air benchmarks are met before any drive is opened.

See our clean bench validation data and particle test video

Virtual Machine Recovery FAQ

Which virtual disk formats can you recover?
We recover VMDK (VMware ESXi and Workstation), VHD and VHDX (Microsoft Hyper-V), qcow2 (KVM, Proxmox VE, OpenStack), VDI (VirtualBox), and raw disk images. The virtual disk format determines the metadata structures we parse, but the underlying physical recovery process is the same: image the failed storage, reconstruct the array or volume, then extract the VM disk files from the host filesystem.
What causes a VMDK CID mismatch and can you fix it?
A Content ID (CID) mismatch occurs when a VMware snapshot chain breaks. This typically happens if the ESXi host crashes or loses storage connectivity during a snapshot commit, causing the parentCID of the delta disk to lose synchronization with the base flat file. We read the grain directory and grain tables from each delta, verify the actual data lineage, and reconstruct the descriptor file with correct CID/parentCID references.
Can you recover a dynamically expanding VHDX that became corrupted?
Yes. Dynamic VHDX files store data in blocks mapped by a Block Allocation Table (BAT). If the VHDX headers, BAT, or log entries are corrupted (common during power loss or storage disconnection), we parse the VHDX payload blocks from the raw disk image and reconstruct the BAT by scanning block signatures. If both redundant headers are destroyed, we calculate the virtual disk geometry from the payload block layout.
How do you handle qcow2 corruption on Proxmox VE?
Proxmox stores KVM virtual machine disks as qcow2 files on ZFS, LVM-thin, or Ceph storage backends. Power loss during write operations with cache=none can produce torn writes that corrupt the qcow2 L1/L2 mapping tables and reference counts. We reconstruct the qcow2 metadata by scanning the file for cluster boundaries and rebuilding the mapping tables from the data clusters themselves.
Do I need to send the entire server or just the drives?
Send the drives. We do not need the server chassis, controller card, or cabling. For RAID arrays, label each drive with its slot position (bay 0, bay 1, etc.) before removing them. We extract RAID metadata (DDF, PERC, Smart Array, mdadm superblocks, ZFS labels) from the drives themselves and reconstruct the array offline using PC-3000 RAID Edition.
Can deleted VMs be recovered from SSD-backed datastores?
It depends on whether TRIM/UNMAP was active. VMFS6 enables automatic UNMAP by default, and modern Hyper-V environments pass TRIM commands through to underlying storage. If the SAN or local SSD controller has already executed TRIM on the blocks that held the deleted virtual disk, the controller marks those blocks as no longer needed and garbage collection erases the NAND pages. Recovery is not possible. If TRIM was disabled or has not yet executed, recovery may still be feasible. Power down the storage immediately to prevent garbage collection.
How much does virtual machine data recovery cost?
Pricing depends on the physical condition of the drives hosting the datastore. Per-drive pricing starts at $100 for simple copies, from $250 for file system recovery, $600–$900 for firmware repair, $1,200–$1,500 for head swaps, and $2,000 for platter damage. These are the same five published tiers we use for all drive recoveries. Multi-drive RAID arrays involve additional reconstruction work. No data recovered means no charge. A +$100 rush fee moves your case to the front of the queue.
How does the Phison SATAFIRM S11 bug affect virtual machine cache drives?
Many NAS appliances and SAN read-cache tiers use SSDs with Phison S11 controllers. Under sustained VM random write loads, the Flash Translation Layer overflows and corrupts, causing the drive to report "SATAFIRM S11" with 0 bytes capacity. We use PC-3000 SSD to interface with the controller via diagnostic mode, load custom microcode into the controller's SRAM, and rebuild the translation tables to extract pending cache writes.
Can you recover VMDKs from an HP SmartArray with 520-byte SAS drives?
Yes. HP SmartArray and Dell PERC enterprise controllers often format SAS drives with 520-byte sectors to include Data Integrity Field (DIF) parity data. Standard recovery software can't read non-standard sector sizes. We image the drives using PC-3000 SAS hardware, transcode the 520-byte sectors back to 512-byte blocks, and reconstruct the controller metadata offline to extract the VMFS datastore.
How long does virtual machine recovery take for a production workload?
Turnaround depends on drive count, physical condition, and array complexity. A single-drive qcow2 extraction with no mechanical damage typically completes in 2 to 4 business days. A multi-drive RAID 5 or RAID 6 array with one or more failed members requiring head swaps runs 5 to 10 business days, driven by donor drive sourcing and sector-by-sector imaging time. vSAN and multi-node cluster reconstructions extend further. A +$100 rush fee moves your case to the front of the queue ahead of standard cases.
Do you sign NDAs and provide chain-of-custody documentation?
Yes. We sign mutual NDAs before any drives arrive and provide chain-of-custody logs tracking every transfer between shipping, imaging, reconstruction, and return shipment. All work is performed in-house at our Austin, TX lab; drives are never sublet or shipped to partner facilities. We do not hold SOC 2 or ISO 27001 certifications. If your compliance framework requires a specific certified data recovery vendor, confirm with your auditor before sending drives. For standard corporate confidentiality needs, our NDA and single-facility policy are sufficient.
Can you recover VMware vSAN when a disk group fails?
Yes. vSAN disk groups pair one cache-tier SSD with one or more capacity-tier drives. When the cache SSD fails, every capacity drive in that group goes offline because vSAN cannot flush pending writes. We image the failed cache SSD (often a consumer Phison or Silicon Motion controller that FTL-locked under sustained random writes) via PC-3000 SSD to extract the write buffer, then image each capacity drive independently. vSAN metadata (the VSAN Object Manager structures) is parsed offline to reconstruct the objects and component layouts across the cluster. Individual VMDK objects are then extracted from the reconstructed namespace.
Can you recover a single array hosting Windows, Linux, and BSD VMs together?
Yes. Once the physical array is imaged and the host filesystem (VMFS, NTFS, ZFS, or ext4) is parsed, each virtual disk container file is extracted independently. The guest operating system inside the VMDK, VHDX, or qcow2 file has no bearing on our ability to extract the container. After extraction, we verify guest filesystem integrity by mounting the recovered virtual disk read-only against NTFS, ReFS, ext4, XFS, Btrfs, ZFS, or UFS drivers. Mixed-guest arrays do not carry a pricing premium; the per-drive tier structure is unchanged.
Our Fault Tolerance VMs require eagerzeroedthick. Does that change recovery?
It simplifies it. Eagerzeroedthick VMDKs pre-allocate and zero every block at creation, so the flat extent is contiguous on the VMFS volume and the entire virtual disk geometry can be reconstructed from the flat file size alone if the descriptor is destroyed. There is no sparse allocation map to recover. FT, MSCS, and certain vSAN policies all mandate eagerzeroedthick for the same reason: synchronous mirroring requires deterministic block-level writes. Recovery time on eagerzeroedthick disks is typically shorter than thin provisioned equivalents because no VMFS resource fork parsing is needed.
Our NFS datastore went inaccessible after the NAS rebooted. The data is still on the NAS. What now?
Do not force a VM power-on against the inaccessible datastore. ESXi caches NFS file handles and a forced operation against a stale handle can corrupt the VMDK descriptor and snapshot chain. Unmount the datastore cleanly from every host, verify the NFS export and underlying volume are healthy on the NAS side, and remount. If the NAS volume itself was recreated or its UUID changed, the file handles will not refresh and the datastore must be re-registered. If a forced power-on already occurred and the VMDK is now showing CID mismatches or descriptor errors, image the underlying NAS volume offline and recover the VMs through descriptor reconstruction.
Can you recover a Hyper-V VM whose Live Migration was interrupted halfway?
Yes. An interrupted Live Migration leaves the VHDX in an indeterminate state: part of the memory and disk delta has transferred to the destination host but the VM is no longer cleanly owned by either side. The source VHDX may be locked by the source Hyper-V service, and the destination may have a partial copy. We image both copies of the VHDX write-blocked, compare the two against the VMCX configuration timestamps, identify the most recent consistent state, and reconstruct the guest filesystem from whichever copy preserves filesystem journal consistency. Cluster Shared Volume environments add a coordinator-node dimension that must be reconstructed from the cluster log.
Our Ceph cluster lost monitor quorum. Can we recover RBD images from the OSD drives directly?
Yes. Ceph RBD images are sharded into 4 MB objects distributed across OSDs according to the CRUSH map. When monitor quorum is lost, the cluster refuses I/O, but the underlying object data remains on each OSD drive. We pull the OSD drives, image each one write-blocked, parse the BlueStore key-value index on each drive to enumerate every object, and reassemble RBD images by reading the RBD header object to determine stripe size, object name prefix, and total image size. The CRUSH map is reconstructed from the OSD metadata if no surviving copy exists. Recovery does not require a working monitor or working MDS; the data itself is self-describing once the OSD layout is parsed.

Ready to recover your virtual machines?

Free evaluation. No data = no charge. Ship your drives from anywhere in the U.S.

(512) 212-9111 · Mon-Fri 10am-6pm CT
No diagnostic fee
No data, no fee
4.9★ · 1,837+ reviews