Enterprise Virtualization Recovery

VMware ESXi Data Recovery

We recover VMFS datastores from failed RAID arrays, repair broken snapshot chains, and extract individual .vmdk virtual disks from corrupted ESXi hosts and vSAN clusters. Free evaluation. No data = no charge.

Written by Louis Rossmann, Founder & Chief Technician
Updated February 2026 · 14 min read

How VMware ESXi Datastores Fail and How We Recover Them

VMware ESXi stores virtual machines on VMFS (Virtual Machine File System) datastores backed by RAID arrays. When the underlying array degrades, the ESXi host loses access to the VMFS volume and all VMs on it go offline. Recovery requires imaging the RAID member drives, reconstructing the array offline, and parsing VMFS metadata to extract each .vmdk virtual disk file.

VMFS is a clustered filesystem designed for shared storage access across multiple ESXi hosts. It uses on-disk locking mechanisms (heartbeat regions and ATS primitives on VMFS6) to coordinate concurrent access. When a RAID failure corrupts the volume header or allocation bitmap, the lock state becomes inconsistent and ESXi refuses to mount the datastore. Standard VMware tools (vmkfstools, voma) cannot repair a datastore with underlying media errors. The data must be recovered at the physical layer first.

VMFS Metadata Architecture and Failure Points

Understanding VMFS on-disk layout is essential for targeted recovery. Both VMFS5 and VMFS6 share a common structural pattern, but differ in block allocation, UNMAP behavior, and snapshot formats.

VMFS5 On-Disk Layout

  • Volume header at LBA 0 contains the VMFS superblock, including UUID, version, and volume label
  • Heartbeat region at offset 0x100000 (1MB); each ESXi host writes its UUID here to claim lock ownership
  • Resource bitmap tracks 1MB block allocation across the volume; corruption here causes "no space" errors on a half-empty datastore
  • File descriptor heap stores inode-like entries for .vmdk files, including pointer block addresses for data extents
  • Sub-block allocation (8KB granularity) handles small files like .vmx config files and descriptor VMDKs
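
These structures can be probed directly on a raw image. The sketch below uses the offsets and magic numbers found in the open-source vmfs-tools project; the label field position after the superblock magic is an assumption for illustration, so verify every offset against a known-good image before relying on it:

```python
import struct

# Offsets/magics as used by the open-source vmfs-tools project. Treat them
# as assumptions and verify against a known-good image.
LVM_INFO_OFFSET = 0x100000   # LVM header, 1MB into the partition
LVM_MAGIC       = 0xC001D00D
FS_INFO_OFFSET  = 0x1200000  # FS3 superblock
FS_MAGIC        = 0x2FABF15E

def probe_vmfs(img):
    """img: binary file object over a raw partition image.
    Returns (looks_like_vmfs, volume_label)."""
    img.seek(LVM_INFO_OFFSET)
    lvm_magic = struct.unpack("<I", img.read(4))[0]
    img.seek(FS_INFO_OFFSET)
    fs_magic = struct.unpack("<I", img.read(4))[0]
    # Volume label: NUL-terminated ASCII near the superblock magic
    # (exact field offset is an assumption in this sketch).
    label = img.read(128).split(b"\x00", 1)[0].decode("ascii", "replace")
    return lvm_magic == LVM_MAGIC and fs_magic == FS_MAGIC, label
```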

VMFS6 Changes

  • Automatic UNMAP (space reclamation) runs in the background, which can zero-fill previously allocated blocks on thin-provisioned LUNs
  • SE Sparse (Space Efficient Sparse) snapshot format replaces vmfsSparse by default; uses grain directories and grain tables with a default 4KB grain size for block-level change tracking
  • Native 512e and 4Kn drive support; VMFS6 aligns I/O to physical sector boundaries, affecting how data is laid out on AF drives
  • GPT-based partition layout on the backing LUN (VMFS5 used MBR)
  • ATS (Atomic Test and Set) VAAI primitives replace some SCSI reservation locks; ATS misfire during power loss can leave orphaned locks

When a RAID member fails mid-write, the VMFS journal may contain an incomplete transaction. ESXi attempts to replay this journal on mount. If the journal references sectors that are now unreadable (because the RAID array is degraded), the mount fails entirely. Our approach bypasses the ESXi mount process: we parse VMFS structures directly from the raw RAID image and extract .vmdk files by following pointer block chains, regardless of journal state.
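
The extraction step reduces to a sketch like the following, which assumes the descriptor's direct addresses and pointer blocks have already been resolved into a flat list of file-block numbers (the on-disk address encoding differs between VMFS versions and is skipped here):

```python
BLOCK_SIZE = 1 << 20  # 1MB VMFS file block

def extract_extent(img, block_addrs, file_size, block_size=BLOCK_SIZE):
    """Concatenate a file's data blocks out of a raw RAID image.

    block_addrs: file-block numbers in file order, assumed already resolved
    from the descriptor's direct addresses and pointer blocks.
    """
    data = bytearray()
    for addr in block_addrs:
        img.seek(addr * block_size)
        data += img.read(block_size)
    return bytes(data[:file_size])  # trim the tail of the last block
```

Because this reads the raw image directly, journal state never comes into play, which is exactly why the method works on datastores ESXi refuses to mount.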

ESXi Snapshot Chain Reconstruction

Snapshot chains in ESXi consist of a base .vmdk and one or more delta files (-delta.vmdk using vmfsSparse on VMFS5, -sesparse.vmdk by default on VMFS6). Each delta records changed blocks relative to its parent. When the chain breaks, the VM cannot power on and standard consolidation fails.

How Snapshot Chains Break

  1. CID mismatch: Each VMDK descriptor contains a Content ID (CID) and a Parent Content ID (parentCID). When a snapshot is created, the new delta's parentCID must match the parent's CID. ESXi crashes or storage disconnects during snapshot creation can leave these values out of sync.
  2. Orphaned deltas: Failed "Delete All Snapshots" operations can leave delta files on disk with no corresponding entry in the VM's .vmsd snapshot descriptor file. The snapshot manager no longer tracks these deltas, but the VM still references them in its disk chain.
  3. Corrupted grain tables: SE sparse deltas on VMFS6 use grain directories and grain tables to map changed sectors. A power loss during a grain table update can corrupt the mapping, causing reads to return incorrect data or I/O errors.
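
The CID check in point 1 is easy to demonstrate. A VMDK text descriptor carries lines like CID=fffffffe and parentCID=ffffffff; a minimal consistency check, assuming well-formed descriptors, looks like this:

```python
import re

def parse_descriptor(text):
    """Pull CID, parentCID, and the parent filename hint out of a VMDK
    text descriptor."""
    fields = {"CID": None, "parentCID": None, "parent": None}
    for key in ("CID", "parentCID"):
        m = re.search(rf"^{key}=([0-9a-fA-F]+)", text, re.MULTILINE)
        if m:
            fields[key] = int(m.group(1), 16)
    m = re.search(r'parentFileNameHint="([^"]+)"', text)
    if m:
        fields["parent"] = m.group(1)
    return fields

def chain_is_consistent(child_text, parent_text):
    """A delta links cleanly only when its parentCID equals the parent's CID."""
    return parse_descriptor(child_text)["parentCID"] == parse_descriptor(parent_text)["CID"]
```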

We reconstruct broken chains by reading the grain directory from each delta, determining the correct parent-child ordering from creation timestamps and CID values, and manually consolidating the changed blocks back into the base extent. The result is a single flat .vmdk representing the VM's most recent consistent state.
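
Conceptually the consolidation is simple once the ordering is known: replay each delta's changed blocks onto the base, oldest first, so later writes win. The sketch models grain maps as plain offset-to-bytes dicts; real SE sparse parsing is considerably more involved:

```python
def consolidate(base, deltas):
    """base: bytearray of the flat extent.
    deltas: grain maps ({byte_offset: data}), ordered oldest to newest,
    so a block rewritten in a later snapshot overwrites the earlier copy."""
    for grain_map in deltas:
        for offset, data in grain_map.items():
            base[offset:offset + len(data)] = data
    return base
```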

vSAN Distributed Datastore Recovery

VMware vSAN aggregates local SSDs and HDDs from multiple ESXi hosts into a single distributed datastore. VM storage objects are split into components and distributed across hosts according to a storage policy. FTT=1 defaults to mirroring but can use RAID-5 erasure coding; FTT=2 defaults to triple mirroring but can use RAID-6 erasure coding, depending on the failure tolerance method (FTM) setting. Multi-node failures or CMMDS metadata corruption can take the entire vSAN datastore offline.

  • CMMDS reconstruction: The Cluster Monitoring, Membership, and Directory Service maintains a distributed database of all object locations across the cluster. When CMMDS becomes inconsistent (typically after simultaneous host failures), we rebuild the object map by scanning each host's capacity disks for object headers and component metadata.
  • DOM object reassembly: The Distributed Object Manager splits each .vmdk into components (up to 255GB per component on most vSAN versions). Each component is a RAID-1 mirror or RAID-5/6 stripe across disk groups. We locate each component on the physical disks, reconstruct the stripe or mirror, and reassemble the full .vmdk from its component pieces.
  • Disk group structure: Each vSAN disk group contains one SSD cache tier and up to seven HDD/SSD capacity devices. The SSD cache provides a write buffer (and read cache in hybrid configurations); deduplication metadata, when enabled, resides on the capacity tier. We image the capacity devices (where persistent data resides) and use the cache device to resolve any in-flight writes.
  • Witness and stretched clusters: Two-node vSAN configurations use a witness host for quorum. If the witness becomes unavailable simultaneously with a data node, the remaining node cannot confirm object ownership. We bypass the quorum requirement by working directly with the physical disk images.
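
For RAID-5 components, single-failure reconstruction is plain XOR arithmetic: the missing chunk of a stripe is the XOR of all surviving chunks, parity included. A minimal sketch, with parity rotation and layout handling omitted:

```python
def xor_reconstruct(stripe_chunks, missing_index):
    """Rebuild one missing chunk of a RAID-5 stripe. stripe_chunks holds the
    data and parity chunks in stripe order, with None at the missing slot."""
    length = len(next(c for c in stripe_chunks if c is not None))
    out = bytearray(length)
    for i, chunk in enumerate(stripe_chunks):
        if i == missing_index:
            continue
        for j, byte in enumerate(chunk):
            out[j] ^= byte
    return bytes(out)
```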

Common ESXi Failure Scenarios We Handle

RAID Array Degradation

PERC or Smart Array controller detects multiple failed members. ESXi host loses access to the VMFS LUN. All VMs on the datastore go offline simultaneously.

VMFS Metadata Corruption

Power loss during metadata commit corrupts the resource bitmap or file descriptor heap. ESXi refuses to mount the datastore with "cannot open the disk" or "no such file" errors.

Failed Snapshot Consolidation

"Delete All Snapshots" task fails, leaving orphaned delta files. The VM runs on an increasingly fragmented chain until the datastore fills or performance becomes unusable.

ESXi Boot Failure

ESXi host fails to boot after firmware update or boot media corruption. VMs are intact on the VMFS datastore but inaccessible without a running hypervisor.

vSAN Multi-Node Failure

Power event takes down multiple vSAN hosts simultaneously. Object components become stale across the cluster and vSAN cannot rebuild without manual intervention.

Accidental VM Deletion

VM removed from inventory or .vmdk files deleted from the datastore browser. VMFS does not immediately zero-fill freed blocks, so recovery is possible if no new writes have overwritten the extents.
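
A useful first-pass triage in this scenario is a signature scan for the header every VMDK text descriptor begins with (# Disk DescriptorFile). This sketch checks only 1MB file-block boundaries, so descriptors stored via sub-block allocation can be missed:

```python
DESC_SIG = b"# Disk DescriptorFile"

def find_orphaned_descriptors(img, chunk=1 << 20):
    """Signature-scan a raw VMFS image for VMDK text descriptors whose
    directory entries were freed. Checks 1MB file-block boundaries only."""
    hits, offset = [], 0
    while True:
        data = img.read(chunk)
        if not data:
            break
        if data.startswith(DESC_SIG):
            hits.append(offset)
        offset += len(data)
    return hits
```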

Recovery Methodology for IT Administrators

This section details the low-level procedures we use. If you are evaluating our technical capability, this is how the work gets done.

1. RAID Member Imaging with Sector-Level Granularity

Each member drive is imaged through PC-3000 using SAS HBAs for SAS drives or NVMe adapters for PCIe SSDs. The imaging process captures every addressable LBA, including those beyond the standard ATA/SCSI command set boundary (service area, G-list entries). For drives with bad sectors, we configure PC-3000 head maps to skip damaged heads on initial passes and return to them with aggressive retry parameters after capturing all healthy sectors. DeepSpar Disk Imager provides hardware-level timeout control for drives that lock up during reads.
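
The pass structure matters more than the tooling: capture everything healthy first, then return for the stragglers. A simplified model of that logic, with read_sector as a hypothetical stand-in for the imager's read primitive:

```python
def image_two_pass(read_sector, sector_count, retries=8):
    """Two-pass imaging strategy: grab every healthy sector with zero
    retries first, then return to the failures with aggressive retry
    counts. read_sector(lba) returns 512 bytes or raises IOError."""
    image, bad = {}, []
    for lba in range(sector_count):          # pass 1: healthy sectors only
        try:
            image[lba] = read_sector(lba)
        except IOError:
            bad.append(lba)
    for lba in bad:                          # pass 2: hammer the stragglers
        for _ in range(retries):
            try:
                image[lba] = read_sector(lba)
                break
            except IOError:
                continue
    unrecovered = [lba for lba in bad if lba not in image]
    return image, unrecovered
```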

2. Controller Metadata Extraction

PERC controllers store their DDF (Disk Data Format) metadata in the last sectors of each member drive. This metadata block contains the virtual disk configuration: RAID level, stripe size, member ordering, rebuild checkpoint, and consistency state. Smart Array controllers use a similar reserved area but with an HP-proprietary format. PC-3000 RAID Edition reads these metadata blocks and uses them to reconstruct the virtual disk layout without needing the original controller hardware. For arrays where the metadata has been overwritten or zeroed (firmware flash gone wrong), we fall back to brute-force parameter detection: testing stripe size permutations (64KB, 128KB, 256KB, 512KB, 1MB) and member orderings against known filesystem signatures.
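
The brute-force fallback is mechanical enough to sketch: try each stripe size against each member ordering and test the reassembled image for a known filesystem signature. RAID-0 interleaving is shown for brevity; parity rotation adds another search axis on RAID-5/6:

```python
from itertools import permutations

STRIPE_SIZES_KB = [64, 128, 256, 512, 1024]
NTFS_SIG = b"NTFS    "  # OEM ID at bytes 3-10 of an NTFS boot sector

def reassemble_raid0(members, order, stripe_kb, length):
    """Interleave member images (bytes) in the given order."""
    stripe = stripe_kb * 1024
    out = bytearray()
    i = 0
    while len(out) < length:
        member = members[order[i % len(order)]]
        pos = (i // len(order)) * stripe
        out += member[pos:pos + stripe]
        i += 1
    return bytes(out[:length])

def detect_params(members, fs_offset=0):
    """Return the first (stripe_kb, order) whose reassembly shows a known
    filesystem signature at fs_offset, or None if nothing matches."""
    for stripe_kb in STRIPE_SIZES_KB:
        for order in permutations(range(len(members))):
            img = reassemble_raid0(members, order, stripe_kb, fs_offset + 4096)
            if img[fs_offset + 3:fs_offset + 11] == NTFS_SIG:
                return stripe_kb, order
    return None
```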

3. VMFS Parsing and VMDK Extraction

With the RAID image reconstructed, we parse the VMFS volume directly from the raw image. The process reads the superblock at LBA 0 to determine VMFS version, block size (always 1MB on VMFS5+), and total volume capacity. The file descriptor heap is scanned for entries matching .vmdk, .vmx, .nvram, and .vmsd file types. For each .vmdk, we read the descriptor file to determine whether it is a monolithic flat disk, a split sparse, or a snapshot delta. Flat extent data is located by following the pointer block chain from the file descriptor. The extracted .vmdk is verified by mounting it read-only and checking guest filesystem integrity (NTFS, ext4, XFS) with standard filesystem tools.
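
Descriptor classification comes down to reading two things from the descriptor text: the createType value and the extent lines. A minimal parser, assuming an intact descriptor:

```python
import re

def classify_vmdk(descriptor_text):
    """Read createType and extent lines from a VMDK text descriptor.
    Extent lines look like: RW 16777216 VMFS "myvm-flat.vmdk" """
    m = re.search(r'createType="([^"]+)"', descriptor_text)
    create_type = m.group(1) if m else None
    rows = re.findall(r'^(RW|RDONLY|NOACCESS)\s+(\d+)\s+(\S+)\s+"([^"]+)"',
                      descriptor_text, re.MULTILINE)
    # (sectors, extent_kind, backing_file) per extent
    return create_type, [(int(n), kind, name) for _, n, kind, name in rows]
```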

4. Hyper-V Coexistence

Environments migrated from Hyper-V to ESXi (or running both) may contain .vhdx files stored on VMFS datastores. We extract .vhdx files using the same VMFS parsing pipeline and process them separately. VHDX uses a 4KB log structure for crash consistency, and recovery follows the same image-first, parse-from-raw methodology. For broader server recovery needs including Hyper-V standalone environments, see our main server recovery page.
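
The VHDX identification step is straightforward: per Microsoft's MS-VHDX specification, a VHDX file begins with the ASCII signature vhdxfile, followed by a UTF-16LE creator string. A quick probe:

```python
def is_vhdx(f):
    """Per MS-VHDX, the file identifier 'vhdxfile' occupies the first
    8 bytes of the file."""
    f.seek(0)
    return f.read(8) == b"vhdxfile"

def vhdx_creator(f):
    """Creator string: UTF-16LE, up to 512 bytes, immediately after the
    signature (per MS-VHDX)."""
    f.seek(8)
    return f.read(512).decode("utf-16-le", "replace").split("\x00", 1)[0]
```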

VMware Recovery Pricing

VMware datastore recovery follows the same transparent pricing model as every other service: per-drive imaging based on each drive's condition, plus a $400-$800 array reconstruction fee that includes VMFS parsing and VMDK extraction. No data recovered means no charge.

  • Logical / Firmware Imaging: $250-$900 per drive. Firmware module damage, SMART threshold failures, or filesystem corruption on individual array members.
  • Mechanical (Head Swap / Motor): $1,200-$1,500 per drive (50% deposit). Donor parts consumed during transplant. SAS drives require SAS-specific donors matched by model, firmware revision, and head count.
  • Array Reconstruction + VMFS: $400-$800 per array. RAID reconstruction, VMFS parsing, and .vmdk extraction. Includes snapshot chain consolidation if applicable.

No Data = No Charge: If we recover nothing from your VMware environment, you owe $0. Free evaluation, no obligation.

Enterprise competitors charge $5,000-$15,000 with opaque "emergency" surcharges. We publish our pricing because the work is the same regardless of what label gets put on the invoice.

We sign NDAs for corporate data recovery. All drives remain in our Austin lab under chain-of-custody documentation throughout the process. We are not HIPAA certified and do not sign BAAs, but we are willing to discuss your specific compliance requirements before work begins.

VMware ESXi Recovery: Common Questions

What causes VMFS datastore corruption and can it be recovered?
VMFS corruption typically results from underlying RAID array degradation, sudden power loss during metadata commits, or ESXi host crashes during snapshot operations. The VMFS volume header, resource bitmap, or file descriptor heap can become inconsistent, leaving the datastore unmountable. Recovery involves imaging the RAID members, reconstructing the array, and parsing VMFS metadata structures to locate .vmdk file descriptors and their flat extents on disk.
Can you fix a broken ESXi snapshot chain?
Yes. ESXi snapshot chains consist of a base .vmdk descriptor, one or more -delta.vmdk (or -sesparse.vmdk on VMFS6) files, and a .vmsn memory state file. When a chain breaks, the CID/parentCID references in the descriptor files no longer match. We reconstruct the chain by reading the grain directory and grain tables from each delta, reordering them by creation timestamp, and consolidating the writes back into a single flat extent.
How do you recover data from a failed vSAN cluster?
vSAN distributes VM storage objects across local SSDs and HDDs in each ESXi host using a distributed object manager (DOM). When multiple nodes fail or the CMMDS (Cluster Monitoring, Membership, and Directory Service) metadata becomes corrupted, the datastore goes offline. We image the capacity drives from each affected node, reconstruct the DOM object layout, and extract the component pieces of each .vmdk across the cluster.
Does the ESXi version affect recovery?
Yes. VMFS5 (ESXi 5.x/6.x default) uses a unified 1MB block size with sub-block allocation for small files. VMFS6 (ESXi 6.5+ optional, 7.x+ default) introduced automatic UNMAP, 512e/4Kn drive support, and SE sparse snapshots. ESXi 7.0 also introduced VMFS-L, a local-only variant used for the ESX-OSData system partition. Each version stores metadata at different offsets and uses different allocation structures. Our tooling handles all current VMFS versions.
Can you recover thin-provisioned VMs that were deleted from the datastore?
If the VMFS metadata entries for the deleted .vmdk have not been overwritten, we can recover the file by parsing the allocation bitmap and locating the physical extents on disk. Thin-provisioned disks allocate blocks on demand, so recovery depends on how much of the freed space has been reused since deletion. Power down the host as soon as possible to prevent overwriting.
How much does VMware datastore recovery cost?
Same transparent model as all our services: per-drive imaging fee based on each drive's condition, plus a $400-$800 array reconstruction fee. The VMFS parsing and VMDK extraction are included in the reconstruction fee. No data recovered means no charge.

Ready to recover your VMware environment?

Free evaluation. No data = no charge. Mail-in from anywhere in the U.S.