Enterprise Virtualization Recovery
VMware ESXi Data Recovery
We recover VMFS datastores from failed RAID arrays, repair broken snapshot chains, and extract individual .vmdk virtual disks from corrupted ESXi hosts and vSAN clusters. Free evaluation. No data = no charge.

How VMware ESXi Datastores Fail and How We Recover Them
VMware ESXi stores virtual machines on VMFS (Virtual Machine File System) datastores backed by RAID arrays. When the underlying array degrades, the ESXi host loses access to the VMFS volume and all VMs on it go offline. Recovery requires imaging the RAID member drives, reconstructing the array offline, and parsing VMFS metadata to extract each .vmdk virtual disk file.
VMFS is a clustered filesystem designed for shared storage access across multiple ESXi hosts. It uses on-disk locking mechanisms (heartbeat regions and, on VAAI-capable storage, ATS primitives) to coordinate concurrent access. When a RAID failure corrupts the volume header or allocation bitmap, the lock state becomes inconsistent and ESXi refuses to mount the datastore. Standard VMware tools (vmkfstools, voma) cannot repair a datastore with underlying media errors. The data must be recovered at the physical layer first.
VMFS Metadata Architecture and Failure Points
Understanding VMFS on-disk layout is essential for targeted recovery. Both VMFS5 and VMFS6 share a common structural pattern, but differ in block allocation, UNMAP behavior, and snapshot formats.
VMFS5 On-Disk Layout
- Volume header at LBA 0 contains the VMFS superblock, including UUID, version, and volume label
- Heartbeat region at offset 0x100000 (1MB); each ESXi host writes its UUID here to claim lock ownership
- Resource bitmap tracks 1MB block allocation across the volume; corruption here causes "no space" errors on a half-empty datastore
- File descriptor heap stores inode-like entries for .vmdk files, including pointer block addresses for data extents
- Sub-block allocation (8KB granularity) handles small files like .vmx config files and descriptor VMDKs
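As an illustration of how we identify a VMFS volume inside a reconstructed RAID image, a minimal probe might look like the sketch below. The 0xc001d00d LVM magic at 1 MiB into the partition is commonly documented; the rest of the layout (field offsets, label location) varies by version and is deliberately left out of this simplified check.

```python
import struct

# Assumptions for this sketch: the reconstructed image begins at the
# partition start, and the VMFS LVM metadata region sits 1 MiB in.
VMFS_LVM_OFFSET = 0x100000
VMFS_LVM_MAGIC = 0xC001D00D  # commonly documented VMFS LVM signature

def probe_vmfs(image: bytes, base: int = VMFS_LVM_OFFSET):
    """Return (is_vmfs, magic_found) for a raw reconstructed RAID image."""
    if len(image) < base + 4:
        return (False, None)          # image too short to contain the header
    magic = struct.unpack_from("<I", image, base)[0]
    return (magic == VMFS_LVM_MAGIC, magic)
```

In practice this probe is run against every candidate array reconstruction: a correct stripe order puts the magic exactly where it belongs, while a wrong member order scatters it.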
VMFS6 Changes
- Automatic UNMAP (space reclamation) runs in the background, which can zero-fill previously allocated blocks on thin-provisioned LUNs
- SE Sparse (Space Efficient Sparse) snapshot format replaces vmfsSparse by default; uses grain directories and grain tables with a default 4KB grain size for block-level change tracking
- Native 512e and 4Kn drive support; VMFS6 aligns I/O to physical sector boundaries, affecting how data is laid out on AF drives
- GPT-based partition layout on the backing LUN (new VMFS5 datastores also used GPT; MBR persists only on volumes upgraded in place from VMFS3)
- Heavier reliance on ATS (Atomic Test and Set) VAAI locking, available since VMFS5 on VAAI-capable arrays, in place of SCSI reservations; an ATS misfire during power loss can leave orphaned locks
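The SE sparse grain mapping mentioned above is a two-level lookup: a grain directory points at grain tables, which point at 4KB grains in the data region. The sketch below shows the lookup logic only; the on-disk entry widths, table sizes, and the 512-entries-per-table figure are simplifying assumptions, not the actual SE sparse binary format.

```python
GRAIN_SIZE = 4096        # default SE sparse grain size noted above
ENTRIES_PER_TABLE = 512  # assumption for this sketch

def resolve_grain(gd, gts, vaddr):
    """Map a guest byte offset to a data-region byte offset, or None if
    the grain is unallocated (meaning: read from the parent disk instead).
    gd:  grain directory, a list of 1-based grain-table refs (0 = unallocated)
    gts: list of grain tables, each a list of grain numbers (0 = unallocated)
    """
    grain_no = vaddr // GRAIN_SIZE
    gd_idx, gt_idx = divmod(grain_no, ENTRIES_PER_TABLE)
    gt_ref = gd[gd_idx]
    if gt_ref == 0:
        return None                      # whole table unallocated
    entry = gts[gt_ref - 1][gt_idx]
    return None if entry == 0 else entry * GRAIN_SIZE
```

A corrupted grain table shows up in exactly this path: the directory entry is valid but the table it points at maps grains to garbage offsets, which is why reads "succeed" yet return wrong data.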
When a RAID member fails mid-write, the VMFS journal may contain an incomplete transaction. ESXi attempts to replay this journal on mount. If the journal references sectors that are now unreadable (because the RAID array is degraded), the mount fails entirely. Our approach bypasses the ESXi mount process: we parse VMFS structures directly from the raw RAID image and extract .vmdk files by following pointer block chains, regardless of journal state.
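The journal-bypass extraction described above amounts to walking pointer blocks and concatenating the 1MB file blocks they reference. The sketch below shows that walk with pointer blocks already decoded into lists of block numbers; the real on-disk pointer block format is more involved, so treat this as the shape of the algorithm, not the format.

```python
BLOCK_SIZE = 1 << 20  # 1 MiB VMFS file block

def extract_file(image, pointer_blocks, file_size):
    """Rebuild a file's bytes by walking pointer blocks directly from a raw
    image, ignoring journal and lock state entirely.
    pointer_blocks: list of lists of file-block numbers (0 = sparse hole).
    """
    out = bytearray()
    for pb in pointer_blocks:
        for blk in pb:
            if len(out) >= file_size:
                break
            if blk == 0:
                out += b"\x00" * BLOCK_SIZE   # unallocated block: zero-fill
            else:
                start = blk * BLOCK_SIZE
                out += image[start:start + BLOCK_SIZE]
    return bytes(out[:file_size])
```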
ESXi Snapshot Chain Reconstruction
Snapshot chains in ESXi consist of a base .vmdk and one or more delta files (-delta.vmdk using vmfsSparse on VMFS5, -sesparse.vmdk by default on VMFS6). Each delta records changed blocks relative to its parent. When the chain breaks, the VM cannot power on and standard consolidation fails.
How Snapshot Chains Break
- CID mismatch: Each VMDK descriptor contains a Content ID (CID) and a Parent Content ID (parentCID). When a snapshot is created, the new delta's parentCID must match the parent's CID. ESXi crashes or storage disconnects during snapshot creation can leave these values out of sync.
- Orphaned deltas: Failed "Delete All Snapshots" operations can leave delta files on disk with no corresponding entry in the VM's .vmsd snapshot descriptor file. The snapshot manager no longer tracks these deltas, but the VM still references them in its disk chain.
- Corrupted grain tables: SE sparse deltas on VMFS6 use grain directories and grain tables to map changed sectors. A power loss during a grain table update can corrupt the mapping, causing reads to return incorrect data or I/O errors.
We reconstruct broken chains by reading the grain directory from each delta, determining the correct parent-child ordering from creation timestamps and CID values, and manually consolidating the changed blocks back into the base extent. The result is a single flat .vmdk representing the VM's most recent consistent state.
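The CID matching step can be illustrated concretely. VMDK text descriptors really do carry `CID=` and `parentCID=` lines (with `parentCID=ffffffff` marking a base disk with no parent), so a chain can be ordered by matching each delta's parentCID to its parent's CID. The helper below is a simplified sketch; it assumes the descriptors are intact and the chain is linear.

```python
import re

def parse_descriptor(text):
    """Pull CID and parentCID out of a VMDK text descriptor."""
    cid = re.search(r"^CID=([0-9a-fA-F]+)", text, re.M).group(1).lower()
    pcid = re.search(r"^parentCID=([0-9a-fA-F]+)", text, re.M).group(1).lower()
    return cid, pcid

def order_chain(descriptors):
    """Order a snapshot chain base-to-newest by CID/parentCID matching.
    descriptors: dict of filename -> descriptor text."""
    by_pcid, info, base = {}, {}, None
    for name, text in descriptors.items():
        cid, pcid = parse_descriptor(text)
        info[name] = (cid, pcid)
        if pcid == "ffffffff":   # no parent: this is the base disk
            base = name
        else:
            by_pcid[pcid] = name
    chain = [base]
    while info[chain[-1]][0] in by_pcid:  # follow child whose parentCID == our CID
        chain.append(by_pcid[info[chain[-1]][0]])
    return chain
```

When the values are out of sync (the CID-mismatch case above), this matching fails at a specific link, which tells us exactly which descriptor to repair before consolidation.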
vSAN Distributed Datastore Recovery
VMware vSAN aggregates local SSDs and HDDs from multiple ESXi hosts into a single distributed datastore. VM storage objects are split into components and distributed across hosts according to a storage policy. FTT=1 defaults to mirroring but can use RAID-5 erasure coding; FTT=2 defaults to triple mirroring but can use RAID-6 erasure coding, depending on the failure tolerance method (FTM) setting. Multi-node failures or CMMDS metadata corruption can take the entire vSAN datastore offline.
- CMMDS reconstruction: The Cluster Monitoring, Membership, and Directory Service maintains a distributed database of all object locations across the cluster. When CMMDS becomes inconsistent (typically after simultaneous host failures), we rebuild the object map by scanning each host's capacity disks for object headers and component metadata.
- DOM object reassembly: The Distributed Object Manager splits each .vmdk into components (up to 255GB per component on most vSAN versions). Each component is a RAID-1 mirror or RAID-5/6 stripe across disk groups. We locate each component on the physical disks, reconstruct the stripe or mirror, and reassemble the full .vmdk from its component pieces.
- Disk group structure: Each vSAN disk group contains one SSD cache tier and up to seven HDD/SSD capacity devices. The SSD cache provides a write buffer (and read cache in hybrid configurations); deduplication metadata, when enabled, resides on the capacity tier. We image the capacity devices (where persistent data resides) and use the cache device to resolve any in-flight writes.
- Witness and stretched clusters: Two-node vSAN configurations use a witness host for quorum. If the witness becomes unavailable simultaneously with a data node, the remaining node cannot confirm object ownership. We bypass the quorum requirement by working directly with the physical disk images.
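At its core, the reassembly step above is an inventory problem: group every component found on the physical disks by object UUID, then pick one readable replica per component index. The sketch below shows that bookkeeping for the RAID-1 (mirrored) case with hypothetical record fields; real CMMDS/DOM metadata is binary and considerably richer.

```python
def rebuild_objects(components):
    """Group scanned components by object UUID and choose one readable
    replica per component index (mirrored/RAID-1 case).
    components: dicts with keys 'object', 'index', 'replica', 'readable'
    (field names are illustrative, not actual vSAN structures).
    Raises ValueError if some component has no readable copy anywhere."""
    objects = {}
    for c in components:
        objects.setdefault(c["object"], {}).setdefault(c["index"], [])
        if c["readable"]:
            objects[c["object"]][c["index"]].append(c["replica"])
    chosen = {}
    for obj, idx_map in objects.items():
        chosen[obj] = {}
        for idx, replicas in sorted(idx_map.items()):
            if not replicas:
                raise ValueError(f"object {obj} component {idx}: no readable replica")
            chosen[obj][idx] = replicas[0]
    return chosen
```

This is also why multi-node failures are recoverable at all: as long as at least one replica of every component survives somewhere in the cluster, the object can be reassembled regardless of quorum state.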
Common ESXi Failure Scenarios We Handle
RAID Array Degradation
PERC or Smart Array controller detects multiple failed members. ESXi host loses access to the VMFS LUN. All VMs on the datastore go offline simultaneously.
VMFS Metadata Corruption
Power loss during metadata commit corrupts the resource bitmap or file descriptor heap. ESXi refuses to mount the datastore with "cannot open the disk" or "no such file" errors.
Failed Snapshot Consolidation
"Delete All Snapshots" task fails, leaving orphaned delta files. The VM runs on an increasingly fragmented chain until the datastore fills or performance degrades to zero.
ESXi Boot Failure
ESXi host fails to boot after firmware update or boot media corruption. VMs are intact on the VMFS datastore but inaccessible without a running hypervisor.
vSAN Multi-Node Failure
Power event takes down multiple vSAN hosts simultaneously. Object components become stale across the cluster and vSAN cannot rebuild without manual intervention.
Accidental VM Deletion
VM removed from inventory or .vmdk files deleted from the datastore browser. VMFS does not immediately zero-fill freed blocks, so recovery is possible if no new writes have overwritten the extents.
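Because freed blocks are not zero-filled immediately, deleted VMDKs can often be located by signature scanning. Text descriptors begin with the literal line `# Disk DescriptorFile`, which makes a good carving anchor; the cut-at-first-NUL heuristic below is an assumption of this sketch, chosen because descriptors are small text files surrounded by unrelated binary data.

```python
MARKER = b"# Disk DescriptorFile"

def carve_descriptors(image, max_len=4096):
    """Find candidate VMDK descriptor files in a raw image by signature.
    Returns a list of (offset, text). Cuts each hit at the first NUL byte
    or at max_len, whichever comes first (sketch heuristic)."""
    hits = []
    pos = image.find(MARKER)
    while pos != -1:
        chunk = image[pos:pos + max_len]
        end = chunk.find(b"\x00")
        text = chunk[:end if end != -1 else max_len].decode("ascii", "replace")
        hits.append((pos, text))
        pos = image.find(MARKER, pos + 1)
    return hits
```

Each carved descriptor then yields a CID, an extent name, and a size, which narrows the search for the (much larger) flat extent data.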
Recovery Methodology for IT Administrators
This section details the low-level procedures we use. If you are evaluating our technical capability, this is how the work gets done.
1. RAID Member Imaging with Sector-Level Granularity
Each member drive is imaged through PC-3000 using SAS HBAs for SAS drives or NVMe adapters for PCIe SSDs. The imaging process captures every addressable LBA, including those beyond the standard ATA/SCSI command set boundary (service area, G-list entries). For drives with bad sectors, we configure PC-3000 head maps to skip damaged heads on initial passes and return to them with aggressive retry parameters after capturing all healthy sectors. DeepSpar Disk Imager provides hardware-level timeout control for drives that lock up during reads.
2. Controller Metadata Extraction
PERC controllers store their DDF (Disk Data Format) metadata in the last sectors of each member drive. This metadata block contains the virtual disk configuration: RAID level, stripe size, member ordering, rebuild checkpoint, and consistency state. Smart Array controllers use a similar reserved area but with an HP-proprietary format. PC-3000 RAID Edition reads these metadata blocks and uses them to reconstruct the virtual disk layout without needing the original controller hardware. For arrays where the metadata has been overwritten or zeroed (firmware flash gone wrong), we fall back to brute-force parameter detection: testing stripe size permutations (64KB, 128KB, 256KB, 512KB, 1MB) and member orderings against known filesystem signatures.
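The brute-force fallback is mechanical enough to sketch: try every stripe-size and member-order combination, reassemble a small prefix of the virtual disk, and test whether a known filesystem signature lands where it belongs. The sketch below assumes RAID-0 and a filesystem starting at LBA 0 of the virtual disk (real arrays also need parity rotation and start-offset detection); the NTFS OEM string at byte 3 of the boot sector is a standard signature.

```python
from itertools import permutations

CANDIDATE_STRIPES = [64, 128, 256, 512, 1024]  # KiB, as listed above

def reassemble_raid0(members, order, stripe_bytes):
    """Interleave member images in the given order at the given stripe size."""
    out = bytearray()
    for row in range(len(members[0]) // stripe_bytes):
        for m in order:
            s = row * stripe_bytes
            out += members[m][s:s + stripe_bytes]
    return bytes(out)

def detect_layout(members, signature=b"NTFS    ", sig_offset=3):
    """Return (stripe_kib, member_order) that puts the filesystem signature
    at the expected offset, or None if no combination matches."""
    for kib in CANDIDATE_STRIPES:
        stripe = kib * 1024
        if stripe > len(members[0]):
            continue
        for order in permutations(range(len(members))):
            virt = reassemble_raid0(members, order, stripe)
            if virt[sig_offset:sig_offset + len(signature)] == signature:
                return kib, order
    return None
```

A single signature hit is not proof; in practice the candidate layout is confirmed by parsing deeper filesystem structures before committing to a full reassembly.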
3. VMFS Parsing and VMDK Extraction
With the RAID image reconstructed, we parse the VMFS volume directly from the raw image. The process reads the superblock at LBA 0 to determine VMFS version, block size (always 1MB on VMFS5+), and total volume capacity. The file descriptor heap is scanned for entries matching .vmdk, .vmx, .nvram, and .vmsd file types. For each .vmdk, we read the descriptor file to determine whether it is a monolithic flat disk, a split sparse, or a snapshot delta. Flat extent data is located by following the pointer block chain from the file descriptor. The extracted .vmdk is verified by mounting it read-only and checking guest filesystem integrity (NTFS, ext4, XFS) with standard filesystem tools.
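The disk-type decision in step 3 comes from the descriptor's `createType` line, a documented text field in VMDK descriptors (values such as `vmfs`, `monolithicFlat`, `vmfsSparse`, `seSparse`). A minimal classifier, with the snapshot-type set chosen for this sketch:

```python
import re

# Delta/snapshot formats; flat and monolithic types are full disks.
SNAPSHOT_TYPES = {"vmfsSparse", "seSparse"}

def classify_vmdk(descriptor_text):
    """Return (create_type, is_snapshot_delta) from a VMDK text descriptor."""
    m = re.search(r'createType="([^"]+)"', descriptor_text)
    if not m:
        return (None, False)
    ct = m.group(1)
    return (ct, ct in SNAPSHOT_TYPES)
```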
4. Hyper-V Coexistence
Environments migrated from Hyper-V to ESXi (or running both) may contain .vhdx files stored on VMFS datastores. We extract .vhdx files using the same VMFS parsing pipeline and process them separately. VHDX uses a 4KB log structure for crash consistency, and recovery follows the same image-first, parse-from-raw methodology. For broader server recovery needs including Hyper-V standalone environments, see our main server recovery page.
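Telling the container formats apart during extraction is a magic-bytes check: VHDX files open with the ASCII identifier `vhdxfile`, sparse/streamOptimized VMDKs with `KDMV`, and VMDK text descriptors with the `# Disk DescriptorFile` line (all documented signatures). A minimal dispatcher:

```python
def identify_virtual_disk(blob):
    """Classify a virtual disk file by its leading magic bytes."""
    if blob[:8] == b"vhdxfile":
        return "vhdx"                 # VHDX file type identifier
    if blob[:4] == b"KDMV":
        return "vmdk-sparse"          # sparse/streamOptimized VMDK header
    if blob.startswith(b"# Disk DescriptorFile"):
        return "vmdk-descriptor"      # text descriptor
    return "unknown"
```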
VMware Recovery Pricing
VMware datastore recovery follows the same transparent pricing model as every other service: per-drive imaging based on each drive's condition, plus a $400-$800 array reconstruction fee that includes VMFS parsing and VMDK extraction. No data recovered means no charge.
| Service Tier | Price Range (Per Drive) | Description |
|---|---|---|
| Logical / Firmware Imaging | $250-$900 | Firmware module damage, SMART threshold failures, or filesystem corruption on individual array members. |
| Mechanical (Head Swap / Motor) | $1,200-$1,500 (50% deposit) | Donor parts consumed during transplant. SAS drives require SAS-specific donors matched by model, firmware revision, and head count. |
| Array Reconstruction + VMFS | $400-$800 (per array) | RAID reconstruction, VMFS parsing, and .vmdk extraction. Includes snapshot chain consolidation if applicable. |
No Data = No Charge: If we recover nothing from your VMware environment, you owe $0. Free evaluation, no obligation.
Enterprise competitors charge $5,000-$15,000 with opaque "emergency" surcharges. We publish our pricing because the work is the same regardless of what label gets put on the invoice.
We sign NDAs for corporate data recovery. All drives remain in our Austin lab under chain-of-custody documentation throughout the process. We are not HIPAA certified and do not sign BAAs, but we are willing to discuss your specific compliance requirements before work begins.
VMware ESXi Recovery: Common Questions
What causes VMFS datastore corruption and can it be recovered?
Can you fix a broken ESXi snapshot chain?
How do you recover data from a failed vSAN cluster?
Does the ESXi version affect recovery?
Can you recover thin-provisioned VMs that were deleted from the datastore?
How much does VMware datastore recovery cost?
Need Recovery for Other Devices?
Dell, HP, IBM enterprise servers
Dell EMC, NetApp, HPE arrays
RAID 0, 1, 5, 6, 10 arrays
Synology, QNAP, Buffalo
VMDK, VHD/VHDX, QCOW2 extraction
NVMe and SATA SSDs
Complete service catalog
Ready to recover your VMware environment?
Free evaluation. No data = no charge. Mail-in from anywhere in the U.S.