Enterprise Hypervisor Recovery

VMware ESXi Purple Screen of Death (PSOD) Data Recovery

A PSOD is an ESXi kernel panic. The host is down. Your VMs are inaccessible. Acting on the instinct to reboot and rebuild the RAID will make things worse. We image every drive in the array read-only, reconstruct the RAID offline, parse the VMFS volume, and extract your VMDKs. Free evaluation. No data = no charge.

Written by Louis Rossmann, Founder & Chief Technician
Updated March 2026 · 12 min read

What Happens to Your Data During a PSOD

The Purple Screen of Death halts the ESXi vmkernel, freezing all I/O to the underlying storage. VMs stop mid-write. VMFS journal entries are left uncommitted. If the PSOD was triggered by a storage controller failure, the RAID array itself may be degraded with one or more drives offline. The data is still on the platters or NAND, but accessing it requires bypassing the dead hypervisor entirely and working at the physical storage layer.

Rebooting the host and allowing ESXi to auto-rebuild a degraded array is the most common path to permanent data loss. The rebuild reads every sector of every surviving drive; a single Unrecoverable Read Error on any member causes the entire rebuild to abort, and now parity is destroyed. The correct response is to power down the host, remove the drives, and image each drive individually using hardware-level forensic tools before touching the array configuration.
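The arithmetic behind that warning can be made concrete. A rough sketch in Python, assuming the vendor-quoted URE rate of one error per 10^15 bits read (typical for enterprise drives) applies uniformly across the read; the drive size and array layout in the example are hypothetical, and real drives cluster errors, so treat this as a back-of-envelope estimate rather than a measured rate:

```python
# Probability of hitting at least one Unrecoverable Read Error (URE)
# while reading every surviving drive end to end during a RAID rebuild.
# Assumes a uniform per-bit URE rate (a simplification of vendor specs).

def rebuild_ure_probability(drive_tb: float, surviving_drives: int,
                            ure_rate: float = 1e-15) -> float:
    """P(at least one URE) across a full sequential read of all survivors."""
    bits_read = drive_tb * 1e12 * 8 * surviving_drives
    return 1.0 - (1.0 - ure_rate) ** bits_read

# Degraded 4-drive RAID 5 of 8 TB disks: the rebuild must read
# all three survivors in full.
p = rebuild_ure_probability(drive_tb=8, surviving_drives=3)
print(f"{p:.1%}")  # roughly a 1-in-6 chance the rebuild hits a URE
```

At larger drive sizes or wider arrays the exponent grows linearly, which is why the failure-during-rebuild numbers get ugly fast on modern capacities.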

PSOD Causes: Hardware MCE vs. Software Panics

The PSOD screen contains diagnostic data that determines whether the host experienced a physical hardware failure or a software bug. The distinction matters because hardware faults require physical intervention on the storage, while software panics may allow a clean reboot if the underlying disks are healthy.

Machine Check Exceptions (MCE)

An MCE is a hardware interrupt generated by the CPU when it detects an uncorrectable error in the cache hierarchy, memory controller, or I/O subsystem. The PSOD screen displays the MCi_STATUS register value as a hex code. The lower 16 bits encode the error type: cache errors, TLB errors, bus/interconnect errors, or memory controller errors. An MCE-triggered PSOD indicates a physical fault in the server hardware. The CPU, DIMMs, or motherboard VRMs need diagnosis before the host can be safely restarted.
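As a triage aid, classifying those low 16 bits can be sketched in a few lines. The bit patterns below follow the compound error-code encodings in the Intel SDM's Machine-Check Architecture chapter; the sample MCi_STATUS value is hypothetical, and a real diagnosis should still go through the CPU vendor's full decoder:

```python
# Rough classifier for the MCA compound error code in the low 16 bits
# of an MCi_STATUS value photographed off a PSOD screen.
# Encodings per the Intel SDM (Machine-Check Architecture chapter).

def classify_mca_error(mci_status: int) -> str:
    code = mci_status & 0xFFFF          # MCA error code field
    if code & 0x0800:                   # 0000 1PPT RRRR IILL
        return "bus/interconnect error"
    if (code & 0xFF00) == 0x0100:       # 0000 0001 RRRR TTLL
        return "cache hierarchy error"
    if (code & 0xFF80) == 0x0080:       # 0000 0000 1MMM CCCC
        return "memory controller error"
    if (code & 0xFFF0) == 0x0010:       # 0000 0000 0001 TTLL
        return "TLB error"
    return "simple/unclassified error"

# Hypothetical status value; low 16 bits 0x0135 match the cache pattern.
print(classify_mca_error(0xBD00000000000135))  # cache hierarchy error
```

A cache or memory controller result points at the CPU/DIMM side; a bus/interconnect result usually means a PCIe device or the board itself.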

Non-Maskable Interrupts (NMI)

An NMI is a hardware interrupt that cannot be deferred by the CPU. ESXi triggers a PSOD on NMI when a PCIe device (storage controller, NIC, GPU passthrough) reports an uncorrectable error via the PCIe Advanced Error Reporting (AER) mechanism, or when a hardware watchdog timer expires because the vmkernel stopped responding. Storage controller NMIs are common when a RAID card loses its battery-backed cache or a SAS expander encounters a PHY link error cascade.

Software Kernel Panics

Not all PSODs originate from hardware. ESXi 8.0.1 hosts can experience a PSOD with #PF Exception 14 during intensive I/O on VMFS-6 volumes, caused by a memory race condition in the res3HelperQu and FSUnmapManag worlds. This is a software bug, not a drive failure. The fix is upgrading to ESXi 8.0 Update 2 or later. If the underlying storage is healthy, data should be accessible after the patch. Verify drive health with SMART before rebooting.

HBA Firmware Corruption

Certain QLogic QLE269x HBA firmware versions contain a bug that replays stale I/O requests to freed memory locations. This corrupts the DMA heap, which propagates silent write errors into VMFS-6 metadata on disk before eventually triggering a host PSOD. The danger: the VMFS corruption happened before the crash, not during it. VOMA may report allocation bitmap inconsistencies that predate the PSOD event. Recovery requires manual reconstruction of the VMDK descriptor chains from the raw VMFS volume.
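A VMDK descriptor is a small plain-text file, so a minimal parser shows what "reconstructing the descriptor chain" actually works from. This is a Python sketch of the published descriptor layout (CID, parentCID, extent lines); the file name and CID values in the sample are made up:

```python
import re

# Minimal parser for a VMDK descriptor file, the small text file that
# maps a virtual disk to its -flat/-delta extent files. A parentCID of
# ffffffff marks the base disk (no snapshot parent above it).

EXTENT_RE = re.compile(r'^(RW|RDONLY|NOACCESS)\s+(\d+)\s+(\w+)\s+"([^"]+)"')

def parse_descriptor(text: str) -> dict:
    info = {"extents": []}
    for line in text.splitlines():
        line = line.strip()
        m = EXTENT_RE.match(line)
        if m:
            info["extents"].append(
                {"access": m.group(1), "sectors": int(m.group(2)),
                 "type": m.group(3), "file": m.group(4)})
        elif "=" in line and not line.startswith("#"):
            key, _, val = line.partition("=")
            info[key.strip()] = val.strip().strip('"')
    return info

sample = '''# Disk DescriptorFile
version=1
CID=fb183c20
parentCID=ffffffff
createType="vmfs"

RW 41943040 VMFS "web01-flat.vmdk"
'''
d = parse_descriptor(sample)
print(d["CID"], d["parentCID"], d["extents"][0]["file"])
```

When HBA corruption has shredded the on-disk descriptors, recovery means regenerating files like this by hand from surviving extent data and chaining parentCID references back to a base disk.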

TPM 2.0 Lockout After Motherboard Replacement

Starting with vSphere 7.0 Update 2, ESXi seals its host configuration to the physical TPM 2.0 chip on the motherboard. This means the ESXi configuration partition, local user database, and encryption keys are cryptographically bound to one specific piece of silicon.

If an MCE-triggered PSOD requires motherboard replacement, the new board's TPM will not match the sealed configuration. ESXi will boot into a permanent “Security Violation” PSOD loop. Nuvoton (NTC) and NationZ (NTZ) TPM chips will enter Dictionary Attack lockout mode after repeated failed attestation attempts.

Immediate Steps After an ESXi PSOD

The first 30 minutes after a PSOD determine whether the data is recoverable. Follow this sequence:

  1. Photograph the purple screen. Capture the full PSOD output, including the error code, faulting module name, and any MCi_STATUS hex values. This information identifies whether the fault is hardware or software.
  2. Do not reboot into automatic RAID rebuild. If the PSOD was caused by a storage controller failure, one or more drives may be marked offline. Rebooting allows the controller to trigger a rebuild on the degraded array. A rebuild on physically failing drives will destroy parity data needed for recovery.
  3. Extract the vmkernel core dump if possible. If the host reboots (some PSODs auto-reboot after a timeout), extract the vmkernel-zdump file via SSH or the DCUI. This dump contains the full register state and stack trace for root cause analysis.
  4. Check RAID controller status before any storage operations. Log into the RAID controller management interface (iDRAC, iLO, MegaCLI, storcli) and document which drives are online, offline, rebuilding, or foreign. Do not clear foreign configurations.
  5. Power down and ship the drives for imaging. If any physical drive has failed, or if the PSOD recurs on reboot, power down the host. Label each drive with its bay/slot number. Ship them for read-only forensic imaging.

How We Recover Data After an ESXi PSOD

Our recovery process works at the physical storage layer, bypassing the dead or unstable ESXi host entirely. No RAID rebuilds. No filesystem repair utilities on degraded media.

1. Per-Drive Forensic Imaging

Each drive from the array is connected to a PC-3000 Express (for SATA/SAS drives) or PC-3000 Portable III (for NVMe). We create a sector-by-sector clone, applying head maps to skip unstable regions on failing heads. Drives with firmware corruption get terminal-level access to rebuild the translator module before imaging. The original drives are never written to.

2. Offline RAID Reconstruction

Using the cloned images, we reconstruct the RAID array offline. The RAID metadata (DDF headers, Dell PERC configuration blocks, HP Smart Array metadata, mdadm superblocks) is parsed from the drive images to determine stripe size, rotation direction, parity distribution, and member ordering. No hardware RAID controller is needed; reconstruction happens entirely in software from the forensic images.
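The XOR arithmetic at the heart of offline RAID 5 reconstruction can be sketched briefly. This toy version runs on in-memory "images" and takes the stripe size and member order as givens, since in practice those come from the parsed DDF/PERC/mdadm metadata; it is a sketch of the principle, not our production tooling:

```python
import os

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

def rebuild_missing_member(images, missing, n_members, stripe):
    """Regenerate a missing RAID 5 member from the survivors.

    Every full stripe XORs to zero (parity included), so the missing
    member at each offset is simply the XOR of all surviving members
    at that offset, regardless of where the rotating parity lives.
    """
    length = len(next(iter(images.values())))
    rebuilt = bytearray()
    for off in range(0, length, stripe):
        units = [images[m][off:off + stripe]
                 for m in range(n_members) if m != missing]
        rebuilt += xor_blocks(units)
    return bytes(rebuilt)

# Toy 3-member array: members 0 and 1 hold data, member 2 the parity
# (hypothetical 64 KiB stripe unit; real values come from the metadata).
stripe = 64 * 1024
d0, d1 = os.urandom(2 * stripe), os.urandom(2 * stripe)
parity = xor_blocks([d0, d1])
recovered = rebuild_missing_member({0: d0, 2: parity},
                                   missing=1, n_members=3, stripe=stripe)
assert recovered == d1  # member 1 regenerated exactly
```

The key property is that the reconstruction is purely computational on the forensic images: no controller, and no writes to any original drive.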

3. VMFS Volume Parsing

With the virtual RAID volume reconstructed, we parse the VMFS-5 or VMFS-6 metadata structures: the volume header, resource bitmap, file descriptor heap, and pointer block chains. VMFS uses 1MB blocks with sub-block allocation for small files. Each .vmdk descriptor file references its flat extent data through pointer blocks. We locate every VMDK and its associated delta files (snapshots), even if the VMFS allocation bitmap is partially corrupted.
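When the allocation bitmap is too damaged to trust, one fallback is signature carving: every VMDK descriptor begins with the ASCII line "# Disk DescriptorFile", so a raw scan of the reconstructed volume finds them even when directory metadata is gone. A hedged Python sketch; the sector-granularity alignment and the sample volume are assumptions, and full VMFS parsing walks the file descriptor heap and pointer blocks rather than scanning:

```python
# Signature-carve VMDK descriptor files out of a raw (reconstructed)
# volume image. Descriptors are small text files, so VMFS may place
# them in sub-blocks; scanning at sector alignment avoids missing them.

SIG = b"# Disk DescriptorFile"

def carve_descriptors(volume: bytes, align: int = 512):
    """Yield (offset, text) for each descriptor signature found."""
    for off in range(0, len(volume) - len(SIG) + 1, align):
        if volume[off:off + len(SIG)] == SIG:
            end = volume.find(b"\x00", off)
            if end == -1:
                end = min(off + 4096, len(volume))
            yield off, volume[off:end].decode("ascii", errors="replace")

# Hypothetical 4 MiB slice of a reconstructed volume with one descriptor.
vol = bytearray(4 * 1024 * 1024)
desc = b"# Disk DescriptorFile\nversion=1\nCID=fb183c20\n"
vol[1024 * 1024:1024 * 1024 + len(desc)] = desc
hits = list(carve_descriptors(bytes(vol)))
print(hits[0][0])  # 1048576
```

Each carved descriptor then points (by extent file name) at the flat or delta data that still has to be located through the pointer-block chains.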

4. VMDK Extraction and Delivery

Recovered VMDKs are validated by mounting them read-only and verifying the guest filesystem (NTFS, ext4, XFS) is consistent. If the VM had active snapshots, we reconstruct the snapshot chain by reading grain tables (VMFS sparse) or SE sparse extent maps and consolidating the delta writes into a single flat image representing the VM's last consistent state. Delivered on encrypted external media or via secure transfer.
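The consolidation step amounts to overlaying grains: start from the base flat extent and apply each delta's written grains oldest-to-newest, so the newest write to any grain wins. A simplified Python sketch that abstracts the on-disk grain tables as plain index-to-data maps; the grain size and sample data are hypothetical (real VMFS sparse grains are fixed-size and addressed through grain directories and tables):

```python
def consolidate(base: bytes, deltas, grain: int) -> bytes:
    """Flatten a snapshot chain onto its base extent.

    deltas is an oldest-to-newest list of {grain_index: data} maps,
    each data chunk exactly one grain long; later snapshot writes
    overwrite earlier ones at the same grain.
    """
    flat = bytearray(base)
    for delta in deltas:
        for idx, data in delta.items():
            flat[idx * grain:(idx + 1) * grain] = data
    return bytes(flat)

# Base disk of four 2-byte grains, then two snapshot deltas.
base = b"aabbccdd"
deltas = [{1: b"XX"}, {1: b"YY", 3: b"ZZ"}]  # newest delta wins on grain 1
print(consolidate(base, deltas, grain=2))  # b'aaYYccZZ'
```

Grains never touched by any snapshot fall through to the base extent, which is why a broken base disk corrupts every snapshot above it.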

ESXi Storage Media: Recovery Differences by Drive Type

Modern ESXi hosts use a mix of NVMe SSDs, SAS HDDs, and SATA SSDs. Each type presents different failure modes and recovery constraints. Conflating them leads to wrong procedures and lost data.

NVMe SSDs (PCIe-Attached)

Enterprise NVMe drives (Intel P5510, Samsung PM9A3, Micron 7450) use U.2 or E1.S/E3.S form factors with BGA controller and NAND packages soldered to the PCB. The controller encrypts the FTL mapping table and binds it to the specific controller silicon, so swapping the PCB to a donor board does not restore data access. If the NVMe controller fails, the drive requires PC-3000 SSD with the NVMe adapter to access the controller at the register level and attempt FTL reconstruction. TRIM/UNMAP on VMFS-6 is enabled by default for NVMe datastores; deleted VM data on flash is unrecoverable once TRIM executes.

SAS HDDs (Enterprise Spinners)

Enterprise SAS drives (Seagate Exos, Toshiba AL series, WD Ultrastar) in RAID arrays are the most common storage behind ESXi datastores. Head failures, firmware corruption, and Unrecoverable Read Errors (UREs) are the primary failure modes. Recovery follows standard HDD procedures: clean bench head swap from a matched donor, PC-3000 Express for firmware repair, and DeepSpar Disk Imager for sector-level stabilization. The 0.02 micron ULPA-filtered bench handles all open-drive work.

SATA SSDs (vSAN / Boot Media)

SATA SSDs used as vSAN cache tiers or ESXi boot devices use standard 2.5-inch form factors but share the same controller encryption constraints as NVMe. A failed Phison, Silicon Motion, or Samsung controller locks the FTL. PC-3000 SSD provides direct controller access for supported chipsets. SATA SSDs do not require clean bench work; there are no mechanical components to contaminate.

ESXi PSOD Recovery Pricing

Pricing is per-drive, based on the physical condition of each member in the array. No diagnostic fees. No data recovered = no charge.

Service Tier · Complexity · Price

Simple Copy · Low complexity · $100
  Your drive works; you just need the data moved off it.
  Functional drive; data transfer to new media.
  Rush available: +$100.

File System Recovery · Low complexity · From $250
  Your drive isn't recognized by your computer, but it's not making unusual sounds.
  File system corruption: accessible with professional recovery software but not by the OS.
  Starting price; final cost depends on complexity.

Firmware Repair · Medium complexity (PC-3000 required) · $600–$900
  Your drive is completely inaccessible. It may be detected but shows the wrong size or won't respond.
  Firmware corruption: ROM, modules, or translator tables corrupted; requires PC-3000 terminal access.
  Standard drives at the lower end; high-density drives at the higher end.

Head Swap · High complexity (clean bench surgery) · $1,200–$1,500, 50% deposit
  Your drive is clicking, beeping, or won't spin; the internal read/write heads have failed.
  Head stack assembly failure: transplanting heads from a matching donor drive on a clean bench.
  50% deposit required; donor parts are consumed in the repair.

Surface / Platter Damage · High complexity (clean bench surgery) · $2,000, 50% deposit
  Your drive was dropped, has visible damage, or a head crash scraped the platters.
  Platter scoring or contamination: requires platter cleaning and head swap.
  50% deposit required; donor parts are consumed in the repair. Most difficult recovery type.

Hardware Repair vs. Software Locks

Our "no data, no fee" policy applies to hardware recovery. We do not bill for unsuccessful physical repairs. If we replace a hard drive read/write head assembly or repair a liquid-damaged logic board to a bootable state, the hardware repair is complete and standard rates apply. If data remains inaccessible due to user-configured software locks, a forgotten passcode, or a remote wipe command, the physical repair is still billable. We cannot bypass user encryption or activation locks.

All tiers: Free evaluation and firm quote before any paid work. No data, no fee on simple copy, file system, and firmware tiers. Head swap and surface damage require a 50% deposit because donor parts are consumed in the attempt.

Target drive: The destination drive we copy recovered data onto. You can supply your own or we provide one at cost. For ultra-high-capacity drives (20TB and above), the target drive costs approximately $400+ due to the large media required. All prices are plus applicable tax.

Data Recovery Standards & Verification

Our Austin lab operates on a transparency-first model. We use industry-standard recovery tools, including PC-3000 and DeepSpar, combined with strict environmental controls to make sure your hard drive is handled safely and properly. This approach allows us to serve clients nationwide with consistent technical standards.

Open-drive work is performed in a ULPA-filtered laminar-flow bench, validated to 0.02 µm particle count, verified using TSI P-Trak instrumentation.

Transparent History

Serving clients nationwide via mail-in service since 2008. Our lead engineer holds PC-3000 and HEX Akademia certifications for hard drive firmware repair and mechanical recovery.

Media Coverage

Our repair work has been covered by The Wall Street Journal and Business Insider, with CBC News reporting on our pricing transparency. Louis Rossmann has testified in Right to Repair hearings in multiple states and founded the Repair Preservation Group.

Aligned Incentives

Our "No Data, No Charge" policy means we assume the risk of the recovery attempt, not the client.


Louis Rossmann

Louis Rossmann's well-trained staff review our lab protocols to ensure technical accuracy and honest service. Since 2008, his focus has been on clear technical communication and accurate diagnostics rather than sales-driven explanations.

We believe in proving standards rather than just stating them. We use TSI P-Trak instrumentation to verify that clean-air benchmarks are met before any drive is opened.

See our clean bench validation data and particle test video

What Server & RAID Recovery Customers Say

4.9 across 1,837+ verified Google reviews
Had a raid 0 array (windows storage pool) (failed 2tb Seagate, and a working 1tb wd blue) recovered last year, it was much cheaper than the $1500 to $3500 Canadian dollars i was quoted by a Canadian data recovery service. the price while expensive was a comparatively reasonable $900USD (about $1100 CAD at the time). they had very good communication with me about the status of my recovery and were extremely professional. the drive they sent back was Very well packaged. I would 100% have a drive recovered by them again if i ever needed to again.
Christopolis (Seagate)
HIGHLIGHT & CONCLUSION ******Overall I'm having a good experience with this store because they have great customer services, best third party replacement parts, justify price for those replacement parts, short estimate waiting time to fix the device, 1 year warranty, and good prediction of pricing and the device life conditions whether it can fix it or not.
Yuong Huao Ng Liang (iPhone)
Didn't *fix* my issue but a great experience. Shipped a drive from an old NAS whose board had failed. Rossmann Repair wanted to go straight for data extraction (~$600-900). Did some research on my own and discovered the file table was Linux based and asked if they could take a look. They said that their decision still stands and would only go straight for data recovery.
Mac Hancock
I've been following the YouTube tutorials since my family and I were in India on business. My son spilled Geteraid on my keyboard and my computer wouldn't come on after I opened it and cleaned it, laying it upside down for a week. To make the story short I took my computer to the shop while I'm in New York on business and did charged me $45.00 for a rush assessment.
Rudy Gonzalez (MacBook Air)

ESXi PSOD Recovery: Common Questions

What causes a Purple Screen of Death (PSOD) on VMware ESXi?
A PSOD is the ESXi equivalent of a kernel panic. The most common hardware triggers are Machine Check Exceptions (MCE) from uncorrectable CPU cache or memory controller errors, and Non-Maskable Interrupts (NMI) from PCIe device failures or watchdog timeouts. Software triggers include VMFS-6 metadata race conditions, faulty storage HBA driver modules, and out-of-heap-memory conditions in the vmkernel. The PSOD screen displays the faulting module, an error code, and a register dump. The MCi_STATUS hex code on the screen identifies whether the fault is a physical hardware failure or a software panic.
Can I swap the ESXi host motherboard to fix a hardware-triggered PSOD?
On ESXi 7.0 Update 2 and later, replacing the motherboard triggers a TPM 2.0 Security Violation PSOD if the host configuration was sealed to the original TPM chip. The ESXi configuration is cryptographically bound to the physical TPM. Without the encryption recovery key (exported during initial setup), the host will enter a permanent PSOD loop on every boot with the replacement board. The correct approach is to export the recovery key before any motherboard swap, or extract data directly from the VMFS volumes on the underlying storage without booting the host.
Should I run VOMA or filesystem repair tools on VMFS after a PSOD?
The vSphere On-disk Metadata Analyzer (VOMA) in its default check mode is read-only and cannot repair what it detects. Running VOMA on a volume backed by physically failing drives generates additional I/O that can worsen the failure. Never run chkdsk, fsck, or any write-mode filesystem repair on a VMFS datastore when the underlying RAID array has degraded drives. These utilities assume healthy physical media and will overwrite metadata structures needed for forensic reconstruction. Image the drives read-only first.
Should I rebuild the RAID array after a PSOD caused by a drive failure?
Rebuilding a degraded RAID 5 or RAID 6 array after a PSOD frequently causes permanent data loss in enterprise environments. The rebuild reads every sector of every surviving drive to recalculate parity. On large drives (4TB+), the probability of hitting an Unrecoverable Read Error during this full-disk sequential read is high enough that the rebuild itself can fail, taking the entire array offline. Image all drives individually before attempting any rebuild. Per-drive imaging costs start at $250; a failed rebuild costs everything.
Can deleted VMDKs be recovered from VMFS-6 on flash storage?
If the guest OS issued a delete and VMFS-6 passed the SCSI UNMAP command to the underlying SSD or flash array, the controller marks those LBAs for garbage collection. Reads to unmapped LBAs return zeros to software, and the NAND is physically erased during the controller's next garbage collection cycle. Once GC completes, the deleted VMDK data is unrecoverable. If UNMAP is disabled on the VMFS-6 datastore (not the default), or the underlying storage is spinning disk, recovery of deleted VMDKs depends on whether the freed VMFS blocks have been reallocated to other files.
How much does data recovery after an ESXi PSOD cost?
Pricing follows the same transparent per-drive model as all our services. Each drive in the array is assessed individually: $250+ for filesystem-level issues, $600–$900 for firmware repair, $1,200–$1,500 for mechanical failure requiring a head swap. No data recovered = no charge.

ESXi Host Down? Ship the Drives.

Free evaluation. Per-drive imaging. No data = no fee. We reconstruct the RAID and extract your VMDKs without booting the host.