Enterprise Hypervisor Recovery

VMware ESXi Purple Screen of Death (PSOD) Data Recovery

A PSOD is an ESXi kernel panic. The host is down. Your VMs are inaccessible. Acting on the instinct to reboot and rebuild the RAID will make things worse. We image every drive in the array read-only, reconstruct the RAID offline, parse the VMFS volume, and extract your VMDKs. Free evaluation. No data = no charge.

Written by Louis Rossmann, Founder & Chief Technician
Updated March 2026 · 12 min read

What Happens to Your Data During a PSOD

The Purple Screen of Death halts the ESXi vmkernel, freezing all I/O to the underlying storage. VMs stop mid-write. VMFS journal entries are left uncommitted. If the PSOD was triggered by a storage controller failure, the RAID array itself may be degraded with one or more drives offline. The data is still on the platters or NAND, but accessing it requires bypassing the dead hypervisor entirely and working at the physical storage layer.

Rebooting the host and allowing ESXi to auto-rebuild a degraded array is the most common path to permanent data loss. The rebuild reads every sector of every surviving drive; a single Unrecoverable Read Error on any member causes the entire rebuild to abort, and now parity is destroyed. The correct response is to power down the host, remove the drives, and image each drive individually using hardware-level forensic tools before touching the array configuration.
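The arithmetic behind that warning can be made concrete. A rough sketch in Python, assuming the vendor-quoted URE rate of one error per 10^15 bits read (typical for enterprise drives) applies uniformly across the read; the drive size and array layout in the example are hypothetical, and real drives cluster errors, so treat this as a back-of-envelope estimate rather than a measured rate:

```python
# Probability of hitting at least one Unrecoverable Read Error (URE)
# while reading every surviving drive end to end during a RAID rebuild.
# Assumes a uniform per-bit URE rate (a simplification of vendor specs).

def rebuild_ure_probability(drive_tb: float, surviving_drives: int,
                            ure_rate: float = 1e-15) -> float:
    """P(at least one URE) across a full sequential read of all survivors."""
    bits_read = drive_tb * 1e12 * 8 * surviving_drives
    return 1.0 - (1.0 - ure_rate) ** bits_read

# Degraded 4-drive RAID 5 of 8 TB disks: the rebuild must read
# all three survivors in full.
p = rebuild_ure_probability(drive_tb=8, surviving_drives=3)
print(f"{p:.1%}")  # roughly a 1-in-6 chance the rebuild hits a URE
```

At larger drive sizes or wider arrays the exponent grows linearly, which is why the failure-during-rebuild numbers get ugly fast on modern capacities.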

PSOD Causes: Hardware MCE vs. Software Panics

The PSOD screen contains diagnostic data that determines whether the host experienced a physical hardware failure or a software bug. The distinction matters because hardware faults require physical intervention on the storage, while software panics may allow a clean reboot if the underlying disks are healthy.

Machine Check Exceptions (MCE)

An MCE is a hardware interrupt generated by the CPU when it detects an uncorrectable error in the cache hierarchy, memory controller, or I/O subsystem. The PSOD screen displays the MCi_STATUS register value as a hex code. The lower 16 bits encode the error type: cache errors, TLB errors, bus/interconnect errors, or memory controller errors. An MCE-triggered PSOD indicates a physical fault in the server hardware. The CPU, DIMMs, or motherboard VRMs need diagnosis before the host can be safely restarted.
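As a triage aid, classifying those low 16 bits can be sketched in a few lines. The bit patterns below follow the compound error-code encodings in the Intel SDM's Machine-Check Architecture chapter; the sample MCi_STATUS value is hypothetical, and a real diagnosis should still go through the CPU vendor's full decoder:

```python
# Rough classifier for the MCA compound error code in the low 16 bits
# of an MCi_STATUS value photographed off a PSOD screen.
# Encodings per the Intel SDM (Machine-Check Architecture chapter).

def classify_mca_error(mci_status: int) -> str:
    code = mci_status & 0xFFFF          # MCA error code field
    if code & 0x0800:                   # 0000 1PPT RRRR IILL
        return "bus/interconnect error"
    if (code & 0xFF00) == 0x0100:       # 0000 0001 RRRR TTLL
        return "cache hierarchy error"
    if (code & 0xFF80) == 0x0080:       # 0000 0000 1MMM CCCC
        return "memory controller error"
    if (code & 0xFFF0) == 0x0010:       # 0000 0000 0001 TTLL
        return "TLB error"
    return "simple/unclassified error"

# Hypothetical status value; low 16 bits 0x0135 match the cache pattern.
print(classify_mca_error(0xBD00000000000135))  # cache hierarchy error
```

A cache or memory controller result points at the CPU/DIMM side; a bus/interconnect result usually means a PCIe device or the board itself.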

Non-Maskable Interrupts (NMI)

An NMI is a hardware interrupt that cannot be deferred by the CPU. ESXi triggers a PSOD on NMI when a PCIe device (storage controller, NIC, GPU passthrough) reports an uncorrectable error via the PCIe Advanced Error Reporting (AER) mechanism, or when a hardware watchdog timer expires because the vmkernel stopped responding. Storage controller NMIs are common when a RAID card loses its battery-backed cache or a SAS expander encounters a PHY link error cascade.

Software Kernel Panics

Not all PSODs originate from hardware. ESXi 8.0.1 hosts can experience a PSOD with #PF Exception 14 during intensive I/O on VMFS-6 volumes, caused by a memory race condition in the res3HelperQu and FSUnmapManag worlds. This is a software bug, not a drive failure. The fix is upgrading to ESXi 8.0 Update 2 or later. If the underlying storage is healthy, data should be accessible after the patch. Verify drive health with SMART before rebooting.

HBA Firmware Corruption

Certain QLogic QLE269x HBA firmware versions contain a bug that replays stale I/O requests to freed memory locations. This corrupts the DMA heap, which propagates silent write errors into VMFS-6 metadata on disk before eventually triggering a host PSOD. The danger: the VMFS corruption happened before the crash, not during it. VOMA may report allocation bitmap inconsistencies that predate the PSOD event. Recovery requires manual reconstruction of the VMDK descriptor chains from the raw VMFS volume.
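A VMDK descriptor is a small plain-text file, so a minimal parser shows what "reconstructing the descriptor chain" actually works from. This is a Python sketch of the published descriptor layout (CID, parentCID, extent lines); the file name and CID values in the sample are made up:

```python
import re

# Minimal parser for a VMDK descriptor file, the small text file that
# maps a virtual disk to its -flat/-delta extent files. A parentCID of
# ffffffff marks the base disk (no snapshot parent above it).

EXTENT_RE = re.compile(r'^(RW|RDONLY|NOACCESS)\s+(\d+)\s+(\w+)\s+"([^"]+)"')

def parse_descriptor(text: str) -> dict:
    info = {"extents": []}
    for line in text.splitlines():
        line = line.strip()
        m = EXTENT_RE.match(line)
        if m:
            info["extents"].append(
                {"access": m.group(1), "sectors": int(m.group(2)),
                 "type": m.group(3), "file": m.group(4)})
        elif "=" in line and not line.startswith("#"):
            key, _, val = line.partition("=")
            info[key.strip()] = val.strip().strip('"')
    return info

sample = '''# Disk DescriptorFile
version=1
CID=fb183c20
parentCID=ffffffff
createType="vmfs"

RW 41943040 VMFS "web01-flat.vmdk"
'''
d = parse_descriptor(sample)
print(d["CID"], d["parentCID"], d["extents"][0]["file"])
```

When HBA corruption has shredded the on-disk descriptors, recovery means regenerating files like this by hand from surviving extent data and chaining parentCID references back to a base disk.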

TPM 2.0 Lockout After Motherboard Replacement

Starting with vSphere 7.0 Update 2, ESXi seals its host configuration to the physical TPM 2.0 chip on the motherboard. This means the ESXi configuration partition, local user database, and encryption keys are cryptographically bound to one specific piece of silicon.

If an MCE-triggered PSOD requires motherboard replacement, the new board's TPM will not match the sealed configuration. ESXi will boot into a permanent “Security Violation” PSOD loop. Nuvoton (NTC) and NationZ (NTZ) TPM chips will enter Dictionary Attack lockout mode after repeated failed attestation attempts.

Immediate Steps After an ESXi PSOD

The first 30 minutes after a PSOD determine whether the data is recoverable. Follow this sequence:

  1. Photograph the purple screen. Capture the full PSOD output, including the error code, faulting module name, and any MCi_STATUS hex values. This information identifies whether the fault is hardware or software.
  2. Do not reboot into automatic RAID rebuild. If the PSOD was caused by a storage controller failure, one or more drives may be marked offline. Rebooting allows the controller to trigger a rebuild on the degraded array. A rebuild on physically failing drives will destroy parity data needed for recovery.
  3. Extract the vmkernel core dump if possible. If the host reboots (some PSODs auto-reboot after a timeout), extract the vmkernel-zdump file via SSH or the DCUI. This dump contains the full register state and stack trace for root cause analysis.
  4. Check RAID controller status before any storage operations. Log into the RAID controller management interface (iDRAC, iLO, MegaCLI, storcli) and document which drives are online, offline, rebuilding, or foreign. Do not clear foreign configurations.
  5. Power down and ship the drives for imaging. If any physical drive has failed, or if the PSOD recurs on reboot, power down the host. Label each drive with its bay/slot number. Ship them for read-only forensic imaging.

How We Recover Data After an ESXi PSOD

Our recovery process works at the physical storage layer, bypassing the dead or unstable ESXi host entirely. No RAID rebuilds. No filesystem repair utilities on degraded media.

1. Per-Drive Forensic Imaging

Each drive from the array is connected to a PC-3000 Express (for SATA/SAS drives) or PC-3000 Portable III (for NVMe). We create a sector-by-sector clone, applying head maps to skip unstable regions on failing heads. Drives with firmware corruption get terminal-level access to rebuild the translator module before imaging. The original drives are never written to.

2. Offline RAID Reconstruction

Using the cloned images, we reconstruct the RAID array offline. The RAID metadata (DDF headers, Dell PERC configuration blocks, HP Smart Array metadata, mdadm superblocks) is parsed from the drive images to determine stripe size, rotation direction, parity distribution, and member ordering. No hardware RAID controller is needed; reconstruction happens entirely in software from the forensic images.
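The XOR arithmetic at the heart of offline RAID 5 reconstruction can be sketched briefly. This toy version runs on in-memory "images" and takes the stripe size and member order as givens, since in practice those come from the parsed DDF/PERC/mdadm metadata; it is a sketch of the principle, not our production tooling:

```python
import os

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

def rebuild_missing_member(images, missing, n_members, stripe):
    """Regenerate a missing RAID 5 member from the survivors.

    Every full stripe XORs to zero (parity included), so the missing
    member at each offset is simply the XOR of all surviving members
    at that offset, regardless of where the rotating parity lives.
    """
    length = len(next(iter(images.values())))
    rebuilt = bytearray()
    for off in range(0, length, stripe):
        units = [images[m][off:off + stripe]
                 for m in range(n_members) if m != missing]
        rebuilt += xor_blocks(units)
    return bytes(rebuilt)

# Toy 3-member array: members 0 and 1 hold data, member 2 the parity
# (hypothetical 64 KiB stripe unit; real values come from the metadata).
stripe = 64 * 1024
d0, d1 = os.urandom(2 * stripe), os.urandom(2 * stripe)
parity = xor_blocks([d0, d1])
recovered = rebuild_missing_member({0: d0, 2: parity},
                                   missing=1, n_members=3, stripe=stripe)
assert recovered == d1  # member 1 regenerated exactly
```

The key property is that the reconstruction is purely computational on the forensic images: no controller, and no writes to any original drive.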

3. VMFS Volume Parsing

With the virtual RAID volume reconstructed, we parse the VMFS-5 or VMFS-6 metadata structures: the volume header, resource bitmap, file descriptor heap, and pointer block chains. VMFS uses 1MB blocks with sub-block allocation for small files. Each .vmdk descriptor file references its flat extent data through pointer blocks. We locate every VMDK and its associated delta files (snapshots), even if the VMFS allocation bitmap is partially corrupted.
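When the allocation bitmap is too damaged to trust, one fallback is signature carving: every VMDK descriptor begins with the ASCII line "# Disk DescriptorFile", so a raw scan of the reconstructed volume finds them even when directory metadata is gone. A hedged Python sketch; the sector-granularity alignment and the sample volume are assumptions, and full VMFS parsing walks the file descriptor heap and pointer blocks rather than scanning:

```python
# Signature-carve VMDK descriptor files out of a raw (reconstructed)
# volume image. Descriptors are small text files, so VMFS may place
# them in sub-blocks; scanning at sector alignment avoids missing them.

SIG = b"# Disk DescriptorFile"

def carve_descriptors(volume: bytes, align: int = 512):
    """Yield (offset, text) for each descriptor signature found."""
    for off in range(0, len(volume) - len(SIG) + 1, align):
        if volume[off:off + len(SIG)] == SIG:
            end = volume.find(b"\x00", off)
            if end == -1:
                end = min(off + 4096, len(volume))
            yield off, volume[off:end].decode("ascii", errors="replace")

# Hypothetical 4 MiB slice of a reconstructed volume with one descriptor.
vol = bytearray(4 * 1024 * 1024)
desc = b"# Disk DescriptorFile\nversion=1\nCID=fb183c20\n"
vol[1024 * 1024:1024 * 1024 + len(desc)] = desc
hits = list(carve_descriptors(bytes(vol)))
print(hits[0][0])  # 1048576
```

Each carved descriptor then points (by extent file name) at the flat or delta data that still has to be located through the pointer-block chains.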

4. VMDK Extraction and Delivery

Recovered VMDKs are validated by mounting them read-only and verifying the guest filesystem (NTFS, ext4, XFS) is consistent. If the VM had active snapshots, we reconstruct the snapshot chain by reading grain tables (VMFS sparse) or SE sparse extent maps and consolidating the delta writes into a single flat image representing the VM's last consistent state. Delivered on encrypted external media or via secure transfer.
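The consolidation step amounts to overlaying grains: start from the base flat extent and apply each delta's written grains oldest-to-newest, so the newest write to any grain wins. A simplified Python sketch that abstracts the on-disk grain tables as plain index-to-data maps; the grain size and sample data are hypothetical (real VMFS sparse grains are fixed-size and addressed through grain directories and tables):

```python
def consolidate(base: bytes, deltas, grain: int) -> bytes:
    """Flatten a snapshot chain onto its base extent.

    deltas is an oldest-to-newest list of {grain_index: data} maps,
    each data chunk exactly one grain long; later snapshot writes
    overwrite earlier ones at the same grain.
    """
    flat = bytearray(base)
    for delta in deltas:
        for idx, data in delta.items():
            flat[idx * grain:(idx + 1) * grain] = data
    return bytes(flat)

# Base disk of four 2-byte grains, then two snapshot deltas.
base = b"aabbccdd"
deltas = [{1: b"XX"}, {1: b"YY", 3: b"ZZ"}]  # newest delta wins on grain 1
print(consolidate(base, deltas, grain=2))  # b'aaYYccZZ'
```

Grains never touched by any snapshot fall through to the base extent, which is why a broken base disk corrupts every snapshot above it.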

ESXi Storage Media: Recovery Differences by Drive Type

Modern ESXi hosts use a mix of NVMe SSDs, SAS HDDs, and SATA SSDs. Each type presents different failure modes and recovery constraints. Conflating them leads to wrong procedures and lost data.

NVMe SSDs (PCIe-Attached)

Enterprise NVMe drives (Intel P5510, Samsung PM9A3, Micron 7450) use U.2 or E1.S/E3.S form factors with BGA controller and NAND packages soldered to the PCB. The controller encrypts the FTL mapping table and binds it to the specific controller silicon, so swapping the PCB to a donor board does not restore data access. If the NVMe controller fails, the drive requires PC-3000 SSD with the NVMe adapter to access the controller at the register level and attempt FTL reconstruction. TRIM/UNMAP on VMFS-6 is enabled by default for NVMe datastores; deleted VM data on flash is unrecoverable once TRIM executes.

SAS HDDs (Enterprise Spinners)

Enterprise SAS drives (Seagate Exos, Toshiba AL series, WD Ultrastar) in RAID arrays are the most common storage behind ESXi datastores. Head failures, firmware corruption, and Unrecoverable Read Errors (UREs) are the primary failure modes. Recovery follows standard HDD procedures: clean bench head swap from a matched donor, PC-3000 Express for firmware repair, and DeepSpar Disk Imager for sector-level stabilization. The 0.02 micron ULPA-filtered bench handles all open-drive work.

SATA SSDs (vSAN / Boot Media)

SATA SSDs used as vSAN cache tiers or ESXi boot devices use standard 2.5-inch form factors but share the same controller encryption constraints as NVMe. A failed Phison, Silicon Motion, or Samsung controller locks the FTL. PC-3000 SSD provides direct controller access for supported chipsets. SATA SSDs do not require clean bench work; there are no mechanical components to contaminate.

ESXi PSOD Recovery Pricing

Pricing is per-drive, based on the physical condition of each member in the array. No diagnostic fees. No data recovered = no charge.

Service Tier · Complexity · Price

Simple Copy · Low complexity · $100
  Your drive works; you just need the data moved off it.
  Functional drive; data transfer to new media.
  Rush available: +$100.

File System Recovery · Low complexity · From $250
  Your drive isn't recognized by your computer, but it's not making unusual sounds.
  File system corruption: accessible with professional recovery software but not by the OS.
  Starting price; final cost depends on complexity.

Firmware Repair · Medium complexity (PC-3000 required) · $600–$900
  Your drive is completely inaccessible. It may be detected but shows the wrong size or won't respond.
  Firmware corruption: ROM, modules, or translator tables corrupted; requires PC-3000 terminal access.
  Standard drives at the lower end; high-density drives at the higher end.

Head Swap · High complexity (clean bench surgery) · $1,200–$1,500, 50% deposit
  Your drive is clicking, beeping, or won't spin; the internal read/write heads have failed.
  Head stack assembly failure: transplanting heads from a matching donor drive on a clean bench.
  50% deposit required; donor parts are consumed in the repair.

Surface / Platter Damage · High complexity (clean bench surgery) · $2,000, 50% deposit
  Your drive was dropped, has visible damage, or a head crash scraped the platters.
  Platter scoring or contamination: requires platter cleaning and head swap.
  50% deposit required; donor parts are consumed in the repair. Most difficult recovery type.

Hardware Repair vs. Software Locks

Our "no data, no fee" policy applies to hardware recovery. We do not bill for unsuccessful physical repairs. If we replace a hard drive read/write head assembly or repair a liquid-damaged logic board to a bootable state, the hardware repair is complete and standard rates apply. If data remains inaccessible due to user-configured software locks, a forgotten passcode, or a remote wipe command, the physical repair is still billable. We cannot bypass user encryption or activation locks.

All tiers: Free evaluation and firm quote before any paid work. No data, no fee on simple copy, file system, and firmware tiers. Head swap and surface damage require a 50% deposit because donor parts are consumed in the attempt.

Target drive: The destination drive we copy recovered data onto. You can supply your own or we provide one at cost. For ultra-high-capacity drives (20TB and above), the target drive costs approximately $400+ due to the large media required. All prices are plus applicable tax.

Data Recovery Standards & Verification

Our Austin lab operates on a transparency-first model. We use industry-standard recovery tools, including PC-3000 and DeepSpar, combined with strict environmental controls to make sure your hard drive is handled safely and properly. This approach allows us to serve clients nationwide with consistent technical standards.

Open-drive work is performed in a ULPA-filtered laminar-flow bench, validated to 0.02 µm particle count, verified using TSI P-Trak instrumentation.

Transparent History

Serving clients nationwide via mail-in service since 2008. Our lead engineer holds PC-3000 and HEX Akademia certifications for hard drive firmware repair and mechanical recovery.

Media Coverage

Our repair work has been covered by The Wall Street Journal and Business Insider, with CBC News reporting on our pricing transparency. Louis Rossmann has testified in Right to Repair hearings in multiple states and founded the Repair Preservation Group.

Aligned Incentives

Our "No Data, No Charge" policy means we assume the risk of the recovery attempt, not the client.


Louis Rossmann

Louis Rossmann's well-trained staff review our lab protocols to ensure technical accuracy and honest service. Since 2008, his focus has been on clear technical communication and accurate diagnostics rather than sales-driven explanations.

We believe in proving standards rather than just stating them. We use TSI P-Trak instrumentation to verify that clean-air benchmarks are met before any drive is opened.

See our clean bench validation data and particle test video

What Server & RAID Recovery Customers Say

4.9 across 1,837+ verified Google reviews
Had a raid 0 array (windows storage pool) (failed 2tb Seagate, and a working 1tb wd blue) recovered last year, it was much cheaper than the $1500 to $3500 Canadian dollars i was quoted by a Canadian data recovery service. the price while expensive was a comparatively reasonable $900USD (about $1100 CAD at the time). they had very good communication with me about the status of my recovery and were extremely professional. the drive they sent back was Very well packaged. I would 100% have a drive recovered by them again if i ever needed to again.
Christopolis (Seagate)
HIGHLIGHT & CONCLUSION ******Overall I'm having a good experience with this store because they have great customer services, best third party replacement parts, justify price for those replacement parts, short estimate waiting time to fix the device, 1 year warranty, and good prediction of pricing and the device life conditions whether it can fix it or not.
Yuong Huao Ng Liang (iPhone)
Didn't *fix* my issue but a great experience. Shipped a drive from an old NAS whose board had failed. Rossmann Repair wanted to go straight for data extraction (~$600-900). Did some research on my own and discovered the file table was Linux based and asked if they could take a look. They said that their decision still stands and would only go straight for data recovery.
Mac Hancock
I've been following the YouTube tutorials since my family and I were in India on business. My son spilled Geteraid on my keyboard and my computer wouldn't come on after I opened it and cleaned it, laying it upside down for a week. To make the story short I took my computer to the shop while I'm in New York on business and did charged me $45.00 for a rush assessment.
Rudy Gonzalez (MacBook Air)

ESXi PSOD Recovery: Common Questions

What causes a Purple Screen of Death (PSOD) on VMware ESXi?
A PSOD is the ESXi equivalent of a kernel panic. The most common hardware triggers are Machine Check Exceptions (MCE) from uncorrectable CPU cache or memory controller errors, and Non-Maskable Interrupts (NMI) from PCIe device failures or watchdog timeouts. Software triggers include VMFS-6 metadata race conditions, faulty storage HBA driver modules, and out-of-heap-memory conditions in the vmkernel. The PSOD screen displays the faulting module, an error code, and a register dump. The MCi_STATUS hex code on the screen identifies whether the fault is a physical hardware failure or a software panic.
Can I swap the ESXi host motherboard to fix a hardware-triggered PSOD?
On ESXi 7.0 Update 2 and later, replacing the motherboard triggers a TPM 2.0 Security Violation PSOD if the host configuration was sealed to the original TPM chip. The ESXi configuration is cryptographically bound to the physical TPM. Without the encryption recovery key (exported during initial setup), the host will enter a permanent PSOD loop on every boot with the replacement board. The correct approach is to export the recovery key before any motherboard swap, or extract data directly from the VMFS volumes on the underlying storage without booting the host.
Should I run VOMA or filesystem repair tools on VMFS after a PSOD?
The vSphere On-disk Metadata Analyzer (VOMA) in its default check mode is read-only and cannot repair what it detects. Running VOMA on a volume backed by physically failing drives generates additional I/O that can worsen the failure. Never run chkdsk, fsck, or any write-mode filesystem repair on a VMFS datastore when the underlying RAID array has degraded drives. These utilities assume healthy physical media and will overwrite metadata structures needed for forensic reconstruction. Image the drives read-only first.
Should I rebuild the RAID array after a PSOD caused by a drive failure?
Rebuilding a degraded RAID 5 or RAID 6 array after a PSOD frequently causes permanent data loss in enterprise environments. The rebuild reads every sector of every surviving drive to recalculate parity. On large drives (4TB+), the probability of hitting an Unrecoverable Read Error during this full-disk sequential read is high enough that the rebuild itself can fail, taking the entire array offline. Image all drives individually before attempting any rebuild. Per-drive imaging costs start at $250; a failed rebuild costs everything.
Can deleted VMDKs be recovered from VMFS-6 on flash storage?
If the guest OS issued a delete and VMFS-6 passed the SCSI UNMAP command to the underlying SSD or flash array, the controller marks those LBAs for garbage collection. Reads to unmapped LBAs return zeros to software, and the NAND is physically erased during the controller's next garbage collection cycle. Once GC completes, the deleted VMDK data is unrecoverable. If UNMAP is disabled on the VMFS-6 datastore (not the default), or the underlying storage is spinning disk, recovery of deleted VMDKs depends on whether the freed VMFS blocks have been reallocated to other files.
How much does data recovery after an ESXi PSOD cost?
Pricing follows the same transparent per-drive model as all our services. Each drive in the array is assessed individually: $250+ for filesystem-level issues, $600–$900 for firmware repair, $1,200–$1,500 for mechanical failure requiring a head swap. No data recovered = no charge.

ESXi Host Down? Ship the Drives.

Free evaluation. Per-drive imaging. No data = no fee. We reconstruct the RAID and extract your VMDKs without booting the host.