
Pool Recovery & RAIDZ Reconstruction

ZFS Data Recovery

We recover faulted ZFS pools by imaging every drive, parsing vdev labels and uberblocks, and importing the pool offline from cloned images. Covers TrueNAS, FreeNAS, Proxmox, Oracle Solaris, and Linux OpenZFS. Free evaluation. No data = no charge.

Written by Louis Rossmann, Founder & Chief Technician
Updated March 2026 · 16 min read

How ZFS Pools Fail and How We Reconstruct Them

ZFS pool recovery requires imaging every member drive, locating valid uberblocks in the 128-entry ring on each drive's ZFS labels, and importing the pool read-only from cloned images to extract datasets and zvols. Pools transition to FAULTED when enough drives in a vdev fail that ZFS cannot reconstruct missing blocks from parity.

ZFS stores all data and metadata in a Merkle tree rooted at the uberblock. Each pool contains one or more vdevs (mirror, RAIDZ1, RAIDZ2, or RAIDZ3), and each vdev distributes data across its member drives using variable-width stripes.

ZFS checksums every block, using fletcher4 by default and SHA-256 when deduplication is enabled. This per-block verification catches silent corruption that traditional RAID controllers miss. The trade-off: when ZFS detects a checksum mismatch and cannot correct it from parity, it returns an I/O error instead of serving corrupt data. Scrub errors on a degraded pool signal that more blocks will become inaccessible if another drive fails.

ZFS On-Disk Architecture for Recovery Engineers

Targeted ZFS recovery requires understanding five on-disk structures that standard file-carving tools do not handle. Each structure has a specific role in the pool's self-describing metadata tree, and corruption at any level produces different symptoms. These five structures are vdev labels, the uberblock ring, DVA block pointers, dnode objects, and space maps.

Vdev Labels (L0, L1, L2, L3)

Every drive in a ZFS pool carries four label copies: L0 and L1 at the first 512 KB, L2 and L3 at the last 512 KB. Each label contains the pool GUID, vdev GUID, the full vdev tree encoded as an nvlist, and the uberblock ring (128 entries, 1 KB each). ZFS writes labels in a specific order to prevent all four from being corrupted by a single interrupted write. During recovery, we read all four labels from every drive image to find the most complete vdev tree and the highest-txg uberblock.
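For engineers following along, the label geometry can be computed directly. This is a minimal Python sketch (not our production tooling) that assumes the documented 256 KB label size and ZFS's round-down alignment of the device size:

```python
LABEL_SIZE = 256 * 1024  # each ZFS label is 256 KB per the on-disk spec

def label_offsets(device_size: int) -> list[int]:
    """Byte offsets of labels L0-L3 on a device of the given size.

    L0/L1 occupy the first 512 KB; L2/L3 the last 512 KB. The device
    size is rounded down to the label boundary, as ZFS does, so the
    trailing labels stay aligned even on odd-sized images.
    """
    aligned = (device_size // LABEL_SIZE) * LABEL_SIZE
    return [0, LABEL_SIZE, aligned - 2 * LABEL_SIZE, aligned - LABEL_SIZE]

# A 4 TB (4,000,787,030,016-byte) drive image:
print(label_offsets(4_000_787_030_016))
```

Reading 256 KB at each of those four offsets from every drive image gives the full set of label candidates to compare.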

Uberblock Ring and Transaction Groups

The uberblock is the root pointer for the entire pool. ZFS maintains a ring buffer of 128 uberblocks per label, written round-robin as each transaction group (TXG) commits. Each uberblock records the TXG number, a timestamp, and a block pointer to the Meta Object Set (MOS). The active uberblock is the one with the highest TXG that also has a valid checksum.

When the latest TXG is corrupted, we walk backward through the ring to find an older, consistent state. The trade-off: rolling back to an earlier TXG means any data written after that TXG is lost. For a pool that saw 10 TXGs of write activity before the failure, that window is usually seconds to minutes of data.
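Walking the ring can be sketched in a few lines of Python. This toy parser assumes the documented slot layout (u64 magic, u64 version, u64 TXG, little-endian on x86-written pools) and skips checksum verification, which real recovery work cannot:

```python
import struct

UB_MAGIC = 0x00bab10c   # uberblock magic ("oo-ba-bloc")
UB_SIZE = 1024          # one ring slot, assuming 1 KB entries
RING_ENTRIES = 128

def best_uberblock(ring: bytes):
    """Return (slot, txg) of the highest-TXG uberblock in a label's ring.

    Layout assumed per the on-disk spec: u64 magic, u64 version,
    u64 txg at byte offset 16. Checksum validation is omitted here.
    """
    best = None
    for slot in range(RING_ENTRIES):
        entry = ring[slot * UB_SIZE:(slot + 1) * UB_SIZE]
        magic, _version, txg = struct.unpack_from('<QQQ', entry, 0)
        if magic != UB_MAGIC:
            continue  # empty or corrupt slot
        if best is None or txg > best[1]:
            best = (slot, txg)
    return best
```

Running this over all four labels on every drive image, and keeping only entries whose embedded checksum verifies, yields the candidate import points.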

DVA Pointers and the Block Pointer Tree

Every ZFS block pointer contains up to three Data Virtual Addresses (DVAs). Each DVA encodes the vdev ID, offset within the vdev, and the gang bit (indicating whether the block is a gang block split across multiple sub-blocks). The block pointer also stores the checksum of the target block, the compression algorithm, the logical and physical sizes, and the birth TXG. We use zdb -bbb on the imported pool image to traverse the full block pointer tree. When DVA pointers reference sectors on a failed drive, we reconstruct the missing data from RAIDZ parity (if within tolerance) or flag those blocks as unrecoverable.
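The DVA bit layout can be unpacked directly. A minimal sketch, with field positions taken from the public on-disk format specification (this is illustrative, not our production parser):

```python
def decode_dva(word0: int, word1: int) -> dict:
    """Unpack one 128-bit DVA into its fields.

    Per the on-disk spec: allocated size in 512-byte sectors occupies
    bits 0-23 of word 0, the vdev ID bits 32-63 of word 0, the offset
    in sectors bits 0-62 of word 1, and the gang bit is bit 63 of word 1.
    """
    return {
        'asize_sectors': word0 & 0xFFFFFF,
        'vdev': word0 >> 32,
        'offset_sectors': word1 & ((1 << 63) - 1),
        'gang': bool(word1 >> 63),
    }
```

Mapping a decoded vdev ID and sector offset back to a physical drive and byte address is what lets us flag exactly which blocks landed on a failed member.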

Dnode Objects and the Meta Object Set

The MOS is the top-level object set containing all pool-wide metadata: the dataset directory, the space map, the DDT (if dedup is enabled), and configuration objects. Each dataset within the pool has its own object set, and within that set, every file or directory is represented by a dnode. A dnode is a 512-byte structure that stores the object type, bonus data (such as the ZPL file attributes), and up to three block pointers for the object's data. When MOS corruption prevents normal import, we locate the MOS block pointer directly from the uberblock and traverse the dnode tree manually using zdb -dddd to enumerate datasets.
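The fixed dnode header makes manual enumeration tractable. A minimal sketch of parsing the leading fields of a 512-byte dnode_phys_t, with byte offsets taken from the public on-disk specification:

```python
import struct

def parse_dnode_header(dnode: bytes) -> dict:
    """First fields of a 512-byte dnode_phys_t, per the on-disk spec:
    eight single-byte fields (type, indirect block shift, levels,
    block-pointer count, bonus type, checksum, compress, flags),
    then data block size in 512-byte sectors and bonus length.
    """
    (dn_type, _indblkshift, nlevels, nblkptr,
     bonustype, _checksum, _compress, _flags) = struct.unpack_from('<8B', dnode, 0)
    datablkszsec, bonuslen = struct.unpack_from('<HH', dnode, 8)
    return {'type': dn_type, 'nlevels': nlevels, 'nblkptr': nblkptr,
            'bonus_type': bonustype, 'bonus_len': bonuslen,
            'data_block_size': datablkszsec * 512}
```

The object type field tells us whether a dnode describes a plain file, a directory ZAP, or pool metadata, which is how a manual traversal distinguishes user data from bookkeeping.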

Space Maps and Free Space Tracking

ZFS tracks allocated and free space using space maps: on-disk logs of allocation and free events for each metaslab. Corrupted space maps do not lose user data, but they prevent ZFS from mounting the pool because it cannot determine which blocks are in use. Recovery involves bypassing the space map check during read-only import and rebuilding the space map from the block pointer tree. This is a metadata-only repair that does not modify user data blocks.

Raw Block Extraction via objset_phys_t

When catastrophic MOS corruption prevents any form of pool import, we fall back to raw block extraction. The objset_phys_t structure is the root of every dataset in ZFS; it contains an array of dnode_phys_t entries, each holding block pointers (blkptr_t) to the actual file data. Using zdb -R pool:vdev:offset:psize:d,lzjb,lsize, we read raw compressed data blocks directly from drive images, decompress them with the correct algorithm (LZ4, LZJB, or ZSTD), and reconstruct individual files by walking the dnode tree manually. This bypasses the entire ZFS import pipeline. It's the last-resort method for severely corrupted pools on custom-built NAS enclosures where no valid uberblock survives on any drive.

RAIDZ Parity Distribution and Rebuild Constraints

RAIDZ uses variable-width stripes where each logical block is distributed across a stripe whose width depends on the block's size and the number of drives in the vdev. This eliminates the RAID write hole without a battery-backed cache, but recovery tools designed for fixed-stripe RAID arrays cannot parse RAIDZ data.
RAIDZ Level | Parity Drives | Fault Tolerance | Recovery When Exceeded
RAIDZ1 | 1 per stripe | 1 drive per vdev | Blocks on 2+ failed drives are unrecoverable from parity alone; partial recovery depends on which blocks landed on which drives.
RAIDZ2 | 2 per stripe | 2 drives per vdev | Blocks spanning 3+ failed drives are lost. RAIDZ2 is the most common production configuration and offers a better recovery margin than RAIDZ1.
RAIDZ3 | 3 per stripe | 3 drives per vdev | Rarely exceeded in practice. Typically seen in large vdevs (8+ drives) where the probability of three simultaneous failures during resilver is non-trivial.

For traditional hardware RAID arrays (Dell PERC, HP SmartArray, LSI MegaRAID), see our RAID data recovery service. RAIDZ is software RAID managed by ZFS and uses a different on-disk layout than controller-based arrays.

Hardware-Assisted RAIDZ Reconstruction with PC-3000

Software-only tools (ReclaiMe, UFS Explorer, DiskInternals) parse ZFS pools through the OS block device layer. When a drive has hardware-level read timeouts, firmware lockouts, or degraded heads, the OS cannot deliver clean sectors and the software tool stalls or returns garbage data.

PC-3000 Data Extractor RAID Edition bypasses the OS entirely, imaging each drive through a direct hardware interface (SAS or SATA) with configurable read timeout, head map, and sector retry parameters. Once all drives are imaged, the RAID Edition's autodetection module identifies the RAIDZ level (1, 2, or 3), block size, and stripe shift from the raw sector data. The "RAID Member Statistics" analytical method calculates the variable-width stripe distribution by comparing block entropy patterns across drive images, which is necessary because RAIDZ stripes vary in width per block rather than using the fixed stripe width that traditional hardware RAID controllers impose. This hardware-assisted block shift identification is the difference between a successful and a failed TrueNAS data recovery when the software stack can't read the drives.

What Causes ZFS Pool Failure?

ZFS pools fail from drive failures exceeding RAIDZ parity tolerance, aborted resilvers that leave parity in a mixed state, corrupted uberblocks from power loss during TXG commits, failed ZIL/SLOG devices with uncommitted synchronous writes, accidental pool destruction via zpool destroy or labelclear, dedup table corruption, and dRAID permutation failures on OpenZFS 2.1+ systems.

Faulted Pool After Drive Failures

The most common scenario. RAIDZ1 pools fault when two drives in the same vdev fail. RAIDZ2 pools fault on three failures. We image all drives (including failed ones) and reconstruct what parity can provide. Board-level repair on electrically failed drives can restore one member and bring the pool back within tolerance.

Failed Resilver

Resilvering writes parity data to all surviving members. If a surviving drive develops bad sectors during the resilver, ZFS cannot complete the rebuild. A mid-resilver failure is dangerous because parity is partially recalculated: some stripes reflect the old layout, others reflect the new. We handle this by imaging all drives and reconciling both parity states offline.

Corrupted Uberblock or MOS

Power loss during a TXG commit can corrupt the active uberblock or the MOS it points to. ZFS normally recovers by falling back to a previous TXG, but if multiple TXGs are affected (e.g., UPS failure during a long scrub), manual uberblock selection is needed. We use zpool import -T to target a specific TXG, or parse the uberblock ring manually with zdb -lu to find the last consistent state.

ZIL / SLOG Device Failure

The ZFS Intent Log (ZIL) records synchronous write transactions. If a dedicated SLOG device (typically a fast NVMe SSD) fails, any uncommitted synchronous writes are lost. For pools where the SLOG failed during active database writes or NFS/iSCSI operations, we image both the SLOG device and the pool drives. If the SLOG contains recoverable log records, we replay them into the pool. If the SLOG is physically dead, the pool imports without those pending writes.

NVMe SLOG Firmware Lockout

TrueNAS and Proxmox administrators frequently assign Phison PS5012-E12 or PS5016-E16 based NVMe drives as dedicated SLOG devices for synchronous write caching. A sudden power loss can corrupt the Flash Translation Layer on these controllers, locking the drive to a 0-byte capacity state where the BIOS no longer detects it. The uncommitted ZIL transactions on that SLOG are lost to the operating system. We use PC-3000 Portable III to issue NVMe Vendor Specific Commands that force the controller into diagnostic mode, reconstruct the corrupted FTL mapping tables, and image the drive contents through the controller to recover uncommitted log records for replay into the pool. The same Phison controller failure modes affect consumer SSD data recovery outside the ZFS context.

Silicon Motion SM2262EN / SM2259XT Caching Drive Failures

ZFS administrators frequently assign consumer NVMe drives with Silicon Motion SM2262EN controllers (ADATA SX8200 Pro, HP EX950) or SATA drives with DRAM-less SM2259XT controllers as L2ARC or SLOG devices. A power loss can corrupt the Flash Translation Layer on these controllers, locking the drive into a 0-byte or diagnostic 2MB capacity state where the BIOS no longer detects usable storage. Standard ZFS tooling cannot access the drive in this state. We use the PC-3000 SSD utility to issue vendor-specific commands that force the controller into diagnostic mode, rebuild the corrupted FTL mapping tables from the NAND spare area, and image the drive contents to recover uncommitted ZIL transactions for replay into the pool. The same SM2259XT controller failures affect consumer SSDs outside the ZFS context.

ZIL Replay and Log-Write Block (LWB) Extraction

Synchronous writes in ZFS are stored as intent transactions (itx records) packaged into Log-Write Blocks (LWBs) within the ZFS Intent Log. When a dedicated SLOG device fails mid-commit, uncommitted LWBs are stranded on the dead NVMe drive. Unlike hardware RAID arrays with battery-backed caches, the ZIL relies entirely on the SLOG device committing itx records safely to persistent storage. We use PC-3000 Portable III to reconstruct the corrupted FTL on the failed SLOG (typically Phison E12 or SM2262EN controllers), image the raw NVMe namespace, and extract the uncommitted LWBs. The isolated itx records are then replayed into the pool's transaction groups to recover pending database writes and NFS/iSCSI operations that would otherwise be lost.

Accidental Pool Destruction

Running zpool destroy wipes labels from each drive. Running zpool labelclear does the same to individual drives. If no new data has been written afterward, the uberblocks and block pointer tree remain on disk at their original offsets. We scan for the characteristic uberblock magic number (0x00bab10c) and pool GUID to locate and reconstruct from them.
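A brute-force scan for surviving uberblocks is straightforward to prototype. This illustrative scanner checks the magic in both byte orders, since pools written on big-endian hosts store it swapped (assumed layout: u64 magic, u64 version, u64 TXG at byte 16):

```python
import struct

UB_MAGIC = 0x00bab10c

def scan_for_uberblocks(image: bytes, step: int = 1024):
    """Yield (offset, txg) for every candidate uberblock in a raw image.

    Scans at 1 KB alignment, the uberblock slot size. Candidates still
    need checksum and MOS-pointer validation before being trusted.
    """
    for off in range(0, len(image) - 24, step):
        if struct.unpack_from('<Q', image, off)[0] == UB_MAGIC:
            yield off, struct.unpack_from('<Q', image, off + 16)[0]
        elif struct.unpack_from('>Q', image, off)[0] == UB_MAGIC:
            yield off, struct.unpack_from('>Q', image, off + 16)[0]
```

Hits clustered at the expected label offsets, combined with a matching pool GUID in the surviving nvlist data, confirm that the destroyed pool's structures are still in place.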

Dedup Table Corruption

ZFS deduplication stores a Dedup Table (DDT) that maps block checksums to their physical locations. The DDT consumes RAM (about 320 bytes per entry) and is backed by on-disk log-structured storage. When the DDT becomes corrupted (common when pools run low on memory under heavy dedup loads), files referencing deduplicated blocks cannot be resolved. We rebuild the DDT by scanning the full block pointer tree and reconstructing the checksum-to-DVA mappings.
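The memory pressure that triggers these failures is easy to estimate. A back-of-the-envelope calculation (the ~320-byte figure is the commonly cited in-core entry size; real usage varies with the block size distribution):

```python
def ddt_ram_bytes(pool_bytes: int, avg_block: int = 64 * 1024,
                  entry_bytes: int = 320) -> int:
    """Rough RAM footprint of a fully populated dedup table.

    One DDT entry (~320 bytes in core) per unique block; avg_block is
    an assumed average record size, so this is an upper-bound estimate
    for a pool with no duplicate blocks.
    """
    return (pool_bytes // avg_block) * entry_bytes

# 10 TiB of unique 64K blocks works out to roughly 50 GiB of DDT in RAM
print(ddt_ram_bytes(10 * 2**40) / 2**30)
```

Pools sized without this arithmetic are the ones we see with DDT corruption: once the table no longer fits in ARC, every dedup write thrashes the on-disk DDT and a single interrupted commit can leave it inconsistent.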

dRAID Vdev Failure (OpenZFS 2.1+)

Distributed RAID (dRAID), introduced in OpenZFS 2.1 & available in TrueNAS SCALE 23.10+, replaces traditional RAIDZ hot spares with distributed spare space across all pool members. dRAID uses precomputed permutation maps to shuffle parity across drives, enabling faster sequential resilvering. Consumer recovery software (Klennet, UFS Explorer, ReclaiMe) cannot parse dRAID because the permutation map defines a non-standard block routing that doesn't match fixed-width RAIDZ striping. We use PC-3000 Data Extractor RAID Edition to manually define the tabular matrix configuration matching the dRAID permutation, then reconstruct the distributed spare blocks when a sequential resilver fails due to cascaded hardware degradation across same-batch drives.

Persistent L2ARC Header Corruption

OpenZFS 2.0+ added persistent L2ARC, allowing cache tables to survive reboots by writing header metadata to the cache device. If the NVMe cache drive suffers a controller lockout or partial firmware corruption, the corrupted L2ARC header triggers a kernel panic or infinite hang during zpool import. The pool data is intact on the main vdevs; only the cache metadata is damaged. We bypass the panic by setting l2arc_rebuild_enabled=0 in the ZFS module parameters before import, which instructs OpenZFS to skip the persistent cache rebuild and mount the pool read-only for extraction.

What Causes ZFS Metadata Failures on Healthy Drives?

When all drives are physically healthy but the pool refuses to import, the failure is in ZFS metadata, not hardware. Common causes include corrupted uberblock rings, destroyed spacemaps, vdev GUID mismatches, and OpenZFS block cloning bugs. These failures require forensic analysis of the on-disk structures rather than clean bench work.

Corrupted Uberblock Ring

Each drive in the pool stores 128 uberblocks in a ring buffer across its four ZFS labels. If a power loss corrupts the active uberblock and the next several entries in the ring, ZFS cannot locate a valid root pointer to the Meta Object Set. We parse all 512 uberblock slots (128 per label × 4 labels) across every drive image using zdb -lu to find the highest transaction group with a valid checksum. If no valid uberblock exists on the primary drive, we cross-reference uberblocks from other pool members where the ring may be intact.

Destroyed Spacemap

Spacemaps log allocation and free events for each metaslab. Corruption typically occurs when a pool runs at 95%+ capacity and a write fails mid-commit. The pool refuses to mount because ZFS cannot verify which blocks are allocated. User data remains on disk at its original offsets. We import the pool read-only with spacemap validation bypassed and rebuild the allocation table from the block pointer tree, which is an independent metadata structure that records every live block's location.

Vdev GUID Mismatch

Every vdev and pool has a unique GUID recorded in the ZFS labels. If a drive is replaced and the resilver aborts partway through, the new drive carries a different GUID than the pool expects. OpenZFS refuses the import because the vdev tree no longer matches the topology stored in the uberblock. This is common on TrueNAS SCALE systems where a failed resilver leaves the replacement drive with an incomplete label. ZFS identifies drives by on-disk GUIDs, not by OS device names, so the mismatch is a metadata conflict. We reconstruct the correct topology from historical labels and reimport with the original GUIDs. Unlike hardware RAID recovery, where the controller stores configuration on a dedicated ROM chip, ZFS distributes its configuration across all member drives, so the topology can always be rebuilt from the surviving labels.

OpenZFS 2.2 Block Cloning Corruption

OpenZFS 2.2 introduced block cloning via the copy_file_range(2) system call (used by coreutils 9.x & cp --reflink). A concurrency bug in the DMU offset reporting causes chunks of cloned files to be silently replaced by zeroes. Standard zpool scrub won't catch it because the checksum matches the zero-filled block that ZFS committed. This affects TrueNAS SCALE, Proxmox, & any Linux system running OpenZFS 2.2.0 through 2.2.2. During forensic extraction of affected pools, we set zfs_dmu_offset_next_sync=0 to bypass the timing window and extract uncorrupted blocks from historical transaction groups where the original data predates the cloning operation. Recovery from this bug requires comparing block birth TXGs against the cloning timestamp to identify which files contain zeroed chunks.

How Does ZFS Recovery Differ Across TrueNAS, Proxmox, and Solaris?

ZFS runs on several major platforms. The on-disk format is cross-platform compatible (a pool created on Solaris can be imported on Linux), but encryption layers, feature flags, and bootloader integration differ across platforms. TrueNAS CORE uses GELI disk-level encryption; TrueNAS SCALE uses ZFS native dataset encryption; Proxmox stores VM disks as zvols; Solaris pools may use pre-feature-flag pool version 28.

TrueNAS / FreeNAS

TrueNAS CORE (FreeBSD) uses GELI disk-level encryption. TrueNAS SCALE (Debian Linux) uses ZFS native encryption at the dataset level. Both require key material for encrypted pools. See our dedicated TrueNAS recovery page for GELI-specific workflows. GELI keys are stored on the boot pool; if the boot drive is lost and no backup exists, the encrypted pool is unrecoverable.

QNAP QuTS hero

Newer enterprise QNAP NAS devices (TVS-x72XT, TVS-hx74, TS-x73A series) run the QuTS hero operating system, which replaces QTS's traditional ext4/mdadm stack with a full ZFS implementation. QNAP adds proprietary volume management wrappers and SSD caching layers on top of ZFS. We image the individual drives, bypass the QNAP hardware interface, and parse the QuTS hero ZFS pool offline using standard OpenZFS tooling. For other QNAP models running standard QTS with mdadm, see our NAS data recovery service.

Proxmox VE

Proxmox uses OpenZFS on Linux for VM storage. Pools typically store qcow2 disk images or zvols used as raw block devices by KVM/QEMU. Recovery involves importing the pool from images and extracting the guest VM disk files, then mounting the guest filesystem (NTFS, ext4, XFS) to verify the VM data. See Proxmox recovery for Ceph-related failures.

Oracle Solaris

Solaris is the original ZFS platform and may use older pool versions (pre-feature flags, pool version 28 or earlier). Older Solaris pools lack features like LZ4 compression and large dnode support. Recovery is straightforward if the pool version is identified correctly; we import on a matching OpenZFS version or use Solaris-native tools when feature flags are incompatible.

Linux OpenZFS

Ubuntu, Debian, Fedora, and Arch all support OpenZFS through the ZFS on Linux (ZoL) kernel module. Common in custom-built NAS servers and Proxmox hosts. Linux OpenZFS supports ZFS native encryption (dataset-level AES-256-GCM). Pool recovery is identical to other platforms once drives are imaged; the key difference is that Linux systems sometimes mix ZFS and mdadm (e.g., mdadm mirror for boot, ZFS pool for data), which requires handling both metadata formats.

Can VMware VMFS Datastores Be Recovered from ZFS iSCSI Zvols?

Yes. When a ZFS pool faults, ESXi hosts lose access to VMFS datastores exported via iSCSI zvols and all VMs go offline. Recovery requires reconstructing the ZFS pool from drive images, importing it read-only, extracting the raw zvol, and parsing the VMFS volume header to locate individual .vmdk flat extents.

TrueNAS Enterprise and custom ZFS servers frequently export zvols as iSCSI targets consumed by VMware ESXi hosts as VMFS datastores. ZFS treats each zvol as a raw block device and has no awareness of the VMFS structures or .vmdk files inside it.

When the underlying ZFS pool faults, the ESXi host loses access to the datastore and every VM on it goes offline. We reconstruct the pool from drive images, import it read-only, and extract the raw zvol as a binary image.

We then parse the VMFS volume header from the zvol image to locate the file descriptor table and extract individual .vmdk flat extents without requiring the original ESXi hypervisor. Each extracted VM disk is mounted independently to verify guest filesystem integrity (NTFS, ext4, XFS). For ESXi-specific failure modes outside the ZFS layer, see our VMware ESXi recovery service.

On-Disk Format Differences Across ZFS Implementations

Pool version 28 is the last universally interoperable format across all ZFS implementations. Modern pools use feature flags instead of version numbers, and these flags diverge between OpenZFS on Linux, FreeBSD, and legacy Solaris. Importing a pool on an OpenZFS version that lacks a required feature flag returns a ZFS-8000-A5 error and refuses the import entirely.
Feature Flag | ZFS-on-Linux | FreeBSD ZFS | Solaris (pre-OpenZFS) | Recovery Impact
large_dnode | Supported (ZoL 0.7+) | Supported (FreeBSD 12+) | Not supported | Pools with large_dnode cannot be imported on Solaris. Recovery environment must run OpenZFS 0.7+.
spacemap_v2 | Supported (ZoL 0.8+) | Supported (FreeBSD 13+) | Not supported | Older ZoL versions (0.7.x) cannot read spacemap_v2 pools. Importing on a mismatched version produces a ZFS-8000-A5 error.
allocation_classes | Supported (ZoL 0.8+) | Supported (FreeBSD 13+) | Not supported | Pools using special allocation classes (metadata vdevs) require a recovery environment that supports this feature.
Native encryption | Dataset-level AES-256-GCM | GELI (disk-level) or native | Oracle proprietary | Encryption type determines key handling. GELI keys live on the boot pool; native encryption keys are per-dataset. Losing the key means the data is unrecoverable regardless of pool health.

During recovery, we match the import environment's OpenZFS version to the pool's feature flags, and we maintain recovery workstations running multiple OpenZFS versions to handle pools from NAS enclosures and Proxmox hosts running different kernel versions.

How Do LSI HBA Firmware Crashes Destroy ZFS Labels?

If an LSI HBA firmware reverts from IT mode to IR mode during a power event, the controller writes DDF RAID metadata at the end of each attached drive, destroying ZFS labels L2 and L3. Recovery uses the surviving L0 and L1 labels at the beginning of each drive to reconstruct vdev topology.

ZFS requires SAS/SATA Host Bus Adapters flashed to IT (Initiator Target) mode for direct disk access. Broadcom/LSI controllers (9211-8i, 9300-8i, 9400-8i) are the standard in enterprise ZFS deployments.

Administrators can verify their firmware mode by running sas2flash -list (for SAS2 controllers) or sas3flash -list (for SAS3). If the output shows IR firmware where IT was expected, the labels at the end of each drive have been overwritten by DDF metadata.

We recover these pools by calculating the exact byte offset of the IR-mode metadata overlay and parsing the surviving L0 and L1 labels at the beginning of each drive to reconstruct the vdev topology. Because ZFS stores redundant label copies at both ends of every disk, a single-end overwrite is recoverable if the drives are not subsequently reformatted. For pools where the HBA also managed a hardware RAID array, the recovery becomes a dual-layer operation: reconstruct the hardware RAID geometry first, then parse ZFS structures from the reconstructed logical volume.
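Which labels survive depends only on the size of the region the controller claims at the end of the drive. A small sketch of the overlap check (the DDF reserved size is firmware-dependent, so it is passed in as an input rather than assumed):

```python
LABEL_SIZE = 256 * 1024  # each ZFS label is 256 KB

def surviving_labels(device_size: int, ddf_reserved: int) -> list[str]:
    """Which ZFS labels survive an IR-mode DDF metadata write.

    ddf_reserved is the region the controller claims at the end of the
    drive. Any label whose extent intersects that region is treated
    as destroyed; L0/L1 at the start of the drive are never touched.
    """
    offsets = {'L0': 0, 'L1': LABEL_SIZE,
               'L2': device_size - 2 * LABEL_SIZE,
               'L3': device_size - LABEL_SIZE}
    ddf_start = device_size - ddf_reserved
    return [name for name, off in offsets.items()
            if off + LABEL_SIZE <= ddf_start]
```

Because at least L0 and L1 always survive an end-of-drive overwrite, the vdev tree and uberblock ring remain readable as long as nothing rewrites the front of the disk.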

What Should You Do When a ZFS Pool Import Fails?

If your ZFS pool won't import, check kernel I/O errors with dmesg, attempt a standard import without the -f flag, verify ZFS labels on each drive with zdb -l, and power down immediately if mechanical failure is suspected. Each step is diagnostic only and does not write to the pool.
  1. Check kernel I/O errors. Run dmesg | grep -i "error\|fault\|reset" to identify which drives reported hardware-level errors before ZFS faulted the pool. SCSI sense codes or ATA timeout messages point to the specific failing drive.
  2. Attempt a standard import without the -f flag. Run zpool import (no arguments) to list all visible pools and their state. If the pool appears as UNAVAIL or DEGRADED, note which vdevs are missing. Do not use -f at this stage.
  3. Verify ZFS labels on each drive. Run zdb -l /dev/sdX on each pool member to confirm label presence and read the vdev GUID, pool GUID, and highest transaction group number. Drives missing all four labels were either wiped or belong to a different pool.
  4. Power down if mechanical failure is suspected. Clicking, grinding, or repeated spin-up/spin-down cycles indicate physical drive failure requiring clean bench work. Continued operation risks platter scoring. Do not attempt to offline/online the drive or force a resilver. Ship the drives to a no-fix-no-fee recovery lab for imaging under controlled conditions.

Which ZFS Commands Destroy Recovery Options?

The following commands, commonly suggested in forum posts, will reduce or eliminate recovery chances if run on a failing pool: zpool import -f, zpool clear followed by resilver, and the zfs_max_missing_tvds tunable. These commands permanently overwrite the historical transaction groups required for offline forensic reconstruction. Power down the system instead.

zpool import -f on a faulted pool

Forces import of a pool that ZFS has refused. This writes new TXGs to the pool, overwriting the metadata ZFS needs for self-consistency checks. If the pool is faulted due to drive failures, the forced import will record that the failed drives are absent, and any subsequent export-reimport cycle will reference the damaged state rather than the pre-failure state.

How Forced Imports Destroy the Uberblock Ring

ZFS uses a copy-on-write model: every zpool import -f allocates new objset_phys_t metadata trees and commits a new Transaction Group (TXG) to every surviving drive. The 128-entry uberblock ring is a circular buffer; each new TXG overwrites the oldest entry. A forced import on a degraded pool permanently records the missing drives as absent in the new TXG baseline.

Once that TXG is written, rolling back to the pre-failure topology becomes impossible because the uberblock entry that referenced the original vdev tree has been overwritten. If the forced import triggers additional write activity (scrub commands, dataset mounts, ZIL replays), multiple uberblock entries are consumed in rapid succession, shrinking the recovery window from 128 TXGs to as few as a handful. This is why we image every drive before attempting any import variant.

zpool clear followed by resilver on a degraded pool

Clearing errors and resilvering writes parity data across all surviving drives. If any surviving drive has developing bad sectors (common with same-batch drives of the same age), the resilver can trigger that drive to fail, pushing the pool past its parity tolerance. We see this cascade failure regularly.

zfs_max_missing_tvds tunable

This kernel tunable allows ZFS to import a pool with missing top-level vdevs. Setting it to a non-zero value and importing writes new TXGs that permanently record the missing vdevs as absent. If you then add the missing vdevs back, ZFS treats them as foreign devices and will not reattach them. The original pool topology is overwritten. This tunable is a last-resort forensic tool, not a recovery shortcut.

How Does ZFS Pool Recovery Work?

ZFS pool recovery follows four stages: imaging every drive through PC-3000 with write-blocking, analyzing all four vdev labels on each image to reconstruct pool topology, selecting the highest valid uberblock for TXG rollback, and importing the pool read-only from drive images to extract datasets, zvols, and snapshots.

1. Drive Imaging with PC-3000

Every drive in the pool is imaged through PC-3000 with write-blocking. SAS drives (common in TrueNAS Enterprise and Solaris servers) are imaged via SAS HBAs in IT mode. For drives with bad sectors, we capture healthy regions first using sector maps, then retry damaged areas with aggressive read parameters. Drives with mechanical failures (clicking, motor seizure) receive clean bench work before imaging: head swaps, motor transplants, or platter stabilization, all performed under 0.02 µm ULPA filtration.

2. Vdev Label Analysis

We read all four ZFS labels from each drive image using zdb -l. The labels contain the pool name, pool GUID, vdev GUID, vdev tree (encoded as an nvlist), and the 128-entry uberblock ring. By comparing labels across all drive images, we reconstruct the complete vdev topology even when some drives have corrupted labels. The vdev tree tells us which drives belong to which vdev, whether each vdev is a mirror or RAIDZ, and the ashift (sector size alignment, typically 9 for 512-byte or 12 for 4K-native drives).

3. Uberblock Selection and TXG Rollback

The uberblock ring on each drive contains the last 128 transaction groups. We examine each uberblock using zdb -lu to find the highest TXG with a valid checksum. If the latest TXG is corrupted, we roll back to an earlier state using zpool import -T [txg]. The data loss from TXG rollback is limited to writes that occurred between the target TXG and the failed TXG. For most failures triggered by drive loss rather than active corruption, the rollback window is seconds.

4. Offline Pool Import and Dataset Extraction

The pool is imported read-only from the drive images using loopback devices on a dedicated recovery workstation. We verify the pool status, check for data errors using zpool status -v, and extract individual datasets, zvols, and snapshots. For zvols used as VM storage (Proxmox, bhyve), we mount the guest filesystem to verify the VM data is intact. Snapshots are preserved; if the live dataset has corruption but a recent snapshot is clean, we recover from the snapshot.

How Much Does ZFS Data Recovery Cost?

ZFS recovery is priced per-drive based on each drive's condition, plus a $400-$800 pool reconstruction fee covering vdev analysis, pool import, and dataset extraction. Per-drive imaging runs $250-$900 for firmware or logical issues, higher for drives requiring clean bench head swaps.

Same transparent model as our RAID recovery pricing: per-drive imaging fees depend on whether each drive needs firmware repair or clean bench head swaps. The pool reconstruction fee covers vdev reconstruction, uberblock analysis, and dataset extraction.

Service Tier | Price Range (Per Drive) | Description
Logical / Firmware Imaging | $250-$900 | Firmware module damage, SMART threshold failures, or filesystem corruption on individual pool members.
Mechanical (Head Swap / Motor) | $1,200-$1,500 (50% deposit) | Donor parts consumed during transplant. SAS drives (common in enterprise ZFS servers) require SAS-specific donors.
ZFS Pool Reconstruction | $400-$800 per pool | Vdev reconstruction, uberblock analysis, pool import, and dataset/zvol extraction. Includes ZFS native decryption or GELI decryption if key material is provided.

No Data = No Charge: If we recover nothing from your ZFS pool, you owe $0. Free evaluation, no obligation.

Before sending drives: export your encryption key (GELI recovery key for TrueNAS CORE, ZFS encryption passphrase for TrueNAS SCALE or Linux). Note the pool name and vdev layout from zpool status if the system still boots.
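If the system still boots, the layout capture can be scripted in a few lines; a minimal sketch (the output filename is illustrative):

```shell
# Record pool identity and vdev layout before powering down for shipment.
capture_pool_info() {
    out=${1:-pool_layout.txt}
    zpool status -v  > "$out"               # topology and per-drive state
    zpool list -v   >> "$out"               # capacities per vdev member
    zdb -C          >> "$out" 2>/dev/null   # cached config: vdev GUIDs and device paths
}

# Usage: capture_pool_info pool_layout.txt
```

Printing or emailing that file alongside the shipment saves a label-reconciliation step at intake.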

ZFS Recovery for Enterprise IT: RTO, RPO, and Escalation

Mirrored pool recovery with an aborted resilver runs 3 to 5 business days. A faulted RAIDZ2 pool with two mechanical failures runs 5 to 10 business days. A dRAID pool with failures beyond parity tolerance runs 10 to 20 business days. RPO for all topologies is the last transaction group committed before the failure event.

Most ZFS pools we intake belong to IT teams with a documented recovery time objective and recovery point objective written into a DR runbook. The numbers below come from real pool-reconstruction workloads on the RAIDZ, mirror, and dRAID topologies we see most often. They assume donor drives are available at intake; if donors must be sourced from auction stock the mechanical tiers add 2 to 5 business days on top.

Mirrored pool, one failed member, hot spare resilver aborted

Realistic RTO: 3 to 5 business days. Realistic RPO: the last successful TXG commit before the resilver abort, typically within 5 to 30 seconds of the event. Path: image both mirror halves plus the spare, reconcile vdev labels, walk the uberblock ring on the image with the most consistent label, mount read-only, extract. No parity reconstruction required.

RAIDZ2, two mechanical failures, no spare

Realistic RTO: 5 to 10 business days. Realistic RPO: last TXG committed to the pool before the second drive failed. Head swaps or PCB work for the two failed drives land on days 2 through 4, stripe reconstruction through PC-3000 Data Extractor RAID Edition on days 5 through 7, dataset extraction and verification on days 8 through 10.

TrueNAS SCALE pool, SLOG lost during sync writes

Realistic RTO: 2 to 4 business days. Realistic RPO: SLOG-held log write blocks (LWBs) are typically lost; the pool rolls back to the last on-pool TXG committed before the SLOG died. For databases or NFS datastores running synchronous writes, the RPO window is the sync interval (commonly 5 seconds). Enterprise server recovery is billed at the per-drive imaging tier plus the pool reconstruction fee.

dRAID pool, multiple concurrent failures beyond parity

Realistic RTO: 10 to 20 business days. dRAID distributes parity across a larger pool of drives, so partial reconstruction without all members requires forensic stripe walking across dozens of images. This is the longest path in our ZFS workload and not every pool is recoverable; a written assessment precedes any billable reconstruction work.

Turnaround and rush handling

Imaging starts within 24 hours of drive receipt at the Austin lab. A +$100 rush fee moves the pool to the front of the imaging queue and typically compresses the imaging phase by 2 to 3 business days. It does not shorten head-swap or PCB repair windows, which are dictated by donor availability. Rush fee disclosure and donor cost guidance live on our no-data-no-fee page; nothing is billed unless datasets extract and verify.

Single technical contact and escalation path

One technician owns the case from intake to release. The named contact on your side receives daily status updates during active imaging or reconstruction and an immediate call if a drive deteriorates on the rig. There is no account-manager handoff and no sales tier between your engineer and ours. Escalation for production-down pools goes directly to the bench technician working the case; there is no on-call queue or tiered-support layer to route around.

Chain of Custody, NDA, and Evidence Handling

ZFS pools we receive from managed service providers, healthcare IT teams, law firms, and research labs travel with paperwork. The controls below describe what we track on every case, not a premium tier. They apply equally to a 4-drive TrueNAS mini and a 36-drive enterprise JBOD.

Mutual NDA on request

We sign mutual non-disclosure agreements before intake for clients who require one. Counterparty templates are accepted and redlined in one business day. The executed NDA binds every technician, packager, and administrator who touches the case file. We do not publish case studies, screenshots, or file listings from NDA work under any circumstances.

Per-drive chain-of-custody log

Every drive is logged at intake with serial number, model, arrival timestamp, shipping condition, and the technician who accepts the package. Each imaging, repair, and storage event is appended to the same log with technician initials and timestamp. The log is released with the recovery report on case close or on demand for carrier-side audit.

Image hashes for forensic use

Full-drive images produced on PC-3000 Portable III and PC-3000 Express are checksummed (SHA-256) at creation. Hashes are written into the case file so a downstream forensic examiner can verify the image they receive matches the image we extracted. Source drives are never written to; every repair and reconstruction step happens on the clones.
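The verification workflow uses standard coreutils, so a downstream examiner needs no proprietary tooling; a minimal sketch (the directory layout is illustrative):

```shell
# Hash every image in a case directory into a portable manifest.
hash_images() {
    ( cd "$1" && sha256sum *.img > SHA256SUMS )   # relative paths keep the manifest portable
}

# At the downstream lab, confirm nothing changed in transit:
verify_images() {
    ( cd "$1" && sha256sum -c SHA256SUMS )        # prints "<image>: OK" per file
}
```

`sha256sum -c` exits non-zero on any mismatch, so the check drops cleanly into an automated evidence-intake script.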

Secure destruction and return

On case close, the client chooses: original drives and extracted data shipped back, extracted data only (drives destroyed in-lab with a signed destruction certificate), or drives held in a locked evidence cabinet for a defined retention window. Destruction is physical (shred or degauss for magnetic media, controlled board destruction for SSDs); no soft-wipe substitute.

Working alongside IR firms and carriers

For ransomware, insider-deletion, and hardware-failure claims, a carrier-appointed incident-response firm often drives the investigation. We provide a fixed-quote scope of work up front so the carrier can authorize the imaging line item before billable work starts. Chain-of-custody artifacts and image hashes are released to the IR firm on request so their forensic timeline can be independently verified against ours. Related workflows on NAS recovery and single-drive recovery follow the same chain-of-custody rules.

Scope boundaries we state in writing

We do not perform incident response, malware analysis, or eDiscovery review. We produce clean images and extracted datasets; the forensic report, legal hold letter, and regulator notification are the client's or the IR firm's responsibility. Stating this up front keeps the quote accurate and keeps the case from sprawling into scope we are not the right lab for.

Lab Location and Mail-In

All ZFS recovery work is performed in-house at our Austin lab: 2410 San Antonio Street, Austin, TX 78705. Walk-in evaluations are available Monday - Friday, 10 AM - 6 PM CT. For clients outside Austin, we accept mail-in shipments from all 50 states. Ship drives in anti-static bags with foam padding. Label each drive with its slot number from the original system if possible.

Data Recovery Standards & Verification

Our Austin lab operates on a transparency-first model. We use industry-standard recovery tools, including PC-3000 and DeepSpar, combined with strict environmental controls to make sure your hard drive is handled safely and properly. This approach allows us to serve clients nationwide with consistent technical standards.

Open-drive work is performed in a ULPA-filtered laminar-flow bench, validated to 0.02 µm particle count, verified using TSI P-Trak instrumentation.

Transparent History

Serving clients nationwide via mail-in service since 2008. Our lead engineer holds PC-3000 and HEX Akademia certifications for hard drive firmware repair and mechanical recovery.

Media Coverage

Our repair work has been covered by The Wall Street Journal and Business Insider, with CBC News reporting on our pricing transparency. Louis Rossmann has testified in Right to Repair hearings in multiple states and founded the Repair Preservation Group.

Aligned Incentives

Our "No Data, No Charge" policy means we assume the risk of the recovery attempt, not the client.

We believe in proving standards rather than just stating them. We use TSI P-Trak instrumentation to verify that clean-air benchmarks are met before any drive is opened.

See our clean bench validation data and particle test video

ZFS Recovery: Common Questions

My ZFS pool shows FAULTED and zpool import fails. Can you recover the data?
Yes. A FAULTED pool means ZFS cannot assemble a consistent pool from its surviving vdev members. We image all drives including the failed ones, reconstruct vdev geometry from the four ZFS labels on each member (two at the start of the drive, two at the end), and force-import the pool from images to extract datasets.
I ran zpool import -f and it made things worse. Is recovery still possible?
Usually yes. A forced import writes new transaction groups to the pool, which can overwrite metadata ZFS needs for self-repair. The severity depends on how much write activity occurred after the forced import. We image the drives in their current state and attempt recovery from historical transaction groups that predate the forced import.
Can you recover a ZFS pool after zpool destroy or zpool labelclear?
If no new data has been written to the drives after the destroy command, the uberblocks and metadata trees are still on disk. We scan for historical uberblocks at known offsets and reconstruct the pool from the most recent valid transaction group.
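`zpool import` has a `-D` flag specifically for pools that were destroyed but not overwritten; a hedged sketch (pool name and paths are illustrative, and this should only ever run against read-only clones):

```shell
# List destroyed pools still present on disk, then re-import one read-only.
recover_destroyed() {
    zpool import -D -d /recovery/dev           # shows pools marked destroyed
    zpool import -D -f -o readonly=on -R /mnt/recovery -d /recovery/dev "$1"
}

# Usage: recover_destroyed tank
```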
Does it matter if I use RAIDZ1, RAIDZ2, or RAIDZ3?
The RAIDZ level determines how many drives can fail before the pool faults. RAIDZ1 tolerates one failure per vdev, RAIDZ2 tolerates two, RAIDZ3 tolerates three. Recovery complexity increases when failures exceed these thresholds because we must reconstruct data without parity assistance for the extra failed drives.
My ZFS pool uses deduplication and the DDT is corrupted. Can you recover files?
DDT corruption is one of the harder ZFS recovery scenarios. The dedup table maps block references to their physical locations on disk. If the DDT is damaged, files that reference deduplicated blocks cannot be resolved through normal import. We reconstruct the DDT from the block pointer tree by scanning every dnode in the pool and rebuilding the reference map.
How is ZFS recovery priced?
Per-drive imaging based on each drive's condition ($250-$900 per drive), plus a $400-$800 pool reconstruction fee covering vdev analysis, pool import, and dataset extraction. If we recover nothing, you pay $0.
Why does TrueNAS SCALE refuse to import my pool after a drive replacement?
TrueNAS SCALE pools transition to UNAVAIL if a resilver aborts mid-transaction group or if a replacement drive's ZFS label is overwritten, causing a vdev GUID mismatch. ZFS identifies drives by on-disk GUIDs in the vdev labels, not by OS device names, so the mismatch is a metadata conflict, not a Linux vs. FreeBSD enumeration issue. Recovery requires imaging all drives, locating historical vdev labels that predate the mismatch, and reconstructing the pool offline.
Can you recover a TrueNAS ZFS pool built on top of a Dell PERC hardware RAID?
Yes, but it is a dual-layer recovery. ZFS on top of hardware RAID (Dell PERC, HP SmartArray, LSI MegaRAID) prevents ZFS from seeing individual disks, disabling self-healing and SMART monitoring. When the hardware RAID degrades and causes a kernel panic, we first reconstruct the hardware RAID block geometry using PC-3000 to virtualize the logical unit, then parse the ZFS uberblocks and datasets from that reconstructed volume. This is more complex than standard ZFS recovery because the stripe width, parity layout, and block alignment of the hardware RAID must be resolved before ZFS metadata becomes readable.
Can data be recovered if the ZFS spacemap is corrupted?
Yes. Spacemaps track allocated and free blocks for each metaslab. Corruption prevents normal pool mounting because ZFS cannot determine which blocks are in use, but user data blocks remain intact on disk. We bypass the spacemap check during a read-only import and extract datasets directly from the block pointer tree.
Is it safe to run zpool import -F on a degraded or unmountable pool?
The -F (rewind) flag forces ZFS to discard the most recent transaction groups until it finds a consistent state. This irreversibly destroys the most recent writes (typically 5 to 30 seconds of data, depending on the TXG sync interval and I/O load), and the discarded TXGs cannot be recovered. We image all drives first, then use read-only TXG rollback (zpool import -T) on cloned images rather than executing destructive rewind commands on live hardware. If you have already run -F, recovery from pre-rewind TXGs depends on whether the rewind overwrote the uberblock ring entries we need.
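If you must assess rewind feasibility yourself, `-n` combined with `-F` performs a dry run; a minimal sketch (pool name illustrative, and still safest against clones):

```shell
# Ask ZFS whether a rewind would find a consistent state, without
# discarding anything: -n makes the -F rewind non-destructive.
rewind_dry_run() {
    zpool import -F -n "$1"    # reports the state it would roll back to; writes nothing
}

# Usage: rewind_dry_run tank
```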
Why does ZFS data recovery cost more than standard single-drive logical recovery?
ZFS recovery is a multi-stage process. A degraded RAIDZ2 pool on 8 SAS drives requires imaging every individual drive through PC-3000 with SAS HBAs, then mathematically reconstructing variable-width RAIDZ stripes across those images before user data can be extracted. The $400-$800 pool reconstruction fee covers vdev analysis, uberblock selection, and dataset extraction. Per-drive imaging fees ($250-$900 each) depend on whether each drive needs firmware repair or clean bench head swaps. A single NTFS drive needs one image and one filesystem scan; a ZFS pool needs N images plus stripe reconstruction.
What does it mean when zpool import hangs instead of returning an error?
If zpool import fails instantly with an I/O error or UNAVAIL status, ZFS read the vdev labels but determined the pool lacks enough drives to meet the RAIDZ parity threshold. If the command hangs indefinitely, the Linux kernel is retrying failed read operations on a drive with degraded read/write heads or severe bad sectors. A hanging import is a physical warning: the drive is still powered on and the heads are scraping the platter surface with each retry. Power down immediately. Don't wait for the command to time out. Ship the drives to a lab for imaging under controlled conditions.
How do you recover a pool if the ZIL (SLOG) drive fails during a synchronous write?
If a dedicated SLOG drive dies while holding uncommitted Log-Write Blocks, attempting zpool import often causes a kernel panic or a blocked task error ("task z_wr_iss blocked for more than 122 seconds"). We don't run aggressive filesystem checks on the pool drives. Instead, we set zil_replay_disable=1 in the ZFS module parameters (/sys/module/zfs/parameters/zil_replay_disable) before import. This tells OpenZFS to discard the intent log rather than replaying it, sacrificing the last few seconds of synchronous writes but allowing the rest of the pool to mount safely for extraction.
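The sequence described in that answer can be sketched as follows (pool name and mount point are illustrative; the module parameter path is as given above, and requires root on a Linux OpenZFS host):

```shell
# Discard rather than replay the intent log, so a pool whose SLOG died
# mid-sync can still be imported for extraction.
import_without_zil_replay() {
    echo 1 > /sys/module/zfs/parameters/zil_replay_disable
    zpool import -o readonly=on -R /mnt/recovery "$1"
}

# Usage: import_without_zil_replay tank
```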
Can you recover data if a TrueNAS CORE to SCALE migration destroys the pool?
Yes. Migrations from TrueNAS CORE (FreeBSD) to SCALE (Debian Linux) fail when GELI disk-level encryption wrappers aren't decrypted before the OS transition. The OpenZFS Linux module can't unpack FreeBSD GELI-encrypted vdev labels, so the pool appears empty or refuses import entirely. We reconstruct a FreeBSD recovery environment, supply the geli.key from the original CORE boot pool, decrypt the vdev labels, and extract datasets to a staging server. If the boot pool is also lost, recovery depends on whether a backup of the GELI key file exists.
Can you sign an NDA before we ship the drives?
Yes. We execute mutual NDAs before intake on request. Standard turnaround on an NDA review is one business day; counterparty templates are accepted and redlined where necessary. On receipt, each drive is logged into a chain-of-custody record with serial number, arrival timestamp, and the technician who accepts the package. The signed NDA, the chain-of-custody log, and the recovery report stay with the case file and are released to the named contacts on your side only.
What is a realistic RTO for a faulted RAIDZ2 pool with two failed drives?
A faulted RAIDZ2 pool with two mechanically failed drives has a realistic recovery time objective of 5 to 10 business days from intake to a read-only dataset mount, assuming donor drives are on the shelf. Day 1 is full-pool imaging on PC-3000 Portable III and PC-3000 Express rigs (every member drive cloned before any read is issued to the degraded pair). Days 2 to 4 cover head swaps or PCB repair on the failed drives in the clean bench. Days 5 to 7 are vdev label reconciliation, uberblock ring selection, and RAIDZ2 stripe reconstruction from the images. Days 8 to 10 are dataset extraction and verification. A +$100 rush fee moves the case to the front of the imaging queue and typically compresses the schedule by 2 to 3 business days; it does not shorten the physical-repair or reconstruction phases.
Can you coordinate directly with our cyber insurance carrier or incident-response firm?
Yes. We routinely work alongside carrier-appointed IR firms on ransomware, insider-deletion, and hardware-failure claims. We provide a written scope of work and fixed quote for the carrier adjuster before any billable imaging work starts, so the claim package has documentation the adjuster can approve against. Chain-of-custody artifacts (intake log, per-drive imaging hashes, technician sign-off) are released to the IR firm on request for their forensic report. We do not overwrite the source drives, and the original images are preserved for the duration of the claim so the forensic timeline can be independently audited.

Ready to recover your ZFS pool?

Free evaluation. No data = no charge. Mail-in from anywhere in the U.S.

(512) 212-9111Mon-Fri 10am-6pm CT
No diagnostic fee
No data, no fee
4.9 stars, 1,837+ reviews