
Pool Recovery & RAIDZ Reconstruction

ZFS Data Recovery

We recover faulted ZFS pools by imaging every drive, parsing vdev labels and uberblocks, and importing the pool offline from cloned images. Covers TrueNAS, FreeNAS, Proxmox, Oracle Solaris, and Linux OpenZFS. Free evaluation. No data = no charge.

Written by Louis Rossmann
Founder & Chief Technician
Updated March 2026
16 min read

How ZFS Pools Fail and How We Reconstruct Them

ZFS stores all data and metadata in a Merkle tree rooted at the uberblock. Each pool contains one or more vdevs (mirror, RAIDZ1, RAIDZ2, or RAIDZ3), and each vdev distributes data across its member drives using variable-width stripes. When enough drives in a vdev fail that ZFS cannot reconstruct missing blocks from parity, the pool transitions to FAULTED and refuses import. Recovery requires imaging every drive in the pool, locating valid uberblocks in the 128-entry ring on each drive's ZFS labels, and force-importing the pool read-only from images to extract datasets and zvols.

ZFS checksums every block using fletcher4 by default, or SHA-256 when deduplication is enabled. This per-block verification catches silent corruption that traditional RAID controllers miss. The trade-off: when ZFS detects a checksum mismatch and cannot correct it from parity, it returns an I/O error instead of serving corrupt data. Scrub errors on a degraded pool signal that more blocks will become inaccessible if another drive fails.
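
To illustrate how cheap that per-block verification is, here is a minimal Python sketch of the fletcher4 algorithm: four 64-bit running sums over the block's 32-bit little-endian words. The function name is ours and the sketch omits ZFS's byte-order handling; it only demonstrates that any single-bit flip changes the checksum.

```python
import struct

def fletcher4(data: bytes):
    """Fletcher4-style checksum: four running sums over the buffer
    interpreted as little-endian 32-bit words."""
    a = b = c = d = 0
    mask = (1 << 64) - 1          # ZFS accumulates in 64-bit registers
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & mask
        b = (b + a) & mask
        c = (c + b) & mask
        d = (d + c) & mask
    return (a, b, c, d)

# A one-bit flip anywhere in the block changes the checksum.
block = bytes(range(256)) * 16            # 4 KiB sample block
corrupt = bytearray(block)
corrupt[100] ^= 1
assert fletcher4(block) != fletcher4(bytes(corrupt))
```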

ZFS On-Disk Architecture for Recovery Engineers

Targeted ZFS recovery requires understanding five on-disk structures that standard file-carving tools do not handle. Each structure has a specific role in the pool's self-describing metadata tree, and corruption at any level produces different symptoms.

Vdev Labels (L0, L1, L2, L3)

Every drive in a ZFS pool carries four 256 KB label copies: L0 and L1 occupy the first 512 KB, L2 and L3 the last 512 KB. Each label contains the pool GUID, vdev GUID, the full vdev tree encoded as an nvlist, and the uberblock ring (128 entries, 1 KB each). ZFS updates the labels in two staged passes (even copies, then odd) so a single interrupted write can never corrupt all four. During recovery, we read all four labels from every drive image to find the most complete vdev tree and the highest-txg uberblock.
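
The label positions follow directly from the device size. A minimal sketch of the arithmetic (our own function name; it assumes the usable device size is already rounded down to a 256 KB multiple, as ZFS does internally):

```python
LABEL_SIZE = 256 * 1024   # each ZFS label copy is 256 KiB

def label_offsets(device_size: int):
    """Byte offsets of the four label copies L0..L3.
    Assumes device_size is aligned to 256 KiB (real ZFS rounds
    the usable size down before placing the end labels)."""
    return [
        0,                              # L0: start of device
        LABEL_SIZE,                     # L1
        device_size - 2 * LABEL_SIZE,   # L2
        device_size - LABEL_SIZE,       # L3: end of device
    ]
```

In practice we compute these four offsets for every member image and diff the nvlists, since a drive with a damaged start often still has intact L2/L3 copies.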

Uberblock Ring and Transaction Groups

The uberblock is the root pointer for the entire pool. ZFS maintains a ring buffer of 128 uberblocks per label, written round-robin as each transaction group (TXG) commits. Each uberblock records the TXG number, a timestamp, and a block pointer to the Meta Object Set (MOS). The active uberblock is the one with the highest TXG that also has a valid checksum. When the latest TXG is corrupted, we walk backward through the ring to find an older, consistent state. The trade-off: rolling back to an earlier TXG means any data written after that TXG is lost. Because a TXG commits roughly every five seconds by default, rolling back a handful of TXGs typically costs only seconds to minutes of writes.
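
The ring walk itself is simple. Here is an illustrative Python sketch of selecting the highest-TXG slot from a 128 KB ring (field offsets follow the first three members of the on-disk uberblock: magic, version, TXG; a real selection also verifies the uberblock checksum, which we skip here):

```python
import struct

UB_MAGIC = 0x00BAB10C   # "oo-ba-bloc"; little-endian on x86 pools
UB_SIZE = 1024          # one ring slot (larger ashift values widen slots)
RING_SLOTS = 128

def best_uberblock(ring: bytes):
    """Scan an uberblock ring, returning (slot, txg) of the entry with
    the highest transaction group, or None if no slot carries the magic."""
    best = None
    for slot in range(RING_SLOTS):
        ub = ring[slot * UB_SIZE:(slot + 1) * UB_SIZE]
        magic, _version, txg = struct.unpack_from("<QQQ", ub, 0)
        if magic != UB_MAGIC:
            continue                      # empty or foreign-endian slot
        if best is None or txg > best[1]:
            best = (slot, txg)
    return best

# Synthetic ring: TXGs land round-robin at slot = txg % 128,
# so the two oldest of these 130 commits get overwritten.
ring = bytearray(RING_SLOTS * UB_SIZE)
for txg in range(900, 1030):
    struct.pack_into("<QQQ", ring, (txg % RING_SLOTS) * UB_SIZE,
                     UB_MAGIC, 5000, txg)
```

Rollback is then a matter of ignoring the best slot and re-running the scan over the remaining candidates until one yields a consistent MOS.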

DVA Pointers and the Block Pointer Tree

Every ZFS block pointer contains up to three Data Virtual Addresses (DVAs). Each DVA encodes the vdev ID, offset within the vdev, and the gang bit (indicating whether the block is a gang block split across multiple sub-blocks). The block pointer also stores the checksum of the target block, the compression algorithm, the logical and physical sizes, and the birth TXG. We use zdb -bbb on the imported pool image to traverse the full block pointer tree. When DVA pointers reference sectors on a failed drive, we reconstruct the missing data from RAIDZ parity (if within tolerance) or flag those blocks as unrecoverable.
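
A DVA packs its fields into two 64-bit words. The sketch below decodes one, following the published dva_t bit layout (vdev ID in the top 32 bits of word 0, allocated size in its low 24 bits, gang bit at the top of word 1, sector offset in the rest); the helper name and dict keys are ours:

```python
SPA_MINBLOCKSHIFT = 9            # DVA offsets and sizes count 512-byte sectors
VDEV_LABEL_START = 4 * 1024**2   # data region begins after 4 MiB of labels/boot

def decode_dva(word0: int, word1: int):
    """Unpack one 128-bit DVA (two 64-bit words) into its fields.
    byte_offset is relative to the start of the member device."""
    vdev = word0 >> 32                       # top-level vdev ID
    asize = word0 & 0xFFFFFF                 # allocated size, in sectors
    gang = word1 >> 63                       # 1 = gang block header
    off = word1 & ((1 << 63) - 1)            # offset, in sectors
    return {
        "vdev": vdev,
        "gang": bool(gang),
        "asize_bytes": asize << SPA_MINBLOCKSHIFT,
        "byte_offset": (off << SPA_MINBLOCKSHIFT) + VDEV_LABEL_START,
    }
```

Decoding DVAs by hand is what lets us map "this block pointer" to "these physical sectors on drive 3" when deciding whether parity can cover a dead member.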

Dnode Objects and the Meta Object Set

The MOS is the top-level object set containing all pool-wide metadata: the dataset directory, the space map, the DDT (if dedup is enabled), and configuration objects. Each dataset within the pool has its own object set, and within that set, every file or directory is represented by a dnode. A dnode is a 512-byte structure that stores the object type, bonus data (such as the ZPL file attributes), and up to three block pointers for the object's data. When MOS corruption prevents normal import, we locate the MOS block pointer directly from the uberblock and traverse the dnode tree manually using zdb -dddd to enumerate datasets.
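
Manual dnode traversal starts with the 64-byte fixed header of each 512-byte dnode. An illustrative parser (field order follows the on-disk dnode layout; the sample values — type 19 for a plain file, bonus type 44 — are ours for demonstration):

```python
import struct

DNODE_SIZE = 512
BLKPTR_SIZE = 128    # each embedded block pointer is 128 bytes
DN_HEADER = 64       # fixed header before the block-pointer array

def parse_dnode_header(dn: bytes):
    """Decode the fixed header of a 512-byte dnode: object type,
    tree depth, block-pointer count, data block size, bonus length."""
    (dn_type, indblkshift, nlevels, nblkptr,
     bonustype, checksum, compress, flags,
     datablkszsec, bonuslen) = struct.unpack_from("<8BHH", dn, 0)
    return {
        "type": dn_type,
        "levels": nlevels,
        "nblkptr": nblkptr,
        "datablksz": datablkszsec * 512,  # stored in 512-byte sectors
        "bonuslen": bonuslen,
        # block pointers occupy [64, 64 + nblkptr*128); bonus follows
        "bonus_offset": DN_HEADER + nblkptr * BLKPTR_SIZE,
    }

# A typical plain-file dnode: 1 level, 1 block pointer,
# 128 KiB data blocks, 168-byte bonus buffer.
dn = bytearray(DNODE_SIZE)
struct.pack_into("<8BHH", dn, 0, 19, 17, 1, 1, 44, 0, 0, 0, 256, 168)
info = parse_dnode_header(bytes(dn))
```

From here, following the block pointers at offset 64 (and the bonus buffer behind them) recovers the file's attributes and data tree even when the dataset will not mount.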

Space Maps and Free Space Tracking

ZFS tracks allocated and free space using space maps: on-disk logs of allocation and free events for each metaslab. Corrupted space maps do not lose user data, but they prevent ZFS from mounting the pool because it cannot determine which blocks are in use. Recovery involves bypassing the space map check during read-only import and rebuilding the space map from the block pointer tree. This is a metadata-only repair that does not modify user data blocks.
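
The replay itself is an append-only fold over the log. This is a deliberately simplified sketch: real space maps pack each record into one or two 64-bit words and frees can split earlier extents, but the alloc/free accounting works the same way:

```python
def replay_spacemap(records):
    """Replay a decoded space-map log into the set of allocated ranges.
    Each record is ('A'|'F', offset, length); 'A' allocates an extent,
    'F' frees a previously allocated one."""
    allocated = set()
    for op, offset, length in records:
        rng = (offset, length)
        if op == "A":
            allocated.add(rng)
        else:                 # 'F' frees an earlier allocation
            allocated.discard(rng)
    return allocated

# Alloc two extents, free one: only the survivor remains allocated.
log = [("A", 0x1000, 0x200), ("A", 0x4000, 0x400), ("F", 0x1000, 0x200)]
```

Rebuilding a corrupted space map reverses the direction: instead of replaying the log, we derive the allocated set by walking the block pointer tree and emitting one alloc record per live block.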

RAIDZ Parity Distribution and Rebuild Constraints

RAIDZ differs from traditional RAID 5/6 in a fundamental way: it uses variable-width stripes. Each logical block is distributed across a stripe whose width depends on the block's size and the number of drives in the vdev. This eliminates the RAID write hole without requiring a battery-backed cache, but it means recovery tools designed for fixed-stripe RAID arrays cannot parse RAIDZ data.

RAIDZ Level | Parity Drives | Fault Tolerance | Recovery When Exceeded
RAIDZ1 | 1 per stripe | 1 drive per vdev | Blocks on 2+ failed drives are unrecoverable from parity alone. Partial recovery depends on which blocks landed on which drives.
RAIDZ2 | 2 per stripe | 2 drives per vdev | Blocks spanning 3+ failed drives are lost. RAIDZ2 is the most common production configuration and offers a better recovery margin than RAIDZ1.
RAIDZ3 | 3 per stripe | 3 drives per vdev | Rarely exceeds tolerance in practice. Typically seen in large vdevs (8+ drives) where the probability of three simultaneous failures during resilver is non-trivial.

For traditional hardware RAID arrays (Dell PERC, HP SmartArray, LSI MegaRAID), see our RAID data recovery service. RAIDZ is software RAID managed by ZFS and uses a different on-disk layout than controller-based arrays.
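
The single-parity arithmetic behind the table above reduces to byte-wise XOR. A minimal demonstration (our own helper names; RAIDZ2/3 layer Reed-Solomon syndromes on top of this, and real stripes vary in width per block):

```python
def xor_parity(columns):
    """Compute the RAIDZ1 parity column: byte-wise XOR of all data columns."""
    parity = bytearray(len(columns[0]))
    for col in columns:
        for i, byte in enumerate(col):
            parity[i] ^= byte
    return bytes(parity)

def reconstruct(columns, parity, missing):
    """Rebuild one missing data column from parity plus the survivors:
    XOR of everything else cancels out, leaving the lost column."""
    survivors = [c for i, c in enumerate(columns) if i != missing]
    return xor_parity(survivors + [parity])

stripe = [b"\x11" * 4, b"\x22" * 4, b"\x33" * 4]   # 3 data columns
p = xor_parity(stripe)
```

The hard part of RAIDZ recovery is not this XOR; it is knowing, per block, which drives held data and which held parity, which is why the variable-width layout defeats fixed-stripe RAID tools.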

ZFS Pool Failure Scenarios We Recover

Faulted Pool After Drive Failures

The most common scenario. RAIDZ1 pools fault when two drives in the same vdev fail. RAIDZ2 pools fault on three failures. We image all drives (including failed ones) and reconstruct what parity can provide. Board-level repair on electrically failed drives can restore one member and bring the pool back within tolerance.

Failed Resilver

Resilvering reads every surviving member to rebuild the replacement drive. If a surviving drive develops bad sectors during the resilver, ZFS cannot complete the rebuild. A mid-resilver failure is dangerous because the replacement holds only a partial reconstruction: some block ranges are rebuilt, others are not. We handle this by imaging all drives and reconciling both states offline.

Corrupted Uberblock or MOS

Power loss during a TXG commit can corrupt the active uberblock or the MOS it points to. ZFS normally recovers by falling back to a previous TXG, but if multiple TXGs are affected (e.g., UPS failure during a long scrub), manual uberblock selection is needed. We use zpool import -T to target a specific TXG, or parse the uberblock ring manually with zdb -lu to find the last consistent state.

ZIL / SLOG Device Failure

The ZFS Intent Log (ZIL) records synchronous write transactions. If a dedicated SLOG device (typically a fast NVMe SSD) fails, any uncommitted synchronous writes are lost. For pools where the SLOG failed during active database writes or NFS/iSCSI operations, we image both the SLOG device and the pool drives. If the SLOG contains recoverable log records, we replay them into the pool. If the SLOG is physically dead, the pool imports without those pending writes.

Accidental Pool Destruction

Running zpool destroy marks the labels on each drive as destroyed; zpool labelclear goes further and wipes the label regions from individual drives. If no new data has been written afterward, the uberblocks and block pointer tree remain on disk at their original offsets. We scan for the characteristic uberblock magic number (0x00bab10c) and pool GUID to locate and reconstruct from them.
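
The scan is a linear pass over the image at 1 KB alignment, checking for the magic in both byte orders (pools written on big-endian hosts store it byte-swapped). An illustrative sketch, with our own function name; real triage also matches the pool GUID recovered from any surviving label:

```python
import struct

UB_MAGIC = 0x00BAB10C

def scan_for_uberblocks(image: bytes, step: int = 1024):
    """Scan a raw drive image at 1 KiB-aligned offsets for the uberblock
    magic in either endianness.  Returns a list of (offset, txg) hits."""
    swapped = struct.unpack("<Q", struct.pack(">Q", UB_MAGIC))[0]
    hits = []
    for off in range(0, len(image) - 24, step):
        magic = struct.unpack_from("<Q", image, off)[0]
        if magic == UB_MAGIC:                       # native byte order
            txg = struct.unpack_from("<Q", image, off + 16)[0]
        elif magic == swapped:                      # byte-swapped pool
            txg = struct.unpack_from(">Q", image, off + 16)[0]
        else:
            continue
        hits.append((off, txg))
    return hits

# Synthetic image with one native-endian uberblock at offset 4096.
img = bytearray(16 * 1024)
struct.pack_into("<QQQ", img, 4096, UB_MAGIC, 5000, 777)
```

Sorting the hits by TXG gives the candidate roots for reconstruction, newest first.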

Dedup Table Corruption

ZFS deduplication stores a Dedup Table (DDT) that maps block checksums to their physical locations. The DDT consumes RAM (about 320 bytes per entry) and is backed by on-disk log-structured storage. When the DDT becomes corrupted (common when pools run low on memory under heavy dedup loads), files referencing deduplicated blocks cannot be resolved. We rebuild the DDT by scanning the full block pointer tree and reconstructing the checksum-to-DVA mappings.
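
Conceptually, the rebuild is a fold over every block pointer in the pool, accumulating a checksum-to-location map with reference counts. A simplified sketch (inputs here are pre-decoded (checksum, DVA) pairs; a real DDT entry also records sizes, compression, and up to three DVAs):

```python
def rebuild_ddt(block_pointers):
    """Reconstruct a dedup-table mapping checksum -> (dva, refcount)
    by walking decoded block pointers; repeated checksums bump the
    refcount instead of adding a new physical location."""
    ddt = {}
    for checksum, dva in block_pointers:
        if checksum in ddt:
            ddt[checksum] = (ddt[checksum][0], ddt[checksum][1] + 1)
        else:
            ddt[checksum] = (dva, 1)
    return ddt

# Three references to one deduplicated block, plus one unique block.
bps = [("sha_a", (0, 0x1000)), ("sha_a", (0, 0x1000)),
       ("sha_a", (0, 0x1000)), ("sha_b", (1, 0x8000))]
```

The expensive part in practice is the walk itself: every dnode in every dataset must be visited to enumerate the block pointers feeding this map.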

Platform-Specific ZFS Recovery Notes

ZFS runs on four major platforms. The on-disk format is cross-platform compatible (a pool created on Solaris can be imported on Linux), but encryption layers, feature flags, and bootloader integration differ across platforms.

TrueNAS / FreeNAS

TrueNAS CORE (FreeBSD) uses GELI disk-level encryption. TrueNAS SCALE (Debian Linux) uses ZFS native encryption at the dataset level. Both require key material for encrypted pools. See our dedicated TrueNAS recovery page for GELI-specific workflows. GELI keys are stored on the boot pool; if the boot drive is lost and no backup exists, the encrypted pool is unrecoverable.

Proxmox VE

Proxmox uses OpenZFS on Linux for VM storage. ZFS-backed storage presents VM disks as zvols (raw block devices for KVM/QEMU); qcow2 images appear when a directory storage sits on a ZFS dataset. Recovery involves importing the pool from images and extracting the guest VM disks, then mounting the guest filesystem (NTFS, ext4, XFS) to verify the VM data. See Proxmox recovery for Ceph-related failures.

Oracle Solaris

Solaris is the original ZFS platform and may use older pool versions (pre-feature flags, pool version 28 or earlier). Older Solaris pools lack features like LZ4 compression and large dnode support. Recovery is straightforward if the pool version is identified correctly; we import on a matching OpenZFS version or use Solaris-native tools when feature flags are incompatible.

Linux OpenZFS

Ubuntu, Debian, Fedora, and Arch all support OpenZFS through the ZFS on Linux (ZoL) kernel module. Common in custom-built NAS servers and Proxmox hosts. Linux OpenZFS supports ZFS native encryption (dataset-level AES-256-GCM). Pool recovery is identical to other platforms once drives are imaged; the key difference is that Linux systems sometimes mix ZFS and mdadm (e.g., mdadm mirror for boot, ZFS pool for data), which requires handling both metadata formats.

Commands That Destroy ZFS Recovery Options

The following commands, commonly suggested in forum posts, will reduce or eliminate recovery chances if run on a failing pool. If your pool is degraded or faulted, do not run these. Power down the system and contact us.

zpool import -f on a faulted pool

Forces import of a pool that ZFS has refused. This writes new TXGs to the pool, overwriting the metadata ZFS needs for self-consistency checks. If the pool is faulted due to drive failures, the forced import will record that the failed drives are absent, and any subsequent export-reimport cycle will reference the damaged state rather than the pre-failure state.

zpool clear followed by resilver on a degraded pool

Clearing errors and resilvering writes parity data across all surviving drives. If any surviving drive has developing bad sectors (common with same-batch drives of the same age), the resilver can trigger that drive to fail, pushing the pool past its parity tolerance. We see this cascade failure regularly.

zfs_max_missing_tvds tunable

This kernel tunable allows ZFS to import a pool with missing top-level vdevs. Setting it to a non-zero value and importing writes new TXGs that permanently record the missing vdevs as absent. If you then add the missing vdevs back, ZFS treats them as foreign devices and will not reattach them. The original pool topology is overwritten. This tunable is a last-resort forensic tool, not a recovery shortcut.

Our ZFS Recovery Methodology

1. Drive Imaging with PC-3000

Every drive in the pool is imaged through PC-3000 with write-blocking. SAS drives (common in TrueNAS Enterprise and Solaris servers) are imaged via SAS HBAs in IT mode. For drives with bad sectors, we capture healthy regions first using sector maps, then retry damaged areas with aggressive read parameters. Drives with mechanical failures (clicking, motor seizure) receive clean bench work before imaging: head swaps, motor transplants, or platter stabilization, all performed under 0.02 µm ULPA filtration.

2. Vdev Label Analysis

We read all four ZFS labels from each drive image using zdb -l. The labels contain the pool name, pool GUID, vdev GUID, vdev tree (encoded as an nvlist), and the 128-entry uberblock ring. By comparing labels across all drive images, we reconstruct the complete vdev topology even when some drives have corrupted labels. The vdev tree tells us which drives belong to which vdev, whether each vdev is a mirror or RAIDZ, and the ashift (sector size alignment, typically 9 for 512-byte or 12 for 4K-native drives).

3. Uberblock Selection and TXG Rollback

The uberblock ring on each drive contains the last 128 transaction groups. We examine each uberblock using zdb -lu to find the highest TXG with a valid checksum. If the latest TXG is corrupted, we roll back to an earlier state using zpool import -T [txg]. The data loss from TXG rollback is limited to writes that occurred between the target TXG and the failed TXG. For most failures triggered by drive loss rather than active corruption, the rollback window is seconds.

4. Offline Pool Import and Dataset Extraction

The pool is imported read-only from the drive images using loopback devices on a dedicated recovery workstation. We verify the pool status, check for data errors using zpool status -v, and extract individual datasets, zvols, and snapshots. For zvols used as VM storage (Proxmox, bhyve), we mount the guest filesystem to verify the VM data is intact. Snapshots are preserved; if the live dataset has corruption but a recent snapshot is clean, we recover from the snapshot.

ZFS Recovery Pricing

Same transparent model as our RAID recovery pricing: per-drive imaging based on each drive's condition, plus a $400-$800 pool reconstruction fee covering vdev analysis, pool import, and dataset extraction. No data recovered = no charge.

Service Tier | Price Range | Description
Logical / Firmware Imaging | $250-$900 per drive | Firmware module damage, SMART threshold failures, or filesystem corruption on individual pool members.
Mechanical (Head Swap / Motor) | $1,200-$1,500 per drive (50% deposit) | Donor parts consumed during transplant. SAS drives (common in enterprise ZFS servers) require SAS-specific donors.
ZFS Pool Reconstruction | $400-$800 per pool | Vdev reconstruction, uberblock analysis, pool import, and dataset/zvol extraction. Includes ZFS native decryption or GELI decryption if key material is provided.

No Data = No Charge: If we recover nothing from your ZFS pool, you owe $0. Free evaluation, no obligation.

Before sending drives: export your encryption key (GELI recovery key for TrueNAS CORE, ZFS encryption passphrase for TrueNAS SCALE or Linux). Note the pool name and vdev layout from zpool status if the system still boots.

Lab Location and Mail-In

All ZFS recovery work is performed in-house at our Austin lab: 2410 San Antonio Street, Austin, TX 78705. Walk-in evaluations are available Monday - Friday, 10 AM - 6 PM CT. For clients outside Austin, we accept mail-in shipments from all 50 states. Ship drives in anti-static bags with foam padding. Label each drive with its slot number from the original system if possible.

Data Recovery Standards & Verification

Our Austin lab operates on a transparency-first model. We use industry-standard recovery tools, including PC-3000 and DeepSpar, combined with strict environmental controls to make sure your hard drive is handled safely and properly. This approach allows us to serve clients nationwide with consistent technical standards.

Open-drive work is performed in a ULPA-filtered laminar-flow bench, validated to 0.02 µm particle count, verified using TSI P-Trak instrumentation.

Transparent History

Serving clients nationwide via mail-in service since 2008. Our lead engineer holds PC-3000 and HEX Akademia certifications for hard drive firmware repair and mechanical recovery.

Media Coverage

Our repair work has been covered by The Wall Street Journal and Business Insider, with CBC News reporting on our pricing transparency. Louis Rossmann has testified in Right to Repair hearings in multiple states and founded the Repair Preservation Group.

Aligned Incentives

Our "No Data, No Charge" policy means we assume the risk of the recovery attempt, not the client.

Louis Rossmann

Louis Rossmann's well-trained staff review our lab protocols to ensure technical accuracy and honest service. Since 2008, his focus has been on clear technical communication and accurate diagnostics rather than sales-driven explanations.

We believe in proving standards rather than just stating them. We use TSI P-Trak instrumentation to verify that clean-air benchmarks are met before any drive is opened.

See our clean bench validation data and particle test video

ZFS Recovery: Common Questions

My ZFS pool shows FAULTED and zpool import fails. Can you recover the data?
Yes. A FAULTED pool means ZFS cannot reconstruct missing blocks from the surviving members of one or more vdevs. We image all drives including the failed ones, reconstruct vdev geometry from the ZFS labels at the start and end of each member, and force-import the pool from images to extract datasets.
I ran zpool import -f and it made things worse. Is recovery still possible?
Usually yes. A forced import writes new transaction groups to the pool, which can overwrite metadata ZFS needs for self-repair. The severity depends on how much write activity occurred after the forced import. We image the drives in their current state and attempt recovery from historical transaction groups that predate the forced import.
Can you recover a ZFS pool after zpool destroy or zpool labelclear?
If no new data has been written to the drives after the destroy command, the uberblocks and metadata trees are still on disk. We scan for historical uberblocks at known offsets and reconstruct the pool from the most recent valid transaction group.
Does it matter if I use RAIDZ1, RAIDZ2, or RAIDZ3?
The RAIDZ level determines how many drives can fail before the pool faults. RAIDZ1 tolerates one failure per vdev, RAIDZ2 tolerates two, RAIDZ3 tolerates three. Recovery complexity increases when failures exceed these thresholds because we must reconstruct data without parity assistance for the extra failed drives.
My ZFS pool uses deduplication and the DDT is corrupted. Can you recover files?
DDT corruption is one of the harder ZFS recovery scenarios. The dedup table maps block references to their physical locations on disk. If the DDT is damaged, files that reference deduplicated blocks cannot be resolved through normal import. We reconstruct the DDT from the block pointer tree by scanning every dnode in the pool and rebuilding the reference map.
How is ZFS recovery priced?
Per-drive imaging based on each drive's condition ($250-$900 for logical/firmware work, $1,200-$1,500 for mechanical recovery), plus a $400-$800 pool reconstruction fee covering vdev analysis, pool import, and dataset extraction. If we recover nothing, you pay $0.

Ready to recover your ZFS pool?

Free evaluation. No data = no charge. Mail-in from anywhere in the U.S.