
ZFS Pool Recovery and Troubleshooting Guide

Your ZFS pool is reporting DEGRADED, FAULTED, or UNAVAIL in zpool status. The pool may have refused to import, or it imported but shows data errors. Before running zpool clear or zpool replace, you need to understand which operations are safe and which will overwrite the on-disk state you need for recovery.

This guide covers ZFS pool states, safe export/import procedures, the risks of zpool clear, transaction group (TXG) rollbacks, handling FAULTED and UNAVAIL vdevs, and raidz fault tolerance.

ZFS Pool States

What Do the Four ZFS Pool States Mean?

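ZFS tracks pool health at the vdev level. The four states that matter most during recovery are ONLINE, DEGRADED, FAULTED, and UNAVAIL (zpool status can also report OFFLINE and REMOVED for devices taken offline by an administrator or physically removed). The pool state is the worst state among its top-level vdevs.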

ONLINE

All vdevs are healthy. No errors detected. Normal operation. No action needed.

DEGRADED

One or more vdevs have lost a member but can still serve data using redundancy (raidz parity or mirror copies). The pool is operational but running without full fault tolerance. Safe to read from; assess before writing.

FAULTED

A vdev has lost too many members to maintain data integrity. raidz1 tolerates one drive failure; losing two or more exceeds its parity. raidz2 tolerates two failures; losing three or more exceeds its parity. The pool cannot serve I/O. Do not write to the remaining drives.

UNAVAIL

ZFS cannot open the device at all. The drive may be disconnected, failed, or its device path may have changed (common after controller or cabling changes). Check physical connections and device paths before assuming hardware failure.

Example: In a raidz2 vdev, if one member fails (UNAVAIL), the vdev transitions to DEGRADED while still serving data. If a second member then reports checksum errors and ZFS marks it FAULTED, the vdev remains DEGRADED but has consumed both units of parity margin. With no redundancy left, there is only a narrow window to image the remaining drives: one more failure transitions the pool to FAULTED.
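The state rules above can be modeled in a few lines. This is a simplified sketch, not ZFS's actual implementation: it treats any member that is FAULTED or UNAVAIL as lost, ignores OFFLINE and REMOVED, and the function names are hypothetical.

```python
from enum import IntEnum
from typing import List


class State(IntEnum):
    # Ordered from healthiest to worst so that max() returns the worst state.
    ONLINE = 0
    DEGRADED = 1
    FAULTED = 2


def raidz_vdev_state(parity: int, lost_members: int) -> State:
    """Simplified state of one raidz vdev.

    parity: 1 for raidz1, 2 for raidz2, 3 for raidz3.
    lost_members: members currently FAULTED or UNAVAIL.
    """
    if lost_members == 0:
        return State.ONLINE
    if lost_members <= parity:
        return State.DEGRADED  # data still reconstructable from parity
    return State.FAULTED       # more members lost than parity can cover


def pool_state(top_level_vdevs: List[State]) -> State:
    """The pool reports the worst state among its top-level vdevs."""
    return max(top_level_vdevs, default=State.ONLINE)


# The raidz2 walk-through: zero, one, two, then three lost members.
for lost in range(4):
    print(lost, raidz_vdev_state(2, lost).name)
# 0 ONLINE
# 1 DEGRADED
# 2 DEGRADED
# 3 FAULTED
```

Note that one and two lost members both read DEGRADED: the state alone does not tell you how much margin is left, so count lost members against the parity level yourself.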

Export/Import Procedures

How Do You Safely Export and Import a ZFS Pool?

Exporting a ZFS pool flushes pending writes and marks the pool as cleanly closed. Importing reads the on-disk metadata and reconstructs the in-memory state. Both operations are safe when done correctly, but an import with the wrong flags can overwrite recoverable metadata.

1. zpool export poolname flushes all pending transaction groups to disk and marks the pool as exported. This is the cleanest way to take a pool offline. It only works if the pool is ONLINE or DEGRADED.
2. zpool import -o readonly=on poolname imports the pool in read-only mode, which prevents ZFS from writing metadata updates to the member drives. However, NEVER run this on drives suspected of physical failure. Even a read-only import generates heavy random reads across the array to traverse the ZFS metadata tree, which can destroy a drive with failing read heads. Always clone every member drive sector by sector with dedicated imaging hardware first, and attempt the import only on the cloned images.
3. zpool import -f poolname force-imports a pool that was not cleanly exported or that appears to be in use by another system. Because this is a read-write import, ZFS replays the intent log (ZIL) to recover acknowledged synchronous writes that were not yet committed by a transaction group sync. This writes to the drives and may advance the on-disk state past a recoverable point.
4. If the pool will not import at all, do not retry with -f repeatedly. Image the drives and work from the copies.
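When the member drives have already been cloned, the read-only import from step 2 can be pointed at the images rather than the originals. The sketch below only builds the command line for review; the image directory /mnt/images and the helper name are hypothetical. It adds -N (import without mounting any datasets) and -d (search a directory for devices or image files), both standard zpool import options.

```python
import shlex
from typing import List, Optional


def readonly_import_cmd(pool: str,
                        image_dir: Optional[str] = None,
                        force: bool = False) -> List[str]:
    """Build argv for a read-only zpool import that mounts nothing."""
    cmd = ["zpool", "import", "-o", "readonly=on", "-N"]
    if image_dir:
        # Look for pool members in this directory (e.g. cloned image files)
        # instead of scanning the host's attached disks.
        cmd += ["-d", image_dir]
    if force:
        # Needed when the pool was last imported by another host.
        cmd.append("-f")
    cmd.append(pool)
    return cmd


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(shlex.join(readonly_import_cmd("tank", image_dir="/mnt/images")))
    # zpool import -o readonly=on -N -d /mnt/images tank
```

Building the argv as a list keeps a pool name or path containing spaces from being split, and printing it first lets a second person check the flags before anything touches the drives.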

Example: If a QNAP NAS loses power during a scrub and its drives are then connected to a separate Linux workstation, zpool import may list the pool as available but not exported. Assuming the drives show no signs of physical failure, running zpool import -o readonly=on poolname (adding -f if ZFS reports the pool was last accessed by another system) imports the pool without writing anything to the member drives, so data can be copied to a new destination before deciding whether to repair or rebuild the pool.

Dangers of zpool clear

What Are the Risks of zpool clear?

zpool clear resets error counters on a vdev and tells ZFS to retry I/O. If the errors were transient (a loose cable), this brings the vdev back online. If the drive is failing, clearing errors masks the problem and allows further corruption.

1. ZFS tracks read errors, write errors, and checksum errors per device. When error counts exceed internal thresholds, ZFS marks the vdev as FAULTED.
2. zpool clear poolname resets these counters to zero and retries failed I/O. If the drive responds, ZFS marks it ONLINE again.
3. If the underlying drive has bad sectors or a failing head, the errors will return. In the meantime, ZFS will write new data to the faulty drive, and that data may be lost when the errors recur.
4. The next scrub will detect the corruption, but by then the pool may have advanced past the last consistent transaction group.
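Before deciding whether a clear is justified, it helps to see exactly which devices carry errors. This is a minimal sketch that parses the config table of zpool status; it assumes the default NAME/STATE/READ/WRITE/CKSUM column layout, and the function names are hypothetical. The -p flag asks zpool status for exact counts; the parser also accepts abbreviations such as 1.2K, assuming they are powers of 1024.

```python
import re
import subprocess
from typing import List, Tuple

STATES = "ONLINE|DEGRADED|FAULTED|UNAVAIL|OFFLINE|REMOVED"
COUNT = r"([\d.]+[KMGTPE]?)"
# One row of the zpool status config table: NAME STATE READ WRITE CKSUM
ROW = re.compile(rf"^\s*(\S+)\s+({STATES})\s+{COUNT}\s+{COUNT}\s+{COUNT}")
SUFFIX = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3,
          "T": 1024**4, "P": 1024**5, "E": 1024**6}


def parse_count(text: str) -> int:
    """Convert a counter such as '41' or '1.2K' to an int (approximate if abbreviated)."""
    number, suffix = re.fullmatch(r"([\d.]+)([KMGTPE]?)", text).groups()
    return int(float(number) * SUFFIX[suffix])


def rows_needing_attention(status_text: str) -> List[Tuple[str, str, int, int, int]]:
    """Return config rows that are not ONLINE or have nonzero error counters."""
    flagged = []
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, pool summary, or free-text lines
        name, state, *counts = match.groups()
        read, write, cksum = (parse_count(c) for c in counts)
        if state != "ONLINE" or read or write or cksum:
            flagged.append((name, state, read, write, cksum))
    return flagged


if __name__ == "__main__":
    out = subprocess.run(["zpool", "status", "-p"],
                         capture_output=True, text=True, check=True).stdout
    for row in rows_needing_attention(out):
        print(row)
```

A device that comes back with nonzero counters and a matching SMART problem is a replacement candidate, not a clear candidate.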

Rule: Only run zpool clear if you have identified and fixed the root cause (reseated a cable, replaced a controller, resolved a power issue). If the drive itself is failing (check SMART), replace it instead of clearing errors.

Example: If a drive shows checksum errors caused by physical damage and zpool clear is used to reset the counters, the errors disappear temporarily. New data written to the damaged areas in the interim may not read back correctly. A later scrub reveals the recurring errors, but by then single-parity protection like raidz1 may be unable to repair the affected blocks, because any additional error on another member exceeds its single parity.

TXG Rollbacks

How Do ZFS Transaction Group Rollbacks Work?

ZFS uses copy-on-write for all data and metadata. Every write goes into a new location on disk, and the old data remains until the space is reclaimed. Transaction groups (TXGs) are batched commits that advance the pool to a new consistent state. If the latest TXG is corrupted, you can roll back to a previous one.

1. ZFS flushes a new TXG to disk every 5 seconds (the default) or sooner when enough dirty data accumulates.
2. The uberblock at the top of the pool metadata tree points to the most recent valid TXG. Each device label stores a ring buffer of historical uberblocks, providing a short history of recent TXGs.
3. zpool import -T <txg> -o readonly=on poolname imports the pool using a specific historical TXG instead of the most recent one. This effectively rolls the pool back to an earlier consistent state.
4. TXG rollback only works if the on-disk blocks for the older TXG have not been overwritten by subsequent writes. Copy-on-write preserves old blocks until the space is needed, so pools with ample free space and little write activity since the failure have better rollback success rates.
5. Before attempting a rollback, use zdb -e -u poolname to display the active uberblock and its TXG number, and zdb -lu on each member device (or its image) to list the uberblocks stored in that device's labels. The -e flag lets zdb read an exported or non-imported pool directly from the raw devices.
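The ring buffer in step 2 explains why rollback has a short reach. Here is a sketch of the arithmetic, assuming the OpenZFS layout as I understand it from vdev_impl.h: a 128 KiB uberblock ring in each of the four labels, slot size set by the top-level vdev's ashift but clamped between 1 KiB and 8 KiB, and each TXG written to slot txg modulo the slot count (one slot is reserved when multihost is enabled). Treat these constants as assumptions to verify against the ZFS version that wrote the pool; the function names are hypothetical.

```python
# Constants as I understand them from OpenZFS's vdev_impl.h; verify against
# the source of the ZFS version that wrote the pool.
UBERBLOCK_RING_BYTES = 128 * 1024  # uberblock array inside each label
MIN_UBERBLOCK_SHIFT = 10           # slots are at least 1 KiB
MAX_UBERBLOCK_SHIFT = 13           # and at most 8 KiB


def uberblock_slots(ashift: int) -> int:
    """Number of uberblock slots in one label for a top-level vdev's ashift."""
    shift = min(max(ashift, MIN_UBERBLOCK_SHIFT), MAX_UBERBLOCK_SHIFT)
    return UBERBLOCK_RING_BYTES >> shift


def slot_for_txg(txg: int, ashift: int, multihost: bool = False) -> int:
    """Ring slot that receives the uberblock for this TXG."""
    # With multihost (MMP) enabled, the last slot is reserved for MMP writes.
    usable = uberblock_slots(ashift) - (1 if multihost else 0)
    return txg % usable


def history_seconds(ashift: int, txg_interval_s: float = 5.0) -> float:
    """Rough span of history the ring covers if one TXG commits per interval."""
    return uberblock_slots(ashift) * txg_interval_s


for ashift in (9, 12, 13):
    print(ashift, uberblock_slots(ashift), history_seconds(ashift))
# 9 128 640.0
# 12 32 160.0
# 13 16 80.0
```

With the common ashift=12 (4 KiB sectors), only 32 uberblocks fit, so under steady writes at the default 5-second interval the ring reaches back only a few minutes. Every new TXG overwrites the oldest slot, which is why imaging soon after the failure matters.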

Example: If a server loses power during heavy writes, zpool import may fail because the most recent transaction group is corrupt. Running zdb -lu against a member device (or its image) lists the uberblocks in that device's labels with their TXG numbers, which can reveal an earlier valid uberblock. Importing with zpool import -T <txg> -o readonly=on poolname rolls the pool back to that earlier consistent state, sacrificing only the writes committed after the chosen TXG (typically the last few seconds to minutes before the failure) while preserving everything before it.
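The uberblock dump is long, so it can help to reduce it to TXG numbers and times. This sketch assumes the output of zdb -lu run against a member device or image, which as far as I know prints each uberblock as key = value lines including txg = ... and timestamp = ... (label fields use a colon instead, so they are skipped). The regexes and helper names are assumptions to adjust against real output.

```python
import re
from datetime import datetime, timezone
from typing import List, Tuple

TXG_LINE = re.compile(r"^\s*txg = (\d+)\s*$")
TS_LINE = re.compile(r"^\s*timestamp = (\d+)")


def parse_uberblocks(zdb_output: str) -> List[Tuple[int, int]]:
    """Return unique (txg, unix_time) pairs, newest TXG first."""
    found = set()  # labels repeat uberblocks, so de-duplicate
    txg = None
    for line in zdb_output.splitlines():
        m = TXG_LINE.match(line)
        if m:
            txg = int(m.group(1))
            continue
        m = TS_LINE.match(line)
        if m and txg is not None:
            found.add((txg, int(m.group(1))))
            txg = None
    return sorted(found, reverse=True)


def describe(uberblocks: List[Tuple[int, int]]) -> List[str]:
    """Readable lines; everything but the newest is a -T rollback candidate."""
    lines = []
    for i, (txg, ts) in enumerate(uberblocks):
        when = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
        tag = "current" if i == 0 else "rollback candidate"
        lines.append(f"txg {txg}  {when}  {tag}")
    return lines
```

Working from the newest candidate backwards keeps the rollback as small as possible; each attempt should still be a read-only import (zpool import -T <txg> -o readonly=on poolname) against the images.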

Handling FAULTED/UNAVAIL Vdevs

How Do You Handle FAULTED and UNAVAIL Vdevs?

When a vdev is FAULTED or UNAVAIL, the decision to replace or stop depends on the pool's remaining redundancy, the value of the data, and whether the failure is a drive issue or a connection issue.

1. Check physical connections first. UNAVAIL often means ZFS cannot find the device path. Reseating a SATA cable or moving the drive to a different controller port may resolve it.
2. Check SMART data. If the drive reports reallocated sectors, pending sectors, or uncorrectable (UNC) read errors, the hardware is failing.
3. If the pool is DEGRADED (it still has redundancy margin), zpool replace poolname old-dev new-dev initiates a resilver (the ZFS term for a rebuild). This reads the allocated data on all surviving members of the affected vdev to reconstruct the replacement.
4. If the pool is FAULTED (redundancy exhausted), do not attempt a replace. The pool cannot guarantee data integrity. Image every drive and attempt offline reconstruction.
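Step 2 can be scripted for ATA/SATA drives. A minimal sketch that reads the attribute table printed by smartctl -A; the attribute selection is a common rule of thumb rather than a definitive list, vendors label some attributes differently, and NVMe drives report health in a different format. UNC errors themselves appear in the drive's error log (smartctl -l error), which this sketch does not parse. The function name is hypothetical.

```python
import re
from typing import Dict

# ATA SMART attribute IDs that usually indicate media damage.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    187: "Reported_Uncorrect",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def failing_media_indicators(smartctl_a_output: str) -> Dict[str, int]:
    """Return watched attributes whose raw value is nonzero.

    Expects the table from 'smartctl -A /dev/sdX':
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    hits = {}
    for line in smartctl_a_output.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header or non-table line
        attr_id = int(fields[0])
        if attr_id not in WATCHED:
            continue
        raw = re.match(r"\d+", fields[9])  # raw value may carry extra text
        if raw and int(raw.group()) > 0:
            hits[WATCHED[attr_id]] = int(raw.group())
    return hits
```

An empty result does not prove a drive is healthy; it only means none of these counters have moved yet.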

FAULTED ZFS pools on production enterprise servers require imaging before any repair attempt. Because ZFS is copy-on-write, historical TXG data is often still on disk, and any write operation (including a replace, or a scrub that repairs blocks) can overwrite those blocks.

Resilver risk on unstable drives: Running zpool replace on a pool where the remaining drives have marginal heads or firmware instability forces sustained I/O across every surviving member. Unlike a traditional RAID rebuild, which reads each disk sequentially, a ZFS resilver must traverse the block pointer tree, which generates random seeks (newer OpenZFS releases sort much of the data I/O to reduce this, but the metadata traversal is still seek-heavy). That seek load accelerates head failure on physically unstable drives. If the data is irreplaceable, image all members with ddrescue before initiating any resilver. For pools with missing top-level vdevs, the OpenZFS tunable zfs_max_missing_tvds allows importing the pool in read-only mode even when those vdevs are absent, enabling extraction of the data that survives on the remaining vdevs without a resilver.
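On Linux, zfs_max_missing_tvds is a module parameter, so it can be changed at runtime through sysfs before the read-only import. A minimal sketch, assuming a Linux host with the zfs module loaded and root privileges; the helper name is hypothetical, and the parameter directory is an argument so the function can be exercised without touching a real system.

```python
from pathlib import Path

# Standard sysfs location for parameters of a loaded kernel module on Linux.
ZFS_PARAMS = Path("/sys/module/zfs/parameters")


def set_zfs_param(name: str, value: int, params_dir: Path = ZFS_PARAMS) -> int:
    """Write a ZFS module parameter and return its previous value (needs root)."""
    path = params_dir / name
    previous = int(path.read_text().strip())
    path.write_text(f"{value}\n")
    return previous


if __name__ == "__main__":
    previous = set_zfs_param("zfs_max_missing_tvds", 1)
    print(f"zfs_max_missing_tvds: {previous} -> 1; now run a read-only import")
```

Returning the previous value makes it easy to restore the setting (the default is 0) once the data has been copied off, so a later normal import does not silently accept a pool with missing vdevs.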

Example: If a raidz1 pool is DEGRADED because one member is FAULTED with checksum errors while the surviving members report clean SMART data, replacing the failed drive initiates a resilver. Imaging every member first provides a fallback: zpool replace poolname old-dev new-dev can then be run on the live pool. If the sustained random I/O of the resilver causes a surviving drive to fail, the pool can still be reconstructed offline from the pre-resilver images.

raidz Fault Tolerance

How Do raidz1, raidz2, and raidz3 Differ in Fault Tolerance?

ZFS raidz levels map to traditional RAID parity concepts: raidz1 is single-parity (like RAID 5), raidz2 is dual-parity (like RAID 6), and raidz3 is triple-parity (no traditional RAID equivalent).

raidz1

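Tolerates 1 drive failure. As with RAID 5, a resilver runs with no remaining parity, so an unrecoverable read error (URE) on a surviving member cannot be repaired from parity. Not recommended for drives larger than 2TB.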

raidz2

Tolerates 2 drive failures. It is the current recommendation for most ZFS deployments with large drives. After a single drive failure, the resilver can still repair a URE on a surviving member using the remaining parity.

raidz3

Tolerates 3 drive failures. Used in large-capacity deployments (12+ drives) where resilver times exceed 48 hours and multi-drive failure is a realistic scenario.

ZFS has one advantage over traditional RAID during rebuilds: it only resilvers allocated blocks, not the entire drive. A raidz2 pool at 50% capacity resilvers roughly half the data compared to a RAID 6 rebuild. This reduces both the time window and the total bytes read, lowering URE risk.

Example: When a drive fails in a partially full raidz2 pool, the resilver reads only the allocated data across the surviving members rather than the full usable capacity. Because empty unallocated space is skipped, the total bytes read during the rebuild are lower than a traditional full-disk RAID 6 rebuild, which reduces the cumulative exposure to an unrecoverable read error during the operation.
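The URE argument can be made concrete with a back-of-the-envelope calculation. This sketch assumes the common consumer spec-sheet rate of one unrecoverable read error per 10^14 bits read, independent errors, a hypothetical 6 x 8 TB raidz2 with one failed drive, and that both rebuilds read the same fraction of every surviving member (a simplification of what a raidz resilver actually reads). Spec-sheet rates are conservative upper bounds, so the absolute numbers overstate real-world risk; the comparison between the two cases is the point.

```python
import math


def bits_read(surviving_drives: int, drive_bytes: float, fraction_read: float) -> float:
    """Total bits read from the surviving members during a rebuild."""
    return surviving_drives * drive_bytes * fraction_read * 8


def p_any_ure(bits: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one URE: 1 - (1 - p) ** bits, computed stably."""
    return -math.expm1(bits * math.log1p(-ure_per_bit))


# Hypothetical 6 x 8 TB raidz2 with one failed drive: 5 surviving members.
full = bits_read(5, 8e12, 1.0)   # full-disk RAID 6 style rebuild
half = bits_read(5, 8e12, 0.5)   # raidz2 resilver of a 50% full pool
print(f"{p_any_ure(full):.3f} {p_any_ure(half):.3f}")  # 0.959 0.798
```

In a degraded raidz2 vdev a single URE is still repairable from the remaining parity, so these figures are the chance that the rebuild encounters at least one URE, not the chance of losing data.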

FAQ

Frequently Asked Questions

Can data be recovered from a FAULTED ZFS pool?

In most cases, yes. FAULTED means ZFS has determined that the pool cannot guarantee data integrity with its current set of available vdevs. The data is still on the drives. Recovery involves imaging each drive with a write-blocker and reconstructing the pool offline. ZFS stores extensive metadata, including multiple copies of the uberblock and transaction group history, which professional tools can use to rebuild the pool state even when the live pool refuses to import.

What does UNAVAIL mean in ZFS?

UNAVAIL means ZFS cannot open the vdev at all. The drive may have failed or been disconnected, or its device path may have changed. If the device is a member of a raidz vdev and more members are UNAVAIL than the parity level allows, the entire pool transitions to FAULTED. A single UNAVAIL device in a two-way mirror is tolerated as long as the other mirror member is ONLINE. Check 'zpool status' for the specific vdev and drive identifier.

Is it safe to use zpool clear on a degraded pool?

zpool clear resets error counters and retries I/O on vdevs that ZFS has flagged. If the errors were transient (a loose cable, a temporary controller issue), clearing can bring the vdev back online. If the errors reflect a real hardware failure, clearing masks the problem and allows the pool to continue operating with a drive that is actively failing. Future writes to that drive may be lost. Only use zpool clear if you have identified and resolved the root cause of the errors.
