
Btrfs B-Tree Architecture and Failure Points
Btrfs stores all metadata and data references in a hierarchy of B-trees rooted at the superblock. Corruption in any tree can cascade through the dependency chain and render the entire volume unmountable.
- Superblock
- The entry point for the entire filesystem. Three copies at fixed byte offsets: 64 KiB, 64 MiB, and 256 GiB. Contains the generation number (transaction count), the root tree pointer, and the chunk tree pointer. If the primary superblock at 64 KiB is corrupted, the filesystem cannot mount. Backup copies at the other offsets can be promoted via `btrfs rescue super-recover`.
- Root Tree (Tree of Tree Roots)
- Pointed to by the superblock. Holds the root nodes for all other B-trees (extent tree, chunk tree, FS tree, checksum tree). If the root tree is corrupted, `btrfs-find-root` scans the raw device for older root tree nodes from previous CoW transactions.
- Chunk Tree
- Maps logical Btrfs addresses to physical device byte offsets. On multi-device setups, the chunk tree also tracks which physical device holds each data or metadata chunk. Chunk tree corruption produces `read_block_for_search: [logical addr] mirror 1 failed` errors in dmesg. Without a valid chunk tree, the kernel cannot locate any data on the underlying block device.
- Extent Tree
- Tracks allocated space and reference counts for all data and metadata blocks. Extent tree corruption is the most common failure mode during power loss while a balance or snapshot operation is in progress. The filesystem drops to read-only because Btrfs cannot determine which blocks are free and which are in use.
- FS Tree
- Contains the actual file and directory hierarchy: inodes, directory entries, and extent data references. Each Btrfs subvolume has its own FS tree. Snapshot creation clones the FS tree by reference, sharing blocks with the parent subvolume via reference counting in the extent tree.
Key detail: Because Btrfs is Copy-on-Write, every metadata update allocates new blocks rather than overwriting existing ones. The old blocks remain on disk until the space is reclaimed by a balance operation or by free-space pressure. This means older, valid versions of every B-tree node typically survive on disk after corruption occurs. The recovery window closes only when new writes overwrite those historical blocks.
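Because the superblock copies sit at fixed offsets, a raw image can be sanity-checked before any repair tool touches it. The sketch below is illustrative, not an official btrfs-progs utility: the `check_btrfs_supers` helper name is ours, and it only verifies the 8-byte magic string, not the checksum or generation number (use `btrfs inspect-internal dump-super` for that).

```shell
# check_btrfs_supers IMG: report the state of the Btrfs magic at each
# of the three fixed superblock offsets (64 KiB, 64 MiB, 256 GiB).
# The 8-byte magic "_BHRfS_M" sits 64 bytes into each superblock.
check_btrfs_supers() {
    img="$1"
    for off in 65536 67108864 274877906944; do
        # Skip copies that fall beyond the end of the image.
        [ "$off" -lt "$(stat -c %s "$img")" ] || continue
        magic=$(dd if="$img" bs=1 skip=$((off + 64)) count=8 2>/dev/null)
        if [ "$magic" = "_BHRfS_M" ]; then
            echo "superblock at $off: magic OK"
        else
            echo "superblock at $off: magic missing"
        fi
    done
}
```

A copy whose magic is intact may still be stale; compare generation numbers before promoting it.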
How Btrfs Metadata Gets Corrupted
Btrfs metadata corruption has three primary causes: hardware failure in the storage layer, power loss during CoW transaction commits, and kernel bugs in balance or snapshot operations. In practice, these show up in five recurring scenarios:
- 1. Power loss during a transaction commit. Btrfs groups writes into transactions and commits them atomically. A power failure mid-commit can leave the extent tree or chunk tree in a state where some blocks reference the new transaction and others reference the old one. The superblock may point to a partially written root tree node. On the next mount attempt, the kernel detects a `transid verify failed` mismatch and refuses to proceed.
- 2. Bad sectors or read failures in the metadata region. On HDDs, media degradation in sectors containing B-tree nodes produces I/O errors when the Btrfs kernel module attempts to traverse the tree. The drive SMART log shows reallocated or pending sectors. The filesystem reports checksum verification failures and drops to read-only. The user data extents may be fully intact on other regions of the platter while a small number of metadata blocks are physically unreadable.
- 3. NVMe FTL corruption on power loss. NVMe controllers manage NAND access through a Flash Translation Layer that maps logical block addresses to physical NAND pages. A power loss during FTL journal commit can corrupt the mapping for sectors that contain Btrfs metadata. The drive reports the correct capacity and appears healthy in SMART, but returns UNC errors or stale data for specific LBA ranges. The Btrfs chunk tree loses its physical address translation, severing the link between logical filesystem structures and the NAND blocks that hold them. No filesystem-level tool can fix this; the FTL must be stabilized via PC-3000 SSD before any logical recovery.
- 4. Interrupted balance or conversion operations. `btrfs balance` relocates data and metadata chunks across devices or RAID profiles. A crash during balance leaves the chunk tree referencing both old and new physical locations for the same logical address. The extent tree reference counts become inconsistent because some blocks were moved but the reference count update did not complete. This is one of the most common triggers for extent tree corruption on multi-device setups.
- 5. SMR hard drives with Btrfs CoW. Shingled Magnetic Recording (SMR) drives use a write cache (CMR zone) that is periodically flushed to shingled zones. Btrfs CoW generates high random-write workloads that can overflow the CMR cache. When the cache fills, the drive enters a reorganization phase with long latencies. If a transaction commit times out during this reorganization, the kernel may mark the device as failed. The metadata written during the timeout window may be partially committed to the CMR cache but not yet flushed to the shingle zones.
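Each of these failure modes leaves a characteristic signature in the kernel log. A minimal triage filter can be piped from dmesg; the `btrfs_log_triage` name is ours, and the patterns cover the common error strings quoted in this article rather than every possible Btrfs message:

```shell
# btrfs_log_triage: filter kernel log text for the Btrfs error
# signatures discussed above. Reads stdin; typical use:
#   dmesg | btrfs_log_triage
btrfs_log_triage() {
    grep -E 'parent transid verify failed|read_block_for_search|csum failed|BTRFS error'
}
```

A `transid` mismatch points at an interrupted commit; `read_block_for_search` failures point at the chunk tree; repeated `csum failed` lines on the same device region suggest physical media problems rather than purely logical corruption.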
Commands That Destroy Btrfs Volumes During Recovery
The standard sysadmin response to a Btrfs error is to attempt in-place repair. Several commonly recommended commands convert a recoverable metadata inconsistency into permanent data loss.
| Command | What It Does | Why It Destroys Data |
|---|---|---|
| `btrfsck --repair` | Writes corrected metadata in place | Overwrites older CoW metadata blocks that `btrfs-find-root` needs to locate previous valid tree roots. On physically failing drives, writes may land on unstable media. |
| `btrfs rescue zero-log` | Clears the log tree to allow mounting | Any writes acknowledged by the log tree but not yet committed via a full transaction are lost permanently. The filesystem may mount, but recent files are gone. |
| `mount -o recovery,ro` | Mounts read-only with recovery heuristics | Despite the read-only flag, mounting replays the log tree and writes metadata updates to disk unless `nologreplay` is also passed. This advances the on-disk generation past the last consistent state. On physically failing drives, the random read I/O generated by tree traversal accelerates head degradation. |
| `btrfs balance start` | Relocates chunks across devices | Moves data and metadata blocks to new physical locations and updates the chunk tree. If the extent tree is already inconsistent, balance will corrupt the chunk tree as well, destroying the logical-to-physical address mapping. |
| `mkfs.btrfs /dev/sdX` | Creates a new Btrfs filesystem on the device | Writes new superblocks, root tree, chunk tree, and extent tree. All three superblock copies are overwritten. Data extents beyond the metadata region survive but are unreachable without the original tree structure. |
Before running any Btrfs repair tool: Image the entire block device to a separate storage target using write-blocked connections. All recovery attempts must operate on images, not original media. If the underlying drive has physical faults (check SMART), the imaging step captures recoverable sectors before the drive condition worsens.
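On drives without severe physical damage, the imaging step can be sketched with GNU ddrescue. Device names, image paths, and retry counts below are illustrative; on drives with head or firmware faults, a hardware imager (PC-3000, DeepSpar) replaces this entirely.

```shell
# First pass: copy everything readable, skipping bad areas quickly.
# -d uses direct I/O so the kernel cache cannot mask read errors;
# -n skips the slow scraping phase; the mapfile records progress.
ddrescue -d -n /dev/sdX /mnt/target/btrfs.img /mnt/target/btrfs.map

# Second pass: go back and retry the skipped areas up to 3 times.
ddrescue -d -r3 /dev/sdX /mnt/target/btrfs.img /mnt/target/btrfs.map

# All later recovery work targets a copy of the image, never /dev/sdX.
cp /mnt/target/btrfs.img /mnt/target/work.img
```

The mapfile makes the process resumable: if the drive drops off the bus mid-image, a later run continues from the unread regions instead of starting over.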
Btrfs Superblock and Root Tree Recovery
When the primary superblock or root tree pointer is corrupted, the standard mount path fails. Btrfs provides two utilities for locating alternative entry points into the filesystem, but both must be used on cloned images to avoid writing to failing media.
- 1. `btrfs rescue super-recover` compares the generation numbers and checksums of all three superblock copies. If a backup superblock (at 64 MiB or 256 GiB) has a valid checksum and a generation number equal to or higher than the primary, the tool overwrites the corrupted primary with it. This fixes the entry point without touching any other metadata.
- 2. `btrfs-find-root` scans the raw device for B-tree node headers that match the root tree signature. Because Btrfs CoW allocates new blocks for every metadata update, older root tree nodes from previous transactions remain on disk until their space is reclaimed. `btrfs-find-root` lists every discovered root node with its byte offset (bytenr) and generation number. Select the highest generation that predates the corruption event.
- 3. `btrfs restore -t [bytenr]` takes the byte offset of a discovered tree root and walks the B-tree from that point, copying all reachable files to a separate target device. The tool is purely read-only: it does not modify the source device or image. You can run it multiple times with different bytenrs to compare results from different transaction generations.
Root tree rollback recovery: If a Btrfs volume fails to mount with `parent transid verify failed` after a power loss, the filesystem's copy-on-write design preserves historical root tree nodes. `btrfs-find-root` scans the device for previous root tree generations. Selecting a generation from before the crash provides a consistent tree structure, and `btrfs restore -t [bytenr] /dev/image /mnt/recovery/` extracts data from that snapshot (the `-t` argument is the tree root's byte offset as reported by `btrfs-find-root`, not the generation number itself). Only writes committed between the selected generation and the crash are unrecoverable.
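The rollback reduces to a short command sequence. This is a sketch against an image file: the loop device name, mount point, and the `[bytenr]` placeholder are illustrative, and everything touching the image is read-only.

```shell
# Attach the image read-only so no tool can write to it.
losetup -f --show -r work.img        # prints the loop device, e.g. /dev/loop0

# List surviving root tree nodes; each line pairs a byte offset
# (bytenr) with the transaction generation that wrote it.
btrfs-find-root /dev/loop0

# Pick the highest generation that predates the crash, then extract
# everything reachable from that tree root to separate target storage.
# Note: -t takes the tree root's bytenr, not the generation.
btrfs restore -t [bytenr] /dev/loop0 /mnt/recovery/
```

Running `btrfs restore` against two or three candidate roots and diffing the results is a cheap way to confirm which generation is actually consistent.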
Synology DSM 7+ Btrfs Volume Recovery
Synology DSM does not use native Btrfs multi-device RAID profiles. DSM layers three storage technologies: mdadm for RAID parity, LVM for volume management, and a single-device Btrfs filesystem inside the LVM logical volume. A "volume crashed" error in the DSM UI can originate from any of these three layers.
- 1. mdadm layer failure. If an mdadm array member fails or loses its superblock, the RAID device (/dev/md*) cannot assemble. LVM cannot find its physical volume because the block device is gone. Btrfs never enters the picture. Diagnosing this layer requires `mdadm --examine` on each raw drive to check superblock presence and array UUID consistency. See our mdadm superblock recovery guide for that specific failure.
- 2. LVM metadata corruption. If the mdadm array assembles correctly but `vgscan` returns nothing, the LVM VGDA on the assembled RAID device is damaged. The Btrfs filesystem data is intact but unreachable until the LVM extent map is reconstructed. See our LVM metadata corruption recovery guide for the VGDA reconstruction procedure.
- 3. Btrfs metadata corruption (actual). If mdadm and LVM are both intact but the Btrfs filesystem reports tree corruption on mount, the failure is in the Btrfs B-tree layer. On Synology this is a single-device filesystem, so the standard `btrfs-find-root` and `btrfs restore` workflow applies. The additional complication is that Synology uses Btrfs snapshots for its Snapshot Replication app; snapshot metadata corruption in the FS tree can block access to the parent subvolume.
Synology DSM 7.2 out-of-space lockout: When a Synology Btrfs volume reaches capacity, automatic snapshot deletion fails because Btrfs CoW requires free space to delete blocks (deleting a snapshot modifies the extent tree, which requires allocating new metadata blocks). The volume enters a read-only lockout state that btrfsck cannot resolve because the tool itself needs to allocate blocks. Recovery requires imaging the volume to larger target storage, mounting the image with space available, and manually purging snapshots before the filesystem becomes writable again.
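One way out of the lockout, always on a copy of the image: grow the image file, expand the filesystem into the new space, and only then delete snapshots. The paths, sizes, and snapshot placeholder below are illustrative.

```shell
truncate -s +32G work.img                 # grow the image file (sparse)
losetup -f --show work.img                # attach it, e.g. /dev/loop0
mount /dev/loop0 /mnt/work                # read-write mount of the copy
btrfs filesystem resize max /mnt/work     # claim the added space
btrfs subvolume list -s /mnt/work         # -s lists snapshots only
btrfs subvolume delete /mnt/work/[snapshot-path]
```

With free space available, the extent tree updates that snapshot deletion requires can finally be allocated, and the filesystem becomes writable again.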
Diagnostic sequence for Synology: Check bottom-up. (1) Pull drives from the NAS and connect to a Linux workstation via SATA write-blocker. (2) Run mdadm --examine /dev/sdX on each drive to verify mdadm superblocks and array UUIDs. (3) If mdadm is intact, assemble the array read-only and run pvs to check LVM. (4) If LVM is intact, attempt a read-only mount of the Btrfs filesystem. (5) If mount fails, run btrfs-find-root on the LV device to locate valid tree roots.
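The bottom-up sequence above, as a command sketch. Device names, partition numbers, and the volume group/logical volume names are placeholders (Synology commonly uses names like vg1/volume_1, but layouts vary); every step runs read-only.

```shell
# (2) Verify mdadm superblocks and array membership on each drive.
mdadm --examine /dev/sdb /dev/sdc /dev/sdd

# (3) Assemble the data array read-only, then check that LVM
#     can see its physical volume on top of it.
mdadm --assemble --readonly /dev/md2 /dev/sdb5 /dev/sdc5 /dev/sdd5
pvs && lvs

# (4) Attempt a read-only Btrfs mount of the logical volume.
mount -o ro /dev/vg1/volume_1 /mnt/check

# (5) If the mount fails, scan the LV for surviving tree roots.
btrfs-find-root /dev/vg1/volume_1
```

Whichever step first fails identifies the layer that needs reconstruction; everything below it is already known good.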
Native Btrfs RAID Profile Failures
Btrfs can manage its own multi-device RAID profiles without mdadm or hardware controllers. RAID 1 and RAID 10 are stable for production use. RAID 5 and RAID 6 carry a known parity desynchronization risk.
RAID 1 / RAID 10
Mirror-based profiles. Each metadata and data block is written to two or more devices. If one device returns a checksum mismatch, Btrfs reads the correct copy from the mirror and automatically repairs the bad copy. Stable in production since kernel 3.x. Recovery from a failed mirror member is straightforward: image the surviving member and mount.
RAID 5 / RAID 6
Parity-based profiles with a known "write hole" bug. A power loss can desynchronize parity stripes from data stripes because Btrfs lacks a dedicated parity journal. In a degraded state (one drive failed), reads that depend on corrupted parity return garbage data. Btrfs flags this as a checksum failure and drops the array offline. Do not rebuild a degraded Btrfs RAID 5/6 without imaging all members first.
Btrfs RAID 5/6 write hole: Unlike ZFS (which avoids the hole in raidz via CoW full-stripe writes) or mdadm (which can close it with a dedicated write-journal device), Btrfs RAID 5/6 has no mechanism to atomically commit both data and parity. A crash during a partial stripe write leaves an irreconcilable state: the data block is from one generation and the parity block is from another. When a drive subsequently fails and the array attempts to reconstruct data from parity, the result is corrupt. The Btrfs kernel mailing list has tracked this issue since 2015. As of kernel 6.x, it remains unresolved.
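The write hole is easy to reproduce arithmetically. Below is a toy model with one-byte "blocks" and XOR parity; all values are arbitrary illustrative numbers, not real Btrfs on-disk data.

```shell
# Toy write-hole model: a three-member RAID 5 stripe with one-byte
# blocks and XOR parity.
d1=170; d2=51                # two data blocks in one stripe
p=$((d1 ^ d2))               # parity committed alongside them
d1_new=77                    # crash: the new d1 reaches disk,
                             # but the matching parity update does not
# Later the drive holding d2 fails; rebuild combines new data
# with stale parity:
d2_rebuilt=$((d1_new ^ p))
echo "original d2=$d2, rebuilt d2=$d2_rebuilt"
# prints: original d2=51, rebuilt d2=212
```

The reconstructed block is silently wrong; Btrfs detects the damage only because the block-level checksum no longer matches, at which point the data is already unrecoverable from parity.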
How We Recover Btrfs Volumes
Professional Btrfs recovery separates the physical imaging step from the logical B-tree traversal. We do not run btrfs tools on original media under any circumstances.
- 1. Image all devices. Each drive is connected via write-blocked interface and cloned sector-by-sector. On HDDs with bad sectors in the metadata region, PC-3000 selective head imaging and adaptive read parameter adjustment extracts the B-tree node sectors from degraded platters. On NVMe SSDs with FTL corruption, we stabilize the controller firmware via PC-3000 SSD before extracting the LBA range. DeepSpar Disk Imager handles drives with intermittent read failures that require sector-level retry strategies.
- 2. Verify superblock integrity. On the imaged clone, check all three superblock copies for valid checksums and consistent generation numbers. If the primary is corrupted but a backup is valid, use `btrfs rescue super-recover` on the image. If all three are corrupted (rare, except when mkfs.btrfs was run on the device), proceed to root tree scanning.
- 3. Locate valid tree roots. Run `btrfs-find-root` on the imaged clone. The tool scans for B-tree node headers across the entire device and reports all discovered root tree nodes with their generation numbers. We select the highest generation that produces a consistent tree traversal.
- 4. Extract data via btrfs restore. Using the selected tree root, run `btrfs restore` to copy all reachable files from the imaged clone to separate target storage. `btrfs restore` walks the B-tree directly from raw disk, bypassing the normal mount path. No writes touch the source image.
- 5. Handle multi-layer NAS stacks. For Synology volume crashes and similar NAS configurations, we first reconstruct the mdadm array from member drive images, then reconstruct the LVM volume group, and only then address the Btrfs layer. Each layer is verified independently before proceeding to the next.
Multi-layer diagnostic approach: Synology SHR-2 stacks three layers: mdadm RAID 6, LVM, and Btrfs. When DSM reports a crashed pool after a firmware update failure, the actual corruption may be in any layer. Recovery starts by imaging all drives via PC-3000 (checking SMART for hardware issues on each member). mdadm superblocks and LVM metadata are examined independently. If both are intact, the failure is in the Btrfs layer: typically a generation mismatch in the extent tree from a partial transaction commit. btrfs-find-root locates historical root tree nodes, and btrfs restore extracts data from the last consistent generation before the crash.
When Btrfs Corruption Masks a Hardware Failure
Btrfs reports the same error messages regardless of whether the B-tree corruption has a logical cause (power loss, software bug) or a physical cause (failing drive hardware). Running logical repair tools on a hardware failure converts a recoverable problem into permanent data loss.
HDD: Weak Read Heads Causing Metadata I/O Errors
A hard drive with degrading read heads may still pass basic SMART checks while failing to read specific sectors in the metadata region. Btrfs reports checksum failures and refuses to mount. The sysadmin runs btrfsck --repair, which forces intensive random I/O across the drive as it traverses and rewrites the B-tree. This accelerates head degradation and can cause a complete head failure during the repair. The data that was recoverable via careful imaging is now permanently inaccessible. PC-3000 identifies weak heads through vendor-specific diagnostic commands and uses adaptive read parameters to extract the metadata sectors before the heads fail completely.
NVMe SSD: FTL Corruption Mimicking Logical Failure
An NVMe SSD with a corrupted Flash Translation Layer returns UNC errors for specific LBA ranges while reporting healthy SMART status. Btrfs detects checksum mismatches on the affected blocks and drops to read-only. The error messages look identical to logical corruption. Running filesystem repair tools writes new metadata to the drive, but the controller may route those writes to incorrect NAND pages due to the damaged FTL mapping. The firmware must be stabilized and the FTL rebuilt via PC-3000 SSD before any Btrfs tools can operate correctly on the recovered data.
Btrfs Recovery Pricing
Btrfs recovery pricing depends on whether the underlying storage is physically healthy and how many storage layers need reconstruction.
| Scenario | Price Range | What's Involved |
|---|---|---|
| Logical Btrfs corruption (healthy hardware) | $250+ | Superblock recovery, btrfs-find-root tree scanning, btrfs restore extraction to target media. Single drive, no physical damage. |
| Multi-drive NAS with mdadm + LVM + Btrfs | $600 - $900 | Image all array members, reconstruct mdadm layer, reconstruct LVM layer, then recover Btrfs data. Synology SHR, QNAP, and OpenMediaVault configurations. |
| NVMe FTL corruption + Btrfs recovery | $900 - $1,200 | PC-3000 SSD firmware stabilization, FTL mapping rebuild, followed by logical Btrfs B-tree traversal and data extraction. |
| HDD hardware failure + Btrfs reconstruction | $1,200 - $1,500 | PC-3000 hardware imaging (head swap, firmware repair, platter transplant), followed by Btrfs B-tree recovery on the extracted image. |
All prices subject to evaluation. No diagnostic fee. No data, no recovery fee. Multi-drive configurations priced per the total number of drives requiring imaging and the complexity of the stacked storage layers.
Frequently Asked Questions
Is btrfsck --repair safe to run on a failing drive?
Is Synology Btrfs the same as native Linux Btrfs?
Is Btrfs RAID 5 stable enough for production use?
Where are Btrfs superblocks stored on disk?
What is the difference between btrfs restore and mounting the filesystem?
Can deleted Btrfs files be recovered from an SSD?
Related Recovery Services
Synology, QNAP, TrueNAS, and other NAS platforms
Full RAID recovery for all levels and controllers
Enterprise server and hypervisor recovery
FAULTED pools, TXG rollbacks, raidz failures
VGDA ring buffer recovery and thin-pool repair
DSM volume crash and SHR failures
Linux software RAID superblock recovery
Transparent cost breakdown for all services
Btrfs volume corrupted or unmountable?
Free evaluation. Write-blocked drive imaging. B-tree root scanning and read-only data extraction via btrfs restore. No data, no fee.