
Btrfs B-Tree Architecture and Failure Points
Btrfs stores all metadata and data references in a hierarchy of B-trees rooted at the superblock. Corruption in any tree can cascade through the dependency chain and render the entire volume unmountable.
- Superblock
- The entry point for the entire filesystem. Three copies at fixed byte offsets: 64 KiB, 64 MiB, and 256 GiB. Contains the generation number (transaction count), the root tree pointer, and the chunk tree pointer. If the primary superblock at 64 KiB is corrupted, the filesystem cannot mount. Backup copies at the other offsets can be promoted via `btrfs rescue super-recover`.
- Root Tree (Tree of Tree Roots)
- Pointed to by the superblock. Holds the root nodes for all other B-trees (extent tree, chunk tree, FS tree, checksum tree). If the root tree is corrupted, `btrfs-find-root` scans the raw device for older root tree nodes from previous CoW transactions.
- Chunk Tree
- Maps logical Btrfs addresses to physical device byte offsets. On multi-device setups, the chunk tree also tracks which physical device holds each data or metadata chunk. Chunk tree corruption produces `read_block_for_search: [logical addr] mirror 1 failed` errors in dmesg. Without a valid chunk tree, the kernel cannot locate any data on the underlying block device.
- Extent Tree
- Tracks allocated space and reference counts for all data and metadata blocks. Extent tree corruption is the most common failure mode during power loss while a balance or snapshot operation is in progress. The filesystem drops to read-only because Btrfs cannot determine which blocks are free and which are in use.
- FS Tree
- Contains the actual file and directory hierarchy: inodes, directory entries, and extent data references. Each Btrfs subvolume has its own FS tree. Snapshot creation clones the FS tree by reference, sharing blocks with the parent subvolume via reference counting in the extent tree.
Key detail: Because Btrfs is Copy-on-Write, every metadata update allocates new blocks rather than overwriting existing ones. The old blocks remain on disk until the space is reclaimed by a balance operation or by free-space pressure. This means older, valid versions of every B-tree node typically survive on disk after corruption occurs. The recovery window closes only when new writes overwrite those historical blocks.
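Because the superblock copies sit at fixed offsets, a raw image can be sanity-checked before any repair tool touches it. The sketch below is illustrative, not an official btrfs-progs utility: the `check_btrfs_supers` helper name is ours, and it only verifies the 8-byte magic string, not the checksum or generation number (use `btrfs inspect-internal dump-super` for that).

```shell
# check_btrfs_supers IMG: report the state of the Btrfs magic at each
# of the three fixed superblock offsets (64 KiB, 64 MiB, 256 GiB).
# The 8-byte magic "_BHRfS_M" sits 64 bytes into each superblock.
check_btrfs_supers() {
    img="$1"
    for off in 65536 67108864 274877906944; do
        # Skip copies that fall beyond the end of the image.
        [ "$off" -lt "$(stat -c %s "$img")" ] || continue
        magic=$(dd if="$img" bs=1 skip=$((off + 64)) count=8 2>/dev/null)
        if [ "$magic" = "_BHRfS_M" ]; then
            echo "superblock at $off: magic OK"
        else
            echo "superblock at $off: magic missing"
        fi
    done
}
```

A copy whose magic is intact may still be stale; compare generation numbers before promoting it.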
How Btrfs Metadata Gets Corrupted
Btrfs metadata corruption has three primary causes: hardware failure in the storage layer, power loss during CoW transaction commits, and kernel bugs in balance or snapshot operations. In practice, these show up in five recurring scenarios:
- 1. Power loss during a transaction commit. Btrfs groups writes into transactions and commits them atomically. A power failure mid-commit can leave the extent tree or chunk tree in a state where some blocks reference the new transaction and others reference the old one. The superblock may point to a partially written root tree node. On the next mount attempt, the kernel detects a `transid verify failed` mismatch and refuses to proceed.
- 2. Bad sectors or read failures in the metadata region. On HDDs, media degradation in sectors containing B-tree nodes produces I/O errors when the Btrfs kernel module attempts to traverse the tree. The drive SMART log shows reallocated or pending sectors. The filesystem reports checksum verification failures and drops to read-only. The user data extents may be fully intact on other regions of the platter while a small number of metadata blocks are physically unreadable.
- 3. NVMe FTL corruption on power loss. NVMe controllers manage NAND access through a Flash Translation Layer that maps logical block addresses to physical NAND pages. A power loss during FTL journal commit can corrupt the mapping for sectors that contain Btrfs metadata. The drive reports the correct capacity and appears healthy in SMART, but returns UNC errors or stale data for specific LBA ranges. The Btrfs chunk tree loses its physical address translation, severing the link between logical filesystem structures and the NAND blocks that hold them. No filesystem-level tool can fix this; the FTL must be stabilized via PC-3000 SSD before any logical recovery.
- 4. Interrupted balance or conversion operations. `btrfs balance` relocates data and metadata chunks across devices or RAID profiles. A crash during balance leaves the chunk tree referencing both old and new physical locations for the same logical address. The extent tree reference counts become inconsistent because some blocks were moved but the reference count update did not complete. This is one of the most common triggers for extent tree corruption on multi-device setups.
- 5. SMR hard drives with Btrfs CoW. Shingled Magnetic Recording (SMR) drives use a write cache (CMR zone) that is periodically flushed to shingled zones. Btrfs CoW generates high random-write workloads that can overflow the CMR cache. When the cache fills, the drive enters a reorganization phase with long latencies. If a transaction commit times out during this reorganization, the kernel may mark the device as failed. The metadata written during the timeout window may be partially committed to the CMR cache but not yet flushed to the shingle zones.
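Each of these failure modes leaves a characteristic signature in the kernel log. A minimal triage filter can be piped from dmesg; the `btrfs_log_triage` name is ours, and the patterns cover the common error strings quoted in this article rather than every possible Btrfs message:

```shell
# btrfs_log_triage: filter kernel log text for the Btrfs error
# signatures discussed above. Reads stdin; typical use:
#   dmesg | btrfs_log_triage
btrfs_log_triage() {
    grep -E 'parent transid verify failed|read_block_for_search|csum failed|BTRFS error'
}
```

A `transid` mismatch points at an interrupted commit; `read_block_for_search` failures point at the chunk tree; repeated `csum failed` lines on the same device region suggest physical media problems rather than purely logical corruption.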
Commands That Destroy Btrfs Volumes During Recovery
The standard sysadmin response to a Btrfs error is to attempt in-place repair. Several commonly recommended commands convert a recoverable metadata inconsistency into permanent data loss.
| Command | What It Does | Why It Destroys Data |
|---|---|---|
| `btrfsck --repair` | Writes corrected metadata in place | Overwrites older CoW metadata blocks that `btrfs-find-root` needs to locate previous valid tree roots. On physically failing drives, writes may land on unstable media. |
| `btrfs rescue zero-log` | Clears the log tree to allow mounting | Any writes acknowledged by the log tree but not yet committed via a full transaction are lost permanently. The filesystem may mount, but recent files are gone. |
| `mount -o recovery,ro` | Mounts read-only with recovery heuristics | Despite the read-only flag, mounting replays the log tree and writes metadata updates to disk unless `nologreplay` is also passed. This advances the on-disk generation past the last consistent state. On physically failing drives, the random read I/O generated by tree traversal accelerates head degradation. |
| `btrfs balance start` | Relocates chunks across devices | Moves data and metadata blocks to new physical locations and updates the chunk tree. If the extent tree is already inconsistent, balance will corrupt the chunk tree as well, destroying the logical-to-physical address mapping. |
| `mkfs.btrfs /dev/sdX` | Creates a new Btrfs filesystem on the device | Writes new superblocks, root tree, chunk tree, and extent tree. All three superblock copies are overwritten. Data extents beyond the metadata region survive but are unreachable without the original tree structure. |
Before running any Btrfs repair tool: Image the entire block device to a separate storage target using write-blocked connections. All recovery attempts must operate on images, not original media. If the underlying drive has physical faults (check SMART), the imaging step captures recoverable sectors before the drive condition worsens.
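On drives without severe physical damage, the imaging step can be sketched with GNU ddrescue. Device names, image paths, and retry counts below are illustrative; on drives with head or firmware faults, a hardware imager (PC-3000, DeepSpar) replaces this entirely.

```shell
# First pass: copy everything readable, skipping bad areas quickly.
# -d uses direct I/O so the kernel cache cannot mask read errors;
# -n skips the slow scraping phase; the mapfile records progress.
ddrescue -d -n /dev/sdX /mnt/target/btrfs.img /mnt/target/btrfs.map

# Second pass: go back and retry the skipped areas up to 3 times.
ddrescue -d -r3 /dev/sdX /mnt/target/btrfs.img /mnt/target/btrfs.map

# All later recovery work targets a copy of the image, never /dev/sdX.
cp /mnt/target/btrfs.img /mnt/target/work.img
```

The mapfile makes the process resumable: if the drive drops off the bus mid-image, a later run continues from the unread regions instead of starting over.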
Btrfs Superblock and Root Tree Recovery
When the primary superblock or root tree pointer is corrupted, the standard mount path fails. Btrfs provides two utilities for locating alternative entry points into the filesystem, but both must be used on cloned images to avoid writing to failing media.
- 1. `btrfs rescue super-recover` compares the generation numbers and checksums of all three superblock copies. If a backup superblock (at 64 MiB or 256 GiB) has a valid checksum and a generation number equal to or higher than the primary, the tool overwrites the corrupted primary with it. This fixes the entry point without touching any other metadata.
- 2. `btrfs-find-root` scans the raw device for B-tree node headers that match the root tree signature. Because Btrfs CoW allocates new blocks for every metadata update, older root tree nodes from previous transactions remain on disk until their space is reclaimed. `btrfs-find-root` lists every discovered root node with its byte offset (bytenr) and generation number. Select the highest generation that predates the corruption event.
- 3. `btrfs restore -t [bytenr]` takes the byte offset of a discovered tree root and walks the B-tree from that point, copying all reachable files to a separate target device. The tool is purely read-only: it does not modify the source device or image. You can run it multiple times with different bytenrs to compare results from different transaction generations.
Root tree rollback recovery: If a Btrfs volume fails to mount with `parent transid verify failed` after a power loss, the filesystem's copy-on-write design preserves historical root tree nodes. `btrfs-find-root` scans the device for previous root tree generations. Selecting a generation from before the crash provides a consistent tree structure, and `btrfs restore -t [bytenr] /dev/image /mnt/recovery/` extracts data from that snapshot (the `-t` argument is the tree root's byte offset as reported by `btrfs-find-root`, not the generation number itself). Only writes committed between the selected generation and the crash are unrecoverable.
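The rollback reduces to a short command sequence. This is a sketch against an image file: the loop device name, mount point, and the `[bytenr]` placeholder are illustrative, and everything touching the image is read-only.

```shell
# Attach the image read-only so no tool can write to it.
losetup -f --show -r work.img        # prints the loop device, e.g. /dev/loop0

# List surviving root tree nodes; each line pairs a byte offset
# (bytenr) with the transaction generation that wrote it.
btrfs-find-root /dev/loop0

# Pick the highest generation that predates the crash, then extract
# everything reachable from that tree root to separate target storage.
# Note: -t takes the tree root's bytenr, not the generation.
btrfs restore -t [bytenr] /dev/loop0 /mnt/recovery/
```

Running `btrfs restore` against two or three candidate roots and diffing the results is a cheap way to confirm which generation is actually consistent.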
Synology DSM 7+ Btrfs Volume Recovery
Synology DSM does not use native Btrfs multi-device RAID profiles. DSM layers three storage technologies: mdadm for RAID parity, LVM for volume management, and a single-device Btrfs filesystem inside the LVM logical volume. A "volume crashed" error in the DSM UI can originate from any of these three layers.
- 1. mdadm layer failure. If an mdadm array member fails or loses its superblock, the RAID device (/dev/md*) cannot assemble. LVM cannot find its physical volume because the block device is gone. Btrfs never enters the picture. Diagnosing this layer requires `mdadm --examine` on each raw drive to check superblock presence and array UUID consistency. See our mdadm superblock recovery guide for that specific failure.
- 2. LVM metadata corruption. If the mdadm array assembles correctly but `vgscan` returns nothing, the LVM VGDA on the assembled RAID device is damaged. The Btrfs filesystem data is intact but unreachable until the LVM extent map is reconstructed. See our LVM metadata corruption recovery guide for the VGDA reconstruction procedure.
- 3. Btrfs metadata corruption (actual). If mdadm and LVM are both intact but the Btrfs filesystem reports tree corruption on mount, the failure is in the Btrfs B-tree layer. On Synology this is a single-device filesystem, so the standard `btrfs-find-root` and `btrfs restore` workflow applies. The additional complication is that Synology uses Btrfs snapshots for its Snapshot Replication app; snapshot metadata corruption in the FS tree can block access to the parent subvolume.
Synology DSM 7.2 out-of-space lockout: When a Synology Btrfs volume reaches capacity, automatic snapshot deletion fails because Btrfs CoW requires free space to delete blocks (deleting a snapshot modifies the extent tree, which requires allocating new metadata blocks). The volume enters a read-only lockout state that btrfsck cannot resolve because the tool itself needs to allocate blocks. Recovery requires imaging the volume to larger target storage, mounting the image with space available, and manually purging snapshots before the filesystem becomes writable again.
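One way out of the lockout, always on a copy of the image: grow the image file, expand the filesystem into the new space, and only then delete snapshots. The paths, sizes, and snapshot placeholder below are illustrative.

```shell
truncate -s +32G work.img                 # grow the image file (sparse)
losetup -f --show work.img                # attach it, e.g. /dev/loop0
mount /dev/loop0 /mnt/work                # read-write mount of the copy
btrfs filesystem resize max /mnt/work     # claim the added space
btrfs subvolume list -s /mnt/work         # -s lists snapshots only
btrfs subvolume delete /mnt/work/[snapshot-path]
```

With free space available, the extent tree updates that snapshot deletion requires can finally be allocated, and the filesystem becomes writable again.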
Diagnostic sequence for Synology: Check bottom-up. (1) Pull drives from the NAS and connect to a Linux workstation via SATA write-blocker. (2) Run mdadm --examine /dev/sdX on each drive to verify mdadm superblocks and array UUIDs. (3) If mdadm is intact, assemble the array read-only and run pvs to check LVM. (4) If LVM is intact, attempt a read-only mount of the Btrfs filesystem. (5) If mount fails, run btrfs-find-root on the LV device to locate valid tree roots.
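The bottom-up sequence above, as a command sketch. Device names, partition numbers, and the volume group/logical volume names are placeholders (Synology commonly uses names like vg1/volume_1, but layouts vary); every step runs read-only.

```shell
# (2) Verify mdadm superblocks and array membership on each drive.
mdadm --examine /dev/sdb /dev/sdc /dev/sdd

# (3) Assemble the data array read-only, then check that LVM
#     can see its physical volume on top of it.
mdadm --assemble --readonly /dev/md2 /dev/sdb5 /dev/sdc5 /dev/sdd5
pvs && lvs

# (4) Attempt a read-only Btrfs mount of the logical volume.
mount -o ro /dev/vg1/volume_1 /mnt/check

# (5) If the mount fails, scan the LV for surviving tree roots.
btrfs-find-root /dev/vg1/volume_1
```

Whichever step first fails identifies the layer that needs reconstruction; everything below it is already known good.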
Native Btrfs RAID Profile Failures
Btrfs can manage its own multi-device RAID profiles without mdadm or hardware controllers. RAID 1 and RAID 10 are stable for production use. RAID 5 and RAID 6 carry a known parity desynchronization risk.
RAID 1 / RAID 10
Mirror-based profiles. Each metadata and data block is written to two or more devices. If one device returns a checksum mismatch, Btrfs reads the correct copy from the mirror and automatically repairs the bad copy. Stable in production since kernel 3.x. Recovery from a failed mirror member is straightforward: image the surviving member and mount.
RAID 5 / RAID 6
Parity-based profiles with a known "write hole" bug. A power loss can desynchronize parity stripes from data stripes because Btrfs lacks a dedicated parity journal. In a degraded state (one drive failed), reads that depend on corrupted parity return garbage data. Btrfs flags this as a checksum failure and drops the array offline. Do not rebuild a degraded Btrfs RAID 5/6 without imaging all members first.
Btrfs RAID 5/6 write hole: Unlike ZFS (which avoids the hole in raidz via CoW full-stripe writes) or mdadm (which can close it with a dedicated write-journal device), Btrfs RAID 5/6 has no mechanism to atomically commit both data and parity. A crash during a partial stripe write leaves an irreconcilable state: the data block is from one generation and the parity block is from another. When a drive subsequently fails and the array attempts to reconstruct data from parity, the result is corrupt. The Btrfs kernel mailing list has tracked this issue since 2015. As of kernel 6.x, it remains unresolved.
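The write hole is easy to reproduce arithmetically. Below is a toy model with one-byte "blocks" and XOR parity; all values are arbitrary illustrative numbers, not real Btrfs on-disk data.

```shell
# Toy write-hole model: a three-member RAID 5 stripe with one-byte
# blocks and XOR parity.
d1=170; d2=51                # two data blocks in one stripe
p=$((d1 ^ d2))               # parity committed alongside them
d1_new=77                    # crash: the new d1 reaches disk,
                             # but the matching parity update does not
# Later the drive holding d2 fails; rebuild combines new data
# with stale parity:
d2_rebuilt=$((d1_new ^ p))
echo "original d2=$d2, rebuilt d2=$d2_rebuilt"
# prints: original d2=51, rebuilt d2=212
```

The reconstructed block is silently wrong; Btrfs detects the damage only because the block-level checksum no longer matches, at which point the data is already unrecoverable from parity.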
How We Recover Btrfs Volumes
Professional Btrfs recovery separates the physical imaging step from the logical B-tree traversal. We do not run btrfs tools on original media under any circumstances.
- 1. Image all devices. Each drive is connected via write-blocked interface and cloned sector-by-sector. On HDDs with bad sectors in the metadata region, PC-3000 selective head imaging and adaptive read parameter adjustment extracts the B-tree node sectors from degraded platters. On NVMe SSDs with FTL corruption, we stabilize the controller firmware via PC-3000 SSD before extracting the LBA range. DeepSpar Disk Imager handles drives with intermittent read failures that require sector-level retry strategies.
- 2. Verify superblock integrity. On the imaged clone, check all three superblock copies for valid checksums and consistent generation numbers. If the primary is corrupted but a backup is valid, use `btrfs rescue super-recover` on the image. If all three are corrupted (rare, except when mkfs.btrfs was run on the device), proceed to root tree scanning.
- 3. Locate valid tree roots. Run `btrfs-find-root` on the imaged clone. The tool scans for B-tree node headers across the entire device and reports all discovered root tree nodes with their generation numbers. We select the highest generation that produces a consistent tree traversal.
- 4. Extract data via btrfs restore. Using the selected tree root, run `btrfs restore` to copy all reachable files from the imaged clone to separate target storage. `btrfs restore` walks the B-tree directly from raw disk, bypassing the normal mount path. No writes touch the source image.
- 5. Handle multi-layer NAS stacks. For Synology volume crashes and similar NAS configurations, we first reconstruct the mdadm array from member drive images, then reconstruct the LVM volume group, and only then address the Btrfs layer. Each layer is verified independently before proceeding to the next.
Multi-layer diagnostic approach: Synology SHR-2 stacks three layers: mdadm RAID 6, LVM, and Btrfs. When DSM reports a crashed pool after a firmware update failure, the actual corruption may be in any layer. Recovery starts by imaging all drives via PC-3000 (checking SMART for hardware issues on each member). mdadm superblocks and LVM metadata are examined independently. If both are intact, the failure is in the Btrfs layer: typically a generation mismatch in the extent tree from a partial transaction commit. btrfs-find-root locates historical root tree nodes, and btrfs restore extracts data from the last consistent generation before the crash.
When Btrfs Corruption Masks a Hardware Failure
Btrfs reports the same error messages regardless of whether the B-tree corruption has a logical cause (power loss, software bug) or a physical cause (failing drive hardware). Running logical repair tools on a hardware failure converts a recoverable problem into permanent data loss.
HDD: Weak Read Heads Causing Metadata I/O Errors
A hard drive with degrading read heads may still pass basic SMART checks while failing to read specific sectors in the metadata region. Btrfs reports checksum failures and refuses to mount. The sysadmin runs btrfsck --repair, which forces intensive random I/O across the drive as it traverses and rewrites the B-tree. This accelerates head degradation and can cause a complete head failure during the repair. The data that was recoverable via careful imaging is now permanently inaccessible. PC-3000 identifies weak heads through vendor-specific diagnostic commands and uses adaptive read parameters to extract the metadata sectors before the heads fail completely.
NVMe SSD: FTL Corruption Mimicking Logical Failure
An NVMe SSD with a corrupted Flash Translation Layer returns UNC errors for specific LBA ranges while reporting healthy SMART status. Btrfs detects checksum mismatches on the affected blocks and drops to read-only. The error messages look identical to logical corruption. Running filesystem repair tools writes new metadata to the drive, but the controller may route those writes to incorrect NAND pages due to the damaged FTL mapping. The firmware must be stabilized and the FTL rebuilt via PC-3000 SSD before any Btrfs tools can operate correctly on the recovered data.
Btrfs Recovery Pricing
Btrfs recovery pricing depends on whether the underlying storage is physically healthy and how many storage layers need reconstruction.
| Scenario | Price Range | What's Involved |
|---|---|---|
| Logical Btrfs corruption (healthy hardware) | $250+ | Superblock recovery, btrfs-find-root tree scanning, btrfs restore extraction to target media. Single drive, no physical damage. |
| Multi-drive NAS with mdadm + LVM + Btrfs | $600 - $900 | Image all array members, reconstruct mdadm layer, reconstruct LVM layer, then recover Btrfs data. Synology SHR, QNAP, and OpenMediaVault configurations. |
| NVMe FTL corruption + Btrfs recovery | $900 - $1,200 | PC-3000 SSD firmware stabilization, FTL mapping rebuild, followed by logical Btrfs B-tree traversal and data extraction. |
| HDD hardware failure + Btrfs reconstruction | $1,200 - $1,500 | PC-3000 hardware imaging (head swap, firmware repair, platter transplant), followed by Btrfs B-tree recovery on the extracted image. |
All prices subject to evaluation. No diagnostic fee. No data, no recovery fee. Multi-drive configurations priced per the total number of drives requiring imaging and the complexity of the stacked storage layers.
Frequently Asked Questions
Is btrfsck --repair safe to run on a failing drive?
Is Synology Btrfs the same as native Linux Btrfs?
Is Btrfs RAID 5 stable enough for production use?
Where are Btrfs superblocks stored on disk?
What is the difference between btrfs restore and mounting the filesystem?
Can deleted Btrfs files be recovered from an SSD?
Related Recovery Services
Synology, QNAP, TrueNAS, and other NAS platforms
Full RAID recovery for all levels and controllers
Enterprise server and hypervisor recovery
FAULTED pools, TXG rollbacks, raidz failures
VGDA ring buffer recovery and thin-pool repair
DSM volume crash and SHR failures
Linux software RAID superblock recovery
Transparent cost breakdown for all services
Btrfs volume corrupted or unmountable?
Free evaluation. Write-blocked drive imaging. B-tree root scanning and read-only data extraction via btrfs restore. No data, no fee.