Btrfs RAID Recovery & Filesystem Repair

What Btrfs is and why it fails differently
Btrfs stores data in a B-tree structure where every block pointer carries a checksum. The filesystem mirrors its superblock to up to four fixed locations: 0x10000 (64 KiB), 0x4000000 (64 MiB), 0x4000000000 (256 GiB), and 0x4000000000000 (1 PiB). Only those locations that fall within the device size are written, so a smaller device carries fewer than four. When a NAS reports a crashed Btrfs volume, one or more of these superblocks has a mismatched generation number, or a tree block fails its checksum.
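To see which of those fixed offsets apply to a particular member, compare each one against the device size. The sketch below is only an illustration: the function name is made up, the size is passed in bytes, and it assumes each superblock copy occupies 4096 bytes.

```shell
#!/bin/sh
# Print the fixed superblock offsets (in bytes) that fit inside a device of
# the given size. The four offsets are the ones listed in the text above:
# 64 KiB, 64 MiB, 256 GiB, and 1 PiB.
superblock_offsets() {
    dev_size=$1
    sb_size=4096   # assumption: one superblock copy occupies 4 KiB
    for off in 65536 67108864 274877906944 1125899906842624; do
        # A copy only counts if it lies entirely inside the device.
        if [ $((off + sb_size)) -le "$dev_size" ]; then
            echo "$off"
        fi
    done
}

# Example: a 4 TB member (4 * 10^12 bytes) has room for the first three.
superblock_offsets 4000000000000
```

A 1 GB device passed to the same function would list only the first two offsets, which is why small devices have fewer fallback superblocks to inspect.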
Because Btrfs is Copy-on-Write, it never overwrites a block in place. A new version is written elsewhere and the parent pointer is updated. This is why a sudden power loss during a write can leave a tree block pointing to a location that was never fully committed. The result is a parent transid verify failed error or csum failed in dmesg. The filesystem will not mount because the tree is internally inconsistent.
Why btrfs check --repair destroys data
The btrfs-progs tool btrfs check --repair attempts to rebuild broken trees by scanning the entire device and rewriting metadata. On a CoW filesystem, rewriting metadata means allocating new blocks and updating pointers. The old blocks are not erased, but the generation tree roots that pointed to them are overwritten. If the repair logic guesses wrong, you lose the historical roots that might have contained intact data.
Every recovery lab that recommends btrfs check --repair as a first step is gambling with your only remaining copies of the metadata tree. We do not run it on original drives. We image the drive first, then run read-only analysis on the image.
How Btrfs generation trees preserve older metadata
Btrfs generation trees preserve older metadata because Copy-on-Write writes new tree blocks and advances a root pointer instead of overwriting the old tree in place. A crash can damage the newest transaction generation while earlier roots still describe intact extents, snapshots, and directory trees.
The useful recovery target is not always the newest root. We compare generation numbers from btrfs inspect-internal dump-super against candidate roots reported by btrfs-find-root, then test extraction from images only. If generation 812304 points into a torn write but 812289 still walks the directory tree, the older root is the safer source.
btrfs inspect-internal dump-super /mnt/images/member-array.img
btrfs-find-root /mnt/images/member-array.img
btrfs restore -t <root_bytenr> /mnt/images/member-array.img /recovery/output/
How btrfs-find-root surfaces candidate roots
btrfs-find-root does not choose the recovery root by itself. It scans the imaged filesystem for root tree blocks and reports candidate bytenr, level, and generation values. The technician has to match those candidates against the superblock generation, chunk tree, and snapshot tree.
On Synology SHR and ReadyNAS OS6, that Btrfs layer usually sits above mdadm. We assemble the member images read-only first, then run Btrfs tools against the virtual array image. Running mdadm --create or btrfs check --repair before this step rewrites metadata that the root search depends on.
- Bytenr
- The byte address of a candidate tree root inside the Btrfs address space. btrfs restore -t uses this value to walk that root without modifying the source image.
- Generation
- The transaction number attached to a tree block. A lower generation can be more useful when the newest generation was interrupted by power loss or a dropped NAS member drive.
- Level
- The tree depth of the candidate block. A valid root must have a level that matches the expected B-tree structure for the root tree being restored.
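One way to work through candidates is to note them down and try the newest generation first, falling back to older ones. The sketch below assumes a hand-made notes format of "bytenr generation level" per line, which is not the literal output of btrfs-find-root. The function name and the bytenr values are hypothetical; the two generations are the ones used in the example above. It only prints dry-run commands (btrfs restore -D lists what would be restored) and executes nothing.

```shell
#!/bin/sh
# Read "bytenr generation level" lines on stdin, sort them newest generation
# first, and print a dry-run restore command for each candidate.
# Nothing is executed; the output is a checklist for the technician.
restore_attempts() {
    img=$1
    sort -k2,2nr | while read -r bytenr gen level; do
        echo "btrfs restore -D -t $bytenr $img /recovery/output/   # gen $gen, level $level"
    done
}

# Hypothetical candidates: the bytenr values are invented for illustration.
printf '%s\n' \
    '30408704 812289 1' \
    '31457280 812304 1' |
restore_attempts /mnt/images/member-array.img
```

If the dry run against the newest candidate fails partway through the directory tree, the next line in the list is the next root to test.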
Why mounting a failing Btrfs drive is not safe
The first mistake users make is mounting a failing Btrfs drive read-write to look at the files. Any read-write mount triggers log-tree replay, which writes updated pointers to disk. If the log references corrupted blocks, replay can push bad pointers into the live filesystem structure and turn a recoverable volume into an unmountable one.
The older mount -o recovery option (renamed to usebackuproot in newer kernels) goes further. It tells Btrfs to fall back to backup tree roots and rewrite the active root pointer if the newest root is unreadable. That is an in-place modification of the Copy-on-Write history we depend on for recovery, even when paired with ro.
For data recovery, the safe sequence is to image the device first with ddrescue or the PC-3000 Portable III, then inspect the image with read-only tools such as btrfs inspect-internal dump-super and btrfs-find-root. If mounting an image is necessary, use ro,nologreplay on a copy, never the original drive.
Read-only forensic toolchain
Our workflow uses four read-only or image-based tools in sequence. We run these on copies of your drives, never the originals.
- Imaging: Every drive is imaged with ddrescue or the PC-3000 Portable III before any filesystem access. A 4TB drive takes 6 to 10 hours at full speed.
- Superblock inspection: btrfs inspect-internal dump-super -a /dev/image reads every superblock copy present on the device and reports generation, total bytes, and root tree addresses. Without -a, only the primary copy is printed.
- Root search: btrfs-find-root /dev/image scans for historical tree roots with valid generation numbers. A Btrfs filesystem may contain dozens of older roots from before the corruption event.
- Data extraction: btrfs restore -t <bytenr> /dev/image /output/ extracts files from a specific historical root. This is entirely read-only on the source image.
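The four steps can be strung together as a single checklist. The sketch below is a hedged outline and not the lab's actual script: the step and recover_plan helpers, the RUN switch, and all paths are hypothetical. By default every command is only printed, so the plan can be reviewed before anything touches a drive.

```shell
#!/bin/sh
# Print (or, with RUN=1, execute) one command. Printing is the default so the
# sequence can be reviewed first.
step() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Image first, then inspect and extract from the image only.
recover_plan() {
    src=$1 img=$2 map=$3 bytenr=$4 out=$5
    step ddrescue "$src" "$img" "$map"                 # image the member, with a map file
    step btrfs inspect-internal dump-super -a "$img"   # all superblock copies
    step btrfs-find-root "$img"                        # candidate roots
    step btrfs restore -D -t "$bytenr" "$img" "$out"   # -D: dry run, list only
    step btrfs restore -t "$bytenr" "$img" "$out"      # real extraction to separate storage
}

# Placeholder arguments; <bytenr> comes from the btrfs-find-root step.
recover_plan /dev/sdX /mnt/images/member.img /mnt/images/member.map '<bytenr>' /recovery/output/
```

Only the ddrescue step reads the original drive; every later step takes the image as its source.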
Btrfs RAID profiles and the write hole
Btrfs supports RAID 0, 1, 10, 5, and 6 natively. RAID 1 and 10 are stable and widely used. RAID 5 and RAID 6 inside Btrfs are not production-ready. The Btrfs RAID 5 implementation has a known write hole: if the system crashes during a parity update, the stripe is left partially updated. A subsequent scrub or read may use the wrong parity block to reconstruct data, silently corrupting it.
Synology, QNAP, and TrueNAS do not use Btrfs native RAID 5 for their primary storage pools. Synology SHR uses mdadm RAID 1/5/6 underneath Btrfs. QNAP QTS uses mdadm with LVM and ext4, and TrueNAS uses ZFS for both the RAID layer and the filesystem. If you encounter a Btrfs RAID 5 array, it is likely a custom Linux installation, and the write hole is a real risk.
Consumer drives carry a worst-case specification of one unrecoverable read error per approximately 10^14 bits read, about 12.5TB. That figure is a warranty floor, not a schedule, so a 48TB rebuild does not guarantee a read error; it makes the probability of encountering at least one unrecoverable sector high. If Btrfs RAID 5 hits a URE during a degraded scrub it cannot reconstruct that stripe, and the data in it is lost.
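The "high probability" above can be put into rough numbers. The sketch below assumes read errors are independent and occur at exactly the worst-case spec rate, which real drives usually beat, so treat the result as an upper-bound illustration. Under that assumption the chance of at least one URE is approximately 1 - exp(-bits_read / bits_per_error). The function name is hypothetical.

```shell
#!/bin/sh
# Approximate chance (in percent) of hitting at least one unrecoverable read
# error while reading a given amount of data.
# $1 = terabytes read (10^12 bytes), $2 = bits read per one URE (e.g. 1e14)
# Assumption: independent errors at exactly the spec rate (Poisson model).
ure_probability() {
    awk -v tb="$1" -v per="$2" 'BEGIN {
        expected = tb * 1e12 * 8 / per          # expected number of UREs
        printf "%.1f\n", (1 - exp(-expected)) * 100
    }'
}

ure_probability 12.5 1e14   # prints 63.2
ure_probability 48 1e14     # prints 97.9
```

Reading 12.5TB gives an expected count of exactly one error, yet the probability of seeing at least one is about 63 percent, which is the sense in which the spec is a floor and not a schedule.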
Because the Btrfs RAID 5 write hole makes degraded rebuilds risky, when the data is irreplaceable and unbacked we image every member read-only before any rebuild and reconstruct from the images.
Which Btrfs B-tree failed and what it means for recovery
Btrfs is not one flat structure. It stores its metadata in a stack of separate B-trees, and the tree that failed decides whether your data comes back read-only or whether the whole array needs raw destriping. Most labs treat a Btrfs crash as one undifferentiated event. It is not.
Damage to the file system tree is localized and usually recoverable by walking an older Copy-on-Write generation. Damage to the chunk tree or extent tree severs the logical-to-physical map for every drive in the pool at once.
- Root Tree (tree of tree roots)
- Holds the root nodes for every other B-tree (extent, chunk, FS, checksum). Failure shows as failed to read tree root, open_ctree failed, or corrupt node: root=1. Because of CoW, older root tree nodes from earlier transactions usually survive, so btrfs-find-root can scan the raw image for an older generation and route around the damage read-only, no destriping needed.
- Chunk Tree
- Maps logical Btrfs addresses to physical device byte offsets, and tracks which physical drive holds each chunk in a multi-device array. Failure shows as read_block_for_search: logical addr mirror 1 failed, scan chunk headers error, or type mismatch with chunk. Damage here destroys the logical-to-physical map. btrfs rescue chunk-recover exists but can falsely abort on valid extent metadata, so severe damage forces full raw destriping.
- Extent Tree
- Tracks allocated space and reference counts for all data and metadata blocks. Failure shows as bad extent type mismatch with chunk or Couldn't setup extent tree, and the filesystem drops to read-only. btrfs check --init-extent-tree is dangerous; recovery means read-only extraction with btrfs restore or deep lab reconstruction.
- Checksum (csum) Tree
- Stores the checksums (CRC32C by default) Btrfs uses to detect bit rot. If the tree itself is corrupt, Btrfs rejects perfectly valid data on a checksum failure. Recovery bypasses validation to pull the raw file blocks out.
- FS Tree / subvolume trees
- The actual file and directory hierarchy: inodes, INODE_ITEM records, directory entries, and EXTENT_DATA references. Failure shows as corrupt leaf: invalid extent data backref or read time tree block corruption detected. btrfs restore can target an older FS tree root objectid and walk the directory tree cleanly without touching the array.
- Log Tree
- The journal for fsync operations. A damaged log tree blocks mounting. It can be cleared with btrfs rescue zero-log, which writes to disk and therefore only runs on a clone.
- Device Tree
- Holds the physical device mapping. Failure shows as Couldn't setup device tree. Severe damage stops the array from assembling at all and forces raw destriping.
- Free-Space Tree
- Caches free-space tracking. Failure shows as extent buffer leaks or mount failures, and is cleared with btrfs rescue clear-space-cache on a clone.
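The message-to-tree mapping in this list can be written as a small triage helper. The sketch below is a simplification that uses only the message fragments quoted above. The function name is hypothetical, and a real log usually needs a technician to read the surrounding lines.

```shell
#!/bin/sh
# Guess which Btrfs tree a kernel or btrfs-progs message points at, using the
# message fragments quoted in the list above. More specific patterns come
# first; open_ctree failed is treated as the catch-all it is.
which_tree() {
    case $1 in
        *"bad extent type mismatch with chunk"*|*"Couldn't setup extent tree"*)
            echo "extent tree" ;;
        *"read_block_for_search"*|*"scan chunk headers error"*|*"type mismatch with chunk"*)
            echo "chunk tree" ;;
        *"Couldn't setup device tree"*)
            echo "device tree" ;;
        *"corrupt leaf: invalid extent data backref"*|*"read time tree block corruption detected"*)
            echo "FS tree" ;;
        *"failed to read tree root"*|*"corrupt node: root=1"*|*"open_ctree failed"*)
            echo "root tree" ;;
        *)
            echo "unknown" ;;
    esac
}

which_tree "Couldn't setup extent tree"   # prints: extent tree
which_tree "scan chunk headers error"     # prints: chunk tree
```

The extent pattern is checked before the chunk pattern because "bad extent type mismatch with chunk" contains the shorter chunk-tree fragment.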
btrfs check --readonly and btrfs rescue: what is safe on the original
btrfs check --readonly is the default, non-destructive diagnostic. It sits at the very start of a read-only workflow, before any btrfs restore attempt, and it does not modify the device. It reports on root items, extents, free-space caches, csum items, and root refs, printing the exact errors above (for example bad extent type mismatch with chunk). That output tells the technician which tree failed before anything writes a single byte.
The trap is escalating it to btrfs check --repair as a reflex. Repair can scramble block allocations and should never be the routine next step after a read-only check. The check tells you what is wrong; it is not a license to let the same tool guess at fixing it.
The btrfs rescue subcommands all write to disk, so they only ever run on forensic clones or images, never original client media:
- btrfs rescue super-recover overwrites corrupted superblocks with valid backup copies found at the fixed offsets.
- btrfs rescue chunk-recover scans for chunk headers to rebuild the chunk tree.
- btrfs rescue zero-log clears the log tree so the volume can mount.
Each of these is a write operation. Running one on the only copy of your data turns a recoverable failure into a permanent one, which is why we clone first and run them against the clone.
When partial recovery works and when the array needs full destriping
Whether your files come back from an older generation root or whether the array has to be raw-destriped comes down to which trees survived. The split is clean once you know which tree failed.
Partial CoW recovery is possible when the extent tree and chunk tree are intact but the root tree or FS tree is damaged. Because Btrfs writes new blocks instead of overwriting old ones, an older valid generation root can still walk the FS tree. btrfs-find-root locates those older generations and btrfs restore extracts the files read-only, without repairing the array at all.
Full re-image plus raw destriping is forced when the chunk tree or extent tree is corrupted across members. The chunk tree owns the logical-to-physical translation layer; once that map is gone, the RAID members can no longer locate their own data blocks.
Btrfs native RAID 5 and 6 make this worse, because the write hole leaves stripes inconsistent and a damaged chunk allocation tree means the logical-to-physical mapping across member images is lost. At that point recovery means deep forensic destriping: imaging each member with the PC-3000 Portable III, then using Data Extractor Express RAID Edition to identify the physical parity blocks and stripe size manually, bypassing Btrfs native logic.
On Synology SHR this stays simpler than the marketing labs claim, because SHR is just mdadm plus LVM plus Btrfs: we clone the members first, assemble the underlying blocks, then begin Btrfs recovery on the assembled image.
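The split described in this section reduces to one rule: damage to the chunk tree or extent tree forces raw destriping, and anything else leaves older-generation extraction on the table. The sketch below encodes only that rule. The function name and the short tree labels are hypothetical, and a real case also depends on the RAID layer underneath.

```shell
#!/bin/sh
# Given the list of damaged trees, report which recovery path the section
# above describes. Labels are illustrative: root, fs, chunk, extent, ...
recovery_path() {
    for tree in "$@"; do
        case $tree in
            chunk|extent)
                # The logical-to-physical map or allocation data is gone.
                echo "raw destriping"
                return ;;
        esac
    done
    # Chunk and extent trees intact: walk an older root with btrfs restore.
    echo "older-generation restore"
}

recovery_path root fs       # prints: older-generation restore
recovery_path fs chunk      # prints: raw destriping
```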
Subvolumes and snapshots
Btrfs subvolumes are independent filesystem namespaces within the same partition. A snapshot is a read-only or writable subvolume that shares data blocks with its parent through the CoW mechanism. When a Btrfs filesystem is corrupted, the snapshot tree is often still intact even if the default subvolume is not.
Recovery means identifying which subvolume IDs are still reachable from a valid root. The btrfs subvolume list -t command on a mounted image lists the subvolumes and their IDs in table form. If the default subvolume is damaged, we can extract data from an older snapshot by pointing btrfs restore at the snapshot's root bytenr.
Layered failures: Btrfs over mdadm
Most prosumer NAS devices run Btrfs on top of a software RAID layer. Synology SHR uses mdadm plus LVM plus Btrfs. Netgear ReadyNAS OS6 uses mdadm plus Btrfs directly. In these setups, the Btrfs corruption is often a symptom of an underlying RAID problem, not the root cause.
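A degraded mdadm RAID 5 array that is missing one drive will still assemble in read-only mode with mdadm --assemble --readonly --force. Btrfs can then be mounted read-only from the assembled array (or from the LVM volume on top of it, on SHR) with mount -o ro,nologreplay. The Btrfs degraded option is not needed here, because Btrfs sees the md array as a single device.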
If the array was accidentally rebuilt (a drive was removed and reinserted, and the NAS started a rebuild), the parity is wrong and Btrfs checksums will fail across the entire stripe. In that case, we reassemble the pre-rebuild mdadm geometry from drive images and extract Btrfs from the original state.
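Working from drive images means attaching each image as a read-only loop device before mdadm sees it. The sketch below only prints a plan. The function name, the /dev/md/recovery array name, and the /mnt/recovery mount point are hypothetical, and the loop device names are placeholders because losetup assigns them at run time. It uses the ro,nologreplay mount options recommended earlier for images; newer kernels spell the second one rescue=nologreplay.

```shell
#!/bin/sh
# Print the commands for assembling member images read-only.
# $1 = md device name to create, remaining args = member image files.
# Nothing is executed. On Synology SHR an LVM layer sits between the md
# array and Btrfs; activating it is not shown here.
assemble_plan() {
    md=$1; shift
    loops=""
    n=0
    for img in "$@"; do
        # --read-only keeps the kernel from writing to the image file.
        echo "losetup --find --show --read-only $img   # note the device as <loop$n>"
        loops="$loops <loop$n>"
        n=$((n + 1))
    done
    echo "mdadm --assemble --readonly $md$loops"
    echo "mount -o ro,nologreplay $md /mnt/recovery"
}

assemble_plan /dev/md/recovery disk1.img disk2.img disk3.img
```

The plan leaves out --force; it belongs only in the degraded case described above, and only against images.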
Common Btrfs error messages
- parent transid verify failed
- A child tree block carries a generation number that does not match the one its parent expects. This usually means a torn write or a drive that acknowledged a write it did not complete. The filesystem will not mount.
- csum failed
- A data block or metadata block failed its checksum. In a RAID 1 or 10 setup, Btrfs can read the mirror copy. In a single-drive or RAID 0 setup, the block is lost unless a historical snapshot contains an older version.
- open_ctree failed
- The kernel could not open the root tree. This is a catch-all error that appears when superblock inspection or root tree traversal fails. The underlying cause is usually a parent transid verify failed at the root tree level.
- block group has wrong amount of free space
- The block group's accounting metadata does not match the actual free blocks. This happens after an unclean shutdown on a nearly full filesystem. Btrfs refuses to mount because it cannot guarantee allocation safety.
Pricing
Btrfs recovery is priced per drive, multiplied by the number of drives requiring imaging and analysis. Standard consumer NAS drives use our HDD pricing tiers:
- File system recovery (logical): From $250
- Firmware repair (unrecognized / wrong size): $600–$900
- Head swap (clicking / beeping): $1,200–$1,500
Helium-filled enterprise drives (8TB and larger Toshiba MG, WD Ultrastar, Seagate Exos series) use helium-specific pricing: From $200 through $3,000–$4,500. A 5-bay Synology with four standard drives and one helium drive would be priced as the sum of the applicable per-drive tiers.
Rush service adds a $100 fee to move your case to the front of the queue. Donor drives are matching drives used for parts. Typical donor cost: $50–$150 for common drives, $200–$400 for rare or high-capacity models. We source the cheapest compatible donor available.
No diagnostic fees. No data, no recovery fee. If we cannot extract your files, you pay nothing for the recovery attempt.
How we recover it
Our lab is at 2410 San Antonio Street, Austin, TX 78705. Nationwide service is mail-in. We do not have satellite locations or franchise partners.
- Intake & imaging: Every drive is forensically imaged with ddrescue or the PC-3000 Portable III. We do not touch the original drives with repair tools.
- RAID reassembly (if applicable): For mdadm-based arrays, we assemble the RAID on a Linux workstation using the original drive order and mdadm superblocks. For Btrfs native RAID, we map the chunk allocation tree to determine which drives hold which stripes.
- Metadata analysis: We inspect the superblock copies and run btrfs-find-root to identify valid generation trees. If multiple historical roots exist, we test extraction from each to find the most complete dataset.
- Data extraction: Files are extracted with btrfs restore to a separate storage array. Extracted data is verified by checksum where possible.
- Return: Recovered data is returned on an external drive or via secure download. The original drives and images are retained for 30 days, then securely wiped.
All work is performed in-house. We use named equipment including the PC-3000 Portable III, PC-3000 Express, DeepSpar Disk Imager, and a 0.02 micron ULPA-filtered clean bench for mechanical work. Founded in 2008.
Frequently asked questions
Can I run btrfs check --repair on a corrupted Btrfs volume?
No. btrfs check --repair overwrites historical generation tree roots on a Copy-on-Write filesystem. This destroys older metadata versions that might contain intact data. The safe approach is read-only extraction with btrfs-find-root and btrfs restore.
Is mounting a failing Btrfs drive read-write safe?
No. Any read-write mount triggers log-tree replay, which writes to disk. On a degraded or corrupted Btrfs volume, any write risks updating pointers to bad blocks and making the filesystem unmountable. The recovery and usebackuproot options can rewrite the active root pointer even when combined with ro. Always image the drive first, then mount the image with ro,nologreplay if mounting is necessary.
What does parent transid verify failed mean?
It means a Btrfs tree block has a generation number that does not match its parent's expected transaction ID. This indicates a torn write, an interrupted scrub, or a drive that dropped writes. The filesystem will not mount until a valid root is found with btrfs-find-root.
Does Btrfs RAID 5 have a write hole?
Yes. Btrfs RAID 5 and RAID 6 are not production-ready and contain a known write hole. If a crash occurs during a parity update, the stripe becomes inconsistent. A subsequent scrub may corrupt data rather than repair it. Synology and most NAS vendors avoid Btrfs RAID 5 for this reason.
How do I recover a Synology SHR volume with Btrfs?
Synology SHR is standard Linux mdadm plus LVM plus Btrfs. It can be assembled on any Linux workstation with mdadm --assemble --readonly. The Btrfs layer is then accessible with standard btrfs tools. No proprietary hardware is required. If a recovery lab tells you SHR is a black box only their proprietary tool can read, walk away.
What is the safe way to extract data from a corrupted Btrfs filesystem?
Image every drive first with ddrescue or PC-3000 Portable III. Then run btrfs inspect-internal dump-super to inspect metadata, btrfs-find-root to locate a valid generation tree, and btrfs restore -t to extract files. Never run btrfs check --repair, mount the original drive read-write, or use the recovery or usebackuproot options.
Why do older Btrfs generation roots matter during recovery?
Older Btrfs generation roots matter because Copy-on-Write writes new tree blocks instead of overwriting old ones. If the newest root points into a torn write, an earlier generation can still describe intact extents, snapshots, and directory trees.
How does btrfs-find-root help without repairing the filesystem?
btrfs-find-root scans the imaged device for candidate root tree blocks and reports their bytenr and generation values. It does not repair the filesystem. The recovery step is choosing a readable root, then extracting files with btrfs restore -t to separate storage.
Can I just move the drives to a new NAS enclosure?
Only if the new enclosure uses the exact same RAID metadata format and drive order. Most NAS enclosures write their own configuration to the trailing sectors of each drive during initialization. Inserting your old drives into a new NAS often triggers an initialization that overwrites the mdadm or Btrfs superblocks, making recovery harder. Image the drives first, then experiment.
Why did my Btrfs RAID 1 volume crash if RAID 1 is supposed to be safe?
RAID 1 provides availability, not data protection. If both drives in a 2-drive RAID 1 have a corrupted Btrfs tree at the same logical offset, the filesystem has no good mirror to read. This can happen after a firmware bug, a simultaneous power event, or a bad RAM module that wrote incorrect data to both drives through the controller. The RAID layer did not fail; the filesystem layer above it did.
Are Btrfs snapshots a backup?
Snapshots are not a backup if they live on the same physical pool as the original data. A snapshot protects against accidental deletion or ransomware, but it does not protect against drive failure, controller corruption, or a fire in the server closet. A backup is a separate copy on separate hardware.
Do I need to send all drives if my NAS has a hot spare?
Send every drive that was part of the array at the time of failure, including the hot spare if it was ever activated. The Btrfs chunk tree and RAID geometry metadata are spread across all member drives. Missing one drive means missing part of the metadata tree, which can make the entire array unrecoverable.
How long does Btrfs recovery take?
Imaging takes 6 to 10 hours per 4TB drive. A 4-drive NAS typically takes 1 to 3 business days for analysis and extraction, assuming no mechanical failures. Drives with bad sectors or head degradation require additional time for bitwise imaging with the PC-3000.
Why does my Btrfs filesystem report open_ctree failed?
open_ctree failed points to corruption in the Btrfs root tree or superblock. Because Btrfs uses Copy-on-Write, older versions of the root tree usually still exist on the drive. A lab can locate a previous valid generation with btrfs-find-root and perform a read-only extraction without risking further damage to the original media.
Can btrfs check --repair fix a damaged extent tree?
No. btrfs check --repair should never be the routine fix. When the extent tree is damaged, repair blindly guesses block allocations, which often permanently destroys directory structures. We image the drives first and use read-only extraction instead of destructive in-place repair.
What happens if the Btrfs chunk tree is corrupted on a RAID array?
The chunk tree maps logical data to physical locations across your RAID members. If it is destroyed, the filesystem loses track of where your files live on the physical disks. Partial recovery tools fail at that point, and the array needs full raw destriping and manual reconstruction with lab equipment like Data Extractor Express RAID Edition.
Ship us your drives. We'll extract the data.
Btrfs recovery with read-only forensic tools. No data, no recovery fee. Free diagnosis. Austin, TX lab.