
mdadm Recovery: Linux Software RAID Data Recovery

mdadm is the Linux software RAID layer underneath almost every consumer NAS. Recovery means imaging every member read-only first, capturing each drive's superblock with mdadm --examine, then assembling the array from loop images with mdadm --assemble --readonly. We never run mdadm --create on your drives, because it writes fresh superblocks and can shift the data offset enough to destroy the filesystem. All work is performed in-house at our Austin, TX lab.
Written by Louis Rossmann, Founder & Chief Technician
Updated 2026-05-31

If your Synology, QNAP, Netgear, TerraMaster, or Asustor NAS will not mount its volume, the array is almost certainly Linux mdadm software RAID, usually with LVM and an ext4 or Btrfs filesystem stacked on top. That is good news for recovery: the data is not locked inside the enclosure. Send us the drives and the work happens at our Austin, TX lab, where we image each member and reconstruct the array on a forensic workstation. No data, no recovery fee.

Your data is not trapped in the NAS enclosure

mdadm (the Multiple Device administrator) is the userspace tool that manages md, the software RAID driver built into the Linux kernel. The major NAS vendors ship md and mdadm under their own branding. Synology Hybrid RAID is the clearest example: SHR is not a proprietary format. It is plain mdadm arrays, grouped under LVM, carrying an ext4 or Btrfs filesystem. Any of those arrays will assemble and mount on a vanilla Linux workstation once the member drives are imaged.

This is why the right move when a NAS dies is to stop, not to improvise. Do not migrate the drives into a replacement enclosure to "read them on the new unit." A new NAS routinely rewrites the system and swap partitions on every inserted disk, and it will frequently offer a "fresh install" that initializes the very members you are trying to recover. The metadata you need lives on those drives. Overwriting it is how a recoverable array becomes an unrecoverable one.

Why mdadm --create is the command that destroys arrays

The single most destructive piece of advice on IT forums is to run mdadm --create (often with --assume-clean) to force a broken array back together. It is repeated by forum contributors and even by vendor support scripts. It is also the leading cause of permanent logical data loss we see on intake.

Here is what actually happens. mdadm --create writes a fresh superblock to every member. If the chunk size, the parity layout, or the disk order is wrong, the striping math breaks and the stripes are read out of sequence. If the metadata version is wrong, the calculated data_offset moves and the payload start shifts. Even with the right version, a newer mdadm release may choose a different default data_offset than the one that originally built the array. The array will assemble and the volume will even appear, but the RAID layer is now reading the filesystem from the wrong byte offsets or in the wrong order. The output is interleaved garbage. If LVM was in use and the new superblock lands somewhere the old one did not, it can overwrite the LVM physical-volume header and sever the link to the volume group. The moment someone runs fsck on that garbled volume, the repair tool "fixes" inodes it cannot parse by zeroing them, and the payload is gone.

Running --create sometimes works for a person who perfectly guesses their original geometry. It is the data-recovery equivalent of Russian roulette, and recommending it to someone with an inactive NAS is reckless. The safe path captures the existing metadata with mdadm --examine on read-only images and reconstructs the array offline, where a wrong guess costs nothing because the originals are never touched.

mdadm superblock versions and where the data starts

The superblock is the binary structure that tells the kernel how to read the array. The version dictates where it sits on the drive and where the data payload begins. Getting this wrong by even a few kilobytes is the difference between clean data and garbage, which is exactly why a blind --create is so dangerous.

| Version | Superblock location | Payload start | Notes |
| --- | --- | --- | --- |
| 0.90 | End of device | Byte 0 | Legacy. 64 KiB-aligned block at the end of the member. Caps member size near 2 TB. Seen on older Synology and Debian builds. |
| 1.0 | End of device | Byte 0 | Superblock sits at least 8 KiB and less than 12 KiB from the end. Because the payload starts at offset 0, a member can be auto-mounted as a standalone disk, which desynchronizes the array. |
| 1.1 | Start of device | After the superblock | Superblock at byte 0. Rare in production. |
| 1.2 | 4 KiB from start | data_offset | The modern default. Superblock at 4096 bytes. Payload begins after a calculated data_offset, commonly aligned to 1 MiB. |
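
Each superblock location in the table comes from simple arithmetic on the member's size. The Python sketch below follows our reading of how the Linux md driver locates each superblock version. The function name and the example drive size are hypothetical. The function returns only the superblock position. For 1.1 and 1.2, the payload start is the data_offset recorded inside the superblock itself, and this sketch does not read it.

```python
# Sketch: byte offset of the md superblock for each metadata version,
# modeled on the location arithmetic in the Linux md driver.
SECTOR = 512
RESERVED_090 = 128  # 0.90 reserves the last 64 KiB (128 sectors)


def superblock_offset(version: str, dev_bytes: int) -> int:
    """Return the byte offset of the superblock on a member of dev_bytes."""
    sectors = dev_bytes // SECTOR
    if version == "0.90":
        # Round down to a 64 KiB boundary, then step back one 64 KiB block.
        sb = (sectors & ~(RESERVED_090 - 1)) - RESERVED_090
    elif version == "1.0":
        # Step back 8 KiB, then align down to 4 KiB:
        # at least 8 KiB and less than 12 KiB from the end.
        sb = (sectors - 16) & ~7
    elif version == "1.1":
        sb = 0  # first sector of the device
    elif version == "1.2":
        sb = 8  # 4 KiB from the start
    else:
        raise ValueError(f"unknown metadata version: {version!r}")
    return sb * SECTOR


if __name__ == "__main__":
    size = 2_000_398_934_016  # hypothetical 2 TB member
    for v in ("0.90", "1.0", "1.1", "1.2"):
        off = superblock_offset(v, size)
        print(f"{v:>4}: superblock at byte {off:,} ({size - off:,} bytes before the end)")
```

The 1.0 branch shows why the table gives a range rather than a single number. The final 4 KiB alignment moves the superblock by a different amount depending on the exact drive size.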

From each member's superblock we extract the array UUID, the chunk size and layout (for example left-symmetric on RAID 5), the data_offset, the member role, and the events counter. When the superblocks are intact, that is enough to assemble read-only. When they are gone, we reconstruct the geometry by hand.
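
This minimal Python sketch of the RAID 5 left-symmetric mapping shows why chunk size, layout, and data_offset all matter. It assumes md's convention that parity starts on the last member and rotates one member to the left on each stripe. The function name is hypothetical, and so is the geometry used in the example (4 members, 64 KiB chunks, 1 MiB data_offset). Change any one parameter and the same logical byte resolves to a different member or offset. That is exactly the failure a wrong --create produces.

```python
def left_symmetric_map(logical: int, disks: int, chunk: int, data_offset: int):
    """Map a logical byte offset in a RAID 5 left-symmetric array to
    (member index, byte offset on that member)."""
    data_disks = disks - 1
    chunk_no, within = divmod(logical, chunk)
    stripe, data_index = divmod(chunk_no, data_disks)
    # Parity sits on the last member for stripe 0 and moves one member left per stripe.
    parity = data_disks - (stripe % disks)
    # Data chunks start on the member after parity and wrap around.
    member = (parity + 1 + data_index) % disks
    return member, data_offset + stripe * chunk + within


if __name__ == "__main__":
    # Hypothetical geometry: 4 members, 64 KiB chunks, 1 MiB data_offset.
    for chunk_no in range(8):
        member, offset = left_symmetric_map(chunk_no * 65536, 4, 65536, 1 << 20)
        print(f"logical chunk {chunk_no} -> member {member}, offset {offset:#x}")
```

With this geometry, logical byte 0x438 resolves to member 0 at 0x100438. The hex-carving section relies on the same arithmetic.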

Reconstructing geometry with ext4 hex carving

When superblocks are missing or were overwritten, the disk order and data_offset have to be proven from the filesystem itself. ext4 makes this tractable. An ext4 filesystem keeps its primary superblock at byte offset 0x400 (1024) from the start of the volume. Inside that superblock, at offset 0x38, sits the magic value 0xEF53. It is stored little-endian, so it appears on disk as the bytes 53 EF. The magic number therefore always lands exactly 0x438 (1080) bytes into the logical ext4 payload.

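We search the raw images for the byte pattern 53 EF. Suppose it appears at 0x100438 on a given member, which is 1 MiB plus 0x438. That identifies the member holding the first data chunk (device role 0 under the default left-symmetric layout). It also places the start of the filesystem 1 MiB into the member. That 1 MiB is the data_offset itself when ext4 sits directly on the array, or the data_offset plus the LVM extent start when LVM sits in between. A two-byte pattern also turns up by chance, so a hit only counts once the surrounding superblock fields (block count, block size, filesystem UUID) check out. Locating fragments of the ext4 block group descriptor tables across the other members then confirms the chunk size and the full disk order before we ever attempt an assembly. None of the top-ranking DIY guides document this step, which is why their advice collapses the moment the metadata is actually gone.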
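
Here is a minimal Python sketch of that search, run against a raw member image on disk. It memory-maps the image read-only and looks for the bytes 53 EF. It keeps only hits whose implied filesystem start is sector-aligned and whose neighbouring superblock fields look sane. The function name, the 512-byte alignment, and the sanity thresholds are our own assumptions, not a standard tool. A reported start is the filesystem's position on the member, so with LVM in between it also includes the LVM extent start. Backup superblocks can match too. The reported group number helps tell them apart, since the primary copy records group 0.

```python
import mmap
import struct
import sys

MAGIC = b"\x53\xef"   # s_magic 0xEF53, stored little-endian
MAGIC_OFFSET = 0x438  # 0x400 to the superblock + 0x38 to s_magic


def ext4_candidates(buf, align=512):
    """Yield (fs_start, block_size, group_nr) for plausible ext4 superblocks."""
    pos = buf.find(MAGIC, MAGIC_OFFSET)
    while pos != -1:
        start = pos - MAGIC_OFFSET
        sb = start + 0x400
        # 53 EF occurs by chance, so require sector alignment and sane fields.
        if start % align == 0 and sb + 0x5C <= len(buf):
            inodes, blocks = struct.unpack_from("<II", buf, sb)      # 0x00, 0x04
            (log_bs,) = struct.unpack_from("<I", buf, sb + 0x18)     # s_log_block_size
            (group_nr,) = struct.unpack_from("<H", buf, sb + 0x5A)   # s_block_group_nr
            if inodes and blocks and log_bs <= 6:
                yield start, 1024 << log_bs, group_nr
        pos = buf.find(MAGIC, pos + 1)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python ext4_scan.py member0.img   (an image, never the original drive)
    with open(sys.argv[1], "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        for start, bs, grp in ext4_candidates(m):
            print(f"ext4 candidate at {start:#x}: block size {bs}, group {grp}")
```

The scan is single-threaded and steps through every chance match in Python. On multi-terabyte images it is a readable illustration, not a fast one.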

Events counter mismatches and read-only forced assembly

Every superblock carries an events counter. While the array runs, that counter increments in lockstep across all members. When a drive drops, its counter freezes while the survivors keep climbing. After a reboot, mdadm may refuse to assemble because the counts disagree.

The fix is not to force everything together. We compare the events counts captured from mdadm --examine and identify the stale member (the one with the lowest count). We then assemble the array read-only from the current members, leaving the stale one out. Only when the spread is small do we combine --force with --readonly, and only against the images, where a wrong call costs nothing. Pulling a stale drive back into the stripe set mixes old and current data and pollutes the parity, so the order of operations matters as much as the commands.
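
Here is a small Python sketch of the comparison step. It assumes the mdadm --examine text for each member image has been saved, and that the members use 1.x metadata. That output includes the "Array UUID" and "Events" fields parsed below; other metadata versions may format these fields differently. The function names and the sample UUID are hypothetical. The sketch refuses to compare members from different arrays and reports how far each stale member lags.

```python
def parse_examine(text: str) -> dict:
    """Turn `mdadm --examine` output into a {field: value} dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # skip lines without a "Field : value" pair
            fields[key.strip()] = value.strip()
    return fields


def stale_members(examines: dict) -> dict:
    """Given {member: examine_text}, return {member: events_behind} for every
    member whose events counter lags the newest one."""
    parsed = {m: parse_examine(t) for m, t in examines.items()}
    if len({p["Array UUID"] for p in parsed.values()}) != 1:
        raise ValueError("members report different Array UUIDs")
    events = {m: int(p["Events"]) for m, p in parsed.items()}
    newest = max(events.values())
    return {m: newest - e for m, e in events.items() if e < newest}


if __name__ == "__main__":
    # Hypothetical excerpts of saved --examine output for three member images.
    sample = "     Array UUID : 0f1e2d3c:4b5a6978:8796a5b4:c3d2e1f0\n         Events : {}\n"
    saved = {"loop0": sample.format(1520), "loop1": sample.format(1520),
             "loop2": sample.format(1488)}
    print(stale_members(saved))  # {'loop2': 32}
```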

LVM sits on top of mdadm, and it leaves you a backup

On most NAS volumes, and on SHR specifically, LVM is layered over the mdadm arrays. The raw partitions are grouped into arrays, those arrays become LVM physical volumes, the physical volumes pool into a volume group, and a logical volume is carved out and formatted. Recovery has to walk that stack in order: assemble the array, then activate the volume group with vgchange -ay, then mount the logical volume read-only.

The LVM physical-volume header lives in the first 1 MiB of the array payload. The volume-group metadata that follows it is an ASCII configuration block mapping physical extents to logical extents. Linux also keeps text backups of that metadata in /etc/lvm/archive and /etc/lvm/backup on the system that managed the volume. When an errant mdadm command has wiped the on-disk LVM header, those files let us rebuild the mapping instead of guessing it. First, pvcreate with --uuid and --restorefile rewrites the physical-volume label on the working copy. Then vgcfgrestore restores the volume-group metadata. We inspect the stack with pvs, vgs, and lvs against the images, never against the originals.
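
Before restoring anything, it helps to confirm whether the label survived. This Python sketch assumes the LVM2 on-disk label layout as we understand it:

- the text LABELONE at the start of one of the first four 512-byte sectors (sector 1 by default);
- a 32-bit offset to the PV header at byte 20;
- the type string "LVM2 001" at byte 24;
- a 32-character PV UUID at that offset.

The function name is hypothetical. Point it at the first few KiB of the read-only assembled array, not at an individual member.

```python
import struct

SECTOR = 512


def find_pv_label(buf: bytes):
    """Look for an LVM2 PV label in the first four sectors of buf.
    Returns (sector, formatted PV UUID) or None."""
    for sector in range(4):
        base = sector * SECTOR
        if buf[base:base + 8] != b"LABELONE" or buf[base + 24:base + 32] != b"LVM2 001":
            continue
        (pv_offset,) = struct.unpack_from("<I", buf, base + 20)  # offset to PV header
        raw = buf[base + pv_offset:base + pv_offset + 32].decode("ascii")
        # LVM prints the 32-character UUID in 6-4-4-4-4-4-6 groups.
        groups, i = [], 0
        for width in (6, 4, 4, 4, 4, 4, 6):
            groups.append(raw[i:i + width])
            i += width
        return sector, "-".join(groups)
    return None


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        # Usage: python pv_label.py /dev/md/recovery   (assembled read-only)
        with open(sys.argv[1], "rb") as dev:
            print(find_pv_label(dev.read(4 * SECTOR)))
```

A None result on an array that should carry LVM is the signal to reach for the archived metadata instead of guessing.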

Why a RAID 5 rebuild can finish the job your failure started

Before any talk of rebuilding, the math has to be on the table. Consumer SATA drives carry an Unrecoverable Read Error rate near 1 error per 10^14 bits read, which works out to roughly one error per 12.5 TB read. A RAID 5 rebuild has to read every sector of every surviving member to reconstruct the missing member's data and parity. On a degraded four-bay array of 16 TB drives, that is 3 × 16 TB = 48 TB of sequential reads in one pass. At that error rate, the rebuild is statistically expected to hit about four unreadable sectors (48 / 12.5 ≈ 3.8).
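
Here is the same arithmetic as a Python sketch. It assumes the datasheet rate of 1 error per 10^14 bits and treats errors as independent. That lets a Poisson approximation estimate the chance of a completely clean pass. Real drives often beat their datasheet figure, so read the result as a worst-case illustration, not a prediction for any specific array.

```python
import math


def rebuild_ure_odds(read_tb: float, bits_per_ure: float = 1e14):
    """Expected UREs, and the chance of none, when reading read_tb decimal terabytes."""
    bits = read_tb * 1e12 * 8
    expected = bits / bits_per_ure
    # Independent rare errors -> Poisson approximation for a clean pass.
    return expected, math.exp(-expected)


if __name__ == "__main__":
    # Degraded 4-bay RAID 5 of 16 TB drives: three survivors read end to end.
    expected, clean = rebuild_ure_odds(3 * 16)
    print(f"expected UREs: {expected:.2f}, clean-pass odds: {clean:.1%}")
    # prints: expected UREs: 3.84, clean-pass odds: 2.1%
```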

Enterprise hardware controllers can puncture a stripe and keep going. Linux software RAID is far less forgiving. Unless its bad-block log can record the failed sector, md treats a single URE during resync as a failed drive, drops that member, and aborts the rebuild. On an already-degraded RAID 5, that collapses the whole volume. This is why we image every member before any reconstruction. A rebuild writes new data while forcing failing members through a full read pass. Imaging is a read operation that preserves every option.

SMR drives get ejected in the middle of a rebuild

Drive-managed SMR (Shingled Magnetic Recording) disks introduced a failure mode that wrecks software RAID resyncs. SMR drives write into overlapping tracks and absorb random writes into a small conventional (CMR) cache. During a rebuild, the sustained write load can flood that cache. When it overflows, the drive pauses host communication for many seconds while it flushes data into the shingled zones.

The Linux block layer default timeout (/sys/block/sdX/device/timeout) is usually 30 seconds. When the SMR drive stalls past that, the kernel assumes the disk is dead, issues a bus reset, and drops it from the array mid-rebuild. The drive is healthy. The array is now crashed. This is why an SMR member in a RAID set is a recovery problem waiting to happen, and why we image SMR drives with controlled retry thresholds rather than letting a controller hammer them.
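
A short Python sketch can read those sysfs files to show the timeout the kernel applies to each disk before it starts error handling. The function name is hypothetical. The sysfs root is a parameter only so the sketch can be pointed at a copy; on a live system it is /sys/block. The sketch only reads values and changes nothing.

```python
from pathlib import Path


def command_timeouts(sys_block: str = "/sys/block") -> dict:
    """Return {device: timeout in seconds} for every block device exposing
    device/timeout under sys_block. Read-only: nothing is changed."""
    timeouts = {}
    for path in sorted(Path(sys_block).glob("*/device/timeout")):
        timeouts[path.parent.parent.name] = int(path.read_text().strip())
    return timeouts


if __name__ == "__main__":
    for dev, seconds in command_timeouts().items():
        print(f"{dev}: command timeout {seconds} s")
```

Devices without a device/timeout entry, such as loop devices, are simply skipped.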

The read-only recovery workflow we run

Every case follows the same order, and every step happens on copies, never on your drives:

  1. Power the array down so a degraded set stops retrying bad sectors and wearing heads.
  2. Attach a hardware write blocker and clone each member sector by sector with ddrescue or a PC-3000 Portable III. Drives with bad sectors or head degradation get bitwise imaging with controlled retry limits.
  3. Map each image to a loop device with losetup -r and capture mdadm --examine from every member to record UUID, chunk size, layout, data_offset, and events count.
  4. Reconstruct the geometry. If superblocks are intact, assemble with mdadm --assemble --readonly. If they are gone, carve for the ext4 magic (the bytes 53 EF, 0x438 into the filesystem) to prove disk order and offset, then assemble virtually.
  5. Walk the stack: vgchange -ay, then mount the logical volume read-only (for ext4, with -o ro,noload so the journal is not replayed). Use vgcfgrestore from the LVM archive if the on-disk metadata was damaged.
  6. Extract the files to fresh media and verify them against the directory tree.
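
Steps 3 to 5 can be sketched as a command plan. The Python below only builds the argument lists and prints them; it runs nothing. The following are hypothetical placeholders:

- the image names and the loop device names;
- /dev/md/recovery and /mnt/recovery;
- the vg/lv name.

The loop names must be the ones losetup actually reports. The ro,noload options assume an ext4 filesystem (noload is an ext4 mount option that skips journal replay). The LVM restore path from step 5 is left out.

```python
import shlex


def recovery_plan(images, loops, vg_lv, md="/dev/md/recovery", mountpoint="/mnt/recovery"):
    """Build, but do not run, the read-only command sequence for steps 3 to 5.

    images: member image files; loops: the loop devices losetup reported for
    them, in the same order; vg_lv: "vg/lv" as reported by lvs.
    """
    if len(images) != len(loops):
        raise ValueError("need exactly one loop device per image")
    plan = [["losetup", "-r", "-f", "--show", img] for img in images]  # read-only loops
    plan += [["mdadm", "--examine", loop] for loop in loops]           # record metadata
    plan.append(["mdadm", "--assemble", "--readonly", md, *loops])     # no resync, no writes
    plan.append(["vgchange", "-ay"])                                   # activate LVM
    # ext4 assumption: noload skips journal replay on the read-only mount.
    plan.append(["mount", "-o", "ro,noload", f"/dev/{vg_lv}", mountpoint])
    return plan


if __name__ == "__main__":
    images = [f"member{i}.img" for i in range(4)]   # hypothetical names
    loops = [f"/dev/loop{i}" for i in range(4)]     # as reported by losetup
    for argv in recovery_plan(images, loops, "vg1/lv1"):
        print(shlex.join(argv))
```

Printing rather than executing keeps a person reviewing every command before it touches anything. Here that matters more than automation.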

What this costs

mdadm recovery is priced per member drive, on the same tiers as our single-drive HDD work, because each member has to be imaged individually. Healthy members that only need read-only imaging and assembly fall at the lower tiers (From $100). Members with bad sectors, head degradation, or seized motors move into the head-swap and platter-work tiers, up to $2,000 per drive. Helium drives, common in higher-capacity NAS builds, are handled in-house and priced on the $200–$5,000+ helium tiers. There is no diagnostic fee, and there is no recovery fee if we cannot get your data back.

Frequently asked questions

Can I run mdadm --create --assume-clean to recover my missing array?

No. mdadm --create writes fresh superblocks to your drives. If your disk order or chunk size is wrong, the striping math breaks and the stripes read out of sequence. If the metadata version is wrong (1.2 versus 1.0, for example), the data_offset shifts and the payload start moves. Either way the filesystem payload turns to garbage, and a later fsck on that garbled volume zeroes the inodes it cannot parse and makes the loss permanent.

Why did my RAID 5 rebuild fail and crash the volume?

A rebuild reads every sector of every surviving drive to reconstruct the missing one. Consumer drives carry a URE rate near 1 in 10^14 bits, roughly one error per 12.5 TB read. A multi-drive array reads far more than that in a single pass, so hitting a bad sector is close to certain. Software RAID typically treats that read error as a dead drive and aborts.

How do I safely reassemble a degraded mdadm array?

Never work on the originals. Clone each member with ddrescue, map the images with losetup -r, and assemble with mdadm --assemble --readonly. Read-only assembly stops the kernel from starting a resync or journaling writes over recoverable data.

Can I recover NAS data without the original NAS enclosure?

Yes. Synology, QNAP, Netgear, TerraMaster, and Asustor units run standard Linux mdadm with LVM. The array assembles on any Linux workstation from read-only loop images. You do not need the original enclosure, and you should not move the drives into a replacement chassis, because the new unit will often rewrite the system partitions or offer a fresh install.

What does an events counter mismatch mean in mdadm?

It means the members stopped receiving writes at different times. When a drive drops, its events counter freezes while the survivors keep climbing. The lowest count is the stale member. Forcing assembly without excluding it mixes old and current stripes and corrupts the parity math.


Ship us your drives. We'll reconstruct the array.

Read-only mdadm recovery on forensic images. No data, no recovery fee. Free diagnosis. Austin, TX lab.

(512) 212-9111, Mon-Fri 10am-6pm CT
No diagnostic fee
No data, no fee
4.9 stars, 1,837+ reviews