
What Happens During a NAS Crash

Written by Louis Rossmann
Founder & Chief Technician
Published March 8, 2026
Updated March 8, 2026

A NAS (Network Attached Storage) crash can involve failure at multiple layers: individual drives, the RAID array, the filesystem, or the NAS controller board itself. Consumer and small-business NAS devices from Synology, QNAP, Buffalo, TerraMaster, and Asustor run Linux-based operating systems (DSM, QTS, TerraMaster OS) with software RAID managed by mdadm or Btrfs. When a NAS becomes inaccessible, the specific failure mode determines what data is recoverable and how.

A NAS crash is a stack of layered failures: drive surface errors, mdadm superblock divergence, ext4 or Btrfs metadata corruption, and controller board death. Letting Windows initialize the drives, or running fsck on individual members outside the array, destroys the metadata that holds the array and its filesystem together. Power the NAS off, image every surviving drive with ddrescue, and reassemble from images. For live recovery, see our NAS recovery service.

Three Categories of NAS Failure

Failure Type | What Happened | NAS Behavior | Data Status
Single disk failure (with redundancy) | One drive died in a RAID 1/5/6/SHR array | Degraded mode; beeping; warning in management UI | All data accessible; rebuild with replacement drive
Multi-disk failure or RAID collapse | Two or more drives failed (RAID 5/SHR-1), or any failure in RAID 0/JBOD | Volume inaccessible; NAS may boot but shows no storage pool | Data present on drives but array cannot be assembled
NAS controller/board failure | NAS hardware (CPU, RAM, flash module, power supply) failed | NAS does not power on, or boot loops | Drives are fine; data recoverable by assembling RAID externally

Filesystem Corruption on Multi-Disk Arrays

Consumer NAS devices typically use ext4 or Btrfs as the filesystem on top of the RAID volume. ext4 relies on journaling and Btrfs on copy-on-write to maintain consistency during writes. However, these protections have limits.

Power loss during active writes can corrupt the filesystem if the NAS does not have a UPS and the drives do not have reliable power-loss data protection. Most consumer NAS devices do not have battery-backed cache. The drives' own write cache may report writes as complete before they are physically committed to the platters, creating a window where data can be lost.
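If your NAS exposes SSH access (Synology and QNAP both offer it as an option), you can at least see whether the drives' volatile write cache is enabled. This is a hedged illustration: the device name is a placeholder, and whether hdparm ships on your particular firmware is an assumption.

```
# Check the drive's volatile write cache over SSH (device name is an example).
hdparm -W /dev/sda      # report whether write caching is enabled
hdparm -W 0 /dev/sda    # disable it: slower writes, smaller power-loss window
```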

Synology SHR (Synology Hybrid RAID) and QNAP's RAID configurations are built on Linux mdadm. The RAID superblock metadata (drive order, chunk size, parity layout, array UUID) is stored on each member drive. If the superblocks become inconsistent (due to a drive being temporarily disconnected and then reinserted after the array has changed), mdadm may refuse to assemble the array or may assemble it with incorrect parameters.
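To make the layering concrete: over SSH on a Synology box, the md arrays underneath SHR are visible directly. The device numbers below are typical but not guaranteed; they vary by model and DSM version.

```
cat /proc/mdstat          # every md array the kernel has assembled, with member state
mdadm --detail /dev/md2   # on most Synology units md0/md1 are system and swap; md2+ hold data
```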

What Is RAID Metadata and How Does It Get Damaged?

NAS RAID metadata is stored in the mdadm superblock on each member drive. It records the array UUID, drive positions, RAID level, chunk size, and event counter. Mismatched event counters cause mdadm to exclude drives it considers stale, even when those drives hold valid data and the array could otherwise assemble.

NAS RAID metadata is stored in the mdadm superblock (version 1.0, 1.1, or 1.2, depending on NAS vendor and firmware version). This superblock records:

  • Array UUID (unique identifier for the array)
  • Member drive positions and UUIDs
  • RAID level, chunk size, and layout algorithm
  • Array state (clean, active, degraded, rebuilding)
  • Event counter (incremented on every state change)
  • Bitmap location (for write-intent bitmaps)

If drives are removed and reinserted in a different order, or if drives from different arrays are mixed, the superblock event counters may not match. mdadm uses the event counter to determine which drives have the most recent data. Mismatched counters can cause mdadm to exclude a drive it considers stale, even if that drive has valid data.
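On a Linux bench system (or over SSH), the superblock on each member can be read directly, which is how you find out whose event counter is behind. A hedged example; the partition numbers are assumptions, since each vendor carves the drives differently.

```
# Compare superblocks across members; data partitions are often the 3rd or 5th.
mdadm --examine /dev/sdb3 | grep -E 'Array UUID|Raid Level|Device Role|Events|Array State'
mdadm --examine /dev/sdc3 | grep -E 'Array UUID|Raid Level|Device Role|Events|Array State'
# A member whose Events value trails the rest is what mdadm calls stale.
# --assemble --force will take it anyway, but only ever do that on images.
```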

Some NAS firmware updates modify the RAID configuration or partition layout. If a firmware update is interrupted (power loss, network failure), the RAID metadata may be left in a transitional state that the NAS cannot resolve on its own.

RAID Parity Rebuild Cascade and URE Probability

A RAID 5 or SHR-1 rebuild reads every sector on all surviving member drives to recompute missing blocks from XOR parity. Consumer SATA drives are rated at one unrecoverable read error per 10^14 bits read (roughly 12.5 TB). One URE during rebuild causes mdadm to abort and mark a second drive failed, collapsing the array.

When a RAID 5 or Synology SHR-1 array loses a drive, the replacement drive is populated by reading every corresponding sector from the surviving members and recomputing the missing block from XOR parity. Every sector on every remaining drive must be read successfully for the rebuild to complete. Consumer SATA drives are specified by their manufacturers at an unrecoverable read error (URE) rate of one error per 10^14 bits read, which works out to roughly one URE per 12.5 TB of sequential reads. Enterprise SATA and nearline SAS drives are typically rated an order of magnitude better, at one URE per 10^15 bits.

In a four-drive RAID 5 built from 8 TB consumer drives, a rebuild reads approximately 24 TB from the three surviving members. That read volume sits at the high end of the published URE budget. A single read error during rebuild causes mdadm to mark the failing drive as faulty and abort, leaving the array in a doubly-degraded state it cannot recover from without the original failed drive back online. This is why large consumer RAID 5 arrays frequently fail during rebuild even when every surviving drive reported clean SMART data beforehand. See mdadm superblock recovery for what happens after the rebuild aborts.
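The arithmetic behind that claim, under the naive assumption that every bit read is an independent chance of error at the published spec (real drives cluster their errors on weak heads and media zones, so treat this as an illustration, not a prediction):

```
awk 'BEGIN {
  bits_read = 24 * 1e12 * 8    # ~24 TB read from the three surviving 8 TB members
  ure_rate  = 1e-14            # consumer spec: one unrecoverable error per 1e14 bits
  expected  = bits_read * ure_rate
  printf "expected UREs during rebuild: %.2f\n", expected
  printf "chance of a clean rebuild:    %.0f%%\n", exp(-expected) * 100   # Poisson approximation
}'
```

That works out to roughly a one-in-seven chance of the rebuild finishing without a read error, which lines up with how often these rebuilds fall over in practice.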

The cascade compounds with drive age. Drives purchased together from the same batch share manufacturing lot characteristics and wear together. When one drive accumulates reallocated sectors past the SMART threshold, others from the same batch are statistically likely to be near threshold as well. Replacing a failed drive with a new unit drawn from the same shipment introduces a drive whose firmware revision may differ from the surviving members, which can expose controller-level incompatibilities during the sustained read load of a rebuild.

Two practical consequences for IT administrators managing RAID arrays with irreplaceable data: do not allow a degraded array to begin rebuild before imaging every surviving member, and do not deploy RAID 5 on arrays whose total read volume during rebuild exceeds the URE budget of the installed drives. RAID 6 or SHR-2 tolerates a second URE mid-rebuild and is the minimum parity level for arrays above roughly 12 TB of usable capacity on consumer drives.

How Does Filesystem Metadata Corruption Propagate?

Filesystem corruption on NAS arrays follows a predictable sequence regardless of filesystem type. The trigger is almost always an interruption of an in-flight metadata write: power loss during a directory operation, a disk dropping from the array mid-transaction, or a kernel panic in the NAS operating system. What happens next depends on the filesystem.

On ext4, the journal (typically 128 MB, held in a reserved inode) records metadata changes before they are committed to their final locations. On the next mount, the kernel replays the journal to bring the filesystem into a consistent state. If the journal itself is corrupted or references a missing transaction, e2fsck -y is required. e2fsck may discard partially-committed transactions, losing recently created or deleted files, but the filesystem remains mountable. Corruption outside the journal's scope can go undetected: ext4 never checksums file data, and its metadata checksums only exist on volumes created with the metadata_csum feature, which many NAS firmware versions do not enable.
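When ext4 damage is suspected, the safe first moves are a read-only check and a mount that leaves even the journal alone. A minimal sketch, assuming the array has already been assembled read-only from images as /dev/md127:

```
e2fsck -n /dev/md127                       # -n: report problems, change nothing
mount -o ro,noload /dev/md127 /mnt/check   # noload skips journal replay entirely
```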

On XFS (used by some QNAP configurations and enterprise NAS builds), the equivalent mechanism is the XFS log. The kernel replays the log at mount time; if replay fails, xfs_repair rebuilds allocation group metadata by walking the B+trees. XFS does checksum metadata blocks on the v5 format (the default since 2014), so corruption is detected rather than silent. However, a log that references deleted or relocated extents can leave the filesystem in a state xfs_repair refuses to clear without -L, which discards the log and risks data loss.
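The XFS equivalents, again assuming an image-backed assembly rather than the original drives:

```
xfs_repair -n /dev/md127                       # dry run: list damage, modify nothing
mount -o ro,norecovery /dev/md127 /mnt/check   # norecovery skips log replay
# xfs_repair -L zeroes the log; last resort, and only ever on a copy.
```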

On Btrfs, the equivalent structures are the chunk tree, root tree, extent tree, and checksum tree. Btrfs writes new metadata to new locations (copy-on-write) and updates the superblock to point at the new tree roots atomically. If the superblock update is interrupted, Btrfs can fall back to a superblock mirror (the primary lives at 64 KiB, with mirrors at 64 MiB, 256 GiB, and 1 PiB on devices large enough to hold them). If every surviving superblock points at a chunk tree that references a missing device ID (common after a Btrfs RAID member has been removed or reformatted), the filesystem becomes unmountable and btrfs restore must walk raw metadata blocks to extract files without mounting.
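Whether any superblock copy still points at a sane tree can be checked without mounting anything. The loop device below is a placeholder for an image-backed member or assembled volume:

```
btrfs inspect-internal dump-super -a /dev/loop0   # read every superblock copy
btrfs rescue super-recover -v /dev/loop0          # copy a good mirror over a damaged primary (images only)
btrfs restore -v /dev/loop0 /mnt/extracted        # pull files out without ever mounting
```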

The corruption sequence on a crashed Synology NAS or QNAP NAS typically propagates upward through the layers: a drive develops reallocated sectors, mdadm kicks the drive from the array, the array enters degraded mode, a second event either takes another drive offline or interrupts a metadata write, the filesystem becomes inconsistent, and the NAS management UI reports the volume as crashed. Every layer below the one that actually failed is intact. Recovery is about reassembling the lowest healthy layer and walking back up.

Why Pulling Drives and Running Consumer Tools Makes It Worse

Pulling NAS drives and connecting them to a PC destroys the RAID context the array depends on. Windows cannot read ext4, Btrfs, or XFS filesystems and may prompt to initialize the disk, overwriting RAID metadata. Consumer recovery tools scan individual drives and cannot reconstruct files that span multiple stripes across multiple drives.

When a NAS fails, a common reaction is to pull the drives and connect them to a PC. This creates several problems:

  1. Windows cannot read the filesystem. NAS volumes use ext4, Btrfs, or XFS. Windows does not natively mount these. Windows Disk Management may prompt to "initialize" the disk, which would overwrite the partition table and RAID metadata.
  2. Linux may auto-mount or fsck. Connecting drives to a Linux system may trigger automatic filesystem checks (fsck). If fsck runs on a RAID member drive individually (outside the RAID context), it can modify the filesystem metadata in ways that corrupt the RAID volume. (A safer read-only way to inspect a member drive is sketched just after this list.)
  3. Drive order is lost. The physical slot position of each drive in the NAS determines its role in the RAID array. If drives are removed without labeling their slot positions, reinserting them in the wrong order can cause the NAS to fail to assemble the array or to assemble it incorrectly.
  4. Consumer recovery software scans individual drives. Tools like Recuva, Disk Drill, or PhotoRec scan a single drive at a time. They cannot assemble a RAID array. Running a scan on an individual RAID member drive will find fragments of files but cannot reconstruct complete files that span multiple stripes across multiple drives.
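If a member drive genuinely has to go on a Linux bench machine for inspection, force it read-only at the block layer before any desktop automounter gets a chance to touch it. The device and partition names below are placeholders; verify with lsblk before running anything.

```
blockdev --setro /dev/sdX    # mark the whole device read-only in the kernel
blockdev --getro /dev/sdX    # verify: prints 1 when read-only
mdadm --examine /dev/sdX3    # read the superblock without assembling or mounting (partition number varies by vendor)
```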

Do not initialize, format, or resync when the NAS prompts you to.

When a NAS detects a problem with its storage pool, it may offer to "repair," "resync," or "reinitialize" the volume. Reinitialization creates a new empty RAID array, overwriting the existing metadata and making recovery far more difficult. If the data matters, power off the NAS and consult a recovery service before accepting any repair prompts from the NAS management interface.

NAS Controller Board Failures

When NAS hardware fails (CPU, RAM, power supply, internal flash module), the drives are typically unaffected. Consumer NAS devices use software RAID (mdadm), so all RAID metadata is stored on the data drives, not on the controller. The drives can be read by any Linux system running mdadm, provided the correct assembly parameters are used.

Recovery from a NAS controller failure involves the following steps (a command-level sketch follows the list):

  1. Removing all drives and labeling their slot positions
  2. Imaging each drive individually using a hardware imager or ddrescue
  3. Scanning images for mdadm superblocks to determine array parameters
  4. Assembling the RAID array from images (not original drives)
  5. Mounting the filesystem read-only and copying data
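A minimal, hedged sketch of steps 2 through 5 on a Linux workstation. Every path, device name, and partition number below is an assumption: Synology and QNAP usually put the data array on a numbered partition of each drive (often the third or fifth), and SHR volumes built from mixed drive sizes add an LVM layer between md and the filesystem.

```
# Step 2: image each labeled drive onto separate, larger storage.
ddrescue -f -n /dev/sdb /recovery/slot1.img /recovery/slot1.map

# Steps 3-4: expose the images read-only, read the superblocks, assemble.
losetup --find --show --read-only --partscan /recovery/slot1.img   # -> /dev/loop0, partitions at /dev/loop0pN
losetup --find --show --read-only --partscan /recovery/slot2.img   # -> /dev/loop1
# ...repeat for the remaining images...
mdadm --examine /dev/loop0p3 /dev/loop1p3 /dev/loop2p3 /dev/loop3p3
mdadm --assemble --readonly /dev/md127 /dev/loop0p3 /dev/loop1p3 /dev/loop2p3 /dev/loop3p3

# Step 5: mount read-only and copy the data out.
# (On SHR with mixed drive sizes, run vgscan and lvchange -ay first, then mount the LV instead.)
mount -o ro /dev/md127 /mnt/recovered
```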

Alternatively, installing the drives in an identical NAS model (same manufacturer, same model, same or newer firmware) and selecting "Migrate" instead of "Install" during setup may allow the new NAS to recognize the existing array. This is not guaranteed and depends on the NAS manufacturer's migration support.

Btrfs-Specific Failure Modes

Synology DSM 7+ defaults to Btrfs for its volumes. Btrfs provides checksumming and copy-on-write at the filesystem level (similar to ZFS). However, Btrfs RAID 5/6 has a known "write hole" issue that the Btrfs developers have documented: if power is lost during a write that spans a parity stripe, the parity may become inconsistent with the data. Synology sidesteps this by keeping mdadm underneath (SHR is mdadm RAID with a Btrfs filesystem on top), but some custom and DIY NAS builds run Btrfs-native RAID and are exposed to it.

Btrfs maintains extensive internal metadata: the chunk tree, extent tree, device tree, and checksum tree. Corruption of the chunk tree (which maps logical addresses to physical device locations) can make the entire filesystem unmountable. Recovery from chunk tree corruption requires parsing raw Btrfs structures on the disk images to reconstruct the mapping.
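When the chunk tree is the casualty, btrfs ships a scanner that rebuilds the logical-to-physical mapping by walking the whole device. It is slow, and like everything else in this article it belongs on images rather than original member drives. The device name below is a placeholder.

```
btrfs rescue chunk-recover -v /dev/loop0        # full-device scan to rebuild the chunk tree
btrfs restore -v -i /dev/loop0 /mnt/extracted   # -i: keep extracting past checksum errors
```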

Frequently Asked Questions

Should I pull the drives from my NAS and connect them to a PC?

No. NAS drives use Linux filesystems (ext4, Btrfs) and software RAID. Windows cannot read them and may prompt to initialize the disk, overwriting metadata. Linux may auto-run filesystem checks that corrupt the RAID volume. If you must remove drives, label their slot positions and do not connect them to a system that might auto-mount or modify them.

Can Synology or QNAP support recover my data?

NAS manufacturers provide troubleshooting guidance but do not perform data recovery. If the array has lost more drives than its redundancy allows, or if the filesystem is corrupted beyond self-repair, the manufacturer will advise contacting a recovery service. The manufacturer's priority is restoring the device, which may involve reinitializing the array.

Why does a RAID 5 rebuild often fail partway through after a single disk is replaced?

During a rebuild, every sector on every surviving member drive must be read successfully to reconstruct the replacement drive from parity. Consumer SATA drives are specified at an unrecoverable read error rate of one sector per 10^14 bits read (about 12.5 TB), per manufacturer datasheets. A 4x 8 TB RAID 5 rebuild reads roughly 24 TB from the surviving three members, which sits near the URE threshold. A single URE on one surviving member causes mdadm to mark that drive failed and abort the rebuild. Enterprise drives at 10^15 bits per URE reduce but do not eliminate this risk.

What is the difference between ext4 journal corruption and Btrfs chunk tree corruption?

ext4 is a journaled filesystem: metadata writes are first recorded in a circular journal, then replayed onto the main filesystem. Journal damage usually clears with a replay on the next mount or an e2fsck pass, at the cost of discarding partially-committed transactions. Btrfs is copy-on-write and maintains independent chunk, root, extent, and checksum trees. The chunk tree maps logical addresses to physical device offsets. If all on-disk copies are corrupted or reference missing device IDs, the filesystem becomes unmountable and btrfs restore must parse raw metadata blocks to extract files.

If my NAS shows a degraded array, should I start a rebuild immediately or image the drives first?

Image every surviving member drive before allowing the NAS to rebuild. A rebuild is a write-heavy operation that stresses all surviving drives simultaneously; drives quietly accumulating reallocated sectors can fail under that load. If a second drive fails during rebuild on RAID 5 or SHR-1, the array collapses and recovery from drives partially overwritten by the rebuild is harder than recovery from clean pre-rebuild images. Power off the NAS, image each drive with a hardware imager or ddrescue, and assemble the RAID from images rather than originals.

If you are experiencing this issue, learn about our NAS recovery service.