My Synology SHR volume crashed. What is SHR, and what do I do first?
SHR is not proprietary hardware. It is a software stack: mdadm for the RAID geometry, LVM to join the size bands, and Btrfs or ext4 on top. It reassembles on any Linux workstation. Power the NAS down, do not click Repair, and do not move the drives to a new Synology. Recovery means imaging every member, then rebuilding the mdadm and LVM layers offline. All work happens at our Austin, TX lab. Free evaluation, no data = no charge.
Synology SHR Hybrid RAID Data Recovery
Your Synology shows a red Volume Crashed banner, or an SHR array will not assemble after a rebuild. Before you touch anything, power the unit down. Synology Hybrid RAID runs the same Linux software RAID that has been around since 2001, so it reconstructs on a Linux workstation as part of our Synology NAS data recovery work, with no Synology chassis required. We mail you nothing to install and you ship us your drives: every member is cloned through a write-blocker before any analysis begins, and all work happens at our Austin, TX lab. Free evaluation, no data = no charge.

What Is Synology SHR Actually Made Of?
SHR is a nested stack of standard Linux layers with a thin Synology management overlay on top. There is no proprietary RAID silicon inside a DiskStation. Knowing which layer failed decides whether your case is a routine read-only reassembly or a hand reconstruction, and none of these layers is a black box.
- 1. Physical disks and partitions
- Every member is partitioned the same way: a small DSM system partition mirrored across all drives (md0), a swap partition, and one or more data partitions. On a mixed-capacity array DSM cuts each drive into multiple partitions so the spare capacity of the larger drives can be used rather than wasted.
- 2. mdadm software RAID
- The data partitions are aggregated by mdadm, the same Linux software RAID tool that has shipped in the kernel for two decades. Each drive carries an mdadm 1.2 superblock written 4096 bytes into the partition, which records the array UUID, the member order, the chunk size, and the RAID level. A healthy array assembles with mdadm --assemble --readonly.
- 3. LVM logical volume
- The Storage Pool is an LVM volume group, the Linux Logical Volume Manager that lets a NAS combine mismatched drive sizes. On a mixed-capacity SHR array LVM concatenates several mdadm arrays end to end into one continuous logical volume. It is activated on a workstation with vgchange -ay.
- 4. Btrfs or ext4 filesystem
- The filesystem is formatted on top of the LVM logical volume. Btrfs is the snapshotting filesystem most modern DiskStations use; ext4 is the older default. Because Btrfs never overwrites a block in place, it writes the new version elsewhere and updates a pointer, which is exactly why an in-place repair tool can destroy the older, still-valid versions of your data.
A standard equal-capacity SHR-1 of three or more drives is a single RAID 5 set under LVM and a Btrfs filesystem. A mixed-capacity SHR is several RAID sets stacked by LVM. Either way the recovery is the same shape: read the mdadm metadata, bridge with LVM, mount the filesystem read-only. The hard cases are the ones where the LVM bridge or the Btrfs tree is damaged, covered below.
What Is the Difference Between SHR-1 and SHR-2?
The number after SHR is how many drive failures the volume survives. SHR-1 survives one failure; SHR-2 survives two. The difference is which RAID level mdadm uses to build the size bands underneath, and that changes how much margin you have when a member is missing or unreadable during recovery.
| Attribute | SHR-1 | SHR-2 |
|---|---|---|
| Fault tolerance | One drive | Two drives |
| Minimum drives | One band can mirror on two drives | Four drives |
| Underlying mdadm level | RAID 5 on three or more drives, RAID 1 on a two-drive band | RAID 6 with dual parity across the bands |
| Recovery margin | One missing or unreadable member per band before a stripe is unreconstructable | Two missing or unreadable members per band before loss |
The practical consequence shows up during a crashed-array recovery. On an SHR-1 that has already lost a member, a single unreadable sector on a surviving drive leaves a single-parity stripe with nothing left to reconstruct it, so that stripe is lost. SHR-2 can lose a whole second member and still solve every stripe, which is why the dual-parity layout is worth the extra drive on anything that matters.
Why Are Mixed-Capacity SHR Arrays Harder to Reconstruct?
A mixed-capacity SHR volume is not one RAID array. It is several stacked on top of each other and joined by LVM, and each one has to be solved separately if the metadata is gone. This is the single thing that makes manual SHR reconstruction harder than a plain RAID 5, and it is the part most recovery pages never mention.
Take an SHR-1 of two 1TB drives and two 2TB drives. DSM carves a 1TB partition on all four drives and a second 1TB partition on the two larger drives. It builds the first band as a four-drive RAID 5 across the matched 1TB partitions, and the second band as a two-drive RAID 1 mirror across the leftover space on the 2TB drives.
LVM then takes both mdadm devices, marks them as physical volumes in one volume group, and concatenates them into a single logical volume. That is how SHR uses all 6TB of raw space to give you 4TB usable with one-drive fault tolerance instead of wasting the extra capacity the way classic RAID 5 would.
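The band arithmetic in that example can be sketched in a few lines of shell. This is a simplified model for illustration, not Synology's actual allocator: it assumes whole-TB drive sizes, RAID 5 on any band with three or more members, and RAID 1 on a two-member band. The function name shr1_bands is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model of SHR-1 size bands (an illustrative assumption,
# not DSM's real partitioning logic). Arguments are drive sizes in whole TB.
shr1_bands() {
    local sorted prev=0 total=0 n=$# i=0 size height members usable level
    # Sort the drive sizes ascending, one per line.
    sorted=$(printf '%s\n' "$@" | sort -n)
    for size in $sorted; do
        height=$(( size - prev ))   # capacity this band adds per member
        members=$(( n - i ))        # drives at least this large
        if [ "$height" -gt 0 ]; then
            if [ "$members" -ge 3 ]; then
                level=raid5; usable=$(( (members - 1) * height ))
            elif [ "$members" -eq 2 ]; then
                level=raid1; usable=$height
            else
                level=unused; usable=0
            fi
            echo "band: ${members} x ${height}TB ${level} -> ${usable}TB usable"
            total=$(( total + usable ))
        fi
        prev=$size
        i=$(( i + 1 ))
    done
    echo "total usable: ${total}TB"
}

# The two 1TB + two 2TB example from the text
shr1_bands 1 1 2 2
```

For the example above the sketch reports a four-member RAID 5 band and a two-member RAID 1 band totalling 4TB usable. Each reported band is a separate mdadm array that has to be solved on its own when the superblocks are gone.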
When the superblocks survive, all of this is automatic. When they do not and the geometry has to be carved by hand, the engineer solves disk order, chunk size, and offset once per band, not once for the volume.
If the LVM metadata that records the exact byte offset where the second band attaches to the end of the first is damaged, every band can be perfectly reconstructed and the Btrfs filesystem will still read as raw, unformatted data until that LVM bridge is repaired at the hex level. That repair is precise work with no margin for error, which is why an LVM-layer failure is an in-lab job rather than something to attempt on the live NAS.
Why Did My SHR Volume Crash During a Rebuild?
The most common modern cause is an SMR drive timing out mid-rebuild. SMR drives are the ones that pretend to be standard hard drives but stall for 30 to 60 seconds when asked to do sustained writes, and the NAS reads that stall as a dead drive.
Western Digital submarined SMR into its WD Red consumer NAS line without disclosure, and SMR desktop drives from several vendors ended up in NAS arrays unknowingly, which set off a wave of these crashes.
The mechanism is the same every time. An SMR drive writes overlapping tracks, like shingles on a roof, so it cannot overwrite a block in place. It hides that penalty behind a small zone of conventional CMR space used as a fast write cache. Normal NAS traffic never fills that cache.
A rebuild does: it is a sustained, sequential, multi-terabyte write that runs flat out for hours. When the CMR cache overflows, the drive stalls to reorganize its shingles, its throughput drops to zero, and its SATA interface stops answering. The kernel waits out its command timeout, 30 seconds by default on Linux and far longer than the roughly 7-second Time-Limited Error Recovery window a NAS-rated drive allows itself, then decides the drive is dead, resets the SATA port, and drops the member from the array.
On a single-fault-tolerant SHR-1 that already lost one drive, that ejection is the second failure, and the pool collapses into the red Volume Crashed banner. The cruel part is that the ejected SMR drive is physically perfect. It only timed out on cache overflow.
The data is recoverable, but the array has to be imaged at a throttled pace with the timeout thresholds raised, then assembled offline from the clones. It cannot be fixed by clicking Repair again, which only re-triggers the same stall.
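The timeout involved can be inspected without writing anything. This is a minimal sketch, assuming a stock Linux workstation where the kernel's per-command timer is exposed at /sys/block/<disk>/device/timeout. That sysfs path is a general Linux convention, not something this page confirms for DSM. The helper names show_timeouts and stall_ejects are hypothetical.

```shell
#!/usr/bin/env bash
# Read-only: prints each disk's kernel command timeout; writes nothing.
# Assumes the Linux sysfs layout /sys/block/<disk>/device/timeout.
show_timeouts() {
    local root=${1:-/sys/block} f
    for f in "$root"/sd*/device/timeout; do
        [ -r "$f" ] || continue   # skip if no matching disk exists
        printf '%s %ss\n' "$(basename "$(dirname "$(dirname "$f")")")" "$(cat "$f")"
    done
}

# Would a stall of $1 seconds outlast a command timeout of $2 seconds?
stall_ejects() { [ "$1" -gt "$2" ]; }

show_timeouts
if stall_ejects 45 30; then
    echo "a 45s SMR stall outlasts a 30s command timeout"
fi
```

The 45-second figure is simply a value inside the 30 to 60 second stall range described above. Raising the timeout during imaging is what keeps such a stall from being read as a dead drive.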
Why Did My SHR Volume Crash After an NVMe Cache Failure?
A read-write NVMe cache that drops off the PCIe bus takes the uncommitted writes it was holding down with it. Those writes included Btrfs filesystem metadata, so the HDD array stays healthy while the filesystem on top of it refuses to mount. Models with M.2 slots like the DS920+, DS1520+, and DS1621+ let you provision an NVMe SSD as cache, and the failure mode depends entirely on which mode you chose.
The whole risk turns on read-only versus read-write. They sound like a performance tuning choice and they are actually a data-safety choice, because only one of them sits in the write path:
- Read-only cache (safe failure)
- A read-only cache holds nothing but duplicate copies of frequently read data. Every block it caches already lives on the HDD array underneath. If the NVMe drive dies, it is holding no unique data and no uncommitted writes, so the primary volume stays intact and DSM simply drops the cache. This is the failure you want to have.
- Read-write cache (dangerous failure)
- A read-write cache sits in the write path. It intercepts new writes as a dirty cache and acknowledges them before they commit to the HDDs. When the NVMe drive drops off the PCIe bus under sustained write load or a firmware panic, the writes it had acknowledged but not yet flushed to the array are gone. The HDDs never received them, and the filesystem is now missing structure it believes it already wrote.
Here is the part that decides the recovery: the corruption lands on the Btrfs filesystem layer, not on the mdadm or LVM block layers underneath it. The mdadm arrays and the LVM volume group come through the dropout intact, because the data that vanished with the cache was Btrfs filesystem-level metadata and extent updates, not block-device parity.
Btrfs is copy-on-write and stamps every tree update with a monotonically increasing transaction id, its generation tracking. When a parent node on the HDDs points at a child block and expects a transaction id that was only ever written to the now-vanished cache, the generation check fails.
The kernel reports a parent transid verify failed condition and refuses to mount the filesystem rather than serve corrupt structure. That single failed check is the difference between a healthy block stack and a Volume Crashed banner.
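The size of that generation gap can be read straight off the kernel message. This is a minimal sketch that parses one sample line in the shape this article quotes; the block address and transaction ids are invented, and transid_gap is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sample line in the shape quoted in this article; the numbers are invented.
msg='parent transid verify failed on 30408704 wanted 88123 found 88097'

# Print how many generations the on-disk block lags the expected one.
transid_gap() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "wanted") want = $(i + 1)
            if ($i == "found")  have = $(i + 1)
        }
        print want - have
    }' <<<"$1"
}

transid_gap "$msg"
```

A small gap suggests the filesystem lost only its last few transactions, which is the situation where an older generation root is most likely to still describe nearly all of the data. Treat that as a rule of thumb, not a guarantee.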
Recovery runs on two tracks at once, the mechanical HDDs and the failed NVMe drive, because the surviving data and the lost data live on different media:
- HDD track: Image every mechanical HDD member through a hardware write-blocker with the PC-3000 Portable III, the PC-3000 Express, or a DeepSpar Disk Imager, assemble each size band with mdadm --assemble --readonly, activate the LVM volume group, and mount the Btrfs filesystem read-only. Where the tree will not mount, we work read-only with btrfs-find-root and btrfs restore against historical generation roots that predate the aborted transaction.
- NVMe track, in parallel: The failed NVMe drive is imaged separately through the same write-blocked forensic imaging workflow. We then attempt to parse the DSM dirty-cache extents off that image and apply them as a delta against the base HDD volume, putting the uncommitted writes back where the filesystem expects them. This track is what can close the transaction-id gap the HDD track alone cannot.
There is a hard limit, and it is honest to state it up front. If the NVMe drive is physically unrecoverable, from NAND degradation or a dead controller board, the dirty writes are gone and nothing can recreate them. The Btrfs extent tree keeps permanent structural gaps where those extents should have committed.
btrfs-find-root and btrfs restore can still traverse older trees and pull most files intact, but they cannot synthesize data that evaporated with the cache. What was acknowledged to the cache and never flushed is the part that is lost.
Three pieces of advice circulating on forums make this worse. The first is to pull the degraded cache to force the volume to mount: yanking a degraded read-write cache guarantees the uncommitted writes are lost and crashes the volume outright, so it is the one move to avoid, not the fix.
The second is the DSM Repair button, which on a cache-dropout crash tries to rebuild the arrays while the filesystem is corrupt and writes over the very structure the recovery needs. The third is btrfs check --repair, which aggressively deletes orphaned inodes to force the tree to mount and destroys recoverable data in the process. None of the three is safe on a crashed read-write cache volume.
Read-only forensic diagnostic (run against sector clones, never live drives)
```shell
# READ-ONLY DIAGNOSTIC. For sector-by-sector clones only,
# not live or degraded drives. The block stack is intact;
# the filesystem is the layer that will not mount.

# Block layers assemble cleanly from the HDD clones
mdadm --assemble --readonly /dev/md127 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
vgchange -ay

# A read-only mount fails with a generation mismatch when the
# cache that held the newest transaction never reached the HDDs:
#   parent transid verify failed on <block> wanted <N> found <N-x>

# Traverse older generation roots read-only. No check --repair,
# no recovery mount framed as safe. These read; they never write.
btrfs-find-root /dev/vg1/volume_1
btrfs restore -t <older_root_bytenr> /dev/vg1/volume_1 /target/recover
```

There is no btrfs check --repair here and no in-place repair. These commands read older trees; they never overwrite one.
Why Is Clicking Repair on a Degraded SHR Array Dangerous?
DSM presents Repair as a safe, automatic fix. On a degraded array it is a high-stress parity rebuild, and there are two ways it turns a recoverable situation into a lost one. The first is mechanical.
Repair forces every surviving drive to read every sector to recompute the missing parity. If a surviving drive already has weak heads or bad sectors, that sustained read load can finish it off, turning a logical recovery into a clean-bench mechanical job.
The second is statistical. Consumer drives carry a worst-case rating of one unrecoverable read error per 10^14 bits read, which works out to roughly 12.5TB. That is a warranty floor, not a countdown, and most drives read well past it without a single error.
But a degraded SHR-1 rebuild of, say, four 12TB drives has to read about 36TB off the surviving members to reconstruct the missing one, and across that much data the probability of hitting one latent unreadable sector is real.
On a single-parity stripe there is no second parity to rebuild that sector, so mdadm can abort the resync and drop the array to crashed. This is the math behind why a RAID 5 or SHR-1 rebuild is never a routine button press on a large modern array.
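The arithmetic behind that risk can be sketched as follows. This is a worst-case model, not a prediction: it assumes errors are independent and occur at exactly the rated floor of one per 10^14 bits, which real drives usually beat, and it uses a Poisson approximation. The function name ure_probability is hypothetical.

```shell
#!/usr/bin/env bash
# Worst-case chance of at least one unrecoverable read error (URE).
# $1 = decimal terabytes read, $2 = spec exponent (14 means 1 error per 10^14 bits).
# Assumes independent errors at exactly the rated floor (Poisson approximation).
ure_probability() {
    awk -v tb="$1" -v e="$2" 'BEGIN {
        bits     = tb * 1e12 * 8      # terabytes -> bits
        expected = bits / (10 ^ e)    # expected error count at the rated floor
        printf "%.2f\n", 1 - exp(-expected)
    }'
}

# Three surviving 12TB members read end to end
ure_probability 36 14
```

Under those pessimistic assumptions a 36TB read comes out at about 0.94, against about 0.63 for a single 12.5TB read. The real figure on healthy drives is normally far lower, but the model shows why the exposure grows with array size.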
The safe path does not depend on which failure mode you are facing. Image every member with ddrescue, the PC-3000 Portable III, or a DeepSpar Disk Imager first, give any marginal drive a conservative retry profile so imaging does not accelerate wear, then recompute parity against the clones where a mistake costs nothing. The Repair button writes to the only copy you have.
Why Should I Not Move SHR Drives to a New Synology or Run mdadm --create?
Two pieces of common advice destroy more SHR arrays than the original failure did: migrating the drives to a replacement NAS, and forcing the array online with mdadm --create. Both write to the metadata you need for reconstruction.
When you move SHR drives into a new Synology and it offers to Migrate, DSM rewrites the md0 system partition that spans every drive so it can install its own operating system. If a drive went in the wrong slot, or the original crash involved partition-table corruption, that overwrite can bleed past the system partition into the data bands, taking out the LVM headers and the Btrfs trees.
When the forced import then fails, DSM commonly offers a fresh install, which finishes the job. The metadata that recovery depends on is small and irreplaceable, and these workflows are precisely what overwrites it.
The mdadm --create command is just as final. Forum threads recommend it to force a crashed array back online, but create writes a brand new superblock over the original and makes you supply the exact drive order, chunk size, and layout from memory. One wrong parameter codifies the wrong geometry, and when the filesystem tries to mount it reads scrambled, out-of-order blocks that shred the Btrfs trees and the ext4 journal.
The original superblock is gone. The only safe assembly is mdadm --assemble --readonly, run against clones. The commands in the next section are read-only forensic diagnostics for sector clones, not a repair sequence for your live, degraded drives.
How Do You Read an SHR Superblock Without Touching the Data?
The first safe forensic read is mdadm --examine /dev/sdXN, which reads the mdadm 1.2 superblock at the 4096-byte offset read-only and reports the array UUID, the member order, the chunk size, the RAID level, the array state, and the event count without writing a single byte.
It is the step that comes before any assembly attempt, because nothing can be assembled correctly until the geometry the superblock records has been read back.
The 1.2 superblock on each SHR data partition records six fields the reconstruction depends on: the array UUID, the member's slot number (printed by mdadm --examine as Device Role), the chunk size, the RAID level, the array state flag, and the event count, alongside the update time.
We run mdadm --examine against the cloned members, never the originals. On a powered-down array the command is non-destructive and read-only against the original drives as well, but our workflow always images first so every later step happens on a copy where a mistake costs nothing.
Three mdadm verbs sound similar and behave nothing alike, and the difference is the whole recovery:
- mdadm --examine /dev/sdXN
- Read-only superblock interrogation. It reports the array UUID, member order, chunk size, RAID level, array state, and event count, and it writes nothing to the device. This is the first command we run on each cloned member.
- mdadm --assemble --readonly
- Activates the array in read-only mode once the geometry is confirmed, covered in the recovery process below. It reuses the existing superblocks rather than writing new ones.
- mdadm --create
- Writes a brand new superblock over the original and is destructive. It is the command the previous section warns against, and it has no place in a recovery workflow because the original metadata it overwrites is the metadata reconstruction needs.
The practical payoff is concrete. Running mdadm --examine on every member lets you confirm the array UUID matches across all of them, read the event count to see which member fell out of sync first (the lowest event count is the member that dropped first), and recover the member order and chunk size needed to assemble the array correctly, all before a single write.
On a mixed-capacity SHR this is solved once per size band, since each band carries its own mdadm superblock with its own member order and chunk size.
Read-only forensic diagnostic (run against sector clones, never live drives)
```shell
# READ-ONLY DIAGNOSTIC. For sector-by-sector clones only,
# not live or degraded drives. --examine writes nothing.

# Read the mdadm 1.2 superblock on one cloned data partition
mdadm --examine /dev/sda3

# Reports: Array UUID, Device Role (member slot), Chunk Size,
# Raid Level, Array State, and Events (event count).
# Compare the UUID and Events across every member before any assembly.
```

There is no --create here and no in-place repair. This command reads the superblock; it never writes one.
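The cross-member comparison can be sketched against saved report text instead of devices. The sample below assumes the "Key : value" layout that mdadm --examine prints for a 1.2 superblock. The UUID, member names, and event counts are invented sample values, and write_sample and field are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical saved output of `mdadm --examine` for three cloned members.
# The UUID, member names, and event counts are invented sample values.
workdir=$(mktemp -d)
write_sample() {  # $1 = file name, $2 = event count, $3 = slot number
    cat > "$workdir/$1" <<EOF
     Array UUID : 11111111:22222222:33333333:44444444
     Raid Level : raid5
     Chunk Size : 64K
         Events : $2
   Device Role : Active device $3
EOF
}
write_sample sda3.txt 4182 0
write_sample sdb3.txt 4182 1
write_sample sdc3.txt 4097 2

# Pull one "Key : value" field out of a saved --examine report.
field() {  # $1 = file, $2 = field name
    awk -F' : ' -v key="$2" \
        '{ k = $1; gsub(/^ +| +$/, "", k) } k == key { print $2; exit }' "$1"
}

# Count distinct Array UUIDs; anything other than 1 means mixed members.
uuid_count=$(for f in "$workdir"/*.txt; do field "$f" "Array UUID"; done | sort -u | wc -l)

# The member with the lowest event count is the one that dropped first.
stale_member=$(for f in "$workdir"/*.txt; do
    printf '%s %s\n' "$(field "$f" Events)" "$(basename "$f" .txt)"
done | sort -n | head -n 1 | cut -d' ' -f2)

echo "distinct array UUIDs: $((uuid_count))"
echo "lowest event count: $stale_member"
```

Working from saved text keeps the comparison entirely off the drives. On a mixed-capacity SHR the same check would be repeated once per size band.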
What Is the Difference Between Storage Pool Degraded and Volume Crashed?
Storage Pool Degraded and Volume Crashed are two real status strings that Synology DSM Storage Manager shows you, and they describe two different points in the same cascade. Degraded means one member has dropped; Crashed means a second fault has landed on top of it. They are observed states, not a menu you navigate.
- Storage Pool Degraded
- One member has dropped. The array is still online and your data is still readable, but on SHR-1 the redundancy is now exhausted: there is no parity left to survive a second fault. This is the moment to act. Power down, label every drive by bay number, and image every surviving member, one drive at a time, through a write-blocker before anything else. Do not click Repair: a rebuild forces every surviving drive to read every sector and can trigger the second fault you are trying to avoid.
- Volume Crashed
- A second fault has occurred. The array is offline and the pool is inaccessible. This is the state most people are in when they first call us. The data is usually still recoverable, but only by imaging every member and reconstructing the mdadm and LVM stack offline from the clones. Clicking Repair here re-triggers the same fault that crashed it: an SMR timeout, an NVMe dirty-cache dropout on a model like the DS920+, DS1520+, or DS1621+, or a URE on a single-parity stripe.
The single-member drop that shows as Degraded is the first fault. The second fault, an SMR ejection, an NVMe dirty-cache dropout, or a URE on a single-parity SHR-1 stripe, is what flips the status to Volume Crashed. Reading the event count with mdadm --examine on the clones is how the lab works backward through that cascade to establish which member left first.
How Do You Recover a Synology SHR Volume?
We image every member through a hardware write-blocker, reassemble the mdadm and LVM stack from the clones, and extract Btrfs or ext4 offline. Your original drives are never modified. The steps below describe the work; they are not a do-it-yourself procedure, because a single wrong write to a superblock or an LVM header ends the recovery.
- Free evaluation: We document the model, the DSM error state, the SHR-1 or SHR-2 layout, the drive capacities and whether they are mixed, the filesystem, and which members were dropped and when. Suspected SMR members are flagged here so imaging is throttled from the start.
- Write-blocked imaging: Each member is cloned with the PC-3000 Portable III, the PC-3000 Express, or a DeepSpar Disk Imager. Marginal drives get conservative retry profiles and head maps; suspected-SMR drives get raised timeout thresholds so a cache-flush stall is not misread as a dead drive.
- Geometry and band detection: We read the mdadm 1.2 superblock at offset 4096 on each cloned data partition to recover the member order, chunk size, and RAID level for every size band. On a mixed-capacity array this is solved once per band rather than once for the volume.
- Read-only assembly: Each band is assembled from the clones with mdadm --assemble --readonly, never with create. For arrays whose superblocks are too damaged to assemble, the bands are reconstructed virtually from the imaged members using Data Extractor Express RAID Edition on the PC-3000 Express.
- LVM activation or repair: The volume group is located with lvscan and activated read-only with vgchange -ay. If the LVM metadata bridging one band to the next is damaged, the physical-volume headers are repaired at the hex level so the bands concatenate in the right order before the filesystem is read.
- Filesystem extraction: The filesystem is mounted read-only. If the Btrfs tree is damaged we work read-only with btrfs-find-root and btrfs restore against historical generation roots. We never run btrfs check --repair or force a recovery mount, because copy-on-write means an in-place write destroys the older roots the extraction depends on.
- Verification and delivery: Recovered data is copied to a target drive, verified against your priority file list, and shipped back. Working copies are securely purged on request.
Read-only forensic diagnostics (run against sector clones, never live drives)
```shell
# READ-ONLY DIAGNOSTIC. For sector-by-sector clones only,
# not live or degraded drives. Wrong assembly = data loss.

# Assemble one SHR size band read-only from cloned partitions
mdadm --assemble --readonly /dev/md127 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3

# Locate and activate the LVM volume group (read-only intent)
lvscan
vgchange -ay

# Mount the filesystem read-only once the LV is active
mount -o ro /dev/vg1/volume_1 /mnt/recover
```

These are diagnostics, not a repair guide. There is no --create step and no in-place filesystem repair, because both overwrite the metadata recovery depends on.
The same RAID and filesystem logic applies to every Synology layout. SHR adds the size-band concatenation; the rest mirrors any RAID data recovery case we run, and SHR is one slice of the wider NAS data recovery work we handle across every vendor.
What Does SHR Data Recovery Cost?
SHR recovery uses the same two line items as any Synology array: a per-member price based on each drive's physical and firmware condition, plus one array reconstruction fee for the mdadm, LVM, and filesystem work. There is no separate charge for mixed-capacity complexity. If we recover nothing, you owe nothing under our no-fix-no-fee guarantee.
Per-Member Drive Pricing
Each member drive is priced against the same five-tier schedule used for individual hard drive data recovery. A four-bay SHR unit with one head-swap member and three logical-only members generates an individual line item for each evaluated drive, not a single opaque bundle.
| Service | Complexity | What you see | What is wrong | Price | Turnaround | Notes |
|---|---|---|---|---|---|---|
| Simple Copy | Low | Your drive works, you just need the data moved off it | Functional drive; data transfer to new media | $100 | 3-5 business days | Rush available: +$100 |
| File System Recovery | Low | Your drive isn't recognized by your computer, but it's not making unusual sounds | File system corruption. Accessible with professional recovery software but not by the OS | From $250 | 2-4 weeks | Starting price; final depends on complexity |
| Firmware Repair | Medium | Your drive is completely inaccessible. It may be detected but shows the wrong size or won't respond | Firmware corruption: ROM, modules, or translator tables corrupted; requires PC-3000 terminal access | $600–$900 | 3-6 weeks | CMR drive: $600. SMR drive: $900. |
| Head Swap (most common) | High | Your drive is clicking, beeping, or won't spin. The internal read/write heads have failed | Head stack assembly failure. Transplanting heads from a matching donor drive on a clean bench | $1,200–$1,500 | 4-8 weeks | 50% deposit required. CMR: $1,200-$1,500 + donor. SMR: $1,500 + donor. |
| Surface / Platter Damage | High | Your drive was dropped, has visible damage, or a head crash scraped the platters | Platter scoring or contamination. Requires platter cleaning and head swap | $2,000 | 4-8 weeks | 50% deposit required. Donor parts are consumed in the repair. Most difficult recovery type. |
Hardware Repair vs. Software Locks
Our "no data, no fee" policy applies to hardware recovery. We do not bill for unsuccessful physical repairs. If we replace a hard drive read/write head assembly or repair a liquid-damaged logic board to a bootable state, the hardware repair is complete and standard rates apply. If data remains inaccessible due to user-configured software locks, a forgotten passcode, or a remote wipe command, the physical repair is still billable. We cannot bypass user encryption or activation locks.
No data, no fee. Free evaluation and firm quote before any paid work. Full guarantee details. Head swap and surface damage require a 50% deposit because donor parts are consumed in the attempt.
- Rush fee
- +$100 rush fee to move to the front of the queue
- Donor drives
- Donor drives are matching drives used for parts. Typical donor cost: $50–$150 for common drives, $200–$400 for rare or high-capacity models. We source the cheapest compatible donor available.
- Target drive
- The destination drive we copy recovered data onto. You can supply your own or we provide one at cost plus a small markup. For larger capacities (8TB, 10TB, 16TB and above), target drives cost $400+ extra. All prices are plus applicable tax.
The prices above are for standard hard drives, which covers most jobs. Helium-sealed drives (for example WD or HGST Ultrastar He and Seagate Exos X) must be resealed and refilled with helium in-house after the chamber is opened, so they price higher, in the $200–$5,000+ range. See helium drive pricing.
Array Reconstruction Fee
The array reconstruction fee is $400-$800. It covers mdadm parameter detection per size band, LVM reconstruction and any hex-level bridge repair, virtual assembly from cloned images, and Btrfs or ext4 extraction. The final figure within that range depends on member count, SHR-1 versus SHR-2, and how many distinct size bands the mixed-capacity layout contains. It is confirmed at the free evaluation alongside the per-member line items.
No Data = No Charge. If we cannot recover usable data from your SHR volume, you owe nothing under our no-fix-no-fee guarantee. There are no diagnostic fees. A rush fee of $100 moves a case to the front of the imaging queue. Optional return shipping is the only other potential cost on an unsuccessful case.
How Do I Reduce the Risk of an SHR Volume Crash?
Use CMR drives from your NAS vendor's compatibility list, not whatever SMR drive was cheapest, because the SMR timeout cascade is the failure mode that turns a healthy drive into a crashed array during a rebuild. If you run a large SHR-1, understand that a degraded rebuild on high-capacity consumer drives carries real URE risk, and consider SHR-2 on anything you cannot afford to lose so a second unreadable member does not end the array.
SHR also does not change the oldest rule in storage: a redundant array gives you hardware availability, not a backup. Ransomware, an accidental deletion, a controller fault, or a cascading failure across drives from the same manufacturing batch destroys every member at once. Keep discrete, offline backups, and verify them with a test restore to a different machine before you assume they protect you.
How Do You Recover an iSCSI LUN From a Crashed SHR Array?
An Advanced iSCSI LUN on Synology is not a physical partition you can carve straight off the member disks. It is a large sparse file named EP_DAT_00000 that lives inside a hidden @iSCSI directory on the SHR volume's Btrfs or ext4 filesystem.
You cannot reach that file until the full SHR stack this page already documents, mdadm size bands bridged by LVM with Btrfs or ext4 on top, has been reconstructed read-only first. Recovering the LUN is the same SHR reassembly described earlier, plus an extra extract-and-loop-mount stage on the end.
The extraction order ties directly back to the multi-band SHR reconstruction covered in the recovery-process section above:
- Reconstruct the SHR stack read-only: Image every member, reconstruct each SHR mdadm size band read-only, and bridge the bands with LVM, exactly as for a non-iSCSI SHR crash. The LUN file cannot be touched until the host volume underneath it is assembled.
- Mount the host filesystem read-only: Mount the Btrfs or ext4 host filesystem read-only and navigate to the hidden @iSCSI directory that holds the LUN container.
- Extract the container file: Copy the EP_DAT_00000 container file off the host filesystem to a healthy target drive.
- Loop-mount the extracted file: Attach the extracted file as a block device with losetup -P so the kernel scans the partition table the iSCSI initiator wrote inside the container, exposing the inner NTFS, VMFS, or ext4 filesystem so the data can be read.
Whether the LUN is thin-provisioned or thick-provisioned changes how the container has to be handled during extraction:
- Thin-provisioned LUN
- The EP_DAT file is sparse: only the allocated extents hold real data, and the rest is unwritten holes. A naive flat copy that fills those holes can inflate the file to its full theoretical size and exhaust the target drive. The extract step has to preserve sparseness so a 2TB thin LUN holding 300GB of real data does not balloon into a 2TB flat file.
- Thick-provisioned LUN
- The container is fully allocated up front, so the file already occupies its declared size on the host filesystem. There are no holes to preserve, and the extracted file matches the size the initiator saw.
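The thin-LUN behaviour is easy to see on a throwaway file. This is a minimal sketch, assuming GNU coreutils (truncate, cp --sparse, stat -c) and a filesystem that supports sparse files. The file names and the 64 MiB size are illustrative and have nothing to do with a real EP_DAT container.

```shell
#!/usr/bin/env bash
# Demonstration on a throwaway file; names and sizes are illustrative.
# Assumes GNU coreutils and a filesystem that supports sparse files.
demo=$(mktemp -d)

# 64 MiB apparent size with no data blocks behind it, like an empty thin LUN
truncate -s 64M "$demo/thin.img"

# Write a few real bytes at the start without truncating the file
printf 'real data' | dd of="$demo/thin.img" conv=notrunc status=none

# Copy while keeping the holes as holes
cp --sparse=always "$demo/thin.img" "$demo/copy.img"

apparent=$(stat -c %s "$demo/copy.img")    # logical size in bytes
allocated=$(( $(stat -c %b "$demo/copy.img") * $(stat -c %B "$demo/copy.img") ))
echo "apparent=$apparent allocated=$allocated"
```

The copy keeps its 64 MiB logical size while occupying only the blocks that hold real data. That is the property the extract step has to preserve on a thin LUN.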
When the Btrfs extents holding the EP_DAT file are themselves damaged on a degraded SHR, the forensic path is the same read-only one this page uses for any damaged Btrfs tree. We work read-only with btrfs-find-root and btrfs restore against historical generation roots, and never run btrfs check --repair, because copy-on-write means an in-place write overwrites the older, still-valid tree roots that the extraction depends on.
If the container's own extents are marginal, we ddrescue-image the extracted EP_DAT file to a healthy target before loop-mounting it, because loop-mounting a damaged container off a failing host filesystem risks I/O hangs in the loop driver.
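A two-pass ddrescue run for a marginal container might look like the sketch below. It assumes GNU ddrescue is installed; the function names and the paths in the usage note are hypothetical. The -n flag skips scraping on the first pass, -r3 allows three retry passes on the second, and -S writes the output sparsely so a thin LUN is not inflated. Setting DRY_RUN=1 prints the commands instead of running them, which is a convenient way to review the plan before touching a clone.

```shell
#!/bin/sh
# run_or_print is a hypothetical wrapper: DRY_RUN=1 prints, otherwise runs.
run_or_print() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# rescue_container is a hypothetical helper: source, destination, mapfile.
rescue_container() {
    src=$1
    dst=$2
    map=$3
    # Pass 1: copy everything that reads cleanly and skip scraping (-n).
    run_or_print ddrescue -n -S "$src" "$dst" "$map" || return 1
    # Pass 2: revisit only the bad areas in the mapfile, up to 3 retries.
    run_or_print ddrescue -r3 -S "$src" "$dst" "$map"
}
```

For example, rescue_container /mnt/recover/@iSCSI/EP_DAT_00000 /target/lun.img /target/lun.map reads from the read-only host mount and writes only to the target drive. The mapfile lets an interrupted run resume without rereading areas that already copied.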
Two forum claims send people down dead ends. The first is that a LUN is a physical partition you can carve directly off the member disks: it is not, because it only exists as a file inside a host filesystem that has to be reconstructed first.
The second is that the LUN can be recovered without reconstructing the SHR mdadm, LVM, and Btrfs or ext4 stack underneath it. There is no shortcut past the host volume; the container has no meaning until the filesystem that stores it is mounted.
Read-only forensic diagnostic (run against sector clones, never live drives)
# READ-ONLY DIAGNOSTIC. For sector-by-sector clones only,
# not live or degraded drives. The host SHR stack is already
# assembled read-only before any of this runs.
# Mount the reconstructed SHR host filesystem read-only
mount -o ro /dev/vg1/volume_1 /mnt/recover
# Locate the LUN container in the hidden @iSCSI directory
ls /mnt/recover/@iSCSI/
# Extract the EP_DAT container. --sparse=always (GNU cp) keeps the holes
# in a thin LUN; ddrescue it first if the container extents are marginal
cp --sparse=always /mnt/recover/@iSCSI/EP_DAT_00000 /target/lun.img
# Attach the extracted container read-only (-r) and print the loop device
# that was allocated; use that name below if it is not /dev/loop0
losetup -rfP --show /target/lun.img
# Mount the inner filesystem read-only
mount -o ro /dev/loop0p1 /mnt/lun
These are diagnostics, not a repair guide. No btrfs check --repair, no in-place writes.
What Happens When an Encrypted SHR Volume Crashes?
An encrypted SHR crash is a two-track problem, and the honest answer is that one track is geometry and the other is a key you either have or you do not. Track one rebuilds the same mdadm and LVM block stack this page already documents. Track two unlocks the LUKS container that DSM 7.2 layers on top of it.
If the wrapped key on the md0 system partition is gone and you never exported the recovery key, a perfectly reconstructed block stack still decrypts to nothing, because a 512-bit AES-XTS master key cannot be brute-forced.
DSM 7.2 full volume encryption is LUKS running in aes-xts-plain64 mode, the Linux device-mapper dm-crypt target. On an encrypted SHR volume that target sits between the LVM logical volume and the filesystem, which adds a fourth layer to the mdadm, LVM, and Btrfs or ext4 stack documented above. Counting from the disks up, the order is now mdadm size bands, the LVM logical volume, the LUKS container, then the filesystem that lives inside the container.
Three of the four layers are pure geometry: they reconstruct from on-disk metadata with no secret involved. The LUKS layer is the exception. It is the only layer whose recovery depends on a key rather than on geometry, and that single difference is what sets an encrypted SHR case apart from an unencrypted one.
- 1. mdadm software RAID (geometry)
- The data partitions aggregate into one or more mdadm size bands, exactly as on an unencrypted SHR. Each band carries its 1.2 superblock and assembles read-only with mdadm --assemble --readonly. No key is involved at this layer.
- 2. LVM logical volume (geometry)
- LVM concatenates the bands into one logical volume and activates with vgchange -ay. On an encrypted volume the logical volume holds ciphertext rather than a mountable filesystem, but the layer itself is still recovered from LVM metadata alone, with no key.
- 3. LUKS aes-xts-plain64 container (key-dependent, the new layer)
- This is the device-mapper dm-crypt target that DSM 7.2 places on top of the logical volume. It is the only layer that no amount of geometry work can open. Unlocking it needs the recovery key passphrase, supplied to cryptsetup luksOpen, which creates the plaintext device mapping the filesystem lives inside. Without the key, the layer below it is intact and the layer above it is unreachable.
- 4. Btrfs or ext4 filesystem
- The filesystem sits inside the unlocked LUKS container. Once the container opens, the plaintext mapping behaves like any SHR volume and the Btrfs or ext4 filesystem extracts read-only with the same tooling the rest of this page describes, including btrfs-find-root and btrfs restore against historical generation roots where the tree is damaged.
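Before the decryption pass, it helps to confirm that the activated logical volume really does start with a LUKS header rather than a bare filesystem. The sketch below is one dependency-free way to check that on a clone or image; has_luks_magic is a hypothetical helper, and cryptsetup isLuks is the standard tool for the same question. It only reads the first six bytes and compares them with the LUKS magic, the ASCII string LUKS followed by the bytes 0xBA 0xBE.

```shell
#!/bin/sh
# Sketch: report whether a device or image starts with the LUKS header magic.
# has_luks_magic is a hypothetical helper; it reads 6 bytes and writes nothing.
has_luks_magic() {
    # Dump the first 6 bytes as hex and strip the spacing od adds.
    magic=$(od -An -tx1 -N6 "$1" | tr -d ' \n')
    # "LUKS" followed by 0xBA 0xBE marks a LUKS primary header.
    [ "$magic" = "4c554b53babe" ]
}
```

A call such as has_luks_magic /dev/vg1/volume_1 && echo "LUKS container present" tells you the case is on the key-dependent track. A non-match on a volume that should be encrypted suggests the LVM layer was bridged in the wrong order or the header itself is damaged.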
The key material that auto-mounts the volume is held by the DSM Encryption Key Vault, local or remote over KMIP. The wrapped auto-mount key is bound to md0, the DSM system partition that mirrors across every member drive, which is why the Key Vault survives a single-drive failure but not the loss of the system RAID itself.
Recovery proceeds on the two tracks in order: first the block-stack reconstruction described throughout this page, then, once the LVM logical volume is active, the decryption pass.
The decryption pass runs cryptsetup luksOpen --readonly against the activated logical volume using the recovery key passphrase. That command maps a plaintext device without ever writing back to the encrypted container, and the Btrfs or ext4 filesystem is then mounted read-only off that mapping. If the Key Vault on md0 is intact, or you exported the recovery key when you enabled encryption, this step is routine and the case becomes a standard SHR extraction from that point on.
Here is the hard limit, stated plainly. AES-XTS with a 512-bit master key cannot be brute-forced. If md0 and the Key Vault are unrecoverable and no recovery key was ever exported, decryption is mathematically impossible even with a flawless mdadm, LVM, and Btrfs block stack underneath. No forensic tool and no lab technique circumvents that.
Any lab claiming a proprietary decryptor that opens a LUKS volume without the key is describing something that does not exist, and the same is true of the common forum advice to move the encrypted drives into a donor Synology: the wrapped key is bound to the original unit's Key Vault, so a replacement chassis does not unlock the volume without the recovery passphrase.
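The scale behind that hard limit is easy to work out. The sketch below is illustrative arithmetic only: years_to_exhaust is a hypothetical helper, and the guess rate in the usage note is an assumed, deliberately generous figure rather than a measurement of any real hardware. It divides a key space of 2^bits by a guess rate and the number of seconds in a year.

```shell
#!/bin/sh
# Sketch: years needed to try every key in a 2^bits key space.
# years_to_exhaust is a hypothetical helper; rate is guesses per second.
years_to_exhaust() {
    bits=$1
    rate=$2
    awk -v bits="$bits" -v rate="$rate" 'BEGIN {
        seconds_per_year = 31557600   # 365.25 days
        printf "%.2e\n", (2 ^ bits) / (rate * seconds_per_year)
    }'
}
```

Even if the search space is treated as only 2^256, half the bits of the 512-bit AES-XTS key, years_to_exhaust 256 1e18 gives a figure on the order of 10^51 years at an assumed 10^18 guesses per second. Faster hardware does not close that gap, which is why the recovery key is the only way through the LUKS layer.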
Encrypted-volume cases outside SHR, including legacy file-level shared-folder encryption, are covered in more depth on our Synology encrypted volume recovery page.
Read-only forensic diagnostic (run against sector clones, never live drives)
# READ-ONLY DIAGNOSTIC. For sector-by-sector clones only,
# not live or degraded drives. The block stack reconstructs
# from geometry; the LUKS layer needs the recovery key.
# Track 1: assemble the mdadm band read-only, then activate LVM on top
# of it; the read-only md device underneath rejects any write
mdadm --assemble --readonly /dev/md127 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
vgchange -ay
# Track 2: unlock the LUKS container on the logical volume.
# Requires the recovery key passphrase. There is no way past
# this step without it; a 512-bit AES-XTS key is not brute-forceable.
cryptsetup luksOpen --readonly /dev/vg1/volume_1 shr_crypt
# Mount the filesystem inside the unlocked container read-only
mount -o ro /dev/mapper/shr_crypt /mnt/recover
There is no btrfs check --repair here and no in-place repair. The container is opened read-only and the filesystem is read, never rewritten, and no command on this list can recover the data if the recovery key is gone.
What Are the Most Common SHR Recovery Questions?
Is Synology SHR proprietary hardware?
Can I use mdadm --create to recover a crashed Synology volume?
Can SHR be recovered without a Synology NAS?
Why did my SHR volume crash during a rebuild?
What is the difference between SHR-1 and SHR-2?
Should I click Repair in Synology Storage Manager?
My mixed-capacity SHR array will not assemble. Is it harder to recover?
What does Synology SHR recovery cost?
Can data be recovered from a crashed Synology encrypted volume?
Data Recovery Standards & Verification
Our Austin lab operates on a transparency-first model. We use industry-standard recovery tools, including PC-3000 and DeepSpar, combined with strict environmental controls to maintain drive integrity. This approach allows us to serve clients nationwide with consistent technical standards.
Open-drive work is performed in a ULPA-filtered laminar-flow bench, validated to 0.02 µm particle count, verified using TSI P-Trak instrumentation.
Transparent History
Serving clients nationwide via mail-in service since 2008. Our lead engineer holds PC-3000 and HEX Akademia certifications for hard drive firmware repair and mechanical recovery.
Media Coverage
Our repair work has been covered by The Wall Street Journal and Business Insider, with CBC News reporting on our pricing transparency. Louis Rossmann has testified in Right to Repair hearings in multiple states and founded the Repair Preservation Group.
Aligned Incentives
Our "No Data, No Charge" policy means we assume the risk of the recovery attempt, not the client.
Technical Oversight
Louis Rossmann
Our engineers review all lab protocols to maintain technical accuracy and honest service. Since 2008, his focus has been on clear technical communication and accurate diagnostics rather than sales-driven explanations.
We believe in proving standards rather than just stating them. We use TSI P-Trak instrumentation to verify that clean-air benchmarks are met before any drive is opened.
See our clean bench validation data and particle test video
Related Recovery Services
Volume Crashed, Storage Pool Degraded, SHR reconstruction, and Btrfs/EXT4 recovery for all DiskStation and RackStation models.
DSM 7.2 LUKS aes-xts-plain64 volumes and legacy eCryptfs shared folders, with the honest limits when the key is gone.
Recovery for all NAS brands including QNAP, Buffalo, Western Digital, and Asustor.
Hardware and software RAID array reconstruction for RAID 0, 1, 5, 6, and 10.
Synology showing a red Volume Crashed banner?
Free evaluation. No data = no charge. Ship your drives from anywhere in the U.S.