
How to Safely Troubleshoot a Degraded RAID Array

Your RAID controller is reporting a degraded array. One drive has failed, and the array is still serving data, but it has lost its fault tolerance. The next action you take determines whether this becomes a routine drive replacement or a catastrophic data loss event.

This guide covers what degraded means across RAID levels, how to check controller logs without triggering a rebuild, and why the default auto-rebuild behavior is dangerous on modern large-capacity drives.

Written by Louis Rossmann
Founder & Chief Technician
Updated February 2026

What a Degraded RAID Array Actually Means

A degraded array is still operational but has lost its redundancy. It is running on borrowed time. The array can serve reads and writes, but a second drive failure will exceed the fault tolerance of most RAID levels.

  1. The RAID controller detects that a member drive has stopped responding, is returning errors, or has been physically removed.
  2. The controller marks the drive as failed and continues operating using the remaining drives and parity data (RAID 5/6) or the surviving mirror (RAID 1/10).
  3. Read performance drops because the controller must compute the missing data for every stripe that included the failed drive.
  4. Write performance may also decrease because parity updates now require reading additional blocks for the XOR calculation.
  5. The array remains in this state until a replacement drive is inserted and the rebuild completes, or until a second drive fails.
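The parity math in steps 2 and 3 can be sketched in a few lines. This is an illustrative toy, not controller code: a real array works on fixed-size stripes with rotating parity, while here each "drive" holds a single block of bytes.

```python
# Minimal sketch of RAID 5 XOR parity reconstruction (illustrative only).

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Data blocks on drives 0-2; the parity block is the XOR of all data blocks.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Drive 1 fails. Its block is recomputed from the survivors plus parity,
# which is exactly what a degraded array must do on every affected read.
reconstructed = xor_blocks([data[0], data[2], parity])
assert reconstructed == data[1]
```

This is also why degraded reads are slower: every read of a stripe touching the failed drive costs one read per surviving member plus the XOR, instead of a single read.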

Example: A Dell PowerEdge R640 with a PERC H740 controller running 8-drive RAID 5. Drive 3 (bay 3) starts reporting SMART predictive failure. The controller marks it as failed and transitions the virtual disk to "Degraded." The server continues serving the file share. Users notice slower access times on large files because every read from a stripe that included drive 3 now requires the controller to XOR the remaining 7 drives to compute the missing data.


How Does Degraded State Differ Across RAID Levels?

The consequences of a degraded state depend on the RAID level. RAID 5 has zero remaining margin. RAID 6 can survive one more failure. RAID 10 depends on which mirror pair lost a drive.

RAID 1 (Mirror): One mirror drive failed. Data is intact on the surviving drive but completely unprotected. A failure of the remaining drive means total data loss. Recovery from degraded RAID 1 is the simplest case: the surviving drive contains a complete copy of all data.

RAID 5 (Single Parity): One drive failed. Parity reconstructs the missing data on the fly. A second drive failure of any kind (complete failure or a single URE) during a rebuild is fatal. No remaining margin. This is the highest-risk degraded state for arrays with large drives.

RAID 6 (Dual Parity): One or two drives failed. Dual parity provides one more drive of margin compared to RAID 5. A single-degraded RAID 6 can survive one more failure; a double-degraded RAID 6 is in the same position as a degraded RAID 5: zero remaining margin. Rebuild times on large arrays (10TB+ drives) can exceed 48 hours.

RAID 10 (Mirrored Stripes): One drive in a mirror pair failed. The surviving mirror serves data. The array can survive additional failures as long as they occur in different mirror pairs. If the other drive in the same mirror pair fails, that stripe is lost. RAID 10 rebuilds are faster because only the failed drive's mirror partner needs to be copied, not the entire array.

Example: A 6-drive RAID 10 (3 mirror pairs). Drive 2 (pair B, member 1) fails. The array is degraded but can survive failures in pair A or pair C without data loss. If drive 3 (pair B, member 2) fails, pair B has no surviving copy and the entire array loses access to that stripe. RAID 10 degraded risk is localized to the affected mirror pair.
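The pair-local survivability rule in this example can be sketched as a predicate. Drive and pair names below are hypothetical; the logic is simply "every mirror pair must keep at least one living member."

```python
# Sketch: which additional drive failures a degraded RAID 10 can survive.
# A stripe is lost only when BOTH members of the same mirror pair fail.

def raid10_survives(pairs, failed):
    """Return True if every mirror pair still has at least one live member."""
    return all(any(d not in failed for d in pair) for pair in pairs)

pairs = [("d0", "d1"), ("d2", "d3"), ("d4", "d5")]  # 6-drive RAID 10, 3 pairs

print(raid10_survives(pairs, {"d2"}))          # True  - degraded, still serving
print(raid10_survives(pairs, {"d2", "d4"}))    # True  - two failures, different pairs
print(raid10_survives(pairs, {"d2", "d3"}))    # False - pair B lost, array lost
```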


Checking Controller Logs Without Triggering a Rebuild

Before inserting a replacement drive, check the RAID controller logs and SMART data on all surviving drives. The goal is to identify whether any other drives are showing early failure signs before committing to a rebuild.

Hardware RAID

  • Dell: OMSA (OpenManage Server Administrator) or racadm/iDRAC web interface
  • HP/HPE: iLO web interface or Smart Storage Administrator (SSA)
  • LSI/Broadcom: MegaCLI or StorCLI command-line utilities
  • Adaptec: arcconf command-line utility or maxView web interface

Linux mdadm Software RAID

  • cat /proc/mdstat shows array state and rebuild progress
  • mdadm --detail /dev/mdX shows detailed array status including member drives
  • smartctl -a /dev/sdX shows SMART attributes per drive
  • Check for Reallocated_Sector_Ct, Current_Pending_Sector, and Offline_Uncorrectable counters above zero

Example: An HP ProLiant DL380 Gen10 with SmartArray P816i-a running 8-drive RAID 5. Drive 3 shows "Failed." Before inserting a replacement, the admin opens SSA and checks SMART data for all remaining drives. Drive 7 shows 14 Reallocated Sectors and 3 Current Pending Sectors. This drive is likely to fail during the rebuild. The admin now knows that a standard rebuild carries high risk and can make an informed decision about whether to image the drives first.


Why Auto-Rebuild Is Dangerous on Large Drives

Most RAID controllers are configured to begin rebuilding automatically when a hot spare is present or a new drive is inserted. On arrays with large drives (4TB and above), this default behavior is the most common cause of rebuild failures.

  1. Auto-rebuild starts immediately, giving the administrator no opportunity to check SMART data on surviving drives.
  2. For parity arrays (RAID 5/6), the rebuild reads every sector of every surviving drive under sustained sequential I/O. RAID 1/10 rebuilds read only the mirror partner. On a 4-drive parity array of aging consumer 8TB drives, the rebuild reads 24TB from the three surviving members, placing sustained mechanical stress on them and increasing the risk of a secondary failure or latent sector error.
  3. Drives from the same manufacturing batch tend to fail in close succession. If one drive from a batch of 8 has failed, the remaining 7 are statistically more likely to fail under the increased load of a rebuild.
  4. Rebuild times on large arrays can exceed 24 hours, during which the array runs with zero fault tolerance (RAID 5) and the drives experience sustained sequential I/O.

Before inserting a replacement drive: disable auto-rebuild in the controller BIOS, remove any configured hot spares, and verify SMART health on every surviving drive. For RAID data recovery service scenarios where the data is irreplaceable, image all surviving drives before the rebuild starts.

Example: A file server with 6 Seagate Exos 16TB drives in RAID 5. Drive 4 fails. A hot spare activates and the rebuild begins automatically. The rebuild must read 5 x 16TB = 80TB from the surviving drives under sustained sequential I/O. This prolonged, intensive operation on aging drives with similar wear profiles creates a high risk of a secondary mechanical failure or latent sector error before the rebuild can complete.
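The risk in this example can be put in rough numbers using the common back-of-envelope URE model, which treats read errors as independent events at the drive's spec-sheet rate. This model is widely considered pessimistic (real errors cluster rather than arriving independently), and the rates below are illustrative spec-sheet assumptions, not measurements of any particular drive.

```python
import math

# Back-of-envelope probability of hitting at least one URE during a rebuild,
# assuming independent errors at a fixed per-bit rate (a pessimistic model).

def p_ure(bytes_read, ure_rate_bits):
    """Probability of at least one URE while reading `bytes_read` bytes."""
    bits = bytes_read * 8
    return 1 - math.exp(-bits / ure_rate_bits)

surviving_drives = 5
drive_tb = 16
bytes_read = surviving_drives * drive_tb * 10**12   # 80 TB, per the example

# Typical spec-sheet rates: consumer ~1 per 1e14 bits, enterprise ~1 per 1e15.
print(f"consumer 1e14:   {p_ure(bytes_read, 1e14):.1%}")   # 99.8%
print(f"enterprise 1e15: {p_ure(bytes_read, 1e15):.1%}")   # 47.3%
```

Even at the enterprise rate, this naive model puts the chance of a rebuild-killing read error near a coin flip for an 80TB read, which is why imaging first is the conservative choice.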


Calculating Your Remaining Fault Tolerance

Before deciding how to respond to a degraded array, calculate how many additional failures your array can tolerate. This determines the urgency and risk of each possible action.

  1. RAID 1: an N-way mirror with one drive failed tolerates N - 2 additional failures. A standard 2-drive RAID 1 with one failed drive has zero margin.
  2. RAID 5: tolerates exactly 0 additional failures once degraded. Any unrecoverable error on any surviving drive is fatal.
  3. RAID 6: tolerates 1 additional failure once single-degraded, 0 once double-degraded.
  4. RAID 10: tolerates additional failures only in mirror pairs that still have both members. Losing both drives in any single pair is fatal for that stripe.
  5. Factor in the rebuild duration. A 24-hour rebuild window on drives from the same batch and age is 24 hours of elevated failure risk.
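The rules above can be collapsed into a small lookup function. This is a sketch: for RAID 10 it returns the guaranteed (worst-case) margin of zero, since the true tolerance depends on which mirror pairs the next failures land in, as covered earlier.

```python
# Sketch: guaranteed additional failures a degraded array can survive.
# `total_drives` is the original member count; `failed` counts dead drives.

def remaining_tolerance(level, total_drives, failed=1):
    """Worst-case additional survivable failures for a degraded array."""
    if level == "raid1":                   # N-way mirror
        return max(total_drives - failed - 1, 0)
    if level == "raid5":
        return max(1 - failed, 0)          # zero margin once degraded
    if level == "raid6":
        return max(2 - failed, 0)          # one drive of margin, then zero
    if level == "raid10":
        return 0                           # worst case: next hit is same pair
    raise ValueError(f"unknown level: {level}")

print(remaining_tolerance("raid5", 8))              # 0
print(remaining_tolerance("raid6", 10))             # 1
print(remaining_tolerance("raid6", 10, failed=2))   # 0
```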

Example: A 10-drive RAID 6 with one failed drive. Remaining fault tolerance: 1 more drive. The admin checks SMART data and finds all 9 surviving drives are from the same Seagate batch purchased 4 years ago. Two drives show elevated reallocated sector counts. The rebuild will read 9 x 12TB = 108TB and take an estimated 36 hours. The admin decides to image all 9 drives before initiating the rebuild, preserving the current degraded state as a fallback.
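The read volume and duration in this example can be estimated with simple arithmetic. The throughput figure is an assumption for illustration: sustained rebuild rates vary widely with controller rebuild-priority settings and foreground I/O, which is why the article's 36-hour estimate and this sketch's result differ slightly.

```python
# Sketch: estimate total rebuild read volume and wall-clock duration.
# Assumes all survivors stream in parallel at `per_drive_mb_s` (hypothetical).

def rebuild_estimate(surviving_drives, drive_tb, per_drive_mb_s):
    """Return (total TB read across survivors, estimated hours)."""
    total_read_tb = surviving_drives * drive_tb
    # Wall clock is bounded by one drive's worth of sequential I/O,
    # since the surviving members are read concurrently.
    hours = (drive_tb * 1e12) / (per_drive_mb_s * 1e6) / 3600
    return total_read_tb, hours

total, hours = rebuild_estimate(surviving_drives=9, drive_tb=12, per_drive_mb_s=100)
print(f"{total} TB read, ~{hours:.0f} h")   # 108 TB read, ~33 h
```

Dropping the assumed rate to 75 MB/s (a busy array with rebuild deprioritized) pushes the estimate past 44 hours of zero-margin exposure.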


If You Need a Recovery Lab: Pricing

When a degraded array cannot be rebuilt safely, the cost depends on the failure mode of each member drive, not on RAID level alone.

Per-drive recovery pricing on RAID member disks falls in the $100–$2,000 range. Logical-only array reconstruction on physically healthy members sits at the lower end. Recoveries that involve head swaps, donor parts, or platter work on multiple members move toward the upper end. For larger capacities (8TB, 10TB, 16TB and above), target drives cost $400+ extra. The full per-tier breakdown lives on the hard drive recovery cost page, with RAID-specific guidance on the RAID data recovery service page.

  • No diagnostic fees. We image and assess before quoting.
  • +$100 rush fee to move to the front of the queue.
  • No-data, no-recovery-fee guarantee. If the data cannot be returned, the recovery attempt is free.
  • All work performed at the Austin, TX lab. Mail-in service available nationwide for RAID member drives.

Frequently Asked Questions

What does degraded RAID mean?
A degraded RAID array has lost one or more member drives but remains operational by computing the missing data on each read using parity (RAID 5/6) or serving from a surviving mirror (RAID 1/10). The array is functioning but has lost its fault tolerance. A second failure during degraded operation can be catastrophic for RAID 5, survivable with one more drive of margin for RAID 6, and depends on which mirror pair is affected for RAID 10.
Can a degraded RAID array still lose data?
Yes. A degraded array has no remaining redundancy (RAID 5) or reduced redundancy (RAID 6). Any additional drive failure, URE during a rebuild, or controller error can cause permanent data loss. The array is running without a safety net. The longer it operates in degraded mode, the higher the probability of a second failure due to increased I/O load on the surviving drives.
Should I replace a failed drive in a degraded RAID immediately?
Not without first understanding the risk. Inserting a replacement drive typically triggers an automatic rebuild. For parity-based arrays (RAID 5, RAID 6), this rebuild reads every sector of every surviving drive to recalculate the missing data. RAID 1 and RAID 10 rebuilds read only the surviving mirror partner. On large consumer drives (4TB+) in parity arrays, the probability of encountering an Unrecoverable Read Error during this full-disk read is high enough that the rebuild itself can cause the array to fail. For arrays containing irreplaceable data, imaging the surviving drives before initiating a rebuild is the safer approach.
Can I run a degraded RAID 5 in place?
Technically yes: the array will continue serving reads and writes while degraded, but you are running with zero fault tolerance. Every read from a stripe that included the failed drive forces the controller to XOR the remaining members to reconstruct the missing block, which increases I/O load on the surviving drives. A single URE on any one of those drives during normal operation leaves the controller with no parity to reconstruct the affected block, and that data is lost. If the data is replaceable and the array is small, you can run degraded long enough to plan a maintenance window. If the data is irreplaceable, take the array offline and image the surviving members before doing anything else.
Should I let the controller auto-rebuild?
Not on a parity array containing data you cannot afford to lose. Auto-rebuild on RAID 5 or RAID 6 starts a full-array read across every surviving drive the moment a hot spare activates or a replacement is inserted. You lose the opportunity to check SMART data on the surviving members, and any drive showing reallocated sectors or pending sectors is statistically likely to fail under the sustained sequential I/O of a rebuild. Disable auto-rebuild in the controller BIOS, remove configured hot spares, verify SMART health on every surviving drive, and only then make the rebuild decision. For irreplaceable data, image first.
How much does RAID data recovery cost?
Per-drive recovery pricing on the member disks falls in the $100–$2,000 range depending on what is wrong with each drive. Logical-only RAID reconstruction on healthy members sits at the lower end of that range; recoveries involving head swaps, surface damage, or firmware work on multiple members move toward the upper end. Final cost depends on how many drives need physical work and which failure tier each one falls into. There are no diagnostic fees, and the no-data, no-recovery-fee guarantee applies. See the full breakdown at /hard-drive-data-recovery-cost.

Data Recovery Standards & Verification

Our Austin lab operates on a transparency-first model. We use industry-standard recovery tools, including PC-3000 and DeepSpar, combined with strict environmental controls to make sure your hard drive is handled safely and properly. This approach allows us to serve clients nationwide with consistent technical standards.

Open-drive work is performed in a ULPA-filtered laminar-flow bench, with particle counts down to 0.02 µm verified using TSI P-Trak instrumentation.

Transparent History

Serving clients nationwide via mail-in service since 2008. Our lead engineer holds PC-3000 and HEX Akademia certifications for hard drive firmware repair and mechanical recovery.

Media Coverage

Our repair work has been covered by The Wall Street Journal and Business Insider, with CBC News reporting on our pricing transparency. Louis Rossmann has testified in Right to Repair hearings in multiple states and founded the Repair Preservation Group.

Aligned Incentives

Our "No Data, No Charge" policy means we assume the risk of the recovery attempt, not the client.

We believe in proving standards rather than just stating them. We use TSI P-Trak instrumentation to verify that clean-air benchmarks are met before any drive is opened.

See our clean bench validation data and particle test video

Array degraded and data is irreplaceable?

Free evaluation. Write-blocked drive imaging. Offline array reconstruction. No data, no fee.

(512) 212-9111 · Mon-Fri 10am-6pm CT
No diagnostic fee
No data, no fee
4.9 stars, 1,837+ reviews