NAND Degradation Data Recovery: Worn SSD Flash Recovery

Q: Can data be recovered from a worn-out SSD?

Yes, provided the NAND cells still retain enough charge for the PC-3000 SSD to resolve voltage states using read retry and threshold voltage shifting. Standard recovery tools rely on the controller's default read settings, which fail once bit errors exceed the built-in ECC capacity. PC-3000 bypasses the controller's read pipeline and adjusts voltage reference levels directly, recovering data from cells that the controller has already given up on. Recovery pricing for SATA SSDs ranges from $200–$1,500; NVMe SSDs range from $200–$2,500.

Q: What causes NAND flash to degrade?

Every program/erase cycle damages the tunnel oxide layer that traps electrons in NAND cells. SLC NAND tolerates roughly 100,000 P/E cycles before degradation becomes measurable. MLC drops to 3,000-10,000 cycles, TLC to 1,000-3,000, and QLC to 100-1,000. Write amplification from the SSD's internal garbage collection, wear leveling, and mapping-table journaling means the NAND sees more writes than the host system sends. A drive rated for 300 TBW (terabytes written) at the host level may consume its P/E budget well before reaching that figure if write amplification is high.

Q: How do I know if my SSD's NAND is degraded?

Check SMART attributes using CrystalDiskInfo or smartmontools. Key indicators: Media Wearout Indicator (SMART 233) near zero, Percentage Lifetime Used (SMART 202) above 95%, depleted Available Reserved Space (SMART 170), and high Reallocated Sector Count (SMART 5). Attribute IDs and names vary by manufacturer, so not every drive reports all four. The drive may also become intermittently slow, drop to read-only mode, or fail to boot the operating system while still being detected in BIOS.
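As a rough illustration, the check above could be scripted against the JSON that smartmontools emits (`smartctl -A --json /dev/sdX`). The sketch below assumes the ATA layout, where attributes appear under `ata_smart_attributes.table`. The sample data, the `WATCHED` mapping, and the `flag_wear` helper are hypothetical.

```python
# Sketch only: attribute IDs follow the list above and are vendor-specific.
WATCHED = {
    233: "Media Wearout Indicator",
    202: "Percentage Lifetime Used",
    170: "Available Reserved Space",
    5: "Reallocated Sector Count",
}

def flag_wear(smart_json):
    """Return {id: (label, normalized_value, raw_value)} for watched attributes."""
    table = smart_json.get("ata_smart_attributes", {}).get("table", [])
    found = {}
    for attr in table:
        if attr.get("id") in WATCHED:
            found[attr["id"]] = (
                WATCHED[attr["id"]],
                attr.get("value"),
                attr.get("raw", {}).get("value"),
            )
    return found

# Hypothetical sample shaped like `smartctl -A --json` output on an ATA drive.
sample = {
    "ata_smart_attributes": {
        "table": [
            {"id": 5, "name": "Reallocated_Sector_Ct", "value": 90, "raw": {"value": 412}},
            {"id": 233, "name": "Media_Wearout_Indicator", "value": 1, "raw": {"value": 0}},
            {"id": 194, "name": "Temperature_Celsius", "value": 64, "raw": {"value": 36}},
        ]
    }
}

for attr_id, (label, value, raw) in sorted(flag_wear(sample).items()):
    print(f"SMART {attr_id} {label}: normalized={value} raw={raw}")
```

Run this only against a saved report. If the drive already shows the symptoms above, it should stay powered off rather than be polled again.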

Q: Why does consumer recovery software fail on degraded SSDs?

Consumer software (Disk Drill, EaseUS, R-Studio) sends standard read commands through the operating system. If the SSD controller cannot resolve a page because bit errors exceed its ECC capacity, it returns an I/O error to the OS. The software has no mechanism to adjust the controller's internal read retry count or voltage reference levels. PC-3000 SSD communicates with the controller through vendor-specific diagnostic commands, bypassing the standard read pipeline entirely.

Q: What is read disturb and how does it cause data loss?

Read disturb is an unintended side effect of reading NAND flash. To read one page, the controller applies a pass-through voltage to every unselected word line in the block, and that voltage weakly programs the cells on those word lines. Over many reads (millions on SLC, only thousands on TLC and QLC), the accumulated charge shifts the threshold voltage of those cells enough to flip bits. The effect is cumulative and is cleared only by erasing and reprogramming the block. Drives that serve heavy read workloads (database servers, surveillance systems) are most vulnerable.

Q: How long can a powered-off SSD retain data?

JEDEC standard JESD218A specifies that a consumer SSD at end-of-life retains data for 52 weeks at 30 degrees Celsius storage temperature. At 40 degrees, retention drops to roughly 13 weeks under the Arrhenius model (activation energy 1.1 eV for planar NAND). Enterprise SSDs are rated for 3 months at 40 degrees. TLC and QLC cells retain charge for shorter periods than SLC or MLC because they store more bits per cell with narrower voltage margins. A drive stored in a hot environment (attic, parked car) loses data faster than one stored at room temperature.

Q: How much does NAND degradation recovery cost?

SATA SSD recovery ranges from $200–$1,500. NVMe SSD recovery ranges from $200–$2,500. Degraded NAND typically falls into the firmware recovery tier ($600–$900 for SATA, $900–$1,200 for NVMe) because it requires PC-3000 low-level reads with custom read retry parameters. If the controller is also damaged, board-level repair adds circuit board costs. Free evaluation, firm quote before work begins. No data, no fee. +$100 rush fee to move to the front of the queue.

Q: Does TRIM accelerate NAND degradation?

TRIM itself does not degrade NAND. TRIM tells the controller which logical blocks are no longer in use, allowing the controller to erase those physical blocks during garbage collection. Erasing a block consumes one P/E cycle. The indirect effect: aggressive TRIM combined with frequent file creation and deletion increases the rate of block erases. The direct accelerator of NAND degradation is write amplification from garbage collection, not the TRIM command itself.

Q: Can a degraded SSD that has dropped into TRIM/RZAT lockup still be recovered?

When a worn drive enters Deterministic Read Zero after TRIM (RZAT) lockup, the controller returns zeros for any LBA mapped to a trimmed-but-not-yet-erased block. The user data may still be physically present on the NAND array if garbage collection hasn't completed the block erase, but the controller refuses to expose it through the standard read pipeline. PC-3000 SSD bypasses the FTL by entering vendor diagnostic mode on the controller and reading raw NAND pages directly, then reconstructing the logical-to-physical map from surviving metadata. Recoverability depends on how many trimmed blocks have already been physically erased by background garbage collection. Power the drive off the moment you suspect data loss; every minute of idle power draw consumes more recoverable blocks. SATA SSD recovery ranges from $200–$1,500; NVMe SSD recovery ranges from $200–$2,500. If the controller board also needs repair, a donor drive (a matching SSD used for its circuit board) may be required. Typical donor cost: $40–$100 for common models, $150–$300 for discontinued or rare controllers.

Q: Why did my SSD suddenly go read-only?

A sudden switch to read-only on a worn SSD means the controller has run out of room to manage wear. When the reserved spare-block pool hits the firmware floor and program/erase operations keep failing, the firmware halts writes and freezes the translator to protect data that still resolves. This is a hardware-level firmware defense, not a Windows or macOS permissions error, so registry edits and "remove write protection" steps cannot clear it. Power the drive off and avoid formatting or force-mounting it; each write attempt can push surviving service-area metadata past recovery. Worn read-only drives usually fall in the firmware tier ($600–$900 for SATA, $900–$1,200 for NVMe) because PC-3000 SSD must enter vendor mode and rebuild the translator from surviving metadata.

When A Worn-NAND Drive Reaches The Lab

Is a Worn-Out SSD Still Recoverable?

Yes, a worn-out SSD can usually be recovered. When the NAND cells still hold enough charge for PC-3000 SSD to resolve their voltage states with read-retry & voltage-threshold shifting, the lab reads pages the controller's default ECC pipeline already abandoned, then images what it recovers to new media.

Here is the short version for someone whose drive just died. The full PC-3000 SSD read-retry sequence runs in detail further down this page.

Free evaluation & firm quote. We read the SMART wear data, identify the controller, & quote the job before any recovery work starts.
Controller ID & halting background operations. PC-3000 SSD enters the controller's vendor diagnostic mode & stops garbage collection so no further blocks erase while we work.
Read-retry with voltage-threshold adjustment. The technician shifts the NAND read reference voltages below the controller's defaults to resolve cells the firmware gave up on.
Multi-pass imaging to new media. The drive images across several passes, each one filling gaps the last pass missed.
File listing before delivery. You confirm what came back before the recovered data ships.

Degraded NAND stays recoverable while the raw bit error rate (RBER) holds under the controller's LDPC soft-decision ceiling; under that line, voltage-threshold shifting still resolves the cells. True exhaustion is uncommon: it sets in only once RBER permanently passes that ceiling & no voltage shift recovers the page. The recoverable-vs-terminal decision matrix further down this page maps where a given drive sits. Drives powered off & shipped promptly almost never cross that line; the months-powered & long-stored drives are the ones that do.

No data, no recovery fee. All work happens in-house at our Austin, TX lab, & SSD recovery is board-level electronics work, not mechanical. SATA SSD recovery runs $200–$1,500; NVMe SSD recovery runs $200–$2,500.

What Is NAND Degradation? (Consumer-Friendly)

What Is NAND Degradation?

NAND degradation is the gradual physical wearing of the memory cells inside your SSD. Every time data is written and erased, the insulating layer inside each cell gets thinner. Once that layer is too damaged, the cell can no longer reliably store data.

The SSD's built-in error correction compensates for a while, but once too many cells degrade past the correction threshold, the drive starts losing data or stops working.

This is a normal end-of-life process for all SSDs. The drive's rated lifespan (expressed as TBW, or terabytes written) is the manufacturer's estimate of how much data can be written before degradation causes failures. Heavy workloads, high operating temperatures, and frequent small writes accelerate the process.

Consumer recovery software cannot help once NAND degradation reaches the point where the controller rejects reads. The controller returns I/O errors to the operating system, and no software running on top of that OS can override the controller's decision. Professional recovery tools like PC-3000 SSD communicate with the controller through vendor-specific diagnostic channels, adjusting how the controller reads the degraded cells.

Warning Signs

How Do You Know Your SSD's NAND Is Failing?

NAND degradation produces specific, observable symptoms before total failure. A failing SSD may switch to read-only mode, pause during read operations, corrupt files without warning, appear in BIOS while the operating system cannot mount the file system, or report Caution or Bad health in CrystalDiskInfo or smartmontools.

● Read-only mode. The SSD switches itself to read-only to prevent further writes from destroying remaining data. Files are visible but you cannot save, delete, or modify anything. This is a firmware-level protection triggered when the reserved block pool is depleted, and it is the most common failure mode arriving for SSD data recovery on heavily-written consumer drives.
● Intermittent slowdowns. The drive pauses for seconds or minutes during read operations as the controller retries failed NAND pages. SMART attribute 1 (Raw Read Error Rate) or attribute 187 (Uncorrectable Error Count) spikes.
● File corruption without warning. Files open but contain garbled data, truncated images, or zero-filled blocks. The controller returned data from cells where the voltage state was misread due to degradation.
● BIOS detection, OS failure. The drive appears in BIOS with correct model and capacity, but the operating system cannot mount the file system. The firmware is functional, but too many NAND pages return uncorrectable errors for the file system to parse.
● SMART warnings. CrystalDiskInfo or smartmontools reports "Caution" or "Bad" health. Media Wearout Indicator (SMART 233) near zero, Percentage Lifetime Used above 95%, or Available Reserved Space depleted.

If the drive shows any of these symptoms, power it off. Continued read attempts accelerate garbage collection and can trigger block erases that destroy recoverable data.

Recovery Process (Consumer-Level)

How We Recover Data from Degraded NAND

NAND degradation recovery is a firmware-level process. The PC-3000 SSD module enters the controller's diagnostic mode and reads NAND pages with adjusted parameters that the controller would never use on its own. SSD recovery is board-level electronics work, not mechanical.

Free evaluation. We assess the drive's SMART data, controller model, and failure mode. You receive a firm price quote before any recovery work begins.
Controller identification and diagnostic entry. PC-3000 SSD identifies the controller family (Phison, Silicon Motion, legacy Samsung, Marvell, supported Maxio) and enters vendor-specific diagnostic mode to halt background operations.
Baseline error assessment. The technician runs a full-surface read pass to map which NAND blocks are readable, marginal, and unreadable at default settings.
Read retry calibration. For marginal and unreadable blocks, PC-3000 adjusts the read retry count and voltage reference thresholds. Each retry uses a slightly different voltage level to resolve ambiguous cell states.
Multi-pass imaging. The drive is imaged across multiple passes, each with different read parameters. Sectors recovered in later passes fill gaps from earlier attempts.
File system reconstruction. The composite image is assembled and the file system is parsed. You receive a file listing before final delivery.
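The multi-pass step above can be pictured with a small toy model. This is only a sketch under stated assumptions: `read_sector`, the per-sector "difficulty" table, and the integer retry levels are hypothetical stand-ins, not PC-3000 functions or parameters. It shows only the bookkeeping, where each pass revisits the sectors the previous pass missed.

```python
import random

def read_sector(lba, retry_level, health):
    """Toy stand-in for a low-level read: returns bytes, or None on failure.

    A sector reads once retry_level reaches its hypothetical difficulty;
    real hardware is far less predictable.
    """
    return f"data-{lba}".encode() if retry_level >= health[lba] else None

def multi_pass_image(lbas, health, retry_levels):
    """Image in several passes; later passes only revisit the gaps."""
    image = {}
    pending = list(lbas)
    for level in retry_levels:
        still_missing = []
        for lba in pending:
            data = read_sector(lba, level, health)
            if data is None:
                still_missing.append(lba)
            else:
                image[lba] = data
        pending = still_missing
        if not pending:
            break
    return image, pending

rng = random.Random(0)
# Hypothetical difficulty per sector: 0 reads at defaults, 4 never reads here.
health = {lba: rng.choice([0, 0, 0, 1, 2, 4]) for lba in range(1000)}
image, unread = multi_pass_image(range(1000), health, retry_levels=[0, 1, 2, 3])
print(f"recovered {len(image)} sectors, {len(unread)} still unreadable")
```

Keeping a list of pending sectors means the later, slower passes never re-read sectors that already resolved, which limits further reads on worn cells.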

If the controller itself is also damaged (dead, not detected in BIOS), board-level microsoldering repair is required before NAND reads can begin. On unencrypted drives where the controller cannot be repaired, chip-off extraction or, on drives where the NAND and controller share a single package, monolithic NAND recovery may be attempted as a last resort. Drives with hardware AES encryption require a functional controller to decrypt, so board repair is mandatory.

Pricing

SSD Recovery Pricing

NAND degradation recovery is covered by our standard SSD recovery pricing tiers. Most degraded-NAND cases fall into the firmware recovery tier because they require PC-3000 low-level access with custom read parameters. SATA SSD recovery ranges from $200–$1,500; NVMe SSD recovery ranges from $200–$2,500.

Free evaluation, firm quote, no data = no charge. +$100 rush fee to move to the front of the queue. Tiers requiring donor drives include additional donor cost. A donor drive is a matching SSD used for its circuit board; typical donor cost is $40–$100 for common models and $150–$300 for discontinued or rare controllers.

Tier	Complexity	What You See	Technical Scope	Notes	Price	Turnaround
Simple Copy	Low	Your drive works, you just need the data moved off it	Functional drive; data transfer to new media	Rush available: +$100	$200	3-5 business days
File System Recovery	Low	Your drive isn't showing up, but it's not physically damaged	File system corruption. Visible to recovery software but not to OS	Starting price; final depends on complexity	From $250	2-4 weeks
Circuit Board Repair	Medium	Your drive won't power on or has shorted components	PCB issues: failed voltage regulators, dead PMICs, shorted capacitors	May require a donor drive (additional cost)	$450–$600	3-6 weeks
Firmware Recovery (Most Common)	Medium	Your drive is detected but shows the wrong name, wrong size, or no data	Firmware corruption: ROM, modules, or system files corrupted	Price depends on extent of bad areas in NAND	$600–$900	3-6 weeks
PCB / NAND Swap	High	Your drive's circuit board is severely damaged and requires NAND chip transplant to a donor PCB	NAND swap onto donor PCB. Precision microsoldering and BGA rework required	50% deposit required; donor drive cost additional	$1,200–$1,500	4-8 weeks

Hardware Repair vs. Software Locks

Our "no data, no fee" policy applies to hardware recovery. We do not bill for unsuccessful physical repairs. If we replace a hard drive read/write head assembly or repair a liquid-damaged logic board to a bootable state, the hardware repair is complete and standard rates apply. If data remains inaccessible due to user-configured software locks, a forgotten passcode, or a remote wipe command, the physical repair is still billable. We cannot bypass user encryption or activation locks.

No data, no fee. Free evaluation and firm quote before any paid work. NAND swap requires a 50% deposit because donor parts are consumed in the attempt.

Rush fee: +$100 rush fee to move to the front of the queue
Donor drives: A donor drive is a matching SSD used for its circuit board. Typical donor cost: $40–$100 for common models, $150–$300 for discontinued or rare controllers.
Target drive: The destination drive we copy recovered data onto. You can supply your own or we provide one at cost plus a small markup. All prices are plus applicable tax.

SSD Recovery Calculator

Estimate Your SSD Recovery Cost

Select your symptoms and drive type for a preliminary cost range. Final pricing comes after a free evaluation at our Austin, TX lab.

Not sure which type you have? Call (512) 212-9111 and we can help identify it.

P/E Cycle Exhaustion

P/E Cycle Exhaustion: Tunnel Oxide Degradation in NAND Cells

Every program/erase cycle forces electrons through the tunnel oxide layer via Fowler-Nordheim tunneling. Each pass leaves behind trapped charge in the oxide and physically weakens the dielectric. After enough cycles, the oxide can no longer hold a consistent charge, and the threshold voltage distributions for each cell state widen until they overlap.

The endurance ceiling depends on NAND density. SLC NAND, storing 1 bit per cell with only two voltage states, tolerates the widest voltage margins and survives the most cycles. Each additional bit per cell halves the available voltage window between states.

NAND Type	Bits per Cell	Voltage States	Typical P/E Endurance
SLC	1	2	100,000 cycles
MLC	2	4	3,000 to 10,000 cycles
TLC	3	8	1,000 to 3,000 cycles
QLC	4	16	100 to 1,000 cycles

Consumer SSDs sold today use TLC or QLC NAND. A 1TB TLC drive rated at 600 TBW allows roughly 600 full-drive writes before the manufacturer expects degradation failures. Write amplification from garbage collection, wear leveling, and journal writes means the NAND sees 2x to 10x more physical writes than the host reports, depending on workload pattern and controller efficiency.
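The arithmetic in the paragraph above can be checked with a short sketch. The function names are hypothetical. The 1,000-cycle rating is simply the low end of the TLC range in the table, and the model ignores over-provisioning and uneven wear, so treat the numbers as orders of magnitude only.

```python
def full_drive_writes(tbw_tb, capacity_tb):
    """Host-level full-drive writes implied by a TBW rating."""
    return tbw_tb / capacity_tb

def host_writes_before_exhaustion(capacity_tb, pe_cycles, waf):
    """Host terabytes that fit in the P/E budget at a given write amplification."""
    return capacity_tb * pe_cycles / waf

# 1 TB drive rated at 600 TBW, as in the text above.
print(full_drive_writes(600, 1), "full-drive writes")

# Hypothetical 1 TB TLC drive with a 1,000-cycle rating.
for waf in (2, 5, 10):
    budget = host_writes_before_exhaustion(1, 1000, waf)
    print(f"WAF {waf}: about {budget:.0f} TB of host writes")
```

Under these assumptions a workload with a WAF of 5 or more uses up the cell budget well short of the 600 TBW figure on the label.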

How P/E Exhaustion Appears in PC-3000 Diagnostics

P/E exhaustion produces a distinctive diagnostic fingerprint that separates it from other NAND failure modes. Because wear leveling distributes writes across all blocks, the damage is global. Every block in the array shows similar degradation levels, and PC-3000's surface scan reveals a uniform pattern of rising errors across the entire LBA space rather than localized clusters.

ECC Error Rate Escalation: If a drive has consumed its P/E budget, PC-3000 diagnostics show bit error rates climbing uniformly across every NAND block. The RBER (raw bit error rate) approaches or exceeds the LDPC correction threshold on most pages, not just isolated zones. On a worn TLC drive, the correctable error count per 16KB page may exceed 40 bits where fresh NAND shows under 5. This uniform distribution is the signature of wear leveling doing its job; the cells degraded evenly.
Read Retry Count Escalation: The controller's internal read retry counter jumps on nearly every page read. On a healthy drive, retries trigger on fewer than 0.01% of reads. A P/E-exhausted drive forces retries on 5% to 30% of pages, depending on how far past the endurance rating the NAND has been driven. PC-3000 logs each retry event, and the sheer volume across all blocks confirms systemic oxide degradation rather than a localized defect.
Bad Block Table Growth: The controller's bad block table (BBT) grows rapidly once P/E exhaustion sets in. Spare blocks from the over-provisioned pool are consumed at an accelerating rate because new blocks fail faster than old ones did. When the spare pool reaches zero, the controller can no longer remap failures, and the drive enters read-only mode or drops out of detection entirely. PC-3000 reads the BBT directly from the controller's system area to assess how much of the spare pool remains.
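One way to picture the "uniform versus localized" distinction described above is to compare the spread of per-block error counts with their average. The sketch below is an illustration, not how PC-3000 classifies drives. The `wear_pattern` helper, the sample counts, and the 0.5 cut-off are all assumptions.

```python
from statistics import mean, pstdev

def wear_pattern(errors_per_block, cv_threshold=0.5):
    """Label a scan 'global' or 'localized' from per-block error counts.

    Uses the coefficient of variation (stdev / mean). The 0.5 cut-off is an
    arbitrary illustrative value, not a PC-3000 setting.
    """
    avg = mean(errors_per_block)
    if avg == 0:
        return "clean"
    cv = pstdev(errors_per_block) / avg
    return "global" if cv < cv_threshold else "localized"

worn = [38, 41, 40, 43, 39, 42, 40, 41]   # similar error counts in every block
defect = [3, 2, 4, 3, 95, 110, 2, 3]      # one bad region, healthy elsewhere
print(wear_pattern(worn), wear_pattern(defect))
```

A low spread relative to the mean matches the even, array-wide degradation that wear leveling produces. A high spread points toward a localized defect instead.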

P/E Exhaustion By Family

Which Consumer SSD Families Arrive Worn Out First?

QLC consumer drives reach P/E exhaustion long before TLC drives do, because their rated endurance is lower per the SLC/MLC/TLC/QLC table above. DRAM-less budget QLC drives wear faster still: with no DRAM to cache the mapping table, the FTL journals to NAND more often, & sustained random writes burn the cells sooner.

Wear-out & recovery route are two separate questions. A drive can be worn down to its endurance floor & still sit on a controller PC-3000 SSD cannot talk to, in which case read-retry is off the table. That split comes from the ACELab PC-3000 SSD supported-drives list, not from the NAND itself.

Samsung 870 QVO: QLC on an Unsupported Controller

The Samsung 870 QVO is the clearest example. It is a SATA QLC drive built on Samsung's in-house MKX controller, & QLC is the NAND that exhausts soonest. Modern Samsung in-house controllers are absent from the ACELab PC-3000 SSD supported list, so the read-retry workflow above does not apply to this drive.

Rossmann does not currently offer in-lab recovery for the Samsung MKX controller.

A worn 870 QVO's recovery route is board-level repair to revive the original controller, or chip-off on an unencrypted drive, not a PC-3000 SSD read-retry pass.

Not Every Budget SATA Drive Is QLC

The WD Blue SA510 uses SanDisk 3D TLC NAND, not QLC, & TLC holds up far longer than QLC under the same write load. Mainstream SATA TLC drives are the longer-lived contrast to the QLC bargain tier; they reach the lab for controller & firmware faults more often than for raw cell exhaustion.

QLC NVMe Recovery Depends on the Controller

Budget QLC NVMe drives wear out sooner than mainstream TLC for the same endurance reason. A worn QLC NVMe drive's in-lab recoverability depends on its controller, not its NAND. Drives built on ACELab-supported Silicon Motion or Phison controllers can be accessed with PC-3000 SSD over the SATA or NVMe command set, so read-retry & voltage-threshold shifting are available.

Drives built on in-house or unsupported controllers (modern Samsung, Innogrit, Realtek) are board-repair-only. The path there is reviving the original controller, since those controllers cannot be driven from PC-3000 SSD. Pages claiming PC-3000 recovery for a controller outside the ACELab list would be wrong; we quote those jobs as board repair.

Worn-NAND recovery on a supported controller falls in the firmware tier: $600–$900 for SATA, $900–$1,200 for NVMe. Board-repair-only drives are quoted after the controller fault is identified. Free evaluation, firm quote, no data no fee.

Tunnel Oxide Physics

What Quantum Mechanism Turns P/E Cycling Into Retention Loss?

The P/E exhaustion fingerprint described above has a specific physical cause. Each program pulse drives electrons through the silicon dioxide tunnel oxide at field strengths above 10 MV/cm, and a fraction of those electrons fracture Si-O bonds inside the dielectric. The resulting atomic defects act as energy traps that lower the effective barrier height of the oxide.

Once enough traps accumulate, the cell starts leaking charge under storage voltages it used to hold indefinitely. This is the physics behind every retention failure that ends up in our queue for SSD data recovery.

Trap-Assisted Tunneling & Stress-Induced Leakage Current

In a fresh NAND cell, the tunnel oxide is a clean insulator. Electrons can only traverse it under the high field of a program or erase pulse, by Fowler-Nordheim tunneling. Cycling generates oxide traps, and once those traps line up across the dielectric they form a stepping-stone path. An electron tunnels from the substrate to a trap, then from that trap to the storage layer, at a small fraction of the energy required for direct tunneling. That two-step path is Trap-Assisted Tunneling (TAT).

The anomalous current it produces under low storage fields is Stress-Induced Leakage Current (SILC), and SILC is the primary driver of retention loss on worn cells.

TCAD simulations on planar floating-gate models place the dominant defect trap energies for neutral oxygen vacancies in the SiO₂ bulk between 1.5 eV and 3.2 eV. 3D charge-trap architectures exhibit shallower traps, often demonstrating distinct peaks between 0.75 eV and 1.25 eV, which lowers the activation energy for TAT & makes leakage worse at equivalent cycle counts.

Gate leakage current is sensitive to trap density: once N_t exceeds roughly 1 × 10¹⁹ cm⁻³ in the tunnel oxide layer, leakage rises sharply & the cell can no longer hold its programmed threshold. For typical tunnel oxide thicknesses near 8 nm, anomalous SILC requires the random alignment of two or three traps to form a conductive percolation path; a single trap is mathematically insufficient to drain the cell, which is why floating-gate NAND retention degrades as a statistical, cycle-dependent process rather than a one-defect event.

Arrhenius Activation Energy & the 10°C Rule

SILC is a thermally activated process, so retention life follows the Arrhenius equation. JEDEC JESD218A models charge detrapping with an apparent activation energy of E_a = 1.1 eV for planar NAND. The acceleration factor between storage temperature T_room & bake temperature T_bake (both in kelvin) is exp[(E_a / k_B)(1/T_room - 1/T_bake)], with k_B = 8.62 × 10⁻⁵ eV/K.

The general-electronics "10°C doubles the failure rate" rule of thumb only holds when E_a is near 0.6 eV. With JEDEC's 1.1 eV for planar NAND, a 10°C rise from 30°C to 40°C produces an Arrhenius acceleration factor of roughly 3.8x. Heat does not double NAND retention loss; it nearly quadruples it, which is why a drive left in a hot car bleeds threshold voltage far faster than general-electronics intuition suggests.

Planar floating-gate NAND: E_a = 1.1 eV per JESD218A. The Arrhenius model fits well across the consumer storage range & underpins every published retention spec.
3D charge-trap TLC: The single-E_a model breaks down. Early retention loss is super-linear: charge bleeds off rapidly in the first hours after programming, then slows. Researchers model this with Unified self-Recovery & Temperature (URT) frameworks that scale E_a dynamically rather than holding it at 1.1 eV.
3D QLC NAND: Activation energy is non-linear & dominated by trap-to-band tunneling of electrons (TBE) rather than bulk detrapping. Thermal acceleration models pulled from planar parts under-predict failure on QLC at the same temperature, which is why QLC drives sometimes lose data faster than the JEDEC floor would suggest.
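The acceleration factors quoted above can be reproduced with a few lines. This is a sketch of the single-E_a Arrhenius model only, so per the list above it applies to the planar case and not to 3D charge-trap or QLC parts. The function name is an assumption.

```python
import math

K_B = 8.62e-5  # Boltzmann constant in eV/K, as quoted above

def acceleration_factor(ea_ev, t_low_c, t_high_c):
    """Arrhenius acceleration between two temperatures given in Celsius."""
    t_low = t_low_c + 273.15    # convert to kelvin
    t_high = t_high_c + 273.15
    return math.exp((ea_ev / K_B) * (1 / t_low - 1 / t_high))

af_nand = acceleration_factor(1.1, 30, 40)   # JEDEC planar NAND value
af_rule = acceleration_factor(0.6, 30, 40)   # general-electronics rule of thumb
print(f"E_a = 1.1 eV: {af_nand:.2f}x")
print(f"E_a = 0.6 eV: {af_rule:.2f}x")
print(f"52 weeks at 30 C -> {52 / af_nand:.1f} weeks at 40 C")
```

With E_a = 1.1 eV the factor comes out near 3.8, which is where the drop from a 52-week floor at 30°C to roughly 13 weeks at 40°C comes from.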

Retention vs. P/E Cycle Count

The JEDEC JESD218 client-SSD requirement is 1 year of data retention at 30°C when the drive is at 100% of its rated P/E endurance. The standard fixes that end-of-life floor but does not set statutory retention for intermediate wear states. Mathematical extrapolation of the endurance/retention trade-off yields roughly 10 years of retention when the drive is only at 10% of its rated endurance.

The relationship is inverse: every doubling of P/E cycles roughly halves the time a block can hold its programmed state before RBER overruns the LDPC engine.

P/E Cycles Consumed	Approx. Retention at 30°C (JEDEC floor at end of life; other rows extrapolated)
10% of rated	~10 years
50% of rated	~2 years
100% of rated (end of life)	1 year (52 weeks)
150% to 200% of rated (over-cycled)	Days to weeks, no guarantee
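The first three rows of the table follow from a simple inverse-proportional extrapolation anchored at the 1-year end-of-life floor. The sketch below encodes only that extrapolation. It is not part of the JEDEC standard, the function name is hypothetical, and it deliberately does not attempt the over-cycled row, where no guarantee applies.

```python
def retention_years(wear_fraction, eol_years=1.0):
    """Inverse-proportional extrapolation from the 1-year end-of-life floor."""
    if wear_fraction <= 0:
        raise ValueError("wear_fraction must be positive")
    return eol_years / wear_fraction

for pct in (10, 50, 100):
    years = retention_years(pct / 100)
    print(f"{pct}% of rated cycles -> ~{years:.0f} years at 30 C")
```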

Academic results published at FAST & IEEE show the other side of the trade. If a workload tolerates only 3 days of retention, the same TLC cell that died at 3,000 cycles under a multi-year retention guarantee can reach roughly 150,000 cycles before the oxide fails to hold charge for 72 hours. The endurance figure on a datasheet is not a property of the silicon; it is a property of the retention guarantee bolted on top of it.

Cells driven past 150% of their rated cycles do not fail instantly. They fail at retention, sometimes within days of the last write, which is why drives that sat in a drawer for a month arrive at the lab unreadable.

Read-Disturb Thresholds & Controller Scrubbing

Reads damage NAND too. To read a single page, the controller drives a high pass-through voltage (V_pass) onto every unselected wordline in the block so current can flow through the string. V_pass is high enough to soft-program the cells it crosses, shifting their threshold voltages upward by a small amount on every read. The narrower the voltage margins, the fewer reads it takes to corrupt a bit.

NAND Type	Approx. Reads-per-Block Before Scrubbing
SLC	~1,000,000
MLC	~100,000
TLC	~10,000
QLC	A few hundred to a few thousand

The FTL maintains a read counter for every physical block. When the counter trips its threshold (often 10,000 on TLC), the controller invokes Read Reclaim: it reads the block through ECC, writes the corrected data into a fresh block, & queues the old block for erase. This is background work.

It only runs when the controller is alive, powered, & not stalled. If the controller drops out (Phison SATAFIRM S11 state, Silicon Motion 100% busy hang, dead PMIC), Read Reclaim halts.

Any block sitting near its threshold continues accumulating read-disturb damage with no scrubbing, & RBER climbs past the LDPC ceiling within hours to days. That is the link between a dead controller & data that goes from recoverable to unrecoverable over the shipping window. The cells were not "fine until the controller died"; they were counting down, & the scrubber that kept them alive went silent.
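The read-counter behavior described above can be modeled in a few lines. This is a toy under stated assumptions: the class names and the `controller_alive` flag are hypothetical, and the 10,000 threshold is the TLC figure quoted above rather than any specific firmware value.

```python
class Block:
    def __init__(self, block_id):
        self.block_id = block_id
        self.reads = 0            # reads since the block was last rewritten

class ReadReclaimModel:
    """Toy FTL read counter with a per-block reclaim threshold."""

    def __init__(self, num_blocks, threshold=10_000):
        self.blocks = [Block(i) for i in range(num_blocks)]
        self.threshold = threshold
        self.reclaimed = []       # block ids relocated so far
        self.controller_alive = True

    def read(self, block_id):
        block = self.blocks[block_id]
        block.reads += 1
        if self.controller_alive and block.reads >= self.threshold:
            # Real firmware would copy ECC-corrected data to a fresh block
            # and queue the old one for erase; here we only reset the count.
            self.reclaimed.append(block_id)
            block.reads = 0

ftl = ReadReclaimModel(num_blocks=4)
for _ in range(10_000):
    ftl.read(0)
print(ftl.reclaimed, ftl.blocks[0].reads)

ftl.controller_alive = False      # stalled controller: nothing scrubs the block
for _ in range(12_000):
    ftl.read(1)
print(ftl.reclaimed, ftl.blocks[1].reads)
```

In the second loop the counter passes the threshold and nothing relocates the data, which is the unscrubbed state the paragraph above describes.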

These four mechanisms (TAT-driven SILC, Arrhenius-accelerated detrapping, retention collapse past rated cycles, & unscrubbed read-disturb) feed directly into the wear-leveling cascade and FTL collapse sections further down this page. The math changes, but the failure path is the same: oxide damage becomes voltage drift, voltage drift becomes uncorrectable ECC, & uncorrectable ECC becomes a controller that refuses to answer LBA requests.

Write Amplification

How Write Amplification Accelerates Wear

Write amplification is the ratio of physical NAND writes to logical host writes. A write amplification factor (WAF) of 3.0 means the NAND receives 3 bytes of physical writes for every 1 byte the host sends. The Flash Translation Layer (FTL) running on the controller manages this process through garbage collection, wear leveling, and metadata journaling.

Garbage Collection: NAND flash can only be erased in full blocks (typically 256 to 512 pages). When valid and invalid pages are mixed in a block, the controller must copy valid pages to a new block before erasing the old one. This internal copy-and-erase adds write cycles that the host never requested. Small, random writes produce the worst garbage collection overhead because they invalidate individual pages across many blocks.
Wear Leveling: The FTL distributes writes across all NAND blocks to prevent any single block from wearing out prematurely. Dynamic wear leveling steers new writes toward blocks with lower erase counts, and static wear leveling moves rarely changed data off low-cycle blocks so those blocks can absorb new writes. This spreads the P/E cycle count evenly, but the redistribution itself consumes additional erase cycles.
FTL Journal Writes: The controller maintains a mapping table that translates logical block addresses (LBAs) to physical NAND page locations. Updates to this map are journaled to NAND to survive power loss. On drives without a DRAM cache, the map is written more frequently to flash, adding P/E cycles. This is why DRAM-less SSDs (common in budget models) often degrade faster under sustained random write workloads.
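The garbage-collection overhead in the list above can be estimated with a standard simplification: if every block the controller reclaims still holds a fraction u of valid pages, each reclaim frees (1 - u) of a block at the cost of rewriting u, so the write amplification is 1 / (1 - u). The sketch below covers garbage collection only and leaves out wear-leveling and journal writes. The function name is an assumption.

```python
def gc_write_amplification(valid_fraction):
    """WAF when every reclaimed block still holds `valid_fraction` valid pages.

    Freeing a block yields (1 - u) pages of new space at the cost of
    rewriting u pages, so physical / host writes = 1 / (1 - u).
    """
    if not 0 <= valid_fraction < 1:
        raise ValueError("valid_fraction must be in [0, 1)")
    return 1 / (1 - valid_fraction)

for u in (0.0, 0.5, 0.8, 0.9):
    waf = gc_write_amplification(u)
    print(f"{u:.0%} valid pages per reclaimed block -> WAF {waf:.1f}")
```

Small random writes leave reclaimed blocks mostly valid, which pushes u up and makes the factor climb steeply.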

The SMART attribute "Total LBAs Written" (SMART 241) tracks host writes. Comparing this to the NAND-level write count (sometimes exposed as "Total NAND Writes") reveals the actual WAF. A WAF above 5.0 indicates a workload pattern that is consuming NAND endurance at an accelerated rate.
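As a sketch of the comparison above, the two counters can be turned into a WAF estimate as follows. The helper names and sample figures are hypothetical. The 512-byte unit for attribute 241 is an assumption, since some vendors report it in larger units, so check the drive's documentation before trusting the result.

```python
def lbas_to_bytes(lbas, lba_size=512):
    """Convert an LBA count to bytes; the 512-byte unit is an assumption."""
    return lbas * lba_size

def waf_from_counters(host_bytes, nand_bytes):
    """Write amplification factor: physical NAND writes per host write."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return nand_bytes / host_bytes

# Hypothetical readings: 50 TB written by the host, 275 TB written to NAND.
host = lbas_to_bytes(97_656_250_000)
nand = 275 * 10**12
waf = waf_from_counters(host, nand)
print(f"WAF = {waf:.1f}", "(accelerated wear)" if waf > 5.0 else "(typical)")
```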

How Does Wear-Leveling Failure Cascade Into FTL Corruption?

Wear leveling is the firmware logic that distributes program/erase cycles across NAND blocks so no single block is exhausted prematurely. Every modern SSD controller implements two layers: dynamic wear leveling, which steers fresh writes toward blocks with lower erase counts, and static wear leveling, which periodically migrates cold data off low-cycle blocks so those blocks become available for hot writes.

When the spare block pool is healthy, both layers operate quietly. When degradation outpaces the spare pool, both layers begin to fail in characteristic ways.

Stage 1: Dynamic Wear Leveling Saturation: Once Available Reserved Space (SMART 170) drops below roughly 10%, the controller has fewer fresh blocks to steer host writes toward. New writes land on blocks that are already near their P/E ceiling. Bit error rates on freshly programmed pages jump within hours of writing, because the cells were already marginal before the write. PC-3000 diagnostics see this as an unusual pattern: recently written LBAs return higher RBER than older, static data.
Stage 2: Static Wear Leveling Stalls: Static wear leveling depends on the controller having somewhere to migrate cold data. With the spare pool depleted, the migration target is itself a worn block. The controller begins migrating data into blocks that fail ECC on the next read. This is the inflection point where uncorrectable error count (SMART 187) begins climbing on data the user never touched. A snapshot of system files written months earlier suddenly returns errors because the controller relocated them during idle time and the destination block was already past usable.
Stage 3: FTL Metadata Block Failure: The FTL mapping table is written far more frequently than user data because every write updates an entry. Controllers reserve specific blocks for FTL journaling. Once those blocks degrade past the soft-decision LDPC ceiling, the controller cannot reliably load its own translator on the next power-up. The drive enters a factory alias state: Phison PS3111 reports as "SATAFIRM S11" with 0MB capacity; Silicon Motion controllers drop to a 1GB or 0MB debug capacity; Samsung drives display a generic model string. User data on the array is intact, but the map needed to locate it is unreadable.
Stage 4: Background Operations Refuse to Halt: Even after the FTL is corrupt, the controller may continue running garbage collection and RZAT (Read Zeroes After TRIM) enforcement on whatever fragments of the map survived. Every minute the drive remains powered, more recoverable blocks are lost. The first instruction PC-3000 SSD issues, after identifying the controller, is the vendor-specific command that halts background operations entirely.

The cascade explains why two seemingly identical drives with the same SMART numbers can have different recoverability. A drive caught at Stage 1 images cleanly with adjusted read parameters. A drive caught at Stage 3 requires firmware loader upload into controller RAM, vendor-mode entry, and a virtual translator rebuild from raw NAND metadata.

The diagnostic depth available depends on the controller family: Phison architecture and Silicon Motion architecture expose different vendor command sets, and the recovery workflow shifts accordingly.

Recovery work at Stage 3 falls in the firmware tier: $600–$900 for SATA SSD, $900–$1,200 for NVMe. If wear damage forces escalation to NAND chip-off on an unencrypted drive, the job moves to $1,200–$1,500 for SATA and $1,200–$2,500 for NVMe.

A donor drive is a matching SSD used for its circuit board. Typical donor cost: $40–$100 for common models, $150–$300 for discontinued or rare controllers. +$100 rush fee to move to the front of the queue.

What Is Read Disturb?

Read disturb is an unintended charge injection into NAND cells caused by read operations on neighboring cells in the same block. Every read applies a pass-through voltage to unselected word lines. Over millions of reads, this accumulated voltage slowly shifts the threshold voltage of unselected cells, eventually flipping bits.

The controller tracks read counts per block and triggers a background scrub (read-refresh) before the disturb accumulation reaches a dangerous level. The scrub reads all pages in the block, corrects any bit errors with ECC, erases the block, and rewrites the corrected data. This consumes one P/E cycle per scrub.
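
The read-count bookkeeping described above can be sketched as follows. This is a toy model under stated assumptions, not firmware: the class name, the single global threshold, and the dictionaries are illustrative, and a real controller would pick thresholds per NAND type and wear state.

```python
from dataclasses import dataclass, field


@dataclass
class ReadDisturbTracker:
    """Toy per-block read counter that triggers a scrub at a fixed threshold."""
    scrub_threshold: int
    read_counts: dict = field(default_factory=dict)
    pe_cycles: dict = field(default_factory=dict)

    def record_read(self, block: int) -> bool:
        """Count one page read in `block`; return True if a scrub ran."""
        self.read_counts[block] = self.read_counts.get(block, 0) + 1
        if self.read_counts[block] >= self.scrub_threshold:
            self._scrub(block)
            return True
        return False

    def _scrub(self, block: int) -> None:
        # Read, correct, erase, rewrite: modeled here only by its cost,
        # which is one P/E cycle and a reset of the read counter.
        self.pe_cycles[block] = self.pe_cycles.get(block, 0) + 1
        self.read_counts[block] = 0
```

The point of the model is the trade-off: every scrub protects the data but spends one of the block's remaining P/E cycles.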

On drives with degraded NAND, the safety margin between the read disturb threshold and the ECC correction limit narrows. A block that could tolerate 500,000 reads on fresh NAND may fail after 100,000 reads on worn NAND. Surveillance systems, database servers, and caching layers that generate sustained read-heavy workloads trigger read disturb faster than consumer desktop usage.

Read Disturb During Recovery Attempts

Running consumer recovery software on a drive with marginal NAND compounds the problem. Each scan pass adds read disturb to every block it touches. A full-surface scan of a 1TB drive reads every page, applying pass-through voltage to every word line in every block.

If the drive is already near its disturb threshold, the recovery scan itself can push cells past the point of no return. This is why the first step in professional recovery is to halt background controller operations and image using controlled, selective reads rather than brute-force full-surface scans.

Read Disturb Thresholds by NAND Geometry

Read disturb tolerance drops with each generation of denser NAND. The reason is voltage margins: more bits per cell means more voltage states packed into the same physical operating window, leaving less room for parasitic charge injection before a bit flips. The figures below are approximate values from published flash characterization literature.

NAND Type	Bits per Cell	Voltage States	Reads Before Scrub Required
SLC	1	2	~1,000,000
MLC (40nm+)	2	4	~100,000
MLC (sub-25nm)	2	4	~20,000
TLC	3	8	10,000 to 40,000
QLC	4	16	<10,000

The practical consequence for recovery: imaging a QLC-based SSD with Disk Drill or R-Studio performs a full-surface read that may exhaust the remaining read disturb margin across dozens of blocks in a single pass. PC-3000 SSD avoids this by reading only the blocks required for the target data, skipping known-bad zones, and monitoring per-block error rates between passes.

If the error rate on a block jumps between passes, the technician knows read disturb is active and can adjust the read strategy to image the most important blocks first.
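
A minimal sketch of that between-pass comparison, assuming per-block error rates are available as plain dictionaries keyed by block number. The function name and the 2x jump ratio are hypothetical; the text does not state what size of jump the lab treats as significant.

```python
def blocks_with_active_disturb(previous: dict, current: dict,
                               jump_ratio: float = 2.0) -> list:
    """Return block ids whose error rate grew by at least jump_ratio.

    `previous` and `current` map block id -> measured error rate for two
    consecutive imaging passes. Blocks without a usable baseline are skipped.
    """
    flagged = []
    for block, rate_now in current.items():
        rate_before = previous.get(block)
        if rate_before is None or rate_before <= 0:
            continue
        if rate_now / rate_before >= jump_ratio:
            flagged.append(block)
    return sorted(flagged)
```

Blocks returned by a check like this are the ones to image next, before further reads push them past the correction limit.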

TLC NAND adds a complication: each cell stores three bits, one each for the lower, middle, and upper page of its word line, and the three page types differ in their susceptibility to read disturb. Lower pages (least significant bit) are the most resilient because the voltage threshold separating their states sits in the widest gap.

Upper pages (most significant bit) sit between the tightest voltage distributions and flip first. PC-3000 can target page types selectively when the controller supports page-level addressing.

Data Retention Failure in Powered-Off SSDs

NAND flash stores data as trapped electrons. Without power, those electrons slowly leak through the tunnel oxide via quantum mechanical tunneling. The rate of leakage follows the Arrhenius equation. With JEDEC's activation energy of 1.1 eV for planar NAND, a 10 degree Celsius rise in storage temperature accelerates leakage by roughly 3.8x, not the 2x that the general-electronics rule of thumb assumes.
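
The 3.8x figure can be checked with a short calculation. The sketch below evaluates the standard Arrhenius acceleration factor for the 1.1 eV activation energy quoted above; the function name is illustrative, and the 30 to 40 degree Celsius range is chosen only as an example.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(temp_low_c: float, temp_high_c: float,
                           activation_energy_ev: float = 1.1) -> float:
    """How much faster charge loss proceeds at temp_high_c than at temp_low_c."""
    t_low = temp_low_c + 273.15    # convert to kelvin
    t_high = temp_high_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1 / t_low - 1 / t_high)
    return math.exp(exponent)


if __name__ == "__main__":
    factor = arrhenius_acceleration(30, 40)
    print(round(factor, 2))
    print(round(52 / factor, 1))  # weeks at 40 C equivalent to 52 weeks at 30 C
```

For 30 to 40 degrees Celsius the factor comes out at about 3.8, so a 52-week retention window at 30 degrees shrinks to roughly 13 to 14 weeks at 40 degrees.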

On degraded NAND where the oxide is already thinned from P/E cycling, leakage accelerates further.

JEDEC standard JESD218A defines retention requirements. A consumer SSD at end-of-life should retain data for 52 weeks at 30 degrees Celsius. Enterprise SSDs are rated for 3 months at 40 degrees.

These are minimum specifications for drives that have reached, but not exceeded, their rated endurance limit. A drive that has exceeded its rated P/E cycles will fall short of these numbers.

NAND Type	Retention at 30°C	Retention at 40°C	Recovery Difficulty
SLC (end-of-life)	Years	12+ months	Low; wide voltage margins
MLC (end-of-life)	12 months	6 months	Moderate
TLC (end-of-life)	52 weeks	~13 weeks	High; 8 states compressed
QLC (end-of-life)	52 weeks	~13 weeks	Very high; 16 states, minimal margins

For recovery, retention-failed drives are candidates for thermal stabilization. Controlled cooling can temporarily slow electron leakage and raise apparent threshold voltages back into a readable range while the technician images the drive through PC-3000.

How Retention Failure Differs from P/E Wear-Out in Diagnostics

Both retention failure and P/E wear-out produce uncorrectable read errors, but they leave distinct signatures in PC-3000 diagnostics. Distinguishing between them determines the recovery approach: thermal stabilization for retention, voltage threshold tuning for wear-out. Misidentifying the failure mode wastes time and can accelerate data loss.

Diagnostic Indicator	P/E Wear-Out	Retention Failure
Vth shift direction	Symmetric broadening (distributions widen in both directions)	Unidirectional downward shift (electrons leak, charge drops)
Error distribution	Uniform across all blocks (wear leveling spreads cycles evenly)	Concentrated in old/static data; recently written blocks read clean
Bad block table growth	Rapid, continuous accumulation	Stable or slow growth; the cells aren't structurally damaged
Temperature sensitivity	Errors persist regardless of ambient temperature	Errors worsen at higher temperatures; cooling improves readability
Cross-temperature effect	Minimal; damage is structural	Severe if data was written warm & read cold (or vice versa)
Recovery approach	Expanded read retry tables, wider voltage offsets via PC-3000	Thermal stabilization (Atten 862 controlled cooling) during imaging

The cross-temperature effect is a practical trap. If an SSD wrote data at 55 degrees Celsius inside a running laptop and now sits in a 20-degree lab, the NAND cells were programmed with one set of voltage thresholds and are being read with a different thermal profile.

The threshold voltages shift with temperature (roughly 1-2mV per degree Celsius on TLC), and this mismatch compounds any retention loss. PC-3000's configurable read voltage offsets can compensate, but the technician needs to know the mismatch exists before selecting the right offset range.
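
To give a sense of scale, the sketch below turns the 1-2mV per degree figure into an expected mismatch for the 55-degree write, 20-degree read example. It returns a magnitude only; the direction of the shift depends on the NAND part and is not modeled. The function name is illustrative.

```python
def cross_temperature_shift_mv(write_temp_c: float, read_temp_c: float,
                               mv_per_degree=(1.0, 2.0)) -> tuple:
    """Return the (low, high) magnitude in mV of the threshold mismatch.

    Uses the rough 1-2 mV per degree Celsius range quoted for TLC; the
    sign of the shift is not modeled.
    """
    delta = abs(write_temp_c - read_temp_c)
    low, high = mv_per_degree
    return (delta * low, delta * high)


if __name__ == "__main__":
    print(cross_temperature_shift_mv(55, 20))
```

A 35-degree gap therefore corresponds to a mismatch of roughly 35 to 70 mV, which is the kind of offset range the technician would start from.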

What Happens When ECC Correction Capacity Is Exceeded?

Every SSD controller runs an error correction algorithm on each NAND page read. Modern controllers use LDPC (Low-Density Parity-Check) codes, which correct more errors than the older BCH codes used in pre-2016 drives. LDPC operates in two modes: hard-decision decoding (fast, limited correction) and soft-decision decoding (slower, reads each cell at multiple voltage levels for higher accuracy).

The raw bit error rate (RBER) of NAND rises as the cells degrade: on worn TLC NAND, the tunnel oxide thins and the voltage distributions for each cell state overlap more with each P/E cycle. JEDEC mandates an uncorrectable bit error rate (UBER) of 10⁻¹⁵ or better for consumer SSDs. When the RBER exceeds the LDPC correction ceiling required to maintain that UBER, the page is flagged as uncorrectable.

The controller retries the read using different internal voltage offsets, but these retries use the controller's own default retry tables, which are conservative.

PC-3000 SSD goes further. It allows the technician to set custom retry tables with voltage offsets outside the controller's default range, testing additional voltage levels against marginal pages. This is the difference between a consumer drive that declares a page "unrecoverable" and a professional tool that finds a voltage window where the page resolves.
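
The effect of a wider retry table can be shown with a toy simulation. This is not PC-3000's interface or any controller's algorithm: it models a single-threshold (SLC-style) page, uses made-up cell voltages, and lets a comparison against known-good bits stand in for the ECC decoder's pass/fail result. All names and values are hypothetical.

```python
def read_page(cell_mv, reference_mv):
    """Sense each cell against one reference: below reads 1 (erased), else 0."""
    return [1 if v < reference_mv else 0 for v in cell_mv]


def sweep_read_retry(cell_mv, true_bits, offsets_mv, correctable_bits,
                     base_reference_mv=0):
    """Return the first offset whose raw error count fits ECC capacity, else None."""
    for offset in offsets_mv:
        bits = read_page(cell_mv, base_reference_mv + offset)
        errors = sum(b != t for b, t in zip(bits, true_bits))
        if errors <= correctable_bits:
            return offset
    return None


# Hypothetical page: programmed cells (bit 0) have leaked charge and drifted
# downward, two of them below the default 0 mV reference.
TRUE_BITS = [1, 0, 1, 0, 0, 1, 0, 1]
CELL_MV = [-300, 40, -280, -60, 60, -310, -80, -290]

DEFAULT_TABLE = [0, -25, 25]                           # conservative offsets
EXPANDED_TABLE = [0, -25, 25, -50, 50, -100, 100]      # wider sweep

if __name__ == "__main__":
    print(sweep_read_retry(CELL_MV, TRUE_BITS, DEFAULT_TABLE, correctable_bits=1))
    print(sweep_read_retry(CELL_MV, TRUE_BITS, EXPANDED_TABLE, correctable_bits=1))
```

With the conservative table every attempt leaves two raw errors against a one-bit correction budget, so the page is declared uncorrectable; the expanded table reaches an offset where the page reads cleanly.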

Program Disturb Thresholds, RBER Progression, & The Recoverable-vs-Terminal Decision

Read disturb gets most of the attention because the read pass voltage is applied billions of times across the life of a drive. Program disturb is the quieter sibling: fewer events, far higher voltages, and a different mitigation path. The two failure modes look similar in the SMART log but require different read-retry strategies on the bench, so the distinction matters before any voltage offset sweep begins.

Program Disturb: Vpass on Unselected Word Lines

During a program operation the selected word line receives a program pulse near 20V (Vpgm) to drive Fowler-Nordheim tunneling into the charge trap of the target cell. To keep a current path open through the string, every unselected word line must be held in a conducting state, which requires a pass voltage (Vpass) on those gates. Vendor literature pins the working Vpass window at roughly 10V to 14V for 3D TLC.

That window is a compromise between two failure modes. If Vpass is set too high (above ~14V), the unselected cells in the selected string see enough field to inject electrons inadvertently. This is Vpass-mode disturb.

If Vpass is set too low (below ~10V), the unselected strings cannot boost their channels high enough to inhibit programming, and the potential difference between Vpgm and the poorly-boosted channel causes weak injection on cells that should not be written. This is boosting-mode disturb. Both shift the threshold voltage on cells that were never targeted by the host write.

3D charge-trap NAND adds two architectural wrinkles. Sub-block partitioning introduces Mode Y and Mode XY stress patterns that planar NAND does not see; a cell on a shared word line can be hit n-1 times by neighboring sub-block programs before its own block is erased. The continuous silicon-nitride trap layer also permits lateral charge migration along the pillar, so electrons injected on a stressed cell can drift vertically and alter the Vth of adjacent word lines without any further programming activity.

How Program Disturb Differs From Read Disturb

Property	Program Disturb	Read Disturb
Driving voltage	Vpgm ~20V on selected WL; Vpass 10-14V on unselected	Vread (a few V) on unselected WL during sensing
Event frequency	Once per page program in the block	Once per sensing of any page in the block
Victims	Unselected WLs in the selected & neighboring sub-blocks	Unselected WLs sharing the read string
Reset condition	Cleared by the next block erase	Cleared by the next block erase
Bench signature	Vth shift on cells that the host never wrote in this program cycle	Vth shift concentrated on hot-read pages within a heavily-read block

The recovery implication is direct. Read-disturb damage tends to cluster around hot-read LBAs, so a read-retry sweep on PC-3000 SSD can be biased toward those pages first. Program-disturb damage is distributed across the unselected WLs of the programming sequence, so the voltage offset sweep has to step through the full block instead of targeting a known hot page.

RBER Progression on 3D TLC: From Fresh to Beyond-Rated

Raw bit error rate on planar NAND scales roughly linearly with P/E cycles. 3D TLC does not. IEEE characterization work (Luo, Cai, et al.) shows RBER following a power-law relationship in cycles, modulated by retention time. The headline figures from the literature, expressed as P/E cycle consumption against typical RBER on 3D TLC at short-to-moderate retention:

P/E Cycle State	Typical RBER (3D TLC)	Controller Behavior
Fresh (0% of rated endurance)	Near zero; dominated by sporadic defects	LDPC hard-decision sufficient
~50% of rated endurance	10⁻⁵ to 10⁻⁴ range	LDPC hard-decision sufficient
100% of rated endurance (1,000-3,000 cycles consumer TLC)	10⁻⁴ to 10⁻³	Hard-decision near ceiling; soft-decision begins triggering
~200% of rated endurance (experimental over-cycling)	Approaching 10⁻²	Soft-decision LDPC at its mathematical ceiling

Two thresholds anchor recovery decisions. LDPC hard-decision decoding fails near RBER 10⁻³, the point where the decoder either hits an error floor or fails to converge. Soft-decision LDPC, which reads each cell at multiple shifted voltage references and feeds log-likelihood ratios back into the decoder, holds up to roughly RBER 10⁻² (a 1% raw bit error rate). Past that, the controller logic alone cannot resolve the page.
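
Those two thresholds translate into a simple classification. The sketch below treats them as hard cut-offs for illustration only; the text describes both as approximate, and the function and constant names are not from any tool.

```python
HARD_DECISION_CEILING = 1e-3   # approximate RBER where hard-decision LDPC fails
SOFT_DECISION_CEILING = 1e-2   # approximate RBER where soft-decision LDPC fails


def ldpc_tier(rber: float) -> str:
    """Classify a measured raw bit error rate by the decoding mode it needs."""
    if rber < 0:
        raise ValueError("RBER cannot be negative")
    if rber < HARD_DECISION_CEILING:
        return "hard-decision"
    if rber <= SOFT_DECISION_CEILING:
        return "soft-decision"
    return "beyond controller ECC"
```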

Layer-to-layer variance complicates the picture. The deep trench etch used for BiCS and V-NAND pillars produces different hole diameters at the top, middle, and bottom of the stack, so RBER is not uniform across word lines in the same block.

Published measurements on a 96-layer 3D TLC part (Huang, Wu, et al.) show the weakest layer reaching an RBER up to 49.2x the strongest layer after only 1,000 P/E cycles, expanding on the earlier Luo, Cai, et al. characterization of 30-to-40 layer parts that documented a roughly 21x variance. On the bench this means a single voltage offset will not rescue an entire block; the read-retry sweep has to be layer-aware, isolating the WLs that crossed the LDPC ceiling first.

Retention compounds wear. JEDEC JESD218A specifies a 52-week client retention target at end of rated life. A drive over-cycled to 200% of rated endurance can read clean moments after programming and still cross the uncorrectable threshold within weeks of powered-off storage, because the tunnel oxide damage accelerates shallow-trap de-trapping near the storage interface.

Recovery scheduling matters: a drive described by the customer as "the SSD that has been sitting in a drawer for a year" behaves very differently from one pulled the same day.

Recoverable vs Terminal: Decision Matrix

The two physics curves above (Vpass-induced Vth shifts, and the LDPC ceiling at RBER 10⁻²) intersect with two hardware conditions (controller alive or dead, drive encrypted or not) to produce a finite set of recovery pathways. This matrix is what the bench actually uses when triaging a drive.

Degradation Mode & Severity	Controller State	Encryption	Recovery Method
Program or read disturb; RBER below 10⁻³; default ECC failing on specific pages	Alive, firmware healthy	Any	In-situ PC-3000 SSD read-retry with custom voltage offsets outside the controller's default table; original LDPC engine handles decoding
Worn TLC at 100-200% rated endurance; RBER between 10⁻³ and 10⁻²	Alive, firmware healthy	Any	In-situ PC-3000 SSD layer-aware voltage sweep; rely on controller's soft-decision LDPC and AES pipeline; image to disk before further degradation
Firmware panic (SATAFIRM S11, 0 MB, 2 MB capacity, BSY); NAND wear any level	Hardware alive; microcode corrupted	Any (controller decrypts when revived)	PC-3000 SSD Technological Mode: vendor-specific commands load a volatile LDR into controller SRAM, bypass the corrupted FTL, then read through the original LDPC and AES pipeline
Any wear state	Dead controller (shorted PMIC, blown rail, no enumeration)	AES-256 hardware-bound	Board-level repair only: FLIR thermal localization, Hakko FM-2032 with Atten 862 hot air for component replacement, Zhuo Mao BGA rework if the controller package itself needs reflow. Chip-off yields ciphertext.
Any wear state	Controller silicon destroyed	AES-256 hardware-bound	Unrecoverable. The Media Encryption Key is bound to the destroyed controller's hardware-unique root and never left it; raw NAND reads return AES-256 ciphertext with no path to plaintext.
RBER above 10⁻² on cleartext drive	Controller silicon destroyed	None / disabled	Chip-off NAND raw read with external FTL and XOR reconstruction; yield is limited by the absence of the controller's calibrated layer-specific read-retry tables
RBER below 10⁻² on cleartext drive	Controller silicon destroyed	None / disabled	Chip-off NAND raw read with external ECC reconstruction; generic or vendor FTL maps applied in software

The pattern under the table is straightforward. If the original controller can be revived, the LDPC engine and the AES pipeline come back with it, and the recovery is a voltage-offset problem. If the controller is permanently dead and the drive is encrypted, the only honest answer is board-level repair or no recovery at all. Chip-off remains useful but is reserved for drives whose firmware predates always-on hardware encryption, or for non-encrypted industrial parts.
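
The matrix can be read as a lookup. The sketch below encodes its rows as a function for illustration; the enum and function names are hypothetical, the RBER boundaries are treated as hard cut-offs, and any combination the table does not list is reported as such instead of guessed.

```python
from enum import Enum


class Controller(Enum):
    HEALTHY = "alive, firmware healthy"
    FIRMWARE_PANIC = "hardware alive, microcode corrupted"
    DEAD_REPAIRABLE = "dead controller, board-level fault"
    DESTROYED = "controller silicon destroyed"


NOT_COVERED = "not covered by the matrix"


def recovery_pathway(controller: Controller, encrypted: bool, rber: float) -> str:
    """Map a triage result onto the recovery method named in the matrix."""
    if controller is Controller.HEALTHY:
        if rber < 1e-3:
            return "in-situ read-retry with custom voltage offsets"
        if rber <= 1e-2:
            return "in-situ layer-aware voltage sweep"
        return NOT_COVERED
    if controller is Controller.FIRMWARE_PANIC:
        # Encryption and wear level do not change this row.
        return "Technological Mode loader, then read through original pipeline"
    if controller is Controller.DEAD_REPAIRABLE:
        return "board-level repair" if encrypted else NOT_COVERED
    # Controller silicon destroyed.
    if encrypted:
        return "unrecoverable"
    if rber > 1e-2:
        return "chip-off with external FTL and XOR reconstruction (limited yield)"
    return "chip-off with external ECC reconstruction"
```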

SATA SSD recovery falls within the $200–$1,500 range depending on which row of the matrix applies. NVMe SSDs span $200–$2,500 for the same reason: a Technological Mode firmware case is priced differently from a dead-PMIC board repair, and both are priced differently from a chip-off with external ECC reconstruction. No data, no recovery fee applies across all pathways.

SMART Attributes That Indicate NAND Degradation

SMART monitoring provides early warning of NAND degradation. Not all controllers expose the same attributes, and interpretation varies by manufacturer. The table covers Reallocated Sector Count, Available Reserved Space, SSD Wear Leveling Count, Uncorrectable Error Count, Percentage Lifetime Used, Media Wearout Indicator, and Total LBAs Written.

SMART ID	Attribute	Concern Threshold	What It Means
5	Reallocated Sector Count	Any non-zero value	NAND blocks retired and replaced from the spare pool. A rising count means active degradation.
170	Available Reserved Space	Below 10%	Spare block pool nearly exhausted. No room to remap further failures.
173	SSD Wear Leveling Count	Vendor-specific	Average P/E cycle count across all blocks. Compare to rated endurance.
187	Uncorrectable Error Count	Any non-zero value	Errors that exceeded the controller's ECC capacity. Direct evidence of degradation past the correction limit.
202	Percentage Lifetime Used	Above 90%	Counts up from 0 to 100. Values above 90% indicate the tunnel oxide is near end-of-life.
233	Media Wearout Indicator	Below 10	Counts down from 100 to 0. Near-zero values mean the NAND has consumed its rated endurance.
241	Total LBAs Written	Compare to TBW rating	Total host writes. If approaching or exceeding the manufacturer's TBW rating, expect degradation.

SMART data is a guide, not a guarantee. Some drives fail from firmware bugs or power events with perfect SMART readings. Others exceed their rated TBW by 2x with no issues.

SMART helps the lab estimate how much read retry tuning the recovery will require.
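
The concern thresholds from the table can be applied mechanically once the attribute values have been read out. The sketch below assumes the values are already available as a dictionary keyed by SMART ID (for example, transcribed from CrystalDiskInfo or smartmontools output); it does not parse any tool's output. Attributes 173 and 241 are left out because judging them requires the vendor's rated endurance.

```python
def degradation_flags(smart: dict) -> list:
    """Return the names of attributes that cross the table's concern thresholds.

    `smart` maps SMART attribute ID -> value, interpreted the way the table
    describes each attribute. Missing attributes are ignored.
    """
    checks = {
        5: ("Reallocated Sector Count", lambda v: v > 0),
        170: ("Available Reserved Space", lambda v: v < 10),
        187: ("Uncorrectable Error Count", lambda v: v > 0),
        202: ("Percentage Lifetime Used", lambda v: v > 90),
        233: ("Media Wearout Indicator", lambda v: v < 10),
    }
    return [name for attr_id, (name, is_concern) in checks.items()
            if attr_id in smart and is_concern(smart[attr_id])]
```

Because interpretation varies by manufacturer, a result from a check like this is a prompt for closer inspection, not a verdict.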

PC-3000 SSD Recovery Workflow for Degraded NAND

The PC-3000 SSD module provides controller-specific access to the internal firmware command set. For degraded NAND, the primary capabilities are read retry table manipulation and direct NAND page addressing. The workflow varies by controller family, but the core approach is consistent across Phison, Silicon Motion, Samsung, and Marvell platforms.

Halt background operations. PC-3000 sends vendor-specific commands to disable garbage collection, wear leveling, and TRIM execution. On Phison controllers, this is done through the vendor-specific SATA/NVMe command set that enters "vendor mode." Silicon Motion controllers use a separate "ISP mode" entry. This prevents the controller from erasing blocks or rewriting the FTL during imaging.
Read retry table expansion. The controller's default read retry table contains a fixed set of voltage offsets it applies when a page read fails ECC. PC-3000 replaces this table with an expanded set that tests more voltage levels across a wider range than the controller would attempt on its own.
Soft-decision read activation. For controllers that support it, PC-3000 forces the LDPC decoder into soft-decision mode, where each cell is read at 3-7 voltage levels instead of a single threshold. The probability distribution of each bit state feeds the LDPC decoder, which achieves higher correction rates than hard-decision reads. This is the same technique the controller uses internally, but PC-3000 makes the read voltages configurable.
Block-by-block imaging. Rather than a sequential full-surface read, PC-3000 images blocks categorized by their error rate. Low-error blocks image first (fastest, highest yield). Marginal blocks are imaged with progressively more aggressive retry settings. Unreadable blocks are flagged for thermal-assisted reads or skipped entirely if no voltage window resolves them.
Composite image assembly. Sectors recovered across all passes and parameter sets are merged into a single image. Cross-references between the FTL map and physical NAND addressing resolve logical-to-physical mapping for any sectors read outside the normal controller pipeline.
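
The final assembly step can be sketched as a merge over per-pass results. This is a simplified illustration, not the tool's actual logic: it assumes each pass is a dictionary of sector index to recovered bytes, and it keeps the first successful read of each sector, which is a design choice made here for the example.

```python
def merge_passes(passes: list, sector_count: int):
    """Merge imaging passes into one image and list the sectors still missing.

    `passes` is ordered from the first (least aggressive) pass to the last;
    each entry maps sector index -> bytes for sectors that pass recovered.
    """
    image = [None] * sector_count
    for pass_result in passes:
        for sector, data in pass_result.items():
            # Keep the earliest successful read; ignore out-of-range indices.
            if 0 <= sector < sector_count and image[sector] is None:
                image[sector] = data
    missing = [i for i, data in enumerate(image) if data is None]
    return image, missing
```

The `missing` list is what drives the next round: those sectors go back into the queue for more aggressive retry settings or thermal-assisted reads.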

When degraded NAND is compounded by thermal sensitivity, the workflow integrates with thermal stabilization techniques. The technician applies controlled temperature changes to the NAND packages using hot air rework equipment (Atten 862) while monitoring sector error rates through PC-3000, imaging at the temperature that produces the lowest RBER for each block.

How Does Degradation Severity Map to Recovery Methodology?

NAND degradation isn't binary. The severity of cell wear determines which recovery tools apply and which pricing tier the job falls into. PC-3000 SSD diagnostics categorize a drive's condition into one of three severity levels, each requiring a different technical approach and falling into a different cost range.

Severity 1: Mild Degradation (Firmware-Level Imaging)

The drive is still detected by the host system. The FTL is intact, SMART reports accumulating errors, and read speeds have dropped. The controller's internal ECC still resolves most pages, but marginal sectors cause I/O timeouts that crash consumer software.

Recovery approach: PC-3000 SSD images the drive with adjusted read timeout thresholds, disables background system area logging (which consumes P/E cycles during imaging), and limits the controller's retry count to prevent read disturb accumulation. This is a controlled hardware imaging job. SATA SSD firmware recovery runs $600–$900; NVMe runs $900–$1,200.

Severity 2: Moderate Degradation (FTL Rebuild & Voltage Tuning)

The FTL is corrupted. The drive reports a factory alias: Phison drives show "SATAFIRM S11" with 0MB capacity, Silicon Motion controllers drop to a 1GB or 0MB debug capacity, Samsung drives display a generic model string with no partition table. User data is physically present on the NAND, but the mapping table needed to locate it is gone.

Recovery approach: PC-3000 enters the controller's vendor-specific safe mode, uploads a firmware loader into RAM, and scans raw NAND blocks for surviving metadata markers. The tool performs a virtual FTL rebuild from the raw NAND page metadata. Custom read retry tables with expanded voltage offsets resolve marginal pages that the controller abandoned.

This still falls in the firmware tier: $600–$900 for SATA, $900–$1,200 for NVMe.

Severity 3: Severe Degradation (Thermal-Assisted Reads & Chip-Off)

The FTL has been rebuilt, but massive uncorrectable sectors remain. PC-3000's read retry algorithms can't find a viable voltage window for large portions of the NAND. The raw bit error rate exceeds the LDPC soft-decision correction limit across entire die regions.

Recovery approach: thermal-assisted reads using Atten 862 hot air applied to the NAND packages while PC-3000 monitors RBER per block. Temperature changes shift the threshold voltage distributions and can temporarily open a read window on cells that are unreadable at ambient temperature. If thermal reads can't resolve the data on an unencrypted SATA drive, chip-off extraction with external ECC recalculation is the last resort.

This moves the job into the NAND swap tier: $1,200–$1,500 for SATA, $1,200–$2,500 for NVMe. 50% deposit required; donor drive cost additional.

On drives where hardware AES is actually enabled and bound to the controller (TCG Opal or SED drives with encryption active), chip-off isn't an option. The AES-256 key is generated on and bound to the original controller, so it never leaves it. If the controller can't be revived through board-level repair with FLIR thermal fault localization and Hakko FM-2032 microsoldering, the data is unrecoverable.

Severity	Symptoms	PC-3000 Diagnostic Signs	Recovery Method	Pricing Tier
Mild	Detected by OS, slow reads, rising SMART errors	ECC retries on 5-10% of pages, BBT growing slowly	Hardware imaging with adjusted timeouts	Firmware: SATA $600–$900, NVMe $900–$1,200
Moderate	Factory alias (SATAFIRM S11, 0MB/1GB), FTL corrupted	FTL metadata pages unreadable, 10-30% page retries	Virtual FTL rebuild + custom read retry tables	Firmware: SATA $600–$900, NVMe $900–$1,200
Severe	FTL rebuilt but massive uncorrectable sectors remain	RBER exceeds LDPC soft-decision limit, retry tables exhausted	Thermal-assisted reads, chip-off (unencrypted only)	NAND swap: SATA $1,200–$1,500, NVMe $1,200–$2,500

Free evaluation determines severity before quoting. +$100 rush fee to move to the front of the queue. A donor drive is a matching SSD used for its circuit board. Typical donor cost: $40–$100 for common models, $150–$300 for discontinued or rare controllers.

How Do Major SSD Controller Families Handle NAND Degradation?

Every SSD controller family implements its own NAND management & error correction strategy. When these systems fail or the degradation exceeds their design limits, PC-3000 SSD must replicate, override, or bypass the controller's internal logic to extract data. Understanding what the controller was doing when it failed determines the recovery approach.

Silicon Motion (SM2259, SM2262EN, SM2269XT)

Silicon Motion's NANDXtend ECC engine uses multi-layer LDPC decoding. Hard-decision decoding runs first as the fast path. If that fails, the controller escalates to soft-decision decoding, reading each cell at multiple voltage levels & feeding the probability distribution to the LDPC decoder.

A RAID-like parity layer provides a third redundancy level across die groups.

SMI controllers also run IntelligentScan, a background process that proactively scrubs & repairs degrading blocks before they reach the ECC correction limit. Internal health monitoring registers track block-level wear metrics beyond what standard SMART exposes. PC-3000 SSD's Silicon Motion utility reads these internal registers directly.

Recovery implication: when an SM2259-based drive (WD Green, Crucial BX500) fails with corrupted NAND, PC-3000 must replicate or override the multi-layered NANDXtend ECC rather than a simple single-pass BCH decode. The tool's SMI utility supports entering ISP mode, which provides raw NAND access below the controller's ECC layer, allowing the technician to apply custom correction parameters.

Phison (PS5012, PS5016, PS5019)

Phison controllers implement SmartRefresh, a two-stage background refresh mechanism. The first stage, Dynamic Error Bit Monitoring (DEBM), tracks the correctable error bit count per block. When a block's error count approaches the ECC threshold, DEBM flags it for scrub.

The second stage, Idle-Time Media Scan (ITMS), performs the actual scrub during controller idle periods, reading the flagged block, correcting errors, erasing the block, & rewriting the corrected data.

Phison's SmartFlush manages DRAM cache to NAND flush timing. During normal operation, the FTL mapping table resides in DRAM & is periodically flushed to NAND. SmartFlush coordinates these writes to protect the FTL during unexpected power loss.

A Phison drive that fails during a SmartRefresh scrub or a SmartFlush operation can have partially-relocated blocks: the source block was erased before the destination block write completed.

Recovery implication: PC-3000 SSD's Phison utility enters vendor mode and scans for both source & destination copies of relocated blocks. On PS5012-E12 based drives (Sabrent Rocket, Corsair MP510, Inland Premium), the tool reads the block allocation bitmap to identify in-flight relocations and selects the more complete copy for each logical page. This is Phison-specific logic; the same approach doesn't work on Samsung or Silicon Motion controllers.
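
The copy-selection idea can be illustrated with a small sketch. It is a generic model, not Phison's data layout or PC-3000's implementation: it assumes each copy of a relocated block has been read into a dictionary of logical page to payload, with None marking an unreadable page, and it prefers the destination copy when both are readable.

```python
def pick_relocated_pages(source_pages: dict, destination_pages: dict) -> dict:
    """Choose, per logical page, a readable copy from an interrupted relocation.

    Each argument maps logical page -> bytes, or None if that copy of the
    page could not be read. Pages unreadable in both copies stay None.
    """
    chosen = {}
    for page in sorted(set(source_pages) | set(destination_pages)):
        dest = destination_pages.get(page)
        src = source_pages.get(page)
        # Assumption for this sketch: the destination copy wins when present.
        chosen[page] = dest if dest is not None else src
    return chosen
```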

Samsung (Legacy SATA Controllers)

Samsung's controllers cycle through 16 internal read retry modes when an uncorrectable sector appears. Each mode applies a different voltage reference offset, effectively testing 16 different interpretations of the degraded cell's charge level. If all 16 modes fail to resolve the page, the controller forces a disconnect or reboot rather than returning corrupt data.

This auto-disconnect behavior makes consumer imaging tools useless on degraded Samsung drives. Disk Drill or EaseUS triggers the read, the controller tries all 16 modes, fails, disconnects the drive from the bus, and the software reports a device removal error. The user reconnects and tries again; the same block triggers the same disconnect cycle.

PC-3000 SSD provides vendor-specific diagnostic access to supported Samsung controllers, including configurable read retry parameters. The tool disables the auto-disconnect behavior, holds the controller in a stable diagnostic state, and applies custom voltage offsets beyond the 16 built-in modes. On drives like the Samsung 840 EVO, this approach recovers pages that the controller declared unreadable by testing voltage windows the built-in retry table doesn't cover.

FTL Collapse

FTL Metadata Corruption from NAND Degradation

The Flash Translation Layer (FTL) is the firmware mapping table that translates logical block addresses (how the operating system sees files) to physical NAND page locations (where the electrons are stored). The controller updates this map constantly. Because the FTL metadata pages are written far more often than user data, they degrade faster than the rest of the NAND.

When the NAND blocks storing the FTL degrade past the ECC correction threshold, the drive loses its map. The user data is still physically present on the NAND chips, but the controller can no longer locate it. The controller enters a diagnostic state: Phison PS3111-based drives report as "SATAFIRM S11" with 0MB capacity; Silicon Motion controllers drop to a 1GB or 0MB debug capacity; Samsung drives may show a generic "SAMSUNG" model string with no partition table.

This is the most common manifestation of NAND degradation in the lab. PC-3000 SSD handles it by uploading a firmware loader into the controller's RAM, bypassing the corrupted on-NAND firmware. The loader provides direct access to the physical NAND blocks.

PC-3000 then scans the raw blocks, locates surviving metadata markers, and reconstructs a virtual translator. The virtual map replaces the corrupted FTL, allowing the data to be imaged sector by sector to a target drive. See our firmware corruption recovery page for controller-specific details.
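As a rough picture of what rebuilding a translator involves, the sketch below scans raw pages and keeps the newest physical location for each logical address. It assumes each page carries a readable tag with a logical block address and a write sequence number; those field names ("lba", "seq") and the tuple addresses are hypothetical, and real controllers use their own vendor-specific metadata formats.

```python
def build_virtual_translator(raw_pages):
    """Rebuild an LBA -> physical address map from surviving page tags.

    raw_pages yields (physical_addr, tag), where tag is a dict with
    hypothetical 'lba' and 'seq' fields, or None if the tag is unreadable.
    """
    translator = {}
    newest_seq = {}
    for phys, tag in raw_pages:
        if tag is None:
            continue   # marker lost to bit errors; nothing to map
        lba, seq = tag["lba"], tag["seq"]
        # Several physical pages may claim one LBA; the latest write wins.
        if lba not in newest_seq or seq > newest_seq[lba]:
            newest_seq[lba] = seq
            translator[lba] = phys
    return translator


pages = [
    ((0, 0), {"lba": 10, "seq": 1}),   # stale copy of LBA 10
    ((0, 1), None),                    # unreadable tag
    ((1, 0), {"lba": 10, "seq": 5}),   # current copy of LBA 10
    ((1, 1), {"lba": 11, "seq": 2}),
]
translator = build_virtual_translator(pages)
print(translator)
```

Imaging then walks the logical address range in order and reads each sector from the physical location the rebuilt map points to.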

Wear-Leveling Collapse & Read-Only Lockout

Why Does a Worn SSD Drop Into Read-Only Mode?

A worn SSD drops into read-only mode when its spare-block pool reaches the firmware floor and program/erase operations keep failing. The controller stops accepting writes to freeze the translator in its last good state and protect whatever data still resolves, so the drive mounts but rejects every save, delete, or format.

Read-only is a hardware-level firmware defense, not an operating-system permissions bug or a file-system flag you can clear. Disk management, diskpart, and the "remove write protection" registry edits people try first all talk to the controller, and a controller in this state declines the write no matter who asks. The decision is made below the OS.

How Does Degraded NAND Corrupt the Inputs to Wear Leveling?

The wear-leveling algorithm is only as good as the bookkeeping it reads. The FTL map, the bad-block table, and the per-block erase counters live in a reserved service area on the NAND itself, not in a separate ROM, so the same degradation that wears user data eventually wears the metadata that steers the leveling logic.

Once uncorrectable bit errors land in that service area, the controller reads corrupted erase counters and stale valid-page bitmaps, then makes wrong decisions from them. It routes fresh writes toward blocks it believes are healthy but are already near their P/E ceiling, and it prematurely relocates cold data off blocks that did not need migrating. The degradation cascade described earlier on this page shows how this looks in PC-3000 surface scans during SSD data recovery.

Effective leveling needs a buffer of known-good free blocks. When the controller loses reliable block-health tracking, it falls into a relocation loop: each wrong move burns another erase cycle on already-marginal blocks, which produces more bad metadata, which drives more wrong moves. That loop accelerates FTL map churn until the translator becomes incoherent and the firmware can no longer load its own map on the next power-up. The decision logic itself is what we trace through wear leveling recovery, separate from raw cell wear.

What Triggers the Firmware to Lock Writes?

The controller does not lock writes on a timer. It locks them when its own defense mechanisms run out of room. Three functional conditions push it there: the reserved spare-block pool hits the manufacturer-defined critical floor, the bad-block table overflows from cumulative wear, and program or erase operations start failing persistently rather than intermittently.

When those conditions stack up, the firmware halts normal writes and falls back to a hardcoded safe-mode identity or an emergency read-only lock, both meant to freeze the FTL before more of it is overwritten.
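The three conditions can be written down as a simple check. This is a sketch under assumptions, not vendor firmware: the field names, the eight-operation failure window, and the rule that the lock fires when the spare floor and persistent P/E failure coincide are all illustrative choices.

```python
from dataclasses import dataclass, field


@dataclass
class WearState:
    spare_blocks_free: int
    spare_floor: int            # manufacturer-defined critical floor
    bad_block_entries: int
    bad_block_capacity: int
    recent_pe_ok: list = field(default_factory=list)  # True = success, newest last


def tripped_conditions(s, failure_window=8):
    """Return the names of the wear conditions that currently hold."""
    tripped = []
    if s.spare_blocks_free <= s.spare_floor:
        tripped.append("spare_floor")
    if s.bad_block_entries >= s.bad_block_capacity:
        tripped.append("bad_block_table_overflow")
    window = s.recent_pe_ok[-failure_window:]
    # "Persistent" is modeled as a full window of consecutive failures.
    if len(window) == failure_window and not any(window):
        tripped.append("persistent_pe_failure")
    return tripped


def should_lock_writes(s):
    # Assumed rule: spare pool at the floor plus persistent P/E failure.
    t = tripped_conditions(s)
    return "spare_floor" in t and "persistent_pe_failure" in t


worn = WearState(2, 4, 100, 128, [False] * 8)
healthy = WearState(40, 4, 3, 128, [True] * 8)
print(tripped_conditions(worn), should_lock_writes(worn))
print(tripped_conditions(healthy), should_lock_writes(healthy))
```

An intermittent failure pattern (a window with at least one success) does not trip the third condition, matching the distinction between persistent and intermittent failures.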

On the SMART side, this tracks with Percentage Lifetime Used (SMART 202) climbing toward 100, which counts the percentage of the drive's rated write endurance that has been consumed. Reaching 100% used does not flip the read-only switch by itself; the lock fires on the reserve-pool floor and persistent P/E failure described above. The reserve-space and wear-leveling attributes (SMART 170 and 173) and the wearout indicator (SMART 233) report the underlying depletion, but no single published register value is the documented trigger.
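A quick screening of these attributes might look like the sketch below. The thresholds are assumptions for illustration (the 95% figure for attribute 202 comes from the FAQ on this page; the other two limits are invented), and attribute numbering and value direction vary by vendor, so check the drive's own documentation before relying on any of them.

```python
def wear_warnings(attrs, lifetime_used_limit=95, wearout_low=5, reserve_low=10):
    """Return the SMART attribute ids that suggest heavy NAND wear.

    attrs maps attribute id -> value as reported for that attribute.
    All thresholds are illustrative assumptions, not published triggers.
    """
    flagged = []
    if attrs.get(202, 0) > lifetime_used_limit:   # Percentage Lifetime Used
        flagged.append(202)
    if 233 in attrs and attrs[233] <= wearout_low:   # Media Wearout Indicator
        flagged.append(233)
    if 170 in attrs and attrs[170] <= reserve_low:   # Available Reserved Space
        flagged.append(170)
    return flagged


sample = {202: 97, 233: 2, 170: 40}
print(wear_warnings(sample))
```

A flagged attribute is a reason to stop writing to the drive and image it, not proof that the read-only lock is about to fire.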

If your drive is in this state, power it off and leave it off. Do not format it, do not run an erase or "clean" command, and do not keep force-mounting a read-only volume; each attempt feeds the relocation loop and can push surviving service-area metadata past recovery.

A worn drive that locks read-only usually falls in the firmware tier, $600–$900 for SATA SSD and $900–$1,200 for NVMe, because PC-3000 SSD has to enter vendor mode and rebuild the translator from surviving metadata. If the controller is also dead, that becomes firmware corruption recovery after board repair. A donor drive is a matching SSD used for its circuit board; typical donor cost is $40–$100 for common models and $150–$300 for discontinued or rare controllers. Add $100 for rush service, which moves the job to the front of the queue.

Frequently Asked Questions

What is read disturb and how does it cause data loss?

Read disturb is an unintended side effect of reading NAND flash. Every read applies a read voltage to the selected word line and a higher pass voltage to the unselected word lines in the same block. That pass voltage weakly programs the unselected cells. Over millions of reads, this accumulates enough charge to shift their threshold voltage, flipping bits. The effect is cumulative and irreversible without erasing and reprogramming the block. Drives that serve heavy read workloads (database servers, surveillance systems) are most vulnerable.

How long can a powered-off SSD retain data?

JEDEC standard JESD218A specifies that a consumer SSD at end-of-life retains data for 52 weeks at 30 degrees Celsius storage temperature. At 40 degrees, retention drops to roughly 13 weeks under the Arrhenius model (activation energy 1.1 eV for planar NAND). Enterprise SSDs are rated for 3 months at 40 degrees. TLC and QLC cells retain charge for shorter periods than SLC or MLC because they store more bits per cell with narrower voltage margins. A drive stored in a hot environment (attic, parked car) loses data faster than one stored at room temperature.
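The 40-degree figure can be checked with the Arrhenius acceleration factor, AF = exp((Ea / k) * (1/T_ref - 1/T_hot)), with temperatures in kelvin. The short Python sketch below uses the 1.1 eV activation energy and the 52-week baseline quoted above; the function name is ours, and treating retention as scaling by exactly 1/AF is a simplifying assumption.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5   # Boltzmann constant in eV/K


def acceleration_factor(ea_ev, t_ref_c, t_hot_c):
    """Arrhenius acceleration factor between two temperatures in Celsius."""
    t_ref = t_ref_c + 273.15
    t_hot = t_hot_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1 / t_ref - 1 / t_hot))


af = acceleration_factor(1.1, 30, 40)
weeks_at_40 = 52 / af   # assumes retention scales inversely with AF
print(f"AF = {af:.2f}, retention at 40 C = {weeks_at_40:.1f} weeks")
```

A 10-degree rise speeds charge loss by a factor of a little under four, which puts retention at 40 degrees between 13 and 14 weeks, in line with the figure above. The same function shows why a parked car in summer is far worse than a warm room.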

How much does NAND degradation recovery cost?

SATA SSD recovery ranges from $200–$1,500. NVMe SSD recovery ranges from $200–$2,500. Degraded NAND typically falls into the firmware recovery tier ($600–$900 for SATA, $900–$1,200 for NVMe) because it requires PC-3000 low-level reads with custom read retry parameters. If the controller is also damaged, board-level repair adds circuit board costs. Free evaluation, firm quote before work begins. No data, no fee. +$100 rush fee to move to the front of the queue.

Does TRIM accelerate NAND degradation?

TRIM itself does not degrade NAND. TRIM tells the controller which logical blocks are no longer in use, allowing the controller to erase those physical blocks during garbage collection. Erasing a block consumes one P/E cycle. The indirect effect: aggressive TRIM combined with frequent file creation and deletion increases the rate of block erases. The direct accelerator of NAND degradation is write amplification from garbage collection, not the TRIM command itself.

Can a degraded SSD that has dropped into TRIM/RZAT lockup still be recovered?

When a worn drive enters Deterministic Read Zero after TRIM (RZAT) lockup, the controller returns zeros for any LBA mapped to a trimmed-but-not-yet-erased block. The user data may still be physically present on the NAND array if garbage collection hasn't completed the block erase, but the controller refuses to expose it through the standard read pipeline. PC-3000 SSD bypasses the FTL by entering vendor diagnostic mode on the controller and reading raw NAND pages directly, then reconstructing the logical-to-physical map from surviving metadata. Recoverability depends on how many trimmed blocks have already been physically erased by background garbage collection. Power the drive off the moment you suspect data loss; every minute of idle power draw consumes more recoverable blocks. SATA SSD recovery ranges from $200–$1,500; NVMe SSD recovery ranges from $200–$2,500. A donor drive is a matching SSD used for its circuit board. Typical donor cost: $40–$100 for common models, $150–$300 for discontinued or rare controllers.

Why did my SSD suddenly go read-only?

A sudden switch to read-only on a worn SSD means the controller has run out of room to manage wear. When the reserved spare-block pool hits the firmware floor and program/erase operations keep failing, the firmware halts writes and freezes the translator to protect data that still resolves. This is a hardware-level firmware defense, not a Windows or macOS permissions error, so registry edits and "remove write protection" steps cannot clear it. Power the drive off and avoid formatting or force-mounting it; each write attempt can push surviving service-area metadata past recovery. Worn read-only drives usually fall in the firmware tier ($600–$900 for SATA, $900–$1,200 for NVMe) because PC-3000 SSD must enter vendor mode and rebuild the translator from surviving metadata.

Does my SSD's controller brand change whether degraded NAND can be recovered?

Yes. Controller architecture determines which diagnostic interfaces are available, which read retry mechanisms can be overridden, and whether wear-leveling metadata can be reconstructed. Phison controllers (PS5012, PS5016) expose vendor mode through their SATA/NVMe command set, allowing PC-3000 SSD to upload a volatile microcode loader into SRAM and rebuild the translator from surviving NAND metadata. Silicon Motion controllers (SM2259, SM2262EN) require Safe Mode (ROM Mode) entry to bypass corrupted firmware and upload a diagnostic loader into controller RAM. Legacy Samsung controllers limit internal read retries on degraded pages, which is why consumer imaging tools fail; PC-3000 SSD issues vendor-specific commands to adjust NAND read voltage thresholds and extract marginal data through the controller. Drives built on controllers without published vendor-mode support are markedly harder to recover from heavy wear, and chip-off extraction is the fallback only on unencrypted SATA drives.
