What Is Wear Leveling, and Why Does It Cause Data Loss?
Every NAND cell in your SSD can only be erased and rewritten a limited number of times before it wears out. Wear leveling is the controller's way of spreading those writes across all the cells so they age together instead of one spot burning out first. It is the reason a modern SSD lasts years instead of weeks.
The problem is what happens at the end. Wear leveling delays failure; it does not prevent it. Once enough cells reach their cycle limit, the controller starts retiring failed blocks and replacing them from a hidden pool of spare blocks. That pool is finite. When it empties, the controller has no replacement left for the next failure, and the drive stops behaving normally.
You see this as a drive that suddenly turns read-only, slows to a crawl, corrupts files without warning, or vanishes from the operating system while still showing up in BIOS. The data is usually intact on the chips. What broke is the controller's ability to manage and locate it, because the wear-leveling and mapping machinery ran out of room.
Consumer recovery software cannot fix this. It sends ordinary read commands and waits for the controller to answer. A worn controller that has dropped to read-only or lost its map returns errors or zeros, and no program running on top of the operating system can change how the controller reads degraded cells. Professional tools like PC-3000 SSD talk to the controller through diagnostic channels and change those read parameters directly. This sits alongside our NAND degradation recovery work, since wear-out is what drives both.
How Do You Know Wear Leveling Has Failed?
Wear-leveling failure announces itself before total death. A drive whose spare pool is draining shows a recognizable pattern: read-only lockout, long pauses on read, silent file corruption, or a drive that appears in BIOS while the file system refuses to mount.
- Read-only lockout. The drive switches itself to read-only the moment the reserved block pool runs dry. Files are visible but nothing can be saved, deleted, or changed. This is the single most common state that arrives for SSD data recovery on heavily-written consumer drives.
- Stalls on read. The drive freezes for seconds at a time as the controller retries failing pages and hunts the spare pool for a remap target. Uncorrectable Error Count (SMART 187) climbs alongside the slowdown.
- Silent file corruption. Files open but contain garbled data, truncated images, or zero-filled regions. The controller returned data from a worn cell whose voltage state it misread, and ECC could not fully correct it.
- BIOS sees it, the OS does not. The drive enumerates with the right model and capacity, but the operating system cannot mount the file system. The wear-leveling tables or FTL metadata blocks degraded past correction, so the controller cannot rebuild its map.
- SMART wear flags. CrystalDiskInfo or smartmontools reports a depleted Available Reserved Space (SMART 170), a Percentage Lifetime Used (SMART 202) indicating exhaustion (above 90% on drives that count up, or below 10% on drives that count down), or an SSD Wear Leveling Count (SMART 173) near its rated ceiling.
If the drive shows any of these, power it off. Continued reads accelerate garbage collection and can trigger block erases that turn recoverable data into permanently lost data.
How We Recover Data After Wear-Leveling Failure
Wear-leveling recovery is a firmware-level and board-level process, not a mechanical one. SSD recovery happens at the soldering station and through diagnostic firmware access. PC-3000 SSD enters the controller's diagnostic mode and reads worn NAND with parameters the controller would never apply on its own.
- Free evaluation. We read the drive's SMART data, identify the controller, and determine whether the failure is firmware-level or also involves a dead board. You receive a firm quote before any work begins.
- Halt background operations. The first diagnostic command stops garbage collection, wear-leveling relocation, and TRIM enforcement so the controller stops erasing recoverable blocks while we work.
- Baseline error mapping. A surface scan classifies every block as readable, marginal, or unreadable at default settings, revealing how far the wear has spread.
- Read-retry and voltage tuning. For marginal and unreadable blocks, PC-3000 expands the read-retry table and shifts voltage reference thresholds, testing voltage windows the controller's conservative defaults never reach.
- Multi-pass imaging. The drive is imaged across several passes, each tuned to a different error threshold. Blocks recovered in later passes fill the gaps left by earlier ones.
- FTL reconstruction and delivery. The composite image is assembled, the logical-to-physical map is rebuilt where the wear table was lost, and the file system is parsed. You receive a file listing before final delivery to a new drive.
If the controller itself is dead, not just worn, board-level microsoldering comes first. We localize the failed component with FLIR thermal imaging, replace shorted voltage regulators or PMICs with a Hakko FM-2032, and reflow the controller package on a Zhuo Mao BGA station when needed. On a drive with hardware encryption active, this is the only path: the key is bound to the original controller, so the board has to live again before any data is readable.
SSD Recovery Pricing
Wear-leveling recovery is covered by our standard SSD recovery pricing tiers. A worn drive that still loads its FTL and images with adjusted read parameters usually lands in the firmware recovery tier. SATA SSD recovery ranges from $200–$1,500; NVMe SSD recovery ranges from $200–$2,500.
Free evaluation, firm quote, no data = no charge. +$100 rush fee to move to the front of the queue. Cases that require a donor PCB carry an additional donor cost. A donor drive is a matching SSD used for its circuit board; typical donor cost is $40–$100 for common models and $150–$300 for discontinued or rare controllers.
| Tier | Complexity | What You See | Technical Cause | Notes | Price | Turnaround |
|---|---|---|---|---|---|---|
| Simple Copy | Low | Your drive works, you just need the data moved off it | Functional drive; data transfer to new media | Rush available: +$100 | $200 | 3-5 business days |
| File System Recovery | Low | Your drive isn't showing up, but it's not physically damaged | File system corruption. Visible to recovery software but not to OS | Starting price; final depends on complexity | From $250 | 2-4 weeks |
| Circuit Board Repair | Medium | Your drive won't power on or has shorted components | PCB issues: failed voltage regulators, dead PMICs, shorted capacitors | May require a donor drive (additional cost) | $450–$600 | 3-6 weeks |
| Firmware Recovery (most common) | Medium | Your drive is detected but shows the wrong name, wrong size, or no data | Firmware corruption: ROM, modules, or system files corrupted | Price depends on extent of bad areas in NAND | $600–$900 | 3-6 weeks |
| PCB / NAND Swap | High | Your drive's circuit board is severely damaged and requires NAND chip transplant to a donor PCB | NAND swap onto donor PCB. Precision microsoldering and BGA rework required | 50% deposit required; donor drive cost additional | $1,200–$1,500 | 4-8 weeks |
Hardware Repair vs. Software Locks
Our "no data, no fee" policy applies to hardware recovery. We do not bill for unsuccessful physical repairs. If we replace a hard drive read/write head assembly or repair a liquid-damaged logic board to a bootable state, the hardware repair is complete and standard rates apply. If data remains inaccessible due to user-configured software locks, a forgotten passcode, or a remote wipe command, the physical repair is still billable. We cannot bypass user encryption or activation locks.
No data, no fee. Free evaluation and firm quote before any paid work. NAND swap requires a 50% deposit because donor parts are consumed in the attempt.
- Rush fee. +$100 rush fee to move to the front of the queue.
- Donor drives. A donor drive is a matching SSD used for its circuit board. Typical donor cost: $40–$100 for common models, $150–$300 for discontinued or rare controllers.
- Target drive. The destination drive we copy recovered data onto. You can supply your own or we provide one at cost plus a small markup. All prices are plus applicable tax.
Estimate Your SSD Recovery Cost
Final pricing comes after a free evaluation at our Austin, TX lab. Not sure which type of SSD you have? Call (512) 212-9111 and we can help identify it.
How Wear Leveling Works: Static, Dynamic, and the Wear-Leveling Table
Wear leveling lives inside the Flash Translation Layer, the firmware that maps logical block addresses the operating system uses to the physical NAND pages where electrons actually sit. Because the FTL already brokers every read and write, it is the natural place to decide which physical block receives each write, and that decision is wear leveling.
The controller maintains a wear-leveling table: a per-block record of erase counts. Every erase increments a counter. The leveling logic reads those counters to pick targets, and the table itself is stored in NAND and updated constantly, which matters later when wear damages the table's own blocks.
- Dynamic Wear Leveling. When the host writes new data, the controller pulls an erased block from its free pool and chooses the one with the lowest erase count. This keeps fresh writes off the most-worn blocks. Dynamic leveling only touches blocks that are already in play; it never disturbs data that is sitting still, which is both its strength and its blind spot.
- Static (Global) Wear Leveling. Cold data that never changes (the operating system, installed programs, archives) would otherwise sit forever on low-cycle blocks while hot blocks burn out around it. Static wear leveling periodically relocates that cold data so its low-cycle blocks rejoin the writable pool. The relocation costs extra erase cycles up front in exchange for a longer overall life, and it is the layer that fails most visibly once the spare pool shrinks.
- Reserved and Spare Block Pool. Behind the visible capacity sits an over-provisioned reserve of spare blocks. When a cell fails, the controller retires its block and remaps the logical address to a spare. The wear-leveling table and the bad block table track this. The reserve is the buffer that keeps the drive healthy as cells age, and its depletion is the trigger for read-only lockout.
- FTL Mapping. Every relocation, every remap, every leveling move updates the FTL so the logical address still resolves to the right physical page. This map is written far more often than user data, so the blocks holding it accumulate wear faster than the rest of the array. When those blocks fail, the drive can no longer find its own data.
These four pieces work in concert while the drive is healthy. The recovery-relevant question is what happens to each one when the cells they manage start failing faster than the spare pool can absorb.
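The dynamic policy described above can be sketched in a few lines. This is a toy model under stated assumptions, not any vendor's firmware: `ToyFTL`, its fields, and the one-logical-block-per-physical-block mapping are hypothetical names invented for illustration, and static leveling and bad-block handling are left out.

```python
class ToyFTL:
    """Toy flash translation layer (hypothetical, for illustration only)."""

    def __init__(self, num_blocks: int):
        self.erase_counts = [0] * num_blocks   # stands in for the wear-leveling table
        self.mapping = {}                      # logical block -> physical block
        self.free = set(range(num_blocks))     # erased blocks ready for writes

    def write(self, logical: int) -> int:
        """Place a host write on the free block with the lowest erase count."""
        if not self.free:
            raise RuntimeError("no erased blocks left")
        # Ties are broken by block number so the result is deterministic.
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        stale = self.mapping.get(logical)
        self.mapping[logical] = target
        if stale is not None:
            # The old copy is now stale: erase it and return it to the free pool.
            self.erase_counts[stale] += 1
            self.free.add(stale)
        return target


ftl = ToyFTL(num_blocks=4)
for _ in range(9):
    ftl.write(logical=0)   # rewrite the same logical block repeatedly
print(ftl.erase_counts)
```

Even though the host rewrites a single logical block, the erases end up spread across all four physical blocks instead of piling onto one. A block holding data that is never rewritten would never enter `free` in this model, which is the blind spot static leveling exists to cover.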
How Uneven Cell Exhaustion Turns Into Read Failures
Every program/erase cycle forces electrons through the tunnel oxide layer that insulates each cell's charge trap. Each pass leaves trapped charge in the oxide and weakens the dielectric. Over enough cycles the oxide can no longer hold a clean charge, the threshold voltage distributions for each stored state widen, and they begin to overlap. An overlap is a bit error: the controller can no longer tell which state the cell holds.
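A rough way to see why widening distributions become bit errors is to model two neighbouring voltage states as Gaussians of equal width with the read threshold at the midpoint. That Gaussian model, the function name, and the voltage numbers below are simplifying assumptions for illustration, not measurements from any NAND part.

```python
import math


def misread_probability(separation_v: float, sigma_v: float) -> float:
    """Chance a cell is read as its neighbouring state.

    Assumes two Gaussian threshold-voltage distributions with the same
    standard deviation sigma_v, centres separation_v apart, and a read
    threshold placed at the midpoint. The tail beyond the midpoint is
    Q(separation / (2 * sigma)).
    """
    z = separation_v / (2.0 * sigma_v)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical figures: same state spacing, distribution width doubling with wear.
fresh = misread_probability(separation_v=0.4, sigma_v=0.05)
worn = misread_probability(separation_v=0.4, sigma_v=0.10)
print(f"fresh: {fresh:.2e}  worn: {worn:.2e}")
```

In this sketch, doubling the width of the distributions raises the misread probability by several orders of magnitude, which is the overlap effect the paragraph describes.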
The endurance ceiling depends on how many bits each cell stores. More bits per cell means more voltage states packed into the same physical window, narrower margins between them, and fewer cycles before those margins collapse. These are general NAND engineering figures.
| NAND Type | Bits per Cell | Voltage States | Typical P/E Endurance |
|---|---|---|---|
| SLC | 1 | 2 | 50,000 to 100,000 cycles |
| MLC | 2 | 4 | 3,000 to 10,000 cycles |
| TLC | 3 | 8 | 1,000 to 3,000 cycles |
| QLC | 4 | 16 | 100 to 1,000 cycles |
Wear leveling is supposed to make this degradation uniform: if every block ages at the same rate, no single block fails early. In practice the leveling is never perfect. Heavily-rewritten regions, the FTL metadata blocks, and blocks that took the brunt of static-leveling relocations wear ahead of the pack. That uneven exhaustion is what produces the first read failures, and it is why a drive can read clean across most of its capacity while a handful of worn blocks throw uncorrectable errors.
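The endurance figures in the table above can be turned into a back-of-the-envelope write budget. The sketch below assumes perfectly even wear and a hypothetical write amplification factor of 2; real drives deviate on both counts, so treat the result as an order-of-magnitude estimate.

```python
def voltage_states(bits_per_cell: int) -> int:
    """Number of distinct charge levels a cell must hold: 2 ** bits."""
    return 2 ** bits_per_cell


def estimated_host_writes_tb(capacity_tb: float, pe_cycles: int,
                             write_amplification: float = 2.0) -> float:
    """Rough host-write budget in TB, assuming ideal wear leveling.

    write_amplification is an assumed figure: NAND writes per host write.
    """
    return capacity_tb * pe_cycles / write_amplification


# Hypothetical 1 TB drives at the low end of each endurance range in the table.
for name, bits, cycles in [("TLC", 3, 1_000), ("QLC", 4, 100)]:
    budget = estimated_host_writes_tb(1.0, cycles)
    print(f"{name}: {voltage_states(bits)} states, ~{budget:.0f} TB of host writes")
```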
- Rising Bit-Error Rate. As the oxide degrades, the raw bit error rate climbs. The controller's ECC (LDPC codes on modern drives) corrects the errors for a while. The escalation is gradual until the error rate approaches the correction ceiling, then a worn block flips from correctable to uncorrectable with little warning.
- Spare-Block Cascade. Once exhaustion sets in, the bad block table grows fast. The controller retires failing blocks and replaces them from the spare pool at an accelerating rate, because spares pressed into service on an already-worn drive fail sooner than the originals did. When the reserve hits zero, remapping stops: there is no fresh block to receive the next failure.
- Read-Only or FTL Drop. With the spare pool empty, the controller locks to read-only to stop further damage, or, if the wear has reached the blocks holding the wear-leveling table and FTL metadata, it loses its map and reports a factory alias with the wrong capacity. The user data remains on the array either way.
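The retire-and-remap sequence above reduces to a counter that only goes down. The class below is a deliberately minimal sketch; `SparePool` and its methods are hypothetical names, and real controllers track far more state than this.

```python
class SparePool:
    """Toy model of the reserved block pool and the bad block table."""

    def __init__(self, spares: int):
        self.spares = spares        # replacement blocks still available
        self.bad_blocks = []        # stands in for the bad block table
        self.read_only = False

    def retire(self, block: int) -> bool:
        """Retire a failed block. Returns True if a spare replaced it."""
        self.bad_blocks.append(block)
        if self.spares == 0:
            self.read_only = True   # nothing left to remap to
            return False
        self.spares -= 1
        if self.spares == 0:
            # The pool just ran dry: lock out writes to stop further damage.
            self.read_only = True
        return True

    def write_allowed(self) -> bool:
        return not self.read_only


pool = SparePool(spares=2)
pool.retire(block=17)
print(pool.write_allowed())   # one spare left, writes still accepted
pool.retire(block=42)
print(pool.write_allowed())   # reserve at zero, drive is read-only
```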
Why TRIM and Garbage Collection Accelerate Loss After Wear Failure
A worn drive is at its most fragile in the window between failure and recovery, and the two background processes that keep a healthy SSD fast are exactly what destroy data on a failing one. TRIM and garbage collection do not know the drive is dying; they keep running until the power is cut.
- TRIM Unmaps, Then Returns Zeros. TRIM is a logical deallocate command, not a physical erase. When a file is deleted, the operating system tells the controller which logical blocks are free; the controller unmaps them from the FTL and returns deterministic zeros (DZAT) when those addresses are read. The cells are not erased the instant TRIM runs, but the controller will no longer hand back that block's contents. DZAT behaves the same on a worn drive as on a healthy one, but the stakes are higher because there is no second chance.
- Garbage Collection Erases the Cells. After TRIM unmaps a block, garbage collection erases the physical cells asynchronously to reclaim them for the free pool. On a worn drive every erase also consumes a program/erase cycle the NAND can no longer spare, so garbage collection both erases recoverable content and pushes already-marginal blocks past their limit. Once a block is unmapped and erased, no lab can return it.
- Relocation Keeps Moving Bad Data. Static wear leveling does not stop just because the spare pool is empty. The controller keeps trying to relocate cold data, but with no fresh blocks left, the migration target is itself a worn block. Data the user never touched can move into a block that fails ECC on the next read, which is how files written months ago start returning errors during the exact window you are trying to recover them.
The practical rule follows directly: every minute a failing drive stays powered, TRIM, garbage collection, and relocation chip away at recoverable data. The first command PC-3000 SSD issues after identifying the controller is the vendor-specific instruction that halts all three.
That is also why imaging a worn drive on consumer software is counterproductive: the software keeps the drive powered and reading while the background processes keep erasing.
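The two-step loss described above (TRIM unmaps, garbage collection erases) can be illustrated with a toy drive. Everything here is a hypothetical model: `ToyDrive`, its dictionaries, and the 4-byte page size are assumptions chosen to keep the sketch short.

```python
ZERO_PAGE = b"\x00" * 4   # assumed page size, for illustration only


class ToyDrive:
    """Toy SSD: an FTL map in front of a dictionary of physical pages."""

    def __init__(self):
        self.nand = {}         # physical page -> stored bytes
        self.ftl = {}          # logical address -> physical page
        self.stale = []        # unmapped pages waiting for garbage collection
        self._next_page = 0

    def write(self, lba: int, data: bytes) -> None:
        if lba in self.ftl:
            self.stale.append(self.ftl[lba])   # old copy becomes garbage
        self.nand[self._next_page] = data
        self.ftl[lba] = self._next_page
        self._next_page += 1

    def read(self, lba: int) -> bytes:
        page = self.ftl.get(lba)
        # Unmapped addresses read back as zeros, like a DZAT drive.
        return ZERO_PAGE if page is None else self.nand[page]

    def trim(self, lba: int) -> None:
        page = self.ftl.pop(lba, None)         # logical unmap only
        if page is not None:
            self.stale.append(page)

    def garbage_collect(self) -> None:
        while self.stale:
            del self.nand[self.stale.pop()]    # physical erase


drive = ToyDrive()
drive.write(7, b"DATA")
drive.trim(7)
print(drive.read(7), b"DATA" in drive.nand.values())   # zeros to the host, bytes still on NAND
drive.garbage_collect()
print(b"DATA" in drive.nand.values())                  # now gone from NAND too
```

Between `trim` and `garbage_collect` the bytes still exist physically but are unreachable through ordinary reads. After garbage collection runs they no longer exist at all, which is why halting background operations comes first.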
SATA SSD recovery in this state falls in the $200–$1,500 range; NVMe spans $200–$2,500, with the firmware tier ($600–$900 for SATA, $900–$1,200 for NVMe) covering most worn-drive imaging jobs.
Why DIY Recovery Attempts Burn Through a Worn Drive's Remaining Life
Recovery software earns its place when an SSD is physically healthy and the problem is logical: an accidentally deleted file with TRIM disabled, a corrupted partition table, a formatted volume. Disk Drill, EaseUS, R-Studio, and PhotoRec all do this well, and on a healthy drive they are the right first call. A drive that has failed wear leveling is a different situation, and the same tools work against you there.
The damage is physical, at the electron level. Every read applies a pass-through voltage to the unselected cells in the block, nudging their stored charge a little each time.
This read-disturb stress is harmless on fresh NAND with wide voltage margins. On worn NAND, where the margins have already collapsed, a full-surface scan can be the last push that flips a marginal block to uncorrectable.
Brute-force scanning compounds the problem in three ways at once. It reads every page, maximizing read-disturb exposure across the whole array.
It keeps the drive powered, which lets garbage collection and wear-leveling relocation keep erasing and rewriting failing blocks. And it pushes the controller through its full default retry sequence on every marginal page, sometimes triggering an auto-disconnect that forces the user to re-scan from the top, doubling the stress.
Professional recovery inverts every one of those behaviors. The lab halts background operations first, reads only the blocks needed for the target data, and skips known-bad zones.
It monitors per-block error rates between passes so a block that worsens gets imaged before it dies rather than re-read until it does. The difference is not better software running the same reads; it is a controlled read strategy that treats the drive as a finite, depleting resource.
How PC-3000 SSD and Data Extractor Image Worn NAND
PC-3000 SSD and the Data Extractor module communicate with the live controller through vendor-specific diagnostic channels instead of the standard read pipeline. That access is what lets a technician override the controller's conservative defaults and pull data from cells the controller already abandoned. PC-3000 SSD works through the live controller over SATA or NVMe; reading a desoldered die is a separate chip-off process.
- Halt background operations. The first command disables garbage collection, wear-leveling relocation, and TRIM so the controller stops erasing recoverable blocks during imaging.
- Read-retry table expansion. The controller's default retry table holds a fixed set of voltage offsets it applies when a page fails ECC. PC-3000 replaces it with an expanded set that sweeps more voltage levels across a wider range than the controller would ever attempt.
- Voltage-threshold shifting. Worn cells leak charge, so their read threshold drifts downward. PC-3000 shifts the reference voltage to follow that drift, resolving a cell at the voltage window where it now actually sits rather than where a fresh cell would.
- Soft-decision read activation. Where the controller supports it, PC-3000 forces the LDPC decoder into soft-decision mode, reading each cell at several voltage levels and feeding the probability of each bit state into the decoder. This recovers pages that single-threshold hard-decision reads cannot.
- Error-threshold imaging in passes. The drive is imaged block by block, ordered by error rate. Clean blocks image first for fast, high-yield progress. Marginal blocks get progressively more aggressive retry and voltage settings. Passes merge into one composite image so a block recovered late fills a gap left early.
- Wear-table and FTL reconstruction. When the wear-leveling table or FTL metadata blocks are among the casualties, PC-3000 scans surviving metadata markers in the raw NAND and rebuilds a virtual translator, restoring the logical-to-physical map needed to assemble files.
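The pass-and-merge idea in the steps above can be sketched generically. This is not PC-3000's interface: `read_block` is a hypothetical callback standing in for whatever diagnostic read the tool performs, and `retry_levels` is an assumed ordered list of increasingly aggressive settings.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def image_in_passes(
    blocks: Iterable[int],
    read_block: Callable[[int, int], Optional[bytes]],
    retry_levels: List[int],
) -> Tuple[Dict[int, bytes], List[int]]:
    """Image blocks over several passes and merge the results.

    read_block(block, level) is a hypothetical reader that returns the
    block's data, or None if it is still unreadable at that level.
    Blocks recovered in an earlier pass are never read again.
    """
    image: Dict[int, bytes] = {}
    remaining = list(blocks)
    for level in retry_levels:
        still_failing = []
        for block in remaining:
            data = read_block(block, level)
            if data is None:
                still_failing.append(block)
            else:
                image[block] = data   # a late pass fills a gap left by an early one
        remaining = still_failing
        if not remaining:
            break
    return image, remaining
```

Skipping already-recovered blocks on later passes is the point of the design: each marginal block is read as few times as possible, which limits the read-disturb exposure described earlier.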
When read-retry and voltage tuning cannot open a window on the worn cells, controlled temperature changes can. Applying gentle heat to the NAND packages with an Atten 862 hot air station shifts the threshold distributions and can temporarily raise marginal cells into a readable range while PC-3000 watches the per-block error rate and images at the temperature that yields the lowest error count.
The board side is the other half of the capability. When the controller is not merely worn but dead (no enumeration, a shorted PMIC, a blown rail), no firmware access is possible until the board is alive. We localize the fault with FLIR thermal imaging, replace the failed component with a Hakko FM-2032 on its base station, and reflow the controller package on a Zhuo Mao BGA station when the controller itself needs rework.
On a drive with hardware encryption active, the encryption key is generated on and bound to the original controller and never leaves it, so reviving that exact controller is the only route to the data; a desoldered NAND die yields ciphertext. Board repair is not a service separate from data recovery here. For an encrypted SSD, it is data recovery.
Most worn-drive imaging falls in the firmware tier: $600–$900 for SATA, $900–$1,200 for NVMe. A dead board that needs component replacement before any read moves the job to the circuit board repair tier ($450–$600 for SATA, $600–$900 for NVMe). If the board is too damaged to repair and the NAND has to move to a donor PCB, the NAND swap tier applies ($1,200–$1,500 for SATA, $1,200–$2,500 for NVMe); 50% deposit required; donor drive cost additional. +$100 rush fee to move to the front of the queue.
SMART Attributes That Track Wear-Leveling State
SMART monitoring gives early warning that the spare pool is draining and the wear-leveling machinery is nearing its limit. Not every controller exposes the same attributes, and the raw values are vendor-specific, so read them as trend indicators rather than absolute counts.
| SMART ID | Attribute | Concern Threshold | What It Signals |
|---|---|---|---|
| 5 | Reallocated Sector Count | Any non-zero value | Blocks retired to the spare pool. A rising count means the wear-leveling reserve is being spent. |
| 170 | Available Reserved Space | Below 10% | Spare pool nearly empty. Read-only lockout is imminent once it reaches zero. |
| 173 | SSD Wear Leveling Count | Near rated ceiling | Average erase count across all blocks. Compare against the NAND's rated P/E endurance. |
| 187 | Uncorrectable Error Count | Any non-zero value | Pages that exceeded ECC capacity. Direct evidence of wear past the correction limit. |
| 202 | Percentage Lifetime Used | Near exhaustion | Highly vendor-specific. On some drives this counts up from 0 to 100 (percentage used); on others it counts down from 100 to 0 (percentage remaining), and a few OEMs repurpose ID 202 for Data Address Mark errors. Read it against your drive model: a value near its exhaustion threshold means the NAND is near the end of its rated cycle budget. |
| 241 | Total LBAs Written | Compare to TBW rating | Total host writes. Approaching or passing the rated TBW predicts wear-leveling exhaustion. |
SMART is a guide, not a guarantee. Some drives fail from firmware faults or power events with clean SMART readings; others run well past their rated TBW. The lab uses SMART to estimate how much read-retry tuning a worn drive will need before imaging begins.
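As a rough illustration of reading the table above as a checklist, the sketch below flags the three attributes whose concern thresholds are stated numerically. It takes plain dictionaries rather than calling any SMART tool. Which field (raw or normalized) carries the meaningful number varies by vendor, so the split used here is an assumption, and the function name is hypothetical.

```python
from typing import Dict, List


def wear_warnings(raw: Dict[int, int], normalized: Dict[int, int]) -> List[str]:
    """Return human-readable warnings for a few wear-related SMART IDs.

    raw:        attribute ID -> raw value (assumed to be a plain count)
    normalized: attribute ID -> normalized value (assumed to be a percentage)
    Missing attributes are treated as healthy, since not every
    controller exposes every ID.
    """
    warnings = []
    if raw.get(5, 0) > 0:
        warnings.append("5: blocks have been retired to the spare pool")
    if normalized.get(170, 100) < 10:
        warnings.append("170: available reserved space below 10%")
    if raw.get(187, 0) > 0:
        warnings.append("187: pages have exceeded ECC capacity")
    return warnings


# Hypothetical readings from a heavily written drive.
for line in wear_warnings(raw={5: 12, 187: 3}, normalized={170: 8}):
    print(line)
```

Attributes 173, 202, and 241 are left out on purpose: interpreting them needs the drive's rated endurance or the vendor's counting direction, which a generic check cannot know.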
When Software Works and When It Cannot
The line between a software job and a lab job is the physical health of the drive. A worn SSD usually sits on the wrong side of that line, but it is worth naming exactly where the boundary falls so you do not pay for lab work you do not need, or lose data trying software that cannot help.
| Drive State | Software Can Help? | Why |
|---|---|---|
| Physically healthy, accidental deletion, TRIM disabled | Yes | Cells are intact and the controller answers normally; the issue is logical. |
| Corrupted partition table, formatted volume | Yes | The NAND is fine; recovery software reads through the healthy controller. |
| Read-only lockout from spare-block exhaustion | No | The controller refuses writes and serves marginal reads; only diagnostic-mode tuning images it safely. |
| FTL or wear-table corruption, wrong capacity reported | No | The map is gone; the FTL has to be rebuilt from raw NAND metadata. |
| TRIM already executed on the deleted blocks | No | The controller unmapped the blocks and returns zeros; garbage collection erases the cells. No tool reverses it. |
| Dead controller, drive not detected in BIOS | No | Software cannot talk to a board that will not power on; board repair comes first. |
If a worn drive still mounts and you have not yet tried anything, the safest move is to stop and image it, not to keep reading it. If it has already dropped to read-only or vanished from the operating system, software has nothing left to offer and the drive belongs in the lab. Our SSD data recovery workflow starts with a free evaluation and a firm quote before any work begins.
How Wear-Table Failure Cascades Into FTL Collapse
Two seemingly identical worn drives with the same SMART numbers can have very different recoverability, and the reason is how far the cascade has progressed. Wear-leveling exhaustion is a sequence, not a single event, and where the drive sits in that sequence decides how much work recovery takes.
- Stage 1: Dynamic Leveling Saturation. Available Reserved Space drops below roughly 10%. The controller has few fresh blocks left to steer writes toward, so new writes land on blocks already near their cycle ceiling. Recently-written data returns higher error rates than older static data, the inverse of a healthy drive. Caught here, the drive images cleanly with adjusted read parameters.
- Stage 2: Static Leveling Stalls. Static wear leveling needs somewhere to migrate cold data. With the spare pool depleted, the migration target is itself a worn block, so relocated data fails ECC on the next read. Uncorrectable Error Count climbs on data the user never touched, because the controller relocated it during idle time into a block that was already spent.
- Stage 3: Wear-Table and FTL Block Failure. The wear-leveling table and FTL metadata are rewritten far more often than user data, so their blocks wear out first. Once they degrade past the LDPC ceiling, the controller cannot load its own translator at power-up. The drive enters a factory alias state and reports the wrong capacity. The user data is intact; the map needed to locate it is unreadable.
- Stage 4: Background Operations Refuse to Stop. Even with a corrupt FTL, the controller may keep running garbage collection and TRIM enforcement on whatever map fragments survive. Every powered minute consumes more recoverable blocks, which is why the first PC-3000 SSD command halts background operations before any imaging starts.
A drive caught at Stage 1 is a read-parameter problem. A drive caught at Stage 3 needs a firmware loader uploaded into controller RAM, diagnostic-mode entry, and a virtual translator rebuilt from raw NAND metadata. Recovery work at Stage 3 falls in the firmware tier: $600–$900 for SATA, $900–$1,200 for NVMe. If wear forces escalation to a NAND chip-off on an unencrypted drive, the job moves to $1,200–$1,500 for SATA and $1,200–$2,500 for NVMe.
The same wear-out physics drives our NAND degradation recovery work, and the same controller diagnostics decide both. Wear leveling is simply the firmware layer that decides which cells reach exhaustion first.
Frequently Asked Questions
What is wear leveling on an SSD?
Can data be recovered after wear leveling fails?
Why does my SSD say it is read-only after heavy use?
Does running recovery software make a worn SSD worse?
What is the difference between static and dynamic wear leveling?
How does TRIM interact with a worn SSD during recovery?
How does the lab image worn NAND that the controller has given up on?
What does wear-leveling recovery cost?
Can a worn SSD be repaired and reused after recovery?
SSD locked to read-only or showing wear warnings?
Free evaluation. PC-3000 SSD read-retry and voltage tuning for worn NAND. SATA SSD from $200, NVMe from $200. No data, no fee.
