Technical Reference
HDD PRML Read Channel Tuning

PRML (Partial Response Maximum Likelihood) is the digital signal processing pipeline a hard drive uses to turn the analog signal from the read head into a bit stream. EPRML extends the partial-response target to handle higher areal density. Both rely on a digital FIR equalizer and a Viterbi detector whose coefficients and thresholds are calibrated per-head at the factory and stored in the Service Area. When heads degrade or a donor head stack is installed during hard drive data recovery, those stored coefficients no longer match the actual read signal. PC-3000 Portable III and PC-3000 Express expose the relevant adaptive modules so the technician can transfer or recompute them, allowing the channel to converge on the new heads before imaging begins.
What does PRML actually do inside the read channel?
A modern hard drive does not read individual peaks. The slider flies a few nanometers above the platter; the magnetoresistive read element produces a continuous voltage waveform whose features overlap because adjacent bits are written close enough together that their flux fields blur into each other. This blurring is called inter-symbol interference and it is the entire reason PRML exists.
The read channel performs the following sequence on every track read:
- The preamp on the head stack assembly amplifies the head voltage and drives it across the flex cable to the controller IC on the PCB.
- A continuous-time analog filter shapes the waveform to roughly the partial-response target shape and an analog-to-digital converter samples it at the channel rate.
- A digital FIR (Finite Impulse Response) equalizer multiplies a sliding window of recent samples by a vector of tap coefficients to produce equalized samples that match a known partial-response target (PR4, EPR4, or EEPR4).
- A Viterbi detector consumes the equalized samples and walks a trellis whose edges represent possible bit sequences. For each edge it computes the squared error between the actual equalized sample and the sample value the channel model predicts for that edge.
- The detector keeps only the survivor path entering each state; after a fixed traceback depth it commits the bit on the lowest-error path. The output is the recovered bit stream that the on-drive ECC then validates and corrects.
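The detection steps above can be illustrated with a minimal sketch. It assumes bits in {0, 1}, an ideal PR4 target (1 - D^2), and a channel that is already perfectly equalized; the function names are hypothetical. For brevity it traces back once at the end of the block rather than committing bits after a fixed traceback depth as a real channel does.

```python
import random

def pr4_ideal(bits):
    """Ideal PR4 (1 - D^2) samples for bits in {0, 1}; the two prior bits are assumed 0."""
    padded = [0, 0] + list(bits)
    return [padded[k + 2] - padded[k] for k in range(len(bits))]

def viterbi_pr4(samples):
    """Maximum-likelihood sequence detection over the 4-state PR4 trellis."""
    inf = float("inf")
    states = [(0, 0), (0, 1), (1, 0), (1, 1)]   # state = (a[k-1], a[k-2])
    metric = {s: (0.0 if s == (0, 0) else inf) for s in states}
    paths = {s: [] for s in states}
    for y in samples:
        new_metric = {s: inf for s in states}
        new_paths = {s: [] for s in states}
        for (a1, a2), m in metric.items():
            if m == inf:
                continue                         # state not reachable yet
            for bit in (0, 1):
                expected = bit - a2              # sample the PR4 model predicts for this edge
                cand = m + (y - expected) ** 2   # accumulated squared-error metric
                nxt = (bit, a1)
                if cand < new_metric[nxt]:       # keep only the survivor into each state
                    new_metric[nxt] = cand
                    new_paths[nxt] = paths[(a1, a2)] + [bit]
        metric, paths = new_metric, new_paths
    best = min(states, key=lambda s: metric[s])  # lowest-error path at end of block
    return paths[best]

# Usage: mild bounded noise on ideal samples still decodes to the written bits.
rng = random.Random(1)
bits = [rng.randint(0, 1) for _ in range(64)]
noisy = [y + rng.uniform(-0.2, 0.2) for y in pr4_ideal(bits)]
print(viterbi_pr4(noisy) == bits)
```

The trellis here is fixed by the target, which is why the equalizer in front of it matters: the detector only works if the samples it receives actually resemble the levels the model predicts.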
The FIR coefficients and the Viterbi branch metrics are not generic. They are tuned per head, per zone, and per drive at the factory because the read signal depends on the exact head, the exact platter substrate, the exact fly height, and the exact preamp gain trim. Those calibrated values live in the Service Area firmware, not in volatile memory.
Where are the read channel coefficients stored?
Every drive vendor stores read channel calibration data in a different place. The adaptive parameter set typically includes FIR tap weights, Viterbi target gain, AGC loop gains, microjog offsets that center the head over the track, and the bias settings for the preamp. PC-3000 vendor utilities give the technician structured access to the firmware modules that hold these parameters; raw ATA commands do not.
| Family | Adaptive Module | Stored Parameters |
|---|---|---|
| Seagate F3 (Rosewood, Grenada, Makara) | RAP / SAP / CAP system files | Per-head FIR coefficients, Viterbi target shape, zone-by-zone gain trims, microjog offsets, preamp bias values |
| WD Marvell (Palmer, SpyGlass, and later) | Module 47 (adaptive) | Voice coil current, head flight height, FIR equalizer coefficients, AGC loop gains, per-head microjog offsets |
| Toshiba MG / MN | PCB ROM (NVRAM) plus SA adaptive zone tables | PCB-bound head adaptive parameters, per-head equalizer taps, channel target gain, preamp bias, microjog offsets |
| Hitachi / HGST legacy | Microcode adaptive overlays | Channel target coefficients, AGC parameters, head-specific bias and offset values |
When PC-3000 reads these modules, it reaches them through Vendor Specific Commands that the controller IC accepts only after the family-specific unlock sequence. That is why PC-3000 can access firmware that consumer software cannot.
What happens to the Viterbi detector when SNR collapses?
The Viterbi detector is a maximum-likelihood detector. Its branch metric is the squared error between the actual equalized sample and the predicted sample for that edge. When signal-to-noise ratio is healthy, that error is small on the correct edge and large on incorrect edges, so the survivor path almost always picks the right transition. When heads degrade, three things change at once.
First, the read signal amplitude drops. The AGC loop tries to compensate, but past a certain point the equalized samples lose resolution at the bottom of the dynamic range and quantization noise dominates. Second, the timing recovery loop receives weaker zero crossings, so the sample clock jitters relative to the true bit boundaries and samples land off the partial-response target on every edge. Third, the head transducer noise rises relative to the signal, and that noise is colored rather than white, which the stock channel model does not assume.
The combined effect is that the squared errors on the correct edge and the squared errors on incorrect edges become comparable. The survivor selection at each state starts picking wrong branches whenever a noise spike happens to favor an incorrect edge. The bit error rate at the detector output climbs from below 1e-3 (where on-drive ECC can clean it up) toward 1e-2 or worse (where ECC fails). Once ECC fails, the drive returns a hard read error to the host and the sector is marked unreadable.
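A rough way to see the margin disappear is to compare, sample by sample, the squared error on the correct level against the squared error on the nearest incorrect level. The sketch below is a simplification: a real Viterbi detector compares accumulated path metrics rather than single samples, and the unit level spacing, Gaussian noise, and function name are all assumptions made for illustration.

```python
import random

def wrong_edge_rate(sigma, trials=10000, seed=0):
    """Fraction of samples where the nearest incorrect level (one unit away)
    has a smaller squared error than the correct level."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        noise = rng.gauss(0.0, sigma)
        correct_err = noise ** 2                      # metric on the correct edge
        nearest_wrong_err = (abs(noise) - 1.0) ** 2   # metric on the closest wrong edge
        if nearest_wrong_err < correct_err:
            wrong += 1
    return wrong / trials

# Usage: the wrong edge almost never wins at low noise, and wins often at high noise.
for sigma in (0.1, 0.3, 0.5):
    print(sigma, wrong_edge_rate(sigma))
```

The incorrect edge wins whenever the noise excursion exceeds half the level spacing, so the rate rises steeply once the noise standard deviation approaches that half-spacing.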
PC-3000 lets the technician adjust the detection threshold and target gain through the family utility, allowing the Viterbi to operate against a relaxed model that matches the degraded signal. This does not recover data the head cannot physically read; it recovers data the channel was throwing away because its stored model assumed a healthier signal than the actual head can produce. In our lab, this adjustment is paired with Data Extractor imaging, and where bus-level instability is also present, with DeepSpar Disk Imager handling the per-sector timeouts and head-selective passes.
How do ATA firmware retry loops over-adapt the FIR equalizer on a marginal head?
Short answer: the LMS adaptation loop inside the FIR equalizer stays active during every internal retry. If the head is reading mostly noise, the loop drifts the tap coefficients toward fitting that noise. The next read of a healthy sector then uses corrupted taps, the slicer eye opening collapses, and the drive throws hard read errors on sectors that were readable a moment earlier.
The FIR equalizer described above does not carry a static set of taps. It runs a Least Mean Squares adaptation loop in hardware, continuously adjusting tap weights to minimize the squared error between the equalized samples and the partial-response target. Under normal conditions this loop is what lets the drive ride out slow real-world drift: thermal expansion of the platter, zone-boundary changes in linear density, small fly-height variations as the slider crosses the disk.
When the host requests data from a marginal sector, the drive's ATA firmware does not give up on the first failure. It enters an internal error-recovery procedure that retries the sector under varying conditions: micro-jog offsets to the actuator, timing shifts on the channel sample clock, changes to preamp bias, alternative head-selection patterns where the geometry allows. Those retries can run for hundreds of milliseconds per sector and on some families exceed thirty seconds before the drive returns a failure to the host.
Through that entire window the LMS loop keeps adapting. Because the head is producing mostly broadband noise rather than coherent magnetic transitions, the LMS has no real signal to lock onto. The mathematics of the loop are indifferent to the source of the error metric, so it minimizes squared error against the noise the head is feeding it. The tap coefficients drift to fit the noise floor. By the time the retry loop finally gives up, the FIR is no longer equalizing the partial-response target. It is equalizing the failure.
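The drift can be reproduced in a toy model. The sketch below assumes a 5-tap equalizer that starts as a pure pass-through (the channel is already on target), a decision-directed LMS update, and a three-level PR4 slicer; the step size, noise level, and names are illustrative assumptions, not values from any drive.

```python
import random

def slicer(value):
    """Nearest ideal PR4 sample level (-1, 0 or +1)."""
    return max(-1, min(1, round(value)))

def adapt(taps, samples, mu=0.01):
    """Run decision-directed LMS over a sample stream and return the final taps."""
    n = len(taps)
    for k in range(n - 1, len(samples)):
        window = samples[k - n + 1 : k + 1][::-1]    # newest sample first
        output = sum(w * x for w, x in zip(taps, window))
        err = slicer(output) - output                # error against the sliced decision
        taps = [w + mu * err * x for w, x in zip(taps, window)]
    return taps

def drift(before, after):
    """Total absolute movement of the tap vector."""
    return sum(abs(a - b) for a, b in zip(before, after))

rng = random.Random(7)
start = [0.0, 0.0, 1.0, 0.0, 0.0]                    # pass-through taps

bits = [rng.randint(0, 1) for _ in range(2000)]
clean = [float(bits[k] - bits[k - 2]) for k in range(2, len(bits))]  # ideal PR4 samples
noise = [rng.gauss(0.0, 0.4) for _ in range(2000)]   # head returning only noise

clean_drift = drift(start, adapt(start, clean))
noise_drift = drift(start, adapt(start, noise))
print(clean_drift, noise_drift)
```

On the clean stream every sample already sits on a target level, so the error term is zero and the taps do not move. On the noise stream the same update rule keeps moving the taps, because nothing in the loop distinguishes noise from signal.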
The next read on a healthy track uses those drifted taps. The equalizer output no longer matches PR4 or EPR4; the constellation of expected sample values shifts; the slicer eye opening closes. The Viterbi detector starts producing errors on a sector that was clean minutes earlier. Each new error triggers another retry loop, which keeps the LMS adapting against the wrong signal, heats the preamplifier, and increases head wear. The drive eventually hangs in a BSY (busy) state or drops off the SATA bus entirely. This is the mechanism behind the common pattern where consumer recovery software appears to make a drive worse the longer it runs: the drive is not getting physically worse on its own; the unchecked retry loops are corrupting the read channel calibration.
Why does Viterbi survivor-path collapse produce correlated bit errors that defeat ECC?
Short answer: Viterbi errors are not random single-bit flips. When the detector picks the wrong survivor path, the decoded sequence diverges from the correct sequence for several consecutive bits before merging back. The output is a contiguous burst of wrong bits inside one sector. Reed-Solomon ECC on the drive has a fixed per-codeword correction budget, and a long enough Viterbi burst exhausts that budget in a single codeword, leaving the sector uncorrectable.
A Reed-Solomon code over a Galois field treats data as fixed-width symbols. An (n, k) RS code can correct up to t = floor((n - k) / 2) erroneous symbols inside a codeword. The code is very efficient against burst errors as long as the burst stays inside the symbol budget: an error that corrupts one bit of a symbol and an error that corrupts every bit of that same symbol both cost exactly one symbol against the budget. The code tolerates short bursts cheaply.
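The budget arithmetic is easy to check by hand. The sketch below uses an RS(255, 239) code with 8-bit symbols purely as an illustration; the code parameters and symbol width a given drive family uses are not stated here, and the helper names are hypothetical.

```python
def rs_budget(n, k):
    """Maximum number of correctable symbol errors per codeword."""
    return (n - k) // 2

def symbols_hit(start_bit, length_bits, symbol_bits):
    """Indices of the symbols touched by a contiguous burst of wrong bits."""
    first = start_bit // symbol_bits
    last = (start_bit + length_bits - 1) // symbol_bits
    return set(range(first, last + 1))

def correctable(bursts, n, k, symbol_bits):
    """True if all bursts together damage no more symbols than the budget allows."""
    damaged = set()
    for start, length in bursts:
        damaged |= symbols_hit(start, length, symbol_bits)
    return len(damaged) <= rs_budget(n, k)

# Usage with the illustrative RS(255, 239), t = 8, 8-bit symbols:
scattered = [(i * 8, 1) for i in range(8)]         # eight single-bit errors, one per symbol
print(correctable(scattered, 255, 239, 8))         # True: 8 symbols damaged
print(correctable([(3, 60)], 255, 239, 8))         # True: bits 3..62 touch 8 symbols
print(correctable([(3, 70)], 255, 239, 8))         # False: bits 3..72 touch 10 symbols
```

The last two cases differ by only ten bits of burst length, yet one codeword decodes and the other does not. That cliff is why shortening bursts matters more than lowering the average bit error rate.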
The Viterbi detector is what determines how long the bursts are. When SNR is healthy, the squared error on the correct trellis edge is small and the errors on incorrect edges are large; the survivor path tracks the true bit sequence and any error event is short. When SNR collapses, branch metrics on the correct edge become comparable to branch metrics on incorrect edges, so a noise spike at a single trellis node is enough to pick the wrong survivor. The detector then walks down a divergent path for several clock cycles before the trellis state forces it to remerge with the correct path. Every bit decoded during that divergence is wrong, and the wrong bits land contiguously inside the codeword the ECC is about to correct.
A divergence that crosses enough symbol boundaries pushes the codeword past t symbol errors. The RS decoder cannot identify a valid corrected codeword inside the budget, fails the correction, and the drive returns an Unrecoverable Read Error to the host even though the head physically retrieved a usable analog signal. This is why relaxing the Viterbi target gain or branch-metric thresholds through PC-3000 can recover data the stock drive was discarding: the same analog signal is reinterpreted against a channel model that matches the degraded head, the survivor selection stops collapsing as often, burst lengths shorten, and the resulting codewords come back inside the RS budget. No bits are invented. The channel just stops throwing away bits the head was already reading.
How does the DeepSpar Disk Imager Multi-Pass workflow keep a failing head viable for imaging?
Short answer: the imager bypasses the drive's native retry firmware and the host OS's timeout behavior so the FIR equalizer cannot enter the over-adaptation loop. It then images in passes that prioritize healthy heads first, enforce per-sector timeouts in the millisecond range, and revisit skipped sectors with reversed direction and head-selective scheduling so a marginal head rests between attempts.
The DeepSpar Disk Imager is a dedicated PCIe controller that asserts low-level control over the SATA PHY. When a read times out, the imager issues a hardware COMRESET or a precise power-cycle on its own clock, instead of waiting for the host operating system or BIOS to notice the bus is stuck. Before imaging begins, it sends Vendor Specific Commands that disable the drive's native long retry routines, automatic sector reallocation, SMART updates, and read look-ahead. With those disabled, the FIR equalizer's LMS loop cannot run the extended adaptation that corrupts the channel on healthy sectors.
The Multi-Pass workflow stages the actual extraction so the drive is asked to do the easiest work first and the hardest work last. The configuration that the lab applies varies with the failure mode, but the structure is consistent:
- Pass 1, healthy-head forward sweep. The imager builds a head map from PC-3000 diagnostics and routes the first sweep to sectors served by the heads with the highest read amplitude. Per-sector read timeout is tightened to the low hundreds of milliseconds. On timeout the imager logs the LBA to a skip bitmap and jumps forward by a configurable block, moving the actuator away from the trouble zone immediately rather than dwelling on it.
- Pass 2, reverse and offset sweep. Sectors logged in pass one are revisited reading from the opposite direction (outer to inner or inner to outer depending on the drive geometry). Approaching a damaged track from the other side sometimes alters the slider's fly dynamics enough to clear a marginal sector. Read offset and microjog are adjusted within the limits the head map declares safe.
- Pass 3, head-selective slow sweep on the weak head. The remaining skipped LBAs are attacked on the marginal head alone, with longer per-sector timeouts and head-selection holds that prevent the actuator from servicing other heads in between. The marginal head is given the cleanest possible conditions for the read, but only after the rest of the drive has already been imaged on healthier heads.
- Pass 4, residual recovery. Sectors that still refuse to read are attacked with relaxed channel models inside PC-3000 Data Extractor. Viterbi branch metrics and FIR target gain are loosened. In some cases ECC is bypassed entirely so the raw decoded bits can be examined and partial sector content harvested where the codeword as a whole is unrecoverable.
The point of the staging is not throughput. It is preservation. A marginal head only has so many reads left before its preamp drifts further or its read element fails entirely. The Multi-Pass schedule spends that budget on the LBAs that matter and avoids burning it on retry loops over the worst sectors first.
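The scheduling logic of the first three passes can be sketched against a simulated drive. Everything here is hypothetical: `read_sector` stands in for whatever the imaging hardware exposes, the timeouts and skip size are placeholders, and head selection, read offsets, and hardware resets are left out. It is not the DeepSpar or PC-3000 interface.

```python
def multi_pass_image(read_sector, total_lbas, skip_block=4, timeouts_ms=(200, 200, 2000)):
    """Image LBAs in staged passes; returns (image dict, set of unread LBAs).

    read_sector(lba, timeout_ms) is a hypothetical callable that returns the
    sector data, or None if the read did not complete within the timeout.
    """
    image = {}
    skipped = set()

    # Pass 1: forward sweep with a tight timeout; on failure, log and jump ahead.
    lba = 0
    while lba < total_lbas:
        data = read_sector(lba, timeouts_ms[0])
        if data is None:
            skipped.update(range(lba, min(lba + skip_block, total_lbas)))
            lba += skip_block
        else:
            image[lba] = data
            lba += 1

    # Pass 2: revisit skipped LBAs in reverse order with the same tight timeout.
    # Pass 3: revisit what is left in forward order with a longer timeout.
    for timeout, reverse in ((timeouts_ms[1], True), (timeouts_ms[2], False)):
        for lba in sorted(skipped, reverse=reverse):
            data = read_sector(lba, timeout)
            if data is not None:
                image[lba] = data
                skipped.discard(lba)

    return image, skipped

# Usage: a fake drive where LBA 10 never reads and LBA 20 needs a long timeout.
def fake_read(lba, timeout_ms):
    if lba == 10 or (lba == 20 and timeout_ms < 1000):
        return None
    return b"sector"

image, remaining = multi_pass_image(fake_read, 32)
print(len(image), sorted(remaining))
```

Sectors that remain in the skip set after the last pass are the ones that would move on to residual recovery with relaxed channel models.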
When does signal-processing intervention stop working and a head swap become unavoidable?
Short answer: as long as a head still produces a read signal that responds to channel adjustments, imaging stays on the read-channel side. Once a head returns zero amplitude across its sectors or its preamp is electrically shorted, the decision is mechanical: stop imaging, open the drive in a 0.02 micron ULPA-filtered clean bench, transplant a matched donor head stack, and only then resume read-channel work.
Read channel tuning solves a specific class of problem: the analog signal off the head is recoverable but the stock channel model is throwing it away. PRML threshold adjustment, FIR target gain change, Viterbi branch-metric relaxation, and DeepSpar Multi-Pass imaging all live inside that class. None of them recover data from a region of platter where the magnetic layer has been physically removed or from a head that has lost contact with its preamp.
PC-3000 diagnostics expose the boundary directly. The head map test reads short calibration patterns through each head and reports per-head amplitude and bit error rate at the detector output. If a head shows elevated bit error rate but still reads its Service Area calibration zone after Viterbi threshold and gain adjustment, the case stays on the read-channel side. If a head returns zero amplitude on every sector mapped to it, or if preamp bias adjustments produce no change in read signal level, the head is no longer reading the platter at all. FLIR thermal cameras give an independent confirmation when the preamp IC on the head stack assembly heats abnormally during spin-up, which usually indicates an electrical short from a damaged slider rather than a tunable signal-quality issue.
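The decision rule in the paragraph above can be written out as a small function. This is only a restatement of the text; the inputs are assumed to come from the head map test, the names are hypothetical, and the final fallback case is an assumption because the text does not define it.

```python
def triage_head(sector_amplitudes, reads_cal_zone_after_tuning, bias_changes_signal):
    """Classify one head as a read-channel case or a head-swap case.

    sector_amplitudes: read amplitudes measured on sectors mapped to the head
    reads_cal_zone_after_tuning: True if the head reads its Service Area
        calibration zone after Viterbi threshold and gain adjustment
    bias_changes_signal: True if preamp bias adjustments change the signal level
    """
    no_signal = bool(sector_amplitudes) and all(a == 0 for a in sector_amplitudes)
    if no_signal or not bias_changes_signal:
        return "head swap"            # the head is no longer reading the platter
    if reads_cal_zone_after_tuning:
        return "read-channel tuning"  # signal is degraded but still tunable
    return "further diagnosis"        # assumption: not covered by the text

# Usage
print(triage_head([0.0, 0.0, 0.0], False, True))   # head swap
print(triage_head([0.21, 0.18], True, True))       # read-channel tuning
```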
When the boundary is crossed, the drive moves to the 0.02 micron ULPA-filtered clean bench (described in the reference on cleanrooms versus laminar-flow benches) for a head stack transplant. Detail on the procedure itself, including how the heads physically read and what a head crash actually destroys, lives in the linked references. After the donor heads are installed, the workflow returns to this page: adaptive module transfer, in-channel adaptation against the donor signal, verification, and only then full imaging through DeepSpar. All of that work is performed in-house at the Austin, TX lab.
How does FIR equalizer recalibration work after a head swap?
A head swap replaces the patient drive's head stack assembly with one harvested from a matched donor. The donor heads have their own air bearing surface profile, their own preamp gain trim from the donor PCB, and a slightly different fly height over the patient's platters because the suspension is not the original. The patient's stored FIR coefficients were tuned to the original heads, not to these donor heads.
The recalibration procedure has three phases.
- Adaptive transfer. PC-3000 reads the patient adaptive module set from the Service Area copy held on the platters or, if the SA is unreadable, from a backup taken before the swap. The donor adaptive module set is read from the donor drive while it is still functional, and the relevant per-head coefficients are merged into the patient's working module image. For Seagate F3 this involves writing a merged RAP back to the SA; for WD Marvell it involves writing a merged Module 47.
- In-channel adaptation. The drive is booted in a diagnostic mode that allows the FIR taps to adapt while reading a known calibration zone in the Service Area. The LMS (least mean squares) adaptation loop drives the taps toward values that minimize the squared error between the equalizer output and the partial-response target. After convergence, the adapted coefficients are committed to the active module set. This is the step that brings the channel back into spec on the new heads.
- Verification on user data. The drive reads a representative range of user LBAs through Data Extractor with diagnostics enabled. If the per-head bit error rate at the detector output stays inside the acceptable window, the swap is considered converged and full imaging proceeds. If a particular head still fails to converge, the head map is edited to disable that head and imaging proceeds with the remaining heads.
When the patient heads are still partially functional, transferring the patient's stored coefficients to the donor heads is the wrong move because it pins the channel to a degraded calibration. In that case the in-channel adaptation phase is run with no stored seed and the FIR taps are allowed to converge from a neutral starting state on the donor signal alone.
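Convergence from a neutral seed can be illustrated with a training-mode LMS loop, where the target samples are known because the calibration pattern is known. The sketch assumes an all-zero starting tap vector, an ideal PR4 training pattern, and a made-up donor-head response (0.8 gain plus 0.2 leakage from the previous sample); none of these values come from a real drive.

```python
import random

def train_equalizer(received, target, n_taps=5, mu=0.05):
    """Training-mode LMS from an all-zero seed; returns (taps, squared errors)."""
    taps = [0.0] * n_taps
    sq_errors = []
    for k in range(n_taps - 1, len(received)):
        window = received[k - n_taps + 1 : k + 1][::-1]   # newest sample first
        output = sum(w * x for w, x in zip(taps, window))
        err = target[k] - output                          # error against the known pattern
        taps = [w + mu * err * x for w, x in zip(taps, window)]
        sq_errors.append(err * err)
    return taps, sq_errors

rng = random.Random(3)
bits = [rng.randint(0, 1) for _ in range(6002)]
target = [float(bits[k] - bits[k - 2]) for k in range(2, len(bits))]   # ideal PR4 samples

# Hypothetical donor-head response: reduced gain plus leakage from the previous sample.
received = [0.8 * target[k] + (0.2 * target[k - 1] if k else 0.0)
            for k in range(len(target))]

taps, sq_errors = train_equalizer(received, target)
early_mse = sum(sq_errors[:200]) / 200
late_mse = sum(sq_errors[-200:]) / 200
print([round(w, 3) for w in taps], early_mse, late_mse)
```

The taps settle near the inverse of the assumed response (a leading tap close to 1.25 followed by small alternating corrections), and the squared error falls by orders of magnitude between the first and last 200 samples. The contrast with adaptation on a noisy head is that here the loop has a known, coherent target to converge on.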
When is full retuning required versus a simple parameter transfer?
| Scenario | Transfer Sufficient | Recalibration Required |
|---|---|---|
| Same firmware revision donor, same head ID | Yes | No |
| Cross firmware revision donor, same family | Partial (transfer plus verification) | Often |
| Different head generation on Seagate F3 | No | Yes |
| WD Marvell Palmer or SpyGlass with different ABS donor | No | Yes |
| Toshiba MG cross-capacity donor | No | Yes |
| Patient heads partially functional, donor healthier | No | Yes (no patient seed) |
Sourcing a donor that allows a clean transfer rather than a full recalibration is the point of careful donor matching: same family, same firmware revision, same head ID, same generation. The closer the donor, the smaller the convergence work after the swap.
How does the Service Area gate read channel access?
The Service Area is the firmware region on the platters. It holds the translator, the P-List and G-List, the head map, the SMART overlay, and the adaptive modules described above. The drive must boot through the Service Area before it can serve user data, which means it must read the SA tracks before the host LBA range is even visible. If the heads can read user data tracks but cannot read the SA tracks (a common pattern when one head out of a multi-head stack has degraded and that head happens to cover the SA), the drive never finishes its boot sequence and the host sees no drive at all.
PC-3000 handles this with a hot-swap or a diagnostic-mode boot. In a hot-swap, a donor drive of the correct family is allowed to boot fully, loading firmware into RAM; a terminal command parks the heads via SLEEP; the donor PCB is then transferred to the patient HDA. The donor PCB now has firmware in RAM and reads the patient platters with the patient's heads. In diagnostic mode, the drive boots far enough to accept VSC commands without finishing the SA load; the technician then issues commands directly to read or rewrite SA modules including the adaptive read channel set.
Once the SA modules are accessible, the read channel work described in the previous sections proceeds. Without SA access the read channel cannot be retuned because the stored coefficient modules are unreachable; the drive cannot be addressed for diagnostic imaging through any normal interface. Firmware repair and read channel tuning are sequential, not parallel.
Read channel tuning addresses signal quality, not physical loss.
If the magnetic layer is damaged or contaminated, no equalizer coefficient set recovers the bits that were physically destroyed. Surface damage cases require a head swap and platter cleaning in a 0.02 micron ULPA-filtered clean bench, performed before any read channel work. Pricing for surface-damage cases is the $2,000 tier. Pricing for head-swap cases requiring channel retuning is the $1,200–$1,500 tier.
Frequently Asked Questions
What is PRML in a hard drive?
PRML (Partial Response Maximum Likelihood) is the digital signal processing architecture that turns the analog read head signal into bits. The drive samples the waveform, runs the samples through a digital FIR equalizer that shapes them to a partial-response target, and then walks a Viterbi trellis to pick the most likely bit sequence given the channel noise model. EPRML extends the target to longer responses that handle higher areal density.
Why does PRML matter for HDD data recovery?
The FIR coefficients and Viterbi parameters are calibrated per-head at the factory and stored in the Service Area. When a head degrades or a donor head stack is installed, the stored coefficients no longer match the actual read signal. Bit error rate climbs until ECC fails and the drive cannot read its own translator. PC-3000 gives the technician access to the relevant adaptive modules so the channel can be transferred or recalibrated.
Does PC-3000 retune the read channel automatically?
Partly. PC-3000 transfers the adaptive module set during a head swap (Module 47 on WD Marvell, RAP/SAP/CAP on Seagate F3, adaptive zone tables on Toshiba MG). When the donor head generation differs from the patient, the technician runs in-channel adaptation against a calibration zone in the Service Area so the FIR taps converge on the new signal, then commits the adapted coefficients.
When is read channel retuning required after a head swap?
Retuning is required when the donor heads have a different ABS generation, a different preamp gain trim, or a sufficiently different fly height that the equalizer cannot converge on the stored coefficients. Seagate F3 donors from a different head generation, WD Marvell Palmer or SpyGlass donors with a different ABS, and most Toshiba MG cross-capacity donors all need active recalibration rather than a clean transfer. Cross-firmware-revision donors within the same family often need it as well.
What does the Viterbi detector do?
The Viterbi detector is a maximum-likelihood detector that finds the most probable bit sequence given a noisy sample stream and a model of the channel. It computes squared error between actual and predicted samples on every trellis edge, keeps the survivor path into each state, and after a fixed traceback depth commits the bit on the lowest-error path. Degraded heads break the channel model assumptions and the survivor selection picks wrong branches more often, which is what raises bit error rate.
Can read channel retuning recover a drive with platter damage?
No. Retuning addresses signal quality issues from head wear, donor mismatch, or preamp drift. It does not repair physical damage to the magnetic layer. Surface damage cases require a head swap and platter cleaning in a 0.02 micron ULPA-filtered clean bench, and pricing falls under the $2,000 tier.
What is the difference between PRML and EPRML?
Both pair a partial-response target with the same maximum-likelihood (Viterbi) detector. PRML originally targeted PR4. EPRML extends to EPR4 and EEPR4, longer targets that better match the recording channel at higher density. Modern drives use noise-predictive variants on top. From a recovery standpoint, the PC-3000 procedure for accessing adaptive modules and recalibrating the channel is the same regardless of which target the drive uses.
How do ATA retry loops over-adapt the FIR equalizer on a marginal head?
The LMS adaptation loop inside the FIR equalizer keeps running during every internal retry. If the head is reading mostly noise, the loop drifts the tap coefficients toward fitting that noise. The next read of a healthy sector then uses corrupted taps, the slicer eye opening collapses, and the drive throws hard read errors on sectors that were readable a moment earlier. This is why consumer software that lets the native retry loops run unchecked can push a salvageable drive into a busy hang or a SATA bus drop.
Why do Viterbi survivor-path errors defeat the drive's Reed-Solomon ECC?
The Viterbi detector commits to bits along a trellis. When SNR collapses, a single wrong survivor selection causes the decoded sequence to diverge for several consecutive bits before merging back, producing a contiguous burst of wrong bits inside one codeword. Reed-Solomon has a fixed per-codeword symbol-error budget. Random single-bit flips fit easily inside that budget; long Viterbi bursts exhaust it and the codeword fails correction. Adjusting Viterbi thresholds and FIR target gain through PC-3000 can shorten the bursts back inside the budget.
What does the DeepSpar Disk Imager Multi-Pass workflow actually do?
The imager bypasses the drive's native retry firmware and the host OS timeout behavior so the FIR equalizer cannot over-adapt. It images in passes: a tight healthy-head forward sweep first, then a reverse and offset sweep on skipped LBAs, then a head-selective slow sweep on the weak head, then residual recovery with relaxed channel models inside PC-3000 Data Extractor. Per-sector timeouts and hardware-level SATA resets keep a marginal head resting between attempts instead of burning its remaining read budget on retry loops over the worst sectors.
When does signal-processing intervention stop working?
As long as a head still produces a read signal that responds to channel adjustments, the case stays on the read-channel side and DeepSpar handles imaging. Once a head returns zero amplitude across all sectors mapped to it, preamp bias adjustments produce no signal change, or FLIR thermal imaging shows abnormal heat from the preamp IC indicating an electrical short, the boundary has been crossed. Imaging stops, the drive is opened in a 0.02 micron ULPA-filtered clean bench, and a matched donor head stack is transplanted before any further read-channel work resumes.