[kwlug-disc] NVME failure?

Chris Irwin chris at chrisirwin.ca
Sat Jul 31 17:20:22 EDT 2021


On Sat, Jul 31, 2021, at 16:26, Khalid Baheyeldin wrote:
> [  128.001364] nvme nvme0: Device not ready; aborting initialisation
> [  128.002041] nvme nvme0: Removing after probe failure status: -19
> 
> That NVME was a 1.6TB Micron 9200 MAX, if that matters.
> 
> There was no device file under /dev/ for that disk anymore.

The only NVMe failure I've had so far failed into read-only mode, so I could still pull data off of it, but writes would fail. That said, it was a consumer M.2 NVMe drive rather than an enterprise U.2 drive. If you don't even get a device file to inspect, you're likely out of luck diagnosing the issue.

> - Why would a device be functional before a reboot but totally go away
> after, and not being even detected by the operating system?

I know Intel Enterprise SSDs, for example, had a target for media wearout, and once that was reached, the drives would brick themselves entirely, instead of potentially introducing random errors over time. I believe the goal was to allow you to replace the drive immediately (and recover via array or backups), instead of potentially introducing smaller issues later which may propagate into backups.

Admins should be replacing those drives preemptively -- either by monitoring the stats using smartctl/nvme tools, or possibly the iLO/similar system will flag it as well.
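For the smartctl route, here is a minimal sketch of the kind of check that could run from cron. It assumes smartctl 7.0 or newer (for `--json`) and that the NVMe health log shows up under the key `nvme_smart_health_information_log` with fields `critical_warning`, `percentage_used`, `available_spare` and `available_spare_threshold`. Those key names and the 80% wear threshold are assumptions on my part, so check them against your own `smartctl --json -a /dev/nvme0` output before relying on this.

```python
from __future__ import annotations

import json
import subprocess
import sys

# Assumption: an arbitrary "start planning a replacement" level, not a vendor figure.
WEAR_WARN_PERCENT = 80


def nvme_health_warnings(report: dict, wear_warn: int = WEAR_WARN_PERCENT) -> list[str]:
    """Return warnings from a parsed `smartctl --json -a` report (key names assumed)."""
    log = report.get("nvme_smart_health_information_log")
    if log is None:
        return ["no NVMe health log in report"]

    warnings = []
    # Any non-zero critical_warning bit means the drive itself is flagging a problem.
    if log.get("critical_warning", 0) != 0:
        warnings.append(f"critical_warning bits set: {log['critical_warning']:#04x}")

    # percentage_used is the drive's own estimate of rated endurance consumed.
    used = log.get("percentage_used", 0)
    if used >= wear_warn:
        warnings.append(f"endurance used: {used}%")

    # Spare blocks at or below the vendor threshold is another wear-out signal.
    spare = log.get("available_spare")
    threshold = log.get("available_spare_threshold")
    if spare is not None and threshold is not None and spare <= threshold:
        warnings.append(f"available spare {spare}% at or below threshold {threshold}%")
    return warnings


def read_smartctl(device: str) -> dict:
    """Run smartctl and parse its JSON output."""
    # smartctl uses a non-zero exit bitmask to report problems, so no check=True here.
    result = subprocess.run(
        ["smartctl", "--json", "-a", device], capture_output=True, text=True
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Usage: python nvme_check.py /dev/nvme0
    for warning in nvme_health_warnings(read_smartctl(sys.argv[1])):
        print(warning)
```

The parsing is kept separate from the smartctl call so the logic can be tested against saved JSON, or pointed at `nvme smart-log -o json` output instead if you prefer nvme-cli (its field names differ, so the lookups would need adjusting).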

Reference: https://techreport.com/review/26523/the-ssd-endurance-experiment-casualties-on-the-way-to-a-petabyte/

It may be that Micron does something similar, but one of the metrics only gets checked at initialization time?

> - Are NVME as unreliable as SSDs or better? Or are they just faster?

I haven't heard any particular concerns about NVMe or SSD reliability, unless your device is full and/or you're not using trim/discard (both of which would limit the effectiveness of wear levelling).

-- 
*Chris Irwin*

email:   chris at chrisirwin.ca
 xmpp:   chris at chrisirwin.ca
  web: https://chrisirwin.ca

