[kwlug-disc] NVME failure?

Ron Singh ronsingh149 at gmail.com
Sat Jul 31 17:37:02 EDT 2021


We have been using the Samsung 970 Pro 1TB for caching in servers that get
beaten on about 18 hours a day, 6 days a week, and have never had a failure.
I think we started on this model some 2 years ago when they first came out;
prior to that we used the 960 Pro units.
All are NVMe.

We *always* marry them to a decent heatsink. Heat kills, and NVMe drives run
hot when being beaten on. A heatsink plus a chassis fan providing decent
laminar airflow over the SSDs makes for greater reliability, I think.
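
For anyone who wants to keep an eye on drive temperature, here is a minimal
Python sketch. It assumes nvme-cli is installed, that
`nvme smart-log <device> -o json` reports a `temperature` field in whole
Kelvin (check this on your version), and that the device is /dev/nvme0. The
70 C limit is an arbitrary placeholder, not a vendor spec.

```python
import json
import subprocess

KELVIN_OFFSET = 273  # assumption: the JSON temperature is in whole Kelvin


def temperature_celsius(smart_log: dict) -> int:
    """Convert the 'temperature' field (assumed Kelvin) to Celsius."""
    return smart_log["temperature"] - KELVIN_OFFSET


def is_too_hot(smart_log: dict, limit_c: int = 70) -> bool:
    """True if the drive is at or above limit_c (a placeholder threshold)."""
    return temperature_celsius(smart_log) >= limit_c


def read_smart_log(device: str = "/dev/nvme0") -> dict:
    """Run nvme-cli and parse its JSON output; usually needs root."""
    result = subprocess.run(
        ["nvme", "smart-log", device, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling is_too_hot(read_smart_log()) from a cron job and mailing yourself on
True would be one simple way to use it.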

Specs on SSD:
https://www.samsung.com/ca/memory-storage/nvme-ssd/970-pro-nvme-m-2-ssd-1tb-mz-v7p1t0bw/
Not sure how the newer 980 Pro stacks up, as we are not transitioning away
from the 970 Pro until mid-2022.

Specs on Heatsink:
https://www.sabrent.com/product/SB-HTSK/m-2-2280-ssd-rocket-heatsink/

This really does not address your question, but I thought I would toss an
anecdote out there.
BTW, the nitty-gritty on the Micron 9300 Max can be found here:
https://media-www.micron.com/-/media/client/global/documents/products/product-flyer/9300_ssd_product_brief.pdf
Micron plays fast and loose with their TBW rating, sneaky buggers: they
quote it as "3 drive writes per day".

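To compare a drive-writes-per-day (DWPD) figure with a TBW figure, the usual
conversion is TBW = DWPD x capacity x 365 x warranty years. A rough Python
sketch follows. The 1.6 TB capacity is the drive from the quoted message; the
5-year warranty period is my assumption, so check the actual product brief
before relying on the result.

```python
def dwpd_to_tbw(dwpd: float, capacity_tb: float, warranty_years: float) -> float:
    """Total terabytes written implied by a drive-writes-per-day rating."""
    return dwpd * capacity_tb * 365 * warranty_years


def tbw_to_dwpd(tbw: float, capacity_tb: float, warranty_years: float) -> float:
    """Drive writes per day implied by a TBW rating over the warranty period."""
    return tbw / (capacity_tb * 365 * warranty_years)


# 3 DWPD on a 1.6 TB drive over an ASSUMED 5-year warranty.
print(round(dwpd_to_tbw(3, 1.6, 5)))  # 8760 (TB written)
```

Going the other way with tbw_to_dwpd() puts a consumer drive's TBW rating on
the same scale as an enterprise DWPD figure.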
Thanks,

Ron S.



On Sat, Jul 31, 2021 at 4:27 PM Khalid Baheyeldin <kb at 2bits.com> wrote:

> Short version: what is the longevity of NVME disks under heavy writes,
> e.g. for MySQL database?
>
> I am hoping that some hardware knowledgeable folk would clue me in on
> this hardware related issue.
>
> Basically, I had a client who got a new server over a year ago. The
> hosting company, a large US based host, recommended that we use an
> NVME disk for MySQL. This is a plain old physical server running
> Ubuntu Server 20.04 LTS (no VMs, no docker). It has 64GB RAM and 16
> cores. The root file system is two SSDs that are software RAIDed
> together. 1
>
> We had kernel patches accumulating over that year, and I wanted to do
> a reboot to make sure that everything started normally. Upon rebooting,
> the kernel logged:
>
> [  128.001364] nvme nvme0: Device not ready; aborting initialisation
> [  128.002041] nvme nvme0: Removing after probe failure status: -19
>
> That NVME was a 1.6TB Micron 9200 MAX, if that matters.
>
> There was no device file under /dev/ for that disk anymore.
>
> After the host replaced the NVME, everything was normal, as below:
> [    7.558183] nvme nvme0: Shutdown timeout set to 10 seconds
> [    7.562576] nvme nvme0: 32/0/0 default/read/poll queues
> [    7.565741]  nvme0n1: p1
> ...
> Jul 31 15:27:54 live multipath: nvme0n1: failed to get udev uid:
> Invalid argument
> Jul 31 15:27:54 live multipath: nvme0n1: uid =
> eui.000000000000000100a0750128df8715 (sysfs)
> ...
> [    6.008941] nvme nvme0: pci function 0000:b6:00.0
>
> [   11.571864] EXT4-fs (nvme0n1p1): mounted filesystem with ordered
> data mode. Opts: (null)
>
> Now to the questions:
>
> - Why would a device be functional before a reboot but totally go away
> after, and not even be detected by the operating system?
> - Are NVME disks as unreliable as SSDs, or better? Or are they just faster?
>
> All thoughts appreciated ...
> --
> Khalid M. Baheyeldin
> 2bits.com, Inc.
> Fast Reliable Drupal
> Drupal performance optimization, hosting and consulting.
> "Sooner or later, this combustible mixture of ignorance and power is
> going to blow up in our faces." -- Dr. Carl Sagan