
Both the nvme-cli tools and the smartctl command from smartmontools can check the health of NVMe sticks.

  1. nvme-cli

Example:

nvme smart-log /dev/nvme0

Look for the field media_errors.
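
To check that field from a script rather than by eye, something like the following could work. It is a minimal sketch, assuming an nvme-cli version that accepts --output-format=json and reports the counter under the key media_errors (the same name as the field above). The function names read_smart_log and media_errors are made up for this illustration, and the sample value is hypothetical.

```python
import json
import subprocess


def read_smart_log(device="/dev/nvme0"):
    """Run `nvme smart-log` and return its JSON output as a dict.

    Assumes nvme-cli is installed, supports --output-format=json,
    and that the caller has permission to query the device.
    """
    result = subprocess.run(
        ["nvme", "smart-log", device, "--output-format=json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def media_errors(smart_log):
    """Return the media_errors counter from a parsed smart-log dict."""
    # int() accepts the value whether it was emitted as a number or a string.
    return int(smart_log["media_errors"])


# Hypothetical sample, so the parsing can be tried without a real device.
sample = json.loads('{"media_errors": 0}')
print("media_errors:", media_errors(sample))
```

On a real machine, media_errors(read_smart_log("/dev/nvme0")) would return the live counter; it usually needs root.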

  2. smartmontools

Example:

smartctl -x -q noserial /dev/nvme0

The following sample output is taken from the smartmontools website:

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 40 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 1,769,281 [905 GB]
Data Units Written: 1,384,224 [708 GB]
Host Read Commands: 24,646,213
Host Write Commands: 19,105,374
Controller Busy Time: 38
Power Cycles: 32
Power On Hours: 129
Unsafe Shutdowns: 6
Media and Data Integrity Errors: 0
Error Information Log Entries: 44

Look for Media and Data Integrity Errors. I had an NVMe stick on which this value kept rising, and that stick turned out to be faulty.
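
Since the warning sign is a counter that keeps rising, it can help to pull the number out of the smartctl text output and compare it with an earlier reading. Below is a minimal sketch that parses lines in the "Name: value" form shown in the sample output. The helper names (parse_health_fields, leading_int, has_risen) are hypothetical, and the previous reading would have to be stored by you somewhere, for example in a file.

```python
def parse_health_fields(report):
    """Map each 'Name: value' line of a smartctl report to its value string."""
    fields = {}
    for line in report.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as the section title
            fields[name.strip()] = value.strip()
    return fields


def leading_int(value):
    """Convert a counter such as '1,769,281 [905 GB]' to 1769281."""
    return int(value.split()[0].replace(",", ""))


def has_risen(previous, current):
    """Return True if the counter grew since the previous reading."""
    return current > previous


# Two lines copied from the sample output above.
SAMPLE = """\
Media and Data Integrity Errors: 0
Error Information Log Entries: 44
"""

fields = parse_health_fields(SAMPLE)
errors = leading_int(fields["Media and Data Integrity Errors"])
print("media errors:", errors, "risen since last check:", has_risen(0, errors))
```

The leading_int helper is meant for the plain counters only; percentage fields such as Available Spare would need the % sign stripped first.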

Some NVMe sticks are perfectly healthy but keep writing entries to the Error Information Log, so I mostly ignore that field for those sticks.

Both methods query the SMART data to check the health of the NVMe sticks. I prefer nvme smart-log, which also displays the available spare of the device.



Photo credit: Nielsen Ramon (CC0 license) in StockSnap

