Hi everyone!
For the past few days, the smartd daemon has been reporting errors on one of my disks:
[code]From root@dotslashplay.it Wed Dec 17 00:25:29 2014
Return-Path: root@dotslashplay.it
X-Original-To: root
Delivered-To: root@dotslashplay.it
Received: by dotslashplay.it (Postfix, from userid 0)
id 23C641C09F2; Wed, 17 Dec 2014 00:25:28 +0100 (CET)
To: root@dotslashplay.it
Subject: SMART error (CurrentPendingSector) detected on host: HAL9000
Message-Id: 20141216232529.23C641C09F2@dotslashplay.it
Date: Wed, 17 Dec 2014 00:25:28 +0100 (CET)
From: root@dotslashplay.it (root)
This message was generated by the smartd daemon running on:
host name: HAL9000
DNS domain: [Empty]
The following warning/error was logged by the smartd daemon:
Device: /dev/sdb [SAT], 32 Currently unreadable (pending) sectors
Device info:
ST3000DM001-1CH166, S/N:Z1F24DDA, WWN:5-000c50-04f7c05e5, FW:CC24, 3.00 TB
For details see host’s SYSLOG.
You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Fri Dec 5 22:00:21 2014 CET
Another message will be sent in 24 hours if the problem persists.
From root@dotslashplay.it Wed Dec 17 00:25:30 2014
Return-Path: root@dotslashplay.it
X-Original-To: root
Delivered-To: root@dotslashplay.it
Received: by dotslashplay.it (Postfix, from userid 0)
id 58E8F1C09B3; Wed, 17 Dec 2014 00:25:29 +0100 (CET)
To: root@dotslashplay.it
Subject: SMART error (OfflineUncorrectableSector) detected on host: HAL9000
Message-Id: 20141216232529.58E8F1C09B3@dotslashplay.it
Date: Wed, 17 Dec 2014 00:25:29 +0100 (CET)
From: root@dotslashplay.it (root)
This message was generated by the smartd daemon running on:
host name: HAL9000
DNS domain: [Empty]
The following warning/error was logged by the smartd daemon:
Device: /dev/sdb [SAT], 32 Offline uncorrectable sectors
Device info:
ST3000DM001-1CH166, S/N:Z1F24DDA, WWN:5-000c50-04f7c05e5, FW:CC24, 3.00 TB
For details see host’s SYSLOG.
You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Fri Dec 5 22:00:21 2014 CET
Another message will be sent in 24 hours if the problem persists.[/code]
These errors are, of course, confirmed by smartctl (ID# 197 and 198):
[code]root@HAL9000:~# smartctl -A /dev/sdb
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
Copyright © 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 104 099 006 Pre-fail Always - 115985942
3 Spin_Up_Time 0x0003 095 094 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 224
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 068 060 030 Pre-fail Always - 64529137937
9 Power_On_Hours 0x0032 086 086 000 Old_age Always - 13128
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 222
183 Runtime_Bad_Block 0x0032 099 099 000 Old_age Always - 1
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 006 006 000 Old_age Always - 94
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0 0 0
189 High_Fly_Writes 0x003a 093 093 000 Old_age Always - 7
190 Airflow_Temperature_Cel 0x0022 065 048 045 Old_age Always - 35 (Min/Max 31/39)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 19
193 Load_Cycle_Count 0x0032 063 063 000 Old_age Always - 74130
194 Temperature_Celsius 0x0022 035 052 000 Old_age Always - 35 (0 10 0 0 0)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 32
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 32
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 9368h+52m+22.997s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 27501598581
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 53398844399[/code]
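To watch just these two counters after each repair attempt, the raw value can be pulled out of the attribute table. This is only a small sketch: `smart_raw` is a helper name made up for this post, and it assumes the ten-column layout shown above, where the raw value is the tenth field.

```shell
# smart_raw ID
# Reads "smartctl -A" output on stdin and prints the first word of the
# RAW_VALUE column (field 10) for the attribute whose ID# matches.
smart_raw() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Intended use (not run here):
#   smartctl -A /dev/sdb | smart_raw 197
#   smartctl -A /dev/sdb | smart_raw 198
```

For attributes whose raw value has several words (188 above, for instance), only the first word is printed.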
Since a read-only badblocks scan (via e2fsck -c /dev/sdb1) did not fix anything, I decided to repair these sectors by hand. I started with a series of SMART self-tests, which gave me the first unreadable block:
[code]root@HAL9000:~# smartctl -l selftest /dev/sdb
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
Copyright © 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
1 Extended captive Completed: read failure 90% 13120 3006367728
2 Short captive Completed: read failure 90% 13120 3006367728
3 Short offline Completed without error 00% 12191 -
[/code]
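To find out which file (if any) sits on that sector, the LBA reported by the self-test can be converted to a filesystem block number, along the lines of the smartmontools bad block HOWTO. This is a minimal sketch assuming 512-byte logical sectors. The partition start (2048) and filesystem block size (4096) in the example are placeholders, not values read from this disk, and `lba_to_fs_block` is a name made up for this post.

```shell
# lba_to_fs_block LBA PART_START_SECTOR FS_BLOCK_SIZE
# Converts an absolute LBA (in 512-byte sectors) into a block number
# relative to the start of the filesystem on that partition.
lba_to_fs_block() {
    lba=$1
    part_start=$2
    block_size=$3
    echo $(( (lba - part_start) * 512 / block_size ))
}

# Example with placeholder values (start sector 2048, 4096-byte blocks):
#   lba_to_fs_block 3006367728 2048 4096    # prints 375795710
```

The real values would come from `fdisk -l /dev/sdb` (start sector of sdb1) and `tune2fs -l /dev/sdb1` (the "Block size" line). The resulting block number can then be passed to `debugfs -R "icheck BLOCK" /dev/sdb1` to see whether an inode owns it.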
dd confirms that this block cannot be read:
[code]root@HAL9000:~# dd if=/dev/sdb of=/dev/null ibs=512 count=1 skip=3006367728
dd: error reading '/dev/sdb': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 2.88819 s, 0.0 kB/s[/code]
I then planned to use dd to force this block to be rewritten, but all it showed me is that writing to this sector fails as well:
[code]root@HAL9000:~# dd if=/dev/zero of=/dev/sdb obs=512 count=1 seek=3006367728
dd: error writing '/dev/sdb': Input/output error
1+0 records in
0+0 records out
0 bytes (0 B) copied, 2.89941 s, 0.0 kB/s[/code]
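One plausible explanation for the write failure, which I have not verified on this disk: if the drive exposes 512-byte logical sectors on top of 4096-byte physical sectors, a 512-byte write going through the page cache becomes a read-modify-write of a larger unit, and the read half hits the bad sector. The sketch below only prints a dd command that would overwrite the whole physical sector in one direct write. The 4096-byte size is an assumption (it can be checked with `blockdev --getpbsz /dev/sdb`), and `aligned_write_cmd` is a name made up for this post.

```shell
# aligned_write_cmd DEVICE LBA PHYS_SECTOR_SIZE
# Prints (does not run) a dd command that zeroes the physical sector
# containing the given 512-byte LBA, using a single direct write.
aligned_write_cmd() {
    dev=$1
    lba=$2
    phys=$3
    per_phys=$(( phys / 512 ))        # logical sectors per physical sector
    phys_index=$(( lba / per_phys ))  # index of the physical sector, rounded down
    echo "dd if=/dev/zero of=$dev bs=$phys count=1 seek=$phys_index oflag=direct"
}

# Example:
#   aligned_write_cmd /dev/sdb 3006367728 4096
#   prints: dd if=/dev/zero of=/dev/sdb bs=4096 count=1 seek=375795966 oflag=direct
```

Running the printed command would irreversibly replace 4096 bytes at that location with zeros, so it only makes sense once the affected data is known to be expendable or backed up.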
The situation is not improving: I have just noticed that the raw value of Seek_Error_Rate is steadily increasing:
[code]root@HAL9000:~# date && smartctl -A /dev/sdb | grep Seek_Error_Rate
Thursday 18 December 2014, 18:26:42 (UTC+0100)
7 Seek_Error_Rate 0x000f 068 060 030 Pre-fail Always - 64529139867
root@HAL9000:~# date && smartctl -A /dev/sdb | grep Seek_Error_Rate
Thursday 18 December 2014, 18:26:45 (UTC+0100)
7 Seek_Error_Rate 0x000f 068 060 030 Pre-fail Always - 64529139874
root@HAL9000:~# date && smartctl -A /dev/sdb | grep Seek_Error_Rate
Thursday 18 December 2014, 18:26:51 (UTC+0100)
7 Seek_Error_Rate 0x000f 068 060 030 Pre-fail Always - 64529139893[/code]
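That said, this raw value may not be a plain error counter. A commonly reported interpretation for Seagate drives (community knowledge, not something I have seen in official documentation) is that the raw value is a 48-bit number whose upper 16 bits count seek errors and whose lower 32 bits count total seeks. A minimal sketch of that decoding, with `decode_seagate_raw` being a name made up for this post:

```shell
# decode_seagate_raw RAW_VALUE
# Splits a 48-bit raw value into its upper 16 bits (assumed: error count)
# and lower 32 bits (assumed: total operation count).
decode_seagate_raw() {
    raw=$1
    errors=$(( (raw >> 32) & 0xFFFF ))
    total=$(( raw & 0xFFFFFFFF ))
    echo "errors=$errors total=$total"
}

# Example with the first and last samples above:
#   decode_seagate_raw 64529139867    # prints errors=15 total=104630427
#   decode_seagate_raw 64529139893    # prints errors=15 total=104630453
```

Under that reading, the three samples all show the same 15 errors, and only the total seek count moves (26 more seeks in 9 seconds), which would simply reflect disk activity.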
So I'm turning to you for advice, as I'm a bit lost with these operations, which are new territory for me.