SATA errors

Hello,

For the past few days I have been getting these errors in dmesg on my wheezy install:

Jan 31 00:15:46 Kerberos kernel: [430293.180041] sr 4:0:0:0: CDB: Test Unit Ready: 00 00 00 00 00 00
Jan 31 00:15:46 Kerberos kernel: [430293.180066] ata5: hard resetting link
Jan 31 00:15:46 Kerberos kernel: [430293.496589] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 31 00:15:46 Kerberos kernel: [430293.499170] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Jan 31 00:15:46 Kerberos kernel: [430293.499180] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT4._GTF] (Node ffff88040e095fb0), AE_NOT_FOUND (20110623/psparse-536)
Jan 31 00:15:46 Kerberos kernel: [430293.500854] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Jan 31 00:15:46 Kerberos kernel: [430293.500863] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT4._GTF] (Node ffff88040e095fb0), AE_NOT_FOUND (20110623/psparse-536)
Jan 31 00:15:46 Kerberos kernel: [430293.500881] ata5.00: configured for UDMA/100
Jan 31 00:15:46 Kerberos kernel: [430293.516594] ata5: EH complete

After some searching, I came across a few old posts about a kernel bug… Does this ring a bell for anyone?

Thanks

First thing to do => check that every cable in the machine is properly connected

Hello,

From memory, there is a lingering bug in the speed negotiation of the SATA port: somewhat rotten BIOSes pass the speed on to the kernel incorrectly, which then freezes the SATA link until a hard reset.
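To see what speed was actually negotiated, the `SStatus` and `SControl` values in the "SATA link up" log line can be decoded. The following is a minimal Python sketch, assuming the kernel prints both registers in hexadecimal and that their low 12 bits hold the DET, SPD and IPM fields of the SATA registers; the names `split_fields` and `describe_link` are made up for illustration.

```python
import re

# Meaning of the SPD field in SStatus (negotiated interface speed).
SPEEDS = {
    0: "no negotiated speed",
    1: "1.5 Gbps (Gen1)",
    2: "3.0 Gbps (Gen2)",
    3: "6.0 Gbps (Gen3)",
}

# Matches the "link up" message; both registers are assumed to be hex.
LINK_UP = re.compile(
    r"(ata\d+): SATA link up .*\(SStatus ([0-9A-Fa-f]+) SControl ([0-9A-Fa-f]+)\)"
)


def split_fields(register):
    """Split the low 12 bits of SStatus/SControl into DET, SPD and IPM."""
    return {
        "det": register & 0xF,
        "spd": (register >> 4) & 0xF,
        "ipm": (register >> 8) & 0xF,
    }


def describe_link(line):
    """Decode one log line, or return None if it is not a link-up message."""
    match = LINK_UP.search(line)
    if match is None:
        return None
    port, sstatus_hex, scontrol_hex = match.groups()
    sstatus = split_fields(int(sstatus_hex, 16))
    scontrol = split_fields(int(scontrol_hex, 16))
    return {
        "port": port,
        # DET == 3: device detected and communication established.
        "device_present": sstatus["det"] == 3,
        "negotiated": SPEEDS.get(sstatus["spd"], "unknown"),
        # SPD == 0 in SControl means the host does not cap the link speed.
        "host_speed_cap": SPEEDS.get(scontrol["spd"]) if scontrol["spd"] else None,
    }


line = ("Jan 31 00:15:46 Kerberos kernel: [430293.496589] "
        "ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)")
print(describe_link(line))
```

For the line from the first post, this reports a device present on ata5, a link negotiated at Gen1, and no speed cap set by the host.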

I had this on my old laptop. Usually one reboot made things better, sometimes it took two for the problem to go away, and it did not show up on every boot.

However, I never had any ACPI errors.

In addition to what Mimoza said, just in case, also check the SMART status of your hard drive … and make a backup! :mrgreen:

Usti

I did what you told me, and I still get these errors.

Running smartctl on /dev/sda gives me:

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       227637195
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       36
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   071   060   030    Pre-fail  Always       -       14276710
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5272
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       36
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       40
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   061   052   045    Old_age   Always       -       39 (Min/Max 39/39)
194 Temperature_Celsius     0x0022   039   048   000    Old_age   Always       -       39 (0 18 0 0)
195 Hardware_ECC_Recovered  0x001a   032   024   000    Old_age   Always       -       227637195
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       44259637990630
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       13835525
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       201618405

and on sdb:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   099   099   016    Pre-fail  Always       -       65538
  2 Throughput_Performance  0x0005   134   134   054    Pre-fail  Offline      -       87
  3 Spin_Up_Time            0x0007   136   136   024    Pre-fail  Always       -       411 (Average 430)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       190
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   135   135   020    Pre-fail  Offline      -       26
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       12335
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       79
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       217
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       217
194 Temperature_Celsius     0x0002   142   142   000    Old_age   Always       -       42 (Min/Max 16/48)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

The Hardware_ECC_Recovered value scared me a little, especially since it goes up by about 1000 every 10 minutes, but from what I have read it does not seem to matter. I suspect a bug…
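Another way to read the tables above is to compare the normalized VALUE column with THRESH instead of looking at RAW_VALUE. Here is a minimal Python sketch, assuming the ten-column smartctl layout pasted in this thread; `parse_attributes` and `SDA_SAMPLE` are illustrative names, not part of smartmontools.

```python
def parse_attributes(table_text):
    """Parse smartctl attribute rows into dicts, skipping headers and blanks."""
    rows = []
    for line in table_text.splitlines():
        fields = line.split(None, 9)
        # Attribute rows have 10 columns and start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        value, worst, thresh = (int(fields[i]) for i in (3, 4, 5))
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": value,
            "worst": worst,
            "thresh": thresh,
            "type": fields[6],
            "raw": fields[9].strip(),
            # An attribute counts as failing when VALUE has dropped to THRESH
            # or below; a threshold of 0 is treated here as "never fails".
            "failing": thresh > 0 and value <= thresh,
        })
    return rows


# A few rows copied from the sda output posted above.
SDA_SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       227637195
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   071   060   030    Pre-fail  Always       -       14276710
195 Hardware_ECC_Recovered  0x001a   032   024   000    Old_age   Always       -       227637195
"""

for row in parse_attributes(SDA_SAMPLE):
    margin = row["value"] - row["thresh"]
    status = "FAILING" if row["failing"] else "ok"
    print(f'{row["name"]:<24} value={row["value"]:3d} thresh={row["thresh"]:3d} '
          f'margin={margin:3d} {status}')
```

None of these sample rows is flagged: every VALUE is still above its THRESH, and Hardware_ECC_Recovered has a threshold of 000.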

[quote=“bloodaxe70”]
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       227637195
[...]
  7 Seek_Error_Rate         0x000f   071   060   030    Pre-fail  Always       -       14276710
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5272
[...]
195 Hardware_ECC_Recovered  0x001a   032   024   000    Old_age   Always       -       227637195
[...]
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       44259637990630
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       13835525
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       201618405
[/quote]

You will need to tell us a bit more about your disks, and above all their history.
sda looks to be in rather bad shape, and its numbers seem really very high to me.
The Wikipedia page is well done and lists the expected values for each attribute. In my opinion your sda is going to fail before long. Put your ear against it to hear whether it makes a “clack” … and back up its contents :wink:

sda is only 6 months old and sdb a little over a year.
Since the reboot I have only seen the errors a few times, and things have calmed down today.

smartctl -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     ST31000524AS
Serial Number:    6VPJS9VX
LU WWN Device Id: 5 000c50 04a9f0416
Firmware Version: JC4B
User Capacity:    1 000 204 886 016 bytes [1,00 TB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Fri Feb  1 18:27:46 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(  600) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 177) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x103f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   110   099   006    Pre-fail  Always       -       27902769
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       36
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   071   060   030    Pre-fail  Always       -       14342840
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5296
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       36
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       40
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   060   052   045    Old_age   Always       -       40 (Min/Max 39/41)
194 Temperature_Celsius     0x0022   040   048   000    Old_age   Always       -       40 (0 18 0 0)
195 Hardware_ECC_Recovered  0x001a   035   024   000    Old_age   Always       -       27902769
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       79680233280766
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       15014408
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       385216748

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

smartctl -a /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 7K3000
Device Model:     Hitachi HDS723020BLA642
Serial Number:    MN1220F31SLMRD
LU WWN Device Id: 5 000cca 369d8d1ff
Firmware Version: MN6OA5C0
User Capacity:    2 000 398 934 016 bytes [2,00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Fri Feb  1 18:29:38 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(19381) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 255) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   135   135   054    Pre-fail  Offline      -       86
  3 Spin_Up_Time            0x0007   136   136   024    Pre-fail  Always       -       411 (Average 430)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       190
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   135   135   020    Pre-fail  Offline      -       26
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       12359
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       79
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       217
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       217
194 Temperature_Celsius     0x0002   139   139   000    Old_age   Always       -       43 (Min/Max 16/48)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I back everything up every weekend. Something to keep a close eye on… :\

PS: The Wikipedia link is interesting, thanks :slightly_smiling:

Whether sda is worrying depends on the brand of the disk …
But if it is a Seagate, which is probably the case, this is “normal”!

All Seagates show this kind of values, which would be very abnormal for a Western, Samsung or Hitachi.
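One commonly reported reading of these large Seagate raw values, which is an assumption here and is not confirmed anywhere in this thread, is that the 48-bit raw number packs two counters: an error count in the upper 16 bits and an operation count in the lower 32 bits. A short Python sketch of that split, with `split_seagate_raw` as a hypothetical helper name:

```python
def split_seagate_raw(raw_value):
    """Split a 48-bit raw value into (upper 16 bits, lower 32 bits).

    Assumption: the upper part counts errors and the lower part counts
    operations (reads or seeks). This is an unverified interpretation.
    """
    raw_value &= (1 << 48) - 1          # keep only the 48 raw bits
    return raw_value >> 32, raw_value & 0xFFFFFFFF


# Raw values taken from the first sda output posted above.
for name, raw in [("Raw_Read_Error_Rate", 227637195),
                  ("Seek_Error_Rate", 14276710)]:
    errors, operations = split_seagate_raw(raw)
    print(f"{name}: errors={errors} operations={operations}")
```

Under that assumption, both raw values from sda fit entirely in the lower 32 bits, so the upper (error) part is zero.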
Still, run a surface scan of the disk. There is a utility called badblocks that you can run in read-only mode and that tests every block on the disk.
You will quickly see whether you have errors or not. For the link errors, try swapping the SATA cable if you have a spare one at hand. It is rare, but a cable can be faulty; I have seen it happen.
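The read-only surface scan suggested above could be launched as in the following Python sketch. It assumes `badblocks` (from e2fsprogs) is installed and that the script runs as root; `badblocks_command` and `scan` are illustrative helpers, and `/dev/sda` is simply the device name used in this thread.

```python
import subprocess


def badblocks_command(device):
    """Build a read-only badblocks invocation for a block device path."""
    if not device.startswith("/dev/"):
        raise ValueError(f"expected a /dev/ path, got {device!r}")
    # Neither -n nor -w is passed, so badblocks only reads the disk.
    # -s shows progress, -v prints more detail about errors found.
    return ["badblocks", "-s", "-v", device]


def scan(device):
    """Run the scan (needs root) and return the exit status of badblocks."""
    return subprocess.run(badblocks_command(device)).returncode


# Only print the command here; call scan("/dev/sda") to actually run it.
print(" ".join(badblocks_command("/dev/sda")))
```

A full pass over a 1 TB disk takes a long time, so it is worth starting it when the machine is otherwise idle.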

On the other hand, I have no idea about the ACPI errors, sorry :frowning: