Unwanted head parking on my hard drive

I'm reposting here a message that I already posted on the Ubuntu forum. The problem is the same under Debian.


Hello,

On my new laptop running Ubuntu 9.10:

~$ date && sudo smartctl -a $(mount | sed -n '/\/ /s/[0-9].*//p') | grep 'Cycle\|Power'
Sun Mar 28 21:36:05 CEST 2010
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 101
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 206
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 092 092 000 Old_age Always - 87763

A few seconds later:

~$ date && sudo smartctl -a $(mount | sed -n '/\/ /s/[0-9].*//p') | grep 'Cycle\|Power'
Sun Mar 28 21:36:33 CEST 2010
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 101
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 206
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 092 092 000 Old_age Always - 87767
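
As a rough back-of-the-envelope check: the two samples above are 28 seconds apart and the Load_Cycle_Count differs by 4. The sketch below turns that into an hourly rate. The function name cycles_per_hour is made up for illustration and is not part of smartmontools.

```shell
#!/bin/sh
# Hypothetical helper: average load cycles per hour between two samples.
# $1 = first Load_Cycle_Count, $2 = second Load_Cycle_Count, $3 = elapsed seconds
cycles_per_hour() {
    echo $(( ($2 - $1) * 3600 / $3 ))
}

# Values from the two samples above (21:36:05 and 21:36:33).
cycles_per_hour 87763 87767 28
```

With integer division this prints 514, that is, roughly one head park every 7 seconds over that short window.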
I have of course read the following page: doc.ubuntu-fr.org/laptop_mode#co … u_probleme

With both methods (laptop-mode and hdparm), there is no difference in how often the read heads are parked. hdparm -B 254 /dev/sda changes nothing (my disk really is sda), and laptop-mode, although it is enabled on battery (the command cat /proc/sys/vm/laptop_mode returns 2), is not taken into account: I specify "no parking before 60 s of inactivity" and I get 10 parkings in the following 30 seconds.
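
For reference, the hdparm manual describes -B levels 1 to 127 as permitting spin-down, 128 to 254 as not permitting it, and 255 as disabling APM altogether. A small sketch of that mapping follows; describe_apm is a hypothetical name. As far as I can tell the manual only speaks of spin-down, so how a given firmware ties these levels to head parking is drive-specific.

```shell
#!/bin/sh
# Hypothetical helper: describe an hdparm -B (APM) level in words,
# following the ranges given in the hdparm manual.
describe_apm() {
    level=$1
    if [ "$level" -eq 255 ]; then
        echo "APM disabled"
    elif [ "$level" -ge 128 ] && [ "$level" -le 254 ]; then
        echo "APM on, spin-down not permitted"
    elif [ "$level" -ge 1 ] && [ "$level" -le 127 ]; then
        echo "APM on, spin-down permitted"
    else
        echo "invalid level"
    fi
}

# Level used in this thread.
describe_apm 254
```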

How can I fix this?
P.S.: hdparm -B 254 seems to be taken into account, but the parking continues: [code]~$ sudo hdparm -I /dev/sda
[sudo] password for pierre:

/dev/sda:

ATA device, with non-removable media
Model Number: SAMSUNG HM250HI
Serial Number: S1RUJ90SB29945
Firmware Revision: 2AC101C4
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6
Standards:
Used: unknown (minor revision code 0x0028)
Supported: 8 7 6 5
Likely used: 8
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63

CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 488397168
Logical Sector size: 512 bytes
Physical Sector size: 512 bytes
device size with M = 1024*1024: 238475 MBytes
device size with M = 1000*1000: 250059 MBytes (250 GB)
cache/buffer size = 8192 KBytes
Form Factor: 2.5 inch
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec’d by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = ?
Advanced power management level: 254
Recommended acoustic management value: 254, current value: 0
DMA: mdma0 mdma1 *mdma2 udma0 udma1 udma2 udma3 udma4 udma5 udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
* Advanced Power Management feature set
Power-Up In Standby feature set
* SET_FEATURES required to spinup after power up
SET_MAX security extension
Automatic Acoustic Management feature set
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* 64-bit World wide name
* IDLE_IMMEDIATE with UNLOAD
* WRITE_UNCORRECTABLE_EXT command
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
* Gen1 signaling speed (1.5Gb/s)
* Gen2 signaling speed (3.0Gb/s)
* Native Command Queueing (NCQ)
* Host-initiated interface power management
* Phy event counters
* Idle-Unload when NCQ is active
* NCQ priority information
DMA Setup Auto-Activate optimization
Device-initiated interface power management
* Software settings preservation
* SMART Command Transport (SCT) feature set
* SCT Long Sector Access (AC1)
* SCT LBA Segment Access (AC2)
* SCT Error Recovery Control (AC3)
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
Security:
Master password revision code = 65534
supported
not enabled
not locked
not frozen
not expired: security count
supported: enhanced erase
62min for SECURITY ERASE UNIT. 62min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50024e920161cb48
NAA : 5
IEEE OUI : 0024e9
Unique ID : 20161cb48
Checksum: correct
[/code]

If you turn laptop-mode off, does the count still move? I suspect a laptop-mode configuration problem…

Note that on my old laptop I had very aggressive values and a rather crazy Load_Cycle_Count, with no sign of weakness… though maybe I was just lucky.

Laptop-mode is only active on battery; yet on mains power I get the same parking rate as on battery.

The command $ cat /proc/sys/vm/laptop_mode returns 0 on mains power and 2 on battery.

/etc/init.d/laptop_mode stop

Just to see…

No, even after stopping laptop-mode manually, the parking count keeps rising very quickly.
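
To follow the counter without reading the whole table each time, something along these lines could pull out just the raw value. This is a sketch: load_cycle_count is a hypothetical helper, and it assumes the attribute row layout shown earlier in this thread, where the raw value is the 10th column.

```shell
#!/bin/sh
# Hypothetical helper: read smartctl attribute rows on stdin and print the
# raw Load_Cycle_Count (10th column of the attribute table).
load_cycle_count() {
    awk '$2 == "Load_Cycle_Count" { print $10 }'
}

# Sample row copied from the output earlier in this thread.
printf '%s\n' '225 Load_Cycle_Count 0x0032 092 092 000 Old_age Always - 87763' | load_cycle_count

# On a real system (assumes smartmontools is installed and the disk is /dev/sda):
#   sudo smartctl -A /dev/sda | load_cycle_count
```

Matching on the attribute name rather than the ID is deliberate, since drives do not all report this counter under the same ID.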

Try

hdparm -M 254 /dev/sda

I had already tried that. I now have the line Advanced power management level: 254 Recommended acoustic management value: 254, current value: 254 but there is no change in the parking frequency.

and with

hdparm -M 255 /dev/sda

That disables it completely. If that doesn't work, uninstall hdparm and maybe boot with acpi=off

Uninstalling hdparm won't change anything (it is only an intermediary). It evidently passes the parameters to the controller, so this looks like an issue with the kernel. Have you tried with a live CD?

[b]Disk Power Management
From openSUSE

Disk Power Management Configuration[/b]

Create a configuration file to manage disk power management:

/etc/pm/config.d/disk

# Configure disk power management settings to ensure both
# long disk life and good power management.
#
# Space delimited list of disk devices this affects.
#
DEVICES_DISK_PM_NAMES="/dev/sda"
#
#
# Power management modes
#
# Powersave mode off
#  Disable APM and spin-down
#
DEVICES_DISK_PM_POWERSAVE_OFF="hdparm -q -B 255 -q -S 0"
#
# Powersave mode on
# Enable APM to conservative 200 and set spin-down for 21 minutes
#
DEVICES_DISK_PM_POWERSAVE_ON="hdparm -q -B 200 -q -S 252"

Note: Your laptop drive can get hot with no power management if you leave your laptop plugged in all the time. You may want to set DEVICES_DISK_PM_POWERSAVE_OFF to a large value, but not disabled completely. If you were going to do this, you might use something like: hdparm -q -B 254 -q -S 242

This means set the least power management, but not off, and spin down the disk after an hour.
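
The -S values used above (252 and 242) are not plain minute counts. Here is a minimal sketch of the encoding as described in the hdparm manual; spindown_seconds is a hypothetical helper name, not an hdparm feature.

```shell
#!/bin/sh
# Hypothetical helper: convert an hdparm -S value to a timeout in seconds,
# following the encoding described in the hdparm manual.
spindown_seconds() {
    v=$1
    if [ "$v" -eq 0 ]; then
        echo "disabled"
    elif [ "$v" -ge 1 ] && [ "$v" -le 240 ]; then
        echo $(( v * 5 ))                 # multiples of 5 seconds
    elif [ "$v" -ge 241 ] && [ "$v" -le 251 ]; then
        echo $(( (v - 240) * 1800 ))      # units of 30 minutes
    elif [ "$v" -eq 252 ]; then
        echo 1260                         # 21 minutes
    elif [ "$v" -eq 255 ]; then
        echo 1275                         # 21 minutes plus 15 seconds
    else
        echo "vendor-defined or reserved" # 253 and 254
    fi
}

spindown_seconds 252   # value used in DEVICES_DISK_PM_POWERSAVE_ON
spindown_seconds 242   # value suggested in the note above
```

The two calls print 1260 and 3600, matching the "21 minutes" and "an hour" stated in the text.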

Disk Power Management Script

Then create the power management script:

/etc/pm/power.d/disk

#!/bin/bash
. /usr/lib/pm-utils/functions
. /etc/pm/config.d/disk

if test -z "${DEVICES_DISK_PM_NAMES}"; then
        exit 1
fi

case "$1" in
        true)
                echo "**enabled pm for harddisk"
                for DISK_NAME in `echo ${DEVICES_DISK_PM_NAMES}`; do
                        ${DEVICES_DISK_PM_POWERSAVE_ON} ${DISK_NAME}
                done ;;
        false)
                echo "**disabled pm for harddisk"
                for DISK_NAME in `echo ${DEVICES_DISK_PM_NAMES}`; do
                        ${DEVICES_DISK_PM_POWERSAVE_OFF} ${DISK_NAME}
                done ;;
esac

Make the script executable.

chmod +x /etc/pm/power.d/disk

Test the script

You can then test the set up by using the following commands:

pm-powersave true
hdparm -I /dev/sda | grep 'Advanced Power'

An asterisk next to 'Advanced Power Management feature set' means it's enabled. Now try this:

pm-powersave false
hdparm -I /dev/sda | grep 'Advanced Power'

No asterisk means it’s disabled.

These settings are immediately accessible to kpowersave or gnome-power-manager and are used by default when plugging in your power adapter or removing it on a laptop.
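
The asterisk check described above can be scripted. This is only a sketch: apm_enabled is a hypothetical helper, and it assumes the hdparm -I feature line looks like the one quoted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read `hdparm -I` output on stdin and report whether the
# "Advanced Power Management feature set" line carries the enabled asterisk.
apm_enabled() {
    if grep -q '^[[:space:]]*\*[[:space:]]*Advanced Power Management feature set'; then
        echo "enabled"
    else
        echo "disabled or not reported"
    fi
}

# Sample lines in the style of the hdparm -I output quoted in this thread.
printf '%s\n' '   *    Advanced Power Management feature set' | apm_enabled
printf '%s\n' '        Advanced Power Management feature set' | apm_enabled

# On a real system (assumes the disk is /dev/sda):
#   sudo hdparm -I /dev/sda | apm_enabled
```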

And so that it doesn't start again after a suspend:
Suspend/resume handling script, adapted from a Red Hat Bugzilla link:

disksr: this script is a slight modification of the SUSE-style script above, adapted to handle suspend/resume (sr stands for suspend/resume). It is to be copied into /etc/pm/sleep.d/

Create a file called disksr with the following contents:

#!/bin/bash
. /usr/lib/pm-utils/functions
. /etc/pm/config.d/disk


if test -z "${DEVICES_DISK_PM_NAMES}"; then
        exit 1
fi

case "$1" in
        hibernate|suspend)
                echo "**enabled pm for harddisk"
                for DISK_NAME in `echo ${DEVICES_DISK_PM_NAMES}`; do
                        ${DEVICES_DISK_PM_POWERSAVE_ON} ${DISK_NAME}
                done ;;
        thaw|resume)
                echo "**disabled pm for harddisk"
                for DISK_NAME in `echo ${DEVICES_DISK_PM_NAMES}`; do
                        ${DEVICES_DISK_PM_POWERSAVE_OFF} ${DISK_NAME}
                done ;;
esac

Give the file executable permission with chmod +x disksr, run from the folder where the script is saved. Then copy the script to /etc/pm/sleep.d/ using the following commands:

sudo cp disksr /etc/pm/sleep.d/
sudo chmod +x /etc/pm/sleep.d/disksr

The problem is that the disk does not seem to respond to the parameter given by hdparm…

Ah! I hadn't seen that. The antibiotics and corticosteroids threw me off far more than the implant placement did. It's a bit the same with my netbook: I gave up trying to manage this on it.

Tooth trouble?

An implant placement. The root that a crown was fitted on broke.
By the way: on some distros, in the scripts you have to replace /usr/lib/pm-utils/functions
with

when you are on x86_64.

Hmm, I've had a gap where a molar used to be for 2 years; the implant doesn't appeal to me much.

It seems the link between hdparm and the controller works, but the controller claims to have turned power management (APM) down to its minimum (254), and yet nothing changes. Either the controller is fooling itself (kernel problem), or hdparm talks badly with the kernel (odd all the same, since it does talk), or the disk is peculiar…

I can't hear the heads parking (absolutely no noise), and the count rises so fast that it could point to a controller error (soon at 100,000 after 1 month of use!).

I'm going to try booting another kernel and a live CD.

P.S. 1: with another kernel (2.6.31-14-generic instead of 2.6.31-20-generic) there is no change.
P.S. 2: on the Ubuntu 9.10 64-bit live CD, I still have the same problem.

Here is even more info about my disk, in case it helps: [code] sudo smartctl -a /dev/sda
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright © 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model: SAMSUNG HM250HI
Serial Number: S1RUJ90SB29945
Firmware Version: 2AC101C4
User Capacity: 250 059 350 016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Not recognized. Minor revision code: 0x28
Local Time is: Tue Mar 30 18:20:39 2010 UTC

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (3840) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 64) minutes.
SCT capabilities: (0x003f) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 0
2 Throughput_Performance 0x0026 252 252 000 Old_age Always - 0
3 Spin_Up_Time 0x0023 093 093 025 Pre-fail Always - 2290
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 243
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 109
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 10
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 221
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 27
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
194 Temperature_Celsius 0x0002 064 061 000 Old_age Always - 32 (Lifetime Min/Max 16/39)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 2
223 Load_Retry_Count 0x0032 100 100 000 Old_age Always - 10
225 Load_Cycle_Count 0x0032 091 091 000 Old_age Always - 92816

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error

1 Short offline Completed without error 00% 98 -

2 Extended offline Aborted by host 70% 29 -

SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
SMART Selective self-test log data structure revision number 0
Warning: ATA Specification requires selective self-test log data structure revision number = 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Completed [00% left] (0-65535)
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
[/code] In the output of this command, I noticed something interesting: ==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details. Excerpt from the smartctl manual: [code] -F TYPE, --firmwarebug=TYPE
Modifies the behavior of smartctl to compensate for some known
and understood device firmware or driver bug. Except 'swapid',
the arguments to this option are exclusive, so that only the
final option given is used. The valid values are:

          none - Assume that the device firmware obeys the ATA
          specifications. This is the default, unless the device has
          presets for '-F' in the device database (see note below).

          samsung - In some Samsung disks (example: model SV4012H Firmware
          Version: RM100-08) some of the two- and four-byte quantities in
          the SMART data structures are byte-swapped (relative to the ATA
          specification). Enabling this option tells smartctl to evaluate
          these quantities in byte-reversed order. Some signs that your
          disk needs this option are (1) no self-test log printed, even
          though you have run self-tests; (2) very large numbers of ATA
          errors reported in the ATA error log; (3) strange and impossible
          values for the ATA error log timestamps.

          samsung2 - In more recent Samsung disks (firmware revisions
          ending in "-23") the number of ATA errors reported is byte swapped.
          Enabling this option tells smartctl to evaluate this quantity in
          byte-reversed order. An indication that your Samsung disk needs
          this option is that the self-test log is printed correctly, but
          there are a very large number of errors in the SMART error log.
          This is because the error count is byte swapped. Thus a disk
          with five errors (0x0005) will appear to have 20480 errors
          (0x5000).

          samsung3 - Some Samsung disks (at least SP2514N with Firmware
          VF100-37) report a self-test still in progress with 0% remaining
          when the test was already completed. Enabling this option
          modifies the output of the self-test execution status (see options
          '-c' or '-a' above) accordingly.

          Note that an explicit '-F' option on the command line will
          over-ride any preset values for '-F' (see the '-P' option
          below).

          swapid - Fixes byte swapped ATA identify strings (device name,
          serial number, firmware version) returned by some buggy device
          drivers.

[/code]However, I'm not sure how to use this -F samsung. I tried $ sudo smartctl -a /dev/sda -F samsung3 (or samsung, or samsung2), but I see no difference from the output without -F.

No solution? Do I wait for it to die and get it replaced under warranty?

It won't necessarily die from this. For all we know, frequent parking has become the norm on laptops…

That's really what I'm starting to believe, and not only for laptops. Both my machines (laptop and desktop) tend to park frequently; I tried the same tweaks without success (at best, I slightly lengthened the interval between two parkings).

After quite a bit of research: while I found no source demonstrating that this parking is harmless, I wasn't convinced of any real harm either. I had started a thread about this in PC: viewtopic.php?f=1&t=24972 .

The famous 600,000 threshold comes from manufacturers voiding the warranty when it was exceeded, 3 years ago. A lot of water has flowed under the bridge since…
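
To put the 600,000 figure quoted above in perspective, here is a rough projection from the smartctl output posted earlier (92816 load cycles after 109 power-on hours). It is only a sketch: hours_until is a hypothetical helper, it assumes the lifetime average rate stays constant, and the threshold is the one mentioned in this thread, not a value checked against any datasheet.

```shell
#!/bin/sh
# Hypothetical helper: power-on hours left before a load-cycle threshold is
# reached, assuming the lifetime average rate stays the same.
# $1 = current Load_Cycle_Count, $2 = Power_On_Hours, $3 = threshold
hours_until() {
    echo $(( ($3 - $1) * $2 / $1 ))
}

# Figures from the smartctl output above: 92816 cycles after 109 hours.
hours_until 92816 109 600000
```

With integer division this prints 595, meaning about 595 more power-on hours at the current pace.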