Discussion:
SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0):
Hartmann, O.
2017-12-13 15:11:20 UTC
On Tue, 12 Dec 2017 14:58:28 -0800
There are a couple of ways you can address this. You'll need to
offline the vdev first. If you've done a smartctl -t long and if the
test failed, smartctl -a will tell you which block it had an issue
with. You can use dd, ddrescue or dd_rescue to dd the block over
itself. The drive may rewrite the (weak) block or, if it fails to, it
will remap it (subsequently showing as reallocated).
Of course there is a risk. If the sector is any of the boot blocks
there is a good chance the server will hang.
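A minimal sketch of that read-and-rewrite approach, assuming the affected disk is
/dev/ada6 and the failing LBA is 3262804632 (the values reported further down in this
thread); take the disk out of the pool first and double-check both values before running
anything like this:

  zpool offline <poolname> ada6     # take the member out of service first (pool name is a placeholder)
  smartctl -t long /dev/ada6        # long self-test; wait for it to finish
  smartctl -a /dev/ada6             # the self-test log reports the first failing LBA
  dd if=/dev/ada6 of=/tmp/sector.bin bs=512 iseek=3262804632 count=1 conv=noerror,sync
  dd if=/tmp/sector.bin of=/dev/ada6 bs=512 oseek=3262804632 count=1
  zpool online <poolname> ada6      # bring it back and let ZFS resilver/scrub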
The drive is part of a dedicated storage-only pool. The boot drive is a
fast SSD. So I do not care about this - well, to say it more politely:
I do not have to take care of that aspect.
You have to be *absolutely* sure which sector is the bad one. And there
may be more. There is a risk of data loss.
I've used this technique many times. Most times it works perfectly.
Other times the affected file is lost but the rest of the file system
is recovered. And again there is always the risk.
Replace the disk immediately if you experience a growing succession
of pending sectors. Otherwise replace the disk at your earliest
convenience.
The ZFS scrubbing of the volume ended this morning, leaving the pool in
a healthy state. After reboot, there was no sign of CAM errors again.

But there is something else I'm worried about. The mainboard I use is an

ASRock Z77 Pro4-M.
The board has a crippled Intel MCP with 6 SATA ports from the chipset,
two of them SATA 6GB, 4 SATA II, and one additional chip with two SATA
6GB ports:

[...]
***@pci0:2:0:0: class=0x010601 card=0x06121849 chip=0x06121b21
rev=0x01 hdr=0x00 vendor = 'ASMedia Technology Inc.'
device = 'ASM1062 Serial ATA Controller'
class = mass storage
subclass = SATA
bar [10] = type I/O Port, range 32, base 0xe050, size 8, enabled
bar [14] = type I/O Port, range 32, base 0xe040, size 4, enabled
bar [18] = type I/O Port, range 32, base 0xe030, size 8, enabled
bar [1c] = type I/O Port, range 32, base 0xe020, size 4, enabled
bar [20] = type I/O Port, range 32, base 0xe000, size 32, enabled
bar [24] = type Memory, range 32, base 0xf7b00000, size 512,
enabled
[...]

Attached to that ASM1062 SATA chip, is a backup drive via eSATA
connector, a WD 4 TB RED drive. It seems, whenever I attach this drive
and it is online, I experience problems on the ZFS pool, which is
attached to the MCP SATA ports.

Is this possible? I mean, as I asked before, defective cabling
would trigger different error patterns (CRC errors). Due to the fact
that the external drive is physically decoupled and cannot couple in
vibrations, bad sector errors seem unlikely to me. But this
is simply a thought from someone without special knowledge of the physics
of HDDs.

I think the people responding to my thread made it clear that the WD Green
isn't the first choice for a 20/6 (not 24/7) duty drive, and given
that they have now served more than 25000 hours, it would
be wise to replace them with alternatives.
If using a zfs mirror (not in your case), detach and attach will
rewrite any weakly written sectors and reallocate pending sectors.
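For the mirror case mentioned above, a rough sketch (the pool name 'tank' and member
names ada1/ada2 are placeholders, not taken from this thread):

  zpool detach tank ada2            # drop one side of the mirror
  zpool attach tank ada1 ada2       # re-attach it; the resilver rewrites every allocated block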
---
Sent using a tiny phone keyboard.
Apologies for any typos and autocorrect.
Also, this old phone only supports top post. Apologies.
Cy Schubert
The need of the many outweighs the greed of the few.
---
-----Original Message-----
From: O. Hartmann
Sent: 12/12/2017 14:19
To: Rodney W. Grimes
Cc: O. Hartmann; FreeBSD CURRENT; Freddie Cash; Alan Somers
(ada6:ahcich6:0:0:0): CAM status: ATA Status Error
On Tue, 12 Dec 2017 10:52:27 -0800 (PST)
Thank you for answering that fast!
Hello,
running CURRENT (recent r326769), I realised that smartd sends
[...]
Dec 12 14:14:33 <3.2> box1 smartd[68426]: Device: /dev/ada6, 1 Currently unreadable (pending) sectors
Dec 12 14:14:33 <3.2> box1 smartd[68426]: Device: /dev/ada6, 1 Offline uncorrectable sectors
[...]
Checking the drive's SMART log with smartctl (it is one of four
[... smartctl -x /dev/ada6 ...]
Error 42 [17] occurred at disk power-on lifetime: 25335 hours (1055 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 c2 7a 72 98 40 00  Error: UNC at LBA = 0xc27a7298 = 3262804632

  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 b0 00 88 00 00 c2 7a 73 20 40 08     23:38:12.195  READ FPDMA QUEUED
  60 00 b0 00 80 00 00 c2 7a 72 70 40 08     23:38:12.195  READ FPDMA QUEUED
  2f 00 00 00 01 00 00 00 00 00 10 40 08     23:38:12.195  READ LOG EXT
  60 00 b0 00 70 00 00 c2 7a 73 20 40 08     23:38:09.343  READ FPDMA QUEUED
  60 00 b0 00 68 00 00 c2 7a 72 70 40 08     23:38:09.343  READ FPDMA QUEUED
[...]
and
[...]
SMART Attributes Data Structure revision number: 16
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    64
  3 Spin_Up_Time            POS--K   178   170   021    -    6075
  4 Start_Stop_Count        -O--CK   098   098   000    -    2406
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   066   066   000    -    25339
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   098   098   000    -    2404
192 Power-Off_Retract_Count -O--CK   200   200   000    -    154
193 Load_Cycle_Count        -O--CK   001   001   000    -    2055746
194 Temperature_Celsius     -O---K   122   109   000    -    28
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    1
198 Offline_Uncorrectable   ----CK   200   200   000    -    1
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    5
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
[...]
The data up to this point informs us that you have 1 bad sector
on a 3TB drive; that is actually an expected event, given that the data
error rate on this stuff is such that you're going to have these now
and again.
Given you have 1 single event I would not suspect that this drive
is dying, but it would be prudent to prepare for that possibility.
Hello.
Well, I simply copied "one single event" that has been logged so far.
As you (and I) can see, it is error #42. After I posted here, a
reboot took place because the "repair" process on the pool
suddenly increased its estimated time, and now I'm at error #47, but
interestingly, it is a new block that is damaged, but the SMART
[...]
SMART Attributes Data Structure revision number: 16
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 69
3 Spin_Up_Time POS--K 178 170 021 - 6075
4 Start_Stop_Count -O--CK 098 098 000 - 2406
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 200 200 000 - 0
9 Power_On_Hours -O--CK 066 066 000 - 25343
10 Spin_Retry_Count -O--CK 100 100 000 - 0
11 Calibration_Retry_Count -O--CK 100 100 000 - 0
12 Power_Cycle_Count -O--CK 098 098 000 - 2404
192 Power-Off_Retract_Count -O--CK 200 200 000 - 154
193 Load_Cycle_Count -O--CK 001 001 000 - 2055746
194 Temperature_Celsius -O---K 122 109 000 - 28
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 0
198 Offline_Uncorrectable ----CK 200 200 000 - 1
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 5
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
[...]
197 Current_Pending_Sector decreased to zero so far, but with every
[...]
Error 47 [22] occurred at disk power-on lifetime: 25343 hours (1055 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 c2 19 d9 88 40 00  Error: UNC at LBA = 0xc219d988 = 3256473992

  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 b0 00 d0 00 00 c2 19 da 28 40 08  1d+07:12:34.336  READ FPDMA QUEUED
  60 00 b0 00 c8 00 00 c2 19 d9 78 40 08  1d+07:12:34.336  READ FPDMA QUEUED
  2f 00 00 00 01 00 00 00 00 00 10 40 08  1d+07:12:34.336  READ LOG EXT
  60 00 b0 00 b8 00 00 c2 19 da 28 40 08  1d+07:12:31.484  READ FPDMA QUEUED
  60 00 b0 00 b0 00 00 c2 19 d9 78 40 08  1d+07:12:31.483  READ FPDMA QUEUED
I think this is watching an HDD die, isn't it?
I'd say broken cabling would produce different errors, wouldn't it?
The Western Digital Green series HDD is a useful fellow when it
is used as a single drive. I think there might be an issue with
pairing 4 HDDs, 3 of them "GREEN", in a RAIDZ while physically sitting
next to each other. Maybe it is time to replace them one by one ...
The ZFS pool is RAIDZ1, comprised of three WD Green 3 TB HDDs and one
WD RED 3 TB HDD. The failure occurred on one of the WD Green 3
TB HDDs.
Ok, so the data is redundantly protected. This helps a lot.
The pool is marked as "resilvered" - I do scrubbing on a regular
basis, and the "resilvering" message has now appeared for the second
time in a row. Searching the net, the recommendation for SMART
attribute 197 errors (in my case it is one), in combination with the
problems that occurred, is that I should replace the disk.
It is probably putting the RAIDZ in that state as the scrub is
finding a block it can not read.
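For reference, a rough way to see what the scrub tripped over, assuming a hypothetical
pool name 'tank' (the real pool name is not given in the thread):

  zpool status -v tank     # shows scrub/resilver state and lists any files with unrecoverable errors
  zpool scrub tank         # start a fresh scrub once the suspect sector has been dealt with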
Well, here comes the problem. The box is built from
"electronic waste" made by ASRock - it is a Socket
1150/Ivy Bridge board whose last firmware/BIOS update came
in 2013, and since then UEFI booting FreeBSD from a HDD isn't
possible (just to indicate that I'm aware of having issues with
crap, but that is another issue right now). The board's SATA
connectors are all populated.
So: due to the lack of adequate backup space I can only back up
portions selectively; most of the space is occupied by
scientific modelling data I have worked on. So a backup
exists, in one way or another. My concern is how to replace the
faulty HDD! Most HowTos describe preparing a replacement disk and
then "replacing" it via ZFS's replace command. This
isn't applicable here.
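For reference, the replace workflow those HowTos describe looks roughly like this,
assuming a hypothetical pool name 'tank' and the new disk appearing as ada7 (both
placeholders, not from the thread):

  zpool offline tank ada6          # take the failing member out of service
  # swap the disk physically (or cable the new one to a free port), then:
  zpool replace tank ada6 ada7     # resilver the pool onto the new disk
  zpool status tank                # watch the resilver progress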
Question: is it possible to simply pull the faulty disk (implies
I know exactly which one to pull!) and then prepare and add the
replacement HDD and let the system do its job resilvering the
pool?
That may work, but I think I have a simpler solution.
Next question is: I'm about to replace the 3 TB HDD with a more
recent and modern 4 TB HDD (WD RED 4TB). I'm aware of the fact
that I can only use 3 TB as the other disks are 3 TB, but I'd
like to know whether FreeBSD's ZFS is capable of handling it?
Someone else?
This is the first time I have issues with ZFS and a faulty drive,
so if some of my questions sound naive, please forgive me.
One thing to try is to see if we can get the drive to fix itself.
First order of business: can you take this server out of
service? If so I would simply try to do a
repeat 100 dd if=/dev/whicheverhdisbad of=/dev/null conv=noerror,sync iseek=3262804632
That tries to read that block 100 times; if it is successful even
1 time, SMART should remap the block and you are all done.
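The "repeat" above is csh syntax; a rough plain-sh equivalent, assuming the affected
disk is /dev/ada6 (the device name is a placeholder):

  i=0
  while [ $i -lt 100 ]; do
      dd if=/dev/ada6 of=/dev/null bs=512 iseek=3262804632 count=1 conv=noerror,sync
      i=$((i + 1))
  done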
Given the fact that this erroneous block is like a moving target,
is this solution still the favorite one? I'll try, but I already have
the replacement 4 TB HDD at hand.
If that fails we can try to zero the block. There is a risk here,
but raidz should just handle this as data corruption of a block.
This could possibly lead to data loss, so USE AT YOUR OWN RISK:
dd if=/dev/zero of=/dev/whateverdrivehasissues bs=512 count=1 oseek=3262804632
It would then be oseek=3256473992, too.
That should forcibly overwrite the bad block with 0's. The SMART
firmware will see this sector in the pending list, write the data, read it
back, and if successful remove it from the pending list; if it fails,
it will reallocate the block, write the 0's to the reallocation, and add
1 to the remapped block count.
You might google for "how to fix a pending reallocation"
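After either attempt, one quick way to check whether the drive cleared or remapped the
sector (a sketch, using the attributes already shown above):

  smartctl -A /dev/ada6 | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'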
Thanks in advance,
Oliver
--
O. Hartmann
Kind regards,
Oliver
Rodney W. Grimes
2017-12-13 16:47:53 UTC
Post by Hartmann, O.
On Tue, 12 Dec 2017 14:58:28 -0800
There are a couple of ways you can address this. You'll need to
offline the vdev first. If you've done a smartctl -t long and if the
test failed, smartctl -a will tell you which block it had an issue
with. You can use dd, ddrescue or dd_rescue to dd the block over
itself. The drive may rewrite the (weak) block or, if it fails to, it
will remap it (subsequently showing as reallocated).
Of course there is a risk. If the sector is any of the boot blocks
there is a good chance the server will hang.
The drive is part of a dedicated storage-only pool. The boot drive is a
I do not have to take care of that aspect.
You have to be *absolutely* sure which sector is the bad one. And there
may be more. There is a risk of data loss.
I've used this technique many times. Most times it works perfectly.
Other times the affected file is lost but the rest of the file system
is recovered. And again there is always the risk.
Replace the disk immediately if you experience a growing succession
of pending sectors. Otherwise replace the disk at your earliest
convenience.
The ZFS scrubbing of the volume ended this morning, leaving the pool in
a healthy state. After reboot, there was no sign of CAM errors again.
But there is something else I'm worried about. The mainboard I use is an
ASRock Z77 Pro4-M.
The board has a crippled Intel MCP with 6 SATA ports from the chipset,
two of them SATA 6GB, 4 SATA II, and one additional chip with two SATA
[...]
rev=0x01 hdr=0x00 vendor = 'ASMedia Technology Inc.'
device = 'ASM1062 Serial ATA Controller'
class = mass storage
subclass = SATA
bar [10] = type I/O Port, range 32, base 0xe050, size 8, enabled
bar [14] = type I/O Port, range 32, base 0xe040, size 4, enabled
bar [18] = type I/O Port, range 32, base 0xe030, size 8, enabled
bar [1c] = type I/O Port, range 32, base 0xe020, size 4, enabled
bar [20] = type I/O Port, range 32, base 0xe000, size 32, enabled
bar [24] = type Memory, range 32, base 0xf7b00000, size 512,
enabled
[...]
Attached to that ASM1062 SATA chip, is a backup drive via eSATA
connector, a WD 4 TB RED drive. It seems, whenever I attach this drive
and it is online, I experience problems on the ZFS pool, which is
attached to the MCP SATA ports.
How does this external drive get its power? Are the earth grounds of
both the system and the external drive power supply closely tied
together? A plug/unplug event with a slight ground creep can
wreak havoc with device operation.
Post by Hartmann, O.
Is this possible? I mean, as I asked before, defective cabling
would trigger different error patterns (CRC errors). Due to the fact
that the external drive is physically decoupled and cannot couple in
vibrations, bad sector errors seem unlikely to me. But this
is simply a thought from someone without special knowledge of the physics
of HDDs.
Even if left cabled, does this drive get powered up/down?
Post by Hartmann, O.
I think the people responding to my thread made it clear that the WD Green
isn't the first choice for a 20/6 (not 24/7) duty drive, and given
that they have now served more than 25000 hours, it would
be wise to replace them with alternatives.
I think someone had an apm command that turns off the head parking;
that would do wonders for drive life. On the other hand, I think
if it were my data and I saw that the drive had 2M head load cycles,
I would be looking to get out of that drive with any data I could
not easily replace. If it was well backed up or easily replaced,
my worries would be less.
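On FreeBSD, one way to experiment with that is camcontrol's APM setting - a sketch only,
since whether the drive honours it depends on the firmware, and WD Greens are known to
manage their idle timer outside of APM:

  camcontrol identify ada6                          # shows whether APM is supported/enabled
  camcontrol apm ada6 -l 254                        # highest APM level, discourages aggressive head parking
  smartctl -A /dev/ada6 | grep Load_Cycle_Count     # watch whether the count keeps climbing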

... 275 lines removed ...
--
Rod Grimes ***@freebsd.org
O. Hartmann
2017-12-13 19:39:08 UTC
On Wed, 13 Dec 2017 08:47:53 -0800 (PST)
Post by Rodney W. Grimes
Post by Hartmann, O.
On Tue, 12 Dec 2017 14:58:28 -0800
There are a couple of ways you can address this. You'll need to
offline the vdev first. If you've done a smartctl -t long and if the
test failed, smartctl -a will tell you which block it had an issue
with. You can use dd, ddrescue or dd_rescue to dd the block over
itself. The drive may rewrite the (weak) block or, if it fails to, it
will remap it (subsequently showing as reallocated).
Of course there is a risk. If the sector is any of the boot blocks
there is a good chance the server will hang.
The drive is part of a dedicated storage-only pool. The boot drive is a
I do not have to take care of that aspect.
You have to be *absolutely* sure which sector is the bad one. And there
may be more. There is a risk of data loss.
I've used this technique many times. Most times it works perfectly.
Other times the affected file is lost but the rest of the file system
is recovered. And again there is always the risk.
Replace the disk immediately if you experience a growing succession
of pending sectors. Otherwise replace the disk at your earliest
convenience.
The ZFS scrubbing of the volume ended this morning, leaving the pool in
a healthy state. After reboot, there was no sign of CAM errors again.
But there is something else I'm worried about. The mainboard I use is an
ASRock Z77 Pro4-M.
The board has a crippled Intel MCP with 6 SATA ports from the chipset,
two of them SATA 6GB, 4 SATA II, and one additional chip with two SATA
[...]
rev=0x01 hdr=0x00 vendor = 'ASMedia Technology Inc.'
device = 'ASM1062 Serial ATA Controller'
class = mass storage
subclass = SATA
bar [10] = type I/O Port, range 32, base 0xe050, size 8, enabled
bar [14] = type I/O Port, range 32, base 0xe040, size 4, enabled
bar [18] = type I/O Port, range 32, base 0xe030, size 8, enabled
bar [1c] = type I/O Port, range 32, base 0xe020, size 4, enabled
bar [20] = type I/O Port, range 32, base 0xe000, size 32, enabled
bar [24] = type Memory, range 32, base 0xf7b00000, size 512,
enabled
[...]
Attached to that ASM1062 SATA chip, is a backup drive via eSATA
connector, a WD 4 TB RED drive. It seems, whenever I attach this drive
and it is online, I experience problems on the ZFS pool, which is
attached to the MCP SATA ports.
How does this external drive get its power? Are the earth grounds of
both the system and the external drive power supply closely tied
together? A plug/unplug event with a slight ground creep can
wreak havoc with device operation.
The external drive is housed in an external casing. Its PSU has de facto the same
"grounding" (earth ground) as the server's PSU; they share the same power outlet at the
point where the plug comes out of the wall - so to speak.
Post by Rodney W. Grimes
Post by Hartmann, O.
Is this possible? I mean, as I asked before, defective cabling
would trigger different error patterns (CRC errors). Due to the fact
that the external drive is physically decoupled and cannot couple in
vibrations, bad sector errors seem unlikely to me. But this
is simply a thought from someone without special knowledge of the physics
of HDDs.
Even if left cabled, does this drive get powered up/down?
The drive is cabled (eSATA) all the time, but is switched off for long periods (4 - 8 weeks
or 2 months, depending; I switch it on for scrubbing or for performing backups of important
data).
Post by Rodney W. Grimes
Post by Hartmann, O.
I think the people responding to my thread made it clear that the WD Green
isn't the first choice for a 20/6 (not 24/7) duty drive, and given
that they have now served more than 25000 hours, it would
be wise to replace them with alternatives.
I think someone had an apm command that turns off the head parking;
that would do wonders for drive life. On the other hand, I think
if it were my data and I saw that the drive had 2M head load cycles,
I would be looking to get out of that drive with any data I could
not easily replace. If it was well backed up or easily replaced,
my worries would be less.
... 275 lines removed ...
I'm prepared already, as stated, to change the drive(s), one by one.

Hopefully, ZFS is as reliable to me as it has been reliable for others ;-)

Kind regards,

Oliver
--
O. Hartmann

I object to the use or transmission of my data for advertising purposes
or for market or opinion research (§ 28 Abs. 4 BDSG).
Daniel Kalchev
2017-12-13 22:07:24 UTC
Post by O. Hartmann
On Wed, 13 Dec 2017 08:47:53 -0800 (PST)
Post by Rodney W. Grimes
Post by Hartmann, O.
On Tue, 12 Dec 2017 14:58:28 -0800
There are a couple of ways you can address this. You'll need to
offline the vdev first. If you've done a smartctl -t long and if the
test failed, smartctl -a will tell you which block it had an issue
with. You can use dd, ddrescue or dd_rescue to dd the block over
itself. The drive may rewrite the (weak) block or, if it fails to, it
will remap it (subsequently showing as reallocated).
Of course there is a risk. If the sector is any of the boot blocks
there is a good chance the server will hang.
The drive is part of a dedicated storage-only pool. The boot drive is a
I do not have to take care of that aspect.
You have to be *absolutely* sure which sector is the bad one. And there
may be more. There is a risk of data loss.
I've used this technique many times. Most times it works perfectly.
Other times the affected file is lost but the rest of the file system
is recovered. And again there is always the risk.
Replace the disk immediately if you experience a growing succession
of pending sectors. Otherwise replace the disk at your earliest
convenience.
The ZFS scrubbing of the volume ended this morning, leaving the pool in
a healthy state. After reboot, there was no sign of CAM errors again.
But there is something else I'm worried about. The mainboard I use is an
ASRock Z77 Pro4-M.
The board has a crippled Intel MCP with 6 SATA ports from the chipset,
two of them SATA 6GB, 4 SATA II, and one additional chip with two SATA
[...]
rev=0x01 hdr=0x00 vendor = 'ASMedia Technology Inc.'
device = 'ASM1062 Serial ATA Controller'
class = mass storage
subclass = SATA
bar [10] = type I/O Port, range 32, base 0xe050, size 8, enabled
bar [14] = type I/O Port, range 32, base 0xe040, size 4, enabled
bar [18] = type I/O Port, range 32, base 0xe030, size 8, enabled
bar [1c] = type I/O Port, range 32, base 0xe020, size 4, enabled
bar [20] = type I/O Port, range 32, base 0xe000, size 32, enabled
bar [24] = type Memory, range 32, base 0xf7b00000, size 512,
enabled
[...]
Attached to that ASM1062 SATA chip, is a backup drive via eSATA
connector, a WD 4 TB RED drive. It seems, whenever I attach this drive
and it is online, I experience problems on the ZFS pool, which is
attached to the MCP SATA ports.
How does this external drive get its power? Are the earth grounds of
both the system and the external drive power supply closely tied
together? A plug/unplug event with a slight ground creep can
wreak havoc with device operation.
The external drive is housed in an external casing. Its PSU has de facto the same
"grounding" (earth ground) as the server's PSU; they share the same power outlet at the
point where the plug comes out of the wall - so to speak.
Most external drive power supplies are not grounded. At least none I ever saw had a grounded plug on the mains cable. Maybe yours does...

Worth checking anyway.

Daniel
Willem Jan Withagen
2017-12-14 11:05:20 UTC
Post by Rodney W. Grimes
Post by Hartmann, O.
On Tue, 12 Dec 2017 14:58:28 -0800
I think the people responding to my thread made it clear that the WD Green
isn't the first choice for a 20/6 (not 24/7) duty drive, and given
that they have now served more than 25000 hours, it would
be wise to replace them with alternatives.
I think someone had an apm command that turns off the head parking;
that would do wonders for drive life. On the other hand, I think
if it were my data and I saw that the drive had 2M head load cycles,
I would be looking to get out of that drive with any data I could
not easily replace. If it was well backed up or easily replaced,
my worries would be less.
WD made their first series of Green disks green by aggressively putting
them into a sleep state: when there was no activity for a few seconds they
would park the heads, spin the disk down, and put it to sleep...
An access would then need to undo that whole series of steps.

This could be reset by writing to one of the disk's registers. I remember
doing that for my 1.5 TB WDs (WD15EADS from 2009). That saved a lot of
start-ups. I still have them around, but only use them for things that are
not valuable at all. Some have died over time, but about half of them
still seem to work without much trouble.

WD used to have a .exe program to actually do this, but that did not
work on later disks. And turning things off on those disks was
impossible, or a lot more complex.

This type of disk worked quite a long time in my ZFS setup - a few
years - but I turned parking off as soon as there was a lot of turmoil
about this in the community.
Now I use WD Reds for small ZFS systems, and WD Red Pro for large
private storage servers. Professional servers get HGST He disks, a bit
more expensive, but with very little fallout.

--WjW
O. Hartmann
2017-12-23 11:25:41 UTC
On Thu, 14 Dec 2017 12:05:20 +0100
Post by Willem Jan Withagen
Post by Rodney W. Grimes
Post by Hartmann, O.
On Tue, 12 Dec 2017 14:58:28 -0800
I think the people responding to my thread made it clear that the WD Green
isn't the first choice for a 20/6 (not 24/7) duty drive, and given
that they have now served more than 25000 hours, it would
be wise to replace them with alternatives.
I think someone had an apm command that turns off the head parking;
that would do wonders for drive life. On the other hand, I think
if it were my data and I saw that the drive had 2M head load cycles,
I would be looking to get out of that drive with any data I could
not easily replace. If it was well backed up or easily replaced,
my worries would be less.
WD made their first series of Green disks green by aggressively putting
them into a sleep state: when there was no activity for a few seconds they
would park the heads, spin the disk down, and put it to sleep...
An access would then need to undo that whole series of steps.
This could be reset by writing to one of the disk's registers. I remember
doing that for my 1.5 TB WDs (WD15EADS from 2009). That saved a lot of
start-ups. I still have them around, but only use them for things that are
not valuable at all. Some have died over time, but about half of them
still seem to work without much trouble.
WD used to have a .exe program to actually do this, but that did not
work on later disks. And turning things off on those disks was
impossible, or a lot more complex.
This type of disk worked quite a long time in my ZFS setup - a few
years - but I turned parking off as soon as there was a lot of turmoil
about this in the community.
Now I use WD Reds for small ZFS systems, and WD Red Pro for large
private storage servers. Professional servers get HGST He disks, a bit
more expensive, but with very little fallout.
--WjW
Hello fellows.

First of all, over the past week or so I managed to replace all(!) drives with new ones. This
time I decided to use HGST 4TB Deskstar NAS drives (HGST HDN726040ALE614) instead of the WD RED
4TB (WDC WD40EFRX-68N32N0). The one WD RED is about to be replaced in the next few days.

Apart from the very long resilvering times (the first drive, the Western Digital WD RED
4TB with 64 MB cache and 5400 rpm, took 11 h; the HGST drives, although considered faster
(7200 rpm, 128 MB cache), took 15 - 16 h), everything ran smoothly - except, as mentioned,
for the exorbitant recovery times.

A very interesting point in this story: as you could see, the WD Caviar Green 3TB
drives suffered from a high "193 Load_Cycle_Count" - almost 85 per hour. When replacing
the drives, I figured out that one of the four drives was already a Western Digital RED
3TB NAS drive, but investigating its "193 Load_Cycle_Count" revealed that this drive
also had an unusually high load cycle count - see the "smartctl -x" output attached.
It seems, as you already stated, that the APM feature responsible for this isn't
available. The drive was purchased in Q4/2013.

The HGST drives are very(!) noisy - the head movement induces a notable ringing - while the
WD drive(s) are/were really silent. The power consumption of the HGST drives is also higher.
Apart from that, I'm disappointed that WD has also implemented this
"timebomb" Load_Cycle_Count issue.

Thanks a lot for your help and considerations!

Kind regards,
Oliver
--
O. Hartmann

I object to the use or transmission of my data for advertising purposes
or for market or opinion research (§ 28 Abs. 4 BDSG).
Willem Jan Withagen
2017-12-27 20:34:10 UTC
Post by O. Hartmann
On Thu, 14 Dec 2017 12:05:20 +0100
Post by Willem Jan Withagen
Post by Rodney W. Grimes
Post by Hartmann, O.
On Tue, 12 Dec 2017 14:58:28 -0800
I think the people responding to my thread made it clear that the WD Green
isn't the first choice for a 20/6 (not 24/7) duty drive, and given
that they have now served more than 25000 hours, it would
be wise to replace them with alternatives.
I think someone had an apm command that turns off the head parking;
that would do wonders for drive life. On the other hand, I think
if it were my data and I saw that the drive had 2M head load cycles,
I would be looking to get out of that drive with any data I could
not easily replace. If it was well backed up or easily replaced,
my worries would be less.
WD made their first series of Green disks green by aggressively putting
them into a sleep state: when there was no activity for a few seconds they
would park the heads, spin the disk down, and put it to sleep...
An access would then need to undo that whole series of steps.
This could be reset by writing to one of the disk's registers. I remember
doing that for my 1.5 TB WDs (WD15EADS from 2009). That saved a lot of
start-ups. I still have them around, but only use them for things that are
not valuable at all. Some have died over time, but about half of them
still seem to work without much trouble.
WD used to have a .exe program to actually do this, but that did not
work on later disks. And turning things off on those disks was
impossible, or a lot more complex.
This type of disk worked quite a long time in my ZFS setup - a few
years - but I turned parking off as soon as there was a lot of turmoil
about this in the community.
Now I use WD Reds for small ZFS systems, and WD Red Pro for large
private storage servers. Professional servers get HGST He disks, a bit
more expensive, but with very little fallout.
--WjW
Hello fellows.
First of all, over the past week or so I managed to replace all(!) drives with new ones. This
time I decided to use HGST 4TB Deskstar NAS drives (HGST HDN726040ALE614) instead of the WD RED
4TB (WDC WD40EFRX-68N32N0). The one WD RED is about to be replaced in the next few days.
Apart from the very long resilvering times (the first drive, the Western Digital WD RED
4TB with 64 MB cache and 5400 rpm, took 11 h; the HGST drives, although considered faster
(7200 rpm, 128 MB cache), took 15 - 16 h), everything ran smoothly - except, as mentioned,
for the exorbitant recovery times.
A very interesting point in this story: as you could see, the WD Caviar Green 3TB
drives suffered from a high "193 Load_Cycle_Count" - almost 85 per hour. When replacing
the drives, I figured out that one of the four drives was already a Western Digital RED
3TB NAS drive, but investigating its "193 Load_Cycle_Count" revealed that this drive
also had an unusually high load cycle count - see the "smartctl -x" output attached.
It seems, as you already stated, that the APM feature responsible for this isn't
available. The drive was purchased in Q4/2013.
The HGST drives are very(!) noisy - the head movement induces a notable ringing - while the
WD drive(s) are/were really silent. The power consumption of the HGST drives is also higher.
Apart from that, I'm disappointed that WD has also implemented this
"timebomb" Load_Cycle_Count issue.
Oliver,

I would think there is something really off at your end...

I have the same type of disk as your RED 3T, and it shows 10
load cycle counts in 38258 hours and 28 power cycles...
Different model, but same firmware.

--WjW


=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD30EFRX-68AX9N0
Serial Number: WD-WMC1T4089783
LU WWN Device Id: 5 0014ee 6ae226f02
Firmware Version: 80.00A80
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Wed Dec 27 21:25:23 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (38940) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 391) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   186   178   021    Pre-fail  Always       -       5675
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       28
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   048   048   000    Old_age   Always       -       38258
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       28
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       17
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       10
194 Temperature_Celsius     0x0022   119   110   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged
