(Fark)   Four HDDs and/or SSDs in two different computers failed or had sporadic I/O issues in the past week. Should I A. check house for radon, B. get an exorcism, C. just assume it's a coincidence, or D. replace the motherboards?   (fark.com)
    More: Weird, Hard disk drive, Computer, Serial ATA, main data drive, different computers, Digital audio player, data drives, Failure  

187 clicks; posted to Discussion » on 21 May 2022 at 9:35 PM (6 weeks ago)



36 Comments
 
2022-05-21 4:55:50 PM  
Did you have a storm, brownout, or other electrical issue? Are the computers hooked up to an appropriately rated UPS? Did you download porn from a shady website?
 
2022-05-21 5:06:12 PM  
Have you considered becoming Amish?
 
2022-05-21 5:07:21 PM  
This sounds like the mobo being fried.

Likely a power surge, even if you didn't see it.

Buy a UPS as Khit suggested.
 
2022-05-21 5:08:08 PM  
Power issues would be my first guess. A good UPS with power filtering / conditioning sounds like a good investment.
 
2022-05-21 5:20:37 PM  
More info.

Computer one, my main desktop, had its main SSD fail hard a week ago.  (Fortunately just the OS, not my main data drive.)  I powered it down and was using my laptop until I could get a replacement, but I needed to run something on the desktop, so I booted it with an Lubuntu live USB stick.  And because I thought I might need to run it again, I left it on for a few days (didn't need it).

A few days later I powered it on again, and it got hung up in the BIOS and one of the data drives was clicking, so it failed hard too.  Fortunately it turned out to be the drive that contains data not of immediate concern.  I powered it off quickly.

Yesterday, I carefully isolated the bad drive and was able to boot the desktop.  Thinking there might be an electrical problem causing issues, I rather painstakingly copied all my most important and irreplaceable files from my main data drive to my laptop.  Halfway through the process the drive started getting I/O errors, but was not clicking.  Upon rebooting, the drive ran again and I was able to complete the copy (several hundred gigabytes) with no further issues.

Meanwhile, in my media computer, I had a hard disk that started producing I/O errors.  The BIOS kept getting hung up in reboot.  I finally managed to get the computer to boot by disabling all the hard drives in the BIOS then reenabling (it's apparently a sporadic failure).  Once I did boot it, I saw that all drives were working, but at some point one started getting I/O errors again.  A health scan of the questionable drive claimed that it did not support the protocol, which I know is false; it's the same model as the other drives.  Replacing cables didn't do much.

So I guess the question is: why all these sudden failures in two computers at the same time over a week?  The systems had been working flawlessly.  Would something like an electrical surge cause this?  (That's a possibility, the building was repairing its AC unit and I could easily see powering that thing up and down a few times having electrical implications.)  Apart from the drive failures the computers seem to be working fine.  My laptops were plugged in also, but no failures.

(BTW, yes, I do have cloud backups of everything important, but I consider those a last resort, and would rather have direct living copies.)

/notepad
 
2022-05-21 5:21:31 PM  
Moar deets needed subby.

Are we talking similar hardware bought in the same timeframe or something of different ages?  Same or different models?

What OS are you running?

You check for power issues?
 
2022-05-21 5:22:51 PM  

aerojockey: More info.

Computer one, my main desktop, had its main SSD fail hard a week ago.  (Fortunately just the OS, not my main data drive.)  I powered it down and was using my laptop until I could get a replacement, but I needed to run something on the desktop, so I booted it with an Lubuntu live USB stick.  And because I thought I might need to run it again, I left it on for a few days (didn't need it).

A few days later I powered it on again, and it got hung up in the BIOS and one of the data drives was clicking, so it failed hard too.  Fortunately it turned out to be the drive that contains data not of immediate concern.  I powered it off quickly.

Yesterday, I carefully isolated the bad drive and was able to boot the desktop.  Thinking there might be an electrical problem causing issues, I rather painstakingly copied all my most important and irreplaceable files from my main data drive to my laptop.  Halfway through the process the drive started getting I/O errors, but was not clicking.  Upon rebooting, the drive ran again and I was able to complete the copy (several hundred gigabytes) with no further issues.

Meanwhile, in my media computer, I had a hard disk that started producing I/O errors.  The BIOS kept getting hung up in reboot.  I finally managed to get the computer to boot by disabling all the hard drives in the BIOS then reenabling (it's apparently a sporadic failure).  Once I did boot it, I saw that all drives were working, but at some point one started getting I/O errors again.  A health scan of the questionable drive claimed that it did not support the protocol, which I know is false; it's the same model as the other drives.  Replacing cables didn't do much.

So I guess the question is: why all these sudden failures in two computers at the same time over a week?  The systems had been working flawlessly.  Would something like an electrical surge cause this?  (That's a possibility, the building was repairing its AC unit and I could easily see powering that thing up and down a few times ha ...


Wait, are these HDDs all the same model or different sizes but the same family in the same manufacturing batch?
 
2022-05-21 5:27:29 PM  

null: Wait, are these HDDs all the same model or different sizes but the same family in the same manufacturing batch?


The media computer had three HDDs of the same model, or close to it. Same disk layout and so on, similar or same product number.  (I'm not sure and don't think they are the same batch.)  I did not have them in RAID.  Notably, only one of the drives in that computer had issues.  None of the other HDDs or SSDs in either system are related.  Not thinking it's a bad batch, no.
 
2022-05-21 5:29:52 PM  
Also, the two desktop systems that had issues ran Linux and were powered on continuously.  The two laptops that had no issues run Windows and are cycled on and off as needed, but usually I leave them plugged in and running.  Make of that what you will.
 
2022-05-21 5:32:32 PM  

aerojockey: The systems had been working flawlessly.  Would something like an electrical surge cause this?  (That's a possibility, the building was repairing its AC unit and I could easily see powering that thing up and down a few times having electrical implications.)


Yes, absolutely! And if it was not a large surge but your more typical dissipative surge (https://shedheads.net/whole-house-surge-protectors/types/), that would easily damage components and bypass your standard surge strip. That's why you need a UPS for sensitive electronics rather than just a surge protector.
 
2022-05-21 5:34:59 PM  
I'm guessing electrical anomalies made some of the drives wonky; guessing a surge or something, since there was not a blackout.  My PC power supplies are pretty good, as far as I know.  At least, they're not cheap.

I still don't know if it's motherboard or drives, but since the computers work fine otherwise, and since storage generally seems to be more sensitive, I probably need to replace the bunch.  And, I suppose, invest in a good UPS.
 
2022-05-21 5:35:17 PM  

aerojockey: null: Wait, are these HDDs all the same model or different sizes but the same family in the same manufacturing batch?

The media computer had three HDDs of the same model, or close to it. Same disk layout and so on, similar or same product number.  (I'm not sure and don't think they are the same batch.)  I did not have them in RAID.  Notably, only one of the drives in that computer had issues.  None of the other HDDs or SSDs in either system are related.  Not thinking it's a bad batch, no.


Is the spinning rust in your main computer similar to the HDDs in your media computer?  I can tell you that I've seen a >50% failure rate on at least one make/model of HDD, they were all in Dell Optiplex 780s purchased at the same time and were the same model and batch.

Given Linux I might also question what distro you are running on both, and if same, did they get a kernel update recently that broke things?
 
2022-05-21 5:44:22 PM  

null: Is the spinning rust in your main computer similar to the HDDs in your media computer?


I really doubt that's it.  The drives were all at least a year old, I believe some are more than four years old, most were purchased at different times, and they all failed or got wonky in the same week.

The media computer definitely did not have a recent kernel, my work desktop gets regular and timely updates.  I don't think that's it either.  Also if a Linux kernel was causing hard disks to fail I am quite sure we'd know by now.
 
2022-05-21 5:47:04 PM  

null: aerojockey: null: Wait, are these HDDs all the same model or different sizes but the same family in the same manufacturing batch?

The media computer had three HDDs of the same model, or close to it. Same disk layout and so on, similar or same product number.  (I'm not sure and don't think they are the same batch.)  I did not have them in RAID.  Notably, only one of the drives in that computer had issues.  None of the other HDDs or SSDs in either system are related.  Not thinking it's a bad batch, no.

Is the spinning rust in your main computer similar to the HDDs in your media computer?  I can tell you that I've seen a >50% failure rate on at least one make/model of HDD, they were all in Dell Optiplex 780s purchased at the same time and were the same model and batch.

<snip>


^^ This - used to work in a data center / hosted server provider and we would see similar issues with hard drives from a 'certain' manufacturer whose first initial was 'Seagate' in Dell servers.

500 GB SATA drives (don't ask) had a high failure rate but 600 GB SAS were fine.  750 GB - a six drive server in RAID 5 had at least 3 drives replaced with 1 TB SATA drives.

It was like the 'S' company would have reliable drives for one size but forget the recipe and the next larger sizes were duds and then find it again.
 
2022-05-21 5:49:12 PM  

aerojockey: null: Is the spinning rust in your main computer similar to the HDDs in your media computer?

I really doubt that's it.  The drives were all at least a year old, I believe some are more than four years old, most were purchased at different times, and they all failed or got wonky in the same week.

The media computer definitely did not have a recent kernel, my work desktop gets regular and timely updates.  I don't think that's it either.  Also if a Linux kernel was causing hard disks to fail I am quite sure we'd know by now.


I mean I have heard of kernel/driver issues causing data corruption and also depending on things if there's not proper wear-leveling then one part of the drive may get worn out.  And that will hit SSDs too if there's too much writing to them.  I've even been hit with two drives I bought at the same time of the same make and model dying at approximately the same time.
 
2022-05-21 6:01:40 PM  

null: I mean I have heard of kernel/driver issues causing data corruption


This definitely wasn't data corruption.  Wear leveling I'd expect would only start causing issues if it happens over an extended period of time and it'd be unlikely for two drives under very different usage conditions to happen to fail for this reason in the same week, and frankly, I don't think Linux has these issues.  I need to put a stop to this.

I'm going to assume electrical disruptions caused the HDD/SSD issues, and replace all my storage.  I may repurpose some of the surviving drives as a secondary in-house backup system.

Thanks all for any knowledge shared.
 
2022-05-21 6:18:12 PM  
I'm wondering if I should replace the PSUs as well.  Or maybe that's the main issue.  I guess I wouldn't expect the PSU to be something that is damaged as easily, so if the computer still works okay it was probably a one-time event that got through the PSU and had effects on the disks.  I'd get out my DMM but I kind of doubt PSU damage could be diagnosed with just a DMM.
 
2022-05-21 6:39:29 PM  
D.

I had the same problem with a NAS. Random write errors after hours of hard use (big backup)
I swapped the disks, but same kind of errors.

Removed the lid to improve cooling. No errors.
I interpreted that as some component on the motherboard was overheating.

Bought new NAS, and kept the disks. Problem solved.
 
jbc [TotalFark]
2022-05-21 7:03:44 PM  
You probably got malware from looking at Sri Lankan amputee furry porn.
 
2022-05-21 7:05:15 PM  
[Image not preserved]
 
2022-05-21 7:32:09 PM  
The electrical grid is cravulating everywhere.  Brownouts are more common, whether you see them or not; have you noticed your clocks defaulting to 12:00?  And today, the first hot day, a lot of air conditioners kicked in.  So power is crappier during the summer.  If you have iffy power, yes, you need to crib in a UPS for your electronics.  Also, surge suppressors go over to the enemy after five years, so replace them.  Plugging into GFI outlets also helpful.
 
2022-05-21 7:58:14 PM  

claytonemery: Brownouts are more common, whether you see them or not; have you noticed your clocks defaulting to 12:00?


Definitely did not have a brownout sufficient to reset clocks.  The media computer might have had an uptime of 100 days when I first noticed I/O errors.  If it was electrical it was more subtle than that, and not enough of an interruption to lose power.  I don't have iffy power (I don't believe I've had a single brownout or blackout in the years I've lived here, in fact) but possibilities happen.
 
2022-05-21 8:13:27 PM  

GitOffaMyLawn: Power issues would be my first guess. A good UPS with power filtering / conditioning sounds like a good investment.


It's almost always a power problem.
 
2022-05-21 9:37:09 PM  

Marcus Aurelius: GitOffaMyLawn: Power issues would be my first guess. A good UPS with power filtering / conditioning sounds like a good investment.

It's almost always a power problem.


Go into your main breaker box and use a screwdriver to tighten down all the neutral, ground, and hot wires. A loose neutral wire will cause extra-high voltage on two shared hot wires. Think 240 volts on a 120 volt outlet. Also, please do not die doing this. Do not be grounded, and focus on not shorting the wiring. I have done this hundreds of times. Or you could just kill the main and use a flashlight to perform the operation like any other normal person would. I put in a surge protector from Loews on my main breaker box and found that my main ground wire from the street was loose, which was causing problems...lightning hits nearby causing surge problems. Damn, I wish I had checked the breaker box before I moved in.
 
2022-05-21 10:30:28 PM  
Has the AC been running since the troubles started? The AC draws from both phases, and if there is a problem in the AC (or the box), when the compressor starts some weird voltages can get sent elsewhere in the home. As previously mentioned, check to make sure everything is tight in the box, and then think about whether you have seen any dimming when the AC comes on.
 
2022-05-21 10:34:37 PM  
bad controller
 
2022-05-22 12:37:00 AM  

khitsicker: Are the computers hooked up to an appropriately rated UPS?


This.

Even when I lived in the "best" parts of town, I quickly learned to use a UPS for any expensive electronic equipment.  Surge protectors don't cut it, it's gotta be a UPS, even a little one.

/well, maybe not so quickly
//it took one computer going thru 4 power supplies before I clued in
 
2022-05-22 1:16:26 AM  
Could be electrical, could be the motherboard, could be a bad connection to the storage. If you wanna get nerdy, break out a multimeter with an amp clamp and throw it on everything touching or feeding your electrical components while they're on and under load, and see what's up. If your power supply is rated for X wattage and you have high amperage then you have a low voltage, but if you have low amperage that means you have high voltage. But honestly, the first thing I'd check is all the connections from your main all the way up to the connector powering the storage.
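
For reference, the relation behind that amperage/voltage reasoning is P = V × I. Below is a minimal sketch in Python, assuming an idealized load that draws a fixed amount of power; the 60 W load and the rail voltages are made-up illustration values, not measurements from this thread.

```python
def current_for_load(power_watts: float, voltage: float) -> float:
    """Current in amps drawn by a fixed-power load at a given voltage (I = P / V)."""
    if voltage <= 0:
        raise ValueError("voltage must be positive")
    return power_watts / voltage


# Hypothetical 60 W load on a nominal 12 V rail versus a sagging rail.
nominal_amps = current_for_load(60.0, 12.0)
sagging_amps = current_for_load(60.0, 11.4)

# For the same power, a lower voltage means a higher clamp reading.
print(f"12.0 V: {nominal_amps:.2f} A, 11.4 V: {sagging_amps:.2f} A")
```

Real loads are not perfectly constant-power, so treat a clamp reading as a hint and confirm the rail voltage directly.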
 
2022-05-22 1:25:05 AM  
Thinking about it, I bet the smoothing capacitor inside your power supply has popped. Take out the power supply, open up the case, and look inside. If you see capacitors with the tops busted open or bulging, then most likely your power supply is farked.
 
2022-05-22 2:50:02 AM  
How old is your hardware?  Power cycles and up/down cycles is a big clue.
 
2022-05-22 4:11:16 AM  

aerojockey: null: Wait, are these HDDs all the same model or different sizes but the same family in the same manufacturing batch?

The media computer had three HDDs of the same model, or close to it. Same disk layout and so on, similar or same product number.  (I'm not sure and don't think they are the same batch.)  I did not have them in RAID.  Notably, only one of the drives in that computer had issues.  None of the other HDDs or SSDs in either system are related.  Not thinking it's a bad batch, no.


Why dismiss a manufacturer error offhand? Without telling us how old the drives are, that's a stupid assumption to make, because factory failures won't always show up right off the bat. It could take a year or more for them to fail. We cannot provide nearly enough solid help without knowing the actual make/model of each drive that failed and how old they each are.
 
2022-05-22 5:40:49 AM  
even if your PSUs have PFC, still like ever'bunny else said above, get thee to some farking UPS, stat!

I've got 2 sine-wave ones, for the FiOS rooter and the media server & tv & my "hardline" voip phones & some switches in between, mostly to just balance things when there's an outage, but yeah really you cannot trust finicky hardware. not spinning abominations, not solid state monstrosities.

good or bad, can't stop the signal. so better try to make it good.
 
2022-05-22 6:24:59 AM  

ImmutableTenderloin: Thinking about it, I bet the smoothing capacitor inside your power supply has popped. Take out the power supply, open up the case, and look inside. If you see capacitors with the tops busted open or bulging, then most likely your power supply is farked.


Yeah probably closer to the problem.

If you've got a 'scope check the plus 5's and 12's for excessive noise.

What was the old ripple spec on AT/ATX? 25 mV p-p?

I take it the SATA bus has all sorts of current limiting
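
If you do capture the rails on a scope, the check can be sketched like this in Python. The limits are the commonly cited ATX12V design-guide ripple figures (120 mV p-p on +12 V, 50 mV p-p on +5 V and +3.3 V), taken here as an assumption to verify against the guide version for your supply; the sample readings are hypothetical.

```python
# Assumed ripple/noise limits in millivolts peak-to-peak (verify against the ATX12V guide).
RIPPLE_LIMIT_MV = {"+12V": 120.0, "+5V": 50.0, "+3.3V": 50.0}


def ripple_mv(samples_volts):
    """Peak-to-peak ripple in millivolts from a sequence of voltage samples."""
    return (max(samples_volts) - min(samples_volts)) * 1000.0


def within_spec(rail, samples_volts):
    """True if the measured peak-to-peak ripple is at or below the assumed limit for the rail."""
    return ripple_mv(samples_volts) <= RIPPLE_LIMIT_MV[rail]


# Hypothetical scope captures, in volts.
clean_5v = [5.01, 5.02, 5.00, 5.03]
noisy_12v = [11.90, 12.10, 11.95, 12.08]

for rail, samples in (("+5V", clean_5v), ("+12V", noisy_12v)):
    verdict = "ok" if within_spec(rail, samples) else "excessive"
    print(f"{rail}: {ripple_mv(samples):.0f} mV p-p, {verdict}")
```

A DMM averages this noise away, which is why a scope is the right tool for this particular check.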
 
2022-05-22 7:58:58 AM  
then I saw two different boxes
it's a grounding issue
 
2022-05-22 10:38:16 AM  

cretinbob: then I saw two different boxes
it's a grounding issue


If you have 2 breaker boxes then the second box MUST have an isolated neutral and isolated ground.
 
2022-05-22 10:55:05 PM  
I haven't seen much suggestion of it, but your PSU is likely dying. If your drives power on and off rapidly (particularly for spinning media) it's not great for it. 

It's also possible that something else is shorting the board, which is worse. Check USB devices by removing them all.

PSU replacement (if you have an extra / old one all the better to test with).
UPS for sure (as others have said).
 