rivaridge.equiraptor.com died today. I'd seen some messages in the… - Equiraptor's Journal
equiraptor
rivaridge.equiraptor.com died today.

I'd seen some messages in the logs over the past few months suggesting hard drive problems, but they had always occurred when a partition reached 100% full, so I hadn't worried too much. Around 11:00 today, I had trouble reading a file... The read attempt would time out. I checked the logs and the disk usage: the logs were spewing errors, and no partition was full. Very soon thereafter, any shell that attempted to access the hard drive would just... stall. Nothing ever came back. New shells could no longer be created (that requires reading from the drive), and everything I tend to do needs to read from a hard drive at some point, so I quickly lost all of my open shells.
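The "is any partition full?" check is easy to script, so here is a minimal sketch of it in Python. The mount point in the usage section and the 95% threshold are assumptions for illustration, not what ridge actually used, and the percentage can differ a little from what `df` prints because of reserved space. On a drive that is already stalling, the lookup itself would likely hang just like the shells did.

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Return used space as a percentage of total (0.0 if total is 0)."""
    return 100.0 * used / total if total else 0.0


def full_partitions(usage_by_mount, threshold=100.0):
    """Return the mount points whose usage is at or above the threshold.

    usage_by_mount maps a mount point to a (total_bytes, used_bytes) pair.
    """
    return sorted(
        mount
        for mount, (total, used) in usage_by_mount.items()
        if percent_used(total, used) >= threshold
    )


def snapshot(mounts):
    """Collect (total, used) byte counts for each mount point."""
    result = {}
    for mount in mounts:
        usage = shutil.disk_usage(mount)
        result[mount] = (usage.total, usage.used)
    return result


if __name__ == "__main__":
    mounts = ["/"]  # hypothetical list; a real box would add /home, /usr, ...
    print(full_partitions(snapshot(mounts), threshold=95.0))
```

An empty list while the logs are still full of drive errors is the situation described above: the errors can no longer be blamed on a full partition.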

Work was kind and let me leave a couple of hours early to go deal with it. They're all geeks, they understand. I came home and collected IDE drives. I found an 80G and a 40G Western Digital, and a 60G IBM drive in a pile. I powered off ridge and plugged them into the primary embedded IDE controller one at a time... I was unsure of the health of the IDE controller, as well, and wanted to see if it could even see these drives.

It turns out it mostly couldn't. It could see the IBM drive (which showed up as a 33G capacity drive), but neither of the Western Digitals. nugget had a Promise IDE RAID controller lying around, so we plugged that into the machine. It could see all three drives, but wouldn't let me create a RAID 1 (no two drives were the same size). Also, it was seeing the 60G as 33G, just like the embedded controller. So the 80G and 40G have ended up on the Promise controller, with the 60G on the embedded one. This may mean I'm more likely to lose the 60G again, but... Oh well. I may not.
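To spell out the RAID 1 arithmetic, here is a small sketch in Python. It assumes the controller's rule really is "both members must be the same size", which is only what the refusal suggested, and it uses the capacities as the controllers detected them. The `mirror_capacity_gb` helper shows the general mirroring rule for setups that do accept mismatched members: the usable space is that of the smaller drive.

```python
from itertools import combinations


def equal_size_pairs(sizes_gb):
    """Return every pair of drive names whose detected sizes match exactly."""
    return [
        (a, b)
        for a, b in combinations(sorted(sizes_gb), 2)
        if sizes_gb[a] == sizes_gb[b]
    ]


def mirror_capacity_gb(size_a, size_b):
    """A two-way mirror can hold no more than its smaller member."""
    return min(size_a, size_b)


# Sizes in GB as detected; the IBM drive is nominally 60G.
drives = {"WD 80G": 80, "WD 40G": 40, "IBM 60G (detected as 33G)": 33}

if __name__ == "__main__":
    print(equal_size_pairs(drives))    # no pair qualifies under the strict rule
    print(mirror_capacity_gb(80, 40))  # what a lenient mirror of the WDs would offer
```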

I've reinstalled FreeBSD on the machine, upgrading to 6.1 (it had been on 4.10). It's currently building world to head to 6.1-STABLE from 6.1-RELEASE. I had some backups on dazed, though not all of them were as new as I could wish. Still, I haven't lost any needed data. All will be well with ridge in a few days, and I'll be playing with FBSD 6.
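For anyone who hasn't done a source upgrade, this is roughly the order the FreeBSD documentation gives for going from RELEASE to STABLE. It's sketched as Python data so nothing actually runs; the `GENERIC` kernel config is an assumption, and the `make` steps are meant to be run from /usr/src.

```python
# Documented FreeBSD source-upgrade order, held as data (nothing is executed).
UPGRADE_STEPS = [
    ("cd /usr/src", "work from the source tree"),
    ("make buildworld", "compile the new userland (the slow part)"),
    ("make buildkernel KERNCONF=GENERIC", "compile a kernel; GENERIC is an assumption"),
    ("make installkernel KERNCONF=GENERIC", "install the new kernel"),
    ("reboot", "boot the new kernel, ideally into single-user mode"),
    ("mergemaster -p", "pre-world pass over /etc/group and /etc/master.passwd"),
    ("make installworld", "install the new userland"),
    ("mergemaster", "merge the remaining configuration files"),
    ("reboot", "boot into the finished system"),
]


def render(steps):
    """Format the steps as numbered lines with a trailing explanation."""
    return [
        f"{number}. {command:<40} # {reason}"
        for number, (command, reason) in enumerate(steps, start=1)
    ]


if __name__ == "__main__":
    print("\n".join(render(UPGRADE_STEPS)))
```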

I just wish this buildworld would go faster.


Comments
From: coffeemanca Date: May 18th, 2006 04:19 am (UTC)
Poor equi... At least the remaining hardware is OK. My last FreeBSD server that went "Kaboom" took the power supply and everything plugged into it along with it.

FreeBSD 6 isn't bad at all. I've been running it since then (6.0-RELEASE) without problems. I had some funkiness that mergemaster didn't take care of, though I don't remember with what, but overall it was just a couple of hours to get everything back online (with a generic kernel though).

I think it's awesome that they gave you time off work for it though :)
From: equiraptor Date: May 18th, 2006 02:24 pm (UTC)
Work is very good about these things. As long as all the work gets done, taking a few hours here or there isn't too big a deal.

I am glad it was just the drive (or looks like it was just the drive - I'm still a bit uncertain about that embedded controller), and I really needed to get the extra space. Both /home and /usr on the old install were very, very full.
From: chrisj04 Date: May 18th, 2006 02:14 pm (UTC)
poor ridge :-(

6.1 has been reasonably stable for me, the only issue I had upgrading was a change it wanted to make to /etc/group, almost wiping the whole file in the process :/

Glad that it's all fixing itself though
From: equiraptor Date: May 18th, 2006 02:21 pm (UTC)
Eeep. That sounds un-fun. I'm not far from rebooting onto the new world/kernel. Oh, the joy. :)
From: chrisj04 Date: May 18th, 2006 03:24 pm (UTC)
The issue I had was it wanted to add a new group (audit), and decided to try and merge the files... removing all my groups in the process.

Just something to keep track of. That's the only thing I had issues with :)
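A cheap way to catch that kind of damage is to compare the group names before and after the merge. Here's a minimal sketch in Python, assuming you saved a copy of /etc/group beforehand; the file contents in the usage section are invented for illustration, not taken from a real machine.

```python
def group_names(group_file_text):
    """Extract group names from text in /etc/group format (name:passwd:gid:members)."""
    names = set()
    for line in group_file_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        names.add(line.split(":", 1)[0])
    return names


def lost_groups(before_text, after_text):
    """Return the groups present before the merge but missing afterwards."""
    return sorted(group_names(before_text) - group_names(after_text))


if __name__ == "__main__":
    # Invented sample data: the merge kept only the newly added group.
    before = "# groups\nwheel:*:0:root\noperator:*:5:root\nstaff:*:20:\n"
    after = "audit:*:77:\n"
    print(lost_groups(before, after))
```

Anything in that list is worth restoring before rebooting onto the new world.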
From: decibel45 Date: May 18th, 2006 02:38 pm (UTC)
HD failures suck ass. :(