RAID array keeps getting "lost"

HunterjWizzard

I have a file server running Windows Server 2003 x64, SP2. It has a 4 GB IDE OS drive, plus 4x 2 TB 5900 RPM "green" SATA drives in a RAID 5 array on a Silicon Image SIL3124r5 software RAID card. For the past few days, the system has continually "lost" the RAID array.

First, I get a number of errors saying:

"The device, \Device\Harddisk1\DR1, is not ready for access yet."

I get dozens of these in the Event Viewer, basically all at the same time, until another error says:

"The device 'SiImage SCSI Disk Device' (SCSI\Disk&Ven_SiImage&Prod_&Rev_0000\5&2a9f5107&0&001000) disappeared from the system without first being prepared for removal."

Finally, I get two Ftdisk errors saying:

"The system failed to flush data to the transaction log. Corruption may occur."

And then I no longer have access to my 5.45 TB RAID array.

The RAID card's BIOS shows all 4 drives present and apparently working, but there isn't a whole lot to see in there.

Steps taken:

The problem IS hardware. I confirmed this by disconnecting the OS drive and adding a second OS drive with Windows XP x64 installed on it, and I was able to read the array without difficulty... for about 20 minutes, before the exact same problems began to occur on the new hard drive.

I cannot run a thorough disk check because the computer will not keep the array readable long enough to complete the scan.

If I reboot, I get about 9-30 minutes of drive time before poof. I won't be terribly disappointed if this problem turns out to be unfixable, as there is nothing important on the array (everything important has been backed up), but it will cost me a few score hours of ripping and recompiling DVDs (which is never fun). Still, I thought it would at least be worth asking around a bit.
 
I highly suggest purchasing Blacks or RE4s for RAID applications. Western Digital Greens park their heads after as little as 8 seconds of no activity and then spin down, and they do drop out of the array when that happens. At a MINIMUM you need to disable the head parking on these drives using WD's own utility for them, if the drives will even accept it.
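For reference, the WD utility is wdidle3, a little DOS tool (exact switches vary by version, so treat this as a rough sketch rather than gospel):

    wdidle3 /R      <- report the current idle/park timer
    wdidle3 /S300   <- set the timer to 300 seconds, the usual maximum
    wdidle3 /D      <- disable the timer entirely, on drives that accept it

You run it from a DOS/FreeDOS boot disk, ideally with the drives hanging off a plain SATA port rather than the RAID card.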

Second, when a Green disk encounters a bad sector or some other error, it takes too long trying to fix the problem on its own. In a RAID setup it is usually the controller itself that corrects errors, not just the hard drive. This causes issues because the disk sits in a WAIT state while the controller expects it to be operational at that moment; RAID-rated drives avoid this with time-limited error recovery (TLER), which gives up after a few seconds and hands the error back to the controller.
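On drives that expose SCT Error Recovery Control you can inspect and cap that timeout with smartmontools; this is just a sketch, and most Greens will simply report the feature as unsupported:

    smartctl -l scterc /dev/sda          <- report the current read/write recovery limits
    smartctl -l scterc,70,70 /dev/sda    <- cap both at 7.0 seconds (units of 100 ms)

The device name is only an example; substitute whatever smartctl's scan shows for your disks.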

So as suggested just a moment ago, get better drives, or at least drives on the controller's approved list. NEVER use a Green drive in RAID. They are cheaper for a REASON.


I have a pair of 1 TB Greens in RAID 0 on one machine and they do OK, but I have also disabled the frequent head parking and fully disabled disk spin-down. (That part is risky in its own right: if power is suddenly lost, it can cause damage to the heads/platters.) They haven't dropped out either, but I also run frequent bad-sector checks, almost weekly, with both drives removed from the array. It is much more efficient to have faster, higher-performance drives for these kinds of applications.
 
I highly suggest purchasing Blacks, or RE4's for RAID applications.

Wonderful suggestion, honestly not that helpful. Yes, I know the drives are less than optimal; I've known that for years. They've also been working fine for years, and this isn't exactly a high-priority system (it's the machine where I store every single episode of Star Trek ever made so that I don't have to go hunting for DVDs). If I had seven or eight hundred bucks to sink into upgrading the server with "proper" hard drives, I'd do it.

BUT: even if I did, that wouldn't solve my current problem. That would just give me a shiny new array to fill with Disney movies and VHS rips of cartoon shows from the 80s. I'd still be looking at a dead array and hundreds of gigabytes of lost data (the bulk of it is backed up, along with the really important stuff, like Teenage Mutant Ninja Turtles, but I'd still lose quite a bit of accumulated media and have to spend countless weekends and evenings ripping and re-digitizing).

Tell me: you say you take your drives out of the array to run the disk check, and this works? It's not something I've ever tried, so I wouldn't know. If I could disconnect the drives and run a thorough check on them independently, The Internet says that might fix the problem (the problem possibly being errors on the disks).
 
Try doing this (data loss is possible...): disable the RAID configuration, then reboot your machine. Go into a command prompt and run chkdsk /r against each disk (you may have to assign a drive letter; do NOT touch any formatting options). Reboot. All of the disks should go through the chkdsk. Once that is done, shut down and re-assemble the array EXACTLY as it once was. It *SHOULD* come back with no issue.
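If a disk comes up without a letter, diskpart can assign one; the volume number below is made up, so pick whichever entry matches the bare disk on your machine:

    diskpart
    DISKPART> list volume
    DISKPART> select volume 3
    DISKPART> assign letter=E
    DISKPART> exit
    chkdsk E: /r

Again, stay away from anything that offers to format or convert the disk.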

I have only done this on Intel-based controllers; there is a very high chance that the data could be destroyed doing it this way.

And sorry, but the solution is to use disks that are meant for these applications, not cheap crap disks. Whether or not it's helpful to you isn't my problem, honestly, since it is the only actual solution.

Just remember, disable the array then re-enable it, don't just take disks out, otherwise the controller will not be happy.
 
And sorry, but the solution is to use disks that are meant for these applications, not cheap crap disks. Whether or not it's helpful to you isn't my problem, honestly, since it is the only actual solution.

This solution would require the use of a time machine to go back and use the correct drives when I built the server. Even if I bought the correct drives right now, it would not actually SOLVE the current problem: it would only prevent a recurrence. Please do not suggest it again (unless you actually have a time machine you'd be willing to loan me, because that would be great! Mine got struck by a bolt of lightning and sent back to the old west...).

Now, addressing the actual solution you've proposed:

If I were to physically remove the drives from the server (without touching the RAID configuration at all), attach them to another machine entirely (I have several on hand), and run chkdsk /r from there, do you think that might work? I'd like to avoid deleting the RAID configuration if I can, as I've never seen an array come back from that.
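If that's the route, a gentler sequence (assuming the other machine gives a disk the letter E:) would be a read-only pass first, then the repair pass:

    chkdsk E:        <- read-only check; reports problems without changing anything
    chkdsk E: /r     <- locates bad sectors and recovers readable information

One caveat: individual members of an array sometimes show up as RAW rather than NTFS, in which case chkdsk will refuse to run on them at all.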
 
BTW, remember not to power up the machine that has the controller for the array while the disks are sitting in another system; it really won't like finding the disks gone.
 