HDD Killers

Server Troubles

Posted on Thursday 22 December 2005

On Friday I came back from Loughborough, then on Saturday I went back and got the rest of my stuff, including my server. On the way back I got dropped off with my server at Antony’s house so that we could do a file swap.

The plan was for me to stay there for one night while we synchronised files. Things never go as planned, and it’s actually pretty rare that we ever have a plan.

It started off pretty well, although somewhat slower than we had anticipated. Antony has got four 200GB HDDs and a 160GB HDD in his PC, which I think gives him a total of around 900 GiB of space.

The PCs were networked directly together with a network cross-over cable, and with a 1 Gbps connection we weren’t expecting it to take long. Oh how wrong we were. We started off with the small stuff: he had a few films that I didn’t, so we threw those across first. I just picked out which ones I didn’t have and away they went. We were surprised by how long they took to copy, but we didn’t really think much of it.

Then we started a HUGE copy and paste, I copied all of the TV series from his “TV Series 2” folder, that was pretty much an entire 200GB (186GiB) drive. That took AGES. It was at this point we noticed that the network utilisation was only at 15%. Fuck. We think that it’s limited because my RAID card is in a PCI slot and not a PCI-Express slot which has a lot more bandwidth.

Then when that was done, we started a HUGER copy and paste. Two of his 200GB drives are in a RAID-0 array and therefore he has a partition of about 372GiB (two lots of 186GiB), and from that I needed 354 GiB of stuff. So over it went. SIX HOURS TO COPY. It actually took two hours just to work out how long it was going to take to copy, but when it did, we took this screenshot.

239 minutes remaining.
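
For what it’s worth, the six hours roughly add up. Here is a rough sketch in Python (the function name and its defaults are made up for illustration) that assumes the link sat at 15% of 1 Gbps for the whole copy and ignores protocol overhead, so treat the result as a ballpark rather than a measurement.

```python
GIB = 2**30  # bytes in one gibibyte


def copy_time_hours(size_gib, link_bps=1_000_000_000, utilisation=0.15):
    """Estimate the hours needed to push size_gib over the link.

    Assumes a constant utilisation and no protocol overhead.
    """
    bytes_per_second = link_bps * utilisation / 8  # bits/s -> bytes/s
    return size_gib * GIB / bytes_per_second / 3600


# The 354 GiB copy from the RAID-0 partition.
print(f"{copy_time_hours(354):.1f} hours")
```

At 15% utilisation the estimate lands a little under six hours, which is about what the copy actually took.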

It finished at about 5AM, then I copied across some other small files, some music and PC games and stuff. Then Antony DBAN’d all of his 200 GB drives because they were all really fragmented with not enough space left on them to run disk defragmenter.

As the more astute of you may have noticed, this means most of what he had on his PC was deleted and therefore needed to be copied back across. Oh dear. As I said, things never quite go to plan for us.

DBANing the drives took quite a long time, although we’re not quite sure how long, as we weren’t awake to see it finish. Antony woke up first and re-RAIDed two of the 200GB drives, then formatted them all to be ready for a hefty amount of data.

Then the copying back process started. I had about 205GiB of films, but he only wanted to use one of the 200GB HDDs for them, and the 200GB drive only had 186GiB of space on it. It got filled.

Then came TV series, as it turned out, all of the TV series would just fit on the other three 200GB drives with not much room to spare. But holy crap it took a long time to copy. I think it was about 9PMish that we started the final and HUGERIST copy and paste. 376 GiB of TV series down a network cable, I was asleep for most of it though.

That is, I was asleep, until a FUCKING ALARM started going off. Not a car alarm, or a fire alarm, or a burglar alarm, oh no, this was a different type of alarm, because it was coming from my computer. We turned the monitor back on to see what the problem was, but everything seemed to be perfectly fine: the computer was still responding, the copy was still going fine, and the network utilisation was still abysmally low compared to what it should have been. We weren’t really sure what the problem was, but just to be sure we shut it down and booted it up again.

As it was booting up, the motherboard POSTed fine, then the RAID card BIOS loaded and checked the drives. Oh dear, drive number 1 had fucked up or something. It was showing up as being connected, but it wasn’t part of the array any more for some reason. So we did the only sensible thing, we unplugged it and pretended like it wasn’t a problem. Turns out it wasn’t a problem, such is the magic of RAID-5. Let me explain.

With a RAID-5 setup, you always lose one HDD’s worth of space. That seems like a bit of a bad deal, but it’s not, because that drive’s worth of space is used for parity data. In my setup, I have eight 300GB drives, but instead of having 2400GB of space, I have 2100GB. Here’s a simple explanation of what parity is and why it works. It’s not actually like this in practice (data is written to the drives in 64KiB chunks, and the parity is spread across all of the drives rather than kept on one), but it’s similar. Imagine you have an eight-drive setup (seven drives’ worth of space) and you want to write 7 bits of data to the array. The RAID card would split those 7 bits up and write one to each of seven drives, then create 1 parity bit and write that to the eighth drive. Now what is this parity bit? Well, there are actually two types of parity, odd and even. If you use even parity and the data bits add up to an odd number, the parity bit is a 1, to make the total of all the bits an even number; if they already add up to an even number, the parity bit is a 0. Odd parity is the reverse of that, so adding up all of the bits always gives an odd number.

So if you imagine that all of the 7 data bits are 1’s, the value will be 7. An even parity bit will therefore be 1 and an odd parity bit will be 0. Now let’s say that one of the hard drives fails, the RAID card would read data from the remaining seven drives. It would receive six 1’s and an even parity bit of 1, the seven 1’s come to a total of 7, yet it knows that the value should be an even number, therefore the missing data bit should be a 1.
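
The same trick fits in a few lines of Python. This is only a toy sketch of even parity on single bits, not what the RAID card actually runs, and the function names are made up for illustration. The XOR of a set of bits is 1 exactly when they add up to an odd number, so it serves as both the parity calculation and the rebuild.

```python
from functools import reduce
from operator import xor


def even_parity(data_bits):
    """Return the bit that makes the total number of 1s even."""
    return reduce(xor, data_bits, 0)


def rebuild_missing(surviving_bits):
    """Recover the bit from the failed drive, given every other bit
    in the stripe (data and parity alike)."""
    return reduce(xor, surviving_bits, 0)


data = [1, 1, 1, 1, 1, 1, 1]       # seven data bits, all 1s
stripe = data + [even_parity(data)]  # eighth drive holds the parity bit

failed = 0                           # pretend drive number 1 went AWOL
surviving = stripe[:failed] + stripe[failed + 1:]
print(rebuild_missing(surviving))    # the bit that was on the failed drive
```

It makes no difference which drive drops out, data or parity: XOR-ing the seven survivors always gives back the missing bit.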

Therefore it’s possible to work out what the missing data is, and so it’s not a problem if one of the drives goes AWOL. What you’re left with is effectively the same as a RAID-0 array: it still works, but there’s no redundancy left. So it’s not wise to leave the array one drive down, because should another drive fail for some reason, you have lost everything. Perhaps a better solution is to use RAID-6. It generates two lots of parity, so you lose two drives’ worth of space, but you can then lose two drives and still the server will keep on chuggin’.
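
To put numbers on the space trade-off, here is a small sketch in Python (the helper names are made up for illustration) that works out the usable space for my eight 300GB drives under RAID-5 and RAID-6, and converts the decimal gigabytes on the box into the binary units that Windows reports.

```python
def usable_gb(drive_count, drive_gb, parity_drives):
    """Usable space in decimal GB once the parity drives' worth is taken off."""
    return (drive_count - parity_drives) * drive_gb


def gb_to_tib(gb):
    """Convert decimal gigabytes (10^9 bytes) to tebibytes (2^40 bytes)."""
    return gb * 10**9 / 2**40


for name, parity_drives in (("RAID-5", 1), ("RAID-6", 2)):
    gb = usable_gb(8, 300, parity_drives)
    print(f"{name}: {gb}GB usable, about {gb_to_tib(gb):.2f} TiB")
```

So going to RAID-6 on the same eight drives would cost another 300GB in exchange for surviving a second failure.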

Anyway, we started the server up again with only the seven drives plugged in and continued copying stuff across. In the end the total copied from my PC to his was 791.1 GiB, or if you prefer 849,407,373,458 B.

Once the copying was finished I DBAN’d the 300GB drive that had been fucked up, that took a while, 3h 24m dead on. For some reason it did two passes over the drive so it took longer than it should have, not that we were in a rush by this point. An average transfer rate of 47885 KiB/s to a drive 279.48GiB in size.
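
Those figures hang together, as a quick check in Python shows. The function name is made up for illustration, and it assumes both passes ran at the same average rate.

```python
def wipe_seconds(size_gib, passes, rate_kib_per_s):
    """Seconds needed to write size_gib to a drive `passes` times over."""
    total_kib = size_gib * 1024 * 1024 * passes  # GiB -> KiB
    return total_kib / rate_kib_per_s


seconds = round(wipe_seconds(279.48, passes=2, rate_kib_per_s=47885))
hours, remainder = divmod(seconds, 3600)
print(f"{hours}h {remainder // 60:02d}m")  # compare with the 3h 24m above
```

A single pass at that rate would have been done in half the time, about 1h 42m.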

When I got home I set the server up again in the study and shoved it under a desk, then I turned it on and told the RAID card to rebuild the array. “How long can this take,” I thought, “all it has to do is write about 140GiB of data to a drive and then it’ll be done.” Oh dear God, it took AGES. And I don’t mean the I-stood-there-and-watched-it-and-it-must-have-taken-at-LEAST-5-minutes kind of ages. I mean the I-started-it-and-checked-on-it-an-hour-later-and-it-was-2%-done kind of ages. I started it at 11:35PM on Monday evening, and when I came back last night at about 9:35PM it was done. Judging from the rate it was going, it must have finished just a bit before I got back, which means that in total it took FORTY SIX hours to complete! It would have probably been quicker to wipe my entire array and copy all of the stuff back across from Antony’s PC, for fuck’s sake.
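
Just how slow is that? Here is a rough sketch in Python. The calendar dates are my assumption, worked back from the date of this post (Monday the 19th to Wednesday the 21st of December 2005), and the function name is made up for illustration. It tries both the 140GiB I had in my head and the full 279.48GiB drive, since I don’t actually know how much the card rewrites.

```python
from datetime import datetime


def rebuild_rate_mib_s(size_gib, start, end):
    """Average write rate in MiB/s for a rebuild of size_gib."""
    seconds = (end - start).total_seconds()
    return size_gib * 1024 / seconds


# Assumed dates: Monday 11:35PM to Wednesday 9:35PM.
start = datetime(2005, 12, 19, 23, 35)
end = datetime(2005, 12, 21, 21, 35)

print((end - start).total_seconds() / 3600, "hours")
for size_gib in (140, 279.48):
    rate = rebuild_rate_mib_s(size_gib, start, end)
    print(f"{size_gib} GiB: {rate:.2f} MiB/s")
```

Either way it works out at less than 2 MiB/s, against the 47885 KiB/s that DBAN managed on the very same drive.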

Anyway, it’s all back to normal now, and everything is fine again. The array is back up to 8 drives and I can continue to be bored when I have almost 1TiB of TV series, films, games and music in the other room just a network cable away.

960GB used, 995GB remaining, 1.91TB total!

PS. File sharing is communism and therefore wrong.


  1.  
    Antony
    Thursday 22nd December 2005 | 5:47 am
     

    It is so easy to be bored though, despite all the stuff we have to watch. Don’t ask why because frankly there is no reasonable explanation as to why when you have 200+ films and what seems like a decades worth of TV series (seriously, we have 48 TV shows) to watch you can still be bored out of your mind, just know that it is totally possible.

  2.
    David
    Thursday 22nd December 2005 | 5:54 am

    It’s true, we do. That Pokemon one is because Antony downloaded it for his little brother and I have an inability to delete anything that has been downloaded if it took more than about 5 minutes unless I personally have a backup of it elsewhere.

  3.  
    Antony
    Wednesday 28th December 2005 | 1:18 am
     

    Don’t you just love non-sensical errors, like just – i tried to play the film “Cube” in DivX Player (as it wouldn’t work in winamp or Windows Media Player) and it gave me this nice error:
    “StreamSourceFilter::seekFromBegining(6764) out of readLimit range: 0”
    Shame as well, cus its a very good film!

  4.
    lilcrazyfuzzy
    Wednesday 4th January 2006 | 1:21 pm

    wow david, your server is almost full :p (fuller than the last screenshot you posted!)
