Today's outage

Sharky

As anyone who tried accessing the site today will have noticed, the site had been down since about midnight last night. Basically, the primary hard disk on our server died (or at least crashed and was showing signs of dying), so we ended up having to replace it with a brand new hard disk, reinstall the O/S, copy all the data (256mb worth) off the old hard disk and reinstall all the software. As you might imagine, this was no small task, and it took us the best part of the day to do it.

A big thank you to everyone for your patience today whilst we restored everything and thankfully nothing was lost in the process. We did our best to keep you informed on Twitter and Facebook.

If you do spot any glitches, though, do report them here.
 
No worries... anyone running websites will know the problems and pain involved with doing it!
 
S - I can't load JPEGs... is there an issue?
 
256GB :) Think it was a looong day yesterday

I'll get the uploading issues fixed this morning. It's probably to do with the permissions on the upload directories. Thanks for the heads-up everyone!
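
For anyone chasing a similar upload failure after a restore, one quick check is whether the account the web server runs as can actually write to the upload directories. A minimal Python sketch, assuming nothing about this site's real setup: the function name and any paths you pass in are hypothetical, and the script has to be run as the web server user for the answer to mean anything.

```python
import os


def check_upload_dir(path):
    """Return a list of problems that would stop uploads into `path`.

    An empty list means the directory exists and the current user can
    create files in it. os.access() answers for the user running this
    script, so run it as the web server account.
    """
    if not os.path.isdir(path):
        return [f"{path}: not a directory"]

    problems = []
    # Creating a file in a directory needs both write and search (execute)
    # permission on that directory.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path}: not writable by the current user")
    return problems
```

Calling it on each upload directory and printing whatever comes back gives a fast yes/no before digging into ownership and modes by hand.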
 
Image attachments now working...
 

Attachment: download.jpeg (8.6 KB)
My 2¢

Good work, but 256GB isn’t really a lot of data nowadays, especially when you consider the capacity of smartphones and tablets. The hard drive in my laptop is 500GB and I do regular backups and restores.

Doesn't your system run a RAID? I am baffled how a hard drive failure could be so catastrophic to a modern server...unless you are still running Windows 3.1 on a 386 :LOL:
 
You're absolutely right, the recovery shouldn't take that long. As with most things in life, it didn't quite go to plan. :/

Suffice to say lessons learned and changes made to minimise, if not eliminate, downtime for this sort of event.

Perhaps worth considering our own setups and whether we've got the necessary backups and redundancies in place ourselves.
 
Sharky, you should look into using Hadoop as your file store. You can buy commodity hardware, and the whole thing is designed around the expectation of failure. You don't need expensive licences (it's open source), and the same goes for the operating system (you can use open source Linux).
 
The whole design is also easily scaled. Just add more web servers or Hadoop servers as you need them.
 
Actually, I'm quite impressed. For a site that gets no income from its viewers you've not done badly to get up and running again within a few hours. I know of one motorcar forum site a few years ago that hadn't got proper backups and it never recovered and is now extinct. Of course, you could always spend loads of dosh on very sophisticated recovery systems but based on today's performance you don't need to.


Well done Sharky.
 
You can buy a very basic RAID 0 for less than £250. You can buy brand new terabyte drives for under £50. Anyone who thinks a back-up system is sophisticated and expensive doesn’t have much experience with computer hardware and software. I ran a hobby business in my spare time which had complete redundancy and backup.

I can't believe a site like this could be offline for so long because of a simple hard drive failure. I thought it was a major network issue (like a cut fibre) or something out of the site owner's control.

Computer stuff is dirt cheap, this isn't the 1950s.
 
RAID 0 isn't redundant; it's striped, and if you lose a drive you lose data.
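
To make the difference concrete, here is a toy Python sketch of striping versus mirroring. It only models the idea (lists standing in for disks, a made-up 4-byte block size, hypothetical function names) and is not how a real RAID controller is implemented.

```python
def stripe(data, n_disks, block=4):
    """RAID 0 style: deal fixed-size blocks round-robin across disks, no copies."""
    disks = [[] for _ in range(n_disks)]
    for i in range(0, len(data), block):
        disks[(i // block) % n_disks].append(data[i:i + block])
    return disks


def read_striped(disks):
    """Reassemble striped data; None if any disk has failed (marked as None)."""
    if any(d is None for d in disks):
        return None  # every disk holds blocks that exist nowhere else
    out = []
    for row in range(max(len(d) for d in disks)):
        for d in disks:
            if row < len(d):
                out.append(d[row])
    return b"".join(out)


def mirror(data, n_disks):
    """RAID 1 style: every disk holds a full copy."""
    return [data for _ in range(n_disks)]


def read_mirrored(disks):
    """Return the data from the first surviving disk; None if all have failed."""
    for d in disks:
        if d is not None:
            return d
    return None


data = b"forum posts and attachments"

striped = stripe(data, 2)
striped[0] = None              # simulate one drive dying
print(read_striped(striped))   # None: the array is gone

mirrored = mirror(data, 2)
mirrored[0] = None             # same failure
print(read_mirrored(mirrored) == data)  # True: the other copy survives
```

The striped pair gives you the capacity of both drives but fails as a unit, while the mirrored pair gives you the capacity of one drive and survives a single failure.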
 
I understand what you're saying - I used to be a network critical IT manager in a large government agency. My simple point is that what Sharky's got at the moment is perfectly adequate. If T2W goes down it's hardly a matter of life and death is it?
 