Thursday, October 14, 2010

Servers were down

Well, our Amazon EC2 app server locked up last night, and when I went to re-launch it ... it failed to start ... for 7 hours!!!  At 2:00am I finally got it back up and running.  The long outage was caused by a combination of things:
     a) our server startup script, which installs our software and applies every setting it needs, had some errors in it,
     b) and Scalr managed to lose my VHost config for our server, so I did not have SSL and mod_rewrite was broken.
I am pretty upset right now.  Our app server has locked up several times in the past couple of months on Amazon, and we have no way of telling why.  The only thing we can do is forcibly terminate the instance and re-launch it, which destroys all of the logs :(.
Sorry to all of our Australian customers.

We will probably be moving away from Amazon EC2 and going back to dedicated servers.  Amazon EC2 sounds like the perfect solution for us on paper, but in practice it just isn't going to work.

