12 thoughts on “Amazon's EC2 Service Suffered Some Hiccups”

  1. I think it’s irresponsible and alarmist to claim that EC2 “went down.” It was a “set of racks” in a “single availability zone.” The purpose of exposing the ‘availability zone’ concept to developers is to allow them to ensure their own availability even during events such as this.

    The S3 outage broke all use of S3; this was a connectivity loss to a fraction of EC2. The two cannot be compared. Everyone was perfectly able to launch new instances to replace the out-of-commission ones.

    Is an “EC2 is down” GigaOM post to be expected for every AWS status dashboard update? To suggest that EC2 as-a-whole was ‘knocked out’ or in a ‘nose-dive’ is really quite inaccurate.

  2. The trick is to be prepared for an availability zone outage. If your architecture is fault tolerant across availability zones, then you can survive a single lightning strike.

    The problem is what happens if several lightning strikes hit all availability zones. But in this case, they’ve probably deserved it…

  3. Just goes to show you, just because it’s “cloud” doesn’t mean you can ignore the tried and true. Gotta build in redundancy to your systems and not rely on a single provider for everything. Costs more but if you need the uptime, it’s the price you pay. I have a bad feeling a lot of people are going to put a lot of trust into the cloud thinking it’ll be bulletproof which is obviously not the case today.

  4. Lightning striking??? Haven’t they installed something called a lightning conductor, which has been in use since 1800?

  5. Outages happen, it’s just a part of IT. As Oren and Ken both said it’s important to build redundancy into your system architecture, and the more critical your systems are the more you need to worry about it. I can say from personal experience that the EC2 (and other AWS) systems have been far more reliable than any other hosted systems that I’ve used, including internally hosted environments, and are much easier to grow.

  6. Cloud computing is such a bad departure from one of the underlying concepts of the web that makes it so important and valuable. Yes, global interconnectivity, but always with equal localization of data. I don’t believe that redundancy is a necessary byproduct of the design when efficient indexing and compression of localized networked files is utilized. The more we centralize and rely on one company’s servers, the less independent and secure our information becomes, regardless of how this simplifies things. Even the name is bad; they ought to call it SNOWBALLING.

  7. Ah yes, if it bleeds, it leads.

    Don’t worry, this will be forgotten in a month when something else goes bump in the night…

  8. The “Cloud” isn’t magic, people. You still have to architect a fault-tolerant system based on the tools that Amazon gives you. Anybody who just fires up an EC2 instance and thinks that their work is done is sadly mistaken.

  9. I manage about 100 servers on Amazon EC2, and lost one in this latest outage. We failed over pretty quickly to a backup server. We run several such backups, and will even fail out of the cloud to our managed POPs if the situation requires it.

    As long as the instances you run are tied to a single point of failure – the bare metal – stuff like this is bound to occur. I didn’t have much better luck running my own boxes in the past, but the difference is now I can spin up a new instance to take its place in a matter of minutes, not hours or days. This is exactly what I did yesterday.

    @dirky and other naysayers – clearly you haven’t managed a significant amount of metal in the past, and dealt with the nightmare lurking within that responsibility. You’re making a blanket accusation based on ignorance and unfamiliarity with the underlying technologies. Wielded correctly, it’s a cost saver AND a lifesaver.

    I’m 100% satisfied with Amazon’s service and performance, and it’s getting better every day.

  10. Guys

    We have updated the post to reflect what happened. Clearly it happened late and I couldn’t get more information firsthand. Anyway, AWS outlined how small the problem was. I have made the changes to reflect that.


    Last time the S3 outage was seen as something marginal but became a bigger issue. So I was going with the early reports as per tips I was receiving from Amazon customers. Just to be clear, being alarmist is not my nature nor how I write my posts.
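
Several comments above make the same point: spread instances across availability zones and fail over when one zone goes dark. A rough sketch of that idea follows, as a pure simulation only. It does not call any AWS API, and every name in it (`Instance`, `place_across_zones`, `fail_zone`, `pick_active`, the zone labels) is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    zone: str
    healthy: bool = True


def place_across_zones(names, zones):
    """Assign instances to zones round-robin so no single zone holds them all."""
    return [Instance(name, zones[i % len(zones)]) for i, name in enumerate(names)]


def fail_zone(instances, zone):
    """Simulate a zone-wide outage by marking every instance in that zone unhealthy."""
    for inst in instances:
        if inst.zone == zone:
            inst.healthy = False


def pick_active(instances):
    """Return the first healthy instance, or None if every instance is down."""
    return next((inst for inst in instances if inst.healthy), None)


# Hypothetical fleet: three servers spread over two made-up zones.
fleet = place_across_zones(["web-1", "web-2", "web-3"], ["zone-a", "zone-b"])
fail_zone(fleet, "zone-a")  # the "lightning strike"
survivor = pick_active(fleet)
print(survivor.name if survivor else "no healthy instances")
```

With only one zone in the list, the same outage would leave no survivor, which is the single point of failure the commenters warn about.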
