February 15, 2008

Amazon S3 Storage Service Goes Down, Still Not Up

Amazon’s S3 cloud-based storage service went down earlier this morning, according to numerous tips we’ve received. The outage has affected many companies, including Twitter. According to our tipsters, the service went down around 4:30 a.m. and is returning a 500 Internal Server Error message.

Amazon Web Services forums are full of people chatting about the outage. One poster on the forum summed up the situation nicely, saying, “The s3 service is great but this just proves you can’t rely on it, this is a major issue especially since it’s been down for so long. Way to go Amazon.”

This outage, one of the first large-scale problems to hit Amazon, shows that a lot of work needs to be done before we can completely rely on the cloud. As I have often said, we are running the 21st century web on infrastructure that was dreamed up in the 1990s, long before the web’s current scale. Still, that doesn’t take away my long-standing enthusiasm for Amazon’s web services strategy.

We will keep you posted. Meanwhile, let us know how you have been impacted and what you are doing to build the redundancy of your web service.

Nick Carr has his take on the situation. “Given that entire businesses run on S3 and related services, Amazon has a particularly heavy responsibility not only to fix the problem quickly but to explain it fully,” he writes. I agree with him, and hopefully Amazon will do the needful. Amazon says it has fixed the problem, but there seem to be continuing problems with the service, as the forum indicates.

47 thoughts on this post

Vikas says:

February 15, 2008 at 8:03 am

It’s back up now. We get most of our traffic from India and unfortunately for us, this happened during near-peak hours – 6 in the evening. We use AWS for images, but the system defaults to our internal server when it fails. We had been thinking of doing away with the fail-over given how well AWS worked, but of course, that won’t happen anytime soon now.

Adnan says:

February 15, 2008 at 8:04 am

Someone check if Rackspace went down today or not. It appears that “downtime trouble” follows Twitter wherever they go!

Om Malik says:

February 15, 2008 at 8:29 am

@Adnan,

That is funny. I am betting that Twitter people will not admit their own shortcomings and how badly their system is architected. It is always the hosting company which is to blame.

Janusz says:

February 15, 2008 at 8:45 am

digg here – http://digg.com/hardware/Amazon_S3_World_s_most_reliable_web_service_is_DOWN

Pingback: Web Worker Daily » Archive Amazon S3 Is Down, How It Impacting You? «
Pingback: What happens when the cloud is down? - - mathewingram.com/work
Pingback: The Cloud Ate My Homework | John Paczkowski | Digital Daily | AllThingsD
smoothspan says:

February 15, 2008 at 9:54 am

We’ve gotten so good at reducing adoption friction, that we’ll see a lot of this kind of thing. It just isn’t possible to plan for it.

More on my blog:

http://smoothspan.wordpress.com/2008/02/15/google-reports-iphone-usage-50x-other-handsets-amazon-s3-goes-down-low-friction-has-a-cost/

Best,

BW

manish jain says:

February 15, 2008 at 10:00 am

“…Amazon will do the needful.”

Om, you did not just use that word…needful.

Khürt Williams says:

February 15, 2008 at 10:28 am

I use JungleDisk to back up my iPhoto library to Amazon S3 nightly. No data was lost (on my end), but I did notice that JungleDisk had to back up the entire iPhoto library and not just the new files.

Pingback: Amazon S3 Service Down!! - Technozzle
Shane Schick says:

February 15, 2008 at 10:59 am

I’m not happy this outage happened, but we may be better off for it as an industry. There’s so much hype about the possibilities of the cloud right now that we’re overlooking some of the service-level requirements that it may or may not meet. Amazon could inadvertently become a test case that will be studied by other enterprises who are considering moving their infrastructure over.

Matt says:

February 15, 2008 at 11:01 am

One of our clients’ sites was down for a while due to this outage. It seems to be back up. They did say that other than this, the service has been great. We are working on an upcoming project and are pretty sure we are going to use AWS… We’re definitely going to do more diligence on this and see what the explanation is for it. I look forward to seeing the reason.

Matt

Alan Wilensky says:

February 15, 2008 at 11:38 am

We are only one major outage away from certain marquee clients swearing off sole reliance on SAAS. This happened to a mid-sized automotive auction, a client, that had with my help knit together a network of dealers, contractors, and agents, into a system with a zero install, zero hosting footprint.

UNTIL:

There were four accounts that were mashed up…the usual suspects, and one of them went dark. We did some pinging (here is a good business idea for a bright Web20 person, third party app monitoring and governance) and isolated the guilty party.

In spite of being punked, fingered, whatever, the slackers who ran the service were very rude and unforthcoming. That’s another problem: who are you going to deal with when these hosted services go down? I’m not so sure that, if it had been SalesForce that crapped out, it would have been any better.

Long and short of it: we have a business community that is used to local control, we consultants want to deliver apps as a service – we will need to ally ourselves with the providers of these services to come up with a game plan…but try and get one of the stars to cough up a retainer!

Most of the startup SaaS guys laugh when I propose a contract to consult on packaging and policies for reliability for the SMB end users.

But this is exactly what they should want, guys like me who beat the bushes for them.

Roger Jennings says:

February 15, 2008 at 11:39 am

Amazon’s SLA for S3 is 99.9% uptime during a billing month. That’s 0.723 hours of allowable downtime.

See the “Justin Etheredge Offers Preview of LINQ to [Amazon] SimpleDB” topic of http://oakleafblog.blogspot.com/2008/02/linq-and-entity-framework-posts-for_11.html.

–rj

Nick says:

February 15, 2008 at 11:39 am

Cloud-based storage is getting a lot of heat today, and since it’s web-centric, any amount of downtime is unacceptable. The situation today should not put cloud storage in a bad light; other companies such as Nirvanix have storage nodes around the world with no single point of failure, which helps in avoiding situations like today’s. If you’re relying on a single point for critical data, you’ve got a major problem.

Alan Wilensky says:

February 15, 2008 at 12:20 pm

I also advised the auto auction that they should invest in the VSAT data services that only charge for rent of the equipment, and any fail-over data transmission, but they balked at the cost.

I told them no matter how reliable (and generally, hosted services are more reliable than a mid-sized business’s owned plant), one local loop for data was no way to run a business. They ran their auction, live, cashier functions and all, on SAAS.

Eventually, their link did go down, and it had nothing to do with the SAAS providers. Now, they have bonded SDSL from two carriers that can split when one goes down.

So many ways to fail.

Pingback: Amazon S3 Down for > 2 Hours « Kevin Burton’s NEW FeedBlog
Pingback: Outsourcing infrasturucture - the S3 flipside… at diversity.net.nz
benkepes says:

February 15, 2008 at 3:40 pm

Sure – it’s a bummer when a cloud based storage system fails. In the same way that it’s awful when the power goes out. But claims that this sort of outage will harm the ascendancy of cloud computing are akin to claims that power cuts make more likely a return to gaslights and steam powered manufacturing.

Alexander Sicular says:

February 15, 2008 at 4:31 pm

Wow. Just wow. Everyone out there jumping up and down just needs to relax. Go outside, call your mother, step away from the computer, go to the gym, read a book (and not on Kindle). I’m prompted to write this in light of the recent Blackberry outage. Again, an outage of a few hours gets coverage all over the web and on TV as well. I couldn’t believe the BB outage was covered in depth on CNBC.

Frankly we’re all lucky this stuff even works at all. Go hug your kids or the person to your left.

Scott from popularo says:

February 15, 2008 at 4:39 pm

In our early beta version, we are using some of Amazon’s web services (namely S3 and SimpleDB) – but have been considering using our own storage and database instead. With today’s outage I’m not sure if AWS is a great strategy for us.

We don’t have huge amounts of data to store like some companies (SmugMug comes to mind), so using AWS was mostly for the peace of mind that we would be able to scale quickly after our beta goes public and all of Digg’s users abandon them for us. We have a meeting tomorrow to take a closer look at our strategy for handling lots of new traffic in a short period of time, and I have to say that it doesn’t seem likely that Amazon will be included in the party.

Pingback: Whoa. at snagg dev blog
Pingback: Interesting reaction to Amazon S3 failure today around the web « Accidental Technologist
Pingback: Cloud Computingt goes up in smoke as Amazon S3 Cluster goes down « The Analytics Guru
s3box says:

February 16, 2008 at 3:40 am

Despite the few hours downtime, it’s still one of the best available and reliable web services, to date…

Hens Zimmerman says:

February 16, 2008 at 4:11 am

Hmmm, I have back episodes of my podcast stored on S3, so this is a disservice to potential new subscribers. I hope Amazon fixes this soon!

http://soundsgoodpodcast.com

A.T. says:

February 16, 2008 at 9:41 am

If you ever made an effort to read the slides from SmugMug’s chief Don MacAskill (he removed the PDF from the site, so you can only get it from web.archive.org here http://web.archive.org/web/20070406174427/http://blogs.smugmug.com/don/files/ETech-SmugMug-Amazon-2007.pdf or the same link, shorter: http://tinyurl.com/33t27f ), the deck starts with a nice photo in an Amazon data center after a major fire. It then goes on, here and there, to state that the author’s company does NOT count on Amazon’s 100% reliability and does NOT advise others to do so.

Pingback: Desastres acontecem - Fudeblog by Cesar Cardoso
Don MacAskill says:

February 16, 2008 at 7:13 pm

@A.T.

Actually, if you bothered to read my slides, let alone my blog posts and other coverage, you’d know that:

- That fire wasn’t from Amazon at all and isn’t related.
- I trust Amazon with 100% of our data. More than 90% of our data lives at Amazon and nowhere else.

What I did say is that no service, hardware, or software we’ve ever used is 100% and that Amazon is no different. Depend on it, fine. I do. But expect miracles? That’s just stupid.

Sorry about the slides being missing; that was an accident. They’ve been restored.

العاب says:

February 17, 2008 at 10:08 am

We’ve gotten so good at reducing adoption friction, that we’ll see a lot of this kind of thing. It just isn’t possible to plan for it.

Jeremiah Staes says:

February 17, 2008 at 8:18 pm

It affected me a little bit – one of my subcontractors relies on AWS for file hosting, and so it was a temporary problem for me.

That said, everything goes down. It is incumbent on you to not rely on one service, period. You wouldn’t rely on one spindle of a hard drive; you’d back up. Having multiple options is not only prudent but required, especially when using third parties, as everything will fail at some point and nothing, nothing is going to have 100% uptime, even internal systems you own completely yourself. That’s a very false sense of security.

I like AWS and still would recommend it. Now, if this becomes a habit, then, maybe that might change.

Pingback: Web Services Economics : The Thomas Howe Company
Pingback: C. Janine Hodge » Blog Archive » Amazon S3 Outage: A Gentle Reminder
Pingback: How Cloud & Utility Computing Are Different - GigaOM
Pingback: Amazon’s Services Edge Towards the Enterprise - GigaOM
Pingback: Trackvia - Online Database Blog » Blog Archive » Outsourcing Outages?
Pingback: Amazon S3 goes down for the second time « Just another day in Paradies
Pingback: אלעד בבלוגלי » ארכיון הבלוג » Amazon S3 למטה והשמחה גדולה
Pingback: S3 Outage Highlights Fragility of Web Services - GigaOM
Pingback: Dare Obasanjo aka Carnage4Life - Some Thoughts on Amazon S3's Recent Outage
Pingback: dips.aluscio.us | S3 Outage Highlights Fragility of Web Services [GigaOM]
Pingback: Long live the DBA | Brent Ozar - SQL Server DBA
Pingback: Should cloud computers worry about Salesforce.com’s recent outage? by Thomas Keeley
Pingback: Amazon’s EC2 Service Suffers Outage
