39 thoughts on “RackSpace Outage Hits Home”

  1. Om, we let you and many others down tonight. Bad luck or not, we failed to deliver what we promise. We also learned a lot about needing to communicate more in real time with customers. We are determined to earn back the trust lost tonight. We hope our customers, including you, give us that chance.

    lew

  2. Data centers are expected to have redundant power sources and backup devices. A truck should not be able to knock out a data center; otherwise it is not designed or planned well enough!

  3. Well, our team was engaging key investors this Sunday/Monday and was also in the middle of our biggest outreach program since launch. And Rackspace let us down on Sunday morning for four hours (no server, email, nothing…emails bounced back! We basically didn’t exist!). And we were not even notified when it happened!

    And then, after 24 hours of me (CEO) explaining the situation to countless people…and assuring them that it was a rare one-off circumstance that would never happen again…IT HAPPENS AGAIN. Our server is still down right now.

    In all seriousness, this could destroy a business. Rackspace’s whole “zero downtime” guarantee has actually been almost 10 hours of downtime in the past 48 hours (not to mention GREAT costs to the credibility and revenues of many businesses out there including my team).

    What corners have they cut with backup systems, generators, etc.!? Truly destructive.

  4. Well, get redundancy in data centers. The problem is that redundancy is nontrivial to implement on both the software side and the interconnection side, and it will cost you. How much is your business worth to you? If you can’t do it properly, or it costs too much to do yourself, host the site with providers that have implemented redundancy for you: Google, Amazon Web Services, et al. As for the backup power supplies, if you don’t test them, chances are the backup isn’t as redundant as you thought it was: batteries die, breakers don’t break, switches fail.

  5. Rackspace has always been, and still is, an extremely hyped service. Scratch the surface at Rackspace and there is no quality. If you have 2 servers then Rackspace is OK; otherwise their so-called support is not worth it. And now this amazing failure!

  6. I’ve worked on and off with Rackspace for almost 7 years and true to their claim I’ve never faced serious downtime issues. They also have sat patiently while addressing problems during server migrations, etc. with my IT staff.

    Personally I couldn’t imagine the embarrassment suffered by a CEO attempting to showcase their online business to investors only to find their server’s gone MIA. However, I also know that Rackspace’s 100% uptime guarantee comes with a solid SLA, one that in times like last week they will make good on.

    I wouldn’t let years of trustworthy service erode so quickly.

  7. Rackspace has shown with these outages that they are a marketing gimmick on steroids. A single truck hitting a power pole and taking out their data center shows their lack of redundancy planning.

  8. You can’t possibly expect 100% uptime for a single location, regardless of the redundancy built into that infrastructure.

    This is why multi-site architectures (failover or active-active) are used by everyone for whom downtime really matters. And it is also why the 100% guarantee for Rackspace is only 100% guaranteed to ensure you will have SLA refunds.

  9. I was also down for 3 hours last night.

    They shut down our servers due to the heat at the datacenter. From what we understand:
    In the second incident at approximately 6:30 PM CST Monday, a vehicle struck and brought down the transformer feeding power to the DFW data center. It immediately disrupted power to the entire data center and our emergency generators kicked in and operated as intended. When we transferred power to our secondary utility power system, the data center’s chilling units were cycled back up. At this time, however, the utility provider shut down power in order to allow emergency rescue teams safe access to the accident victim. This repeated cycling of the chillers resulted in increasing temperatures within the data center. As a precautionary measure we decided to take some customers’ servers offline. These servers are now back up, as are the chillers.

    So it seems the redundant systems worked, power and all, but the chillers failed when they had to cycle them multiple times because of the accident victim.

    Although all of our servers and our image suffered, I can’t say enough good things about Rackspace and what they’ve done for us. I mean, compared with all my experiences with datacenters (esp. The Planet), they handled everything as well as I could ask for. They’ve gone above and beyond with every support request my team and I have had, and they are simply… Fanatical, as much as I can expect them to be.

  10. We have hosted with Rackspace for some years now, and in my experience they have been growing way too fast, so the experienced, high-quality administrators from 2 years back are just not accessible anymore. Instead, the administrators supporting you have very superficial knowledge of the systems they are supposed to manage. In times of trouble, the B-team (as we call them) are not very reliable and in some cases they just panic.

  11. Amazing… we had a site launch yesterday, it went down just an hour after it was launched. (With a really happy client seeing it going down)

    We do have two dedicated servers in there. The funny thing is that both servers ended up with fried hard drives, and Rackspace performed a restore on one of them with a faulty backup file… I mean, it could be safer to launch from my computer at home!!!!

  12. Can I just ask, as a human being, anyone know if the driver is OK? I see all this stress over downtime – but a man is involved in an explosion and I haven’t found one report as to his state of health!

    Unless I’m missing something huge here, it makes me sad people now care more about virtual products than physical people.

  13. Rackspace’s “zero downtime” is a lie, as is their fanatical support that is outright terrible. My neighboer told me to call ntt/verio. I have already taken my services over to them. My advice is to call them – NTT/Verio at 866-341-7867 and ask for Bruno.

  14. I have to honestly say that web site hosting (Rackspace et al.) hardly counts as Internet infrastructure IMHO … That said, the leaves (edges) of services are lined with single points of failure (services that are not redundant). But none of those services could reasonably count as infrastructure IMHO …

  15. I see all this negative hype that RackSpace is getting for a power outage caused by a guy (supposedly) having a heart attack at the time of the crash.

    I start thinking, “How different is this incident from your household electricity shutting off when a lightning storm is in your neighborhood?” Sure, you get upset and immediately call the electric company because the outage disrupted your favorite TV show and you’ve been waiting for this episode for over a month.

    What are you going to do now, cancel service tomorrow and hook up with another electric service? Will that guarantee perfect service in a perfect world? Grow up!! Sh*t happens.

    When you get a flat tire, do you blame the highway department for letting debris get on the roads or do you jump right in and sue the tire manufacturer? Will that fix the flat? Give me a break.

  16. It’s like a butterfly flapping its wings and causing a tornado, only in the online world we can trace it to the event that started the ripple.

  17. I’ve been with Rackspace nearly 6 years and this is a first for me. Even still, I only lost one server (my other is with them in San Antonio) for only about an hour last night. Rackspace was responsive and things were back online reasonably quickly.

  18. The outage occurred in Dallas, not San Antonio (it’s the second sentence in the article.) Rackspace is based out of San Antonio and has data centers all over the place.

  19. Pingback: Web-Tones

  20. Suddenly Leigh Anne wants his 99.999% advertisement removed 🙂
    Anybody who has worked in a datacenter knows that there is only so much you can make redundant without the cost rising.

  21. Matt: When I spoke to my account manager at Rackspace, I asked that very question. From what I understand, the driver is doing fine.

  22. Did anybody actually read Rackspace’s comments on what happened? The truck did not cause the outage. It was the power company.

    http://www.rackspace.com/information/announcements/datacenter.php

    6:30 PM CST Monday, a vehicle struck and brought down the transformer feeding power to the DFW data center. It immediately disrupted power to the entire data center and our emergency generators kicked in and operated as intended. When we transferred power to our secondary utility power system, the data center’s chilling units were cycled back up. At this time, however, the utility provider shut down power in order to allow emergency rescue teams safe access to the accident victim.

  23. Dear Matt/Jim/Bruno,

    I’m truly impressed by your trust in your neighbor’s recommendation and your ability to negotiate a contract and migrate your services over to Verio in less than 24 hours.

    Next time you try to slam a competitor be sure you don’t leave your name and company URL in your signature file. Your post is full of lies (and spelling errors to boot).

    Nice try.

  24. We have hosted with Rackspace for 3 years and they have been fantastic. This outage really hit us hard though. It nuked the boot drive on a RAID array of our database server and our 200 customers were offline for 21 hours. We worked all night and all day to restore the DB environment. It has definitely shaken the trust of some of our customers. This could put a fragile company out of business.

  25. I need to add that placing all your trust in one place is dangerous. No one can predict or defend against every scenario. Rackspace is head and shoulders above any other hosting company I have used but that should never replace thorough and detailed disaster planning and testing. I am using this situation as an incentive to do just that.

  26. I was always a big fan of Rackspace, telling anyone who was happy to listen, and even those who were not ;-))

    But November brought a first loss of that 150% trust, and this week is the end of it. Our server was potentially compromised, so we decided, on the advice of a Rackspace engineer, to rebuild it.

    That took far too long, over a day, and then we realised that we had no backups after the 16th of January, so 5 days with no backup, and yes, we do have Managed Backup with Rackspace.

    So now I am left with a partially restored server. Emails are back online, but we have lost 7 days of them, which is significant.
    On top of that, we have lost one very precious directory of reference data, with no other backup or copy because it was confidential and was supposedly backed up.

    My question to Rackspace is: how can a failing Managed Backup go unnoticed for 5 days? I have told them this, of course. How come there is no alert defined for when the volume backed up is suddenly less than 50% of the normal volume?

    All this to say that I am actively looking for another host for a dedicated server, as my level of trust has hit bottom.

    I have actively defended Rackspace to our board of directors, but this time I can’t see what excuse I can find for this.

    Sorry guys at Rackspace, but not good enough ;-(

    Pascal

    PS: I am not working for any competitor of Rackspace and I do not have any friends, family, or acquaintances at any competitor or company related to a competitor. I say that in case someone thinks that might be the case.
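The kind of alert asked about in comment 26 (flagging a backup whose volume suddenly drops below 50% of normal) could be sketched roughly as below. This is only an illustration and not Rackspace's Managed Backup tooling: the function name `backup_volume_alert`, the use of the median of recent backup sizes as the "normal" volume, and the sample sizes are all assumptions made for the example.

```python
from statistics import median


def backup_volume_alert(history_sizes, latest_size, threshold=0.5):
    """Return True when the latest backup looks suspiciously small.

    history_sizes: sizes of recent successful backups (any consistent unit).
    latest_size:   size of the backup that just ran (0 if nothing was written).
    threshold:     fraction of the normal volume below which we alert
                   (0.5 mirrors the "less than 50%" rule from the comment).
    """
    if not history_sizes:
        # No baseline yet, so there is nothing to compare against.
        return False
    # Hypothetical choice: treat the median of past sizes as "normal",
    # so a single odd night does not skew the baseline.
    normal_volume = median(history_sizes)
    return latest_size < threshold * normal_volume


# Made-up nightly backup sizes in GB.
recent = [40, 41, 39, 42]
print(backup_volume_alert(recent, 38))  # ordinary night
print(backup_volume_alert(recent, 12))  # sudden drop, should alert
print(backup_volume_alert(recent, 0))   # backup silently wrote nothing
```

A check like this would run after every backup job, so a job that silently stops writing data is noticed the next morning rather than five days later.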

  27. Oh, I forgot to mention that although I have a 2-week retention managed backup, it seems that 2 weeks in the Rackspace time dimension is only 10 days, as they cannot find any backup prior to the 14th. Considering that yesterday was the 23rd, I guess 14 days back would mean what? The 9th of January.
    So another mystery and another reason for me to worry and get a few more grey hairs, which suit me, but still …

    I will keep you posted about progress

    Pascal
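The retention arithmetic in comment 27 can be checked with a few lines. This is a minimal sketch: the comment gives only the day and month, so the year 2008 is an assumption, and `oldest_expected_backup` is a hypothetical helper name.

```python
from datetime import date, timedelta


def oldest_expected_backup(reference_day, retention_days=14):
    """Earliest date that should still be restorable under the retention policy."""
    return reference_day - timedelta(days=retention_days)


# "Yesterday" in the comment was the 23rd of January; the year is assumed.
reference_day = date(2008, 1, 23)
oldest_found = date(2008, 1, 14)  # earliest backup Rackspace could locate

expected = oldest_expected_backup(reference_day)
missing_days = (oldest_found - expected).days

print(expected)      # earliest backup a 14-day policy should still hold
print(missing_days)  # days of retention that appear to be missing
```

Under those assumptions the policy should reach back to the 9th of January, which leaves five days of retention unaccounted for.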

  28. Seems that rackspace made a mistake and it was a costly one. If they have been a good company for so long though I think closing the book on them may not necessarily be the best solution. Mistakes happen I guess is all I am saying. Thanks for posting this article!

  29. We have had several issues with Rackspace, including this outage. It was pretty much the last straw for us. We moved to Server Intellect and never looked back. I can and will always understand outages that are BEYOND the control of the host, but when you’re actively paying for backup services only to find out when disaster strikes that you have no backups, that is a problem.
