  1. The mocking tone is not really appropriate in this instance. The DC area suffered a massive storm with 2M without power over five states. 60-80mph winds. Trees down everywhere. I think this is bigger than you imply.

  2. The CTOs and sysadmins for all the companies should be fired Monday morning… Let's not blame the weather but instead blame the fact that these companies allowed a “failure point” in one datacenter to affect their entire chain of service instead of using “high availability” best practices.

    1. I don’t think there is a company on the planet that has invested more in scalability and reliability in the cloud than Netflix. If any of those guys were fired I’d hire them right away. It is easy to sit on a couch and throw stones. Netflix, Instagram, and Pinterest have very complex distributed architectures and none of us should be judging them. They have accomplished amazing things with small staffs. My company has survived every AWS outage since we deployed on the cloud in 2009. I think our choice to stay away from RDS is one of the main reasons why. As much as we like the automation of RDS, it seems to go down with every outage. When the database is down, life is not good. We have had various servers go down but our database never has. That is the main reason why we have never missed a transaction. We are not any smarter than the Netflix guys. We have less complexity (due to much less traffic) and we chose to manage the database ourselves because the risks of RDS outweighed the benefits of its automation.

    2. In many cases it may be a calculated risk: the cost of getting more nines of uptime is easily compared to lost profits, and none of these Instagrams are anything we cannot live without.

    1. They do have backup power. The entire datacenter did not lose power. Some instances lost power, which to me means some of the servers got knocked offline. That is very different from a complete datacenter losing power. Only 2 of our 50+ servers were impacted and we never missed a beat.

