16 thoughts on “How Facebook Squeezes More From Its Machines”

  1. “Facebook has more than 30,000 servers, according to some estimates, to which it adds roughly 10,000 new ones every month.”

    Are those numbers correct? Are we saying that FB came this far on 30K servers and is now adding 10K servers every month? Put another way, does that mean FB is adding one-third as many users (or as much user-data processing) as it currently has to its system every month?

  2. “This performance boost initiated an investigation that found the web application to be memory- and CPU-bound.”

    Wow. They needed an investigation to figure this out? Some people have an intuitive sense of this stuff, and some don’t. Often it’s correlated with age, so I recommend Facebook hire some olds.

    Every time I hear/read the FB guys talking about how they manage their data center, I think about how often I sit waiting for someone’s FB photo to load. (And I seem to remember a GigaOm article about how fancy their photo servers were.)

  3. Obviously FB was doing scalability testing before this tool was created or it wouldn’t know that it typically adds 10,000 new servers every 18 months. FB has plenty of real-world workload performance metrics to draw from, and I’m sure it can afford a server or two for its test environment as part of all those server purchase orders. The company should be able to figure out how changes to the site affect workload performance. So, I’m just not sure I understand how FB is using this tool or why they needed to create it in the first place. What does the company do differently now versus before it had the tool? And what was deficient about existing monitoring and capacity planning tools? Certainly FB didn’t need to create a tool to realize that servers with Intel’s new architecture would perform better than existing servers with an older Intel architecture…or that lower-latency RAM would improve the performance of a RAM-intensive workload…

  4. Good article! Did FB replace all of its old servers with new ones? That is a major investment! Using many cheap servers is the Google model, but there must come a time when running a few super computers in place of thousands of desktop-class servers is more efficient. Just imagine the support and maintenance for so many servers!

    1. There is always a pricing sweet spot for compute capacity per unit of cost (net of both opex and capex) at any given point in time. It makes sense to buy servers with this sweet-spot configuration rather than the cheapest servers (which are mostly underpowered) or the latest and greatest hardware. And yes, sometimes it just makes sense to recycle all your existing machines at the end of three years.
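    The sweet-spot argument can be sketched as a toy total-cost-of-ownership calculation. All configurations and dollar figures below are invented for illustration; the only point is that cost per unit of performance, not sticker price, decides the purchase:

```python
# Toy "sweet spot" comparison: rank hypothetical server configs by
# 3-year TCO (capex + 3 years of opex) per unit of relative performance.
# All numbers are made up for illustration.

LIFETIME_YEARS = 3

configs = [
    # (name, capex $, annual opex $, relative performance)
    ("budget",   2000, 600, 1.0),   # cheapest option, underpowered
    ("midrange", 3500, 700, 2.2),   # the likely sweet spot
    ("flagship", 7000, 900, 2.8),   # latest and greatest, premium price
]

def cost_per_perf(capex, annual_opex, perf):
    """Lifetime TCO divided by relative performance (lower is better)."""
    return (capex + LIFETIME_YEARS * annual_opex) / perf

best = min(configs, key=lambda c: cost_per_perf(c[1], c[2], c[3]))

for name, capex, opex, perf in configs:
    print(f"{name:9s} ${cost_per_perf(capex, opex, perf):,.0f} per perf unit")
print("sweet spot:", best[0])
```

    With these invented numbers the cheapest box is actually the worst deal per unit of work, and the flagship's price premium outruns its performance gain, which is exactly why the middle configuration wins.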

  5. I’m not sure why this paper was released; it’s much more of an internal thing. Facebook did an investigation to profile their architecture, and they found some things they probably already predicted. However, they did do it at a very fine granularity, which is interesting, and they propose that the typical benchmarks may not be great indicators. Alas, their paper only mentions benchmarks casually, without delving into the disparity between benchmark and actual performance; a little more analysis was needed. Their final point is a good one: profiling is a good strategy.
