Facebook engineers have come up with a way to turbo-charge PHP, a programming language preferred by web developers. The Palo Alto, Calif.-based social networking and identity provider is now open sourcing the technology, called HipHop for PHP, a source code transformer that programmatically converts PHP into highly optimized C++ and then compiles it with g++. HipHop for PHP was developed internally to boost the performance of Facebook applications while also lowering hardware costs. From the Facebook blog:
With HipHop we’ve been able to reduce the CPU usage on our web servers on average by about fifty percent depending on the page. Less CPU means fewer servers. This project is incredible, has had a tremendous impact on Facebook and we are releasing it as open source in hope that it brings a new focus toward scaling large complex websites with PHP.
HipHop for PHP is the latest in a long list of products that were developed internally by Facebook and have been open sourced to the world. PHP, aka Hypertext Preprocessor, is a scripting language much like Perl, Python and Ruby that was created by Rasmus Lerdorf in 1995. In terms of CPU and memory demands, scripting languages are less efficient than compiled languages such as C++. PHP is currently used by large, popular, dynamic web sites such as Facebook, WordPress.com and Digg. It was viewed as one of the hottest new technologies in 2005, when venture capitalist Marc Andreessen, co-founder of Netscape and current Facebook board member, told the Wall Street Journal that “PHP is to 2005 what Java was to 1995.”
Of course, since then the web has scaled many times and as a result some severe shortcomings in PHP have been exposed. Many folks have subsequently looked for alternatives, opting for languages that have cleaner syntax, a more vibrant community and perhaps even better frameworks. Well-known PHP programmer Terry Chay in a blog post defending PHP recently noted:
Obviously, I think PHP is very frequently the right choice. The reason I choose PHP is that it is a web-based templating language that is simple, scalable, and pragmatic. Choices have consequences. Everyone knows what consequences are. If not, there’d be a One Language to rule them all. And, we’re not Java developers. 😉 One consequence of PHP is that it is now stuck between a rock and a hard place.
On one end, the ubiquity of rich, Ajax-driven, web sites means the inherent advantage of a templating language is no longer there, having been replaced by a much larger demand for design….On the other end, social networks have sped up demands of data, so that they live more in RAM in the form of memcached than on disk in the form of a relational database. When making a web page was tied to a disk-bound database, performance discussions are pushed to database performance discussions, which really is a discussion of disk performance. …web performance is now a complicated beast.
Facebook, which bet on PHP early on, has faced that quandary for a long time. It had no option but to innovate around PHP and its resource-hogging habits. PHP can be so resource-intensive because it’s interpreted on the fly.
You can make it perform better by using tricks such as output caching, recycling the compiled code that’s generated at runtime (aka opcode caching) and writing extensions that are essentially bits compiled as C++. However, all these techniques have their own set of issues. Given that I am no programmer, here is my understanding of what Facebook has done: It came up with a way to analyze the PHP code and convert it to optimized C++ code, which is in turn compiled to very speedy machine-specific code. HipHop benefits from the maturity of g++, a C++ compiler.
Scaling Facebook is particularly challenging because almost every page view is a logged-in user with a customized experience. When you view your home page we need to look up all of your friends, query their most relevant updates (from a custom service we’ve built called Multifeed), filter the results based on your privacy settings, then fill out the stories with comments, photos, likes, and all the rich data that people love about Facebook. All of this in just under a second. HipHop allows us to write the logic that does the final page assembly in PHP and iterate it quickly while relying on custom back-end services in C++, Erlang, Java, or Python to service the News Feed, search, Chat, and other core parts of the site.
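The page-assembly steps described above can be sketched roughly as follows. This is an illustrative Python outline, not Facebook's code: the data shapes, the `assemble_home_page` function and the privacy rule are all hypothetical, and in-memory dictionaries stand in for back-end services such as Multifeed.

```python
# Hypothetical sketch of the steps in the quote: look up friends, fetch their
# updates, filter by privacy, then enrich the stories. None of these names or
# data shapes come from Facebook; dictionaries stand in for back-end services.

FRIENDS = {"alice": ["bob", "carol"]}

UPDATES = {
    "bob": [{"id": 1, "author": "bob", "text": "Hi", "visible_to": {"alice"}}],
    "carol": [{"id": 2, "author": "carol", "text": "Secret", "visible_to": set()}],
}

COMMENTS = {1: ["Nice!"]}


def assemble_home_page(viewer):
    # 1. Look up the viewer's friends.
    friends = FRIENDS.get(viewer, [])
    # 2. Collect each friend's updates (a real feed service would also rank them).
    candidates = [update for friend in friends for update in UPDATES.get(friend, [])]
    # 3. Keep only the updates the viewer is allowed to see.
    visible = [update for update in candidates if viewer in update["visible_to"]]
    # 4. Fill out each story with its comments.
    return [
        {
            "author": update["author"],
            "text": update["text"],
            "comments": COMMENTS.get(update["id"], []),
        }
        for update in visible
    ]
```

Calling `assemble_home_page("alice")` returns only Bob's story, with its comment attached, because Carol's update is not visible to Alice in this made-up data.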
Earlier this week I spoke with Haiping Zhao, a Facebook senior software engineer; Scott MacVicar, the company’s open-source developer advocate; and David Recordon, a senior programs manager, who explained to me that the company first started working on its scaling problems in 2007 and tried various methods to fix them before coming up with the current solution.
Recordon said that the company wasn’t done optimizing but wanted to open source its code mostly because it wants other people to use it and also help extend it. He was confident that there will be many takers for HipHop for PHP, especially among the enterprises looking to save on their hardware spending. “When you can slash your hardware costs by half, that is significant,” said Recordon.
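The link between CPU usage and server count can be shown with some back-of-the-envelope arithmetic. The sketch below is in Python and every number in it is made up for illustration; it assumes a web tier that is purely CPU-bound, which real deployments rarely are, and the `servers_needed` function is hypothetical.

```python
import math


def servers_needed(peak_requests_per_sec, cpu_ms_per_request,
                   cores_per_server, target_utilization):
    """Estimate the server count for a CPU-bound web tier (illustrative only)."""
    # CPU-seconds of work arriving every second at peak load.
    cpu_demand = peak_requests_per_sec * cpu_ms_per_request / 1000
    # CPU-seconds each server can supply per second at the target utilization.
    usable_per_server = cores_per_server * target_utilization
    return math.ceil(cpu_demand / usable_per_server)


# Hypothetical workload: 10,000 requests/sec, 8-core servers run at 50% load.
before = servers_needed(10_000, 80, 8, 0.5)  # 80 ms of CPU per request -> 200
after = servers_needed(10_000, 40, 8, 0.5)   # CPU per request halved   -> 100
```

Under these assumptions, halving the CPU time per request halves the number of servers. Memory, network and back-end limits can shrink that saving in practice.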
In addition, Facebook’s engineers believe that performance gains should help PHP re-attract developers who might have opted for the more fashionable programming languages such as Ruby and Python. Any switchers would help solve Facebook’s more pressing problem: a desperate need for more and more developers in order to keep growing its web empire.
On-demand compilation of Web scripts is actually something that Microsoft pioneered with ASP.NET. Unfortunately it probably won’t resolve the most common types of Web performance problems — this is the kind of tactic that makes sense for those dozen or so Web companies that are operating at Facebook scale. For the vast majority of sites, adding in-memory cache is the logical next step for perf/scalability.
Yea, this sort of stuff is really for the rare extremely popular application. A lot of people worry about scaling unnecessarily. But this is definitely an amazing piece of open source software that I will use on one of my sites that happens to be on its way to being active enough to need such things.
I see this as something that will be quickly adopted and supported by essentially all the major PHP apps. How many times have you gone out to a site that was hosting WordPress or some such and performance just crawled because it was on a shared server? This is going to address the CPU-intensive side of the shared-server pain point. Unfortunately, it won’t help make your pipes seem bigger…. but you ARE using HTTP compression, aren’t you?
Has anyone besides Facebook used this yet?
Transforming PHP to C++ to make it half decent is a bad joke and waste of time. They wanted speed and scalability? Use a real programming (strongly typed) language instead?
Thanks for the very interesting article. HipHop seems really interesting. We have had to switch to lighttpd and XCache to achieve huge performance benefits for our objectCMS framework, which is now faster than WordPress, Joomla and Drupal. I am wondering how HipHop compares to lighttpd and XCache. Does anyone know when HipHop is going to be released to the public?
I’ve been doing this for almost 2 years already (taking parts of an app and just writing them in C++ and either including them as an extension or just doing a command line call). Got the idea when a crazy client, who makes tons of money, didn’t want to get more servers. Now with Gearman ( gearman.org ) there are even more possibilities with this approach.
This is the best post on a non-developer blog i’ve read about HipHop.
PHP is a never-ending resource based on open source. I think it will still be the best web programming language for the next couple of years.