How did Wikipedia complete its transition to the HipHop Virtual Machine, Facebook’s open-source PHP runtime? Brett Simmers of the HHVM team and Wikimedia Foundation principal software engineer Ori Livneh detailed the process in their respective blog posts.
Simmers wrote on the HHVM blog:
I spent four weeks in July and August 2014 working at the Wikimedia Foundation office in San Francisco to help them out with some final migration issues. While the primary goal was to assist in their switch to HHVM, it was also a great opportunity to experience HHVM as our open-source contributors see it. I tried to do most of my work on WMF servers, using HHVM from GitHub rather than our internal repository. In addition to the work I did on HHVM itself, I also gave a talk about what the switch to HHVM means for Wikimedia developers.
Most of my time at Wikimedia was spent working on some fixes for HHVM’s DOMDocument support. MediaWiki, the software powering Wikipedia, allows users to import and export articles as XML files. To do this, it uses a custom PHP stream wrapper, which our implementation of DOMDocument didn’t properly support. We had a number of outstanding pull requests in this area, including a large one from WMF developer Tim Starling. The main problem blocking its merge was a potential security issue: The default PHP settings allow XML external entity injection attacks, which we didn’t want to allow in HHVM. I combined the pending pull requests, cleaned them up a little and implemented a protocol whitelist for external entities, keeping HHVM secure by default with the option to open things up if necessary. This is a good example of our general philosophy on PHP compatibility: We try to match PHP behavior exactly, unless doing so causes significant security issues or other problems.
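The protocol whitelist Simmers describes is a simple but effective idea: external entity URLs are resolved only when their scheme is explicitly allowed, so an attacker-supplied `file://` entity cannot read local files. The actual fix lives in HHVM's C++ DOMDocument implementation; the sketch below is a hypothetical Python illustration of the same check (the `ALLOWED_SCHEMES` set and `entity_allowed` function are my own names, not from HHVM):

```python
from urllib.parse import urlparse

# Assumed whitelist for illustration: only remote HTTP(S) entities may be
# fetched; everything else (file://, php://, etc.) is blocked by default.
ALLOWED_SCHEMES = {"http", "https"}

def entity_allowed(url: str, allowed=frozenset(ALLOWED_SCHEMES)) -> bool:
    """Return True only if the external entity's URL scheme is whitelisted."""
    scheme = urlparse(url).scheme.lower()
    return scheme in allowed

# Secure by default: local file access via an external entity is rejected,
# while an explicitly allowed remote DTD would pass.
print(entity_allowed("file:///etc/passwd"))              # False
print(entity_allowed("https://example.org/schema.dtd"))  # True
```

The "secure by default, with the option to open things up" philosophy maps to making the whitelist configurable rather than hard-coded.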
Having free access to all of the knowledge contained in Wikipedia is an incredible resource for hundreds of millions of people around the world. Everyone on the HHVM team is excited to see our hard work helping out such an important project, and we’re optimistic about seeing even more performance gains in the future. The success of this transition is also a great indicator of the growing strength of the open-source community surrounding HHVM; Wikimedia developers Tim Starling, Ori Livneh, Erik Bernhardson and Chad Horohoe all contributed a lot of work back to HHVM during the migration process.
And Livneh added on the Wikimedia blog:
The creator of the wiki, Ward Cunningham, wanted to make it fast and easy to edit Web pages. Cunningham named his software after a Hawaiian word for “quick.” That’s why the Wikimedia Foundation is happy to report that editing Wikipedia is now twice as quick. Over the last six months, we deployed a new technology that speeds up MediaWiki, Wikipedia’s underlying PHP-based code. HipHop Virtual Machine, or HHVM, reduces the median page-saving time for editors from about 7.5 seconds to 2.5 seconds, and the mean page-saving time from about 6 to 3 seconds.
Nearly half a billion people access Wikipedia and the other Wikimedia sites every month. Yet the Wikimedia Foundation has only a fraction of the budget and staff of other top websites. So how does it serve users at a reasonable speed? Since the early years of Wikipedia, the answer has been aggressive caching — placing static copies of Wikipedia pages on more than 100 different servers, each capable of serving thousands of readers per second, often located in a data center closer to the user.
Following extensive preparations, all of our application servers are now running MediaWiki on HHVM instead of the standard Zend PHP runtime. The performance gains we observed during this gradual conversion in November and December 2014 include the following:
- The CPU (central processing unit) load on our app servers has dropped drastically, from about 50 percent to 10 percent. Technical operations team member Giuseppe Lavagetto reports that we have already been able to slash our planned purchases for new MediaWiki application servers substantially, compared to what would have been necessary without HHVM.
- The mean page save time has been reduced from ~6s to ~3s (this is the time from the user hitting “submit” or “preview” until the server has completed processing the edit and begins to send the updated page contents to the user). Given that Wikimedia projects saw more than 100 million edits in 2014, this means saving a decade’s worth of latency every year from now on.
- The median page save time fell from ~7.5s to ~2.5s.
- The average page load time for logged-in users (i.e. the time it takes to generate a Wikipedia article for viewing) dropped from about 1.3s to 0.9s.
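The "decade's worth of latency" claim above is easy to verify with back-of-envelope arithmetic, using the figures the post itself gives (100 million edits per year, mean save time cut from ~6s to ~3s):

```python
# Back-of-envelope check of the latency savings claimed in the post.
edits_per_year = 100_000_000        # >100 million edits across Wikimedia in 2014
seconds_saved_per_edit = 6 - 3      # mean page save time dropped from ~6s to ~3s

total_seconds_saved = edits_per_year * seconds_saved_per_edit
years_saved = total_seconds_saved / (365 * 24 * 3600)

print(round(years_saved, 1))  # ~9.5 years of user-facing wait time per year
```

At roughly 9.5 years of aggregate waiting eliminated annually, "a decade's worth of latency every year" holds up.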
The HHVM team has already incorporated some MediaWiki parsing tasks into its performance regression tests, meaning that upstream changes that would degrade MediaWiki’s performance can be detected and prevented before a new HHVM version is even released and deployed by us. This was not the case with the Zend PHP runtime we used before.
More generally, we are confident that we have found a strong, capable upstream partner in Facebook, one that is sensitive to our needs as reusers and committed to maintaining and improving HHVM for years to come. In turn, we have started contributing back upstream. The most substantial contribution by a Wikimedia developer is a major rewrite of the Zend Compatibility Layer, which allows Zend PHP extensions to be used with HHVM. These improvements by WMF Lead Platform Architect Tim Starling were highlighted in the release notes for HHVM 3.1.0.
With HHVM, we have the capacity to serve dynamic content to a far greater subset of visitors than before. In the big picture, this may allow us to further dissolve the invisible distinction between passive and active Wikipedia users, enabling better implementation of customization features such as a list of favorite articles, or various “microcontribution” features that draw formerly passive readers into participating.
Readers: How often do you use Wikipedia?