Monthly Archives: November 2012
Performance tuning translatewiki.net
One of the biggest advantage of desktop translation tools is that they don’t have delays rendering the interface – at least not in such a scale as websites have. In translatewiki.net it is crucial that our pages load very fast. In certain places we can and do use intelligent preloading to remove the delays, in other places we have to employ complex caching algorithms to reach that target. I am regularly monitoring the automatically collected profiling information to avoid regressions and to pick low-hanging fruit from time to time.
In the last sprint my main task was to convert the way we handle the translation of MediaWiki extensions in translatewiki.net to use the same processes and interfaces as pretty much everything else. MediaWiki and MediaWiki extensions were the first things supported in translatewiki.net and now they are among the last things to get modernized to take advantage of better interfaces built on the years of experience supporting various kinds of products.
The only user visible change is improved performance. The new interfaces are more efficient and enable more optimizations, which allows us to deliver faster page views and scale to more messages. It will also simplify the work of translatewiki.net staff, as they don’t need to follow two different processes, especially after we update also MediaWiki translation code.
Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.
As a developer I’m proud that the new code is unit tested. The culmination, however, was a change which removed hundreds of lines of old code: in fact, the above quote applies to software development too.
For those interested in details, the biggest performance boosts were achieved by avoiding the need to parse the translation files in many places – the list of message keys and their values are stored in intermediate cache files in CDB format. In addition there were many smaller performance optimizations, like not using some MediaWiki method to construct a link element, which consumed 20 kilobytes of memory for each link. When there are thousands of links, it adds up and is excessive for just making some hundred bytes of output. I switched it to a more low level method (memory usage: from 175 to 12 MB).
At the time of writing I still have some more fixes pending further testing and cleanup. For example, to access any message group, those all have to be loaded. They are cached as serialized PHP objects, but loading them takes 20 milliseconds and 10 megabytes of memory. I’m working on making it possible to load cached message groups individually.
The website anyone can translate
Translatewiki.net has started using Puppet. Puppet is a tool designed to manage the configuration of servers. Like Wikimedia’s, our configuration is public and stored in the translatewiki.net git repository, where anyone can submit patches. I don’t expect a flood of them coming in anytime soon, my motivations for this were different. If you remember, some months back I had to learn some Puppet to write the Solr configuration for Wikimedia deployment. Now I wanted to learn more and gather more experience on using Puppet. It will also greatly help if we ever need to reinstall the translatewiki.net server from scratch (which is quite likely to happen soon). As a bonus it gives transparency and something I can refer people to when they ask how some particular thing is done in translatewiki.net. As time permits, I will be moving more configuration to Puppet.
Mitä isot edellä, sitä pienet perässä. (Internet suggest the closest translation is Monkey see, monkey do.)
I also added the translatewiki.net repository to Ohloh. If you use translatewiki.net as localisation platform, feel free to add it to your stacks by clicking “I use this”, or to embed its widgets in your website. Ohloh also gives some cool stats:
In a Nutshell, translatewiki.net…
- …has had 739 commits made by 20 contributors representing 3,288 lines of code
- …has a young, but established codebase maintained by a large development team with stable year-over-year commits
Together with the introduction of Puppet, I also switched the webserver of translatewiki.net from lighttpd to nginx. The biggest reason for this is that https was broken for Google Chrome users, but in general nginx feels faster and more robust and the way PHP is used with it is much simpler (php-fpm instead of spawn-cgi). The Wikimedia operations team is supposedly going to test nginx soon, so we will see whether the tide also goes that way.