How I debug performance issues in MediaWiki

The earlier post does not describe how I usually do performance improvements. Usually it starts with debugging the less innocent-looking messages by our IRC bot rakkaus, which relays PHP error messages to the IRC channel. An example:

[01-Nov-2012 20:16:25 UTC] PHP Fatal error: Maximum execution time of 30 seconds exceeded in /www/translatewiki.net/w/extensions/Translate/ttmserver/TTMServer.php on line 100

After this I have to use the timestamp to match our webserver access log and try if I can reproduce the issue by loading the same url. PHP is very unhelpful in this regard: fatal errors don’t give the request url nor stacktrace. Sometimes it is a command line script like the job runner initiated via cron. For those cases I’ve implemented a simple logging of all maintenance script executions, but they are still annoying to debug. Once I am able to reproduce the issue on the production environment, I try to reproduce it also on my development environment. Oh boy, it is fun if that is not possible. If I can, however, I will usually start by looking the per-request profiling included in the page source, with output like this:

0.0558 8.5M Connected to database 0 at localhost 0.0562 8.5M Query sandwiki (14) (slave): SELECT /* SqlBagOStuff::getMulti Nike */ keyname,value,exptime FROM `bw_objectcache` WHERE keyname = ‘sw:messages:fi’

Here we see that it takes 56 milliseconds before MediaWiki even connects to the database, and the first thing it does is to load messages for the current user language. What usually follows is old style debugging where I add echo and var_dump statements until I have understood what is happening and what is inefficient. After that, the creative phase begins: finding a way to make it faster. Usually there is some sort of bug in the code that causes it to do unnecessary work. Rarely the bad performance is actually caused by slow algorithms. This kind of makes sense: the datasets we are processing are usually small, and when they are bigger, it is usually written in an efficient way in the first place.

I love performance tuning, but I have to be prudent to pick the right things to optimize, because it is also a great time sink, and as a busy person I am entitled only to few time sinks at a time.

-- .