Memory optimisations

Yesterday (or in the midnight hours) I finally committed a patch to MediaWiki’s message cache. Betawiki uses MediaWiki in a way that puts heavy pressure on the message cache. While normal MediaWiki installations have perhaps dozens or a few hundred customisations to MediaWiki interface messages (pages in the MediaWiki namespace), Betawiki has hundreds of thousands of messages in hundreds of languages.

The number of messages that need to be cached is of a completely different order of magnitude. Normally those messages take perhaps a few hundred kilobytes in PHP’s serialised format, stored in the database or in a memory cache. In Betawiki all messages together would take about 23 megabytes! Loading and handling such a big blob is clearly not going to work, especially when it is needed on every page request and has to be updated on every change to the messages.

Some time ago we started to hit the memory limit we have set for PHP requests. I made some hacks to the code to reduce the burden, but those were only hacks. Before this patch, the message cache stored only the customisations used by Betawiki itself, and message cache updates were skipped entirely, so the cache was only refreshed after a timeout.

This was far from an ideal solution. All the other messages were cached individually instead. This is of course a waste of memory, and more importantly, fragmentation increased a lot and requests to the memory cache (we use APC in Betawiki) skyrocketed to thousands per second.

What made me hesitant to commit this patch was that I needed to update code paths we don’t use in Betawiki, so they wouldn’t get much real testing. At the time of writing, it seems to be live on the Wikimedia Foundation servers and has not been reverted or drawn any comments so far, so it probably isn’t totally broken or unacceptable :).

What the patch actually does is add a new configuration option which, when set to true, splits the cache into smaller caches that each contain the messages for one language only. This greatly reduces memory consumption, as only a couple of languages need to be loaded in normal use. A full localisation of MediaWiki and all supported extensions takes from 500 to 800 kilobytes, depending on the script. The default value of the new option is false, which should result in behaviour identical to the old version. I also added more comments and standardised the names of the per-language memory cache keys.
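To make the difference between one big blob and per-language blobs concrete, here is a toy model in Python. Everything in it is hypothetical: the `MessageCache` class, the `split_by_language` flag and the `messages:<lang>` key scheme are illustrative stand-ins, not MediaWiki’s actual PHP code or the real name of the new configuration option, and a plain dict plays the role of APC. It only shows the idea: with the split enabled, reading or updating a message touches a single language’s blob instead of the whole thing.

```python
import json


class MessageCache:
    """Toy model of a message cache on top of a key-value store.

    Hypothetical names throughout; this is NOT MediaWiki's API.
    """

    def __init__(self, store, split_by_language=False):
        self.store = store              # stand-in for APC: dict of key -> serialised str
        self.split = split_by_language  # hypothetical flag mirroring the new option
        self.loaded = {}                # what this "request" has unserialised so far

    def _key(self, lang):
        # One blob per language when split, otherwise a single blob for everything.
        return f"messages:{lang}" if self.split else "messages"

    def rebuild(self, messages):
        """messages: dict of language code -> dict of message key -> text."""
        if self.split:
            for lang, msgs in messages.items():
                self.store[self._key(lang)] = json.dumps(msgs)
        else:
            self.store[self._key(None)] = json.dumps(messages)
        self.loaded = {}

    def get(self, key, lang):
        if self.split:
            if lang not in self.loaded:
                # Only this language's blob is fetched and unserialised.
                blob = self.store.get(self._key(lang), "{}")
                self.loaded[lang] = json.loads(blob)
            return self.loaded[lang].get(key)
        if not self.loaded:
            # Unsplit: every language is unserialised at once.
            self.loaded = json.loads(self.store.get(self._key(None), "{}"))
        return self.loaded.get(lang, {}).get(key)

    def update(self, lang, key, text):
        if self.split:
            # Rewrite only the blob of the language that changed.
            msgs = json.loads(self.store.get(self._key(lang), "{}"))
            msgs[key] = text
            self.store[self._key(lang)] = json.dumps(msgs)
            self.loaded.pop(lang, None)
        else:
            # Rewrite the whole blob for a single changed message.
            everything = json.loads(self.store.get(self._key(None), "{}"))
            everything.setdefault(lang, {})[key] = text
            self.store[self._key(None)] = json.dumps(everything)
            self.loaded = {}


if __name__ == "__main__":
    cache = MessageCache({}, split_by_language=True)
    cache.rebuild({"en": {"edit": "Edit"}, "fi": {"edit": "Muokkaa"}})
    print(cache.get("edit", "fi"))  # Muokkaa
    print(sorted(cache.loaded))     # ['fi']
```

With `split_by_language=False` the same `get` call unserialises every language, which is the behaviour that becomes a problem once the combined blob grows to tens of megabytes.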

This will not solve all memory use problems in Betawiki, but it is a big step towards keeping it running efficiently and with as few hacks as possible. Custom hacks are bad because they add a maintenance burden and prevent others from easily creating a similar setup.

Of course the number of messages will only grow in the future. To tackle this, I plan to move messages not related to MediaWiki itself to another namespace, so that the message cache will not handle them at all.
