

translatewiki.net celebrates – so do I

Oh boy, time flies. Translatewiki.net turns six next Saturday. This is the first time we are celebrating its birthday. How did that happen?

It was 2005, my last year at upper secondary school, when I set up a MediaWiki for myself to do some school work. I was 17, and in the fall of the same year I started studying at a university. Can you imagine how awkward it was to attend university under the age of majority (18 years in Finland)? Anyway, I think the wiki was originally called Nukawiki, then Betawiki and finally translatewiki.net. The wiki has gone through many updates. It probably started with MediaWiki 1.4, whose release notes boast that the user interface language can be changed by the user. It has also gone through many computers, starting from my laptop and moving gradually to more powerful, more dedicated servers.

Already before the summer of 2006, when I started my obligatory military service, which lasted six months, I was using the wiki to translate MediaWiki into Finnish and to fix i18n problems. In 2006 we started inviting other translators to join. In February 2007 I started translating FreeCol into Finnish, and soon they moved all their translation-related activities into our wiki. One of the initial translators was Siebrand, who has had an enormous influence on the direction the project has taken since he joined.

In other words, translatewiki.net started as a small hobby project for an entirely different purpose; then I used it to scratch a personal itch, and nowadays it is a thriving community with thousands of members. We are already huge by many metrics, we are still growing, and there doesn't seem to be any limit to our size. I just cannot imagine how many people the work of translatewiki.net has affected. For me this is an opportunity, but more importantly a challenge. How do we improve our service while scaling up? How can we provide better tools for translators, for ourselves and for the projects that use us? We have been successful thus far because we have been very efficient – it is almost scary how few people (albeit very dedicated ones) can keep everything running smoothly.

Translatewiki.net has had, and still has, a huge impact on my life, and not just because it is a huge time sink for me. It is a manifestation of the many skills I have learned during my life. It feels wrong to say that it is my hobby, because sometimes it feels that studying is the hobby here. Nevertheless, my master's thesis is nearing completion. I already have a job in mind, and I can't say that translatewiki.net didn't affect that.

I'm sincerely grateful to each and every one of you who has helped translatewiki.net become what it is today.

Translatewiki.net is happy

Many of the issues that have been annoying us all at translatewiki.net have been fixed lately. To show my appreciation on behalf of translatewiki.net, I'd like to highlight these fixes.

Issue one: saving messages on talk pages failed. If you pressed “Save” just once, you got an error message about broken session data. The reply was saved only if you clicked “Save” a second time. I don't know how many messages we lost due to this. I lost a couple, because after replying to a thread I went off to do other things. Many of my replies were delayed, because I didn't notice immediately that the save had failed. What was worse, usually one had to scroll down the page to even see the error message! I'm very happy that it is fixed now. Many thanks to Andrew Garrett!

Issue two: portions of changes were not shown at all when viewing differences between two versions. Not as annoying as the first item, but this was still nasty and confusing. I submitted a test case for this bug in the wikidiff2 extension, and fortunately Tim Starling was able to reproduce it. Soon after, he committed a fix. Thanks Tim!

Issue three: message groups for projects which store all translations in a single file (like Pywikipediabot) were stuck in “has changes” status. This bug only annoyed the project leaders of translatewiki.net. After some encouragement Robert Leverington came up with a fix, and in the process found a serious bug in the code which determines whether there have been any changes to the messages. The fix affects all message groups. To Robert: good catch and big thanks.

Issue four: Microsoft® Translator – one of the translation services we use to suggest translations to our translators, next to Google Translate, Apertium and our own tmserver – is often incorrectly identified as being down. Brian Wolff and Sam Reed have helped to investigate the issue, but it is not yet fully fixed.

Finally, many thanks to those who help us keep translatewiki.net running from day to day – you are many. A special thanks goes out to netcup.de – Webhosting, vServer, Servermanagement, who have provided us with their flagship product “vCloud 8000”, which allows us to serve our pages faster than ever before. We need lots of help with challenges that range from coding to writing and design. Don't hesitate to ask us how you could help!

Translation engines: black boxes

One would hope that using a machine translation system would be as easy as feeding in some text and a pair of languages and getting something out. But at least here at translatewiki.net things are pretty complex under the hood.

First of all, these translation engines are external systems based on huge corpora of translated texts and statistical methods. Translations are queried through HTTP requests. The Translate extension implements an algorithm which keeps track of failures and disables the whole service for some period. Failures can be error messages, timeouts or even failures to establish a connection. For example, translatewiki.net recently moved to a new server whose DNS resolution is a bit unstable and still needs to be fixed.

Disabling serves multiple purposes. First of all, if the service is temporarily down, we waste neither our time nor theirs by trying. Secondly, if we hit some kind of rate limit (we shouldn't), we can back off for a while.
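The track-failures-then-disable behaviour can be sketched as a simple circuit breaker. This is only an illustration of the idea: the class name, thresholds and cooldown length are made up, not the Translate extension's actual code.

```python
import time

class ServiceBreaker:
    """Disable an external translation service after repeated failures.

    A sketch only: the default thresholds are illustrative values.
    """

    def __init__(self, max_failures=5, cooldown=3600):
        self.max_failures = max_failures  # failures tolerated before disabling
        self.cooldown = cooldown          # seconds to keep the service disabled
        self.failures = 0
        self.disabled_until = 0.0

    def available(self):
        # The service is usable unless it is inside its cooldown period.
        return time.time() >= self.disabled_until

    def report_failure(self):
        # Error messages, timeouts and connection failures all count equally here.
        self.failures += 1
        if self.failures >= self.max_failures:
            self.disabled_until = time.time() + self.cooldown
            self.failures = 0

    def report_success(self):
        self.failures = 0
```

Before each request the caller checks `available()`, and afterwards reports success or failure, so a flaky service stops costing us time.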

Then there is an issue with the contents – the engines like to mangle all the things they don't understand. In interface translation, with its many special characters and expressions, this is annoying. I recently made some improvements here based on a suggestion from Jeroen De Dauw. The most common special syntaxes are now armored against changes. This includes variables like $1, %s or %foo% and some other things. Line breaks used to disappear too, but that was already worked around earlier.
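Armoring boils down to swapping placeholders for opaque tokens before sending the text to the engine, and swapping them back afterwards. A minimal sketch of the technique – the regex, the token format and the function names are my own assumptions, not the actual implementation:

```python
import re

# A simplified, assumed pattern for $1-style, %s-style and %foo%-style variables.
PLACEHOLDER_RE = re.compile(r'\$\d+|%[sd]|%[a-z]+%')

def armor(text):
    """Replace placeholders with opaque tokens before calling the engine."""
    found = []
    def repl(match):
        found.append(match.group(0))
        return '@%d@' % (len(found) - 1)
    return PLACEHOLDER_RE.sub(repl, text), found

def unarmor(text, found):
    """Restore the original placeholders in the translated text."""
    for i, original in enumerate(found):
        text = text.replace('@%d@' % i, original)
    return text
```

For example, `armor('Deleted $1 of %s pages')` would send `Deleted @0@ of @1@ pages` to the engine, and `unarmor` restores the variables in whatever comes back.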

Chatty bots and minimizing disruptions in continuous integration

Those who use IRC are probably familiar with bots. Essentially, a bot is a client which is not a human. This time I'm talking about a specific kind of bot; let's call them reporting bots. Their purpose is to alert the channel about recent happenings in (near) real time. An open source project's channel usually has at least a bot that reports every new commit and every bug report filed.

The translatewiki.net channel #mediawiki-i18n also has reporting bots. We have one CIA bot reporting any i18n-related commit to any of our supported projects. I have to mention that the ability to have our own ruleset for picking and formatting commits is just awesome. There is also another bot, rakkaus (“love” in Finnish).

Its purpose is to report issues with the site. To accomplish this, we pipe the output of error_log, which contains PHP warnings, database errors and MediaWiki exceptions, to the bot. This worked mostly fine, except that the bot would flood everyone whenever the log was growing fast. A few days ago it went too far. We had a database error (a deadlock), which was reported by the bot… including the database query… which happened to contain a few hundred kilobytes of serialized and compressed data – in other words, binary garbage. Guess how happy we were when we saw a channel full of that?

Okay, something had to be done. And so I did. I wrote a short PHP script which:

  • Reads new data every 10 seconds
  • Takes the last line, truncates it to a suitable length, forwards only that snippet, and notes how many lines in the log were skipped

And now everything is nice again :) The script is not yet in SVN, but I will commit it later.
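The original script is PHP and not yet published, so here is only a rough sketch of the same idea in Python. The function names, the polling helper and the truncation length are all my own illustration:

```python
import time

MAX_LEN = 300  # an assumed truncation length, not the script's actual value

def summarize(new_lines, max_len=MAX_LEN):
    """Turn a burst of new log lines into one short IRC-friendly message."""
    if not new_lines:
        return None
    skipped = len(new_lines) - 1
    snippet = new_lines[-1][:max_len]
    if skipped:
        return '%s (%d more lines in the log)' % (snippet, skipped)
    return snippet

def tail_loop(path, report, interval=10):
    """Poll the log every `interval` seconds and report a summary of new data."""
    with open(path) as log:
        log.seek(0, 2)  # start from the current end of the file
        while True:
            message = summarize([line.rstrip('\n') for line in log.readlines()])
            if message:
                report(message)
            time.sleep(interval)
```

The point is that one summary line per polling interval reaches the channel, no matter how fast the log grows.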

By the way, this bot is half of the reason why we might complain to you within a few minutes after you commit code which breaks something in MediaWiki. Fortunately, MediaWiki has taken steps to prevent committing code which doesn't even compile, so we can skip some of the needless mistakes caused by carelessness.

Because we care about the users using translatewiki.net, we want to minimize any disruptions. The measures we have taken are:

  • Even though we update the code often, we can roll back easily. With small updates it is easy to identify the cause of a problem, and chances are it gets fixed very fast too.
  • I personally do code review, trying to spot most issues before they reach us.

The Translate extension for MediaWiki has documentation

The Translate extension for MediaWiki is no longer just a hack for translatewiki.net. Actually, it hasn't been just that for a long time, but recently other projects have started using it. That means lots of things, like supporting stable releases of MediaWiki instead of just development versions.

Today's topic is documentation. I have been amending our existing documentation together with Siebrand. Previously there was only some documentation on how to install the Translate extension. Now we have sections about the page translation feature, the configuration of the extension, message group configuration and the command line scripts. All of these have been collected on our documentation index page, along with links to other resources. One of those other resources is the code documentation generated with Doxygen. That should really help anyone who is interested in developing the Translate extension – yes, we are looking for help!

Naturally, documentation is a moving target and will be improved continuously, like the code itself. While we have documentation for developers and for those who want to install and configure the Translate extension, we still lack good user documentation in many areas. Even though the saying goes that good software does not need separate documentation, that does not mean we shouldn't have any. It is important to show everyone what can be done with the Translate extension, and to get them either interested or using the software (more) efficiently as end users.

GSoC wrap-up – Translate extension

GSoC is almost over now. Lots of cool things have happened, but unfortunately you may not be aware of them, because I have neglected to blog about it. That is definitely a regression compared to last year and something to keep in mind in the future. I managed to do almost all tasks from the project plan with a priority higher than 4, with some rough edges here and there. Next I will pick some highlights from the completed tasks.

Improved usability

This year there were many usability-related tasks to improve the translation workflow. The improvements made by the Wikimedia Usability project nicely complement my work for the benefit of our less technically oriented audience. The most important improvement is probably the buzzword-compatible ajax editing. No longer do translators need to open a new browser tab for each message they want to edit; instead they get a floating dialog inside the current page (implemented using jQuery dialog). This means they never need to leave the list of messages any more – it always stays in the background. It also makes it easier to do quick edits to message documentation or other languages, because you just get a new dialog, and once you have finished editing, you are back at the previous message.

Ajax edit interface


Other features include a user preference for choosing additional languages to show when translating. The feature itself is not new, but now users can customise the list of languages.

Languages can be selected from the dropdown and using the button or typing in the language codes directly.


To date we have not really taken advantage of achievements in language technology. Now we have taken the first steps by implementing a simple translation memory. It is a very simple setup: we use tmserver from the Translate Toolkit and fill it from time to time with existing translations from translatewiki.net. Tmserver uses the well-known Levenshtein algorithm to give suggestions. It isn't very good, nor anything compared to state-of-the-art systems, but the suggestions have already been useful, as the translators themselves have told us. There are many ways to improve the suggestions, from better algorithms to using a larger set of translations as source data and preprocessing that data (text alignment, case and punctuation normalisation). I'm looking forward to them.
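The core of such fuzzy matching can be sketched in a few lines: score each stored source string against the message being translated and keep the ones above a threshold. This toy in-memory version uses Python's `difflib` ratio as a stand-in for Levenshtein distance, and the threshold is an assumed cut-off, not what tmserver actually uses.

```python
from difflib import SequenceMatcher  # stdlib stand-in for Levenshtein scoring

def suggest(source, memory, threshold=0.7):
    """Return (score, source, translation) tuples for similar-enough sources.

    `memory` is a list of (source, translation) pairs.
    """
    matches = []
    for src, translation in memory:
        # Compare case-insensitively; score 1.0 means an exact match.
        score = SequenceMatcher(None, source.lower(), src.lower()).ratio()
        if score >= threshold:
            matches.append((score, src, translation))
    return sorted(matches, reverse=True)
```

A real system would also index the memory instead of scanning it linearly, which is part of why dedicated tools like tmserver exist.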

Other changes

There were many improvements to the less used features. Special features (magic names, special page aliases, namespaces) can now be exported using a script. No more time wasted on copy-pasting. In addition, it is now possible to localise magic words for extensions. It is up to the translation teams to decide whether they want to do this understandably controversial thing.

The message checks at times produced false positives, which caused confusion among translators. Now there is a flexible system to suppress those warnings.

Gettext-style plurals are now supported better, though none of our Gettext projects are using them yet. Relatedly, there is now a special page for importing offline translations. We can now give trusted translators or users the permission to import offline translations, delegating that work away from the server admins. It supports download from a URL, files uploaded to the wiki, and local file uploads.

The offline importer actually uses the same engine that I developed for another feature: web-based message group management. It is now possible for project admins to import external changes, and to fuzzy other changes if necessary, using their browser. It is much easier than doing those steps manually on the command line, but there are still some practical problems to solve. One major piece still missing is integration with version control systems, so command line access is still needed to do svn up or similar for other systems. This is somewhat related to the other problem, which is the limited execution time for web requests. The code is currently wise enough to check after every action whether we are near the limit, and if so to stop further processing and give the user the ability to continue from that point. We can't increase the execution time limitlessly, but there might be hope, for example, in doing multiple requests with ajax to spare the user from clicking the continue button many times.
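The check-after-every-action pattern can be sketched like this; the function name, limit and safety margin are illustrative, and MediaWiki's actual code works against PHP's request time limit rather than an explicit parameter.

```python
import time

def process_batch(actions, time_limit=30.0, margin=5.0):
    """Run actions until execution time gets close to the limit.

    Returns the index to continue from, or None if everything was processed.
    """
    start = time.time()
    for i, action in enumerate(actions):
        # Stop early, with a safety margin, rather than be killed mid-action.
        if time.time() - start > time_limit - margin:
            return i  # the UI can offer a "continue from here" button
        action()
    return None
```

Doing the check between actions, never inside one, means each action either runs completely or not at all, which keeps the continue point well-defined.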

The future

There is always something to be done or something that can be improved. I will focus on improving the new web interface and group management, which is still quite immature. Ajax editing works, but without proper polishing it is still missing the cool factor. And as if that weren't enough, Siebrand has collected a wish list for me. I will try my best to fulfil each request in my limited time, especially now that the study year starts again.

It will be interesting to see where we are next year. We are not alone any more and while other platforms are developing I want to keep translatewiki.net special – to give a face to internationalisation and localisation instead of being just a dumping ground for translations.

GSoC status report – Translate extension

Last year I participated in Summer Code Finland. During that project I added many new features to the Translate extension, to allow the biggest user of the extension, translatewiki.net, to grow bigger. And now translatewiki.net is indeed bigger. This year the project plan contains many tasks which aim to make the experience more pleasant for both the translators and the project admins. In addition, there is a pile of bug fixes and i18n improvements to MediaWiki. I will tell more about those features when I finish them.

The first coding week is now in the past. The big task for that week turned out to be more difficult than estimated. It was about making certain things faster, mostly the generation of translation statistics. The cause of the slowness was fuzzy messages – messages which have a translation, but one that needs updating or reviewing. Information about fuzziness was stored as a text string in the message content itself. Now it is mirrored to another table, where it can be queried without loading the translation and checking for the existence of the fuzzy string. Thanks to everyone who helped with that.

Fortunately, I managed to do some other tasks too. Siebrand is likely to be happy that he can now export the translations of MediaWiki's namespaces, magic words and special page aliases with one command on the command line – instead of using a web browser to request an export of each of those features for each language individually and pasting the results into the translation files. That should save some precious time for better use.

Stay tuned for the next status report! It may take a week or two, as I am planning a little holiday trip to Sottunga in Åland and I don’t expect to be connected very often.

Drawing i18ned text in images

A picture is worth a thousand words, but drawing a word can be harder than one expects.

Usually it is a good idea to avoid text in images, for multiple reasons. Foremost, text in images makes localisation hard: it requires tools, some skill in image manipulation and handwork. Another benefit of avoiding text is that only one copy of the image needs to be stored.

In some cases it is unavoidable to use text in images. In other cases… it is just used for lesser reasons. In this post I will not talk about layout issues, like limited space and inflexibility in image size. In Betawiki we have hundreds of languages, many of which use poorly supported scripts.

The PHP GD library provides two methods of drawing text. imagestring can only draw text in latin-2, so we can forget it immediately. The other one is imagettftext, which since PHP 5.2.0 accepts UTF-8. Great, now we can pass it all the translations we have. The next problem is choosing a suitable font, since imagettftext specifically needs a path to one as a parameter. As we know, no single font covers all scripts, and manually mapping language codes to fonts would require everyone using the code to install exactly those fonts.

The only way to automatically choose a proper font for a language (script) code is fontconfig. I have written a wrapper which calls the command line utilities of fontconfig to fetch the most suitable font. This does not solve the missing-font problem, but if there is a suitable font in the system and fontconfig knows about it, it will be used. And yet there are still problems, like wrong rotation for Japanese.
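The fontconfig lookup boils down to shelling out to its fc-match utility. A minimal sketch of the idea (the original wrapper is PHP, and the function name here is my own):

```python
import subprocess

def find_font(lang):
    """Ask fontconfig (fc-match) for the best font file for a language code.

    For example, find_font('ja') returns the path to the best Japanese font
    that fontconfig knows about.
    """
    result = subprocess.run(
        ['fc-match', '--format=%{file}', ':lang=%s' % lang],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

The returned path can then be passed straight to imagettftext (or any other renderer that wants a font file).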

The big question: is there any better way to do this?

Page translation + documenting = translated documentation???

Not yet, at least. I was sick for a few days, and this week I actually worked mostly on getting page translation working. I also wrote some more documentation, but it is not yet published. Wiki page translation should now work, with some caveats, though it doesn't yet have all the features I wanted. See a very simple example here.

It can now display the available languages and approximately how complete and up to date the translations are. A suitable translation is not yet selected automatically for the user, but at least the user can now see which languages are available and view them, as opposed to the previous version.

This project ends in a week. It has been very nice, and I still hope I can recover a little from the problems encountered in this task. Let's hope the summer doesn't end this week too, even though I have already made my schedule for the next study year at the university.

Status update

This update is somewhat delayed – a bit too much, even in my opinion. There have been some problems with the wiki page translation design I started with, like broken and complicated caching. I'm now trying a different approach, but I've already spent more time on this than the two weeks I had allocated for it. Tomorrow I have an exam, but I've planned to spend the rest of the week trying to get something usable out.

After that I'll move on to the other items: two-way changes and documentation. If I can finish them quickly, I could resume working on wiki page translation if needed. In any case, it looks like I won't have time to work on the optional features.