Category Archives: translatewiki.net

New UIs in MediaWiki Translate extension

I’m not a designer. Yet, I am a designer. During the many years of development of the Translate extension, I have done just about everything related to the development of a software project: coding, translating, documenting, testing, system administration, marketing and user interface (UI) design among them. My UI design skills are limited to personal interest and one university course. But I try to pay attention to the UIs I create, and I listen to feedback.

For once we got some good feedback about the issues in the current UIs and some suggestions about how to improve them. Based on this feedback I have made two significant changes to Special:Translate – the main translation interface of the Translate extension. The first significant change is to split the page into a few different tasks: translating, proofreading, statistics and export. I implemented these as tabs. Typically the user starts from language statistics and selects the project they want to translate or proofread. This has the following benefits:

  • The tasks are clearly separated: users can see at a glance what can be done with the interface.
  • Switching between tasks is seamless: previously there was no easy way to go back to language statistics from translating or proofreading.
  • There are fewer visible options at a time: the UI just looks nicer and takes less space.

The second change is an embedded translation editor. This feature is still in the beta phase, and if we get enough positive feedback about it, we will switch over from the old popup-based editor. You can test the editor by going to Special:Translate and double-clicking the text you want to translate. This should prevent the hassle of moving and resizing dialogs. On the other hand, it has some problems: the editor moves on the screen when you advance to the next message, and it stands out less in the middle of the surrounding context. I’m investigating if and how we can mitigate these issues. I’ve already changed some styling to make the editor stand out more and the whole table appear less heavy. As a bonus the embedded editor feels faster, because I’ve added some preloading. This means that when you save your translation and go to the next message, it will show up instantly because it has already been loaded.

New translation memories near you soon

In the last sprint I developed a translation memory server in PHP almost from scratch. Well, it’s not really a server. It runs inside MediaWiki during client requests. It closely follows the logic of tmserver from Translate Toolkit, which uses Python and SQLite.

The logic of how it works is pretty simple: you store all definitions and translations in a database. Then you can query suggestions for a certain text. We use string length and fulltext search to filter the initial list of candidate messages down. After that we use a text similarity algorithm to rank the suggestions and do the final filtering. The logic is explained in more detail in the Translate extension help.
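The lookup described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the extension's code: the function names, the thresholds and the sample data are made up, the "at least one shared word" test stands in for a real fulltext search, and `difflib.SequenceMatcher` stands in for the actual similarity algorithm.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; not the extension's algorithm."""
    return SequenceMatcher(None, a, b).ratio()


def suggest(memory, text, min_score=0.75, max_length_ratio=1.5):
    """Return (score, source, translation) tuples, best match first."""
    shortest = len(text) / max_length_ratio
    longest = len(text) * max_length_ratio
    words = set(text.lower().split())

    # Cheap filters first: similar length and at least one shared word
    # (a crude replacement for a database fulltext search).
    candidates = [
        (source, translation)
        for source, translation in memory
        if shortest <= len(source) <= longest
        and words & set(source.lower().split())
    ]

    # Expensive step last: score the survivors and drop weak matches.
    scored = [(similarity(text, s), s, t) for s, t in candidates]
    return sorted((row for row in scored if row[0] >= min_score), reverse=True)


# Made-up memory contents: (definition, translation) pairs.
memory = [
    ("Save page", "Tallenna sivu"),
    ("Save the page", "Tallenna sivu"),
    ("Delete page", "Poista sivu"),
    ("This is a much longer message about something else entirely", "..."),
]
suggestions = suggest(memory, "Save page")
```

The point of the ordering is that the cheap filters shrink the candidate list before the comparatively slow pairwise comparison runs.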

PHP provides a text matching function, but we (Santhosh) had to implement a pure PHP fallback for strings longer than 255 bytes or strings containing anything other than ASCII. The pure PHP version is much slower, although that is offset a little because it compares characters rather than bytes, so it has less work to do when a string has fewer characters than bytes. But more importantly, it works correctly even when not handling English text. The faster implementation is used when possible. Both functions implement the Levenshtein edit distance algorithm. Before we did some optimizations to the matching process, it was the slowest part. After those optimizations the time is now bound by database access.
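To see why a byte-based comparison is a problem for non-ASCII text, here is a small Python sketch of the Levenshtein edit distance. It is a textbook implementation written for illustration, not the PHP code from the extension; it accepts any sequence, so the same function can compare characters or UTF-8 bytes.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (str, bytes or lists)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


# Two letters differ, so the character-level distance is 2.
by_characters = levenshtein("sää", "saa")

# Each "ä" is two bytes in UTF-8, so the byte-level distance is inflated to 4.
by_bytes = levenshtein("sää".encode("utf-8"), "saa".encode("utf-8"))
```

Comparing characters gives the distance a translator would expect, and it also means fewer items to compare whenever a string has multi-byte characters.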

End users won’t see much difference. Wanting a translation memory on Wikimedia wikis was the original reason for reimplementing translation memory in PHP, and in the coming sprints we are going to enable it on wikis where Translate is enabled (meta-wiki, mediawiki.org, incubator and wikimania2012 currently). It is just over 300 lines of code [1] including comments and in addition there are database table definitions [2].

Now, having explained what was done and why, I can reveal the cool stuff, if you are still reading. There will also be a MediaWiki API module that allows querying the translation memory. There is a simple switch in the configuration to choose whether the memory is public or private. In the future this will allow querying translation memories from other sites, too.

Putting that other pair of eyes to good use

This blog post is about the MediaWiki Translate extension and explains how we got to develop a new set of translation review tools.

One of the core principles at translatewiki.net is that the time of translators is a precious resource. We show our appreciation to translators by providing tools that let them concentrate 100% on the task at hand and let the (volunteer) staff handle the boring tasks.

It is well known that good translators take pride in their own and others’ work. This may result in an urge to review all translations made by other translators. I consider myself to be that kind of translator. The good news is that in recent months the Translate extension has got massively better at supporting the reviewing of translations. Some weeks ago we added a new listing where you can click a button to accept a translation. When the list is empty, you know that all translations have either been made or fixed by you, or you have accepted someone else’s translations.

This is all nice and dandy, but if you want to review new translations as they come in it is not practical. You’d either have to watch the list of recent translations or subscribe to the feed of them. From here you can get to the individual messages, but it takes many clicks to get to the page where you see the button to accept the translation. And iterating over each of the hundreds of message groups to see if there is anything to accept is not practical either.

The solution: a special message group which lists the recent translations in a given language. Since only some of the translators are allowed to review, I have included a screenshot of what it looks like. One could bookmark this page and have a look at it a few times per week. For me this is a real time saver, and I’m sure others will find it useful too.

To get this implemented, I originally anticipated that some heavy refactoring was needed and I estimated about one and a half days for it. In the end it took only about half a day – I was pleasantly surprised at how painless the refactoring was. The problem was that the class which fetches all the messages from the database assumed they all belong in the same MediaWiki namespace. In translatewiki.net we have over ten namespaces for translations of different projects, so it had to be fixed. I’d say this is a prime example of the saying “Premature optimization is the root of all evil” by Donald Knuth.

In the future we need to link this page from suitable places to make this feature discoverable and also to make sure that more than the current 66 users out of 3000+ translators get the right to use this feature.

MediaWiki grows up – no more playing with Lego

User interface messages that are built from pieces of text, or that leave some parts out of a message, are called Lego messages. The end result of this practice is not a glittering Lego castle. The end result is more like a shady shack with a leaking roof.

Major Lego message usage in MediaWiki will soon be in the past, as I have refactored the MediaWiki logging system and brought the code in line with what we expect from internationalisation today. Instead of snippets like “moved X to Y”, translators can now work with full sentences like “U moved X to Y”. This makes it possible to change the message to “Page X was moved to Y by U”. Consider the languages where sentences don’t begin with the subject. The old snippets must have been as awkward as “moved U X to Y” would be in English.

There is more: translations can now take the gender of the user who performed the action into account. English almost always gets away without taking sides in interface messages, but that is not the case in many other languages.

We already have many translations using these new possibilities:

  • English: Nike moved page Hapsen to Saalen
  • Welsh: Symudwyd y dudalen Hapsen i Saalen gan Nike
  • Russian (male): Nike переименовал страницу Hapsen в Saalen
  • Russian (female): Никa переименовала страницу Hapsen в Saalen
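How full-sentence messages with parameters and gender-dependent forms fit together can be sketched as follows. This Python sketch is only an illustration: the message table, the parameter numbering and the `render` function are hypothetical, and the gender handling is a simplified stand-in for MediaWiki’s `{{GENDER:...}}` syntax, not its real parser.

```python
import re

# Hypothetical templates: $1 = user, $2 = old title, $3 = new title.
# Each language is free to put the parameters in its own word order.
MESSAGES = {
    "en": "$1 moved page $2 to $3",
    "cy": "Symudwyd y dudalen $2 i $3 gan $1",
    "ru": "$1 {{GENDER:$1|переименовал|переименовала}} страницу $2 в $3",
}


def render(lang, user, gender, old, new):
    """Fill in one log message; `gender` is "male", "female" or anything else."""
    text = MESSAGES[lang]

    def pick_form(match):
        forms = match.group(1).split("|")
        # Simplification: unknown gender falls back to the first form.
        index = {"male": 0, "female": 1}.get(gender, 0)
        return forms[min(index, len(forms) - 1)]

    # Resolve the gender-dependent verb form first ...
    text = re.sub(r"\{\{GENDER:\$1\|([^}]*)\}\}", pick_form, text)
    # ... then substitute all parameters in a single pass.
    values = {"$1": user, "$2": old, "$3": new}
    return re.sub(r"\$[123]", lambda m: values[m.group(0)], text)


english = render("en", "Nike", "male", "Hapsen", "Saalen")
welsh = render("cy", "Nike", "male", "Hapsen", "Saalen")
russian = render("ru", "Nike", "female", "Hapsen", "Saalen")
```

Because the translator sees the whole sentence, the Welsh template can move the user to the end and the Russian one can choose the verb form, neither of which is possible with a “moved X to Y” snippet.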

translatewiki.net celebrates – so do I

Oh boy, time flies. Translatewiki.net turns six years old next Saturday. This is the first time we celebrate its birthday. How did it happen?

It was 2005, my last year at upper secondary school, when I set up a MediaWiki for myself to do some school work. I was 17, and in the fall of the same year I started studying at a university. Can you imagine how awkward it was to attend university under the age of majority (18 years in Finland)? Anyway, I think the wiki was originally called Nukawiki, then Betawiki and finally translatewiki.net. The wiki has gone through many updates. It probably started with MediaWiki 1.4, which boasts in its release notes that “User interface language can be changed by the user”. It has also gone through many computers, starting from my laptop and gradually moving to more powerful, more dedicated servers.

Already before the summer of 2006, when I started my obligatory military service which lasted six months, I was using the wiki to translate MediaWiki into Finnish and fix i18n problems. In 2006 we started inviting other translators to join. In February 2007 I started translating FreeCol into Finnish and soon they moved all translation related activities into our wiki. One of the initial translators was Siebrand, who has had enormous influence on the direction the project has taken since he joined.

In other words translatewiki.net was a small hobby project for an entirely different purpose, then I used it to scratch a personal itch, and nowadays it is a thriving community with thousands of members. We are already huge by many metrics, we are still growing and there don’t seem to be any limits to our size. I just cannot imagine how many people the work of translatewiki.net has impacted. For me this means an opportunity, but more importantly a challenge. How do we improve our service while scaling up? How can we provide better tools for translators, for ourselves and for projects that use us? We have been successful thus far, because we have been very efficient – it is almost scary how few people (albeit very dedicated) can keep everything running smoothly.

Translatewiki.net has had and still has a huge impact on my life. It is not just because it is a huge time sink for me. It is a manifestation of the many skills I’ve learned during my life. It feels wrong to say that it is my hobby, because sometimes it feels that studying is the hobby here. Nevertheless my master’s thesis is nearing completion. I already have a job in mind and I can’t say that translatewiki.net didn’t affect that.

I’m sincerely grateful to each and every one who has helped translatewiki.net to become what it is today.

Translatewiki.net is happy

Many of the issues that have been annoying us all in translatewiki.net have been fixed lately. To show my appreciation on behalf of translatewiki.net I’d like to highlight these fixes.

Issue one: saving messages in talk pages fails. If you just pressed “Save” once, you got an error message about broken session data. The reply was saved only if you clicked “Save” a second time. I don’t know how many messages we lost due to this. I lost a couple because after replying to a thread, I went to do some other things. Many of my replies were delayed, because I didn’t notice immediately that the save failed. What was worse, usually one would have to scroll down the page to even see the error message! I’m very happy that it is fixed now. Many thanks to Andrew Garrett!

Issue two: portions of changes were not shown at all when viewing differences between two versions. Not as annoying as the first item, this was still nasty and confusing to us. I submitted a test case for this bug in the wikidiff2 extension and fortunately Tim Starling was able to reproduce it. Soon after he committed a fix. Thanks Tim!

Issue three: message groups for projects which store all translations in a single file (like Pywikipediabot) were stuck in “has changes” status. This bug only annoyed the project leaders of translatewiki.net. After some encouragement Robert Leverington came up with a fix and found a serious bug in the code which determines if there have been any changes to the messages. The fix affects all message groups. To Robert: good catch and big thanks.

Issue four: Microsoft® Translator, one of the translation services we use to suggest translations for our translators alongside Google Translate, Apertium and our own tmserver, is often incorrectly identified as being down. Brian Wolff and Sam Reed have helped to investigate the issue, but it is not yet fully fixed.

Finally many thanks to those who help us to keep translatewiki.net running from day to day, you are many. A special thanks goes out to netcup.de – Webhosting, vServer, Servermanagement who has provided us with their flagship product “vCloud 8000”, which allows us to serve our pages faster than ever before. We need lots of help with challenges that range from coding to writing and design. Don’t hesitate to ask us how you could help us!

Chatty bots and minimizing disruptions in continuous integration

Those who use IRC are probably familiar with bots. Essentially a bot is a client which is not a human. This time I’m talking about a specific kind of bot; let’s call them reporting bots. Their purpose is to alert the channel about recent happenings in (near) real time. Open source project channels usually have at least a bot that reports every new commit and every bug report filed.

The translatewiki.net channel #mediawiki-i18n also has reporting bots. We have one CIA bot reporting any i18n related commit to any of our supported projects. I have to mention that the ability to have our own ruleset for picking and formatting commits is just awesome. There is also another bot, rakkaus (“love” in Finnish).

Its purpose is to report issues with the site. To accomplish this we pipe the output of error_log, which contains PHP warnings, database errors and MediaWiki exceptions, to the bot. It worked mostly fine, except that the bot would flood everyone when the log was growing fast. A few days ago it went too far. We had a database error (a deadlock), which was reported by the bot… including the database query… which happened to contain a few hundred kilobytes of serialized and compressed data – in other words binary garbage. Guess how happy we were when we saw the channel full of that?

Okay, something had to be done. And so I did. I wrote a short PHP script which:

  • Reads new data every 10 seconds
  • Takes the last line, truncates it to a suitable length, forwards only that snippet and reports how many lines in the log were skipped
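The two steps above can be sketched like this. The original script was written in PHP and is not shown here, so this Python version is only an assumption about how such a throttle could look; the function names, the 200 character limit and the wording of the report are made up.

```python
import time


def summarize(new_lines, max_length=200):
    """Return one short report for a batch of new log lines, or None."""
    lines = [line.rstrip("\n") for line in new_lines if line.strip()]
    if not lines:
        return None
    snippet = lines[-1][:max_length]  # only the last line, truncated
    skipped = len(lines) - 1
    if skipped:
        return f"{snippet} [+{skipped} earlier lines skipped]"
    return snippet


def follow(path, report, interval=10):
    """Poll a growing log file forever and pass each summary to `report`."""
    # errors="replace" keeps binary garbage from raising decode errors.
    with open(path, errors="replace") as log:
        log.seek(0, 2)  # start at the current end of the file
        while True:
            message = summarize(log.readlines())
            if message:
                report(message)
            time.sleep(interval)
```

Something like `follow("/path/to/error_log", print)` would then emit at most one short line every ten seconds, no matter how fast the log grows.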

And now everything is nice again :) The script is not yet in SVN, but I will commit it later.

By the way, this bot is half of the reason why we might complain to you within a few minutes after you commit code which breaks something in MediaWiki. Fortunately MediaWiki has taken steps to prevent committing code which doesn’t even compile, so we can skip some of the useless mistakes caused by carelessness.

Because we care about the users using translatewiki.net, we want to minimize any disruptions. The measures we have taken are:

  • Even though we update code often, we can roll back easily. With small updates it is easy to identify the cause, and chances are it gets fixed very fast too.
  • I personally am doing code review, trying to spot most issues before they reach us.

The Translate extension for MediaWiki has documentation

The Translate extension for MediaWiki is no longer just a hack for translatewiki.net. Actually it hasn’t been that for a long time anymore, but recently other projects have started using it. That means lots of things, like supporting stable releases of MediaWiki, instead of just development versions.

Today’s topic is documentation. I have been amending our existing documentation with Siebrand. Previously there was only some documentation on how to install the Translate extension. Now we have sections for the page translation feature, the configuration of the extension, message group configuration and command line scripts. All these have been collected into our documentation index page along with links to other resources. One of those other resources is code documentation generated with Doxygen. That should really help anyone who is interested in developing for the Translate extension – yes, we are looking for help!

Naturally documentation is a moving target and it will be improved continuously, like the code itself. While we have documentation for developers and those who want to install and configure the Translate extension, we are still lacking great user documentation in many areas. Even though the saying goes that good software does not need separate documentation, that does not mean we shouldn’t have any. It is important to show everyone what can be done with the Translate extension and to get them either interested or have them use the software (more) efficiently as an end user.

GSoC wrap-up – Translate extension

GSoC is almost over now. Lots of cool things have happened, but unfortunately you may not be aware of them, because I have neglected to blog about them. That is definitely a regression compared to last year and something to keep in mind in the future. I managed to do almost all tasks from the project plan with priority higher than 4, with some rough edges here and there. Next I will pick some highlights from the completed tasks.

Improved usability

This year there were many usability related issues to improve in the translation workflow. The improvements done by the Wikimedia Usability project nicely complement my work, for the benefit of our less technically oriented audience. The most important improvement is probably the buzzword-compatible ajax-editing. No longer do translators need to open a new browser tab for each message they want to edit; instead they get a floating dialog inside the current page (implemented using jQuery dialog). This means they never need to leave the list of messages any more, as it always stays in the background. It also makes it easier to do quick edits to message documentation or other languages, because you just get a new dialog, and once you have finished editing it, you are back in the previous message.

Ajax edit interface

Other features include a user preference for choosing additional languages to show when translating. The feature itself is not new, but now users can customise the list of languages.

Languages can be selected from the dropdown and using the button or typing in the language codes directly.

To date we have not really taken advantage of achievements in language technology. Now we have taken the first steps towards it by implementing a simple translation memory. It is a very simple setup, where we use tmserver from Translate Toolkit and fill it from time to time with existing translations from translatewiki.net. Tmserver uses the well-known Levenshtein algorithm to give suggestions. It isn’t very good, nor anything compared to state of the art systems, but the suggestions have already been useful, as the translators themselves have told us. There are many ways to improve the suggestions, from better algorithms to using a larger set of translations as source data and preprocessing the source data (text alignment, case and punctuation normalisation). I’m looking forward to them.

Other changes

There were many improvements to the lesser used features. Special features (magic words, special page aliases, namespaces) can now be exported using a script. No more time wasted in copy-pasting. In addition it is now possible to localise magic words for extensions. It is up to translation teams to decide whether they want to do this understandably controversial thing.

In message checks there were at times false positives which caused confusion among translators. Now there is a flexible system to suppress those warnings.

Gettext-style plurals are now supported better, but none of our Gettext projects is using them yet. Related to this, there is now a special page to import offline translations. We can now give trusted translators or users the permission to import offline translations, taking that work off the server admins. It supports download-from-url, files uploaded to the wiki and local file uploads.

The offline importer actually uses the same engine that I developed for another feature: web based message group management. It is now possible for project admins to import external changes, and to fuzzy other changes if necessary, using their browser. It is much easier than doing those steps manually on the command line, but there are still some practical problems to solve. One major piece still missing is integration with version control systems, so command line access is still needed to do svn up or the equivalent for other systems. It is somewhat related to the other problem, which is the limited execution time for web requests. The code is currently wise enough to check after every action whether we are near the limit, stop further processing and give the user the ability to continue from that point. We can’t increase the execution time without limit, but there might be hope, for example by doing multiple requests with ajax to spare the user from clicking the continue button many times.
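The “check after every action and let the user continue” approach can be sketched as a function that works within a time budget and returns the position to resume from. This Python sketch is an illustration under assumptions, not the extension’s PHP code; the function name, the parameters and the default budget are made up.

```python
import time


def process_batch(items, handle, start=0, budget_seconds=20.0,
                  clock=time.monotonic):
    """Handle items from index `start`.

    Returns the index to continue from if the time budget ran out,
    or None when every item has been handled.
    """
    deadline = clock() + budget_seconds
    for index in range(start, len(items)):
        handle(items[index])
        # Check the clock after every action, as described above.
        if clock() >= deadline and index + 1 < len(items):
            return index + 1
    return None
```

A web request would call this once and, if it gets an index back, render a continue button (or fire another ajax request) that passes the index in as `start`.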

The future

There is always something to be done or something that can be improved. I will focus on improving the new web interface and group management, which are still quite immature. Ajax-editing works, but it is still missing the cool factor without proper polishing. And as if that weren’t enough, Siebrand has collected a wish list for me. I will try my best to fulfil each request with my time, which is limited especially now that the academic year starts again.

It will be interesting to see where we are next year. We are not alone any more and while other platforms are developing I want to keep translatewiki.net special – to give a face to internationalisation and localisation instead of being just a dumping ground for translations.

GSoC status report – Translate extension

Last year I participated in Summer Code Finland. During that I added many new features to the Translate extension, to allow the biggest user of the extension, translatewiki.net, to grow bigger. And now translatewiki.net is indeed bigger. This year the project plan contains many tasks which aim to make the user experience more pleasant for both the translators and the project admins. In addition there is a pile of bug fixes and i18n improvements to MediaWiki. I will tell more about those features when I finish them.

The first coding week is now in the past. The big task for that week turned out to be more difficult than estimated. It was about making certain things faster, mostly the generation of translation statistics. The cause of the slowness was fuzzy messages, which are messages that have a translation, but the translation needs updating or reviewing. Information about fuzziness was stored as a text string in the message content itself. Now it is mirrored to another table, where it can be queried without loading the translation and checking for the existence of the fuzzy string. Thanks to everyone who helped with that.
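The difference between the two approaches can be shown with a small in-memory SQLite database. This is only a sketch under assumptions: the table and column names are hypothetical and do not reflect the real MediaWiki or Translate schema, and the marker string is assumed for illustration.

```python
import sqlite3

FUZZY_MARK = "!!FUZZY!!"  # assumed marker stored inside the translation text

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE translation (id INTEGER PRIMARY KEY, lang TEXT, content TEXT);
    CREATE TABLE fuzzy_tag (translation_id INTEGER PRIMARY KEY);
""")


def save(lang, content):
    """Store a translation and mirror its fuzzy state into fuzzy_tag."""
    cursor = db.execute(
        "INSERT INTO translation (lang, content) VALUES (?, ?)", (lang, content)
    )
    if FUZZY_MARK in content:
        db.execute("INSERT INTO fuzzy_tag VALUES (?)", (cursor.lastrowid,))


save("fi", "Tallenna sivu")
save("fi", FUZZY_MARK + "Poista sivu")
save("de", "Seite speichern")

# Slow way: load every translation and look for the marker in the text.
slow_count = sum(
    FUZZY_MARK in content
    for (content,) in db.execute(
        "SELECT content FROM translation WHERE lang = 'fi'"
    )
)

# Fast way: count through the mirrored table without reading any content.
(fast_count,) = db.execute("""
    SELECT COUNT(*) FROM translation
    JOIN fuzzy_tag ON fuzzy_tag.translation_id = translation.id
    WHERE lang = 'fi'
""").fetchone()
```

Both counts agree, but only the second one lets the database do the work with an index instead of shipping every translation to the application.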

Fortunately I managed to do some other tasks too. Siebrand is likely to be happy that he can export translations of MediaWiki’s namespaces, magic words and special page aliases with one command on the command line. That is, instead of using a web browser, requesting an export of one of those features for each language individually and pasting them into the translation files. This should save some precious time for better use.

Stay tuned for the next status report! It may take a week or two, as I am planning a little holiday trip to Sottunga in Åland and I don’t expect to be connected very often.