Category Archives: KDE

Goes to planet kde

Wikimania videos: the next billion users on Wikipedia and beyond

Wikimedia DC has started publishing the Wikimania videos on YouTube. They are not split by presentation, only by track, but here are some about localisation and internationalisation.

My Wikimania presentation (see my previous post), Translating the wiki way (starts at 28:05; watch on YouTube):

Amir’s Supporting languages, all of them and Siebrand’s A Tale of Language Support and Ask the Language Support People (watch on YouTube):

Santhosh’s Read and Write in your language has not been published yet and nobody seems to know if it will, or if it has been recorded at all.

Alolita’s The next billion users on Wikipedia with Open Source Webfonts and Amir’s The software localization paradox (watch on YouTube):

See also the category on Wikimania wiki for abstracts and slides for these presentations.

My presentations at Akademy and Wikimania

In July I gave two presentations: one at Akademy 2012 in Tallinn, and one at Wikimania 2012.

Short summary of my Akademy presentation (slides): If you are translating content in MediaWiki and you are not using the Translate extension, you are doing it wrong. Statistics, translation and proofreading interface – you get them all with Translate. Because Translate keeps track of changes to pages, you can spend your time translating instead of trying to figure out what needs translating or updating.

Also, have a look at UserBase: it has now been updated to include the latest features and fixes of the Translate extension, like the ability to group translatable pages into larger groups.

Akademy presenation by Niklas and Claus: click for video. Yes, there’s a a typo.

Short summary of my Wikimania presentation (slides; video not yet available): Stop wasting translators’ time.
Forget signing up to e-mail lists, forget sending files back and forth. Use translation platforms that move files from and to the version control system transparently to the translator.
If you have sentences split into multiple messages, you are doing it wrong. If your i18n framework doesn’t have support for plural, gender and grammar dependent translations, you are doing it wrong. If you are not documenting your interface messages for translators, you are doing it wrong.

Niklas maybe having fun at Library of Congress. Photo tychay, CC-BY-NC-ND

Translation sprint for KDE in Finnish

On our sprint website we’re translating the upcoming KDE SC 4.9 release into Finnish. If you know Finnish, you only have to register to start translating: please join us!
We have a simple goal: translate 10,000 new messages and have all the changes proofread and accepted. In two weeks we have translated more than 3,000 messages and the majority of them have been proofread and accepted. We still have about three weeks to go, so your help is needed to increase the output to reach the goal of 10,000 new translations. As a secondary activity we are also proofreading the existing translations and discussing and harmonizing the terminology. For example, should “filter” be suodin or suodatin?

Keep reading if you are interested in how we organized the sprint from a technical perspective.

This is the second translation sprint I’m organizing with the Translate extension. The first one was in March, when we translated Gnome 3.4 into Finnish, and this time we are translating KDE 4.9 into Finnish. I can say that the Translate extension fits this purpose pretty well:

  • You can set up everything in a few hours.
  • There are minimal barriers to start using it (we do require registration).
  • It is suitable for novice translators, because they get feedback when other people proofread and correct their translations.

It is not without its issues either, but I see this as a great opportunity to make the MediaWiki Translate extension even better and have it support a variety of use cases. Let me describe some.

Bugs. There are always some bugs. This time I found a regression in the workflow states feature where the recent changes weren’t backwards compatible with the old configuration format. That was quickly fixed and I also submitted fixes for a few minor issues, which were not encountered before. All in all I have 7 local patches, mostly small behaviour changes like the formatting of message keys or showing the message context field to translators. Most of those can be cleaned up and submitted for merging.

Scalability. For a long time I had the impression that the Translate extension scales up pretty well. After all, we have thousands of message groups and 50k messages translated into hundreds of languages at translatewiki.net. How naive I was. All of KDE as we use it (stable and trunk branches merged; including playground and extragear, calligra and other related stuff) contains 200k messages. It turns out that our import tools choke when you try to feed them 350k new messages at once (this includes Finnish translations). As a workaround I had to limit the number of messages that are processed at once and iterate over the whole process multiple times. This is where the bulk of my time was spent. Of course I also ran out of disk space in the middle of the import. It takes about 1 GB of space, but currently I have only a tiny 10 GB disk on the server.
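The workaround of limiting how many messages are processed at once can be sketched roughly as follows. This is only an illustration in Python, not the actual import tooling of the Translate extension: the names batched and import_in_batches, the import_batch callback and the batch size of 5000 are all assumptions made for the example.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def import_in_batches(
    messages: Iterable[T],
    import_batch: Callable[[List[T]], None],
    size: int = 5000,  # hypothetical limit, tune to what the importer survives
) -> int:
    """Feed messages to `import_batch` a chunk at a time; return the total count."""
    imported = 0
    for batch in batched(messages, size):
        import_batch(batch)  # hypothetical callback doing the real import work
        imported += len(batch)
    return imported
```

Because the messages are consumed lazily, only one batch needs to be held in memory at a time, which is the point of the workaround.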

Search. The most requested feature is better search. Currently it is not possible to limit the search to a message group, to see the translation when searching source texts, or to see the source text when searching for translations. Also it takes a few clicks before you can edit the message from the search results. Building a good search backend is currently on the backlog of the Wikimedia Localisation team, but it is not yet scheduled for any sprint.

Stay tuned for the results of the KDE Finnish translation sprint.

Updates on translation review feature of Translate extension

About three months ago I blogged about the translation review feature that we developed for the Translate extension. It is time to have a look at how it has been received. Thanks to Siebrand Mazeland we can now draw graphs for review and reviewer activity. This feature came just in time for the Gnome 3.4 Finnish Translation Sprint that I’m organizing. If you look at its main page, you can see graphs for translation and review activity. The activity isn’t exactly over the top, so if you speak or can translate into Finnish, please join and help us.

I’m aware of three places using this feature: translatewiki.net, Wikimedia Foundation and the translation sprint mentioned above. In translatewiki.net the review ability is not as open as I originally envisioned it to be: only experienced translators can get it by request. Only about 2% of over 3500 registered translators currently have the review right in translatewiki.net. For the other two places, everyone who can translate can also review.

When looking at the graphs for translatewiki.net we can without doubt see that translation reviewing activity is not yet anywhere near the translation activity, and we should consider that there is a huge backlog of previous translations that should also be reviewed. We don’t even see a steady growth in the review activity (around the change of the year we had a translation sprint which temporarily increased translation and review activity to higher than normal levels). We don’t have graphs for Wikimedia projects yet, but looking at the logs the review feature seems to be in relatively more active use there. I would personally like to see all new translations from now on reviewed by at least one other user.

The next step would be to add a review level column to Special:LanguageStats and Special:MessageGroupStats pages. That would need some idea on how to convey both quantity and coverage. For example, a hundred translators reviewing the same message doesn’t mean that the review coverage is good. Perhaps we should just start with coverage and bring quantity later. This could be a nice small project for someone who wants to help to develop the Translate extension with help from us.
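To make the difference between quantity and coverage concrete, here is a small sketch in Python. It is not code from the Translate extension: the function name review_stats and the data layout (pairs of message key and reviewer name) are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Set, Tuple


def review_stats(
    message_keys: Iterable[str],
    reviews: Iterable[Tuple[str, str]],
) -> Tuple[float, int]:
    """Return (coverage, quantity) for a set of messages.

    coverage: share of messages reviewed by at least one user
    quantity: total number of distinct (message, reviewer) pairs
    """
    keys = set(message_keys)
    reviewers_by_message: Dict[str, Set[str]] = {}
    for key, reviewer in reviews:
        if key in keys:  # ignore reviews of messages outside this group
            reviewers_by_message.setdefault(key, set()).add(reviewer)

    reviewed = len(reviewers_by_message)
    coverage = reviewed / len(keys) if keys else 0.0
    quantity = sum(len(users) for users in reviewers_by_message.values())
    return coverage, quantity


# A hundred translators reviewing the same message out of four:
keys = ["msg-a", "msg-b", "msg-c", "msg-d"]
reviews = [("msg-a", f"user{i}") for i in range(100)]
coverage, quantity = review_stats(keys, reviews)
```

In this example the quantity is 100 but the coverage is only 0.25, which is why showing coverage first seems like the more honest starting point.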

New UIs in MediaWiki Translate extension

I’m not a designer. Yet, I am a designer. During the many years of development of the Translate extension, I have done about all things related to the development of a software project: coding, translating, documenting, testing, system administration, marketing and user interface (UI) design among those. My UI design skills are limited to personal interest and one university course. But I try to pay attention to the UIs I create, and I listen for feedback.

For once we got some good feedback about the issues in the current UIs and some suggestions about how to improve them. Based on this feedback I have made two significant changes to Special:Translate – the main translation interface of the Translate extension. The first significant change is to split the page into a few different tasks: translating, proofreading, statistics and export. I implemented these as tabs. Typically the user starts from language statistics and selects the project they want to translate or proofread. This has the following benefits:

  • The tasks are clearly separated: users can see at a glance what can be done with the interface.
  • Switching between tasks is seamless: previously there was no easy way to go back to language statistics from translating or proofreading.
  • There are fewer visible options at a time: the UI just looks nicer and takes less space.

The second change is an embedded translation editor. This feature is still in beta phase, and if we get enough positive feedback about it, we will switch over from the old popup based editor. You can test the editor by going to Special:Translate and double clicking the text you want to translate. This should prevent the hassle of moving and resizing dialogs. On the other hand it has some problems with the editor moving on the screen when you advance to the next message, and it also stands out less in the middle of the surrounding context. I’m investigating if and how we can mitigate these issues. I’ve already changed some styling to make the editor stand out more and the whole table appear less heavy. As a bonus the embedded editor feels faster, because I’ve added some preloading. This means that when you save your translation and go to the next message, it will show up instantly because it has already been loaded.

New translation memories near you soon

In the last sprint I developed a translation memory server in PHP almost from scratch. Well, it’s not really a server. It’s run inside MediaWiki during client requests. It closely follows the logic of tmserver from the Translate Toolkit, which uses Python and SQLite.

The logic of how it works is pretty simple: you store all definitions and translations in a database. Then you can query suggestions for a certain text. We use string length and fulltext search to filter the initial list of candidate messages down. After that we use a text similarity algorithm to rank the suggestions and do the final filtering. The logic is explained in more detail in the Translate extension help.
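The filter-then-rank idea described above can be sketched in a few lines of Python. This is an illustration under several assumptions, not the actual implementation: the memory is a plain dictionary instead of a database, a shared-word check stands in for the fulltext search, difflib's SequenceMatcher stands in for the text similarity algorithm, and the function name suggest, the threshold and the length tolerance are made up for the example.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple


def default_similarity(a: str, b: str) -> float:
    # Stand-in scorer from the Python standard library (assumption).
    return SequenceMatcher(None, a, b).ratio()


def suggest(
    text: str,
    memory: Dict[str, str],          # source text -> stored translation
    threshold: float = 0.75,         # hypothetical minimum similarity
    length_tolerance: float = 0.3,   # hypothetical allowed length difference
    similarity: Callable[[str, str], float] = default_similarity,
) -> List[Tuple[float, str, str]]:
    """Return (score, source, translation) tuples, best match first."""
    words = set(text.lower().split())
    min_len = len(text) * (1 - length_tolerance)
    max_len = len(text) * (1 + length_tolerance)

    # Step 1: cheap filters cut the candidate list down.
    candidates = [
        source
        for source in memory
        if min_len <= len(source) <= max_len       # string length filter
        and words & set(source.lower().split())    # crude fulltext stand-in
    ]

    # Step 2: the expensive similarity score ranks what is left.
    scored = [(similarity(text, source), source, memory[source]) for source in candidates]
    return sorted((row for row in scored if row[0] >= threshold), reverse=True)


memory = {
    "Save the page": "Tallenna sivu",
    "Save this page": "Tallenna tämä sivu",
    "Delete the page and all of its translations": "Poista sivu ja kaikki sen käännökset",
    "Log in": "Kirjaudu sisään",
}
results = suggest("Save the page", memory)
```

The ordering matters for performance: the similarity score is only computed for the handful of candidates that survive the cheap filters.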

PHP provides a text matching function, but we (Santhosh) had to implement a pure PHP fallback for strings longer than 255 bytes or strings containing anything other than ASCII. The pure PHP version is much slower, although that is offset a little because it’s more efficient when there are fewer characters in a string than bytes. But more importantly, it works correctly even when not handling English text. The faster implementation is used when possible. Before we did some optimizations to the matching process, it was the slowest part. After those optimizations the time is now bound by database access. Both functions implement the Levenshtein edit distance algorithm.
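Why the fallback matters for non-English text can be shown with a small Python sketch of the Levenshtein edit distance. This is not the PHP code from the extension; the function names and the way the distance is turned into a similarity score are assumptions for illustration. The same function is run once on characters and once on UTF-8 bytes, the latter roughly imitating what a byte-oriented implementation sees.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance between two sequences (strings, bytes, lists)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def similarity(a: Sequence, b: Sequence) -> float:
    """Scale the distance to 0..1 (one possible normalisation, an assumption)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


# Finnish "tämä" versus the same word typed without the dots.
by_characters = levenshtein("tämä", "tama")
by_bytes = levenshtein("tämä".encode("utf-8"), "tama".encode("utf-8"))
```

Counted in characters the two words differ by two edits, but counted in UTF-8 bytes they differ by four, because each ä takes two bytes. Working on characters also means a smaller comparison table, which matches the remark above about strings having fewer characters than bytes.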

End users won’t see much difference. Wanting a translation memory on Wikimedia wikis was the original reason for reimplementing translation memory in PHP, and in the coming sprints we are going to enable it on wikis where Translate is enabled (meta-wiki, mediawiki.org, incubator and wikimania2012 currently). It is just over 300 lines of code [1] including comments and in addition there are database table definitions [2].

Now, having explained what was done and why, I can reveal the cool stuff, if you are still reading. There will also be a MediaWiki API module that allows querying the translation memory. There is a simple switch in the configuration to choose whether the memory is public or private. In the future this will allow querying translation memories from other sites, too.

Putting that other pair of eyes to good use

This blog post is about the MediaWiki Translate extension and explains how we got to develop a new set of translation review tools.

One of the core principles at translatewiki.net is that the time of translators is a precious resource. We show our appreciation to translators by providing tools that let them concentrate 100% on the task at hand and let the (volunteer) staff handle the boring tasks.

It is well known that good translators take pride in their own and others’ work. This may result in an urge to review all translations made by other translators. I consider myself to be that kind of translator. The good news is that in recent months the Translate extension has got massively better at supporting reviewing of translations. Some weeks ago we added a new listing where you can click a button to accept a translation. When the list is empty, you know that all translations have either been made or fixed by you, or you have accepted someone else’s translations.

This is all nice and dandy, but if you want to review new translations as they come in it is not practical. You’d either have to watch the list of recent translations or subscribe to the feed of them. From here you can get to the individual messages, but it takes many clicks to get to the page where you see the button to accept the translation. And iterating over each of the hundreds of message groups to see if there is anything to accept is not practical either.

The solution: a special message group which lists the recent translations in a given language. Since only some of the translators are allowed to review, on the right you can see a screenshot of how it looks. One could bookmark this page and have a look at it a few times per week. For me this is a real time saver, and I’m sure others will find it useful too.

To get this implemented, I originally anticipated that some heavy refactoring was needed and I estimated about one and a half days for it. In the end it took only about half a day – I was positively surprised by how painless the refactoring was. The problem was that the class which fetches all the messages from the database assumed they all belong in the same MediaWiki namespace. In translatewiki.net we have over ten namespaces for translations of different projects, so it had to be fixed. I’d say this is a prime example of the saying “Premature optimization is the root of all evil” by Donald Knuth.

In the future we need to link this page from suitable places to make this feature discoverable and also to make sure that more than the current 66 users out of 3000+ translators get the right to use this feature.

WebWorld 2011 – wrap-up

Unfortunately my time machine is broken, so instead of telling you what cool features are coming, you have to bear with a summary of what I did during the WebWorld sprint.

As you know, UserBase Wiki uses the Translate extension to translate the wiki content. I can now cross off a common feature request from my todo list: moving and deleting translated pages. Since each language has its own page and the system uses even more pages behind the scenes, the normal move and delete actions of MediaWiki were insufficient. With some hackish code I was able to hijack those actions and replace them with my own. It is now possible to move or delete a page with all of its translations with a few clicks. You can also choose to delete only one translation, which is useful if the translator accidentally used a wrong language.

For those who are addicted to stats, Special:LanguageStats now has a row which states the overall translation coverage. The number can be off by a small amount for a reason unknown to me. I have to investigate why and fix it, since statistics never lie :)

And there is one more nicety regarding Special:MyLanguage, which takes care of redirecting users to their preferred language translation of a page, assuming such a translation exists. If the given page does not exist at all, the link using Special:MyLanguage is now red just like normal links to non-existing pages are.

The sprint itself was productive. There were problems that needed to be solved, and I think we all did a good job tackling the many issues. We also managed to create some new problems: Ingo needs to learn how to not have toys stuck in inconveniently high places :) And I like the logo very much :)

WebWorld 2011 – MediaWiki and UserBase

Greetings to the Planet KDE readers, where this should be my first blog post. My nickname on the net is Nikerabbit. I’ve been developing MediaWiki for many years, and I’m the author of the Translate extension. The Translate extension is used at translatewiki.net, which is a wiki site and community that does open source software translation. The Translate extension can also be used to translate wiki pages, which is the way it is used on userbase.kde.org.

That is also the reason why I am now here at the KDE WebWorld sprint. I have updated the Translate extension on UserBase and fixed a number of bugs in it. Mostly minor things that can confuse normal users – they are no longer automatically directed to pages that are of no use to them. And UserBase now has its own translation memory. It’s the same kind we use in translatewiki.net: a very simple one provided by the Translate Toolkit. Currently those two are independent of each other, but maybe in the future we can find a way to use each other’s translations.

While working on UserBase issues here, I realized some problems in MediaWiki. First of all, the Translate extension is supposed to be compatible with two MediaWiki versions: the latest stable version (1.16, used on UserBase) and the latest development version (1.19alpha, used on translatewiki.net). I am not going to talk about why there are two unreleased versions and a third one in development. Anyway, a lot of development has happened between those versions, including major new features and big rewrites. I’m spending considerable time keeping the Translate extension compatible with 1.16 when developing new features for it. It also makes the code more complex and doubles the testing required. It is not made easier by the fact that not all changes are documented in appropriate places. For example, there is no mention in hooks.txt that the parameters for the SkinSubPageSubtitle hook have been changed at some point.

Another thing is that MediaWiki has so many ways to tweak it, most of them undocumented and not easily discoverable. For example, all the messages in the MediaWiki: namespace, some of which may even be empty by default. There is no way you can find a suitable message unless you already know that such a message exists and what it is called. The same applies to configuration variables. They are at least documented in DefaultSettings.php and some also on mediawiki.org, but again it is hard to find some specific thing that could help you (if it even occurs to you that such a thing might exist).

This means that people can’t really find out everything you can do with MediaWiki and they either end up not doing some things at all or creating something new from scratch.