GNU i18n for high priority projects list

Today, for a special occasion, I’m hosting this guest post by Federico Leva, dealing with some frequent topics of my blog.

A special GNU committee has invited everyone to comment on the selection of high priority free software projects (thanks M.L. for spreading the word).

In my limited understanding from looking every now and then in the past few years, the list so far has focused on “flagship” projects which are perceived to be the biggest opportunities, or roadblocks to remove, for the goal of having people only use free/libre/open source software.

A “positive” item is one which makes people want to embrace GNU/Linux and free software in order to use it: «I want to use Octave because it’s more efficient». A “negative” item is an obstacle to free software adoption, which we want removed: «I can’t use GNU/Linux because I need AutoCAD for work».

We want to propose something different: a cross-functional project, which will benefit no specific piece of software, but rather all of them. We believe that the key to the success of each and every free software project is going to be internationalization and localization. No i18n can help if the product is bad: here we assume that the idea of the product is sound and that we are able to scale its development, but we “just” need more users, more work.

What we believe

If free software is about giving control to the user, we believe it must also be about giving control of the language to its speakers. Proper localisation of a software can only be done by people with a particular interest and competence in it, ideally language natives who use the software.

It’s clear that there is little overlap between this group and developers; if nothing else, because most free software projects have at most a handful of developers: all together, they can only know a fraction of the world’s languages. Translation is not, and can’t be, a subset of programming. A GNOME dataset showed a strong specialisation of documenters, coders and i18n contributors.

We believe that the only way to put them in control is to translate the wiki way: easily, the only requirement being language competency; with no or very low barriers on access; using translations immediately in the software; correcting after the fact thanks to their usage, not with pre-publishing gatekeeping.

Translation should not be a labyrinth

In most projects, the i18n process is hard to join and incomprehensible, if explained at all. GNOME has a nice description of their workflow, which however is a perfect example of what the wiki way is not.

A logical consequence of the wiki way is that not all translators will know the software inside out. Hence, to translate correctly, translators need message documentation straight in their translation interface (context, possible values of parameters, grammatical role of words, …): we consider this a non-negotiable feature of any system chosen. Various research agrees.
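
To make “message documentation” concrete, here is a minimal Python sketch of a message record that carries its own documentation and of what an interface could display next to the input box. The class, the field names, the key and the `$1` placeholder style are all assumptions chosen for illustration, not the format of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """One translatable string plus the documentation a translator needs."""
    key: str                 # stable identifier (hypothetical naming scheme)
    source: str              # English source text
    context: str             # where the string appears in the interface
    params: dict = field(default_factory=dict)  # placeholder -> meaning
    grammar: str = ""        # grammatical role of the string, if relevant


def translator_view(msg: Message) -> str:
    """Render the documentation a translation interface could show."""
    lines = [f"Source: {msg.source}", f"Context: {msg.context}"]
    for name, meaning in msg.params.items():
        lines.append(f"Parameter {name}: {meaning}")
    if msg.grammar:
        lines.append(f"Grammar: {msg.grammar}")
    return "\n".join(lines)


# Hypothetical example message.
example = Message(
    key="file-delete-confirm",
    source="Delete $1 files?",
    context="Confirmation dialog shown before removing selected files",
    params={"$1": "number of selected files (integer, 1 or more)"},
    grammar="question addressed to the user",
)
print(translator_view(example))
```

The point of the sketch is only that context, parameter meanings and grammatical role travel with the string, so a translator who has never opened the program still sees them.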

Ok, but why care?

I18n is a recipe for success

First. Developers and experienced users are often affected by the software localisation paradox, which means they only use software in English and will never care about l10n even if they are in the best position to help it. At this point, they are doomed; but the computer users of the future, e.g. students, are not. New users may start using free software simply because of not knowing English and/or because it’s gratis and used by their school; then they will keep using it.

With words we don’t like much, we could say: if we conquer some currently marginal markets, e.g. people under a certain age or several countries, we can then have a sufficient critical mass to expand to the main market of a product.

Research is very lacking on this aspect: there was quite some research on predicting viability of FLOSS projects, but almost nothing on their i18n/l10n and even less on predicting their success compared to proprietary competitors, let alone on the two combined. However, an analysis of SourceForge data from 2009 showed that there is a strong correlation between high SourceForge rank and having translators (table 5): for successful software, translation is the “most important” work after coding and project management, together with documentation and testing.

Second. Even though translation must not be like programming, translation is a way to bring more people into the contributor base of each piece of software. Eventually, if they become more involved, translators will get in touch with the developers and/or the code, and potentially contribute there as well. In addition to this practical advantage, there’s also a political one: having one or two orders of magnitude more contributors of free software, worldwide, gives our ideas and actions a much stronger base.

Practically speaking, every package should be i18n-ready from the beginning (the investment pays back immediately) and its “Tools”/“Help” menu, or similarly visible interface element, should include a link to a website where everyone can join its translation. If the user’s locale is not available, the software should actively encourage joining translation.
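
As a rough illustration of that last point, a program could check the user’s locale against the translations it ships and, if nothing matches, show an invitation. This is a sketch in Python under stated assumptions: the function name, the list of shipped locales and the URL are hypothetical.

```python
def translation_invitation(user_locale, available_locales, translate_url):
    """Return an invitation when no shipped translation fits the user's locale.

    Returns None when the locale, or its base language, is already covered.
    """
    normalized = user_locale.replace("-", "_")
    base_language = normalized.split("_")[0]
    covered = {loc.replace("-", "_") for loc in available_locales}
    if normalized in covered or base_language in covered:
        return None
    return (
        f"This program is not yet available in your language ({user_locale}). "
        f"You can help translate it at {translate_url}"
    )


# Hypothetical usage: a Gaelic user of a program shipping three translations.
message = translation_invitation(
    "gd_GB", ["en", "de", "it"], "https://translate.example.org/"
)
if message is not None:
    print(message)
```

A real program would read the locale from the environment and decide where to show the message; the sketch only covers the decision itself.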

Arjona Reina et al. 2013, based on the observation of 41 free software projects and 22 translation tools, actually claim that recruiting, informing and rewarding the translators is the most important factor for success of l10n, or even the only really important one.

Exton, Wasala et al. also suggest receiving in situ translations in a “crowdsourcing” or “micro-crowdsourcing” limbo, which we find superseded by a wiki. In fact, they end up requiring a “reviewing mechanism such as observed in the Wikipedia community” anyway, in addition to a voting system. Better keep it simple and use a wiki in the first place.

Third. Extensive language support can be a clear demonstration of the power of free software. Unicode CLDR is an effort we share with companies like Microsoft or Apple, yet no proprietary software in the world can support 350 languages like MediaWiki. We should be able to say this of free software in general, and have the motives to use free software include i18n/l10n.

Research agrees that free software is more favourable for multilingualism because compared to proprietary software translation is more efficient, autonomous and web-based (Flórez & Alcina, 2011; citing Mas 2003, Bowker et al. 2008).

The obstacle here is linguistic colonialism, namely the self-disrespect billions of humans have for their own language. Language rights are often neglected and «some languages dominate» the web (UN report A/HRC/22/49, §84); but many don’t even try to use their own language even where they could. The solution can’t be exclusively technical.

Fourth. Quality. Proprietary software we see in the wild has terrible translations (for example Google, Facebook, Twitter). They usually use very complex i18n systems or they give up on quality and use vote-based statistical approximation of quality; but the results are generally bad. A striking example is Android, which is “open source” but whose translation is closed as in all Google software, with terrible results.

How to reach quality? There can’t be an authoritative source for what’s the best translation of every single software string: the wiki way is the only way to reach the best quality; by gradual approximation, collaboratively. Free software can be more efficient and have a great advantage here.

Indeed, quality of available free software tools for translation is not a weakness compared to proprietary tools, according to the same Flórez & Alcina, 2011: «Although many agencies and clients require translators to use specific proprietary tools, free programmes make it possible to achieve similar results».

We are not there yet

Many have the tendency to think they have “solved” i18n. The internet is full of companies selling i18n/l10n services as if they had found the panacea. The reality is, most software is not localised at all, or is localised in very few languages, or has terrible translations. Explaining the reasons is not the purpose of this post; we have discussed or will discuss the details elsewhere. Some perspectives:

Gettext is powerful but problematic (cf. the Gettext Localisation horror story, 1999).
Mozilla i18n and L20n have an unclear direction and a tendency to turn localisation into programming.
We know little of most proprietary software.
In the translatewiki.net intro, Translating the wiki way (video) and Localisation for developers (doc) we try to explain what matters for us and how we do things at translatewiki.net.

A 2000 survey confirms that education about i18n is most needed: «There is a curious “localisation paradox”: while customising content for multiple linguistic and cultural market conditions is a valuable business strategy, localisation is not as widely practised as one would expect. One reason for this is lack of understanding of both the value and the procedures for localisation.»

Can we win this battle?

We believe it’s possible. What is written above may look too abstract, but it’s intentionally so. Figuring out the solution is not something we can do in this document, because making i18n our general strength is a difficult project: that’s why we argue it needs to be in the high priority projects list.

The initial phase will probably be one of research and understanding. As shown above, we have opinions everywhere, but too little scientific evidence on what really works: this must change. Where evidence is available, it should be known more than it currently is: a lot of education on i18n is needed. Sharing and producing knowledge also implies discussion, which helps the next step.

The second phase could come with a medium-term concrete goal: for instance, it could be decided that within a couple of years at least a certain percentage of GNU software projects should (also) offer a modern, web-based, computer-assisted translation tool with low barriers on access etc., compatible with the principles above. Requirements will be shaped by the first phase (including the need to accommodate existing workflows, of course).

This would probably require setting up a new translation platform (or giving new life to an existing one), because current “bigs” are either insufficiently maintained (Pootle and Launchpad) or proprietary. Hopefully, this platform would embed multiple perspectives and needs of projects way beyond GNU, and much more un-i18n’d free software would gravitate here as well.

A third (or fourth) phase would be about exploring the uncharted territory with which we share so little, like the formats, methods and CAT tools existing out there for translation of proprietary software and of things other than software. The whole translation world (millions of translators?) deserves free software. For this, a way broader alliance will be needed, probably with university courses and others, like the authors of Free/Open-Source Software for the Translation Classroom: A Catalogue of Available Tools and tuxtrans.

“What are you doing?”

Fair question. This proposal is not all talk. We are doing our best, with the tools we know. One of the challenges, as Wasala et al. say, is having a shared translation memory to make free software translation more efficient: so, we are building one. InTense is our new showcase of free software l10n and uses existing translations to offer an open translation memory to everyone; we believe we can eventually include practically all free software in the world.
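
For readers unfamiliar with the idea, a translation memory can be sketched in a few lines of Python: given a new source string, look up similar strings that were already translated and offer their translations as suggestions. This is only an illustration using the standard library’s `difflib`; the sample data and function name are hypothetical and it is not a description of how InTense is implemented.

```python
import difflib

# Hypothetical translation memory: English source string -> Italian translation.
MEMORY = {
    "Open file": "Apri file",
    "Save file": "Salva file",
    "Close window": "Chiudi finestra",
}


def suggest(source, memory, cutoff=0.6):
    """Return (known source, translation) pairs similar to `source`, best first."""
    matches = difflib.get_close_matches(source, list(memory), n=3, cutoff=cutoff)
    return [(match, memory[match]) for match in matches]


# A string that was never translated still gets suggestions from near matches.
for known_source, translation in suggest("Open files", MEMORY):
    print(f"{known_source!r} -> {translation!r}")
```

The more projects feed the memory, the more often a new string finds a close match, which is why sharing it across all free software matters.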

For now, we have added a few dozen GNU projects and others, with 55 thousand strings and about 400 thousand translations. See also the translation interface for some examples.

If translatewiki.net is asked to do its part, we are certainly available. MediaWiki has the potential to scale incredibly, after all: see Wikipedia. In the future, a wiki like InTense could be switched from read-only to read/write and become an über-translatewiki.net, translating thousands of projects.

But that’s not necessarily what we’re advocating for: what matters is the result, how much more well-localised software we get. In fact, MediaWiki gave birth to thousands of wikis; and its success is also in its principles being adopted by others, see e.g. the huge StackExchange family (whose Q&A are wikis and use a free license, though more individual-centred).

Maybe the solution will come with hundreds or thousands of separate installs of one or a handful of software platforms. Maybe the solution will not be to “translate the wiki way”, but a similar yet different concept, which still puts the localisation in the hands of users, giving them real freedom.

What do you think? Tell us in the comments.

-- Nemo.

14 thoughts on “GNU i18n for high priority projects list”

Ioannis 2015-02-01 at 03:17

I’ve done an enormous amount of translation. To my knowledge, I’m probably the most prolific FOSS translator for the Greek language and have a strange tendency of popping up just about everywhere. You can probably guess who I am from those words alone. My work has also benefitted from the fact I’m native in both Greek and English.

Regardless, I’ve been out of the translation game for a while, though I still maintain translations for the software I still use (my computing habits have changed over time). Your analysis is quite impressive (apart from some highly disagreeable and unnecessary Marxism), though I feel it lacks one crucial point, which I’ll get to later.

As you pointed out, the best translators are, first and foremost, experienced users of the software. Though most translators are, there are a few translators (much like me) who appear to be just about everywhere. Unlike me, they do pathetic levels of quality control and very often don’t use the software they’re translating. I won’t blast them for it (they are still contributing, after all), but it often leads to low quality translations which never get fixed and paradoxically scare users from the software altogether. This practice has to stop. While translating should always be as open and easy to get into as possible, there should also be an open communication channel with developers. This could not only be helpful for people asking questions, but also to weed out these experienced translators who are horrendously inexperienced with the software. Their testimonies of how they blow through 100 strings in a day without *ever* having used the software shock me to this day.

Aside from that — and very much linked to the second point, developers and translators often lack a grounding in the principles of Human-Computer Interaction (HCI), particularly in its applied form of UX (User eXperience) design. Too many have never picked up a UX book or even read a designer’s blog in their life, and it really shows — particularly in GNU software (which I confess to not be a fan of, aside from Emacs and Midnight Commander). The ‘old guard’ in particular places an emphasis on reading manuals and guidebooks that is simply unacceptable and could be done away with given a good user experience.

Furthermore, UX design strongly influences the quality of the strings being put up for translation. I consider TinyMCE to be my best translation to date. Unsurprisingly, it has a phenomenally refined UX, almost perfectly formed strings that have been extensively tested in practice (because the 30% rule is never enough) and it also has 100% completion for an enormous number of translations. A reasonable grounding in UX design (2 good books should suffice) is a must.

Finally, and linked to the previous two points, the formats we have now are insufficient for the job. Aside from Gettext’s many problems (esp. a lack of unique IDs for each string, as your post mentions), most applications that use the system have to be recompiled to regenerate the strings, which is decidedly no-go territory for your average user, especially if they’re on Windows like I was when I started out. Ideally, you should be able to directly ‘drop in’ translations into an application. Even better, you could well be able to translate from the program directly, and this is less of a pipe dream than one might think. Web-based systems have also started experimenting with this by allowing you to edit strings directly in the webapp. Alright, I lied; there was only one such example, which was proprietary, and I’ve forgotten its name now anyway.
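
The unique-ID point can be illustrated with a small Python sketch. When the English source string itself is the lookup key, two different meanings of the same string collapse into one entry; with a separate identifier per message, both survive. The message IDs and the helper function below are hypothetical, and the sketch deliberately ignores the message-context (`msgctxt`) mechanism that Gettext offers for disambiguation.

```python
# Catalog keyed by the English source string (the classic Gettext model).
# "Open" as a verb (menu action) and as an adjective (status) share one key,
# so only one Italian translation can be stored.
by_source = {}
by_source["Open"] = "Apri"    # verb: open the file
by_source["Open"] = "Aperto"  # adjective: status is open; overwrites the verb

# Catalog keyed by hypothetical unique message IDs: both meanings survive.
by_id = {
    "menu-file-open": "Apri",
    "ticket-status-open": "Aperto",
}


def translate(message_id, catalog, fallback):
    """Look up a translation by ID, falling back to the source text."""
    return catalog.get(message_id, fallback)


print(by_source)                                   # only one entry is left
print(translate("menu-file-open", by_id, "Open"))
print(translate("ticket-status-open", by_id, "Open"))
```

A catalog like `by_id` could also be read from a plain data file at start-up, which is one way to get the ‘drop in’ behaviour without recompiling.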

To this date, the system I’ve been most impressed by is OpenTTD’s, which I strongly recommend you take a look at. It allows for an impressive amount of flexibility, including genders and cases. You can find it and play it at http://openttd.org/. I quit it about two years ago as I was dissatisfied with the goals the game sets (which effectively rule out competitive game play) and the developers’ insistence that no-one cares and only sandbox gameplay matters. >.>

As a side note, I like your efforts to integrate a translation memory into Translatewiki.net, but the interface design makes it inconvenient to access, meaning we almost never look at it. The most successful approach to this is — by far and away — Virtaal’s. Not only does it include an enormous variety of sources (which you should too), it presents them on-the-fly in the input box without requiring any further clicks (like on TW.net) or scrolling down (like on Transifex).

Anyway, excuse the rant(s), but it’s very late here and too many things could be improved when it comes to i18n, thus my ranting was inevitable.
So how do we get better translations in FOSS?
The first step is educating developers in UX.
The second is developer and user interaction, because effective translation (or i18n/l10n as boring businesspeople like to call two sides of the same coin) mandates as much.
Thirdly, the toolchain must be improved with translators in mind. The developers’ issues are a sideshow here, and you should be soliciting advice directly from translators if you want to do further research. Issues include the lack of support for cases and genders (and whatever grammatical features a given language requires), interfaces adapting poorly to other languages and the general difficulty of seeing how translations look in practice.
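
To show why grammatical features need tool support, here is a small Python sketch of plural selection. English needs two forms, while Polish needs three for whole numbers, so a toolchain that only knows “singular/plural” cannot produce correct Polish. The function names are made up for this example, and it covers plurals only, not cases or genders.

```python
def english_plural(n):
    """English has two categories: singular for exactly 1, plural otherwise."""
    return "one" if n == 1 else "other"


def polish_plural(n):
    """Polish distinguishes three categories for whole numbers."""
    if n == 1:
        return "one"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    return "many"


# Polish forms of the word for "file", one per plural category.
POLISH_FILE = {"one": "plik", "few": "pliki", "many": "plików"}


def polish_file_count(n):
    """Format a file count with the correct Polish noun form."""
    return f"{n} {POLISH_FILE[polish_plural(n)]}"


for count in (1, 2, 5, 12, 22):
    print(polish_file_count(count))
```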
Patrick 2015-02-01 at 11:13

These ideas sound great, but the UI sounds much too complicated. The default UI for fixing an l10n error should not be a wiki, but maybe a special shortcut with a click on the string, where it should be directly editable with the result being sent to the translation server (which can be a wiki or not). That way, I would fix many more l10n problems on-the-fly, because the barriers would be really low.
Simon 2015-02-01 at 23:03

I agree with the notion that translation is very important, if not essential to get people to contribute more easily to free software.

I also agree with Ioannis, I think the most gain for translations would be in making translation a menu option or a key-combination away in the program to be translated.

If you consider the translations to be hash-like lookups in a program (obviously, it’s not as simple as that, but it will make it easier to explain this idea), perhaps it could be implemented in an SQLite database or like a Perl hash. Once this is in the program, the data may be manipulated from inside the program. The problem is how. Most free software is written using a UI library. Since translations are nearly always needed for interactive-style applications, they could be made to automatically support inline translations if the UI library makes this function available by default (or without effort).
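
A minimal sketch of the hash-like lookup idea, using Python’s standard `sqlite3` module: translations live in a table keyed by message and language, a lookup falls back to the source text, and a translation typed inside the program replaces the stored one at once. The table layout, function names and message keys are assumptions made for illustration; integration with a UI library is left out.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Create (or open) a translation table keyed by message key and language."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS translations ("
        " msg_key TEXT NOT NULL,"
        " lang TEXT NOT NULL,"
        " translation TEXT NOT NULL,"
        " PRIMARY KEY (msg_key, lang))"
    )
    return db


def set_translation(db, msg_key, lang, translation):
    """Store or replace a translation, e.g. one typed by the user in the UI."""
    db.execute(
        "INSERT OR REPLACE INTO translations (msg_key, lang, translation)"
        " VALUES (?, ?, ?)",
        (msg_key, lang, translation),
    )
    db.commit()


def lookup(db, msg_key, lang, fallback):
    """Return the stored translation, or the fallback (source) text."""
    row = db.execute(
        "SELECT translation FROM translations WHERE msg_key = ? AND lang = ?",
        (msg_key, lang),
    ).fetchone()
    return row[0] if row else fallback


db = open_catalog()
set_translation(db, "menu-quit", "it", "Esci")
print(lookup(db, "menu-quit", "it", "Quit"))  # stored translation
print(lookup(db, "menu-quit", "gd", "Quit"))  # no entry: falls back to source
```

Because the lookup reads the table on every call, a corrected string shows up the next time the interface is drawn, with no recompilation step.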

So assuming this is not only possible, but implemented in e.g. Qt, how would this work?

Users of the software who are either using a program in English and want to translate it from scratch or users who are using it in their own (non-English) language and want to improve the translation can for example click with the right mouse button combined with ctrl-shift (or whatever) and pop into a translation menu. After the translation is finished, the new translation appears in the UI and there is a menu item in the program to go into full-text translation (like poedit, virtaal, etc.) and an item to submit the user’s translation upstream (using a well-defined format).
These submissions can be processed like any patches and are e-mailed/notified to the current translators for that language.

It would be useful to have centralised translation aids, like libraries of translated strings, a suggestion from google translate (or similar), but these are optional extras to the inline translation and submission architecture.

In short:

It’s essential to let users translate from within the program, because that is what they have in front of them.

It’s essential to show the translation immediately in the program, so its effect is visible. (When scratching an itch, it’s very satisfying to have the itch go away immediately, instead of months from now!)

It’s very important to make it trivial to share the translations upstream and have it go through reviews. (In fact, this could be integrated as well inline, the translation interface could ask upstream to give the most popular translation submissions from others, which you could also pick and optionally vote for)

This is the kind of system that I would use to convince people without programming background of the merits of free software and visualise what contributing upstream looks like. It would make a huge difference to the abstract (to most people) idea that free software can be under your own control, but only if you can program in C/C++/perl/C#/etc.

I would use this myself, whereas I wouldn’t easily go to a website to translate for a particular program in 95% of the cases.

Cheers

/Simon
jstaniek 2015-02-02 at 10:09

Thanks. “I can’t use xyz” may be the loss aversion, hard to solve:
http://ignorethecode.net/blog/2015/01/31/windows_10_re_crappifying_windows_8/
stantontas 2015-02-02 at 15:34

The paper that you cite as “Reina et al.” should, I think, be “Arjona Reina et al.”, per the Spanish way of first and second surnames.

One more reason why localisation is hard? :-)

That aside, I agree that it’s a key area that needs focus. As a non-programmer looking for ways to help free software, it’s something that I *could* do, but have never overcome even the low barriers to begin.
Nemo Post author 2015-02-02 at 18:42

Thanks stantontas, fixed. I’m Italian and I do know of the Spanish double surnames, yet I spent more time thinking how many authors to quote and I forgot the most important part i.e. to mention their names correctly. ;)
Purodha Blissenbach 2015-02-03 at 08:16

Simon describes a workflow which I would like to have, too. Translating MediaWiki in Translatewiki.net is rewarding and yields good quality when you translate those strings that you see at once. Call it our goal number one.

Yet I also have a very different workflow. Give me a text file with all the strings, as little clutter as possible, and let me type translations. When done, or having reached a milestone, let me submit the file. No additional clicks, no save-button, no waiting for AJAX scripts, just plain text. This is a quick way to get volume translations done, and I tend to want it as a kick-off for new projects at least. Let us regard the KISS (keep it safe+simple) plain-text approach another goal.

People may be translating anyways, some do so professionally or semi-professionally, and they have their tools for it. Computer aided translation is certainly a market of the future and there are many different tools around as there are different personalities using them. There are various interfaces, collections of data, and exchange formats coming with them. Can we, building on the library approach, support some or many of them? I believe so.
Michael 2015-02-07 at 23:53

There are some caveats, though I agree with a lot of the above, including comments.
First and foremost, any QA system (such as voting or reviewing) must not be categorical i.e. it must allow for exceptions. Take our locale, Gaelic (gd), 60,000 speakers, two (very) active localizers. We work together closely though usually on different projects as it’s more efficient. There’s just no way any voting system would ever work for our locale, no translation would *ever* see the light of day.
The second thing I have discovered is that translation and crowdsourcing en-masse don’t mix that well. It really is a case of ‘many cooks spoil the broth’ and funnily enough, the fact that there are only 2 of us means that what there is, is actually very consistent in terms of style and terminology unlike some (for example) German translations I’ve seen. Not that it was necessarily bad, but it was obvious that there had been more than one translator. Instead of focussing on quality by numbers, we should consider using smaller teams or at least smaller, highly active teams to review translations done en-masse. Not really an issue for locales like Gaelic, but some of the mid-range languages could benefit from something like this.
Chusslove Illich 2015-02-08 at 00:26

From Federico’s post I learned of Burke’s early encounter with Gettext, and from Ioannis’ comment of the OpenTTD’s custom translation system with plural-case-gender handling. These were very interesting datums, thanks!
Beluga 2015-02-10 at 11:26

“Give me a text file with all the strings, as little clutter as possible, and let me type translations. When done, or having reached a milestone, let me submit the file.”

Transifex has this feature at least.
Nemo Post author 2015-02-11 at 20:01

Does someone not have such a feature? Translate has it, too. And you don’t even have to worry whether you completed the file or whether new strings were added in the meanwhile. https://www.mediawiki.org/wiki/Help:Extension:Translate/Off-line_translation
Turker Sezer 2015-02-15 at 02:25

Hello,

I am also working on a free translation memory service.

You can check http://lictionary.in/ as a publicly accessible free translation memory. Now it has 9.1 million translations for 435k unique strings in 129 different languages.