gettext | It rains like a saavi

Today, for a special occasion, I’m hosting this guest post by Federico Leva, dealing with some frequent topics of my blog.

A special GNU committee has invited everyone to comment on the selection of high priority free software projects (thanks M.L. for spreading the word).

In my limited understanding from looking every now and then in the past few years, the list so far has focused on “flagship” projects which are perceived to the biggest opportunities, or roadblocks to remove, for the goal of having people only use free/libre/open source software.

A “positive” item is one which makes people want to embrace GNU/Linux and free software in order to user it: «I want to use Octave because it’s more efficient». A “negative” item is an obstacle to free software adoption, which we want removed: «I can’t use GNU/Linux because I need AutoCAD for work».

We want to propose something different: a cross-fuctional project, which will benefit no specific piece of software, but rather all of them. We believe that the key for success of each and all the free software projects is going to be internationalization and localization. No i18n can help if the product is bad: here we assume that the idea of the product is sound and that we are able to scale its development, but we “just” need more users, more work.

What we believe

If free software is about giving control to the user, we believe it must also be about giving control of the language to its speakers. Proper localisation of a software can only be done by people with a particular interest and competence in it, ideally language natives who use the software.

It’s clear that there is little overlap between this group and developers; if nothing else, because most free software projects have at most a handful developers: all together, they can only know a fraction of the world’s languages. Translation is not, and can’t be, a subset of programming. A GNOME dataset showed a strong specialisation of documenters, coders and i18n contributors.

We believe that the only way to put them in control is to translate the wiki way: easily, the only requirement being language competency; with no or very low barriers on access; using translations immediately in the software; correcting after the fact thanks to their usage, not with pre-publishing gatekeeping.

Translation should not be a labyrinth

In most projects, the i18n process is hard to join and incomprehensible, if explained at all. GNOME has a nice description of their workflow, which however is a perfect example of what the wiki way is not.

A logical consequence of the wiki way is that not all translators will know the software like their pockets. Hence, to translate correctly, translators need message documentation straight in their translation interface (context, possible values of parameters, grammatical role of words, …): we consider this a non-negotiable feature of any system chosen. Various research agrees.

Ok, but why care?

I18n is a recipe for success

First. Developers and experienced users are often affected by the software localisation paradox, which means they only use software in English and will never care about l10n even if they are in the best position to help it. At this point, they are doomed; but the computer users of the future, e.g. students, are not. New users may start using free software simply because of not knowing English and/or because it’s gratis and used by their school; then they will keep using it.

With words we don’t like much, we could say: if we conquer some currently marginal markets, e.g. people under a certain age or several countries, we can then have a sufficient critical mass to expand to the main market of a product.

Research is very lacking on this aspect: there was quite some research on predicting viability of FLOSS projects, but almost nothing on their i18n/l10n and even less on predicting their success compared to proprietary competitors, let alone on the two combined. However, an analysis of SourceForge data from 2009 showed that there is a strong correlation between high SourceForge rank and having translators (table 5): for successful software, translation is the “most important” work after coding and project management, together with documentation and testing.

Second. Even though translation must not be like programming, translation is a way to introduce more people in the contributor base of each piece of software. Eventually, if they become more involved, translators will get in touch with the developers and/or the code, and potentially contribute there as well. In addition to this practical advantage, there’s also a political one: having one or two orders of magnitude more contributors of free software, worldwide, gives our ideas and actions a much stronger base.

Practically speaking, every package should be i18n-ready from the beginning (the investment pays back immediately) and its “Tools”/”Help” menu, or similarly visible interface element, should include a link to a website where everyone can join its translation. If the user’s locale is not available, the software should actively encourage joining translation.

Arjona Reina et al. 2013, based on the observation of 41 free software projects and 22 translation tools, actually claim that recruiting, informing and rewarding the translators is the most important factor for success of l10n, or even the only really important one.

Exton, Wasala et al. also suggest to receive in situ translations in a “crowdsourcing” or “micro-crowdsourcing” limbo, which we find superseded by a wiki. In fact, they end up requiring a “reviewing mechanism such as observed in the Wikipedia community” anyway, in addition to a voting system. Better keep it simple and use a wiki in the first place.

Third. Extensive language support can be a clear demonstration of the power of free software. Unicode CLDR is an effort we share with companies like Microsoft or Apple, yet no proprietary software in the world can support 350 languages like MediaWiki. We should be able to say this of free software in general, and have the motives to use free software include i18n/l10n.

Research agrees that free software is more favourable for multilingualism because compared to proprietary software translation is more efficient, autonomous and web-based (Flórez & Alcina, 2011; citing Mas 2003, Bowker et al. 2008).

The obstacle here is linguistic colonialism, namely the self-disrespect billions of humans have for their own language. Language rights are often neglected and «some languages dominate» the web (UNO report A/HRC/22/49, §84); but many don’t even try to use their own language even where they could. The solution can’t be exclusively technical.

Fourth. Quality. Proprietary software we see in the wild has terrible translations (for example Google, Facebook, Twitter). They usually use very complex i18n systems or they give up on quality and use vote-based statistical approximation of quality; but the results are generally bad. A striking example is Android, which is “open source” but whose translation is closed as in all Google software, with terrible results.

How to reach quality? There can’t be an authoritative source for what’s the best translation of every single software string: the wiki way is the only way to reach the best quality; by gradual approximation, collaboratively. Free software can be more efficient and have a great advantage here.

Indeed, quality of available free software tools for translation is not a weakness compared to proprietary tools, according to the same Flórez & Alcina, 2011: «Although many agencies and clients require translators to use specific proprietary tools, free programmes make it possible to achieve similar results».

We are not there yet

Many have the tendency to think they have “solved” i18n. The internet is full of companies selling i18n/10n services as if they had found the panacea. The reality is, most software is not localised at all, or is localised in very few languages, or has terrible translations. Explaining the reasons is not the purpose of this post; we have discussed or will discuss the details elsewhere. Some perspectives:

Gettext is powerful but problematic (cf. the Gettext Localisation horror story, 1999).
Mozilla i18n and L20n has an unclear direction and a tendency to turn localisation into programming.
We know little of most proprietary software.
In the translatewiki.net intro, Translating the wiki way (video) and Localisation for developers (doc) we try to explain what matters for us and how we do things at translatewiki.net.

A 2000 survey confirms that education about i18n is most needed: «There is a curious “localisation paradox”: while customising content for multiple linguistic and cultural market conditions is a valuable business strategy, localisation is not as widely practised as one would expect. One reason for this is lack of understanding of both the value and the procedures for localisation.»

Can we win this battle?

We believe it’s possible. What above can look too abstract, but it’s intentionally so. Figuring out the solution is not something we can do in this document, because making i18n our general strength is a difficult project: that’s why we argue it needs to be in the high priority projects list.

The initial phase will probably be one of research and understanding. As shown above, we have opinions everywhere, but too little scientific evidence on what really works: this must change. Where evidence is available, it should be known more than it currently is: a lot of education on i18n is needed. Sharing and producing knowledge also implies discussion, which helps the next step.

The second phase could come with a medium term concrete goal: for instance, it could be decided that within a couple years at least a certain percentage of GNU software projects should (also) offer a modern, web-based, computer-assisted translation tool with low barriers on access etc., compatible with the principles above. Requirements will be shaped by the first phase (including the need to accommodate existing workflows, of course).

This would probably require setting up a new translation platform (or giving new life to an existing one), because current “bigs” are either insufficiently maintained (Pootle and Launchpad) or proprietary. Hopefully, this platform would embed multiple perspectives and needs of projects way beyond GNU, and much more un-i18n’d free software would gravitate here as well.

A third (or fourth) phase would be about exploring the uncharted territory with which we share so little, like the formats, methods and CAT tools existing out there for translation of proprietary software and of things other than software. The whole translation world (millions of translators?) deserves free software. For this, a way broader alliance will be needed, probably with university courses and others, like the authors of Free/Open-Source Software for the Translation Classroom: A Catalogue of Available Tools and tuxtrans.

“What are you doing?”

Fair question. This proposal is not all talk. We are doing our best, with the tools we know. One of the challenges, as Wasala et al. say, is having a shared translation memory to make free software translation more efficient: so, we are building one. InTense is our new showcase of free software l10n and uses existing translations to offer an open translation memory to everyone; we believe we can eventually include practically all free software in the world.

For now, we have added a few dozens GNU projects and others, with 55 thousands strings and about 400 thousands translations. See also the translation interface for some examples.

If translatewiki.net is asked to do its part, we are certainly available. MediaWiki has the potential to scale incredibly, after all: see Wikipedia. In a future, a wiki like InTense could be switched from read-only to read/write and become a über-translatewiki.net, translating thousands of projects.

But that’s not necessarily what we’re advocating for: what matter is the result, how much more well-localised software we get. In fact, MediaWiki gave birth to thousands of wikis; and its success is also in its principles being adopted by others, see e.g. the huge StackExchange family (whose Q&A are wikis and use a free license, though more individual-centred).

Maybe the solution will come with hundreds or thousands separate installs of one or a handful software platforms. Maybe the solution will not be to “translate the wiki way”, but a similar and different concept, which still puts the localisation in the hands of users, giving them real freedom.

What do you think? Tell us in the comments.

It rains like a saavi

About me, me and me

Tag Archives: gettext

GNU i18n for high priority projects list

What we believe

I18n is a recipe for success

We are not there yet

Can we win this battle?

“What are you doing?”

Review of Gettext po(t) file format