GSoC wrap-up – Translate extension

GSoC is almost over now. Lots of cool things have happened, but unfortunately you may not be aware of them, because I have neglected to blog about them. That is definitely a regression compared to last year and something to keep in mind in the future. I managed to complete almost all tasks in the project plan with a priority higher than 4, with some rough edges here and there. Next I will pick some highlights from the completed tasks.

Improved usability

This year there were many usability-related tasks aimed at improving the translation workflow. The improvements made by the Wikimedia Usability project nicely complement my work, to the benefit of our less technically oriented audience. The most important improvement is probably the buzzword-compatible Ajax editing. Translators no longer need to open a new browser tab for each message they want to edit; instead they get a floating dialog inside the current page (implemented using jQuery dialog). This means they never need to leave the list of messages: it always stays in the background. It also makes it easier to do quick edits to message documentation or other languages, because you just get a new dialog, and once you have finished editing in it, you are back in the previous message.

Ajax edit interface

Other features include a user preference for choosing the additional languages to show when translating. The feature itself is not new, but now users can customise the list of languages.

Languages can be selected from the dropdown and using the button or typing in the language codes directly.

To date we have not really taken advantage of achievements in language technology. Now we have taken the first steps towards it by implementing a simple translation memory. It is a very simple setup, where we use tmserver from Translate Toolkit and fill it from time to time with existing translations from translatewiki.net. Tmserver uses the well-known Levenshtein algorithm to give suggestions. It isn’t very good, nor anything compared to state-of-the-art systems, but the suggestions have already been useful, as the translators themselves have told us. There are many ways to improve the suggestions, from better algorithms to using a larger set of translations as source data and preprocessing the source data (text alignment, case and punctuation normalisation). I’m looking forward to them.
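To make the idea concrete, here is a minimal Python sketch of how Levenshtein distance can rank translation memory suggestions. It is not tmserver's code: the `memory` mapping, the `suggest` function and the 0.6 threshold are hypothetical names and values picked only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def suggest(source: str, memory: dict, threshold: float = 0.6) -> list:
    """Return (score, translation) pairs for known sources similar to `source`.

    `memory` maps already translated source strings to their translations.
    The threshold is an arbitrary illustrative cut-off.
    """
    hits = []
    for known_source, translation in memory.items():
        score = similarity(source, known_source)
        if score >= threshold:
            hits.append((score, translation))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)
```

Calling `suggest("Save page", memory)` against a memory that already contains "Save the page" would return its translation as a suggestion, while unrelated strings fall below the threshold.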

Other changes

There were many improvements to the lesser-used features. Special features (magic words, special page aliases, namespaces) can now be exported using a script. No more time wasted on copy-pasting. In addition, it is now possible to localise magic words for extensions. It is up to translation teams to decide whether they want to do this understandably controversial thing.

Message checks at times produced false positives, which caused confusion among translators. Now there is a flexible system to suppress those warnings.

Gettext-style plurals are now better supported, but none of our Gettext projects uses them yet. Relatedly, there is now a special page for importing offline translations. We can now give trusted translators or users the permission to import offline translations, taking that work off the server admins. It supports downloading from a URL, files already uploaded to the wiki, and local file uploads.

The offline importer actually uses the same engine that I developed for another feature: web-based message group management. Project admins can now use their browser to import external changes and, if necessary, mark other changes as fuzzy. It is much easier than doing those steps manually on the command line, but there are still some practical problems to solve. One major piece still missing is integration with version control systems, so command line access is still needed to run svn up, or the equivalent for other systems. It is somewhat related to the other problem, which is the limited execution time for web requests. The importer is currently wise enough to check after every action whether we are near the limit, and if so, to stop further processing and give the user the ability to continue from that point. We can’t increase the execution time limitlessly, but there might be hope, for example by doing multiple requests with Ajax to spare the user from clicking the continue button many times.
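The stop-and-continue behaviour can be sketched roughly as follows. This is an illustration in Python, not the extension's code: `process_with_budget`, its parameters and the 20 second default are hypothetical.

```python
import time


def process_with_budget(actions, start_index=0, budget_seconds=20.0,
                        clock=time.monotonic):
    """Run actions[start_index:] until finished or the time budget is used up.

    Returns the index to continue from on the next request, or None when
    every action has been processed. The budget is checked after each action.
    """
    started = clock()
    for index in range(start_index, len(actions)):
        actions[index]()
        out_of_time = clock() - started >= budget_seconds
        if out_of_time and index + 1 < len(actions):
            return index + 1  # the caller offers "continue" from here
    return None
```

The returned index is what a "continue" link (or a follow-up Ajax request) would send back to the server, so each request stays within the limit while the work as a whole still completes.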

The future

There is always something to be done or something that can be improved. I will focus on improving the new web interface and group management, which is still quite immature. Ajax editing works, but without proper polishing it is still missing the cool factor. And as if that weren’t enough, Siebrand has collected a wish list for me. I will try my best to fulfil each request in my time, which is limited, especially now that the study year is starting again.

It will be interesting to see where we are next year. We are not alone any more, and while other platforms are developing, I want to keep translatewiki.net special – to give a face to internationalisation and localisation instead of being just a dumping ground for translations.

GSoC status report – Translate extension

Last year I participated in Summer Code Finland. During it I added many new features to the Translate extension to allow the biggest user of the extension, translatewiki.net, to grow bigger. And now translatewiki.net is indeed bigger. This year the project plan contains many tasks that aim to make the experience more pleasant for both the translators and the project admins. In addition there is a pile of bug fixes and i18n improvements to MediaWiki. I will tell more about those features when I finish them.

The first coding week is now in the past. The big task for that week turned out to be more difficult than estimated. It was about making certain things faster, mostly the generation of translation statistics. The cause of the slowness was fuzzy messages, that is, messages which have a translation that needs updating or reviewing. Information about fuzziness was stored as a text string in the message content itself. Now it is mirrored to another table, where it can be queried without loading the translation and checking for the fuzzy string. Thanks to everyone who helped with that.
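As a rough illustration of the mirroring idea, here is a self-contained Python sketch using SQLite. The table layout, the function names and the marker string are assumptions made for this example and do not describe the extension's actual schema.

```python
import sqlite3

# Assumed marker text, used here only for illustration.
FUZZY_MARKER = "!!FUZZY!!"


def create_schema(db):
    # Hypothetical tables: one for content, one mirroring the fuzzy flag.
    db.execute("CREATE TABLE translation (title TEXT, language TEXT, "
               "content TEXT, PRIMARY KEY (title, language))")
    db.execute("CREATE TABLE fuzzy_flag (title TEXT, language TEXT, "
               "PRIMARY KEY (title, language))")


def save_translation(db, title, language, content):
    """Store the translation and keep the mirror table in sync on every save."""
    db.execute("INSERT OR REPLACE INTO translation VALUES (?, ?, ?)",
               (title, language, content))
    if FUZZY_MARKER in content:
        db.execute("INSERT OR IGNORE INTO fuzzy_flag VALUES (?, ?)",
                   (title, language))
    else:
        db.execute("DELETE FROM fuzzy_flag WHERE title = ? AND language = ?",
                   (title, language))


def count_fuzzy(db, language):
    """Count fuzzy messages without loading any translation text."""
    row = db.execute("SELECT COUNT(*) FROM fuzzy_flag WHERE language = ?",
                     (language,)).fetchone()
    return row[0]
```

The cost moves to save time, where the content is already in memory, so the statistics query only touches the small flag table.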

Fortunately I managed to do some other tasks too. Siebrand is likely to be happy that he can export translations of MediaWiki’s namespaces, magic words and special page aliases with one command on the command line. That is, instead of using a web browser to request an export of one of those features for each language individually and pasting the results into the translation files. That should save some precious time for better use.

Stay tuned for the next status report! It may take a week or two, as I am planning a little holiday trip to Sottunga in Åland and I don’t expect to be connected very often.

Difficult things made easy – or how did it go

This post contains venting about SELinux and passwords; those not interested may skip it.

This week, after a couple of delays, I got a new laptop. This time the machine actually deserves to be called portable: it is a Thinkpad X200s, which could be described as widescreen, light, and with a battery that lasts. The previous machine met a rather unfortunate end.

Since then my time has gone nicely into displacement activities, such as revising my German with the help of a German-language operating system, and installing Linux. The distro this time is the loved and hated Fedora, number 10. Almost everything works, but some things still need polishing. The brightness buttons work, except that they do not change the brightness. Fortunately there is an easy workaround for that. Reportedly there are no drivers yet for the fingerprint reader, not that I really care. Something is wrong with Guidance, since it wants to put the machine to sleep every now and then, and its “click here if you don’t want this” ironically, of course, does not work.

So what is so difficult, then? Printing. The laser printer I use should work with the foo2qpdl driver, but their “the distros can’t get it right” attitude and installing packages by hand do not excite this writer. Fortunately the alternative is Samsung’s closed drivers, which can be conveniently downloaded from the net. Installation is easy and the printer is even set up already. So off to print! Or not: the output is such that nothing can be made out of the printouts. Then half an evening goes into wondering why it just won’t work. Finally the cause turns out to be SELinux. Then the other half of the evening goes into trying to fix this “security system”. Can you guess the result? Yes, down the drain it went.

Fedora ships with a utility that is brilliant in concept: it reports events that SELinux has blocked. Neat, isn’t it? Besides the fact that this lovely functionality requires waking the processor many times a second, it is not usable. The error messages are all Greek to me (in this case security-nerd-hippie English); I at least understand next to nothing of them, let alone a so-called ordinary user. The program offers a “fix command”, but fails to say in which directory it should be run. It is also possible to ask the program to run the command itself, but that function is greyed out in the menus. Why? After a little research it turns out that the users who are allowed to run these commands must be defined in the program’s configuration file. Users defined this way may run the commands with superuser rights. Neat, isn’t it? Maybe in the last century; nowadays PolicyKit and whatever background processes should make it possible to request privileges without this kind of tinkering. Oh, and by the way, it doesn’t work! In addition, deep in the configuration file there is a setting for whether those commands may be run at all. So let’s change that too and reboot, since there seems to be no other way to restart this fine program. Well, it still doesn’t work, and I already told you the end result.

When you have to start the computer many, many times during the day, anyone’s nerves will fray at having to type all those passwords. But surely technical progress has helped with this problem too and made it easier. Well, not in my case at least: I have to type six passwords or passphrases every time I start my machine.

  1. Unlocking the encrypted partitions
  2. Logging in
  3. GNOME’s keyring service for NetworkManager
  4. Unlocking the SSH key for the agent
  5. Firefox’s so-called master password
  6. KDE’s wallet service for Akregator’s feeds

On my previous machine the situation was almost as bad; only items 1 and 3 were missing. One could also imagine some users having a separate BIOS password on top of these. Moreover, almost all password prompts interrupt other work, at least in the program in question, until the password has been given. Firefox is particularly splendid in this respect, because it opens as many dialogs as it wants passwords. In other words, you have to wait until all the pages of the session have loaded, dutifully type the password into the newest dialog, and dismiss the rest with Enter.

So my request is: I can has integration KTHXBYE?

Drawing i18ned text in images.

A picture is worth a thousand words, but drawing a word can be harder than one expects.

Usually it is a good idea to avoid text in images, for multiple reasons. Foremost, text in images makes localisation hard: it requires tools, some skill in image manipulation, and manual work. Another benefit of avoiding it is that only one copy of the image needs to be stored.

In some cases it is unavoidable to use text in images. In other cases… it is just used for lesser reasons. In this post I will not talk about layout issues, like limited space and inflexibility in image size. In Betawiki we have hundreds of languages, many of which use poorly supported scripts.

The PHP GD library provides two functions for drawing text. imagestring can only draw text in Latin-2, so we can forget it immediately. The other one is imagettftext, which since PHP 5.2.0 allows the use of UTF-8. Great, now we can pass all the translations we have to it. The next problem is choosing a suitable font, since imagettftext specifically needs a path to one in its parameters. As we know, there is no font that covers all scripts, and it is too much work to manually map language codes to fonts and require everyone using the code to install just those fonts.

The only way to automatically choose a proper font for a language (script) code is fontconfig. I have written a wrapper, which calls the command line utilities of fontconfig to fetch the most suitable font. This does not solve the missing font problem, but if there is a suitable font in the system and fontconfig knows about it, it will be used. And yet, there are still problems, like wrong rotation for Japanese.
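The wrapper itself is not shown here, but the approach can be sketched in a few lines. The following Python is only an illustration under assumptions: the function names are hypothetical, and it relies on fontconfig's `fc-match` tool being installed, using its `-f` format option with the `%{file}` element to print the path of the best matching font.

```python
import subprocess


def build_fc_match_command(language_code):
    """Build the fc-match call for the font fontconfig prefers for a language."""
    # ":lang=xx" is a fontconfig pattern; "%{file}" prints the font file path.
    return ["fc-match", "-f", "%{file}", f":lang={language_code}"]


def find_font_for_language(language_code):
    """Return the path of the best matching font, or None if lookup fails."""
    try:
        result = subprocess.run(
            build_fc_match_command(language_code),
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # fc-match is missing or reported an error
    return result.stdout.strip() or None
```

The returned path is the kind of value a drawing function such as imagettftext needs. Note that fontconfig returns its best match even when no installed font really covers the script, so the missing font problem remains.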

The big question: is there any better way to do this?

Page translation + documenting = translated documentation???

Not yet at least. I was sick for a few days and worked mostly on page translation this week to get it working. I also wrote some more documentation, but it is not yet published. Wiki page translation should now work with some caveats, and it doesn’t yet have all the features I wanted. See a very simple example here.

It can now display the languages and approximately how complete and up to date their translations are. A suitable translation is not yet automatically selected for the user, but at least the user can now see which languages are available and view them, as opposed to the previous version.

This project ends in a week. It has been very nice, and I still hope I can recover a little from the problems encountered in this task. Let’s hope the summer doesn’t end this week too, even though I have already made my schedule for the next study year at the university.

Status update

This update is somewhat delayed, a bit too much even in my opinion. There have been some problems with the wiki page translation design I started with, like broken and complicated caching. I’m now trying a different approach, but I’ve already spent more time on this than the two weeks I had allocated for it. Tomorrow I have an exam, but I’ve planned to spend the rest of the week trying to get something usable out.

After that I will move on to the other items: two-way changes and documentation. If I can finish them quickly, I could resume working on wiki page translation if needed. In any case it looks like I won’t have time to work on the optional features.

New week – new task

The time for stats ended a little over a week ago. I added a few small features, like per-hour granularity and counting active translators instead of edits, and there is now a simple GUI that generates code which can be included in pages, for those who don’t care to remember the parameters.
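To show what the two counting options mean, here is a small Python sketch. It is only an illustration: the `(timestamp, user)` input format and the function name are hypothetical, not how the extension reads its data.

```python
from collections import defaultdict
from datetime import datetime


def active_translators_per_bucket(edits, granularity="hour"):
    """Count distinct users per hour or per day from (timestamp, user) pairs."""
    time_format = "%Y-%m-%d %H:00" if granularity == "hour" else "%Y-%m-%d"
    users_by_bucket = defaultdict(set)
    for timestamp, user in edits:
        # A set keeps each user once, however many edits they made.
        users_by_bucket[timestamp.strftime(time_format)].add(user)
    return {bucket: len(users)
            for bucket, users in sorted(users_by_bucket.items())}
```

Counting edits instead would simply replace the set with a counter, which is why one very busy translator can dominate an edit graph but not an active-translator graph.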

My current task is wiki page translation. I have been waiting for this task and I’m very excited to see what will come from it. The basic system should already work and it is being tested on an example page. That means it is possible to mark content for translation and translate it, and changes to the content will invalidate the translation. But as you can see, it is still missing a lot, for example language selection.

Status update: Statistics etc.

My progress on implementing nice statistics has been an on-off trip. Both MediaWiki and FreeCol are going to make releases soon. And then there are all kinds of bugs here and there that I feel obliged to fix. During the weekend I managed to fix a very bad memory leak where one of our scripts was using all of the server’s memory; after the fix it stays at a quite stable 30M. I really want to thank milian from #geshi for the help with using xdebug and his nice tools to identify the cause.

Gettext and Xliff: Nothing much here. I still haven’t tested msgmerge, so it remains to be seen how well it works.

Other features: Special page alias translation got a really big boost. Suddenly the number of supported extensions has grown to 23, and we have already “produced” hundreds of translations in many languages. Message formatting checks got some small improvements, and now that the leak is fixed, we can update those regularly too.

So let’s move on to the stuff I was meant to do: stats. Thanks to a friend who suggested using PHPlot, I have managed to make pretty good progress on this despite all the other stuff going on. I think I’m going to explain my progress using a few examples and some eye candy. Click the images to show full size versions if they are scaled.

First we have a graph showing the number of translation edits per day in Betawiki.

All translation edits in MediaWiki

It is also possible to compare projects:

Edits to MediaWiki and FreeCol compared

And then we have graphs in our portals:

Finnish translation edits

Or if you want to compare how your worst (best?) rival is doing much better than your language:

Comparison of Finnish and Swedish activity

Or do it only for one project:

Comparison of Finnish and Swedish activity for mobile broadband configuration assistant

We also have graphs in our project pages.

As you can see, the labels could use some polishing. There is no GUI for generating these, but it is easy if one knows the configuration parameters. It is possible to include them in pages with the special page inclusion syntax: {{Special:TranslationStats/language=xx;days=nn;group=id}} The size can also be changed with width and height parameters.
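For those who would rather not type the inclusion syntax by hand, a tiny helper can assemble it. This Python sketch is hypothetical and only follows the `name=value;name=value` pattern shown above; it does not check that the parameter names or values are valid.

```python
def translation_stats_inclusion(**parameters):
    """Build the special page inclusion wikitext from keyword parameters."""
    pairs = ";".join(f"{name}={value}" for name, value in parameters.items())
    return "{{Special:TranslationStats/" + pairs + "}}"


# Example with made-up values: Finnish, last 30 days, a group id "core".
wikitext = translation_stats_inclusion(language="fi", days=30, group="core")
```

The width and height parameters mentioned above would be passed the same way, as further keyword arguments.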

Every graph looks about the same visually. I kind of like it, but YMMV. If this feature turns out to be very popular, I will have to figure out how to do more aggressive caching. The data is fetched from Betawiki’s recent changes table. It means that external changes are not counted: one more reason to use Betawiki.

Bye bye Amarok

Amarok has been my favourite player ever since the amaroK days. Now, however, that is over. The reason is not the dropping of support for Postgres databases, nor version 2’s layout designed for widescreen displays. The reason is that Amarok cannot perform its essential task, that is, playing music, without me losing my temper and wanting to throw things.

Reason number one: playing remote collections over DAAP is unstable, and sometimes quite a lot gets cut off the end of tracks. Reason number two: playing remote collections over sshfs is unbelievably difficult, even on a local network. When the track changes automatically, Amarok plays a few seconds and then jumps straight to the next one. Changing the track by hand can take tens of seconds.

And reason number three finally made me give it up. If Amarok is, so to speak, stuck grinding away at something of its own, which it often is, using it through global shortcuts, for example, makes the whole of KDE stop responding to keyboard input. Unbelievable, isn’t it? And this bug has been around for ages. Worst of all, it is possible to get the mouse stuck too, so that pressing a button produces only the window-move cursor. Wonderful! The options then are 1) kill X or 2) kill the machine; in other words, I lose my session even though I never entered the draw. Sometimes the hang passes after a moment, sometimes not.

I will certainly miss Amarok’s superb playlist management and its good Finnish translation (having sacrificed a lot of time on it myself). Still, sometimes it is time to move on. Whereas others change distros more often than ties, I have been a Gentoo user from the start and still am. And for the time being I will stay one, even though it is not exactly doing brilliantly right now.

Suggestions for alternative players are welcome.

Localisation of images

While fixing bugs I remembered an old feature request for localising images. One image may be worth a thousand words, but what if those words are in a foreign language? Now it is possible to replace anglocentric images in the user interface with localised ones. I use this opportunity to add some images to my pretty boring blog entries :)

So here is the current default toolbar in MediaWiki’s edit view:

Here is the same when using Arabic as the user interface language:

And one more example, which is for Belarusian (Taraškievica orthography):