Monthly Archives: March 2011 is happy

Many of the issues that have been annoying us all in have been fixed lately. To show my appreciation on behalf of I’d like to highlight these fixes.

Issue one: saving messages in talk pages fails. If you just pressed “Save” once, you got an error message about broken session data. The reply was saved only if you clicked “Save” a second time. I don’t know how many messages we lost due to this. I lost a couple because after replying to a thread, I went to do some other things. Many of my replies were delayed, because I didn’t notice immediately that the save failed. What was worse, usually one would have to scroll down the page to even see the error message! I’m very happy that it is fixed now. Many thanks to Andrew Garrett!

Issue two: portions of changes were not shown at all when viewing differences between two versions. Though not as annoying as the first item, this was still nasty and confusing. I submitted a test case for this bug in the wikidiff2 extension, and fortunately Tim Starling was able to reproduce it. Soon afterwards he committed a fix. Thanks Tim!

Issue three: message groups for projects which store all translations in a single file (like Pywikipediabot) were stuck in “has changes” status. This bug only annoyed the project leaders of After some encouragement Robert Leverington came up with a fix and found a serious bug in the code which determines whether there have been any changes to the messages. The fix affects all message groups. To Robert: good catch and big thanks.

Issue four: Microsoft® Translator, one of the translation services we use to suggest translations for our translators alongside Google Translate, Apertium and our own tmserver, is often incorrectly identified as being down. Brian Wolff and Sam Reed have helped to investigate the issue, but it is not yet fully fixed.

Finally many thanks to those who help us to keep running from day to day, you are many. A special thanks goes out to – Webhosting, vServer, Servermanagement who has provided us with their flagship product “vCloud 8000”, which allows us to serve our pages faster than ever before. We need lots of help with challenges that range from coding to writing and design. Don’t hesitate to ask us how you could help us!

Translation engines: black boxes

One would hope that using a machine translation system would be as easy as giving it a text and a pair of languages and getting something out. But at least here in things are pretty complex under the hood.

First of all, these translation engines are external systems which are based on huge corpora of translated texts and statistical methods. Translations are queried through HTTP requests. The Translate extension implements an algorithm which keeps track of failures and disables the whole service for some period. Failures can be error messages, timeouts or even failures to establish a connection. For example on recently moved to a new server which has somewhat unstable DNS resolution, which needs to be fixed.

Disabling serves multiple purposes. First of all, if the service is temporarily down, we waste neither our time nor theirs by trying. Secondly, if we hit some kind of rate limit (we shouldn’t), we can back off for a while.
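The failure tracking described above can be sketched roughly as follows. This is only an illustration in Python, not the code of the Translate extension: the class name `ServiceGuard`, the helper `query`, the threshold of three failures and the ten-minute suspension are all assumptions made for the example.

```python
import time


class ServiceGuard:
    """Counts consecutive failures of an external service and suspends it for a while."""

    def __init__(self, max_failures=3, cooldown_seconds=600, clock=time.time):
        self.max_failures = max_failures          # hypothetical threshold
        self.cooldown_seconds = cooldown_seconds  # hypothetical suspension period
        self.clock = clock                        # injectable so the logic can be tested
        self.failures = 0
        self.disabled_until = 0.0

    def is_available(self):
        return self.clock() >= self.disabled_until

    def report_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            # Too many failures in a row: stop asking the service for a while.
            self.disabled_until = self.clock() + self.cooldown_seconds
            self.failures = 0

    def report_success(self):
        self.failures = 0


def query(guard, fetch, text):
    """Return a suggestion, or None when the service is suspended or the call fails.

    `fetch` stands for whatever performs the HTTP request; in this sketch it is
    assumed to raise OSError for connection problems, timeouts and error replies.
    """
    if not guard.is_available():
        return None
    try:
        result = fetch(text)
    except OSError:
        guard.report_failure()
        return None
    guard.report_success()
    return result
```

While the guard is in its suspension period, `query` returns immediately without touching the network, which is what saves time on both sides and gives a rate limit a chance to reset.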

Then there is an issue with the contents: the engines like to mangle all things they don’t understand. In interface translation, with its many special characters and expressions, this is annoying. I just recently made some improvements here based on a suggestion from Jeroen De Dauw. The most common special syntaxes are now armored against changes. This includes variables like $1, %s or %foo% and some other things. Line breaks disappear too, but that was already worked around earlier.
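One way to picture the armoring is the sketch below. It is written in Python for illustration and is not the extension's actual implementation: the function names, the regular expression and the `__PH0__` token format are assumptions, and whether a given engine leaves such tokens untouched would have to be checked per engine.

```python
import re

# Placeholder styles named in the text: $1, %foo% and %s.
# The %foo% alternative comes before %s so that a name starting with "s" is matched whole.
PLACEHOLDER = re.compile(r"\$\d+|%[A-Za-z_]\w*%|%s")


def armor(text):
    """Replace each placeholder with a numbered token and remember the original."""
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return f"__PH{len(saved) - 1}__"  # hypothetical token format

    return PLACEHOLDER.sub(stash, text), saved


def unarmor(text, saved):
    """Put the original placeholders back into the translated text."""

    def restore(match):
        index = int(match.group(1))
        # Leave unknown tokens alone rather than failing on them.
        return saved[index] if index < len(saved) else match.group(0)

    return re.sub(r"__PH(\d+)__", restore, text)
```

The message is armored before it is sent to the engine and unarmored when the suggestion comes back, so the engine may reorder the tokens but the variables themselves come back unchanged.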
