Translation engines: black boxes | It rains like a saavi

One would hope that using machine translation system would be as easy as giving text and pair of languages in and getting something out. But at least here in translatewiki.net things are pretty complex under the hood.

First of all these translation engines are external systems which are based on huge corpora of translated texts and statistical methods. Translations are queried trough HTTP requests. The Translate extension implements an algorithm which keeps tracks of failures and disables the whole service for some period. Failures can be error messages, time outs or even failures to establish a connection. For example on translatewiki.net recently moved to a new server which has a bit unstable DNS resolution which needs to be fixed.

Disabling serves multiple purposes. First of all if the service is temporarily down, we don’t waste our nor their time trying. Secondly, if we hit some kind of rate limit (we shouldn’t) we can back off for a while.

Then there is a issue with the contents–the engines like to mungle mangle mingle all things they don’t understand. In interface translation with many special characters and expressions this is annoying. I just recently made some improvements here based on a suggestion from Jeroen De Dauw. The most common special syntaxes are now armored against changes. This includes variables like $1, %s or %foo% and some other things. Line breaks disappear too, but that was already worked around earlier.

-- Niklas Laxström.