In the last sprint I developed a translation memory server in PHP almost from scratch. Well, it’s not really a server. It’s run inside MediaWiki during client requests. It closely follows the logic of tmserver from translatetoolkit, which uses Python and SQLite.
The logic of how it works is pretty simple: you store all definitions and translations in a database. Then you can query suggestions for a certain text. We use string length and fulltext search to filter the initial list of candidate messages down. After that we use a text similarity algorithm to rank the suggestions and do the final filtering. The logic is explained in more detail in the Translate extension help.
PHP provides a text matching function, but we (Santhosh) had to implement pure PHP fallback for strings longer than 255 bytes or strings containing anything else than ASCII. The pure PHP version is much slower, although that is offset a little because it’s more efficient when there are fewer characters in a string than bytes. But more importantly, it works correctly even when not handling English text. The faster implementation is used when possible. Before we did some optimizations to the matching process, it was the slowest part. After those optimizations the time is now bound by database access. The functions implement the Levenshtein edit distance algorithm.
End users won’t see much difference. Wanting a translation memory on Wikimedia wikis was the original reason for reimplementing translation memory in PHP, and in the coming sprints we are going to enable it on wikis where Translate is enabled (meta-wiki, mediawiki.org, incubator and wikimania2012 currently). It is just over 300 lines of code  including comments and in addition there are database table definitions .
Now, having explained what was done and why, I can reveal the cool stuff, if you are still reading. There will also be a MediaWiki API module that allows querying the translation memory. There is a simple switch in the configuration to choose whether the memory is public or private. In the future this will allow querying translation memories from other sites, too.