Do you want your text plain or with furries

Bug 8521 was filed today. It is about an old problem: text escaping in MediaWiki. We dump some of our user interface strings (messages) into the HTML output without escaping them. Some of the messages are parsed as wikitext; the rest are escaped and output as plain text.

Debugging view

There are various reasons why we are trying to get rid of unescaped output. One of them is that any sysop can edit those messages to contain anything from invalid HTML to code exploiting security holes in the browser. The latter isn’t much of a concern, because sysops should be trusted enough not to do evil things, and there is always common.js where they can do it anyway. Invalid HTML, however, is bad. If we ever get to the point where we output XHTML or XML, the code must be 100% valid, or the site doesn’t work at all. We don’t want a sysop to accidentally break the whole site with no easy way to repair it.
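To make the "invalid HTML breaks everything" point concrete, here is a minimal sketch in Python (not MediaWiki's actual PHP code). The function names and the sample message are made up for illustration. It wraps a message in a div, once raw and once escaped, and feeds the result to a strict XML parser, which is roughly what a browser does with XHTML.

```python
import html
import xml.etree.ElementTree as ET


def render_raw(message: str) -> str:
    """Hypothetical helper: drop the message into the output unescaped."""
    return f"<div>{message}</div>"


def render_escaped(message: str) -> str:
    """Hypothetical helper: escape &, < and > (and quotes) first."""
    return f"<div>{html.escape(message)}</div>"


def is_well_formed(fragment: str) -> bool:
    """Return True if a strict XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# A message a sysop might save by accident: the <b> tag is never closed.
broken = "<b>Save page"

print(is_well_formed(render_raw(broken)))      # False
print(is_well_formed(render_escaped(broken)))  # True
```

The escaped version is ugly (the reader sees a literal `<b>`), but the page still parses, and that is the trade-off we are arguing about.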

Unfortunately, fixing unescaped output isn’t that straightforward. The two main problems are backward compatibility and performance. Many people actively use complex HTML markup in some messages and consider it a feature. I think we removed the worst one: we provided [[mediawiki:edittools]] to use instead of [[mediawiki:summary]], though the latter is still unescaped. But there are many left. And from time to time, when we “demote” a message from HTML to wikitext or plain text and someone was using HTML in it, the user goes nuts and breeds a squad of terrorists. And if we add a wikitext message that is parsed by the parser on every page view, one of our fellow developers commits suicide, which we don’t want to happen :).

Actually, there is a third type of message (not counting unescaped ones), where we only parse the brace-thingies, like magic words. Now imagine you are a translator or a sysop customizing some UI message. Which format should you use in the particular message that you are currently modifying? I don’t have an answer to that question, sorry.
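To show why the translator's question is hard, here is a toy sketch in Python of the four formats side by side. It is emphatically not the MediaWiki parser. The `Format` enum, the `render` function, the `MAGIC_WORDS` table and the site name are all invented for this example, and the "wikitext" branch only understands `'''bold'''`. The point is just that the same source string comes out differently depending on a format the editor usually cannot see.

```python
import html
import re
from enum import Enum


class Format(Enum):
    RAW_HTML = "raw"       # dumped into the page as-is
    PLAIN = "plain"        # escaped, shown literally
    BRACES = "braces"      # only {{...}} is expanded, then escaped
    WIKITEXT = "wikitext"  # braces plus (toy) wiki markup


# Assumed lookup table; a real wiki has many more magic words.
MAGIC_WORDS = {"SITENAME": "ExampleWiki"}


def escape(text: str) -> str:
    # quote=False leaves apostrophes alone so '''bold''' survives.
    return html.escape(text, quote=False)


def expand_braces(text: str) -> str:
    # Replace known {{WORD}} patterns; leave unknown ones untouched.
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: MAGIC_WORDS.get(m.group(1), m.group(0)),
        text,
    )


def render(text: str, fmt: Format) -> str:
    if fmt is Format.RAW_HTML:
        return text
    if fmt is Format.PLAIN:
        return escape(text)
    expanded = escape(expand_braces(text))
    if fmt is Format.BRACES:
        return expanded
    # Toy wikitext: only '''bold''' is handled here.
    return re.sub(r"'''(.+?)'''", r"<b>\1</b>", expanded)


sample = "Welcome to '''{{SITENAME}}''' <i>now</i>"
for fmt in Format:
    print(f"{fmt.name:9} {render(sample, fmt)}")
```

Four formats, four different results from one string, and nothing in the message itself tells you which one applies.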

It’s been a while since I last wrote a longer piece of English, if you didn’t notice already :)
