Wikipedia:Wikipedia Signpost/2011-05-16/Technology report

Technology report

Berlin Hackathon; April Engineering Report; brief news

Hackathon unites MediaWiki developers in Berlin

Group photograph of developers and contributors present at Berlin, including paid and volunteer techies

As previewed in last week's Signpost, the now-annual Berlin Hackathon was held this week in the German capital, hosted by Wikimedia Deutschland and attended by 100 participants (among them 32 Foundation employees). Predictably, the event received a large amount of digital coverage, including several postings on the Wikimedia Blog and a live video stream from which a large number of recordings were taken (album). It was accompanied by meetings of the Wiki Loves Monuments team, whose preparations for a Europe-wide photo contest include various software tasks, and of the Language committee.

Many of the topics featured in Friday's "lightning talks" will be familiar to regular readers of the Technology Report: development of a new parser that would make WYSIWYG editing much easier, for example, and the new Virginia data centre (which will be online from next week). They did, however, cover some new ground, including operational details. Mark Bergsma noted that the two data centres are unlikely to both channel traffic at the same time: instead, one will serve 90% of the traffic while the other will play a secondary, backup role, though the arrangement will be flipped at monthly intervals. WMF developer Neil Kandalgaonkar talked about a recent proof-of-concept hack combining MediaWiki with Etherpad-style realtime collaborative editing in the form of the "Hackpad" software (also described on Wikitech-l and in a blog post). There were also talks on the "Kiwix" offline reader (cf. previous Signpost coverage), Firefogg 2, and the Narayam extension, among others. In addition to these familiar themes, the line-up also included unfamiliar names, such as PhotoCommons (a new WordPress plugin) and qunit, a JavaScript testing suite that could open up interface testing to the crowd.
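The alternating arrangement Bergsma described can be pictured with a small sketch. It is illustrative only: the centre labels, the function name, the assumption that the secondary site takes the remaining 10%, and the rule that roles swap on every month boundary are all assumptions, not details given in the talk.

```python
# Hypothetical sketch of a 90/10 traffic split whose roles flip monthly.
# The labels "existing" and "virginia" and the flip rule are assumptions.

def traffic_split(month_index, centres=("existing", "virginia"), primary_share=0.9):
    """Return a {centre: share} mapping for the given month.

    Even-numbered months make the first centre primary, odd-numbered
    months make the second centre primary.
    """
    primary = centres[month_index % 2]
    secondary = centres[(month_index + 1) % 2]
    # Round to hide floating-point noise in 1 - 0.9.
    return {primary: primary_share, secondary: round(1 - primary_share, 10)}


for month in range(3):
    print(month, traffic_split(month))
```

In this sketch the primary role simply alternates; a real deployment would presumably switch over by operator decision rather than by calendar arithmetic.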

Saturday proved just as informative: Trevor Parscal reported the results of trialling left-aligned section edit links (116% more clicks and 9% more edits resulting); a tentative "end of July" release date was set for MediaWiki 1.18, which will bring gender intelligence and the ability to rotate photographs to MediaWiki.

Brion Vibber, who is leading the Foundation's efforts to rewrite the parser, laid down some of his ground rules, including "no new syntax" and a gap between the release of the new parser and its use on the majority of existing pages (see also mw:Future). According to a tentative timeline, Wikimania in August will see working demos, and in November some beta code will go live on Wikimedia sites in the form of opt-in gadgets.

There were, however, also heterodox opinions presented. Hannes Dohrn, speaking on behalf of the Sweble Wikitext Parser team, strongly advocated a move away from the familiar "apostrophe syntax" for bold and italic, along with further changes to the wikitext syntax. In a blog post on the same day, the Sweble team made an "offer to the Wikipedia community": "Come up with a new and better Wikitext and use the Sweble Wikitext parser to convert old Wikipedia content to that new format. ... We have spent more than one year full-time working on a parser that can handle the complexities of current Wikitext and it does not make sense to us to create another one."

With one of the WMF's two top product development projects (the parser) discussed on Saturday, on Sunday WMF User Interface Designer Brandon Harris gave a talk on the second: editor retention and improving the "-1 to 100 edit experience". More specifically, he outlined the thinking behind upcoming changes to make the "identity" of contributors more visible, e.g. by displaying affiliations, interests and roles, which it is hoped will strengthen communities by connecting members with tasks and collaborators corresponding to their interests. He acknowledged that "the big fear is that we are going to turn Wikipedia into Facebook", and promised that this wouldn't happen, but argued that there were things to learn from Facebook in successful community building. Harris predicted that some of the upcoming changes will be controversial, but concluded the talk by advising "it's time to achieve Zen acceptance that we are going to be adding these things, in order to save the projects".

Domas Mituzas, a paid developer for Facebook and a MediaWiki volunteer, spoke on improving performance, especially given the increasingly large number of actions that could seriously slow down larger wikis, e.g. changing widely used templates. A key goal is to reduce the burden of citation processing, which can occupy over 50% of the parsing workload for a given page. Mituzas emphasized the relation between performance and editor retention (sluggish website response times while editing can drive contributors away). Both he and (in the next talk) Tim Starling talked about HipHop integration, with its improvement in performance of up to 400%. Deployment is envisaged for late 2011 or early 2012.

Mark Bergsma gave an address (with slides) on Wikimedia's preparations for World IPv6 Day on June 8.

This year's motto was "talk less, code more", and, in addition to the talks, a significant amount of development work ("hacking") was completed: 65 bugs were squashed (see In brief below, for highlights), progress was made on localisation work, and a start was even made on one of the features most requested by Wikimedians, global watchlists. Work was also under way on an overhaul of the HTTPS access to Wikimedia sites (outlined last month by Brion Vibber and Ryan Lane).

The three-day event also included social elements such as parties and outings. The ongoing success of the Berlin hack-days is almost certain to create interest among other chapters, including the United Kingdom chapter, where proposals for a similar event have already been floated.

Further information: Friday blogpost, Friday notes, Saturday blogpost, Saturday notes, Sunday blogpost, Sunday notes, Monday blogpost.

April Engineering Report published

The Foundation's Engineering Report for April was published last week on the Wikimedia Techblog, giving a brief overview of all Foundation-sponsored technical operations in the last month. Given the ten-day publication lag this month, the updates provided on many of the major development threads have since been superseded by relevant talks at the Berlin Hackathon (see above). However, the report also gave details on a number of other projects not covered there. For example, it highlighted how work on the mobile projects has "taken off" and described the department's involvement in budgeting for the 2011-2012 fiscal year.

Also in the report was news of contractor Russ Nelson's work on improving the media storage and retrieval architecture with new software, "OpenStack Swift", which was deployed on three test servers. In April they started handling a small number of operations, though the result of this test was not given. The WMF also said that it had received the account details that it needed from Google to start taking up their offer of backup space, helped by a transition to a new, more powerful server for generating the dumps themselves. A new search indexer was installed last month, resolving the space issues experienced with the older server, and the Foundation's ability to monitor its uptime was also improved. It wasn't all plain sailing during April, however: according to the report, a click tracking trial with the Article Feedback tool had to be abandoned during the month, as did an upgrade to the Foundation's Squid software (see previous Signpost coverage). Work on keeping the code review backlog down was also unsuccessful, though a specific effort to handle shell bugs proved more effective.

The report added more detail on the new Article Feedback tool, which was recently expanded to cover 100,000 articles on the English Wikipedia, noting that, in a possible evolution of the tool, "users [would] be able to promote particularly relevant reviews to the talk page of the article reviewed" and that the tool would consider the credentials of the reviewer in question. The report also referred to a new LiquidThreads timeline, with its August release date, and gave details of significant work on the project by its sole developer, Andrew Garrett. At the same time, Roan Kattouw and Timo Tijhof started to work on specifications for a version 2 of the ResourceLoader. In addition, following discussions about the deployment of the Interlanguage extension to Wikipedia, designer Brandon Harris made several recommendations to the extension's developers. Also included was news from Wikimedia's commercial partners PediaPress, who have patched the current extension so that generated openZim exports (i.e. articles saved in a format suitable for offline browsing) now have a navigable table of contents. They also report that more than a thousand openZim files were downloaded in just one week of April. In addition, the PoolCounter extension for avoiding repeats of the "Michael Jackson" effect was successfully deployed at the second attempt.

In brief

Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for many weeks.

  • At the Berlin Hackathon, developers decided they knew of no remaining performance concerns with the use of email notifications for user talk pages, allowing its use on larger wikis (bug #5220). Because most editors will find it of use, it was enabled by default on the English Wikipedia and other projects, prompting some confusion when Wikipedia emails were sent to users for the first time.
  • Sergey Chernyshev was replaced by Tom Gries as the maintainer of the OpenID extension for MediaWiki, an extension still considered a possibility for Wikimedia sites if it proves to be scalable and useful (wikitech-l mailing list).
  • With the resolution of bug #28914, bots on the English Wikipedia are now automatically IP block exempt (but not Tor-block exempt).
  • A major flaw in the Titleblacklist, which blocks "meaningless" names such as Image.jpg, was fixed. The flaw had prevented Internet Explorer users from uploading files (bug #28918).
  • Bug #24037 was closed, with a tentative indicator of the number of bytes added or removed appended to Special:Contributions.
  • The website Embedly, a "platform for converting URLs into embeddable content", now returns thumbnails for images on Wikipedia (example) and Wikimedia Commons (example), making it easier to reuse content.
  • The "Your changes" box seen during edit conflicts is now read-only, preventing users from mistaking it for the box they should be typing in (bug #28287).
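As an illustration of the byte-change indicator mentioned for Special:Contributions (bug #24037), the idea can be sketched as the signed difference in page size between consecutive revisions. This is not MediaWiki's implementation: the function name, the input format and the display of zero as a plain "0" are assumptions made for the example.

```python
# Hypothetical sketch: signed byte changes between successive revisions.
# Input is a list of page sizes in bytes, oldest revision first; the
# first revision is compared against an empty page (0 bytes).

def size_deltas(revision_sizes):
    """Return a list of strings such as '+150' or '-50', one per revision."""
    deltas = []
    previous = 0
    for size in revision_sizes:
        diff = size - previous
        # Show an explicit sign for changes, and a bare 0 for no change.
        deltas.append("0" if diff == 0 else f"{diff:+d}")
        previous = size
    return deltas


print(size_deltas([1200, 1350, 1300, 1300]))
```

For the sample sizes above, the sketch reports a page creation of 1200 bytes, an addition of 150 bytes, a removal of 50 bytes and one edit that left the size unchanged.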