The planned upgrade of MediaWiki, the software underlying WMF wikis, to version 1.17 failed last week (Wikimedia Techblog). Deployment was originally expected to begin at 07:00 UTC on February 8 (see previous Signpost coverage), but preparations took longer than anticipated and actual deployment began at around 13:00 UTC.
Several issues became apparent almost immediately. The parser cache miss rate almost doubled under the new version, at which point the Apache servers, which are responsible for delivering content to users, became overloaded and started behaving unpredictably. The increased load caused problems across the projects, ranging from increased lag to outright outages for some users. The deployment was then rolled back to the previous 1.16 release. The tech team investigated, resolved some technical issues and prepared for another attempt. A second attempt was made at 16:27 UTC, but this ran into similar performance issues and had to be called off 90 minutes later. Further attempts were put on hold.
Danese Cooper, Wikimedia's Chief Technical Officer, blogged about the failed deployment and explained what the Foundation had attempted to deploy:
“The 1.17 release process has been longer than we would have liked, which has meant more code to review, and more likelihood for accumulating a critical mass of problems that would cause us to abort a deployment.... [it] was an omnibus collection of fixes, including a large number of patches which had been waiting for review for a long time. The Foundation’s big contribution to the release was the ResourceLoader, a piece of MediaWiki infrastructure that allows for on-demand loading of JavaScript. Many other incremental improvements were made in how MediaWiki parses and caches pages and page fragments.”
After further investigation and several fixes to the release, Rob Lanphier, a developer with the WMF, added that "some of the unsolved issues are complicated enough that the only timely and reasonable way to investigate them is to deploy and react". As a result of this, he said, a new plan had been drawn up in which 1.17 will be deployed on "just a few wikis at a time". The tech team believes the problem was located in the configuration of the $wgCacheEpoch variable, which caused a more aggressive culling of the cache than the servers could handle (Wikimedia Techblog).
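To illustrate why a cache epoch setting can drive up the miss rate, here is a minimal Python sketch. It is not MediaWiki's code: the `CacheEntry`, `lookup` and `miss_rate` names and all timestamps are hypothetical. It assumes only the general behaviour of an epoch-style setting such as $wgCacheEpoch, namely that cached entries stored before the epoch are treated as stale and must be re-rendered.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    html: str
    stored_at: int  # Unix-style timestamp of when the page was rendered


def lookup(cache, key, cache_epoch):
    """Return cached HTML, or None if the entry is missing or predates the epoch."""
    entry = cache.get(key)
    if entry is None or entry.stored_at < cache_epoch:
        return None  # treated as a miss: the page would have to be re-parsed
    return entry.html


def miss_rate(cache, keys, cache_epoch):
    """Fraction of requested keys that cannot be served from the cache."""
    misses = sum(1 for key in keys if lookup(cache, key, cache_epoch) is None)
    return misses / len(keys)


# Ten hypothetical pages rendered at timestamps 1000, 1100, ..., 1900.
cache = {
    f"page{i}": CacheEntry(html=f"<p>{i}</p>", stored_at=1000 + i * 100)
    for i in range(10)
}
keys = list(cache)

# Moving the epoch forward invalidates more of the existing entries at once.
rate_old_epoch = miss_rate(cache, keys, cache_epoch=1200)
rate_new_epoch = miss_rate(cache, keys, cache_epoch=1800)
print(rate_old_epoch, rate_new_epoch)
```

In this toy setup, a later epoch turns most lookups into misses, and each miss stands for a page that the web servers would have to render again.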
The team decided on a two-stage deployment for its next attempt (reviving some old code for project-wise upgrading). The first phase took place 06:00–12:00 UTC on Friday, February 11. This was limited to the Simple English Wikipedia and Wiktionary; the Usability and Strategy Wikis; Meta; the Hebrew Wikisource; the English Wikiquote, Wikinews and Wikibooks; the Beta Wikiversity; and the Esperanto and Dutch Wikipedias.
At the time of writing, the deployment had been completed on all but the last two projects. The Hebrew Wikisource, included after a request from a community member, gave a chance to observe the deployment on a right-to-left language wiki. The team also reported some localization issues which triggered ParserFunction bugs on both nl.wikipedia.org and eo.wikipedia.org. The traffic from nl.wikipedia.org at the time was enough to cause a noticeable spike in CPU usage on the web servers, along with some time-out errors; deployment to nl.wikipedia.org therefore had to be delayed. After these issues are resolved, the second wave of deployment is expected to start on Wednesday, February 16 (see the current list of WMF wikis that are already running 1.17).
An IRC office hour Q&A was held on matters related to the ResourceLoader, which is expected to cause compatibility issues with some existing JavaScript code. Trevor Parscal and Roan Kattouw, the main developers of the ResourceLoader, were available on IRC on February 14 at 18:00 UTC to answer queries related to the new feature.
Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for many weeks.