A Wikipedia researcher has discovered that the encyclopedia's widely used article traffic statistics are missing out on approximately one-third of total views.
Computer scientist Andrew West has found that mobile readers are not counted by stats.grok.se (an unofficial website linked from the "history" tab on every Wikipedia page) or any other service/report that tabulates and visualizes the Wikimedia Foundation's official raw data. Thanks to a historical artifact, desktop and mobile counts have been segregated since the figures were first released in 2007. "The world has changed a lot since the original code was written," the WMF's director of analytics Toby Negrin told the Signpost. "We are working hard to catch up."
Of 9.5 billion total views to English Wikipedia in August 2014, about 3 billion—31.6%—are not reported in the raw per-article statistics. Other projects are assumed to have similar omissions based on their own mobile viewership ratios.
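The arithmetic behind that share can be checked with a short sketch. The function names below are hypothetical, and the inputs are the rounded figures quoted above, so the result is only as precise as those figures. The second function assumes, purely for illustration, that the site-wide mobile share applies evenly to any given desktop-only count; in practice the share is likely to vary from article to article.

```python
def unreported_share(total_views: float, unreported_views: float) -> float:
    """Fraction of all views that the per-article statistics leave out."""
    return unreported_views / total_views


def estimate_total(reported_views: float, share: float) -> float:
    """Scale a desktop-only count up to an estimated total.

    Assumption (not from the article): the site-wide unreported share
    applies uniformly to the count being scaled.
    """
    return reported_views / (1.0 - share)


# Rounded figures for English Wikipedia, August 2014.
share = unreported_share(9.5e9, 3.0e9)
print(f"{share:.1%}")  # 31.6%

# The 6.5 billion reported views scale back to roughly 9.5 billion.
print(f"{estimate_total(6.5e9, share):.2e}")
```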
West told the Signpost he ran into the problem when collating view statistics for the English Wikipedia's Medicine WikiProject. The figures are being used in an upcoming academic paper comparing Wikipedia to WebMD, the World Health Organization, the National Institutes of Health, and other high-traffic medical websites. West caught the error early enough to add a disclaimer, but he's "curious and fearful as to how many other WikiProjects and researchers might have fallen into the same trap."
Unfortunately, that number is not zero. In one recent example, Variety's new "Digital Audience Ratings" use Wikipedia's traffic statistics as a key cog. Jason Klein of ListenFirst, the company writing the posts, said in an interview with Lost Remote that "We have been monitoring Wikipedia page views daily for tv shows (as well as films and consumer brands) for over two years, and have found fascinating trends ..." (Editor's note: for additional information, please see this week's "In the media".)
Similarly affected are the English Wikipedia's top 25 viewed articles (ten of which are used in the Signpost's weekly "Traffic report"). All of these initiatives are missing out on what West calls the mobile "bump" that popular culture and breaking-news events kindle.
The largest ramification may be reserved for users in the global south, where a higher percentage of individuals use mobile phones to surf the web. High-priced traditional computers can be out of reach for large segments of the population, who have turned instead to smartphones; this was a chief inspiration for the Wikimedia Foundation's Wikipedia Zero project. Pgallert laid out the scope of the computer issue on the Wikimedia blog last year:
“The computer lab of Epukiro Post 3 Junior Secondary School in the Omaheke Region of Namibia is idle most of the time. School management is afraid that equipment might be stolen or the infrastructure be damaged, and the Ministry of Education ... did not offer any training on how to operate the computers, or what to use them for. As a result, there are typing classes a few times a week, and nothing else. During school breaks the lab is not used at all. The computers occupy one entire classroom; their power consumption is a liability for the school. ... Yet Epukiro Post 3 JSS houses the only computer lab in the entire rural settlement cluster of Epukiro, an area accommodating several thousand people and covering thousands of square kilometers.”
Negrin told us that the team is aware of the problem and is working to replace the current apparatus with a "modern, scalable system," which will come out in a preliminary form next quarter. The team is also working on redefining what a "page view" is, taking modern concepts like mobile apps and web, API requests, and automated bots into account. Negrin added that "fortunately, we'll be in a position very soon to provide more accurate data to the Foundation and the Community."
The work involved in this is not negligible. As research analyst Oliver Keyes wrote to us, "The overall page view trends are of increasing importance to how we understand how people consume our site. At the moment we ... have a lot of ideas and a lot of the nuts and bolts worked out and tested, but it's fairly inchoate and needs to be organised better before we do anything with it. Once we have done that, we'll move on to implementing it and running it in parallel to the existing infrastructure to detect irregularities."
In the meantime, the unofficial status of grok.se (it is still listed as a "beta service") and the varying reliability of the WMF's data dumps leave researchers like West in the lurch. For example, grok.se periodically misses full days of stats (such as 28 August), which invariably leads to frustration with the website's coder, Henrik—but the issue lies with the WMF-released data. In the example, the traffic statistics for five hours (UTC 16:00–21:00) are missing.
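Gaps like this one can be spotted before any totals are computed. The following is a minimal sketch, assuming the researcher already has a set of the hourly timestamps for which data was found; how those timestamps are collected from the dumps is left out, and the function name is hypothetical. It also assumes that "16:00–21:00" means the five hourly slots starting at 16:00 through 20:00 UTC.

```python
from datetime import datetime, timedelta


def missing_hours(day: datetime, present: set) -> list:
    """Return the hourly slots of `day` that are absent from `present`."""
    expected = [day + timedelta(hours=h) for h in range(24)]
    return [ts for ts in expected if ts not in present]


# Illustrative data: every hour of 28 August 2014 except 16:00 to 20:00 UTC.
day = datetime(2014, 8, 28)
present = {day + timedelta(hours=h) for h in range(24) if not 16 <= h < 21}

gaps = missing_hours(day, present)
print([ts.hour for ts in gaps])  # [16, 17, 18, 19, 20]
```

A daily total built from a day with gaps could then be flagged as incomplete rather than reported as an unusually quiet day.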
It appears that statisticians, researchers, and curious Wikimedia contributors will have to wait only a little longer for a more stable and reliable solution.
Discuss this story
Possible typo in image caption: "new manage" --> "new manager" ? --Hispalois (talk) 05:45, 18 September 2014 (UTC)
I am not Pranav Curumsey