A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
Improving Wikipedia articles may contribute to increasing local tourism. That's the result of a study[1] published as a preprint a few weeks ago by M. Hinnosaar, T. Hinnosaar, M. Kummer and O. Slivko. This group of scholars from various universities – including Collegio Carlo Alberto, the Center for European Economic Research (ZEW) and Georgia Institute of Technology – ran a field experiment in 2014: they expanded 120 Wikipedia articles about 60 Spanish cities and checked the impact on local tourism by measuring the change in the number of hotel stays in those cities by visitors from each country. The result was an average increase of 9 % (up to 28 % in the best cases). Articles about randomly chosen cities were expanded mainly by translating content from the Spanish or the English edition of Wikipedia into other languages, and by adding some photos. The authors wrote: "We found a significant causal impact of user-generated content in Wikipedia on real-life choices. The impact is large. A well-targeted two-paragraph improvement may lead to a 9 % increase in the visits by tourists. This has significant implications both in macroeconomic and microeconomic scale."
The study revises an earlier version[supp 1] which declared that the data was inconclusive (not yet statistically significant), although there were hints of a positive effect. It's not entirely clear to this reviewer how the statistical significance was ascertained, but the method used to collect data was sound:
Curiously, while the authors had no problems adding their translations and images to French, German and Italian Wikipedia, all their edits were reverted on the Dutch Wikipedia. Local editors may want to investigate what made the edits unacceptable: perhaps the translator was not as good as those in the other languages, or the local community is prejudicially hostile to new users editing a mid-sized group of pages at once, or some rogue user reverted edits which the larger community would accept? [PS: One of our readers from the Dutch Wikipedia has provided some explanations.]
Assuming that expanding 120 stubs by translating existing articles from other languages takes a few hundred hours of work and actually produces about 160,000 € in additional revenue per year, as estimated by the authors, it would seem to be a bargain for the tourism minister of any country to expand Wikipedia stubs in as many tourist languages as possible, also making sure they have at least one image, by hiring experienced translators with basic wiki editing skills. Given that providing basic information is sufficient and neutral text is generally available in the source/local language's Wikipedia, complying with the neutral point of view and other content standards should be easy enough.
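As a rough back-of-the-envelope check of this cost-benefit argument, the sketch below compares the authors' estimate of 160,000 € in additional yearly revenue with the cost of the translation work. The number of hours (300) and the hourly rate (30 €) are assumptions chosen purely for illustration; the review only speaks of "a few hundred hours" and names no rate.

```python
# Back-of-the-envelope cost-benefit sketch. Only the revenue figure comes
# from the study as reported above; hours and hourly rate are assumptions.

ADDITIONAL_REVENUE_EUR = 160_000   # per year, as estimated by the authors
ASSUMED_HOURS = 300                # hypothetical reading of "a few hundred hours"
ASSUMED_RATE_EUR_PER_HOUR = 30     # hypothetical translator rate


def translation_cost(hours: float, rate: float) -> float:
    """One-off cost of the translation work in euros."""
    return hours * rate


def benefit_cost_ratio(revenue: float, cost: float) -> float:
    """First-year additional revenue divided by the one-off cost."""
    return revenue / cost


cost = translation_cost(ASSUMED_HOURS, ASSUMED_RATE_EUR_PER_HOUR)
ratio = benefit_cost_ratio(ADDITIONAL_REVENUE_EUR, cost)
print(f"Assumed cost: {cost:,.0f} EUR; first-year benefit/cost ratio: {ratio:.1f}")
```

Under these assumed inputs the additional revenue exceeds the cost many times over, and the conclusion is not very sensitive to the exact rate: even a rate several times higher would leave the ratio above one.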
A paper at the upcoming OpenSym conference titled "An end-to-end learning solution for assessing the quality of Wikipedia articles"[2] combines the popular deep learning approaches of recurrent neural networks (RNN) and long short-term memory (LSTM) to make substantial improvements in our ability to automatically predict the quality of Wikipedia's articles.
The two researchers from Université de Lorraine in France first published on using deep learning for this task a year ago (see our coverage in the June 2016 newsletter), where their performance was comparable to the state-of-the-art at the time, the WMF's own Objective Revision Evaluation Service (ORES) (disclaimer: the reviewer is the primary author of the research upon which ORES' article quality classifier is built). Their latest paper substantially improves the classifier's performance to the point where it clearly outperforms ORES. Additionally, using RNNs and LSTM means the classifier can be trained on any language Wikipedia, which the paper demonstrates by outperforming ORES in all three of the languages where it's available: English, French, and Russian.
The paper also contains a solid discussion of some of the current limitations of the RNN+LSTM approach. For example, predictions currently take too long to compute for the classifier to be deployed in a setting such as ORES, where quick predictions are required. Also, unlike the deep learning approach, the custom feature sets that ORES uses allow for explanations of how to improve article quality (e.g. "this article can be improved by adding more sources"). Both are areas where we expect to see improvements in the near future, making this deep learning approach even more applicable to Wikipedia.
A recently published journal paper by Michail Tsikerdekis titled "Cumulative Experience and Recent Behavior and their Relation to Content Quality on Wikipedia"[3] studies how factors like an editor's recent behavior, their editing experience, experience diversity, and implicit coordination relate to improvements in article quality in the English Wikipedia.
The paper builds upon previous work by Kittur and Kraut that studied implicit coordination,[supp 2] where they found that having a small group of contributors doing the majority of the work was most effective. It also builds upon work by Arazy and Nov on experience diversity,[supp 3] which found that the diversity of experience in the group was more important.
Arguing that it is not clear which of these factors is the dominant one, Tsikerdekis further extends these models in two key areas. First, experience diversity is refined by measuring accumulated editor experience in three key areas: high quality articles, the User and User talk namespaces, and the Wikipedia namespace. Second, editor behavior is refined by measuring recent participation in the same three key areas. Lastly, he adds interaction effects, for example between these two new refinements and implicit coordination.
Using the more refined model of experience diversity results in a significant improvement over baseline models, and an interaction effect shows that high coordination inequality (few editors doing most of the work) is only effective when contributors have low experience editing the User and User talk namespaces. However, the models that incorporate recent behavior are substantial improvements, indicating that recent behavior has a much stronger impact on quality than overall editor experience and experience diversity. Again studying the interaction effects, the findings are that implicit coordination is most effective when contributors have not recently participated in high quality articles, and that contributors make a stronger impact on content quality when they edit articles that match their experience levels.
These findings raise important questions about how groups of contributors in Wikipedia can most effectively work together to improve article quality. Future work is needed to understand more about when explicit coordination is most useful, and the paper points to the possibility of using recommender systems to route contributors to groups where their experience level can make a difference.
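For readers less familiar with interaction effects, the following minimal sketch shows how an interaction term lets the effect of one factor (here, coordination inequality) depend on another (here, an experience measure). The linear form, the variable names and all coefficients are made up for illustration; they are not taken from the paper and do not reproduce its models.

```python
# Illustrative linear model with an interaction term. All coefficients are
# hypothetical; they are NOT estimates from Tsikerdekis's paper.

def predicted_quality_change(coordination: float, experience: float,
                             b0: float = 0.0, b_coord: float = 0.5,
                             b_exp: float = 0.2, b_inter: float = -0.4) -> float:
    """Prediction: b0 + b_coord*C + b_exp*E + b_inter*(C*E)."""
    return (b0 + b_coord * coordination + b_exp * experience
            + b_inter * coordination * experience)


def marginal_effect_of_coordination(experience: float, b_coord: float = 0.5,
                                    b_inter: float = -0.4) -> float:
    """Change in the prediction per unit of coordination inequality,
    which depends on experience because of the interaction term."""
    return b_coord + b_inter * experience


low = marginal_effect_of_coordination(experience=0.0)   # positive effect
high = marginal_effect_of_coordination(experience=2.0)  # effect turns negative
print(f"low experience: {low:+.2f}, high experience: {high:+.2f}")
```

With a negative interaction coefficient, as assumed here, coordination inequality helps only when the experience measure is low, which mirrors the shape of the finding reported for the User and User talk namespaces.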
"Automatic Classification of Wikipedia Articles by Using Convolutional Neural Network"[4] is the title of a paper published at this year's Qualitative and Quantitative Methods in Libraries conference. As the title describes, the paper applies convolutional neural networks (CNN) to the task of predicting the Nippon Decimal Classification (NDC) category that a Japanese Wikipedia article belongs to. This NDC category can then be used for example to suggest further reading, providing a bridge between the online content of Wikipedia and the books that are available in Japan's libraries.
In the paper, a Wikipedia article is represented as a combination of Word2vec vectors: one vector for the article's title, one each for the categories it belongs to, and one for the entire article text. These vectors combine to form a two-dimensional matrix, which the CNN is trained on. Combining the title and category vectors results in the highest performance, with 87.7% accuracy in predicting the top-level category and 74.7% accuracy for the second-level category. The results are promising enough that future work is suggested where these will be used for book recommendations.
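The input representation described above can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the four-dimensional vectors are hand-written stand-ins for real Word2vec output, and the function name, the fixed row count and the zero-padding are all assumptions made so that every article yields a matrix of the same shape.

```python
# Toy sketch: stack per-article embedding vectors into a 2-D matrix.
# The vectors are hand-written stand-ins for Word2vec output, and the
# zero-padding to a fixed row count is an assumption, not the paper's method.

from typing import List, Optional

Vector = List[float]
Matrix = List[Vector]


def build_article_matrix(title: Vector, categories: List[Vector],
                         text: Optional[Vector] = None,
                         max_rows: int = 5) -> Matrix:
    """Stack title, category and (optionally) text vectors as rows,
    truncating or zero-padding to exactly max_rows rows."""
    rows = [title] + list(categories)
    if text is not None:
        rows.append(text)
    dim = len(title)
    if any(len(r) != dim for r in rows):
        raise ValueError("all vectors must share the same dimension")
    rows = rows[:max_rows]
    rows += [[0.0] * dim for _ in range(max_rows - len(rows))]
    return rows


title_vec = [0.1, 0.3, -0.2, 0.5]
category_vecs = [[0.0, 0.2, 0.1, -0.1], [0.4, -0.3, 0.2, 0.0]]
matrix = build_article_matrix(title_vec, category_vecs)  # title + categories only
print(len(matrix), "rows x", len(matrix[0]), "columns")
```

The example call leaves out the article text vector, matching the title-plus-category combination that the paper reports as the best-performing input; a matrix like this would then be fed to the CNN.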
The work was motivated by "recent research findings [indicating] that relatively few students actually search and read books," and "aims to encourage students to read library books as a more reliable source of information rather than relying on Wikipedia article."
See the research events page on Meta-wiki for upcoming conferences and events, including submission deadlines.
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.