The Signpost

Recent research

"Rise and decline" of Wikipedia participation, new literature overviews, a look back at WikiSym 2012

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, edited jointly with the Wikimedia Research Committee and republished as the Wikimedia Research Newsletter.

"The rise and decline" of the English Wikipedia

A paper to appear in a special issue of American Behavioral Scientist (summarized in the research index) sheds new light on the English Wikipedia's declining editor growth and retention trends. The paper describes how "several changes that the Wikipedia community made to manage quality and consistency in the face of a massive growth in participation have lead to a more restrictive environment for newcomers".[1] The number of active Wikipedia editors has been declining since 2007, and research examining data up to September 2009[2] has shown that the root of the problem is the declining retention of new editors. The authors show that this drop in retention is mainly due to a decline among desirable, good-faith newcomers, and point to three factors contributing to the increasingly "restrictive environment" they face.

Quality of newcomers over time. The proportion of newcomers falling into the two desirable quality classes is plotted over time.
Rejection of desirable newcomers. The proportion of desirable newcomers whose contributions were rejected in their first edit session is plotted over time.
Survival of desirable newcomers. The proportion of desirable newcomers continuing to edit for at least two months is plotted over time.

First, Wikipedia is increasingly likely to reject desirable newcomers' contributions, whether through reverts or deletions. Second, it is increasingly likely to greet them with impersonal messages; the authors cite a study showing that by mid-2008 over half of new users received their first message in a depersonalized format, usually as a warning from a bot or from an editor using a semi-automated tool.[3] They show a correlation between the growing use of various depersonalized tools for dealing with newcomers and the dropping retention of newcomers. The authors speculate that unwanted but good-faith contributions were likely handled differently in the early years of the project: unwanted changes were fixed and non-notable articles were merged. Startlingly, the authors find that a significant number of first-time editors ask about their reverted edit on the talk page of the affected article, only to be ignored by the Wikipedians who reverted them. Specifically, editors who use vandal-fighting tools like Huggle or Twinkle are increasingly less likely to follow the Wikipedia:Bold, revert, discuss cycle and respond to discussions about their reverts.

As a third factor, the authors note that the majority of Wikipedia rules were created before 2007 and have not changed much since. New editors thus face an environment in which they have little influence on the rules that govern their own behavior and, more importantly, on how others should behave toward them. The authors note that this violates Ostrom's third principle for stable local common pool resource management, because it excludes a group that is very vulnerable to certain rules from being able to influence them.

The authors recognize that automated tools and extensive rules are needed to deal with vandalism and manage a complex project, but they caution that the customs and procedures that have evolved are not sustainable in the long term. They suggest that Wikipedia editors could copy the strategy of distributed, automated tools that have proven so effective at dealing with vandalism (e.g. Huggle and ClueBot NG) to build tools that aid in identifying and supporting desirable newcomers (a task in which Wikipedia increasingly fails[4]). Further, they recommend that newcomers be given a voice, if only indirectly via mentors, in how rules are created and applied.

Overall, the authors present a series of very compelling arguments. The only complaint this reviewer has is that (even though three of the four were among the Wikimedia Foundation's visiting researchers for the Summer of Research 2011) they do not discuss the fact that the Foundation and the wider community have recognized similar issues, and have engaged in debates, studies, pilot programs and the like aimed at remedying them (see for example the WMF Editor Trends Study).
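
As a rough illustration of the survival measure described in the figure captions above, the following Python sketch computes, for each cohort year, the share of newcomers who are still editing at least two months after their first edit. It is a minimal sketch under stated assumptions, not the authors' method: the function name `survival_by_year`, the input layout (a dict mapping user names to lists of edit timestamps), and the reading of "two months" as 60 days are all assumptions made for this example, and the step that classifies newcomers as desirable is left out entirely.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def survival_by_year(edit_times_by_user, min_span=timedelta(days=60)):
    """Return {year of first edit: share of users who edit again
    at least `min_span` after their first edit}.

    ASSUMPTION: "surviving" means having any edit 60 or more days after
    the first one; the paper's exact definition is not given here.
    """
    survived = defaultdict(int)
    total = defaultdict(int)
    for times in edit_times_by_user.values():
        if not times:
            continue  # users without edits belong to no cohort
        first, last = min(times), max(times)
        total[first.year] += 1
        if last - first >= min_span:
            survived[first.year] += 1
    return {year: survived[year] / total[year] for year in sorted(total)}


# Hypothetical toy data, invented purely for illustration.
example = {
    "a": [datetime(2006, 1, 1), datetime(2006, 6, 1)],
    "b": [datetime(2006, 2, 1)],
    "c": [datetime(2008, 3, 1), datetime(2008, 3, 10)],
}
print(survival_by_year(example))
```

Grouping by the year of the first edit mirrors the "plotted over time" framing of the captions; a finer cohort size (for example, months) would only change the grouping key.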

Literature reviews of Wikipedia's inputs, processes, and outputs

Nicolas Jullien's "What we know about Wikipedia. A review of the literature analyzing the project(s)"[5] is an attempt at a "comprehensive" literature review of academic research on Wikipedia. Jullien works to distinguish his literature review from previous attempts like those of Okoli and collaborators (cf. earlier coverage: "A systematic review of the Wikipedia literature") and of Park, which tend to split the literature into three main themes: (1) motivations of editors to contribute and the relationship between motivation and contribution quality, (2) editorial processes and organization and their relationship to quality, and (3) the quality and reliability of production.

Jullien builds on this basic framework by Carillo and Okoli, but distinguishes his work from theirs in several ways. First, Jullien holds that previous work has focused too little on the outputs, which his analysis emphasizes more. Second and crucially, Jullien's review is not limited to material published in journals and, as a result, is more representative of fields like computer science, HCI, and CSCW, which publish many of their most influential articles in conference proceedings. Jullien does not consider articles on how Wikipedia is used, questions of tools and their improvement, or studies that only use Wikipedia as a database (e.g., to test an algorithm). Other than this, the study is not limited to any particular field. It covers articles published in English, French and Spanish before December 2011, mostly based on searches in Web of Science and Scopus (sharing the search query used in the latter). The review is structured around inputs, processes, and outputs.

In terms of inputs, Jullien considers cultural factors in the broader environment and questions of why people choose to participate in or join Wikipedia. In terms of process, he considers questions about the activities and roles of contributors, the social (e.g., network) structure of both the projects and the individuals who participate, the role of teams and the organization of people within them, the processes around editing, creation, deletion, and promotion of articles with a particular focus on conflict, and questions of management and leadership. In terms of outputs, the paper divides publications into studies of process, Wikipedia user experience, the external evaluation of Wikipedia articles, and questions of Wikipedia coverage.

A second recent preprint, by Taha Yasseri and János Kertész,[6] likewise gives an overview of broad areas of recent research about Wikipedia. Subtitled "Sociophysical studies of Wikipedia" and citing 114 references, it compares some of the authors' own results, e.g. on editing patterns (covered in several past issues of this research report, e.g.: "Dynamics of edit wars"), with existing literature. The review focuses on quantitative data-driven analyses of Wikipedia production, reproduces and reports a series of previous analyses, and extends some of the earlier findings.

After a detailed description of how Wikipedia works, the authors walk through several types of quantitative analyses of editing patterns on Wikipedia. They use "blocking" of edits to characterize "good" and "bad" editors and describe the different editing patterns of these groups. The authors show that editors, in general, tend to edit in a "bursty" pattern with long breaks, and that editing tends to follow daily and weekly patterns that vary by culture. They also walk through several approaches for classifying edits by type, and discuss the characterization of linguistic features with an emphasis on readability.
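
One common way to put a number on a "bursty" pattern is the burstiness parameter of Goh and Barabási, B = (σ − μ) / (σ + μ), where μ and σ are the mean and standard deviation of the gaps between consecutive events. The Python sketch below applies it to a list of edit timestamps. This measure is chosen here for illustration only; the summary above does not say which measure Yasseri and Kertész use, and the function name `burstiness` and the toy timestamps are hypothetical.

```python
from statistics import mean, pstdev


def burstiness(timestamps):
    """Goh-Barabasi burstiness B = (sigma - mu) / (sigma + mu)
    computed over the gaps between consecutive edits.

    B is -1 for perfectly regular editing, close to 0 for Poisson-like
    editing, and approaches 1 for highly bursty editing.
    ASSUMPTION: timestamps are plain numbers (e.g., seconds).
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    mu = mean(gaps)
    sigma = pstdev(gaps)  # population standard deviation of the gaps
    if mu + sigma == 0:
        raise ValueError("all timestamps are identical")
    return (sigma - mu) / (sigma + mu)


# Hypothetical toy data: evenly spaced edits versus two tight sessions
# separated by a long break.
regular = [0, 10, 20, 30]
bursty = [0, 1, 2, 3, 1000, 1001, 1002]
print(burstiness(regular), burstiness(bursty))
```

A single scalar like this ignores the daily and weekly rhythms mentioned above; those would need the timestamps binned by hour of day or day of week.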

Much of their article is focused on the issue of conflicts and edit warring. The authors pay particular attention both to the identification of conflicts and of controversial articles and topics and to characterizing the nature of edit warring itself. The paper ends with the description of an agent-based model of edit warring and conflict.

WikiSym 2012: overview report

The International Symposium on Wikis and Open Collaboration – "WikiSym 2012" – was held August 27–29 in Linz, Austria. The three-day conference featured research papers, posters and demonstrations, and open space discussion sessions. About 80 researchers and wiki experts from around the world attended.

WikiSym is an academic conference, now in its eighth year, that seeks to highlight research on wikis and open collaboration systems. This year’s WikiSym had a strong focus on Wikipedia research, with studies that ranged from analyzing breaking news articles on Wikipedia to looking at the behavior of Wikipedia editors and how long they stay active. In all, 17 papers focused on Wikipedia or MediaWiki, and the two keynotes also focused on Wikipedia research.

The first keynote session was given by Jimmy Wales, who discussed challenges for Wikipedia and potential research questions that matter to the Wikimedia community; Wales focused particularly on questions around diversity of the editing body, how to grow small language communities, and how to retain editors. The closing keynote was given by Brent Hecht, a researcher from Northwestern University, who spoke on techniques for making multilingual comparisons of content across Wikipedia versions, which in turn allows researchers to identify the potential cultural biases of various Wikipedia editions. Hecht found, for instance, that (looking at interwiki links across 25 languages) the majority of Wikipedia article topics appear in only one language; that the overlap between major language editions is relatively small; and that the depth of geographical representation varies widely by language, with a bias towards representing the country or place where that edition's language is prominent. Hecht also compared articles on the same topic across Wikipedias to see the degree of similarity between them. Hecht described his work as "hyperlingual", developing techniques to gain a broader perspective on Wikipedia by looking across language editions. His content comparison tool can be seen at the Omnipedia site, and the WikAPIdia API software he developed is available for download. (See also earlier coverage about Omnipedia: "Navigating conceptual maps of Wikipedia language editions")

In addition to the presented papers, some of which are profiled below, WikiSym has a strong tradition of hosting open space sessions in parallel with the main presentations, so that attendees can discuss topics of interest. This year’s open space topics included helping new wiki users; non-text content in wikis (including videos, images, annotations, slideshows and slidecasting); the future of WikiSym; Wikipedia bots; surveying Wikipedia editors; and realtime wiki synchronization and multilingual synchronization feedback. The conference closed with a panel session entitled "What Aren't We Measuring?", where panelists discussed and debated various methods for quantifying wiki-work (by studying editors, edits, and other metrics).

This year's WikiSym was hosted at the Ars Electronica Center in Linz, a "museum of the future" that hosts the Ars Electronica festival every year. The colorful, dramatic Ars Electronica building is in the heart of Linz, so outside of sessions conference attendees enjoyed exploring and socializing in the city center. The conference dinner was held at the Pöstlingberg Schlössl, which is accessed by one of the steepest mountain trams in the world.

WikiSym 2012 papers and poster and demonstration abstracts may be downloaded from the conference website. Next year’s WikiSym is planned for Hong Kong, just before Wikimania 2013. Updates on the schedule and important dates can be found on the WikiSym blog.

On the "Ethnography Matters" blog, participant Heather Ford looked back at the conference,[7] stating that "WikiSym is dominated by big data quantitative analyses of English Wikipedia", asking "where does ethnography belong?" and counting 82% of the Wikipedia-related papers as examining the English Wikipedia and only 18% as examining other language Wikipedias. A panel at WikiSym 2011 had called for broadening research to other languages (see last year's coverage: "Wiki research beyond the English Wikipedia at WikiSym").

WikiSym 2012 papers

The conference papers and posters included (apart from several that have been covered in earlier issues of this report):

"First Monday" on rhetoric, readability and teaching

First Monday, the veteran open access journal about Internet topics, featured three Wikipedia-themed papers in its September issue:

Briefly

References

  1. Halfaker, A., Geiger, R.S., Morgan, J. and Riedl, J. (2012), The Rise and Decline of an Open Collaboration Community, American Behavioral Scientist, forthcoming. (open access)
  2. http://strategy.wikimedia.org/wiki/Editor_Trends_Study
  3. Geiger, R. S., Halfaker, A., Pinchuk, M., & Walling, S. (2012). Defense Mechanism or Socialization Tactic? Improving Wikipedia's Notifications to Rejected Contributors. ICWSM.
  4. Musicant, D. R., Ren, Y., Johnson, J. A., & Riedl, J. (2011). Mentoring in Wikipedia: a clash of cultures. WikiSym 2011 (pp. 173–182).
  5. Jullien, N. (2012). What We Know About Wikipedia: A Review of the Literature Analyzing the Project(s). SSRN Electronic Journal. (open access)
  6. Yasseri, T., & Kertész, J. (2012). Value production in a collaborative environment. Physics and Society; Computers and Society; Data Analysis, Statistics and Probability. (open access)
  7. Ford, H. (2012) Where does ethnography belong? Thoughts on WikiSym 2012, Ethnography Matters (open access)
  8. Chen, C.-C. and Roth, C. (2012), {{Citation needed}}: The dynamics of referencing in Wikipedia, WikiSym '12 (open access)
  9. Faulkner, R., Walling, S. and Pinchuk, M. (2012), Etiquette in Wikipedia: Weening New Editors into Productive Ones, WikiSym '12 (open access)
  10. West, A.G. and Lee, I. (2012) Towards Content-driven Reputation for Collaborative Code Repositories, WikiSym '12 (open access)
  11. Schneider, J., Passant, A. and Decker, S. (2012) Deletion Discussions in Wikipedia: Decision Factors and Outcomes, WikiSym '12 (open access)
  12. Schneider, J. and Samp, K. (2012) Alternative Interfaces for Deletion Discussions in Wikipedia: Some Proposals Using Decision Factors. Demo, WikiSym '12, August 27–29, 2012, Linz, Austria. ACM 978-1-4503-1605-7/12/08. (open access)
  13. Wu, G., Harrigan, M. and Cunningham, P. (2012) Classifying Wikipedia Articles Using Network Motif Counts and Ratios, WikiSym '12 (open access)
  14. http://wikisym.org/ws2012/bin/download/Main/Program/p21wikisym2012.pdf
  15. http://wikisym.org/ws2012/bin/download/Main/Program/p15wikisym2012.pdf
  16. Famiglietti, A. The pentad of cruft: A taxonomy of rhetoric used by Wikipedia editors based on the dramatism of Kenneth Burke. First Monday [Online] (19 August 2012) (open access)
  17. Lucassen, T., Dijkstra, R. and Schraagen, J. M. Readability of Wikipedia. First Monday [Online] (20 August 2012) (open access)
  18. Konieczny, P. Wikis and Wikipedia as a teaching tool: Five years later. First Monday [Online] (25 August 2012) (open access)
  19. Biuk-Aghai, R. P. and Chan, R. C. K. (2012) Feeling the Pulse of a Wiki: Visualization of Recent Changes in Wikipedia, VINCI 2012, forthcoming (open access)
  20. Wu, J., & Iwaihara, M. (2012). Wikipedia Revision Graph Extraction Based on N-Gram Cover. In Z. Bao, Y. Gao, Y. Gu, L. Guo, Y. Li, J. Lu, Z. Ren, et al. (Eds.), Lecture Notes in Computer Science, Vol. 7419 (pp. 29–38). Berlin, Heidelberg: Springer Berlin Heidelberg. (closed access)
  21. Hougland, J. (2012) Reverting in Wikipedia, Wibidata blog (open access)
  22. Hahmann, S. and Burghardt, D. (2012), Investigation on factors that influence the (geo)spatial characteristics of Wikipedia articles (open access)
  23. Mesgari, M. and Faraj, S. (2012) Technology Affordances: The Case of Wikipedia, AMCIS 2012 (open access)
  24. Kint, M., & Hart, D. P. (2012). Should clinicians edit Wikipedia to engage a wider world web? BMJ (Clinical research ed.), 345, e4275. (closed access)
  25. Ford, H. (2012) Wikipedia Sources: Managing Sources in Rapidly Evolving Global News Articles on the English Wikipedia, SSRN, August 2012. (open access)
  26. Sultana, A., Hasan, Q.M., Biswas, A.K., Das, S., Rahman, H., Ding, C. and Li, C. (2012), Infobox Suggestion for Wikipedia Entities, 21st ACM International Conference on Information and Knowledge Management (CIKM '12) (open access)
  27. Knäusl, H., Elsweiler, D. and Ludwig, B. (2012) Towards Detecting Wikipedia Task Contexts, 2nd European Workshop on Human-Computer Interaction and Information Retrieval, August 2012 (open access)
  28. Walling, S. and Taraborelli, D. (2012), Is this thing on? Giving new Wikipedians feedback post-edit, Wikimedia Blog (open access)
  29. Casebourne, I., Davies, C., Fernandes, M. and Norman, N. (2012) Assessing the Accuracy and Quality of Wikipedia Entries Compared to Popular Online Alternative Encyclopaedias: A Preliminary Comparative Study Across Disciplines in English, Spanish and Arabic. (open access)
  30. Firer-Blaess, S. and Fuchs, C. (2012), Wikipedia: An Info-Communist Manifesto, Television & New Media, 12 September 2012 (closed access)
  31. Anderka, M. and Stein, B. (2012) Overview of the 1st International Competition on Quality Flaw Prediction in Wikipedia. In: Pamela Forner, Jussi Karlgren, and Christa Womser-Hacker (Eds.): CLEF 2012 Evaluation Labs and Workshop – Working Notes Papers, 17–20 September, Rome, Italy. ISBN 978-88-904810-3-1. ISSN 2038-4963. (open access)

Wikipedia:Wikipedia Signpost/2012-09-24/Recent_research