The Signpost

Recent research

Edit war patterns, deleters vs. the 1%, never used cleanup tags, authorship inequality, higher quality from central users, and mapping the wikimediasphere

Dynamics of edit wars

Controversy about Michael Jackson as quantified on the basis of reverted edits to his Wikipedia article. A: Jackson is acquitted on all counts after a five-month trial. B: Jackson makes his first public appearance since the trial to accept eight records from the Guinness World Records in London, including Most Successful Entertainer of All Time. C: Jackson issues Thriller 25. D: Jackson dies in LA.

"Dynamics of Conflicts in Wikipedia"[1] develops an interesting "measure of controversiality", a statistic that might interest editors at large if it were more widely popularized and dynamically updated. The paper analyzes patterns of edit warring over Wikipedia articles. The authors conclude that edit warriors are usually willing to reach consensus, and that the rare cases of never-ending warring are those that continually attract new editors who have not yet joined the consensus.

The authors' decision to exclude articles with under 100 edits from the study because they are "evidently conflict-free" is questionable. Some articles with fewer than 100 edits have been subject to clear, if not overly long, edit warring; a recent example is Concerns and controversies related to UEFA Euro 2012. It is also unfortunate that "memory effects" – a term mentioned only in the abstract and lead, and which the authors suggest is significant in understanding the conflict dynamic – is not explained in the paper. The term "memory", by itself, appears four times in the body, but is not operationalized anywhere.

A press release entitled "Wikipedia 'edit wars' show dynamics of conflict emergence and resolution" accompanied the paper. An MSNBC tech news headline summarized it sensationally and misleadingly as "Wikipedia is editorial warzone, says study".

Who deletes Wikipedia?

In a recent blog post by Wibidata, an analytics startup based in San Francisco, the authors set out to shed light on the often-quoted claim that most of Wikipedia was written by a small number of editors, noting other editorial patterns along the way.[2] Using the entire revision history of English Wikipedia (they wanted to show that their platform can scale), the authors looked at the distribution of edits across editor cohorts, grouped by total number of edits. They found that, by pure edit count, the most active 1% of editors had contributed over 50% of the total edits (see the original plot).

In response to the suggestion that the strongly skewed distribution of edits might just be due to a core set of editors who primarily make only minor formatting modifications, they looked at the net number of characters contributed by each editor. Grouping editors by total number of edits as before, they showed an even more strongly skewed distribution, with the top 1% contributing well over 100% of the total number of characters on Wikipedia (i.e. an amount of text that is larger than the current Wikipedia) and the bottom 95% of editors deleting more on average than they contributed (original plot). Next, the authors separated logged-in users from non-logged-in "users" (identified only by IP addresses) and recomputed the distribution of net character contributions. By edit-count cohort, logged-in users tended to contribute significantly more than their anonymous counterparts, and non-logged-in users tended to delete significantly more (original plot).

In summary, low-activity and new editors, along with anonymous users, tend to delete more than they contribute; this reinforces the notion that Wikipedia is largely the product of a small number of core editors.
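How a cohort can contribute "well over 100%" of the net text is easiest to see with a toy calculation. The following is a minimal sketch only, not Wibidata's pipeline: the function name contribution_shares, the (editor, net_chars) input format and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict

def contribution_shares(revisions, top_fraction=0.01):
    """Share of edits and of net characters due to the most active editors.

    `revisions` is an iterable of (editor, net_chars) pairs, one per edit,
    where net_chars is characters added minus characters removed.
    This input format is hypothetical, chosen for illustration.
    """
    edits = defaultdict(int)
    chars = defaultdict(int)
    for editor, net_chars in revisions:
        edits[editor] += 1
        chars[editor] += net_chars

    # Rank editors by edit count, most active first.
    ranked = sorted(edits, key=lambda e: edits[e], reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    top = ranked[:k]

    total_edits = sum(edits.values())
    total_chars = sum(chars.values())  # assumed non-zero in this sketch
    edit_share = sum(edits[e] for e in top) / total_edits
    char_share = sum(chars[e] for e in top) / total_chars
    return edit_share, char_share

# Toy data: one prolific editor adds text, the others mostly delete.
toy = [("a", 100)] * 6 + [("b", -50)] * 2 + [("c", -100), ("d", 0)]
edit_share, char_share = contribution_shares(toy, top_fraction=0.25)
print(edit_share, char_share)
```

In this toy data the most active editor makes 60% of the edits but accounts for 150% of the net characters, because the remaining editors remove more text than they add. That is the same arithmetic that lets a top cohort exceed 100% of the current text.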

In a paper published in the proceedings of *SEM, a computational semantics conference, researchers from the University of North Texas and Ohio University looked into the nature of interlingual links on Wikipedia, both reviewing the quality of existing links and exploring possibilities for automatic link discovery.[3] The researchers took the directed graph of interlingual links on Wikipedia and used the lens of set-theoretic operations both to structure an evaluation of existing links and to build a system for automatic link creation. For example, they suggest that the properties of symmetry and transitivity should hold for the relation of interlingual linking. This means that if there is an interlingual link from language A to B, there should also be a link from B to A, and if there is a link from language A to B, and from language B to C, then there should be a link from language A to C. (This assumption is routinely made by the many existing Interwiki bots.) They further refine the notion of transitivity by grouping article pairs by the number of transitive 'hops' required to connect a candidate article pair.
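The symmetry and transitivity properties can be illustrated with a small graph search. This is a minimal sketch under stated assumptions, not the researchers' system: articles are modelled as hypothetical (language, title) tuples, the function names are invented here, and candidates between two articles in the same language are skipped by assumption.

```python
from collections import deque

def reverse_candidates(links):
    """Links implied by symmetry: B -> A is missing although A -> B exists."""
    return {(b, a) for (a, b) in links if (b, a) not in links}

def transitive_candidates(links, max_hops=3):
    """Links implied by transitivity, keyed by pair, with the hop count.

    Runs a breadth-first search from every article that has outgoing
    links and keeps targets whose shortest path needs two or more hops
    (so no direct link exists), up to `max_hops`.
    """
    out = {}
    for a, b in links:
        out.setdefault(a, set()).add(b)

    candidates = {}
    for start in out:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if dist[node] == max_hops:
                continue  # do not expand beyond the hop limit
            for nxt in out.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for target, hops in dist.items():
            # Assumption: only pairs in different languages are candidates.
            if hops >= 2 and target[0] != start[0]:
                candidates[(start, target)] = hops
    return candidates

# Toy chain of links: en -> de -> fr -> es.
en, de, fr, es = [(lang, "Bonobo") for lang in ("en", "de", "fr", "es")]
links = {(en, de), (de, fr), (fr, es)}
print(sorted(reverse_candidates(links)))
print(transitive_candidates(links))
```

On this toy chain, symmetry proposes three reverse links, and transitivity proposes en to fr and de to es at two hops and en to es at three hops. Grouping candidates by hop count in this way mirrors the refinement described above, where pairs connected by longer chains can be evaluated separately.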

Their methodology revolves around the creation of a sizeable annotated gold data set. Using these labels, they first evaluated the quality of existing links, finding that between one third and one half fail their criteria for legitimate translations. They then evaluated the quality of various implied links. For example, reverse links that do not already exist satisfy their criteria for faithful translation only 68% of the time.

The gold data set was used to train a boosted decision-tree classifier for selecting good candidate pairs of articles. They used various network topology features to encode the information in interlingual links for a given topic and found that they can significantly beat the baseline, which uses only the presence of direct links (73.97% compared with 69.35% accuracy).

"Wikipedia Academy" preview

Various conference papers and posters from the upcoming "Wikipedia Academy" (hosted by the German Wikimedia chapter from June 29 to July 1 in Berlin) are already available online. A brief overview of those which are presenting new research about Wikipedia:

Posters

Researcher Felipe Ortega blogged[16] about a new parser for Wikipedia dumps, to be integrated into "WikiDAT (Wikipedia Data Analysis Toolkit) ... a new integrated framework to facilitate the analysis of Wikipedia data using Python, MySQL and R. Following the pragmatic paradigm 'avoid reinventing the wheel', WikiDAT integrates some of the most efficient approaches for Wikipedia data analysis found in libre software code up to now", which will be featured in a workshop at the conference.

Special issue of "Digithum" on Wikipedia research

The open-access journal "Digithum" (subtitled "The Humanities in the Digital Era") has published a special issue containing five papers about Wikipedia from various disciplines, with a multilingual emphasis (including research about non-English Wikipedias, and Catalan and Spanish versions of the papers alongside the English versions):

Briefly

The bonobo (here a juvenile) is amongst the species that the Flora and Fauna finder finds for Congo.

References

  1. ^ Yasseri, Taha; Sumi, Robert; Rung, András; Kornai, András; Kertész, János (2012). Szolnoki, Attila (ed.). "Dynamics of Conflicts in Wikipedia". PLOS ONE. 7 (6): e38869. arXiv:1202.3643. Bibcode:2012PLoSO...738869Y. doi:10.1371/journal.pone.0038869. PMC 3380063. PMID 22745683.
  2. ^ Who Deletes Wikipedia?, June 6, 2012.
  3. ^ Dandala, B., Mihalcea, R., & Bunescu, R. (n.d.). Towards Building a Multilingual Semantic Network: Identifying Interlingual Links in Wikipedia. Retrieved from http://ixa2.si.ehu.es/starsem/proc/pdf/STARSEM-SEMEVAL004.pdf
  4. ^ Maik Anderka, Benno Stein and Matthias Busse: On the Evolution of Quality Flaws and the Effectiveness of Cleanup Tags in the English Wikipedia (PDF)
  5. ^ Iolanda Pensa: The Power of Wikipedia: Legitimacy and Territorial Control (PDF)
  6. ^ Simeona Petkova: Individual and Cultural Memories on Wikipedia and Wikia, Comparative Analysis (PDF)
  7. ^ Alexander Mehler, Christian Stegbauer and Rüdiger Gleim: Latent Barriers in Wiki-based Collaborative Writing (PDF)
  8. ^ Bernardo Esteves and Henrique Cukierman: The climate change controversy through 15 articles of Portuguese Wikipedia (PDF)
  9. ^ Guillermo Garrido, Enrique Alfonseca, Jean-Yves Delort and Anselmo Peñas: "Extracting Wikipedia Historical Attributes Data" (PDF)
  10. ^ Fabian Flöck and Andriy Rodchenko: Whose article is it anyway? – Detecting authorship distribution in Wikipedia articles over time with WIKIGINI (PDF)
  11. ^ Moritz Braun: Here be Trolls: Motives, mechanisms and mythology of othering in the German Wikipedia community (PDF)
  12. ^ Carlos D'Andréa: Self-organization and emergence in peer production: editing “Biographies of living persons” in Portuguese Wikipedia (PDF)
  13. ^ Djordje Stakic: Biographical articles on Serbian Wikipedia and application of the extraction information on them (PDF)
  14. ^ Stephan Ligl: Wikipedia article namespace – user interface now and a rhizomatic alternative (PDF)
  15. ^ Marc Miquel-Ribé, David Morera-Ruíz and Joan Gomà-Ayats: Extensive Survey to Readers and Writers of Catalan Wikipedia: Use, Promotion, Perception and Motivation (PDF)
  16. ^ Ortega, Felipe: "Improving the extraction of Wikipedia data" libresoft.es, 2012-06-03
  17. ^ Marcia W. DiStaso, Marcus Messner: "Wikipedia’s Role in Reputation Management: An Analysis of the Best and Worst Companies in the United States" DIGITHUM, NO 14 (2012)
  18. ^ Antoni Oliver, Salvador Climent: Using Wikipedia to develop language resources: WordNet 3.0 in Catalan and Spanish
  19. ^ David Gómez Fontanills: "Panorama of the wikimediasphere"
  20. ^ Nathaniel Tkacz: "The Truth of Wikipedia"
  21. ^ Emilio José Rodríguez Posada, Ángel González Berdasco, Jorge A. Sierra Canduela, Santiago Navarro Sanz, Tomás Saorín: Wiki Loves Monuments 2011: the experience in Spain and reflections regarding the diffusion of cultural heritage. Digithum, no. 14 (May, 2012), p. 94.
  22. ^ Morton-Owens, E. G. (2012). A tool for extracting and indexing spatio-temporal information from biographical articles in Wikipedia. New York University.
  23. ^ "How Big Data Sees Wikipedia". 14 June 2012.
  24. ^ Vrandečić, D. (2012). "Ratio of language links to full text in Wikipedias" simia.net, June 2012
  25. ^ Qin, Xiangju; Cunningham, Pádraig (2012). "Assessing the Quality of Wikipedia Pages Using Edit Longevity and Contributor Centrality". arXiv:1206.2517 [cs.SI].
  26. ^ Hensel, T. (2012, March 11). Impact of duration of the search on the trust judgment of Wikipedia articles. Retrieved from http://essay.utwente.nl/61602/1/Hensel%2C_T.N.C.H._%2D_s0170860_(verslag).pdf

Wikipedia:Wikipedia Signpost/2012-06-25/Recent_research