The Signpost

Recent research

Quantifying quality collaboration patterns, systemic bias, POV pushing, the impact of news events, and editors' reputation

Collaboration pattern analysis: Editor experience more important than "many eyes"

One of the motifs indicating article quality: One editor (top) having worked on several related articles (bottom)

A paper titled "Characterizing Wikipedia Pages Using Edit Network Motif Profiles"[1] by three researchers from University College Dublin indicates that the quality of a Wikipedia article can be predicted from characteristics of its "edit network" – a graph derived from the collaboration of Wikipedians in that area. Network motifs are small graphs which occur particularly frequently as sub-graphs of networks of a certain kind, and can be regarded as their building blocks in some sense. (The concept is popular in bioinformatics, where it is applied to gene regulatory networks.) In this paper, the authors use graphs with at most five nodes consisting of users and articles, which are connected by an edge if the user has edited the article – giving 17 possible "Wikipedia network motifs". (Anonymous users are disregarded.) For a Wikipedia article, the researchers form an "ego network" consisting of that article, articles which link to it (and have been edited by at least one of the users who edited the core article), and the users who edited them. For a sample of around 2000 articles from the History and United States categories, the frequencies of the 17 "Wikipedia network motifs" in those articles' "ego networks" were calculated.
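As a rough illustration of the construction described above (not the authors' code), the following Python sketch builds an ego network from a list of (user, article) edit pairs and counts one simple motif: a star with an editor at its centre and a given number of articles as leaves, one of which is the core article. The function names, the `links_to` mapping and the input format are assumptions made for this example; the paper's full set of 17 motifs is not reproduced here.

```python
from collections import defaultdict
from math import comb


def editors_of(edits):
    """Map each article to the set of registered users who edited it."""
    users_by_article = defaultdict(set)
    for user, article in edits:
        users_by_article[article].add(user)
    return users_by_article


def ego_network(core, edits, links_to):
    """Keep the core article plus every article that links to it and
    shares at least one editor with it; return the surviving edit pairs.

    `links_to[core]` is assumed to list the articles linking to `core`.
    """
    users_by_article = editors_of(edits)
    core_editors = users_by_article.get(core, set())
    kept = {core}
    for article in links_to.get(core, ()):
        if users_by_article.get(article, set()) & core_editors:
            kept.add(article)
    return [(user, article) for user, article in edits if article in kept]


def count_editor_stars(core, edits, leaves):
    """Count stars made of one editor and `leaves` articles they edited,
    where the core article is always one of the leaves."""
    articles_by_user = defaultdict(set)
    for user, article in edits:
        articles_by_user[user].add(article)
    total = 0
    for articles in articles_by_user.values():
        if core in articles:
            # Choose the remaining leaves among the editor's other articles.
            total += comb(len(articles) - 1, leaves - 1)
    return total


# Hypothetical toy data: three registered editors, five articles.
edits = [("alice", "A"), ("alice", "B"), ("alice", "C"),
         ("bob", "A"), ("bob", "D"), ("carol", "E")]
links_to = {"A": ["B", "D", "E"]}

ego = ego_network("A", edits, links_to)
print(count_editor_stars("A", ego, leaves=2))
```

Because the network is bipartite (edges only run between users and articles), any editor together with a set of articles they all edited forms a star, so a binomial coefficient is enough to count this motif; the other motifs would require a general subgraph enumeration.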

Using machine learning techniques, the researchers discerned with some certainty articles of basic quality (defined as having been assessed as Start class by Wikipedians) from those of good quality (defined as Featured or B class), solely based on this set of motif frequencies in the article's edit network. Looking at the impact of each of the 17 types separately, they found that "all network motifs have some potential to discriminate between good and basic Wikipedia articles" in the sample, but that among the four best predicting motifs, three are "stars with editors at their centre":

"This is interesting because it shows that many eyes is not really the defining characteristic of quality; instead experience is important – the editors should have worked on many other articles."

Another section of the paper constructs spatializations of the sample (i.e. a 2D mapping where articles with similar motif frequency are close to each other). For the history articles sample, this visualization clearly separated B class and Start class articles, but Featured articles are "more spread out", with two clusters on opposite sides of the diagram. The researchers made the interesting discovery that this seems related to the assessed importance of the articles:

"It transpires that the Featured Articles on the left are inclined to be low or mid importance compared to high or top importance articles on the right. This niche characteristic is emphasized by the fact that these articles are inclined not to have been featured on the Wikipedia main page. We conclude from this that, at least in edit network terms, some low importance Featured Articles look like more ordinary articles. ... It seems that articles on niche topics can reach Featured Article status without a huge amount of collaboration."

Systemic bias quantified for 20 Wikipedias

A paper titled "Cultural Configuration of Wikipedia: Measuring Autoreferentiality in Different Languages"[2] by two researchers from the Universitat Politècnica de Catalunya in Spain, published in the proceedings of the "Recent Advances in Natural Language Processing" conference and apparently based on the first author's master's thesis,[3] attempts to test the hypothesis that contributing to the visibility of one's own country- or language-related content is among the motivations to participate in Wikipedia. According to the authors, informal surveys in the Catalan-based Wikipedia association Amical Viquipèdia showed how these topics "are a focus of interest for writing and conflict". They propose the concept of autoreferentiality "to describe the interest of a culture [in] itself, which in WP translates to the interest of editors [in] their own local content in a WP language edition", and set out to measure it by various quantitative features. These features are defined on the article level and tested, using the Java-based WikAPIdia tool, on a selection of articles that are assumed to be "local content". (This set is formed by starting with a few keywords clearly pertaining to the local language, and adding articles that share categories. As examples from their own language, the authors list "“catalunya”, “català”, and also “valencia” or “mallorquí”" as start words, which "retrieve titles in articles and categories like escriptors de catalunya or dret català, referring to writers and law".) Among the tested quantitative features were:

The paper applies the eventual formula to Wikipedias in 20 languages – the English-language edition is excluded due to its size and the difficulty of processing it "in all dimensions", as are the second- and third-largest Wikipedias (German and French). In the final "autoreferentiality index", the Icelandic, Japanese and Swahili Wikipedias come out as the most locally focused among these 20, while, curiously, the Catalan edition, which prompted the research question, has the lowest autoreferentiality value.

Caused a "heavy editorial event" earlier this year: Elizabeth Taylor

Does "In the news"-like attention have a positive effect on article quality?

A five-page paper[4] by a Ph.D. student in Computer Science at the University of Iowa examines "The Impact of Heavy Editorial Events on Wikipedia Page Quality" – for example, the flurry of edits to the article Elizabeth Taylor after the actor's death in March 2011. To measure quality, the paper uses the approach of an earlier paper,[5] which assigns article contributors a reputation value depending on how many of their earlier contributions have been deleted, and by whom, and also takes into account whether the article revision in question was reverted later. The resulting formula was applied to "high editorial events" in 100 articles of the English Wikipedia, from the start of Wikipedia in 2001 until the beginning of 2010. As expected, the data supported the hypothesis that "high editorial events would contribute positively to a page's quality". The five articles impacted most positively among the studied sample (biased toward the beginning of the alphabet) were art, Allen Ginsberg, anarcho-capitalism, chiropractic and death. The paper also found that a higher increase in the edit rate was associated with a higher quality increase, but does not address the question of whether the relation could be explained by the mere number of edits (i.e. whether the same number of edits over a longer time might have had the same effect).

Detecting POV pushing editors

A working paper posted this month to arXiv with the title "Pushing Your Point of View: Behavioral Measures of Manipulation in Wikipedia" presents a method to score the neutrality of Wikipedia contributors and to "detect potential POV pushing behavior".[6] The authors propose two metrics to quantify an editor's involvement in controversial topics. The first metric (Controversy score or C-score) measures the amount of attention spent by an individual editor on controversial articles, where controversiality is defined on the basis of several quantitative factors previously established in the literature. The second metric (Clustered Controversy score or CC-score) quantifies the focus of an editor's attention on controversial articles on the same topic or very similar topics: the purpose of this metric is to tease apart editors involved in genuine controversy resolution (such as administrators who are likely to participate in a broad range of discussions on controversial topics) from "potentially manipulative users" who focus their attention on a narrow set of controversial topics. To assess the validity of these metrics, the authors test their discriminatory power at identifying which editors are blocked and which are regular users who were never blocked. The remainder of the paper examines the breakdown of edits by administrators immediately after a successful Request for Adminship. The results, based on qualitative coding by a single reviewer, suggest that some topical areas in the English Wikipedia (such as politics and media) are more likely to be frequently edited by administrators with a high C-score and CC-score than any other topical categories.
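To make the distinction between the two metrics concrete, here is a deliberately simplified Python sketch. It is not the formula from the paper: the per-article controversy values in [0, 1], the single topic label per article, and the use of a squared-share concentration measure to stand in for "clustering" are all assumptions chosen for illustration.

```python
from collections import defaultdict


def c_score(edit_counts, controversy):
    """Share of an editor's edits that went to controversial articles,
    weighting each article by an assumed controversy value in [0, 1]."""
    total = sum(edit_counts.values())
    if total == 0:
        return 0.0
    weighted = sum(n * controversy.get(article, 0.0)
                   for article, n in edit_counts.items())
    return weighted / total


def clustered_c_score(edit_counts, controversy, topic_of):
    """Scale the C-score by how concentrated the controversial attention
    is on few topics (1.0 = a single topic, lower = spread out).

    The concentration term is a sum of squared topic shares, used here
    only as a stand-in for the paper's clustering step.
    """
    attention = defaultdict(float)
    for article, n in edit_counts.items():
        attention[topic_of.get(article, article)] += n * controversy.get(article, 0.0)
    total = sum(attention.values())
    if total == 0:
        return 0.0
    concentration = sum((w / total) ** 2 for w in attention.values())
    return c_score(edit_counts, controversy) * concentration


# Hypothetical data: both editors spend 75% of their edits on controversial
# articles, but only the second stays within a single topic.
controversy = {"A": 1.0, "B": 1.0, "C": 1.0, "E": 1.0, "D": 0.0}
topic_of = {"A": "politics", "B": "religion", "C": "science", "E": "politics"}
broad_editor = {"A": 10, "B": 10, "C": 10, "D": 10}
narrow_editor = {"A": 20, "E": 10, "D": 10}

for name, counts in (("broad", broad_editor), ("narrow", narrow_editor)):
    print(name, c_score(counts, controversy),
          clustered_c_score(counts, controversy, topic_of))
```

In this toy setup the two editors are indistinguishable by the first score alone; only the clustered variant separates the editor who works across many controversial topics from the one who concentrates on a single one, which is the distinction the second metric is meant to capture.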

Historian of encyclopedias reviews Good Faith Collaboration

The most recent issue of Annals of Science (a scholarly journal about the history of science and technology, founded in 1936) contains a four-page review[7] of Joseph Reagle's book Good Faith Collaboration: The Culture of Wikipedia (published in 2010 and recently released on the Web under a CC-BY-NC-SA license). The reviewer Jeff Loveland, who has written extensively about the early history of encyclopedias, criticizes the book for having "one major weakness, namely in historical contextualization" (he mentions two 18th-century precedents which should have been given more attention, as they, like Wikipedia, intended to include contributions from the public: Vincenzo Coronelli's Biblioteca Universale and Zedler's Universal-Lexicon) – and rejects Reagle's claim that "historically, reference works have made few claims about neutrality as a stance of collaboration, or as an end result": "References to such values as impartiality, unbiasedness and objectivity are frequent in the prefaces of encyclopaedias over the last three hundred years". On the other hand, the reviewer praises the book for "com[ing] close to offering" a comprehensive introduction to Wikipedia, "touching as it does on nearly all aspects of the encyclopaedia" and he commends the author's writing style as "informal, energetic and appropriately paced". The "insightful and worthwhile" ethnography of Wikipedia is highlighted as the second success of the book.

Regarding Chapter 3 of the book, which postulates Neutral Point of View and Assume Good Faith as the two principles at "the heart of Wikipedia collaboration", the review recommends "Anne Goldgar’s study of conduct as a force binding together the early modern Republic of Letters in Impolite Learning (1995) [as] an interesting point of comparison" regarding "the historical connection between knowledge and civility". Commenting on Chapter 7, which examines criticism of Wikipedia, Loveland observes that "the portrayal by critics of a possible Wikipedian collective intelligence as anti-individualistic, or anti-rationalistic seems opportunistic and off-the-mark. Meanwhile, Wikipedia now bears the brunt of a refurbished but centuries-old accusation against encyclopaedias, namely that they trivialize and fragment knowledge."

Briefly

References

  1. ^ Wu, Guangyu, Martin Harrigan, and Pádraig Cunningham (2011). Characterizing Wikipedia pages using edit network motif profiles. In Proceedings of the 3rd international workshop on Search and mining user-generated contents - SMUC 2011, New York, NY, USA: ACM Press, October 28, 2011. DOI, PDF (open access)
  2. ^ Ribé, Marc Miquel, and Horacio Rodríguez (2011). Cultural Configuration of Wikipedia: Measuring Autoreferentiality in Different Languages. In Proceedings of Recent Advances in Natural Language Processing, 316–22. Hissar, Bulgaria. PDF (open access)
  3. ^ Ribé, Marc Miquel (2011). Cultural configuration of Wikipedia: Measuring autoreferentiality in different languages. Universitat Politècnica de Catalunya. PDF (open access)
  4. ^ Oliver, Corey (2011). The Impact of Heavy Editorial Events on Wikipedia Page Quality. PDF (open access)
  5. ^ Javanmardi and C. Lopes (2010). Statistical Measure of Quality in Wikipedia. In 1st Workshop on Social Media Analytics (SOMA ’10), July 2010. PDF (open access)
  6. ^ Das, Sanmay, Allen Lavoie, and Malik Magdon-Ismail (2011). Pushing Your Point of View: Behavioral Measures of Manipulation in Wikipedia. arXiv, November 8, 2011. PDF (open access)
  7. ^ Loveland, Jeff (2011). Review of: Good Faith Collaboration: The Culture of Wikipedia. Annals of Science 68 (4) (October): 555–558. DOI (closed access)
  8. ^ Wöhner, Thomas, Sebastian Köhler, and Ralf Peters (2011). Automatic Reputation Assessment in Wikipedia. In ICIS 2011 Proceedings. HTML (closed access)
  9. ^ Chen, Weiqin, and Rolf Reber (2011). Writing Wikipedia Articles as Course Assignment. In Proceedings of the 19th International Conference on Computers in Education, T. Hirashima et al. (Eds). Chiang Mai, Thailand. PDF (open access)
  10. ^ Crowston, Kevin, Nicolas Jullien, and Felipe Ortega (2011). Too Few New Wikipedians? Modelling Effort and Participation in Wikipedia. SSRN eLibrary. PDF (open access)
  11. ^ Sormunen, Eero, and Leeni Lehtio (2011). Authoring Wikipedia articles as an information literacy assignment – copy-pasting or expressing new understanding in one's own words? PDF (open access)
  12. ^ Deng, Yihan (2011). Change Tracking in Wikipedia. Master Thesis. PDF (open access)
  13. ^ LauraHale, Hawkeye7, Pine and others, "Mind the Gap(s)! Writing Styles of Female Editors on Wikipedia"

Wikipedia:Wikipedia Signpost/2011-11-28/Recent_research