The Signpost

Recent research

Top female Wikipedians, reverted newbies, link spam, social influence on admin votes, Wikipedians' weekends, WikiSym previews

What the most active female editors contribute

A paper addressing gender imbalance in Wikipedia ("Gender differences in Wikipedia editing") by Judd Antin and collaborators won the "Best Short Paper" award at WikiSym 2011.[1] This follows the "Best Full Paper" award given to another study on the gender gap,[2] already covered in previous editions of the research newsletter. The study by Antin and collaborators sampled 256,190 users who created a new account on the English Wikipedia between September 2010 and February 2011, and qualitatively coded their contributions by category of wiki work. The results suggest that in the lower three quartiles by activity level, men and women make roughly the same contributions in each category of wiki work. In the top quartile, however, their editing behavior differs significantly. The researchers found that among the top 25% of Wikipedians by activity level:

Effects of reverts on wiki work

Two plots show the changes in an editor's boldness after being reverted.

Another WikiSym 2011 paper by GroupLens researchers, including Summer of Research fellow Aaron Halfaker ("Don’t bite the newbies: how reverts affect the quantity and quality of Wikipedia work"), reports on how reverts affect the quality and quantity of Wikipedia editors' work, with a specific focus on newbies.[3] The study uses several key metrics.

To assess the quality of editors' contributions, it uses reverts per revision and Persistent Word Revisions (PWR). PWR measures how well the words an editor adds, excluding stop words, survive across subsequent revisions. To measure changes in editor activity, it uses a controlled activity delta. This calculates the variation in an editor's activity across weeks relative to the week preceding the revert, normalized by the editor's daily rate of activity.

The results point both to the important role of reverts as a learning and quality-improvement process and to their negative effects on new contributors. Below are highlights from this study:

These results are consistent with the findings by Summer of Research fellows on the effects of community interactions with new Wikipedians.
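The PWR metric mentioned above can be illustrated with a minimal sketch. It makes three assumptions: revisions are already tokenized into word lists, the stop-word list is a small hypothetical one, and a word's persistence counts later revisions until the word first disappears. It is a simplified illustration, not the paper's implementation, which may match and track words differently.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def added_words(before: list[str], after: list[str]) -> Counter:
    """Non-stop-word tokens present in `after` but not in `before` (multiset difference)."""
    diff = Counter(after) - Counter(before)
    return Counter({w: n for w, n in diff.items() if w.lower() not in STOP_WORDS})


def persistent_word_revisions(revisions: list[list[str]], i: int) -> int:
    """Sum, over words added in revision i, of how many later revisions still contain them.

    Counting for a word stops at the first later revision where it is gone;
    in this sketch, copies removed once are never counted again.
    """
    previous = revisions[i - 1] if i > 0 else []
    total = 0
    for word, count in added_words(previous, revisions[i]).items():
        for later in revisions[i + 1:]:
            present = min(count, Counter(later)[word])
            if present == 0:
                break
            total += present
            count = present
    return total
```

Take the revisions ["the", "cat"], ["the", "cat", "sat", "quietly"], ["the", "cat", "sat"] and ["the", "dog", "sat"]. The second edit added "sat" and "quietly". "sat" survives both later revisions and "quietly" survives neither, giving that edit a score of 2.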

Further Wikipedia coverage at WikiSym 2011: Social dynamics and global reach

Geographic location of edits to the English Wikipedia article "2011 Egyptian revolution"

The "Wiki tools and interfaces" session at WikiSym will feature a paper titled "Autonomous link spam detection in purely collaborative environments". According to its five authors from the University of Pennsylvania, link spam is currently "an annoying, but non-pervasive issue". It could, however, become a grave threat to Wikipedia if new spam techniques, which some of the authors explored in another paper (see below), become more widespread.

The researchers used STiki, software by one of the authors that is already widely used as an anti-vandalism tool on the English Wikipedia. With it, they collected mainspace edits that added external links and extracted a corpus of 5,962 link additions classified as either ham or spam. Labeling criteria included whether the edit had been rolled back (indicating spam) or whether the link had been added by a user with rollback rights (indicating ham).

From this corpus, the researchers derived numerous features indicating link-spamming behavior, in three areas:
- on-wiki evidence, including very simple metrics such as the URL's length (spam links tend to be shorter) and the fact that older and more popular articles are more likely to be targeted;
- properties of the landing page the link points to, which were found to be less useful;
- classifications from third-party sites, including Alexa and Google Safe Browsing.

The backlink data provided by Alexa proved most useful for the classifier that the authors went on to build and test in a live implementation in STiki. They conclude that "it is clear this work will benefit the Wikipedia community".
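The corpus labeling heuristic and a few of the on-wiki features described above can be sketched as follows. The `LinkAddition` record, its field names and the chosen features are hypothetical illustrations. The paper's actual feature set is larger and also draws on landing-page properties and third-party data such as Alexa, none of which is modeled here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
from urllib.parse import urlparse


@dataclass
class LinkAddition:
    """Hypothetical record of one mainspace edit that added an external link."""
    url: str
    article_created: datetime
    edit_time: datetime
    rolled_back: bool
    editor_has_rollback: bool


def label(edit: LinkAddition) -> Optional[str]:
    """Rolled-back edits count as spam; links added by rollbackers count as ham."""
    if edit.rolled_back:
        return "spam"
    if edit.editor_has_rollback:
        return "ham"
    return None  # neither criterion applies: left out of the labeled corpus


def on_wiki_features(edit: LinkAddition) -> dict:
    """A few simple on-wiki features of a link addition (illustrative selection)."""
    parsed = urlparse(edit.url)
    return {
        "url_length": len(edit.url),  # spam links tend to be shorter
        "path_depth": len([part for part in parsed.path.split("/") if part]),
        "domain": parsed.netloc.lower(),
        "article_age_days": (edit.edit_time - edit.article_created).days,
    }
```

This sketch stops at labeling and feature extraction. Training and evaluating the classifier itself is left out.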

In another paper, presented earlier this month at CEAS ’11, five authors from the same university, including two of the same researchers, examine the possibility of "Link spamming Wikipedia for profit". They picture spam detection on Wikipedia as a pipelined process with four stages:
- the MediaWiki spam blacklist, currently containing around 17,000 regular expressions;
- recent changes patrollers, often aided by software tools, who frequently react within seconds of an edit;
- watchlisters, who react within minutes to days;
- review by ordinary readers.

Based on a spam/ham corpus constructed as in the other paper, this paper further analyzes the characteristics of link spam destinations and spamming accounts. It also analyzes the exposure spammed links receive before they are removed, as determined by both the link's lifespan and the popularity of the spammed page.

The most sensitive part of the paper then leverages these results to "describe a novel and efficient spam model we estimate can significantly outperform status quo techniques". Examples include rapidly adding links to exploit the time lag of Wikipedia's spam removal process, or targeting popular pages. In a nod to WP:BEANS, the researchers admit that "there is the possibility that we have introduced previously unknown vectors", but the "Ethical Considerations" section emphasizes that:

"It is in no way this research’s intention to facilitate damage to Wikipedia or any wiki host. The vulnerabilities discussed in this section have been disclosed to Wikipedia’s parent organization, the Wikimedia Foundation (WMF). Further, the WMF was notified regarding the publication schedule of this document and offered technical assistance."

The authors also point to the implementation of the spam mitigation tool described in the WikiSym article.

However, the paper fails to mention that last year, one of its authors conducted actual, extensive tests on the English Wikipedia itself, using spamming techniques very similar to those outlined in the paper. The spam attacks gained the attention of several IT security news websites. They even involved setting up a fake webshop to measure how many Wikipedia readers would actually have bought the penis enlargement pills advertised in the links. The case led to the researcher's temporary ban as a Wikipedia user, later lifted by the Arbitration Committee. It also informed the research guidelines drafted later that year by the Wikimedia Foundation's Research Committee. See Signpost coverage: "Large scale vandalism revealed to be 'study' by university researcher" (includes a background interview with the researcher).
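The exposure measure in the CEAS paper combines a link's lifespan with the popularity of the spammed page. A minimal sketch shows why both fast insertion and popular targets matter to a spammer. It assumes page views are spread evenly over the day, which real traffic is not. The function name and inputs are hypothetical, and this is not the authors' formula.

```python
SECONDS_PER_DAY = 86_400


def estimated_exposure(lifespan_seconds: float, daily_page_views: float) -> float:
    """Approximate number of page views a spam link receives before removal.

    Assumes views arrive uniformly over time, so exposure grows linearly
    with both the link's lifespan and the page's daily traffic.
    """
    if lifespan_seconds < 0 or daily_page_views < 0:
        raise ValueError("lifespan and page views must be non-negative")
    return daily_page_views * lifespan_seconds / SECONDS_PER_DAY
```

Consider a page with 2,400 daily views. Under this assumption, a link that survives there for one hour would be seen about 100 times. A link removed by a patroller after 30 seconds would be seen less than once on average.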

How social ties influence admin votes

A paper by three researchers from the University of the Philippines Diliman[6] was presented two months ago at the International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2011). It examined statistical relations between voting behavior in requests for adminship (RfAs) and the on-wiki social contacts of participants. The paper includes a brief review of existing literature, in particular two papers that had already studied the relation to existing social networks.[7][8]

Drawing on a January 2008 dump of the English Wikipedia, the authors analyzed 2,587 elections conducted between 2004 and 2008. Of these, 48% were successful; 7,231 users voted or ran in at least one RfA, and 80% of the final non-neutral votes were supportive. From user talk page messages, the authors extracted "1,097,223 instances of communication between 265,155 distinct pairs of users" who had run or voted in an RfA, and built an undirected social graph from them. Their results concern three areas:

Wikipedians' weekends in international comparison

A paper titled "Temporal characterization of the requests to Wikipedia"[10] examined how search requests, read accesses and edits on Wikipedia change over time. It also examined how they relate to requests across all Wikimedia sites, based on Squid logs for the whole of 2009 provided by the Wikimedia Foundation. Among the findings are differences between language versions of Wikipedia. For example, "the number of edits tends to raise in weekends" for the French, Japanese, Dutch and Polish Wikipedias, but not for other languages.

Another paper, titled "Circadian patterns of Wikipedia editorial activity: A demographic analysis",[9] similarly analyzed "34 Wikipedias in different languages [trying] to characterize and find the universalities and differences in temporal activity patterns of editors". The underlying data was provided by the German Wikimedia chapter from the Toolserver. The authors found that "in contrast to diurnal [daily] pattern, which is universal to a great extent, weekly activity patterns of WPs show remarkable differences. We could, however, identify two main categories, namely 'weekends' and 'working days' active WPs."
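The distinction between "weekends" and "working days" active Wikipedias can be illustrated by comparing average daily edit counts. This minimal sketch assumes daily edit totals are already available per date. It uses an arbitrary, hypothetical threshold of 1.0 on the ratio of weekend to weekday edits. It does not reproduce either paper's actual method.

```python
from datetime import date
from statistics import mean

# Illustrative threshold, not taken from either paper.
WEEKEND_THRESHOLD = 1.0


def weekend_ratio(daily_edits: dict[date, int]) -> float:
    """Mean edits per weekend day (Sat, Sun) divided by mean edits per working day."""
    weekend = [n for d, n in daily_edits.items() if d.weekday() >= 5]
    working = [n for d, n in daily_edits.items() if d.weekday() < 5]
    if not weekend or not working:
        raise ValueError("need at least one weekend day and one working day")
    return mean(weekend) / mean(working)


def classify(daily_edits: dict[date, int]) -> str:
    """Label a Wikipedia as 'weekends' or 'working days' active by its edit ratio."""
    return "weekends" if weekend_ratio(daily_edits) > WEEKEND_THRESHOLD else "working days"
```

Consider a single week from Monday 19 September to Sunday 25 September 2011, with 100 edits on each working day and 150 on each weekend day. This gives a ratio of 1.5, so the Wikipedia would count as "weekends" active under this sketch's threshold.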


In brief

References

  1. Antin, Judd, Raymond Yee, Coye Cheshire, and Oded Nov (2011). Gender Differences in Wikipedia Editing. In WikiSym 2011: Proceedings of the 7th International Symposium on Wikis. PDF (open access)
  2. S.T.K. Lam, A. Uduwage, Z. Dong, S. Sen, D.R. Musicant, L. Terveen, and J. Riedl (2011). WP:Clubhouse? An Exploration of Wikipedia's Gender Imbalance. In WikiSym 2011: Proceedings of the 7th International Symposium on Wikis. PDF (open access)
  3. Halfaker, Aaron, Aniket Kittur, and John Riedl (2011). Don't Bite the Newbies: How Reverts Affect the Quantity and Quality of Wikipedia Work. In WikiSym 2011: Proceedings of the 7th International Symposium on Wikis. PDF (open access)
  4. A.G. West and I. Lee (2011). What Wikipedia Deletes: Characterizing Dangerous Collaborative Content. In WikiSym 2011: Proceedings of the 7th International Symposium on Wikis. PDF (open access)
  5. Ferron, Michela, and Paolo Massa (2011). Collective memory building in Wikipedia: The case of North African uprisings. In WikiSym 2011: Proceedings of the 7th International Symposium on Wikis. PDF (open access)
  6. Cabunducan, Gerard, Ralph Castillo, and John Boaz Lee (2011). Voting behavior analysis in the election of Wikipedia admins. In 2011 International Conference on Advances in Social Networks Analysis and Mining, pp. 545–547. IEEE. DOI (closed access)
  7. J. Leskovec, D. Huttenlocher, and J. Kleinberg (2010). Predicting positive and negative links in online social networks. In ACM International Conference on World Wide Web (WWW '10). Video, PDF (open access)
  8. J. Leskovec, D. Huttenlocher, and J. Kleinberg (2010). Governance in Social Media: A case study of the Wikipedia promotion process. In AAAI International Conference on Weblogs and Social Media (ICWSM '10). Video, PDF (open access)
  9. Yasseri, Taha, Róbert Sumi, and János Kertész (2011). Circadian patterns of Wikipedia editorial activity: A demographic analysis. arXiv, September 8, 2011. PDF (open access)
  10. Reinoso, Antonio J., Jesus M. Gonzalez-Barahona, Rocio Muñoz-Mansilla, and Israel Herraiz (2011). Temporal characterization of the requests to Wikipedia. In Proceedings of the 5th International Workshop on New Challenges in Distributed Information Filtering and Retrieval (DART 2011). ETSI Caminos, Canales y Puertos (UPM), September 13, 2011. PDF (open access)
  11. Reagle, Joseph, and Lauren Rhue (2011). Gender Bias in Wikipedia and Britannica. International Journal of Communication 5 (2011): 1138–1158. PDF (open access)
  12. José Felipe Ortega and Joaquín Rodríguez López (2011). El potlatch digital. Wikipedia y el triunfo del procomún y el conocimiento compartido. Cátedra, September 2011. HTML (closed access)
  13. Badgett, Robert G., and Mary Moore (2011). Are students able and willing to edit Wikipedia to learn components of evidence-based practice? Kansas Journal of Medicine 4(3), August 30, 2011. PDF (open access)
  14. Reagle, Joseph M. (2010). Good Faith Collaboration: The Culture of Wikipedia. The MIT Press. HTML (open access)
  15. Liu, J. (2011). W7 model of provenance and its use in the context of Wikipedia. PhD dissertation, The University of Arizona. PDF (closed access)
  16. Graham, M., S. A. Hale, and M. Stephens (2011). Geographies of the World’s Knowledge. Ed. C. M. Flick. London: Convoco! Edition. PDF (open access)
  17. He, Zeyi (2011). Measuring the Development of Wikipedia. In 2011 International Conference on Internet Technology and Applications. IEEE. DOI (closed access)

Wikipedia:Wikipedia Signpost/2011-09-26/Recent_research