The Signpost

Recent research

Bursty edits; how politics beat religion but then lost to sports; notability as a glass ceiling

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

Burstiness in Wikipedia editing

Reviewed by Brian Keegan

Wikipedia pages are edited with varying levels of consistency: stubs may have only a dozen or fewer revisions, while controversial topics might have more than 10,000. Nor is this editing activity evenly spaced over time: some revisions occur in very quick succession, while others persist for weeks or months before another change is made. Many social and technical systems exhibit "bursty" qualities of intensive activity separated by long periods of inactivity. In a pre-print submitted to arXiv, a team of physicists at the Belgian Université de Namur and the Portuguese University of Coimbra examine this phenomenon of "burstiness" in editing activity on the English Wikipedia.[1]

The authors use a database dump containing the revision history, up to January 2010, of 4.6 million English Wikipedia pages. After filtering out pages and editors with fewer than 2000 revisions, bots, and edits from unregistered accounts, the paper applies previously defined measures of burstiness and cyclicality to these editing patterns. The measured burstiness and memory of editors' revisions fall outside the limits found in prior work on human dynamics, suggesting that different mechanisms are at work in Wikipedia editing than in, for example, mobile phone communication.
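To make "burstiness" and "memory" concrete, here is a minimal sketch of one commonly used pair of such measures from the human dynamics literature: the burstiness parameter B = (σ − μ) / (σ + μ) of the inter-event times, and a memory coefficient defined as the correlation between consecutive inter-event times. That the paper uses exactly these definitions is an assumption, and the function names and sample timestamps are hypothetical.

```python
from statistics import mean, pstdev


def inter_event_times(timestamps):
    """Gaps between consecutive events (e.g. revision times in seconds)."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def burstiness(gaps):
    """B = (sigma - mu) / (sigma + mu).

    B is -1 for perfectly regular activity, about 0 for a Poisson process,
    and approaches +1 for extremely bursty activity.
    Assumes the gaps are not all zero.
    """
    mu = mean(gaps)
    sigma = pstdev(gaps)
    return (sigma - mu) / (sigma + mu)


def memory(gaps):
    """Correlation between each gap and the one that follows it.

    Positive values mean short gaps tend to follow short gaps (and long
    follow long). Returns 0.0 when either series has no variance.
    """
    first, second = gaps[:-1], gaps[1:]
    m1, m2 = mean(first), mean(second)
    s1, s2 = pstdev(first), pstdev(second)
    if s1 == 0 or s2 == 0:
        return 0.0
    covariance = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    return covariance / (len(first) * s1 * s2)


if __name__ == "__main__":
    # Hypothetical revision timestamps: two quick bursts, then a long pause.
    edits = [0, 1, 2, 3, 100, 101, 102, 103, 1000]
    gaps = inter_event_times(edits)
    print("burstiness:", burstiness(gaps))
    print("memory:", memory(gaps))
```

Applied to one editor's revision timestamps, the two numbers summarize how clustered the edits are and whether short pauses tend to be followed by more short pauses.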

Using a fast Fourier transform, the paper finds that the 100 most active editors show a spectral peak at a 24-hour period (with associated harmonics), indicating a circadian pattern of daily revising, as well as differences by day of week and hour of day. However, the 100 most-revised pages lack a similar peak in the power spectrum: they have no characteristic hourly, daily, or weekly revision pattern. Despite these circadian patterns, editors' revision histories still show bursty behavior, with long-tailed inter-event times across different time windows.
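The idea of reading a daily rhythm off a power spectrum can be illustrated with a short sketch. It uses a plain discrete Fourier transform written with the standard library (an FFT computes the same quantity, only faster) on a synthetic series of hourly edit counts. The data and function names are hypothetical, and this is not the authors' code.

```python
import cmath
import math


def power_spectrum(series):
    """Return (period, power) pairs for the mean-removed series.

    Periods are in the same unit as the sampling step (hours here).
    """
    n = len(series)
    average = sum(series) / n
    centered = [value - average for value in series]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            value * cmath.exp(-2j * math.pi * k * t / n)
            for t, value in enumerate(centered)
        )
        spectrum.append((n / k, abs(coefficient) ** 2))
    return spectrum


def dominant_period(series):
    """Period (in sampling steps) carrying the most spectral power."""
    return max(power_spectrum(series), key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    # Synthetic editor: 3 edits per hour between 18:00 and 23:00, for 14 days.
    hourly_edits = [3 if 18 <= hour % 24 < 23 else 0 for hour in range(24 * 14)]
    print("dominant period (hours):", dominant_period(hourly_edits))
```

For a series like this one, the power concentrates at the 24-hour period and its harmonics (12 hours, 8 hours, and so on). A page edited by many people at unrelated times would, by the paper's findings, show no such peak.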

The paper concludes by arguing, "before performing an action, we must overcome a “barrier”, acting as a cost, which depends, among many other things, on the time of day. However, once that “barrier” has been crossed, the time taken by that activity no longer depends on the time of day at which we decided to perform it. ... It could be related to some sort of queuing process, but we prefer to see it as due to resource allocation (attention, time, energy), which exhibits a broad distribution: shorter activities are more likely to be executed next than the longer ones."

Reviewed by Brian Keegan

Google Trends is widely used in academic research to model the relationship between information seeking and other social and behavioral phenomena. Wikipedia pageview data can provide a superior, if underused, alternative: it has attracted some attention in public health and economic modeling, but not to the same extent as Google Trends. The authors cite the relative openness of Wikipedia pageview data, its semantic disambiguation, and its absolute counts of activity, in contrast to Google Trends' closed API, the semantic ambiguity of keywords, and relative query share data. However, Trends data (at a weekly level) goes back to 2004, while pageview data (at an hourly level) is only available from 2008.

In a peer-reviewed paper published by PLoS ONE, a team of physicists performs a variety of time series analyses to evaluate changes in attention around the "big data" topic of Hadoop.[2] Defining two key constructs, relevance and representation, based on interlanguage links as well as hyperlinks to and from other concepts, they examine changes in these features over time. In particular, changes in the articles' content and attention occurred in concert with the release of new versions and the adoption of the technology by new firms.

The time series analyses (and the terms used to refer to them) will be difficult for non-statisticians to follow, but the paper makes several promising contributions. First, it provides a number of good critiques of research relying exclusively on Google Trends data (outlined above). Second, it provides some methods for incorporating behavioral data from strongly related topics and examining these changes over time in a principled manner. Third, the paper examines behavior across multiple language editions rather than focusing solely on the English Wikipedia. The paper points to ways in which Wikipedia is an important information source for tracking the publication and recognition of new topics.

"Hidden revolution of human priorities: An analysis of biographical data from Wikipedia"

Reviewed by Piotr Konieczny

This paper[3] data-mines Wikipedia's biographies, focusing on individuals' longevity, profession and cause of death. The authors are not the first to observe that the majority of Wikipedia biographies are about sportspeople (half of them soccer players), followed by artists and politicians. But they do make some interesting historical observations, such as that sport rises only in the 20th century (particularly from the 1990s), and that politics surpassed religion in the 13th century before being surpassed in turn by sport. The authors divide the biographies into public (politicians, businessmen, religion) and private (artists and sportspeople) and note that it was only in the last few decades that the second group started to significantly outnumber the first; they conclude that this represents a major shift in societal values, which they refer to as "hidden revolution in human priorities". It is an interesting argument, though the paper unfortunately lacks any discussion of some important topics, such as the possible bias introduced by Wikipedia's notability policies.

"Women through the glass-ceiling: gender asymmetries in Wikipedia"

Reviewed by Piotr Konieczny

This paper[4] looks into gender inequalities in Wikipedia articles, presenting a computational method for assessing gender bias in Wikipedia along several dimensions. It touches on a number of interesting questions, such as whether the same rules are used to determine whether women and men are notable, whether there is linguistic bias, and whether articles about men and women have similar structural properties (e.g., similar meta-data and network properties in the hyperlink network).

They conclude that notability guidelines seem to be more strictly enforced for women than for men; that linguistic bias exists (e.g., one of the four words most strongly associated with female biographies is "husband", whereas such family-oriented words are much less likely to be found in biographies of male subjects); and that, because the majority of biographies are about men and men tend to link more to men than to women, female biographies are less visible (for example, in search engines like Google). The authors suggest that the Wikipedia community should consider lowering notability requirements for women (controversial), and adding gender-neutral language requirements to the Manual of Style (a much more sensible proposal).

Briefly

Wikipedia influences medical decision-making in acute and critical care

Reviewed by Tilman Bayer

A survey[5] of 372 anesthetists and critical care providers in Austria and Australia found that "In order to get a fast overview about a medical problem, physicians would prefer Google (32%) over Wikipedia (19%), UpToDate (18%), or PubMed (17%). 39% would, at least sometimes, base their medical decisions on non peer-reviewed resources. Wikipedia is used often or sometimes by 77% of the interns, 74% of residents, and 65% of consultants to get a fast overview of a medical problem. Consulting Wikipedia or Google first in order to get more information about the pathophysiology, drug dosage, or diagnostic options in a rare medical condition was the choice of 66%, 10% or 34%, respectively." (A 2012 literature review found that "Wikipedia is widely used as a reference tool" among clinicians.)

Other recent publications

A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.

Papers about medical content on Wikipedia and its usage

Papers analyzing community processes and policies

Papers about visualizing or mining Wikipedia content

References

  1. ^ Gandica, Yerali; Carvalho, Joao; Aidos, Fernando Sampaio Dos; Lambiotte, Renaud; Carletti, Timoteo (2016-01-05). "On the origin of burstiness in human behavior: The wikipedia edits case". arXiv:1601.00864 [physics.soc-ph].
  2. ^ Kämpf, Mirko; Tessenow, Eric; Kenett, Dror Y.; Kantelhardt, Jan W. (2015-12-31). "The Detection of Emerging Trends Using Wikipedia Traffic Data and Context Networks". PLOS ONE. 10 (12): e0141892. Bibcode:2015PLoSO..1041892K. doi:10.1371/journal.pone.0141892. PMC 4699901. PMID 26720074.
  3. ^ Reznik, Ilia; Shatalov, Vladimir (February 2016). "Hidden revolution of human priorities: An analysis of biographical data from Wikipedia". Journal of Informetrics. 10 (1): 124–131. doi:10.1016/j.joi.2015.12.002. ISSN 1751-1577.
  4. ^ Wagner, Claudia; Graells-Garrido, Eduardo; Garcia, David (2016-01-19). "Women through the glass ceiling: Gender asymmetries in Wikipedia". EPJ Data Science. 5. arXiv:1601.04890. doi:10.1140/epjds/s13688-016-0066-4. S2CID 256239395. Jupyter notebooks
  5. ^ Rössler, B.; Holldack, H.; Schebesta, K. (2015-10-01). "Influence of wikipedia and other web resources on acute and critical care decisions. a web-based survey". Intensive Care Medicine Experimental. 3 (Suppl 1): A867. doi:10.1186/2197-425X-3-S1-A867. ISSN 2197-425X. S2CID 19754943. (Poster presentation)
  6. ^ Devraj, Nikhil; Chary, Michael (2015). "How Do Twitter, Wikipedia, and Harrison's Principles of Medicine Describe Heart Attacks?". Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics. BCB '15. New York, NY, USA: ACM. pp. 610–614. doi:10.1145/2808719.2812591. ISBN 978-1-4503-3853-0.
  7. ^ Brigo, Francesco; Otte, Willem M.; Igwe, Stanley C.; Ausserer, Harald; Nardone, Raffaele; Tezzon, Frediano; Trinka, Eugen (2015). "Information-seeking behaviour for epilepsy: An infodemiological study of searches for Wikipedia articles". Epileptic Disorders. 17 (4): 460–466. doi:10.1684/epd.2015.0772. PMID 26575365.
  8. ^ Brigo, Francesco; Igwe, Stanley C.; Nardone, Raffaele; Lochner, Piergiorgio; Tezzon, Frediano; Otte, Willem M. (July 2015). "Wikipedia and neurological disorders". Journal of Clinical Neuroscience. 22 (7): 1170–1172. doi:10.1016/j.jocn.2015.02.006. ISSN 1532-2653. PMID 25890773. S2CID 25821260.
  9. ^ Choi-Lundberg, Derek L.; Low, Tze Feng; Patman, Phillip; Turner, Paul; Sinha, Sankar N. (2015-05-01). "Medical student preferences for self-directed study resources in gross anatomy". Anatomical Sciences Education. 9 (2): 150–160. doi:10.1002/ase.1549. ISSN 1935-9780. PMID 26033851. S2CID 23191.
  10. ^ Matei, Sorin Adam; Foote, Jeremy (2015). "Transparency, Control, and Content Generation on Wikipedia: Editorial Strategies and Technical Affordances". In Sorin Adam Matei; Martha G. Russell; Elisa Bertino (eds.). Transparency in Social Media. Computational Social Sciences. Springer International Publishing. pp. 239–253. doi:10.1007/978-3-319-18552-1_13. ISBN 978-3-319-18551-4.
  11. ^ Sandrine Cristina de Figueirêdo Braz, Edivanio Duarte de Souza: Políticas para produção de conteúdos na Wikipédia, a enciclopédia livre (Policies For The Production Of Contents In The Wikipedia, The Free Encyclopedia). In: ENCONTRO NACIONAL DE PESQUISA EM CIÊNCIA DA INFORMAÇÃO, 15., 2014, Belo Horizonte. Anais ... Belo Horizonte: UFMG, 2014. (in Portuguese, with English abstract)
  12. ^ Marcio Gonçalves, Clóvis Montenegro de Lima: Pretensões de validade da informação diante da autoridade do argumento na wikipédia (Validity claims of information in face of authority of the argument on wikipedia). In: ENCONTRO NACIONAL DE PESQUISA EM CIÊNCIA DA INFORMAÇÃO, 15., 2014, Belo Horizonte. Anais ... Belo Horizonte: UFMG, 2014. (in Portuguese, with English abstract)
  13. ^ Phillips, Murray G. (2015-10-07). "Wikipedia and history: a worthwhile partnership in the digital era?". Rethinking History. 20 (4): 523–543. doi:10.1080/13642529.2015.1091566. ISSN 1364-2529. S2CID 143213332.
  14. ^ Yiwei Zhou, Alexandra I. Cristea and Zachary Roberts: Is Wikipedia really neutral? A sentiment perspective study of war-related Wikipedia articles since 1945. 29th Pacific Asia Conference on Language, Information and Computation, pages 160–68. Shanghai, China, October 30 – November 1, 2015.
  15. ^ Menking, Amanda; Erickson, Ingrid (2015). "The heart work of Wikipedia: gendered, emotional labor in the world's largest online encyclopedia". Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. CHI '15. New York, NY, USA: ACM. pp. 207–210. doi:10.1145/2702123.2702514. ISBN 978-1-4503-3145-6. Also available as a draft version on Wikimedia Commons.
  16. ^ Zhan, Liuhan; Wang, Nan; Shen, Xiao-Liang; Sun, Yongqiang (2015-01-01). "Knowledge quality of collaborative editing in Wikipedia: an integrative perspective of social capital and team conflict". PACIS 2015 Proceedings.
  17. ^ Hai-Jew, Shalin (2016). "Visualizing Wikipedia Article and User Networks". Developing Successful Strategies for Global Policies and Cyber Transparency in E-Learning. Advances in Educational Marketing, Administration, and Leadership. pp. 60–81. doi:10.4018/978-1-4666-8844-5.ch005. ISBN 9781466688445.
  18. ^ Qureshi, Muhammad Atif (2015-10-08). Utilising Wikipedia for text mining applications (Thesis). (PhD thesis, U Galway)
  19. ^ Chu, Chenhui; Nakazawa, Toshiaki; Kurohashi, Sadao (December 2015). "Integrated parallel sentence and fragment extraction from comparable corpora: a case study on Chinese-Japanese Wikipedia". ACM Trans. Asian Low-Resour. Lang. Inf. Process. 15 (2). doi:10.1145/2833089. hdl:2433/265843. ISSN 2375-4699. S2CID 18363124.
  20. ^ Halatchliyski, Iassen; Cress, Ulrike (2014-11-03). "How structure shapes dynamics: knowledge development in Wikipedia – a network multilevel modeling approach". PLOS ONE. 9 (11): e111958. Bibcode:2014PLoSO...9k1958H. doi:10.1371/journal.pone.0111958. PMC 4218828. PMID 25365319.

Wikipedia:Wikipedia Signpost/2016-01-27/Recent_research