The Signpost

Recent research

Wikipedia and Sandy Hook; SOPA blackout reexamined

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

How Wikipedia deals with a mass shooting

Northeastern University researcher Brian Keegan analyzed how hundreds of Wikipedians came together to cover the Sandy Hook Elementary School shooting in the immediate aftermath of the tragedy. The findings are reported in a detailed blog post that was later republished by the Nieman Journalism Lab.[1] Keegan observes that the Sandy Hook shooting article reached a length of 50 KB within 24 hours of its creation, making it the fastest-growing article by first-day length among recent articles covering mass shootings on the English-language Wikipedia. The analysis compares the Sandy Hook page with six similar articles drawn from a list of 43 articles on shooting sprees in the US since 2007. Of particular interest among the analyses is the dynamics of dedicated versus occasional contributors as the article matures: in the first few hours, contributions are evenly distributed and most editors make a single edit, but after hour 3 or 4 a number of dedicated editors show up and "begin to take a vested interest in the article, which is manifest in the rapid centralization of the article". A plot of inter-edit time also shows the sustained frequency of revisions that these articles display days after their creation, with Sandy Hook averaging about one edit per minute around 24 hours after its first revision. The notebook and social network data produced by the author for the analysis are available on his website.

The Nieman Journalism Lab previously covered Wikipedia's role as a platform for collaborative journalism, and why its format outperforms Wikinews, in an interview with Andrew Lih published in 2010.[2] The early revision history of the Sandy Hook shooting article was also covered in a blog post by Oxford Internet Institute fellow Taha Yasseri, though with a focus on the coverage in different Wikipedia language editions.[3]
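For readers who want to reproduce this kind of measurement on a revision history of their own, the two quantities discussed above (inter-edit time and the share of single-edit editors) can be sketched as follows. This is only an illustration, not Keegan's notebook: the function names and the four-row revision log are made up for the example.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import median


def inter_edit_seconds(timestamps):
    """Return the gaps, in seconds, between consecutive revisions (oldest first)."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


def single_edit_share(editors):
    """Return the fraction of distinct editors who made exactly one edit."""
    counts = Counter(editors)
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)


# Hypothetical revision log: (timestamp, editor). Not real article data.
start = datetime(2012, 1, 1, 0, 0, 0)
revisions = [
    (start, "A"),
    (start + timedelta(seconds=30), "B"),
    (start + timedelta(seconds=90), "A"),
    (start + timedelta(seconds=150), "C"),
]

gaps = inter_edit_seconds([t for t, _ in revisions])
print(gaps)                                           # [30.0, 60.0, 60.0]
print(median(gaps))                                   # 60.0
print(single_edit_share([e for _, e in revisions]))   # two of three editors
```

A median gap of 60 seconds corresponds to the "one edit per minute" pace mentioned above; computing the share of single-edit editors per hour would trace the shift from occasional to dedicated contributors.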

Network positions and contributions to online public goods: the case of the Chinese Wikipedia

A graph with nodes color-coded by betweenness centrality (from red=0 to blue=max).

In a forthcoming paper in the Journal of Management Information Systems (presented earlier at HICSS '12[4]), Xiaoquan (Michael) Zhang and Chong (Alex) Wang use a natural experiment to demonstrate that changes to the position of individuals within the editor network of a wiki modify their editing behavior. The data for this study came from the Chinese Wikipedia. In October 2005, the Chinese government suddenly blocked access to the Chinese Wikipedia from mainland China, creating an unanticipated decline in the editor population. As a result, the remaining editors found themselves in a new network structure and, the authors claim, any changes in editor behavior that ensued are likely effects of this discontinuous "shock" to the network. The paper defines each editor as a node (vertex) in the network, with a tie (edge) between two editors created whenever both edit the same page in the wiki. The authors then examine how changes to three aspects of individual editors' relative connectedness (centrality) to other editors within the network altered their subsequent patterns of contribution.
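The network definition above (editors as nodes, a tie whenever two editors touch the same page) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function name `coediting_network`, the editor names and the page labels are all hypothetical, and the "block" is simulated simply by dropping one editor's edits.

```python
from itertools import combinations


def coediting_network(edits):
    """Build an undirected editor graph from (editor, page) pairs.

    Two editors are tied if they edited at least one page in common.
    Returns a dict mapping each editor to the set of editors tied to them.
    """
    editors_by_page = {}
    for editor, page in edits:
        editors_by_page.setdefault(page, set()).add(editor)

    graph = {}
    for editors in editors_by_page.values():
        for editor in editors:
            graph.setdefault(editor, set())  # keep editors with no ties as nodes
        for a, b in combinations(sorted(editors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical edit log: (editor, page).
edits = [("ann", "P1"), ("bob", "P1"), ("bob", "P2"), ("cat", "P2"), ("dan", "P3")]

before = coediting_network(edits)
# Simulate an exogenous shock in which "bob" can no longer edit.
after = coediting_network([(e, p) for e, p in edits if e != "bob"])

print(before)
print(after)
```

In this toy example, removing one editor leaves the remaining three without any ties at all, which is the kind of involuntary change in network position the study exploits.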

The main finding is that changes in the three kinds of editors' connectedness within the network result in differential changes to their editing behavior. First, an increase in the number of direct connections between one editor and the rest of the network (degree centrality) resulted in fewer edits by that editor, and more work on articles they created. Second, an increase in the overall proximity of an editor to the other members of the network (closeness centrality) resulted in fewer edits and less work on articles they created. Third, an increase in the extent to which an editor connected otherwise isolated groups in the network (betweenness centrality) resulted in more edits and more work by that editor on articles they created. Overall, these results imply that alterations to the network structure of a wiki can change both the quantity and quality of editor contributions. The researchers argue that their findings confirm the predictions of both network game theory and role theory; and that future research should try to analyze the character of the network ties created within platforms for large-scale online collaboration, to better understand how changes to network structure may alter collaborative practices and public goods creation.
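To make the three centrality measures concrete, here is a small self-contained sketch that computes them on a toy graph. It assumes an unweighted, undirected network stored as a dict of neighbor sets; the normalizations (raw degree, reachable nodes divided by summed distance, unnormalized betweenness via Brandes' algorithm) are choices made for this example and may differ from those in the paper.

```python
from collections import deque


def degree(graph):
    """Number of direct ties per editor."""
    return {v: len(nbrs) for v, nbrs in graph.items()}


def closeness(graph):
    """Reachable-node count divided by summed BFS distance (0.0 if isolated)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness(graph):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        order = []                        # nodes in non-decreasing distance from s
        preds = {v: [] for v in graph}    # shortest-path predecessors
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):         # accumulate from the farthest nodes back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: x / 2 for v, x in score.items()}  # each pair was counted twice


# Toy graph: two triangles (a, b, c) and (e, f, g) joined only through "d".
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d", "f", "g"}, "f": {"e", "g"}, "g": {"e", "f"},
}
deg, clo, btw = degree(graph), closeness(graph), betweenness(graph)
print(deg["d"], round(clo["d"], 3), btw["d"])  # 2 0.6 9.0
print(deg["c"], round(clo["c"], 3), btw["c"])  # 3 0.545 8.0
```

The broker "d" has the fewest direct ties of the two nodes shown but the highest betweenness, which illustrates why the three measures can move independently and why the study treats them separately.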

Quality of pharmaceutical articles in the Spanish Wikipedia

Ibuprofen, one of the World Health Organisation's "essential drugs", a topic covered in detail by the Spanish-language Wikipedia.

In an online early version of an upcoming article in Atención Primaria,[5] researchers at the Miguel Hernández University of Elche and the University of Alicante have benchmarked articles on pharmaceutical drugs in the Spanish Wikipedia against information available in a pharmaceutical database, Vademécum.[6] A subset of the Vademécum corpus of 3,595 drugs was created using simple random sampling without replacement, consisting of 386 drugs. Of these, 171 (44%) had entries on the Spanish Wikipedia, which were then scrutinized along several dimensions in May 2012. Usage of the drug was correctly indicated in 155 (91%) of these articles, dosage in 26 (15%), and side-effects in 64 (37%), with only 15 articles (9%) scoring well in all of these dimensions. The researchers conclude that, while Wikipedia has a high potential to help with the dissemination of pharmaceutical knowledge, the Spanish-language edition does not currently live up to this potential. As a possible solution, they suggest the pharmaceutical community more actively participate in editing Wikipedia. The list of the drugs involved has not been made public, since a similar study is currently underway whose results may be distorted by targeted intervention. The authors have signalled to this research report their intention to make the list available after this second study is complete.
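The percentages above follow directly from the reported counts, and the sampling step is easy to mimic. The sketch below is a worked check rather than the researchers' procedure: the integer IDs stand in for the unpublished drug list, and the seed and the helper name `pct` are arbitrary choices for this example.

```python
import random

CORPUS_SIZE = 3595  # drugs in the Vademécum corpus (figure from the study)
SAMPLE_SIZE = 386   # drugs drawn by simple random sampling

# Simple random sampling without replacement. The integer IDs are placeholders,
# since the actual list of drugs has not been published.
rng = random.Random(0)
sample_ids = rng.sample(range(CORPUS_SIZE), SAMPLE_SIZE)


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


with_article = 171  # sampled drugs that had a Spanish Wikipedia entry
counts = {"usage": 155, "dosage": 26, "side effects": 64, "all three": 15}

coverage = pct(with_article, SAMPLE_SIZE)
shares = {name: pct(n, with_article) for name, n in counts.items()}

print(coverage)  # 44
print(shares)    # {'usage': 91, 'dosage': 15, 'side effects': 37, 'all three': 9}
```

Note that the 44% is relative to the 386 sampled drugs, while the other four percentages are relative to the 171 drugs that had an article.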

Wikipedia editing patterns are consistent with a non-finite state model of computation

A paper posted to arXiv[7] by SFI's Omidyar fellow Simon DeDeo presents evidence for non-finite-state computation in a human social system using data from Wikipedia edit histories. Finite-state systems are the basis for the study of formal languages in computer science and linguistics, and many real-world complex phenomena in biology and the social sciences are also studied empirically by assuming the existence of underlying finite-state processes, for the analysis of which powerful probabilistic methods have been devised. However, the question of whether the description of a system truly entails a finite or a non-finite, unbounded number of states is an open one. This is significant from a functionalist point of view: can we classify a system by its computational properties, and can these properties help us better understand how the system works regardless of its material details?

The paper's contribution lies in its proof of a probabilistic generalization of the pumping lemma, a device used in theoretical computer science as a necessary condition for a language to be described by only a finite number of states. The lemma is applied to the edit histories of a number of the most frequently edited articles in the English Wikipedia, after these have been transformed into coarse-grained sequences of "cooperative" or "non-cooperative" (reversion) edits (reverts being identified by means of their SHA1 field). A Bayesian argument is applied to show that the lemma cannot hold for a majority of sequences, thus showing that Wikipedia's collaborative editing system as a whole cannot be described by any aggregation of finite-state systems. The author discusses the implications of this finding for a more grounded study of Wikipedia's editing model, and for the identification of detailed computational models of other social and biological systems.
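The coarse-graining step can be illustrated with a short sketch. It assumes that a revision counts as a revert when its content hash equals that of any earlier revision of the same page, which is one common reading of "identified by means of their SHA1 field"; the exact rule used in the paper may differ, and the function names and sample texts are invented for the example.

```python
import hashlib


def sha1(text):
    """Hex SHA1 digest of a revision's text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def coarse_grain(revision_hashes):
    """Map a chronological list of content hashes to a string over {C, R}.

    A revision is labelled "R" (revert) if its hash matches any earlier
    revision, i.e. it restores a previous page state; otherwise it is
    labelled "C" (cooperative). This rule is an assumption of the sketch.
    """
    seen = set()
    labels = []
    for h in revision_hashes:
        labels.append("R" if h in seen else "C")
        seen.add(h)
    return "".join(labels)


# Hypothetical page history: the fourth revision restores the second one.
texts = [
    "v1",
    "v1 plus more",
    "vandalism",
    "v1 plus more",
    "v1 plus more and a source",
]
sequence = coarse_grain([sha1(t) for t in texts])
print(sequence)  # CCCRC
```

Strings such as `CCCRC` are the kind of two-symbol sequences to which a test based on the pumping lemma can then be applied.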

Wikipedia as our collective memory

A protester on Tahrir Square during the 2011 Egyptian revolution.

Michela Ferron, a member of the SoNet (Social Networking) research group at the Bruno Kessler Foundation in Trento, Italy, submitted her PhD thesis[8] in December 2012. She examined the idea of viewing Wikipedia as a venue for collective memory, along with the linguistic indicators of the dynamic process of memory formation in response to "traumatic" events. Parts of the thesis have already been published in journals and conference proceedings, such as WikiSym 2011 and 2012 (cf. presentation slides).

A full chapter is dedicated to the background on the concept of collective memory and its appearance in the digital world. The thesis continues with an analysis of "anniversary edits", showing a significant increase in editorial activity on articles related to traumatic events during the anniversary period compared to a large random sample of "other" articles. More detailed linguistic indicators are introduced in the next chapter. Statistical tests show that terms related to affective processes, negative emotions, and cognitive and social processes occur more often in articles on traumatic events; "Specifically, the relative number of words expressing anxiety (e.g., “worried”), anger (e.g., “hate”) and sadness (e.g., “cry”) was significantly higher in articles about traumatic events".
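The "relative number of words" in a category can be computed with a simple dictionary lookup, sketched below. This is a toy illustration only: the three-category lexicon is hypothetical (apart from the example words quoted above) and is far smaller than any real word list, and the thesis's actual tooling is not reproduced here.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon. "worried", "hate" and "cry" are the examples
# quoted from the thesis; the remaining words are made up for illustration.
LEXICON = {
    "anxiety": {"worried", "afraid", "fear"},
    "anger": {"hate", "angry", "rage"},
    "sadness": {"cry", "grief", "mourn"},
}


def category_rates(text, lexicon=LEXICON):
    """Return, per category, the share of tokens in `text` found in that category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    total = len(tokens) or 1  # avoid division by zero on empty text
    return {category: counts[category] / total for category in lexicon}


rates = category_rates("Families cry and mourn; many are worried.")
print(rates)
```

Comparing such rates between articles on traumatic events and a random sample of other articles is the basic shape of the comparison described above.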

In the next step, Ferron distinguishes between human-made and natural disasters, observing that "human-made traumatic events were characterized by language referring to anger and anxiety, while the collective representation of natural disasters expressed more sadness". Finally, a detailed case study of the talk pages of articles on the 7 July 2005 London bombings and the 2011 Egyptian revolution was carried out, and language indicators, especially those related to emotions, were investigated in a dynamic framework and compared for both examples.

SOPA blackout decision analyzed

A First Monday article[9] reviews several aspects of Wikipedia's participation in the 18 January 2012 protests against the SOPA and PIPA legislation in the US. The paper focuses on the question of legitimacy, looking at how the Wikipedia community arrived at the decision to participate in those protests.

The English Wikipedia landing page, symbolically its only page during the blackout on January 18, 2012

The paper provides an interesting discussion of legitimacy in Wikipedia's governance, and discusses the legitimacy of the decision to participate in the protests. The author notes that the initiative was given a major boost by Jimmy Wales' charismatic authority: Wales posted a straw poll about the issue on his talk page on December 10, 2011, and while the issue had been discussed by the community beforehand (for example, in mid-November at the Village Pump), those discussions attracted much less attention. It is hard to say whether the protest would have happened without Jimbo's push for more discussion, as it veers towards "what if" territory; as things happened, it is true that Jimbo's actions began a landslide that led to the protests. However, this reviewer is more puzzled at the claim made in the introduction to the article that the discussion involved a "massive involvement of the Wikimedia Foundation staff". While several WMF staffers were active in the discussions in their official capacity, and while the WMF did issue some official statements about the ongoing discussion, the paper certainly does not provide any evidence to justify the word "massive".

The paper subsequently notes that the WMF focused on providing information and gently steering the discussion, without any coercion; this hardly justifies the claim of "massive involvement". At the very least, a clear explanation is necessary of precisely how many WMF staffers participated in the discussion before such a grandiose adjective as "massive" is used. It is true that the WMF staffers helped push the discussion forward, but this reviewer believes that the paper does not sufficiently justify the stress it puts on their participation, and thus may overestimate their influence.

The third part of the paper discusses how the arguments about legitimacy or the lack of it framed the subsequent discourse of the voters. The author notes that after an initial period of discussing SOPA itself, the discussion of whether it was legitimate or not for Wikipedia to become involved in the protest took over, with a major justification for it emerging in the form of an argument that it was legitimate for Wikipedia to protest against SOPA as SOPA threatened Wikipedia itself. While this is an interesting claim, unfortunately, other than citing one single comment, no other qualitative or quantitative data are provided; nor is the methodology discussed. We are not told how many individuals voted, how many commented on legitimacy or illegitimacy, how many felt that Wikipedia is threatened; we do not know how the author classified comments supporting any of the viewpoints, or the shifts in the discussion ... this list could unfortunately go on. In one specific example drawn from the conclusion, the author writes that "The main factor that shaped the multi-phased process was the will to have the community accept the final decision as legitimate, and avoid backlash. This factor especially influenced those who are suspected of relying on traditional means of legitimacy such as charisma or professionalism." At the same time, we are provided with no number, no percentage, and certainly no correlation to back up this claim. Without a clear methodology or distinct data it is hard to verify the author's claims and conclusions.

The introduction also notes that "the mass effort of planning an effective political action was not something “anyone [could] edit”" and "the debate preceding the blackout did not follow Wikipedia’s open and anarchic decision-making system"; unfortunately this reviewer finds no justification for those rather strong claims anywhere else in the article.

Overall, this is an interesting paper about legitimacy in Wikipedia, but it seems to overreach when it tries to draw conclusions from data that are simply not presented to the reader. It suffers from a failure to explain the research's methodology, making verification of the claims made very hard. Due to the lack of hard data, most conclusions are unfortunately rendered dubious, and the paper has a tendency to make strong claims that are not backed up by data or even developed later on.

Bots and collective intelligence explored in dissertation

Rats (blue trace) interacting with a rat-sized robot (red) controlled by a human who in turn perceives the rat's movements through those of a human-sized avatar in a virtual reality environment.[10] The video was uploaded to Wikimedia Commons by the Open Access Media Importer Bot.

In his Communication and Society PhD dissertation,[11] Randall M. Livingstone of the University of Oregon explores the relationship between the social and technical structures of Wikipedia, with a particular focus on bots and bot operators. After a fairly broad literature review (which summarizes the basic approaches to Wikipedia studies from new media theory, social network analysis, science and technology studies, and political economy), Livingstone gives a concise history of the technical development of Wikipedia, from UseModWiki to MediaWiki, and from a single server to hundreds.

The most interesting chapters for Wikipedians will be Chapter V (Wikipedia as a Sociotechnical System) and Chapter VI (Wikipedia as Collective Intelligence). Chapter V looks at the ways the editing community and the evolution of software (both MediaWiki and the semi-automated tools and bots that interact with editors and articles) "construct" each other. Based on 45 interviews with bot operators and WMF staff, this chapter gives an interesting and varied picture of how Wikipedia works as a sociotechnical system. It will in part be a familiar account to the more tech-minded Wikipedians, but offers an accessible overview of bots and their place in the ecosystem to editors who normally steer clear of bots and software development. Chapter VI looks at theories of intelligence and the concept of collective intelligence, arguing that Wikipedia exhibits (at least to some extent) the key traits of stigmergy, distributed cognition, and emergence.

Briefly

Notes

  1. Keegan, B. (2012). How does Wikipedia deal with a mass shooting? A frenzied start gives way to a few core editors. Nieman Journalism Lab. HTML (open access)
  2. Seward, Z.M. (2010). Why Wikipedia beats Wikinews as a collaborative journalism project. Nieman Journalism Lab. HTML (open access)
  3. Yasseri, T. (2012). The coverage of a tragedy. Stories for Sunday morning. HTML (open access)
  4. Wang, C. (Alex), & Zhang, X. (Michael). (2012). Network Centrality and Contributions to Online Public Good–The Case of Chinese Wikipedia. 2012 45th Hawaii International Conference on System Sciences (pp. 4515–4524). IEEE. DOI (closed access)
  5. López Marcos, P.; Sanz-Valero, J. (2012). "Presencia y adecuación de los principios activos farmacológicos en la edición española de la Wikipedia". Atención Primaria. 45 (2): 101–106. doi:10.1016/j.aprim.2012.09.012. PMID 23159792. S2CID 196366011. (closed access)
  6. "Vademécum". UBM Medica Spain S.A. Archived from the original on 30 December 2012. Retrieved 30 December 2012.
  7. DeDeo, S. (2012). Evidence for Non-Finite-State Computation in a Human Social System. arXiv. PDF (open access)
  8. Ferron, M. (2012, December 7). Collective Memories in Wikipedia. PhD Thesis, University of Trento. PDF (open access)
  9. Oz, A. (2012). Legitimacy and efficacy: The blackout of Wikipedia. First Monday, 17(12). HTML (open access)
  10. Normand, J. M.; Sanchez-Vives, M. V.; Waechter, C.; Giannopoulos, E.; Grosswindhager, B.; Spanlang, B.; Guger, C.; Klinker, G.; Srinivasan, M. A.; Slater, M. (2012). De Polavieja, Gonzalo G (ed.). "Beaming into the Rat World: Enabling Real-Time Interaction between Rat and Human Each at Their Own Scale". PLOS ONE. 7 (10): e48331. Bibcode:2012PLoSO...748331N. doi:10.1371/journal.pone.0048331. PMC 3485138. PMID 23118987. (open access)
  11. Livingstone, R. M. Network of Knowledge: Wikipedia as a Sociotechnical System of Intelligence. PDF (open access)
  12. Medeiros, J. (2012). Infographic: History’s most influential people, ranked by Wikipedia reach. Wired UK. HTML (open access)
  13. Purcell, K., Rainie, L., Heaps, A., Buchanan, J., Friedrich, L., Jacklin, A., Chen, C., Zickuhr, K. (2012). How Teens Do Research in the Digital World. Pew Internet. HTML (open access)
  14. Ermann, L., Frahm, K. M., & Shepelyansky, D. L. (2012). Spectral properties of Google matrix of Wikipedia and other networks. arXiv. PDF (open access)
  15. Dobusch, L., & Müller-Seitz, G. (2012). Serial Singularities: Developing a Network Organization by Organizing Events. Schmalenbach Business Review, 64, 204–229. HTML (open access)
  16. Yun, Q., & Gloor, P. A. (2012). The Web Mirrors Value in the Real World – Comparing a Firm’s Valuation with Its Web Network Position. SSRN Electronic Journal. DOI (open access)
  17. Morgan, J. T., Bouterse, S., Stierch, S., & Walls, H. (2013). Tea & Sympathy: Crafting Positive New User Experiences on Wikipedia. CSCW ’13. PDF (open access)
  18. Florin, F., Taraborelli, D., Keyes, O. (2012). Article Feedback: New research and next steps. Wikimedia blog. HTML (open access)
  19. Morell, M. F. (2013). Good Faith Collaboration: The Culture of Wikipedia. Information, Communication & Society, 16(1), 146–147. DOI (closed access)
  20. Baker, E. (2012). Measuring the Impact of Wikipedia for organisations (Part 1). Ed's blog. HTML (open access)

Wikipedia:Wikipedia Signpost/2012-12-31/Recent_research