The Signpost

Recent research

Gender gap and skills gap; academic citations on the rise; European food cultures

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

"Mind the skills gap: the role of Internet know-how and gender in differentiated contributions to Wikipedia"

This article[1] contributes to the discussion of gender inequalities on Wikipedia. The authors take the novel approach of looking for answers outside the Wikipedia community, which also ties their research into the analysis of new editors' recruitment, motivations, and barriers to contribution. They focus on the role of Internet experience and skills, and the lack of these among certain groups, asking whether a person's level of digital literacy is related to their chance of becoming a Wikipedia editor. To answer this, they surveyed 547 young adults (aged 21–22), all students at a (presumably American) university, the most commonly used convenience sample in academia. The survey was carried out in 2009, with a follow-up wave in 2012. The students were asked about their socioeconomic and demographic background, as well as about their level of digital literacy skills.

The authors report that "the average respondent's confidence in editing Wikipedia is relatively low" but that "about one in eight students had been given an assignment in class at some point either to edit or create a new entry on Wikipedia" – which likely suggests that the university (not disclosed by the authors) was one where at least one member of the faculty participated in the Wikipedia:Education Program. The vast majority (99%) of respondents reported having read an entry on Wikipedia, and over a quarter (28%) had some experience editing it (interestingly, even when controlling for students who were assigned to edit Wikipedia, the latter number is still as high as 20%).

Regarding the gender gap, women are much less likely than men to have contributed to Wikipedia (21% to 38%), and the difference widens when controlling for student assignments (13% to 32%). The authors find indications that the gender gap affects the likelihood of contributing to Wikipedia: students who are white, economically affluent, male and Internet-experienced are more likely to edit than others. The strongest, and statistically significant, predictors, however, are Internet skills and gender; regression models show that variables such as race, ethnicity, socioeconomic status, time availability, Internet experience, and confidence in editing Wikipedia are not significant. The authors find that gender becomes more significant as digital literacy increases. At a low level of Internet skills, the likelihood of contributing to Wikipedia is low regardless of gender. As skills increase, men become much more likely to contribute, while women fall behind. The authors find that women tend to have lower Internet skills than men, which helps explain part of the Wikipedia gender gap: to contribute to Wikipedia, one needs a certain level of digital literacy, and the digital gap reduces the number of women who have the required level of skills. The authors crucially admit that "why women, on average, report lower level understanding of Internet-related terms remains a puzzle. Although studies with detailed data about actual skills based on performance tests suggest no gender differences in the observed skills, research that looks at self-rated know-how consistently finds gender variation with real consequences for online behavior". This suggests that while men and women have, in reality, similar skills, women are much less confident about them, which in turn makes them much less confident about contributing to (or trying to contribute to) Wikipedia. This, however, is a hypothesis to be confirmed by future research. In the end, the authors do feel confident enough to conclude that "gender and Internet skills likely have a relatively mild interaction with each other, reinforcing the gender gap at the high end of the Internet skills spectrum." In conclusion, this reviewer finds the study highly valuable, both for the literature on the gender gap and online communities, and for the efforts of the Wikipedia community and the WMF to reduce this gap in our environment.

In nutritional articles, academic citations rise while news media citations decrease

A study published in First Monday[2] analyzed how the referencing of 45 articles, spread over nine topic groups related to health and nutrition, developed over a period of five years (2007–2011). Unfortunately, the authors are not very clear about which particular articles were analyzed, and tend to use the concepts of "article" and "topic group" in a rather confusing manner. The authors coded references (3,029 in total), information on editing history, and search ranking in the Google, Bing and Yahoo! search engines. The study confirmed that Wikipedia articles are ranked highly by all three search engines, with Yahoo! actually being even more "Wikipedia-friendly" than Google. The authors show that (as expected) the articles improve in quality (or at least in the number and density of references) over time. Crucially, they show that the overall percentage of references to mainstream news media decreased, while references to academic publications increased over that time. By the end of the study period, only the article on (or topic group of?) trans fat contained more references to news sources than to academic publications. Overall, the authors support the description of Wikipedia as a source aiming for reliability, though they are hesitant to call it reliable, pointing out that, for example, 15% of the analyzed references were coded as "outside the main reference type categories or... not be clearly determined". The authors conclude, commendably, that "Wikipedia needs to be high on the agenda for health communication researchers and practitioners" and that "communications professionals in the health field need to be much more actively involved in ensuring that the content on Wikipedia is reliable and well-sourced with reliable references".

Wikipedia user session timing compared with other online activities

Figure: Comparison of time between user interactions on Wikipedia, AOL and Cyclopath
Reviewed by Maximilianklein

In a recent preprint titled "User Session Identification Based on Strong Regularities in Inter-activity Time"[3], Halfaker and a team from the Wikimedia Foundation's Analytics department and the GroupLens Lab ask whether contributions can be discussed in terms of "sessions" rather than atomic operations, across all collaborative work online. The researchers would like to answer "yes", and propose that a "session" can be defined as the operations conducted until "a good rule-of-thumb inactivity threshold of about 1 hour" is reached, regardless of whether you are editing Wikipedia, viewing Wikipedia, rating movies, searching AOL, or playing League of Legends. You may recall that Halfaker and Geiger came to a similar conclusion about "edit sessions" in a 2013 paper, but now the idea is to cement that finding as a universal heuristic across many domains. Objections to this idea have been that session-length thresholds will always be arbitrary, or that a session does not coincide with completing a task, which might extend beyond someone logging off for the night.
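As a minimal sketch of what the one-hour rule of thumb means in practice, the following Python snippet groups a list of event timestamps into sessions whenever the gap between consecutive events exceeds a threshold. The function name `split_into_sessions` and the sample timestamps are hypothetical illustrations, not taken from the paper.

```python
from datetime import datetime, timedelta


def split_into_sessions(timestamps, threshold=timedelta(hours=1)):
    """Group event timestamps into sessions.

    A new session starts whenever the gap since the previous event is
    longer than `threshold` (one hour by default, following the rule of
    thumb quoted in the review). This is an illustrative helper only.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= threshold:
            # Gap is within the threshold: same session continues.
            sessions[-1].append(ts)
        else:
            # First event, or gap too long: open a new session.
            sessions.append([ts])
    return sessions


# Hypothetical activity log for one user (gaps: 10 min, 40 min, 3 h 10 min, 5 min).
events = [
    datetime(2014, 11, 26, 9, 0),
    datetime(2014, 11, 26, 9, 10),
    datetime(2014, 11, 26, 9, 50),
    datetime(2014, 11, 26, 13, 0),
    datetime(2014, 11, 26, 13, 5),
]
sessions = split_into_sessions(events)
print([len(s) for s in sessions])  # [3, 2]
```

The threshold is a parameter rather than a constant because, as the review goes on to discuss, the one-hour value does not fit every dataset equally well.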

Figure: Stack Overflow user interactions

To bolster their argument, the authors test the hypothesis on empirical data collected from seven datasets. The method employed is to take the logarithm of the time between user events, and then fit a bimodal distribution to the histogram. Once we have a two-humped histogram, we simply find the point at which a gap is equally likely to belong to the "within"-session hump or the "between"-session hump.
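The threshold step can be sketched as follows, under two loudly stated assumptions: that the two humps have already been fitted as normal distributions over log10(seconds), and that the threshold is the point where the two weighted densities cross. The component parameters below are hypothetical and are not values reported in the paper; the fitting itself (for example by expectation-maximization) is not shown.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def session_threshold(within, between, tol=1e-9):
    """Find where the two weighted components are equally likely.

    `within` and `between` are (weight, mean, sd) tuples describing the
    short-gap and long-gap humps in log10(seconds). The crossing point is
    located by bisection between the two means.
    """
    w1, m1, s1 = within
    w2, m2, s2 = between

    def diff(x):
        return w1 * normal_pdf(x, m1, s1) - w2 * normal_pdf(x, m2, s2)

    lo, hi = m1, m2
    if not (lo < hi and diff(lo) > 0 > diff(hi)):
        raise ValueError("components do not bracket a single crossing point")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if diff(mid) > 0:
            lo = mid  # still in the region dominated by the within-session hump
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical fitted components (weight, mean, sd) in log10(seconds).
within = (0.5, 2.0, 0.8)   # short gaps, centred on 100 seconds
between = (0.5, 5.0, 0.8)  # long gaps, centred on roughly 28 hours
log_threshold = session_threshold(within, between)
print(round(log_threshold, 3))            # 3.5
print(round(10 ** log_threshold / 60))    # 53 (minutes)
```

With equal weights and equal spreads the crossing falls midway between the two means; unequal weights or spreads shift it, which is one way thresholds for different sites could end up well away from an hour.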

AOL search data, Cyclopath route-getting requests, and Wikipedia viewing (from the desktop, mobile and apps) seem to fit bimodally. Together, their thresholds range from 29 to 115 minutes, but none is far from an hour, say the authors. Yet when it comes to Wikipedia editing, OpenStreetMap editing, and MovieLens reviewing and searching, a bimodal 1-hour fit is good, but the data can be further explained by a trimodal model. For the first two activities the third mode is the wikibreak, and for the last it is the ease with which the site lets users rate movies in quick succession.

Even trimodally, though, "this strategy for identifying session thresholds is not universally suitable for all user-initiated events". For instance, the authors show League of Legends, which has modal peaks at 5 minutes and one day. As a reviewer, I find this easy to explain from a player's perspective. If you play 5 games in a row, with 5 minutes of queueing between games, and then repeat this daily, you get the histogram seen, where the 5-minute peak is about 5 times as tall as the one-day peak. Stack Overflow, with a threshold of 335 minutes, does not easily fit into their model at all. The authors claim this is a result of the high-quality edits expected at Stack Overflow.

Overall, the authors conclude that one hour seems to suffice as a rule of thumb. But does it? The issue is that no goodness of fit is presented for the bimodal models. This leaves outliers like Stack Overflow in an unclear position: they are treated as modelable but not compliant with the one-hour rule, when they might simply not be describable by the proposed heuristic at all.

Briefly

Other recent publications

A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.

References

  1. ^ Hargittai, Eszter; Aaron Shaw (2014-11-04). "Mind the skills gap: the role of Internet know-how and gender in differentiated contributions to Wikipedia". Information, Communication & Society. 0 (0): 1–19. doi:10.1080/1369118X.2014.957711. ISSN 1369-118X.
  2. ^ Messner, Marcus; Marcia W. DiStaso; Yan Jin; Shana Meganck; Scott Sherman; Sally Norton (2014-10-29). "Influencing public opinion from corn syrup to obesity: A longitudinal analysis of the references for nutritional entries on Wikipedia". First Monday. 19 (11). ISSN 1396-0466.
  3. ^ Halfaker, Aaron; Oliver Keyes; Daniel Kluver; Jacob Thebault-Spieker; Tien Nguyen; Kenneth Shores; Anuradha Uduwage; Morten Warncke-Wang (2014-11-11). "User Session Identification Based on Strong Regularities in Inter-activity Time". arXiv:1411.2878.
  4. ^ Patryk Korzeniecki: Ruch Wikimediów w państwach europejskich jako przykład aktywności obywatelskiej (Wikimedia Movement in European countries as an example of civil participation). Chapter 6 in: Joachim Osiński, Joanna Zuzanna Popławska (eds.): Oblicza spoleczenstwa obywatelskiego. Warsaw School of Economics Press, Warsaw 2014.
  5. ^ Riddell, Allen B. (2014-11-08). "Public Domain Rank: Identifying Notable Individuals with the Wisdom of the Crowd". arXiv:1411.2180.
  6. ^ Laufer, Paul; Claudia Wagner; Fabian Flöck; Markus Strohmaier (2014-11-17). "Mining cross-cultural relations from Wikipedia - A study of 31 European food cultures". arXiv:1411.4484.
  7. ^ Ferschke, Oliver (2014-07-15). "The Quality of Content in Open Online Collaboration Platforms: Approaches to NLP-supported Information Quality Management in Wikipedia". Darmstadt: Technische Universität Darmstadt.
  8. ^ Emily K. Jamison, Iryna Gurevych: Adjacency Pair Recognition in Wikipedia Discussions using Lexical Pairs.
  9. ^ Johannes Daxenberger and Iryna Gurevych: Automatically Detecting Corresponding Edit-Turn-Pairs in Wikipedia. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers), pages 187–192, Baltimore, Maryland, USA, June 23–25, 2014. http://acl2014.org/acl2014/P14-2/pdf/P14-2031.pdf
  10. ^ Spychała, Justyna; Mateusz Adamczyk; Piotr Turek (2014-06-30). "Does the Administrator Community of Polish Wikipedia Shut out New Candidates Because of the Acquaintance Relation?". International Journal On Advances in Intelligent Systems. 7 (1 and 2): 103–112. ISSN 1942-2679.
  11. ^ Ubah, Ifeanyichukwu (2013). Development of a semantic data collection tool: The Wikidata Project as a step towards the semantic web.
  12. ^ Hilles, Stefanie (2014). "To Use or Not to Use? The Credibility of Wikipedia". Public Services Quarterly. 10 (3): 245–251. doi:10.1080/15228959.2014.931204. ISSN 1522-8959.
  13. ^ Tran, Giang Binh; Mohammad Alrifai (2014). "Indexing and Analyzing Wikipedia's Current Events Portal, the Daily News Summaries by the Crowd". Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion. WWW Companion '14. Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee. pp. 511–516. doi:10.1145/2567948.2576942. ISBN 978-1-4503-2745-9.

Wikipedia:Wikipedia Signpost/2014-11-26/Recent_research