The Signpost

Recent research

Special issue on gender gap and gender bias research


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

This month's edition focuses on recent research about Wikipedia's gender gaps and potential gender biases.

Female and nonwhite US sociologists less likely to have Wikipedia articles than scholars of similar citation impact

Reviewed by Aaron Shaw

In "Who Counts as a Notable Sociologist on Wikipedia? Gender, Race, and the 'Professor Test'",[1] Julia Adams, Hannah Brückner, and Cambria Naslund analyze articles about contemporary U.S. sociologists on English Wikipedia to investigate whether demographic gaps in coverage exist. The paper, published in the open access journal Socius: Sociological Research for a Dynamic World, reports evidence of race and gender-based coverage gaps, even among similarly accomplished faculty in terms of seniority, institutional status, publication count, and H-index. The authors analyze a sample of nearly 3,000 sociology faculty in elite ("R1") research universities in the United States in 2014. They then gather demographic, institutional affiliation, and citation data for all of these individuals and match the names of as many as they can to articles in the "living sociologists" category on English Wikipedia as of October 2016. In supplementary analysis, the authors also collect data on "articles for deletion" (AfD) decisions impacting any matches they can identify from their sample. They compare coverage and deletion rates across categories of race, gender, and other variables.

The results indicate that female and nonwhite scholars are disproportionately less likely to have articles covering them than scholars of similar seniority, prestige, or citation impact who are male and/or white. The analysis finds no evidence of differential deletion across male and female sociologists, indicating that any coverage gaps (at least along the lines of gender) derive from different rates of article creation. The authors offer two potential explanations for these findings, which they describe as "supply" and "demand" side explanations respectively. On the supply side, Adams and colleagues argue that differential coverage patterns may reflect broader patterns of underrepresentation of women and people of color in the world. According to this perspective, Wikipedia's gaps in coverage could derive, at least partly, from inequities that originate elsewhere. On the demand side, however, Adams and colleagues note that some portion of the coverage gaps also seems likely to derive from issues with gatekeeping and exclusionary behavior within the community. Specifically, they identify Wikipedia's "professor test" notability policy as one of "the factors that transmit existing inequalities into the encyclopedia and magnify them informationally":

The Wikipedia criteria for notability of academics include references to making a 'significant impact on the field' and being an 'elected member of a highly selective and prestigious scholarly society or association' or the 'highest-level elected or appointed administrative post at a major academic institution,' [...] There are gendered and racialized patterns and criteria already embedded in these judgments. Some of the highly prestigious academic societies overwhelmingly elect white men into their ranks [...] for example ...

The paper points to some evidence in support of both the supply and demand perspectives; the authors conclude that elements of both explanations are likely at play and leave it to future work to disentangle them.

This article is one of only a few studies published in an American Sociological Association-affiliated journal that focuses on Wikipedia as an empirical context. It breaks new ground in the understanding of content coverage gaps on Wikipedia by focusing on a specific profession for which relatively clear indicators of notability exist. While, as the authors point out, measures like citation count and H-index and classifications of institutional status are limited in many ways, they provide clear empirical grounds upon which to compare scholars who may or may not have received coverage on Wikipedia. The evidence points clearly to persistent and disproportionate inequities, suggesting that initiatives to support more equitable coverage may be important correctives.

See also related Signpost coverage from 2014: "Wikipedia study cited as example of government waste [by a Republican politician]"


"Mapping and Bridging the Gender Gap: An Ethnographic Study of Indian Wikipedians and Their Motivations to Contribute"

Reviewed by Lucie-Aimée Kaffee

In the extended abstract,[2] the authors Anwesha Chakraborty and Netha Hussain investigate the barriers women face in contributing to Wikipedia, with a focus on Indian editors. In their ethnographic study, they interview 5 editors (4 female, 1 male), identify challenges, and suggest solutions to the gender gap among Indian editors.

The authors identify a set of barriers, particularly for Indian women editors, which they describe as “socio-cultural challenges which hinder participation of women not only from contributing to Wikipedia but also accessing the internet”.

Among the motivations to contribute are democratizing knowledge, sharing knowledge in the editors' native languages, increasing the number of articles about India in the Wikiverse, and bridging the gap between oral and written knowledge.

The suggestions for increasing the number of Indian women editors include in-person meetings for new and existing editors, sensitization to make the community more inclusive, outreach to mothers, and simplifying the editing interface for people with limited access to devices.


"Investigating the Gender Pronoun Gap in Wikipedia"

Reviewed by Khandaker Tasnim Huq

This article[3] addresses the issue of gender bias in Wikipedia from a quantitative perspective. It proposes a simple metric, the "Gender Pronoun Gap", defined as the ratio between the number of times the pronouns "he" and "she" are used in a given article. Put simply, it asks whether "he" occurs more often than "she" in an article, or vice versa. The motivation for the study is clear: Wikipedia has become an important data source for developing artificial intelligence and machine learning algorithms, and such data-driven algorithms risk reinforcing stereotypes if their data source is biased. For the investigation, researchers from the Qualcomm Institute at the University of California, San Diego used two Wikipedia corpora: (a) 25,951 articles rated "Good" or "Featured" through Wikipedia's editorial review, taken from "The WikiText Long Term Dependency Language Modeling Dataset" by MetaMind (Merity, Xiong, Bradbury, & Socher, 2016), and (b) 463,820 English Wikipedia articles with 20 or more daily page views, from "The Unknown Perils of Mining Wikipedia" by Lateral (Wilson, 2017).
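As an illustration of how simple the metric is, here is a minimal Python sketch. It is not the paper's code, and it rests on assumptions of our own: only the bare, whole-word pronouns "he" and "she" are counted (case-insensitively), so forms such as "him", "his", or "her" are ignored; and the labels "he-heavy", "she-heavy", and "equal" (borrowing the review's terms) are a hypothetical scheme that simply compares the two counts, since the exact thresholds the paper uses are not described here.

```python
import math
import re

# Assumption: only the bare pronouns "he" and "she" are counted, as whole
# words and case-insensitively; "him", "his", "her", "hers" are ignored.
HE_PATTERN = re.compile(r"\bhe\b", re.IGNORECASE)
SHE_PATTERN = re.compile(r"\bshe\b", re.IGNORECASE)


def pronoun_counts(text: str) -> tuple[int, int]:
    """Return (number of "he", number of "she") in the text."""
    return len(HE_PATTERN.findall(text)), len(SHE_PATTERN.findall(text))


def pronoun_gap(text: str) -> float:
    """Ratio of "he" to "she" occurrences.

    Returns math.inf if "he" occurs but "she" never does, and math.nan
    if neither pronoun occurs (the ratio is undefined).
    """
    he, she = pronoun_counts(text)
    if she == 0:
        return math.inf if he > 0 else math.nan
    return he / she


def pronoun_label(text: str) -> str:
    """Hypothetical labelling scheme: compare the two counts directly."""
    he, she = pronoun_counts(text)
    if he > she:
        return "he-heavy"
    if she > he:
        return "she-heavy"
    return "equal"


if __name__ == "__main__":
    sample = "He wrote the score. She sang it. He left; he returned."
    # prints: (3, 1) 3.0 he-heavy
    print(pronoun_counts(sample), pronoun_gap(sample), pronoun_label(sample))
```

The whole-word matching matters: a naive substring count would find "he" inside "she", "the", or "theorem" and inflate the "he" count. Note also that no pronoun count can tell whether "she" refers to a person, as the battleship articles discussed below illustrate.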

The study revealed that the articles contain biases towards "he" words. The authors write:

... the bias increases (that is worsens) as we change our corpus from general popular Wikipedia articles (captured by Lateral) to the 'Good' and 'Featured' articles captured by MetaMind. This further suggests that the editor process of an article to be 'Good' or 'Featured' introduces additional bias...

In other words, the "Featured" category contains significantly fewer "Equal" (unbiased) articles than the "All" category. One reason the researchers suggest is that unbiased articles tend to focus more on abstract ideas or technical topics, such as scientific developments or mathematical theorems, than on individuals and their backgrounds. General audiences have a harder time reading these articles, which makes them less likely to be included in the "Featured" list.

The researchers also applied a topic modeling method called "Latent Semantic Indexing" (LSI) to study which pronoun tends to appear in which topics, visualizing the distribution and organization of topics among the articles. The study yields some interesting findings. In musician- and music-related articles, for example, although there are fewer "she"-heavy articles (those containing more "she" than "he" pronouns) than "he"-heavy ones (the reverse), the editors use very similar language and vocabulary for both. For athlete-related articles, however, there are far fewer articles about female athletes than about male ones, hindering the fair representation of women in the athletic world. The researchers also found an outlier among "she"-heavy articles, caused by articles about battleships, which naval vernacular often refers to as "she" instead of "it". This finding indicates that some articles, especially "she"-heavy ones, are about objects yet still contain a pronoun primarily used to refer to a person, which could be undesirable and misleading.

The researchers acknowledge some limitations of their proposed approach: for example, the analysis was applied only to English Wikipedia and only to articles using binary gender pronouns. They admit that one metric is not enough to capture gender bias as a whole, since it is far too complex to quantify solely in terms of pronoun usage. They therefore call for further studies to find more quantifiable and efficient metrics that may provide a better understanding of gender bias in Wikipedia.


Safety and women editors on Wikipedia

Reviewed by Isaac Johnson

In the CHI 2019 paper "People Who Can Take It: How Women Wikipedians Negotiate and Navigate Safety" by Amanda Menking, Ingrid Erickson, and Wanda Pratt,[4] the gender gap is examined through interviews with 25 experienced women editors on Wikipedia (purposefully including several women who were more dismissive of a relationship between their gender identity and experience on-wiki). Menking et al. approach the gender gap not from the standpoint of skills or abilities but from that of how a lack of safety within Wikipedia spaces creates barriers to participation for women. By interviewing experienced editors, they are able not only to identify issues but also to highlight coping strategies that these women have developed to deal with issues of safety on Wikipedia.

A few excellent points made by this paper:

  • They highlight the importance of studying not just Wikipedia namespaces but all of the communication channels and spaces outside of Wikipedia that inform one's relationship with the rest of the community (IRC, edit-a-thons, conferences, mailing lists, etc.). This may be a less salient point for those studying new editors, but it is incredibly important to remember when considering the experiences of more long-term editors.
  • Many of the women interviewed had found ways to cope with any lack of safety they felt on-wiki, but many of these strategies are not sanctioned. This highlights that though many women do stick around and edit, this continued participation does not necessarily indicate that the design and norms within Wikipedia are supportive of them.
  • The lack of safety can lead women to choose to avoid certain spaces (e.g., editing articles that are particularly contentious). This could clearly have unfortunate consequences for the diversity of voices that contribute, but it is also a reminder to researchers not to interpret on-wiki behavior as purely self-selection. Such behavior is also a product of the environment and, at least for women editors, might look very different if the spaces were perceived as safer. From the paper:

[t]heir decision not to edit has less to do with the content of the articles themselves and their skills and knowledge in relation to those topics, but rather to do with the culture of Wikipedia and their sense of safety.

  • The paper closes with three provocations for design – noting that designers of online and offline spaces need to be intentional about 1) designing for safety, 2) creating tools that allow individuals to create safe spaces within these communities, and 3) not putting the burden of creating safety on those who are facing these barriers.

I highly encourage reading this paper in full, though – I have highlighted only a few of the points it contains. Notably, it has a very well-written summary of the gender gap on Wikipedia, and the vignettes are much richer than my summary can be.

"Hacking History: Redressing Gender Inequities on Wikipedia Through an Editathon"

Reviewed by FULBERT

The authors of this study[5] focused on a 2015 editathon held at the University of Edinburgh on the topic of the Edinburgh Seven, the first group of women to study medicine at the university. They wanted to understand how this editathon, though an informal event, still constituted professional learning. Social network analysis of the 47 members of the editathon, combined with limited qualitative interviews of some of the participants, presented an evolving picture of how historical narratives can be constructed. As the participants moved from being consumers of information to producers of it, the researchers explored their awareness of the non-neutral construction of knowledge.

The paper's literature review explored how user-generated content contains systemic and structural biases, and the implications of this for the representation of women on Wikipedia, both as subjects and as contributors, helping to understand the dominant discourse along with the "continued marginalization of traditionally excluded voices and histories" (p. 4). This re-examination of the initial editathon experiences, which had been described in an earlier publication, resulted in a broader perspective on how a group of Wikipedia editors understood their role in representing history: one that was not neutral, and one whose voice had the power to fill historical gaps in knowledge. This led to an awareness of the gendered role of male-dominated discourse on the Internet, which could be counterbalanced by actively participating in editing Wikipedia articles from more theoretically critical perspectives. As a result, personal learning journeys could have powerful implications for what is experientially learned through participating in editathons that combine materials, technology, and social relations through Wikipedia engagement.


"'(Weitergeleitet von Journalistin)': The Gendered Presentation of Professions on Wikipedia"

Reviewed by FULBERT

The authors explored the existence of gender bias on German Wikipedia by examining articles about professions, including the gendered job titles and images used in them.[6] They used Google hits and labor market indicators to compare this information with how men and women are actually represented within the professions. Their findings include far more representation of male titles, images, and names on Wikipedia than would be expected from labor market statistics for the corresponding professions. As the methodology was computational, the authors did not seek to explain why this strong gender imbalance exists, but they did propose the study as a useful starting point for raising the issue and seeking ways to address it through future writing and editing efforts.

"Striking result": No bias against contributions by female editors in quality assessment

Reviewed by Tilman Bayer

There is no gender or race bias in readers' ratings of the quality of Wikipedia article text when they are led to believe its author is male or female, black or white. This is the result of an experimental study[7] that had Mechanical Turk workers rate the quality of simulated "gig work" (typical of platforms such as Upwork), with Wikipedia texts serving as one of four such examples. Student essays (submissions to the SAT) formed the basis of the other three experiments.

In more detail, the researchers "sampled 100 Wikipedia articles from the Musician Biography Wiki-Project (a project focused on editing musician biographies in Wikipedia). [...] we selected four ‘Stub’ class articles and four ‘Start’ class articles as our low and high-quality deliverables (respectively). [....] To manage the workload for our participants, we also ensured that each article was between 1,000 and 10,000 bytes of body text. We also made sure these pieces of writing did not look too similar to a Wikipedia article. We did so by scraping the body of these pages and removing all links (retaining the text) and styling."

The result was then presented to raters (without mentioning Wikipedia as the source) alongside a prominent portrait of the purported author, with gender and race being conveyed through a portrait photo ("On the advice of an ethnic studies scholar, we sought to control for potential biases apart from race and gender through a standardized image selection process"), and, for the former, also via a demographically valid first name. The Wikipedia rating task was added after "none of our first three studies found that participants showed race- or gender-based rating bias. To help understand these results, we sought advice and insight from a Gender Studies scholar. She suggested that the task of evaluating writing critique [the students' essay task] might be too abstract or unnatural", leading to the choice of Wikipedia content as a rating object instead.

The authors call their results "striking" in light of "previous work showing race and gender bias". They emphasize their "statistical confidence in the overall finding", which included an "absence check" using Bayesian methods that allowed them to place an upper bound on the size of the effect of any such bias. Like other efforts to detect biases or their absence, their methodology still has various limitations: for example, the raters came only from the United States, leaving open the question of whether a gender or racial bias could still exist in other countries. Also, of course, the population of Mechanical Turk workers may differ in some characteristics from that of the Wikipedia editors who review most Wikipedia contributions in real life. (However, among the various hypotheses offered by the authors as possible explanations for their results, they argue that "the familiarity of crowd workers with crowdwork incentives and work practices distinguishes them from the general population and [possibly] makes them less likely to show race or gender-based bias when doing a rating task".) Still, the fact remains that the results of a series of statistically rigorous experiments were at odds with widespread assumptions often taken for granted in discussions about Wikipedia's biases.


Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Compiled by Tilman Bayer


"Unexpected forms of bias" on Wikipedia favor female over male CEOs

From the abstract:[8]

"Comparison of Wikipedia profiles of Fortune 1000 CEOs reveals that selection, source, and influence bias stemming from structural constraints on Wikipedia advantage women and disadvantage men. This finding suggests that information developed by open collaboration communities may contain unexpected forms of bias."


English Wikipedia biased against conservative and female topics, at least when compared to US magazines

From the abstract and paper:[9]

"We illustrate the method by investigating how well the English language Wikipedia addresses the content interests of four sample audiences: readers of men’s and women’s periodicals [e.g. Cosmopolitan vs. Esquire ], and readers of political periodicals geared toward either liberal or conservative ideologies [e.g. Mother Jones vs. National Review ] ... We found that 73.8% of the randomly selected 400 keywords from conservative-oriented periodicals were covered, and 81.5% of the randomly selected 400 keywords from liberal-oriented periodicals were covered. This represents a 7.7% difference in topical coverage. ... We found that 67.6% of 'women's' topics and 84.1% of 'men’s' topics were covered. This represents a 16.5% difference in the topical coverage of Wikipedia as it is represented from periodicals targeted to a specific 'gendered' readership."


"Gender and deletion on Wikipedia"

From the conclusions of this blog post:[10]

  • We know the gender breakdown [of biography articles on English Wikipedia]: skewed male, but growing slowly more balanced over time, and better for living people than historical ones.
  • We know the article lengths; slightly longer for women than men for recent articles, about equal for those created a long time ago.
  • We know that there is something different about the way male and female biographies created before ~2017 experience the deletion process, but we don’t have clear data to indicate exactly what is going on, and there are multiple potential explanations.
  • We also know that deletion activity seems to be more balanced for articles in both groups created from ~2017 onwards [...]

"Deleted gender wars"

From the blog post:[11]

"After reading the excellent analysis of AfD vs gender by Andrew Gray [see above], where he writes about the articles that faced and survived the 'Article for Deletion' process, I couldn’t help but wonder what happened to the articles that were not kept, that is, where AfD was 'successful'. ... Of the (male+female) articles, 23% are about women, which almost exactly matches Andrew’s ratio of women in BLP (Biographies of Living People). That would indicate no significant gender bias in actually deleted articles."

"Gender gap through time and space: A journey through Wikipedia biographies via the Wikidata Human Gender Indicator"

From the abstract:[12]

"... we investigate how quantification of Wikipedia biographies can shed light on worldwide longitudinal gender inequality trends, a macro-level dimension of human development. We present the Wikidata Human Gender Indicator (WHGI), located within a set of indicators allowing comparative study of gender inequality through space and time, the Wikipedia Gender Indicators (WIGI), based on metadata available through the Wikidata database. Our research confirms that gender inequality is a phenomenon with a long history, but whose patterns can be analyzed and quantified on a larger scale than previously thought possible. Through the use of Inglehart–Welzel cultural clusters, we show that gender inequality can be analyzed with regard to world’s cultures. We also show a steadily improving trend in the coverage of women and other genders in reference works."

See also https://wigi.wmflabs.org/

Special issue of the "Nordic journal for information science and dissemination of culture"

From the introduction:[13]

"Despite – or perhaps because of – the controversies the Wikipedia gender gap offers valuable lessons for understanding the problems of archival bias, not only in Wikipedia but in crowdsourced archives more generally. This special issue of NTIK argues that archival and activist theory provides a productive theoretical framework for critiquing such bias. The issue originates from a two-day event held in Copenhagen on March 8 and 9, 2015, on the topic of gender and Wikipedia"

Article titles: "Wikipedians' Knowledge and Moral Duties", "Neutrality in the Face of Reckless Hate : Wikipedia and GamerGate", "Biases We Live By", "Wikipedia and the Myth of Universality", "From Webcams to Wikipedia: There Is An Art & Feminism Online Social Movement Happening and It Is Not Going Away", "Archival Biases and Cross-Sharing", "The Gift of Mutual Misunderstanding"

"Breastfeeding, Authority, and Genre: Women's Ethos in Wikipedia and Blogs"

From the abstract:[14]

"... the authors examine how Wikipedia’s generic regulations determine that women’s often experiential ethos is unwelcome on the site. Thus, women are often unable to construct knowledge on the 'breastfeeding' entry; their epistemological methods are ignored or banned by other contributors. This chapter also examines six breastfeeding-focused mommyblogs, proposing blogs as an alternative genre that welcomes women’s ethos. However, the authors also recognize that such blogs are not a perfect epistemological paradigm."

"Cyberfeminism on Wikipedia: Visibility and deliberation in feminist Wikiprojects"

From the English abstract (paper is in Portuguese, but about English Wikipedia):[15]

"The theoretical discussion starts from two concepts of the literature about the public sphere to analyze the case of organized groups (WikiProject Women's History and WikiProject Feminism) that seek to use the collaborative encyclopedia as (a) a platform that produces public visibility and, consequently, (b) that supports online debate about politics producing an exchange of reasoning and consolidating social representations. Based on the analysis, it is evaluated the capacity of Wikipedia to configure itself as a communicational environment that establishes new methods of construction of social representations."

"Women and Wikipedia. Diversifying Editors and Enhancing Content through Library Edit-a-Thons"

Book chapter[16]

"Similar Gaps, Different Origins? Women Readers and Editors at Greek Wikipedia"

From the abstract:[17]

"Consistent with previous studies, we found a gender gap, with women making up only 38% and 15% of readers and editors, respectively, and with men editors being much more active. Our data suggest two salient explanations: 1) women readers more often lack confidence with respect to their knowledge and technical skills as compared to men, and 2) women's behaviors may be driven by personal motivations such as enjoyment and learning, rather than by 'leaving their mark' on the community, a concern more common among men."

"Writing Women in Mathematics into Wikipedia"

From the abstract:[18]

"... I reflect upon the problems connected with writing women in mathematics into Wikipedia. I discuss some of the current projects and efforts aimed at increasing the visibility of women in mathematics on Wikipedia. I present the rules for creating a biography on Wikipedia and relate my personal experiences in creating such articles."

"How do students trust Wikipedia? An examination across genders"

From the abstract:[19]

"The results confirm that information accuracy, stability, and validity are significantly related to users’ intentions to adopt information from Wikipedia, but objectivity is not. Meanwhile, moderating role for gender on some of these effects is confirmed."

"Breaking the glass ceiling on Wikipedia"

From the paper:[20]

" ... gender is a complex issue on Wikipedia, which the realisation that articles on topics relevant to feminist and gender studies or others related to minorities rights movements may be more likely to be removed from Wikipedia (Carstensen, 2009) makes visible. To understand the background and reasons for this phenomenon, I present a fieldwork account of an incident related to a gender-related topic (Wikipedia article on ‘Glass ceiling’). The described non-fictional incident was a part of my seven-year anthropological fieldwork project on Wikipedia."

Using Wikipedia for "Analyzing Gender Stereotyping in Bollywood Movies"

From the abstract and paper:[21]

"We analyze movie plots and posters for all movies released since 1970 ... We have extracted movies pages of all the Hindi movies released from 1970- present from Wikipedia. We also employ deep image analytics to capture such bias in movie posters and previews."

Contrary to expectations, "no evidence of discrimination of female users based on their usernames"

From the abstract and paper:[22]

"As an example of [Wikipedia's gender participation] gap, posts by women on talk pages are slightly less likely to receive a reply than posts by men. [...] One of the only cues available to Wikipedia users for guessing the author of a talk page post’s gender is their username, i.e. their pseudonym on the platform. We therefore examined whether users with obviously female names receive fewer replies than users with obviously male names. [...] Contrary to our expectations, we find that users with clearly female names are slightly more likely to receive a reply than users with clearly male names. We also find that the fraction of users with a female name is much lower than the fraction of female users, suggesting that, unlike men, women using Wikipedia do not include contain [sic] obvious gender markers in their usernames. [...] This result is important for the Wikipedia community because it implies that we found no evidence of discrimination of female users based on their usernames, unlike what other studies have found in offline and online correspondences in male-dominated fields."

Some informative non-research overview publications

  • Netha Hussain: "Research on gender gap in Wikipedia. What has been done so far?" (2017) / Netha Hussain, Reem Al-Kashif: "Research on gender gap in Wikipedia: What do we know so far?" Presentation at Wikimania 2018
  • Krämer, Katrina (July 2019). "Female scientists' pages keep disappearing from Wikipedia – what's going on?". Chemistry World.

References

  1. ^ Adams, Julia; Brückner, Hannah; Naslund, Cambria (2019-01-01). "Who Counts as a Notable Sociologist on Wikipedia? Gender, Race, and the "Professor Test"". Socius. 5: 2378023118823946. doi:10.1177/2378023118823946. ISSN 2378-0231. S2CID 149857577.
  2. ^ Anwesha Chakraborty and Netha Hussain: Mapping and Bridging the Gender Gap: An Ethnographic Study of Indian Wikipedians and Their Motivations to Contribute (extended abstract). http://wikiworkshop.org/2018/papers/wikiworkshop2018_paper_5.pdf
  3. ^ Yazdani, M. (2017). Investigating the Gender Pronoun Gap in Wikipedia. WikiStudies, 1(1), 96-116.
  4. ^ Menking, Amanda; Erickson, Ingrid; Pratt, Wanda (2019). "People Who Can Take It: How Women Wikipedians Negotiate and Navigate Safety". Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM. pp. 472:1–472:14. doi:10.1145/3290605.3300702. ISBN 9781450359702. S2CID 140247548.
  5. ^ Hood, Nina; Littlejohn, Allison (2018-11-27). "Hacking History: Redressing Gender Inequities on Wikipedia Through an Editathon". The International Review of Research in Open and Distributed Learning. 19 (5). doi:10.19173/irrodl.v19i5.3549. ISSN 1492-3831.
  6. ^ Zagovora, Olga; Flöck, Fabian; Wagner, Claudia (2017-06-12). ""(Weitergeleitet von Journalistin)": The Gendered Presentation of Professions on Wikipedia". Proceedings of the 2017 ACM on Web Science Conference. ACM. pp. 83–92. arXiv:1706.03848v1. Bibcode:2017arXiv170603848Z. doi:10.1145/3091478.3091488. ISBN 978-1-4503-4896-6. S2CID 11059274.
  7. ^ Thebault-Spieker, Jacob; Kluver, Daniel; Klein, Maximilian A.; Halfaker, Aaron; Hecht, Brent; Terveen, Loren; Konstan, Joseph A. (December 2017). "Simulation Experiments on (the Absence of) Ratings Bias in Reputation Systems" (PDF). Proc. ACM Hum.-Comput. Interact. 1 (CSCW): 101:1–25. doi:10.1145/3134736. ISSN 2573-0142. S2CID 12628445.
  8. ^ Young, Amber; Wigdor, Ari; Kane, Gerald (2016-12-11). "It's Not What You Think: Gender Bias in Information about Fortune 1000 CEOs on Wikipedia". ICIS 2016 Proceedings.
  9. ^ Menking, Amanda; McDonald, David W.; Zachry, Mark (2017). Who Wants to Read This?: A Method for Measuring Topical Representativeness in User Generated Content Systems (PDF). CSCW '17. New York, NY, USA: ACM. pp. 2068–2081. doi:10.1145/2998181.2998254. ISBN 9781450343350.
  10. ^ Gray, Andrew (2019-05-06). "Gender and deletion on Wikipedia". generalist.co.uk. (blog post)
  11. ^ Manske, Magnus (2019-05-08). "Deleted gender wars". The Whelming. (blog post)
  12. ^ Konieczny, Piotr; Klein, Maximilian (2018-12-01). "Gender gap through time and space: A journey through Wikipedia biographies via the Wikidata Human Gender Indicator". New Media & Society. 20 (12): 4608–4633. arXiv:1502.03086. doi:10.1177/1461444818779080. ISSN 1461-4448. S2CID 58008216.
  13. ^ Ford, Heather; Mai, Jens-Erik; Salor, Erinc; Søgaard, Anders; Adler, Melissa; Washko, Angela; Ping-Huang, Marianne; Ørum, Kristoffer (2016). "NTIK, Tema: Køn & Crowdsourcing" [NTIK, Theme: Gender & Crowdsourcing]. Nordisk Tidsskrift for Informationsvidenskab og Kulturformidling. 5 (1). ISSN 2245-294X.
  14. ^ Lukowski, Alison M.; Sparby, Erika M. (2016-11-09). "Breastfeeding, Authority, and Genre: Women's Ethos in Wikipedia and Blogs". In Folk, Moe; Apostel, Shawn (eds.). Establishing and Evaluating Digital Ethos and Online Credibility. IGI Global. p. 329 ff. ISBN 9781522510734.
  15. ^ Matos, Eurico Oliveira; Acker, Isabel de Souza. "Cyberfeminism on Wikipedia: Visibility and deliberation in feminist Wikiprojects". Cuestiones de Género: De la Igualdad y la Diferencia: 20. Nº. 12, 2017 – e-ISSN: 2444-0221 - pp. 365-384
  16. ^ Triump, Therese F.; Henze, Kimberly M. (2017). "Women and Wikipedia. Diversifying Editors and Enhancing Content through Library Edit-a-Thons". In Sanborn, Lura (ed.). Gender Issues and the Library: Case Studies of Innovative Programs and Resources. McFarland. p. 155 ff. ISBN 9781476630342.
  17. ^ Protonotarios, Ioannis; Sarimpei, Vasiliki; Otterbacher, Jahna (2016-04-16). Similar Gaps, Different Origins? Women Readers and Editors at Greek Wikipedia. Tenth International AAAI Conference on Web and Social Media.
  18. ^ Vitulli, Marie A. (2017-10-30). "Writing Women in Mathematics into Wikipedia". Notices of the AMS. 65 (3): 330–334. arXiv:1710.11103. Bibcode:2017arXiv171011103V. doi:10.1090/noti1650. S2CID 119259241.
  19. ^ Jun Huang; Si Shi; Yang Chen; Wing S. Chow (2016-09-27). "How do students trust Wikipedia? An examination across genders". Information Technology & People. 29 (4): 750–773. doi:10.1108/ITP-12-2014-0267. ISSN 0959-3845.
  20. ^ Jemielniak, Dariusz (July 2016). "Breaking the glass ceiling on Wikipedia". Feminist Review. 113 (1): 103–108. doi:10.1057/fr.2016.9. ISSN 0141-7789. S2CID 73656903. Author's copy
  21. ^ Madaan, Nishtha; Mehta, Sameep; Agrawaal, Taneea S.; Malhotra, Vrinda; Aggarwal, Aditi; Saxena, Mayank (2017-10-11). "Analyzing Gender Stereotyping in Bollywood Movies". arXiv:1710.04117 [cs.SI].
  22. ^ Ross, Björn; Dado, Marielle; Heisel, Maritta; Cabrera, Benjamin (2018). "Gender Markers in Wikipedia Usernames" (PDF).



Discuss this story

  • (later note – my initial comments here overlook table 2 in the Adams et al. article, see later comments) The first article reviewed has this sentence: "Examining only sociologists with Wikipedia pages, men’s median H-index (27) is higher than women’s (22)".[1] This straightforward method doesn't give the result they were expecting – and indeed it might even suggest that more articles on male sociologists are what's needed if gender-blind notability fairness is the ideal. But fear not, once the authors use a "logged version of the H-index to adjust for a strong right skew in the H-index distribution" (I don't get it, but okay) and throw a bunch of other factors into a regression analysis, they get the result that women are being cheated out of articles after all. I don't know. Seems like a lot of degrees of freedom here and discarding a simple test with what's probably the most objective merit-based measure available (the h-index) in favor of an opaque regression analysis isn't necessarily convincing. Haukur (talk) 01:09, 31 August 2019 (UTC)
    I don't get it, but okay – If you don't understand the reasoning of the authors, how can you be so confident that they are wrong? — Bilorv (talk) 08:32, 31 August 2019 (UTC)
    The important sentence comes after that one: "Women’s estimated odds of having a Wikipedia page after taking into account differences in rank, length of career, and notability measured with H-index and departmental reputation are still 25 points lower than men’s." But "length of career" and "departmental reputation" are awful ways to estimate whether someone merits a Wikipedia article. The best measure (though it is of course, like everything, imperfect) is the h-index. But instead of sticking to that, the authors decided to go with a complicated composite measurement instead – one that I'd say is a lot less accurate. In the end this is not a persuasive analysis. Haukur (talk) 09:29, 31 August 2019 (UTC)
    Also note how our current article says "Female and nonwhite US sociologists less likely to have Wikipedia articles than scholars of similar citation impact" which is not at all the conclusion reached in that research paper. We'd better correct this. Haukur (talk) 09:39, 31 August 2019 (UTC)
    Having now read the Adams et al. article more carefully I must note that my comments above make too much of the sentence "Examining only sociologists with Wikipedia pages, men’s median H-index (27) is higher than women’s (22)" and don't take their Table 2 into account. I think there's a Simpson's paradox in the data. I'm going to think more about this and maybe see if I can get the data from the authors. Haukur (talk) 16:26, 31 August 2019 (UTC)
    Any data about h-index is highly skewed by the difference in fields and other factors. Even within sociology, there may be vast differences: an h-index of 27 may be exceptional in some areas and ordinary elsewhere. Nemo 07:21, 1 September 2019 (UTC)
    Good point. Conceivably, women might tend to work in subfields with lower citation averages. I've been thinking of possible explanations for the results here and there are so many possibilities. I've requested the data mentioned in footnotes 20–22 from the corresponding author. Haukur (talk) 08:36, 1 September 2019 (UTC)
It seems inaccurate to say there's a "composite measurement" being used in the regression or to suggest that including multiple measures in regression analysis is less valid than just considering one of the measures (H-index) alone. H-index is very much part of the regression analysis and neither including the other measures in the model nor transforming the H-index values (by calculating the natural logarithm of each value) undermines it in any way. Indeed, the regression results in Table 3 indicate that H-index is (as you seem to expect!) the measure most closely related to being the subject of an article (the odds-ratio is quite large and statistically significant). The analysis supports the idea that U.S. sociologists with higher H-indices are more likely to be the subject of EN:WP articles. It also suggests that female and nonwhite U.S. sociologists are less likely to be the subject of EN:WP articles. These interpretations are mutually compatible. Aaron (talk) 16:33, 3 September 2019 (UTC)
I regret making fun of the log operation, which really is fine. And regression analysis is fine too - though I stand by my criticism of "length of career" being a suitable indicator of notability. I'm developing a more nuanced take on this and the corresponding author has kindly promised to send me some data. Haukur (talk) 16:43, 3 September 2019 (UTC)[reply]
Thanks all for the great discussion! Haukur: Please do let us know once you have received the data and would like to comment more on it. In the meantime, I have reverted the "correction" because 1) citation impact is actually a general umbrella term that includes the h-index and has the advantage of being easier to understand (also, it was still being used in the body of the review after your edit, so changing it only in the title seemed a bit pointless anyway), 2) "similar careers" seems to overstate the results quite a bit, because career encompasses a person's entire professional trajectory whereas the measures used here only pertain to a specific moment in that career (plus its length) 3) it seems from the above discussion that the concerns about the interpretation of the result regarding the h-index have been resolved. (I do agree it's an interesting observation that the differences in the median go the other way, if I understood that correctly, but as Aaron points out, this might not undermine the overall result. In any case, log transforms are frequently used and even recommended by some as standard practice for data that only takes on positive values.)
I think a more interesting question might be whether WP:GNG or WP:AUTHOR could be a confounding factor here - there may well be many sociologists on the authors' list whose Wikipedia notability did not rest on WP:PROF but on media coverage or their authorship of (popular or academic) books. Regards, HaeB (talk) 19:00, 21 September 2019 (UTC)
I thought it would be best to reflect the wording of the paper, which never uses the word "citation impact", though I take your point that it can be used to refer to h-index. The paper says "academic rank, length of career, and notability measured with both H-index and departmental reputation" which I think is reasonably summarized as "career" and I don't see what is gained by a switch to "seniority, institutional status, publication count, and H-index".
But whatever, all of this is a side issue since you're quite right that there must be other factors. Indeed, the very data in the paper shows that if you went by H-index alone and used that 100% fairly to pick out sociologists to write articles about then (assuming the same number of articles) you would get a higher ratio of white men than Wikipedia actually had. So the idea that Wikipedia has a bias against writing articles on female sociologists is really not, in my view, supported by this dataset. Haukur (talk) 20:08, 21 September 2019 (UTC)
Related discussion: Wikipedia_talk:Notability_(academics)#Recent_study:_"Who_Counts_as_a_Notable_Sociologist_on_Wikipedia?". Regards, HaeB (talk) 19:00, 21 September 2019 (UTC)
  • Gosh, this is a boring subject in the context of Wikipedia because much of it actually has little to do with Wikipedia per se, despite the (often extremely small) samples using WP as a means to "prove" their hypotheses. This is particularly evident in the Indian survey mentioned, which notes that the cultural issues mostly lie outside WP's ambit. One of the problems with being a major site on the internet is that WP becomes a mechanism for pursuit of agendas, regardless of its actual relevance to the overall issue. To paraphrase a BBC saying, "Other websites are available". And as I keep saying, it isn't our job to change the world but rather to reflect it in all its contrasting beauty and ugliness. - Sitush (talk) 03:22, 31 August 2019 (UTC)
    True for some of the studies, but the first study might make us question our views of NPROF, the third can lead us to embracing a more careful writing style with regard to how we talk about men and women and the fourth is a damning indictment of the toxicity of our behaviour around contentious topics. And there are plenty more interesting conclusions which are relevant to an editor's regular editing patterns. — Bilorv (talk) 08:32, 31 August 2019 (UTC)
    I'll quote from comments by Terri Apter, psychologist and Fellow Emerita of Newnham College, in her notes on The Human Stain by Philip Roth: "[The book is] also about the dangerous pleasures of outrage, what Roth called 'the ecstasy of sanctimony'. Of course, we have to be aware of hate speech and embedded bias, they are problematic. But using your need to feel virtuous to tear others apart is also problematic". Too many of these studies start from a virtuous/sanctimonious premise. Good studies draw conclusions after the study, not before undertaking it. - Sitush (talk) 19:21, 12 September 2019 (UTC)
  • Can someone with access to the 4th one (Safety and women editors on Wikipedia) give some more detail on what they're suggesting with "internal safe spaces". As in, parts of the encyclopedia with very high civility requirements? Nosebagbear (talk) 15:32, 2 September 2019 (UTC)
    @Nosebagbear: The file may be located over here. Regards, WBGconverse 11:19, 3 September 2019 (UTC)
    Thanks for the above WBG. So in terms of current safe spaces it refers to off-wiki online (fb, mainly) and offline (women-only edit-a-thons) areas. They moot the creation of an on-wiki women-only space, though they don't consider the fairly substantial issues with that (verification, reporting of misbehaviour, canvassing risks, as well as any disagreements on the fundamental nature of Wikipedia). Nosebagbear (talk) 11:29, 3 September 2019 (UTC)
  • Sitush I reviewed the breastfeeding study and it really surprised me to find such a poorly done piece. My take on it was so similar to your thoughts that you might like to read User:WhatamIdoing's page where we discuss it. Gandydancer (talk) 15:21, 12 September 2019 (UTC)
Wikipedia:Wikipedia Signpost/2019-08-30/Recent_research