A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
The Institute for Strategic Dialogue, a London-based think tank, earlier this month published a report (co-authored with a company called CASM Technology)[1] focusing "on information warfare on Wikipedia about the invasion of Ukraine"; see also this issue's "In the media" (summarizing media coverage of the report) and "Disinformation report" (providing context in the form of various other concrete cases).
As summarized in the abstract:
"The report combines a literature review on publicly available research and information around Wikipedia, expert interviews and a case study.
For the case study, the English-language Wikipedia page for the Russo-Ukrainian war was chosen, where accounts that edited the page and have subsequently been blocked from editing were examined. Their editing behaviour on other Wikipedia pages was mapped to understand the scale and overlap of contributions. This network mapping has seemed to identify a particular strategy used by bad actors of dividing edits on similar pages across a number of accounts in order to evade detection. Researchers then tested an approach of filtering edits by blocked editors based on whether they add references to state-media affiliated or sponsored sites, and found that a number of edits exhibited narratives consistent with Kremlin-sponsored information warfare. Based on this, researchers were able to identify a number of other Wikipedia pages where blocked editors introduced state-affiliated domains [...]"
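The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the report's actual code: the domain watchlist, the edit record layout (`rev_id`, `user`, `old`, `new`) and all function names are assumptions made for illustration. The idea is to keep only edits by blocked accounts that add a link to a listed domain.

```python
import re
from urllib.parse import urlparse

# Hypothetical watchlist; placeholder domains, not the list used in the report.
STATE_AFFILIATED_DOMAINS = {"state-media.example", "sponsored-news.example"}

# Rough URL matcher that stops at whitespace and common wikitext delimiters.
URL_PATTERN = re.compile(r"https?://[^\s\]|<>}]+")


def added_domains(old_text, new_text):
    """Return hostnames of URLs that appear in new_text but not in old_text."""
    old_urls = set(URL_PATTERN.findall(old_text))
    new_urls = set(URL_PATTERN.findall(new_text))
    hosts = set()
    for url in new_urls - old_urls:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            hosts.add(host)
    return hosts


def is_listed(host, watchlist):
    """True if host equals a watchlist domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in watchlist)


def flag_edits(edits, blocked_users, watchlist=STATE_AFFILIATED_DOMAINS):
    """Return (rev_id, user, matched domains) for edits by blocked users
    that introduce at least one link to a watchlisted domain."""
    flagged = []
    for edit in edits:
        if edit["user"] not in blocked_users:
            continue
        hits = sorted(
            h for h in added_domains(edit["old"], edit["new"])
            if is_listed(h, watchlist)
        )
        if hits:
            flagged.append((edit["rev_id"], edit["user"], hits))
    return flagged
```

A filter of this kind only surfaces candidates. Whether a flagged edit actually carries a state-aligned narrative still requires human review, as the report's own wording ("a number of edits exhibited narratives consistent with...") suggests.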
The report offers a great overview of Wikipedia's existing mechanisms for dealing with such issues, based on numerous conversations with community members and other experts. However, the literature review indicates that the authors – despite confidently telling Wired magazine "We've never tried to analyze Wikipedia data in that way before" – were unfamiliar with a lot of existing academic research (e.g. about finding alternative accounts, aka sockpuppets, of abusive editors); the 39 references cited in the report include only a single peer-reviewed research paper. Likewise, despite the hope that their findings could yield "new tools" (Wired) that would support combating disinformation on Wikipedia, there is no indication that the authors were aware of past and ongoing research-supported product development efforts to build such tools, by the Wikimedia Foundation and others, some of which are outlined below. On Twitter, the lead author stated that "We're going to be doing more research on information warfare on Wikipedia with a new project kicking off later this month [October]", so perhaps some of these gaps can still be bridged.
Exactly two years ago, in the run-up to the 2020 US elections, the Wikimedia Foundation published a blog post noting concerns about a "rising rate and sophistication of disinformation campaigns" on the internet by coordinated actors, about elections and other topics such as the global pandemic or climate change, and providing a summary of how Wikipedia specifically was addressing such threats.
After mentioning the volunteer community's "robust mechanisms and editorial guidelines that have made the site one of the most trusted sources of information online" and announcing an internal anti-disinformation task force at the Foundation (which reportedly still exists, although one former member recently stated they were unaware what its current work areas are) as well as "strengthened capacity building by creating several new positions, including anti-disinformation director and research scientist roles," the post focused on summarizing how
"the Foundation's research team, in collaboration with multiple universities around the world, delivered a suite of new research projects that examined how disinformation could manifest on the site. The insights from the research led to the product development of new human-centered machine learning services that enhance the community's oversight of the projects.
These algorithms support editors in tasks such as detecting unsourced statements on Wikipedia and identify malicious edits and behavior trends."
With the US mid-term elections imminent and independent researchers apparently being unaware of these research projects at the Foundation (see above), now seems a good time to take a look at how they have developed in the meantime. As "some of the tools used or soon available to be used by editors", the October 2020 post listed the following:
- An algorithm that identifies unsourced statements or edits that require citation. The algorithm surfaces unverified statements; it helps editors decide if the sentence needs a citation, and, in return, human editors improve the algorithm’s deep learning ability.
- Algorithms to help community experts to identify accounts that may be linked to suspected sockpuppet accounts.
- A machine learning system to detect inconsistencies across Wikipedia and Wikidata, helping editors to spot contradictory content across different Wikimedia projects.
- A daily report of articles that have recently received a high volume of traffic from social media platforms. The report helps editors detect trends that may lead to spikes of vandalism on Wikipedia helping them identify and respond faster.
Furthermore, the 2020 post mentioned the (at that time already widely used) ORES system.
The efforts appear to be part of the WMF Research team's "knowledge integrity" focus, announced in February 2019 in one of four "white papers that outline our plans and priorities for the next 5 years".
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.
From the abstract:[2]
"We develop a neural network based system, called Side [demo available at https://verifier.sideeditor.com/ ], to identify Wikipedia citations that are unlikely to support their claims, and subsequently recommend better ones from the web. We train this model on existing Wikipedia references, therefore learning from the contributions and combined wisdom of thousands of Wikipedia editors. Using crowd-sourcing, we observe that for the top 10% most likely citations to be tagged as unverifiable by our system, humans prefer our system's suggested alternatives compared to the originally cited reference 70% of the time. To validate the applicability of our system, we built a demo to engage with the English-speaking Wikipedia community and find that Side's first citation recommendation collects over 60% more preferences than existing Wikipedia citations for the same top 10% most likely unverifiable claims according to Side. Our results indicate that an AI-based system could be used, in tandem with humans, to improve the verifiability of Wikipedia."
See also the research project page on Meta-wiki.
From the abstract:[3]
"Given a one sentence claim, the challenge is to automatically find a knowledge source (e.g. a book, a research article, a web page) that could support or refute the claim. We show that this capability could be learnt by observing associations between sentences in English Wikipedia and citations provided for them. Thus, we collect a corpus of over 50 million references to 24 million identified sources with the citation context from Wikipedia, and build search indices using several meaning representation methods."
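To make the task concrete, here is a toy sketch of ranking candidate sources for a claim by comparing it with the citation contexts in which each source was used. It substitutes a simple TF-IDF bag-of-words representation for the paper's "meaning representation methods", which are not specified in the abstract. The function names and the `source_id -> context` layout are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(contexts):
    """contexts maps a source id to the sentence(s) that cite it.
    Returns (idf weights, TF-IDF vector per source)."""
    docs = {sid: Counter(tokenize(text)) for sid, text in contexts.items()}
    n = len(docs)
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = {
        sid: {t: c * idf[t] for t, c in counts.items()}
        for sid, counts in docs.items()
    }
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sources(claim, idf, vectors):
    """Return source ids with non-zero similarity to the claim, best first."""
    counts = Counter(tokenize(claim))
    query = {t: c * idf[t] for t, c in counts.items() if t in idf}
    scored = [(cosine(query, vec), sid) for sid, vec in vectors.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sid for score, sid in scored if score > 0]
```

Word overlap is a weak proxy for whether a source supports or refutes a claim. It serves here only to show the shape of the index-then-rank pipeline.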
From the publisher's description:[4]
"This book provides a concise yet comprehensive guide to Wikipedia for researchers and students of linguistics, discourse and communication studies, redressing the gap in research on Wikipedia in these fields and encouraging scholars to explore Wikipedia further as a platform and a medium. Drawing on [Susan] Herring's situational and medium factors [in computer-mediated communication], as well as related developments in (critical) discourse studies, the author studies the online encyclopaedia both theoretically and empirically, examining its origins, production and consumption before turning to a discussion of its societal significance and function(s)."
From the abstract:[5]
"The referential texts in the Russian Wikipedia and the Great Russian Encyclopedia [...] were selected as examples for the analysis. A comparative analysis of articles on music and the composers who lived and worked in the USSR (including Sergei Prokofiev, Dmitri Shostakovich, Dmitri Kabalevsky, Tikhon Khrennikov, Boris Asafiev, Isaak Dunaevsky, Georgy Sviridov, Aram Khachaturian, Sofia Gubaidulina and Alfred Schnittke) displayed a number of regularities: emphasizing previously unknown areas of music of that period ("avant-garde music", "repressed music"), replacement or disregard towards the epithet "Soviet" regarding musical phenomena and composers, and the absence of any nostalgia for Soviet musical culture in modern receptions."