The Signpost

Recent research

How censorship can backfire and conversations can go awry

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

"On the Self-similarity of Wikipedia Talks: a Combined Discourse-analytical and Quantitative Approach"

Reviewed by Maik Stührenberg

This paper[1] is thoroughly structured and combines the theory of web genres with dialogue theory to examine Wikipedia talk pages. Since Wikipedia is a web genre, "Wikicussions" (as the authors call them) form a subgenre. In this context, talk pages are examined further, including the quality of cooperation between Wikipedia users, which can be linked to social differentiation regarding the roles and statuses of Wikipedians (content- vs. administration-related users). These group-related processes can be seen as a mediating layer between external parameters (system requirements for Wikipedia's user community) and the structure and dynamics of Wikipedia's subgenres.

The authors argue that, unlike face-to-face dialogue, Wikicussions stand out due to a publicly available common ground (a concept derived from dialogue theory), which may explain the structures they found.

The paper is enriched with a number of high-quality figures that support and underpin the findings.

Graph between November 2000 and November 2015 clearly demonstrating that most posts come from registered users
Frequency distribution of talk posts over time within the German Wikipedia (blue: registered users; red: anonymous users; green: bots; black: all users). Unsigned posts (without timestamps) are excluded. Posts dated by posters outside of the valid time-frame (before the date of creation of the discussion or after the date of its download) are also excluded. (Figure 7 from the paper)

"How Sudden Censorship Can Increase Access to Information"

Reviewed by Bri and Tilman Bayer

Our intuition might tell us that government censorship causes reduced access to online information. But recent research indicates that the effect can be exactly the opposite. Using data gathered from Wikipedia page views and other sources, researchers William Hobbs and Margaret Roberts found that:

Specifically, the authors studied the impact of a block of Instagram in China on September 29, 2014, following protests in Hong Kong, on Chinese Wikipedia pages that were already blocked in the country. (This predates the 2015 total block of the Chinese Wikipedia and the switch of all Wikimedia sites to full encryption with HTTPS around the same time, which made such per-page blocking impossible.) The list of censored Chinese Wikipedia pages with the largest increase in views "shows that new viewers accessed pages that had long been censored including those related to the 1989 Tiananmen Square protests",[2]: 12  i.e. "viewing patterns that would be more typical of new users who had just jumped the firewall, rather than of old VPN users who had presumably consumed this information long ago."[2]: 11  Here is an excerpt of the full list examined in the research, the top 10 for the second day of the block, linked here to their English Wikipedia equivalents:

  1. People's Republic of China blocked websites list
  2. Jiang Zemin
  3. Radio Australia
  4. Hu Jintao
  5. Zeng Qing
  6. Wang Weilin (Tank Man)
  7. Li Peng
  8. Tiananmen Square Incident
  9. Zhou Yongkang
  10. Wu'erkaixi (June 4 leader)

The researchers propose to name this phenomenon the "gateway effect", a "mechanism through which repression can backfire inadvertently, without political or strategic motivation",[2]: 3  because it incentivizes people to learn how to evade censorship and thus "have more, not less, access to information and begin engaging in conversations, social media sites, and networks that have long been off-limits to them."[2]: 15  They distinguish it from the Streisand effect, where individuals specifically seek out information that is being hidden.

The second author of the study, Margaret Roberts, is also the author of Censored: Distraction and Diversion Inside China's Great Firewall (Princeton University Press, 2018; print ISBN 978-0-691-17886-8, e-book 978-1-400-89005-7).

Marketing, social media, and Wikipedia

Reviewed by Barbara Page

This study was able to "characterize" the interests of Wikipedia editors and the editors' social media activity on Twitter to facilitate:

Photograph of person's left hand holding a smartphone that is accessing social media
A marriage between editor editing topics and Twitter (and possibly Facebook) will result in targeted marketing tailored just for you!

Conferences and events

See the community-curated research events page on Meta-wiki for other upcoming conferences and events, including submission deadlines.

WMF research showcase

Recent presentations at the monthly Research showcase hosted by the Wikimedia Foundation included the following:

"Conversations Gone Awry: Detecting Early Signs of Conversational Failure"
PDF of "Conversations Gone Awry" with first page depicted
Presentation slides (video)

Antisocial behavior, including harassment and personal attacks, can occur in online social systems. A new paper[4] by seven researchers from Cornell University, Jigsaw, and the Wikimedia Foundation describes how such negative exchanges might be predicted at the start of a conversation, which may make it possible to prevent a discussion from deteriorating. One of the authors also gave an interview published on the Wikimedia Foundation's blog,[supp 1] and the paper was covered in popular media; see In the media § In brief.

Case studies in the appropriation of ORES

From the announcement (by Aaron Halfaker):

PDF of "ORES appropriation and reflection" with first page depicted
Presentation slides about the use of the ORES platform (video)

The presentation covered "three key tools that Wikipedians have developed that make use of ORES": Wikidata's damage detection models, exposed through Recent Changes; Spanish Wikipedia's PatruBOT; and WikiEdu tools from User:Ragesoss that incorporate article quality models.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.

Compiled by Tilman Bayer
  • "On the Effects of Authority on Peer Motivation: Learning from Wikipedia"[5] – From the abstract: "We show that lateral authority, the legitimacy to resolve task‐specific problems, is welcomed by members of an organization in the resolution of coordination conflicts, the more so (1) the fiercer the conflict to be resolved, (2) the higher the competence‐based status of the authority, (3) the lower the tenure of, and (4) the more focused the organizational members are. Analyzing the discussion behavior of members of Wikipedia between 2002 and 2014, we corroborate our allegations empirically by analyzing 642,916 article–discussion pages."
  • "A Comparison of the Historical Entries in Wikipedia and Baidu Baike"[6] – From the abstract: "This research purposefully chose 6 entries and developed a framework to evaluate their performance in accuracy, breadth, depth, informativeness, conciseness and objectiveness. The result shows that: Wikipedia is superior in most cases while Baidu Baike is a little better in the entries on Chinese history. The operating mechanism is the main reason for it."
  • "Sentiments in Wikipedia Articles for Deletion Discussions"[7] – From the abstract: "We performed sentiment analysis on 37,761 AfD discussions with 156,415 top-level comments and explored relationship between outcomes of the discussion and sentiments in the comments. Our preliminary work suggests: discussion that have keep or other outcomes have more than expected positive sentiment, whereas discussions that have delete outcomes have more than expected negative and neutral sentiment. This result shows that there tends to be positive sentiment in the comment when Wikipedia users suggest not to delete the article."
  • "'What are these researchers doing in my Wikipedia?': ethical premises and practical judgment in internet-based ethnography"[8] – From the abstract: "The article reflects on the heuristics that guided the decisions of a 4-year participant observation in the English-language and German-language editions of Wikipedia. [...] it interrogates the technological, social, and legal implications of publicness and information sensitivity as core ethical concerns among Wikipedia authors. The first problem area of managing accessibility and anonymity contrasts the handling of the technologically available records of activities, disclosures of personal information, and the legal obligations to credit authorship with the authors' right to work anonymously and the need to shield their identity. The second area confronts the contingent addressability of editors with the demand to assure and maintain informed consent." (See also the Wikipedia essay "What are these researchers doing in my Wikipedia?")
  • "Digging Wikipedia: The Online Encyclopedia As a Digital Cultural Heritage Gateway and Site"[9] – From the abstract: "[...] this article introduces Wikipedia as a digital gateway to and site of an active engagement with cultural heritage. We have developed the open source and freely available analysis architecture Contropedia [website] to examine already existing volunteer user-generated participation around cultural heritage and to promote further engagement with it. Conceptually, we employ the notion of memory work, as it helps to treat Wikipedia's articles, edit histories, and discussion pages as a rich resource to study how cultural heritage is received and (re)worked in and across languages and cultures. [...] The analysis facilitated by Contropedia [...] sheds light on the contentious articulation of perspectives on tangible and intangible heritage grounded by conflicting conceptions of events, ideas, places, or persons. Technologically, Contropedia combines techniques based on mining article edit histories and analyzing discussion patterns in talk pages to identify and visualize heritage-related disputes within an article, and to compare these across language versions." (cf. earlier coverage: "'Contropedia' tool identifies controversial issues within articles"; "Towards better visual tools for exploring Wikipedia article development – the use case of 'Gamergate controversy'")
  • "Use of Louisiana's Digital Cultural Heritage by Wikipedians"[10] – From the abstract: "This case study details an analysis of Wikipedia links to online resources from Louisiana cultural heritage institutions [also known among Wikimedians as GLAMs] in order to determine what types of cultural heritage resources users are citing on Wikipedia, what is the content of the Wikipedia articles with Louisiana CHI citations, and how this can influence the work of CHI. The results of the study include findings that digital library items and archival finding aids are the most cited sources from cultural heritage institutions on Wikipedia and are particularly popular for Louisiana-specific Wikipedia articles on society and the social sciences and culture and the arts."
  • "The Conceptual Correspondence between the Encyclopaedia and Wikipedia"[11] – From the abstract: "This study [...] focuses on the roles and attributes of both printed encyclopaedias and Wikipedia. First, we analyse the roles and attributes of an encyclopaedia by conducting a review of research related to them. Then we analyse whether or not Wikipedia fulfills the same roles and has the same attributes as the encyclopaedia by reviewing academic work that investigates and analyses Wikipedia from various perspectives. The results show that Wikipedia does not conceptually correspond to an encyclopaedia, except in cases where people use it for one-time searches. In the world of digital media, Wikipedia does not have the same status that the encyclopaedia holds in the world of print media."
  • "Structural Differentiation in Social Media: Adhocracy, Entropy, and the '1 % Effect'"[12] – From the text: "Over the study period (2001–2010), we observed 235,701,162 edits completed by 22,792,847 unique contributors. Of these, 19,680,637 users were anonymous, identified only by their unique IP addresses. The rest (3,112,210) were registered users who were logged into their respective accounts. [...] logged-in users were the clear minority group, yet they contributed far more edits than the anonymous users—all told, those logged-in individuals were responsible for almost two-thirds (68%) of the observed revisions. Even more importantly, the top 1% of all contributors were responsible for 77% of the collaborative effort based upon the extent to which the text of articles was actually changed (i.e., the contribution delta). [... The] simple answer to research question 2 (RQ2), 'What is the social mobility (or its inverse, elite "stickiness") of functional leaders on Wikipedia over time?' is that on average, across the entire 9.5-year period, an individual who was a top contributor at a given point in time had a 40% probability of remaining in the top contributor group 5 weeks later. Twenty weeks later, that individual would have a 32% chance of still being a top contributor, and after 30 weeks, this figure would be at 28%."
    In a press release by Purdue University, one of the authors commented: "What we saw is that a clear leadership has emerged, but it's a leadership that cycles. We have a group of individuals who shape the content by working the hardest and clocking the most hours. The agenda is shaped by these people, and they're driven by a sense of mission, much like political or religious movements."[supp 2]
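The contributor figures quoted in the last item above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the study: the variable names are our own, and the per-contributor averages are only rough estimates, because they rest on the rounded 68% share quoted in the text.

```python
# Figures quoted above from the book chapter (study period 2001-2010).
total_edits = 235_701_162
total_contributors = 22_792_847
anonymous = 19_680_637
registered = 3_112_210

# The two groups should add up to the total number of contributors.
groups_sum = anonymous + registered

# Share of contributors who were logged in.
registered_share = registered / total_contributors

# Assumption: use the rounded 68% edit share quoted in the text,
# so the averages below are approximations only.
registered_edit_share = 0.68
edits_per_registered = total_edits * registered_edit_share / registered
edits_per_anonymous = total_edits * (1 - registered_edit_share) / anonymous

print(f"groups add up: {groups_sum == total_contributors}")
print(f"registered share of contributors: {registered_share:.1%}")
print(f"approx. edits per registered user: {edits_per_registered:.1f}")
print(f"approx. edits per anonymous user: {edits_per_anonymous:.1f}")
```

Under these assumptions, logged-in users make up roughly 14% of contributors, which is consistent with the quoted description of them as "the clear minority group".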

References

  1. ^ Mehler, Alexander; Gleim, Rüdiger; Lücking, Andy; Uslu, Tolga; Stegbauer, Christian (January 30, 2018). "On the Self-similarity of Wikipedia Talks: a Combined Discourse-analytical and Quantitative Approach" (PDF). Glottometrics. 40. RAM-Verlag (published January 2018): 1–45. ISSN 1617-8351. OCLC 7493144471. Archived (PDF) from the original on June 28, 2018. Retrieved June 28, 2018 – via ResearchGate. Open access icon
  2. ^ a b c d e Hobbs, William R.; Roberts, Margaret E. (April 2, 2018). "How Sudden Censorship Can Increase Access to Information". American Political Science Review. Cambridge University Press: 1–16. doi:10.1017/S0003055418000084. eISSN 1537-5943. ISSN 0003-0554. OCLC 7435466814. Closed access icon
  3. ^ Torrero, Christian; Caprini, Carlo; Miorandi, Daniele (April 9, 2018). "A Wikipedia-based approach to profiling activities on social media". p. 1. arXiv:1804.02245v2 [cs.IR].
  4. ^ Zhang, Justine; Chang, Jonathan P.; Danescu-Niculescu-Mizil, Cristian; Dixon, Lucas; Yiqing, Hua; Thain, Nithum; Taraborelli, Dario (May 14, 2018). "Conversations Gone Awry: Detecting Early Signs of Conversational Failure". arXiv:1805.05345v1 [cs.CL].
  5. ^ Klapper, Helge; Reitzig, Markus (May 7, 2018). "On the Effects of Authority on Peer Motivation: Learning from Wikipedia" (PDF). Strategic Management Journal. John Wiley & Sons. doi:10.1002/smj.2909. eISSN 1097-0266. OCLC 7586436764. Retrieved June 28, 2018. Open access icon
  6. ^ Shang, Wenyi (March 15, 2018). "A Comparison of the Historical Entries in Wikipedia and Baidu Baike". In Chowdhury, Gobinda; McLeod, Julie; Gillet, Val; Willett, Peter (eds.). Transforming Digital Worlds. International Conference on Information (iConference 2018; March 25–28 at Sheffield, United Kingdom). Lecture Notes in Computer Science. Vol. 10766 (Online ed.). Cham, Switzerland: Springer International Publishing AG. pp. 74–80. doi:10.1007/978-3-319-78105-1_9. ISBN 978-3-319-78105-1. OCLC 7357407865. Closed access icon
  7. ^ Xiao, Lu; Sitaula, Niraj (March 15, 2018). "Sentiments in Wikipedia Articles for Deletion Discussions". In Chowdhury, Gobinda; McLeod, Julie; Gillet, Val; Willett, Peter (eds.). Transforming Digital Worlds. International Conference on Information (iConference 2018; March 25–28 at Sheffield, United Kingdom). Lecture Notes in Computer Science. Vol. 10766 (Online ed.). Cham, Switzerland: Springer International Publishing AG. pp. 81–86. doi:10.1007/978-3-319-78105-1_10. ISBN 978-3-319-78105-1. OCLC 7357407963. Closed access icon
  8. ^ Pentzold, Christian (May 3, 2017). "'What are these researchers doing in my Wikipedia?': ethical premises and practical judgment in internet-based ethnography" (PDF). Ethics and Information Technology. 19 (2). Springer Science+Business Media (published May 5, 2017): 143–155. doi:10.1007/s10676-017-9423-7. eISSN 1572-8439. ISSN 1388-1957. OCLC 7039749181. Archived (PDF) from the original on June 28, 2018. Retrieved June 28, 2018 – via ChristianPentzold.de. Free access icon
  9. ^ Pentzold, Christian; Weltevrede, Esther; Mauri, Michele; Laniado, David; Kaltenbrunner, Andreas; Borra, Erik (March 13, 2017). Scopigno, Roberto (ed.). "Digging Wikipedia: The Online Encyclopedia as a Digital Cultural Heritage Gateway and Site" (PDF). Journal on Computing and Cultural Heritage. Special Issue on Digital Infrastructure for Cultural Heritage, Part 1. 10 (1). New York: Association for Computing Machinery (published April 14, 2017): 5:1–5:19. doi:10.1145/3012285. eISSN 1556-4711. ISSN 1556-4673. OCLC 7006965721. Retrieved June 28, 2018 – via ResearchGate. Free access icon
  10. ^ Kelly, Elizabeth Joan (November 28, 2017). "Use of Louisiana's Digital Cultural Heritage by Wikipedians". Practical Communication. Journal of Web Librarianship. 12 (2). Taylor & Francis: 85–106. doi:10.1080/19322909.2017.1391733. eISSN 1932-2917. ISSN 1932-2909. OCLC 7566358637. Closed access icon
  11. ^ Yamada, Shohei (December 29, 2017). "The Conceptual Correspondence between the Encyclopaedia and Wikipedia". Journal of Japan Society of Library and Information Science. 63 (4). Japan Society of Library and Information Science: 181–195. doi:10.20651/jslis.63.4_181. eISSN 2432-4027. ISSN 1344-8668. OCLC 7261862873. Closed access icon
  12. ^ Matei, Sorin Adam; Britt, Brian C. (September 21, 2017). "Analytic Investigation of a Structural Differentiation Model for Social Media Production Groups". In Alhajj, Reda; Glässer, Uwe (eds.). Structural Differentiation in Social Media: Adhocracy, Entropy, and the '1 % Effect'. Lecture Notes in Social Networks (1st ed.). Cham, Switzerland: Springer Nature. pp. 73, 75. doi:10.1007/978-3-319-64425-7_5. eISSN 2190-5436. ISBN 978-3-319-64424-0. ISSN 2190-5436. LCCN 2017948031. OCLC 7138124671.
Supplementary references:
  1. ^ Zhang, Justine; Chang, Jonathan (June 13, 2018). "'Conversations gone awry'—the researchers figuring out when online conversations get out of hand". Wikimedia Blog (Interview). Interviewed by Melody Kramer; Dario Taraborelli. Wikimedia Foundation. Archived from the original on June 28, 2018. Retrieved June 28, 2018.
  2. ^ Bush, Jim (November 6, 2017). "Results of Wikipedia study may surprise". Purdue News Service and Agricultural Communications (Press release). West Lafayette, Indiana: Purdue University. OCLC 7177119166. Archived from the original on June 28, 2018. Retrieved June 28, 2018.

Discuss this story

  • I don't have access to the AfD sentiment analysis paper, but I'd be curious how robust those findings are. If they're strong enough, we could theoretically use such analysis to attempt to detect potentially improper closes. I don't think that's a good idea (at least at the individual level) so perhaps it's something to be aware of. I wonder what other discussions have a large enough sample set that similar analyses could be attempted? RfA springs to mind, but I'm not sure the numbers are there; it would be interesting to see if sentiments have changed over the years. ~ Amory (utc) 14:39, 30 June 2018 (UTC)
The sample size of RfA is so small in recent years (since 2012) that it would not produce any usable results. The only major change in that time is that the RfAs have slowly warped into yet another platform for a lot of discussion about the process and adminship in general. RfA remains the Wild West of Wikipedia. Kudpung กุดผึ้ง (talk) 00:35, 3 July 2018 (UTC)
What other result would be possible except that positive expressions correlate with desires to keep? What classes of arguments for keep are there except that the subject is notable/the article is good/that it does meet policy? Or, for delete, that the subject is not notable/the article is not good/the article does not meet policy? I don't see how any of this could affect judging the quality of closes, especially considering closes aren't supposed to be a mere numerical count of votes. It would identify those closes where the closes did not match the sentiments most expressed, but that's not an indication that the close is bad--in fact, it's the usual situation for AfDs contaminated by single purpose accounts. (and similarly for RfAs) DGG ( talk ) 00:32, 6 July 2018 (UTC)
One of the many reasons why I think it'd be a Bad Idea™ to do so. Regardless, even though it's unsurprising, it's noteworthy that they can actually detect a difference. ~ Amory (utc) 01:01, 6 July 2018 (UTC)
  • The review of "On the Self-similarity of Wikipedia Talks: a Combined Discourse-analytical and Quantitative Approach" is unintelligible. (In partial mitigation, I hasten to add that the paper it reviews [1] rates extremely high on the gobbledygook index.) I would have expected the purpose of these reviews to be to give nonspecialist readers at least an inkling of the import of the work reviewed despite (as I will hazard is the common case) prior ignorance of such terms as web genre and dialogue theory; in this it fails spectacularly. And by the way, what does it mean for a paper to be "thoroughly structured"? How do figures that "support and underpin the findings" differ from figures that simply support the findings, or underpin the findings?
Also, I thought a Wikicussion was what you get from beating your head against the wall arguing with someone who just doesn't get it. EEng 14:47, 5 July 2018 (UTC)

Wikipedia:Wikipedia Signpost/2018-06-29/Recent_research