The Signpost

Recent research

Create or curate, cooperate or compete? Game theory for Wikipedia editors


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

Cooperative creators "lose ownership" of their Wikipedia contributions, but the community's NPOV governance should still be "kept limited"

"A game-theoretic analysis of Wikipedia's peer production: The interplay between community's governance and contributors' interactions"[1], published earlier this month in PLoS ONE, investigates what it calls fundamental but still unresolved questions about "the way in which governance shapes individual-level interactions" in peer production.

Specifically, the authors build a complex game theory model to answer the following two research questions about English Wikipedia, and validate its predictions empirically by examining the revision histories of 864 articles (up to 2012):

"RQ1: What is the optimal strategy for a contributor who is attempting to balance the costs and benefits of peer production? Namely, how many sentences within an article should be owned by a contributor that is characterized by a certain cooperative/competitive orientation and a particular creator/curator activity profile?
RQ2: How does the community governance mechanism, particularly the attempt to ensure a NPOV, affect the dynamics underlying Wikipedia's co-production process?"

These concepts are operationalized as follows (among other variables used to construct the quantitative model):

  • "Owned" here refers to having authored text that persists in a Wikipedia article, surviving subsequent revisions (as opposed to ownership in the sense of contributors being entitled to a form of control over the content – a notion that, as the authors discuss, is explicitly discouraged by the community). An editor's utility – i.e. the benefit that they derive from contributing to Wikipedia – is modeled using both their individual "fractional ownership" and the communal "benefit derived from the co-production of high-quality articles".
  • An editor's "creator/curator" orientation is quantitatively represented by a metric that assumes that "Creators are characterized by large-size edits [...]; curators’ edits are smaller".
  • The "cooperative/competitive orientation" was quantified "based on [contributors'] role in the community (those closer to the community’s core are assumed to be more cooperative)." Based on previous research, editors were "organized in four strata, reflecting contributors’ commitment and involvement within the community: unregistered members, registered members, privileged members (holding a special privilege), and core members (i.e. administrators)." The authors theorize that "the greater one’s rights and responsibilities within the Wikipedia community, the more one is considered to have a cooperative orientation. Specifically, we assigned the values of 0.1, 0.4, 0.6, and 0.9 to the different strata (e.g., an administrator is assigned w1 = 0.1, reflecting the least competitive and most cooperative orientation)."
  • Lastly, an editor's cost is quantified as the sum of
    • a. "the effort expended in producing content and editing an article", where the authors "assume that changing an existing sentence requires less effort than originating a new sentence, such that the effort expended by a contributor is a linear function the contributor’s position in the creator-curator continuum" (obviously a simplification, considering that an editor's individual contribution may well vary between adding and changing content).
    • b. "the effort associated with participation in coordination and administrative work, such as editing the articles’ Talk Pages". Again, the authors model this solely based on the editor's overall "cooperative/competitive orientation", reasoning that "This cost element is more applicable for the cooperative contributors".
    • c. "the effort of complying with Wikipedia’s rules and policies, in particular, neutrality-enforcing policies. This effort is intended to regulate only the self-interest activities (i.e., attempting to 'own' article portions) and thus is more applicable to competitive contributors." It is modeled as being proportional to the number of "owned" sentences, multiplied by the editor's competitiveness factor and a variable representing the community's overall "level of neutrality enforcement".
[Figure: Schema of the model's first level]
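
To make the three cost components listed above more concrete, here is a minimal Python sketch. It is not the paper's formula. The scale parameters, the 0-to-1 creator/curator scale, the use of (1 - w) for the coordination term, and the assignment of 0.4, 0.6 and 0.9 to the non-administrator strata (inferred from the ordering quoted above, where only the administrator value of 0.1 is stated explicitly) are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Competitiveness weight w per stratum. Only "administrator" = 0.1 is stated
# explicitly in the quoted passage; the other three are inferred (assumption).
COMPETITIVENESS = {
    "unregistered": 0.9,
    "registered": 0.6,
    "privileged": 0.4,
    "administrator": 0.1,
}


@dataclass
class Contributor:
    stratum: str
    creator_position: float  # hypothetical scale: 0.0 = pure curator, 1.0 = pure creator
    owned_sentences: float   # sentences authored by this editor that persist in the article


def production_cost(c: Contributor, effort_scale: float = 1.0) -> float:
    # (a) linear in the position on the creator/curator continuum
    return effort_scale * c.creator_position


def coordination_cost(c: Contributor, coordination_scale: float = 1.0) -> float:
    # (b) weighs more heavily on cooperative contributors (low w); the exact form is assumed
    return coordination_scale * (1.0 - COMPETITIVENESS[c.stratum])


def compliance_cost(c: Contributor, enforcement: float) -> float:
    # (c) owned sentences x competitiveness x community-wide neutrality enforcement level
    return enforcement * COMPETITIVENESS[c.stratum] * c.owned_sentences


def total_cost(c: Contributor, enforcement: float) -> float:
    return production_cost(c) + coordination_cost(c) + compliance_cost(c, enforcement)
```

Under these assumptions, an unregistered editor and an administrator who own the same number of sentences pay the same production cost, but the unregistered editor pays a higher compliance cost and a lower coordination cost.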

The game-theoretic model consists of two levels:

"[...] the first level models the interactions between individual contributors who seek both cooperative and competitive goals and the second level models governance of co-production as a Stackelberg (leader-follower) game between contributors and the communal neutrality-enforcing mechanisms."

A calculation of the model's Nash equilibrium (basically, a state where no individual "player" can improve their utility by making a unilateral change to their strategy) yields several rather complicated formulae (Theorems 1-4), from which the authors derive various overall conclusions, e.g. that

"[...] the contributor’s characteristics, or more specifically, the ratio between the contributor’s position on the creator-curator continuum and the contributor’s cooperative/competitive orientation is the factor that determines who ends up owning content. When this ratio is smaller than the group’s average, the contributor maintains ownership over portions of the article. Namely, under the governance mechanisms, the fractional content that is eventually owned by a contributor is higher for curators (i.e., with a typical small-size edit per sentence) with a competitive orientation (i.e., peripheral community members). In essence, creators with a cooperative orientation lose ownership of the article. This result was corroborated through [the] empirical analysis."

The authors explain that this means that

"[...] only those with a competitive orientation who choose to act as curators making small edits end up owning significant portion of the content. One might expect that the creators who contribute more content (and in the process exert more effort) would end up owning much of an article’s contents. In contrast, the results of our game-theoretic analysis implies that when competing over content ownership in the presence of Wikipedia’s governance to ensure neutrality, and when controlling for one’s cooperative/competitive-orientation, the creators of content who make on average large contributions would eventually not own any content."

The second "key result" from the game-theoretic analysis is that

"[...] excessive governance should be curtailed, by identifying and maintaining a permissible upper limit, beyond which it discourages contributors from making contributions to an article, bringing the co-production process to a halt. Furthermore, our empirical analysis suggests that a low level of governance is optimal for ensuring neutrality while maintaining articles’ comprehensiveness."

Based on this, the authors recommend that

"[T]he community’s efforts to govern content creation and ensure neutrality, although essential for maintaining a balanced position, should be carefully monitored and kept limited. The reason is that when the 'tax' imposed on contributors in terms of complying with NPOV norms, policies and procedures is too high, it outweighs the benefits associated with content ownership, such that contributors stop competing for ownership (and in effect, co-production is stalled)."
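
The "upper limit" argument can be made tangible with a toy leader-follower (Stackelberg) example solved by backward induction. Everything in it is an assumption chosen for simplicity rather than taken from the paper: the follower's utility n * (benefit - enforcement * w) - effort * n^2, the leader's objective enforcement * n, and all default parameter values.

```python
def best_response(enforcement, benefit=1.0, competitiveness=0.5, effort=0.5):
    """Follower: number of owned sentences n >= 0 that maximizes the toy utility
    n * (benefit - enforcement * competitiveness) - effort * n**2."""
    marginal_gain = benefit - enforcement * competitiveness
    return max(0.0, marginal_gain / (2.0 * effort))


def leader_choice(levels, **follower_params):
    """Leader: picks the enforcement level that maximizes a hypothetical objective
    (enforcement * amount of content), anticipating the follower's best response."""
    return max(levels, key=lambda g: g * best_response(g, **follower_params))


levels = [i / 10 for i in range(0, 31)]  # candidate enforcement levels 0.0 .. 3.0
chosen = leader_choice(levels)
print(chosen, best_response(chosen))
```

With these made-up defaults, the contributor stops contributing altogether once enforcement reaches benefit / competitiveness = 2.0, so a leader who anticipates this response settles on a level well below that limit. The sketch only shows the mechanism of a compliance "tax" outweighing the benefit of ownership. It says nothing about where such a limit lies on Wikipedia.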

See also previous coverage of related research by one of the authors (Ofer Arazy from the University of Haifa).


Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"Defying easy categorization: Wikipedia as primary, secondary and tertiary resource"

From the abstract:[2]

"This article sets out to explore the different categories of source that Wikipedia could be defined as (primary, secondary or tertiary) alongside the varied ways in which Wikipedia is used, which defy easy categorization, exemplified by a broad-ranging literature review and focusing on the English language Wikipedia. It concludes that Wikipedia cannot easily be categorized in any information category but is defined instead by the ways it is used and interpreted by its users."

(We note with delight the mention of the Twitter feed associated with this research newsletter: "Serendipitous discoveries of relevant research were also made via the WikiResearch Twitter account @WikiResearch, the 'Wiki-research-l' mailing list and the Wikimedia Research biannual reports.")

"Conflict dynamics in collaborative knowledge production. A study of network gatekeeping on Wikipedia"

From the paper:[3]

"We study gatekeeping on Wikipedia by analyzing networks of deletions among editors. Factors considered to explain the emergence of these networks are editors’ information production capability, their range of activity, their communicative ties, their geographical location, and their past interactions. [...]

Highlights: [...]

  • Editors’ past interactions affect gatekeeping conflicts on Wikipedia.
  • Disputes among Wikipedians reflect the lines of historical conflicts to some extent.
  • In- and Outgroups based on editors’ real-world attributes influence gatekeeping."

[...]

"We drew our sample of articles from the French and Spanish language versions of Wikipedia, respectively. These are the most likely versions where editors from France or Spain and editors from Algeria or former Gran Colombia engage with one another, since the former colonizer’s language is either an official language or widely spoken in the successor states [...] and editors are drawn toward larger versions of Wikipedia. [...] we identified 111 unique relevant articles for both Algeria and the four successor states of Gran Colombia. We retrieved the complete revision history of these articles up until January 1st, 2019, [... and] were able to attribute each character of an article’s text to its author throughout the entire revision history. With this information we could identify who deleted the contributions of whom."


"Understanding Open Collaboration of Wikipedia Good Articles with Factor Analysis"

From the abstract:[4]

"This research aims at understanding the open collaboration involved in producing Wikipedia Good Articles (GA). [...] We propose an approach that first employs factor analysis to identify editing abilities [of contributors] and then uses these editing abilities scores to distinguish editors. Then, we generate sequence of editors participating in the work process to analyse the patterns of collaboration. Without loss of generality, we use GA of three Wikipedia categories covering two general topics and a science topic to demonstrate our approach. The result shows that we can successfully generate editor abilities and identify different types of editors. Then we observe the sequence of different editor involved in the creation process. For the three GA categories examined, we found that [...] highly scored content-shaping ability editors [tend to be] involved in the later stage of the collaboration process."


"Power Distance and Hierarchization in Organizing Virtual Knowledge Sharing in Wikipedia"

From the abstract:[5]

"The authors posed the question whether and to what extent the differences in the cultural dimension of the power distance [an anthropological concept theorized by Geert Hofstede] are reflected in the functioning of the Wikipedia community. How the hierarchization of the organizational structure may influence the organization of knowledge sharing processes is also studied. The authors selected for the research the Wikipedia language versions which were mostly edited by the communities from homogeneous national cultures. The method used was quantitative analysis of the activity of Wikipedia users intended for establishing the general rules of cooperation, as well as an analysis of the distribution of user rights in the context of the social structure of individual versions. Research has shown that with the rise of the power distance, the power structure is becoming more hierarchical. However, the users with administrative rights and users without administrative rights are equally committed to joint rule-making. At the same time, it was found that in some cultures with a low power distance, the users do not show much attachment to the acquired rights. The opposite dependency was observed in countries with Orthodox and Islamic civilizations. [...]"


"Cultural Dimension of Femininity: Masculinity in Virtual Organizing Knowledge Sharing"

From the abstract:[6]

"The aim of the presented research was to identify the differentiation of the selected language versions of Wikipedia in the cultural dimension of femininity and masculinity. We answer the questions whether these differences are reflected in the functioning of the Wikipedia environment and how this fact may improve organizing cooperation in virtual organizations to enhance knowledge sharing. The method of content analysis and analysis of the register of user activity in several fields of activity were adopted. For quantitative analysis, xTools and PetScan tools for generating statistical data were used. For qualitative analysis, chosen user pages and other public spaces were investigated. The results of the conducted research showed that in feminine cultures the relational dimension of activity in Wikipedia space was more important. Behavioural traits specific to task orientation were more pronounced in masculine cultures. In many language versions of feminine cultures, gender divisions were neither distinguished, nor exposed, thus making them more problematic to identify. [...]"


"Wikipedia as a Space for Collective and Individualistic Knowledge Sharing"

From the abstract:[7]

"The aim of the research presented in this paper was to identify the differentiation of language versions in the cultural dimension of individualism and collectivism. The research was both quantitative and qualitative. The authors selected Wikipedia language versions which were edited mainly by communities from homogeneous national cultures and with a minimum of 200 active users per month. The method used was the content analysis and the analysis of registers of user activity. The authors answer the question of whether these differences are reflected in the functioning of the Wikipedia environment. To answer the raised research question, three hypotheses were formulated. The relationship between the individualism index (IDV) of national cultures from which Wikipedians are recruited and the indicators of activity, while also the degree of regulation of activities in the project were examined. Research has shown that IDV is positively correlated with 1) the number of editions made per page which may be indicative of the greater courage to edit somebody else's text and 2) the ratio of the number of active users to the number of principles and recommendations, which means that actions on Wikipedia are relatively less frequently regulated in individualistic cultures. [...]"

From the "Conclusions" section:

"The results of the conducted research indicate that, with the adopted indicators, it is impossible to unequivocally state the influence of national culture on Wikipedia's organizational culture in terms of the degree of individualism. Of the three assumed hypotheses, two were positively verified and one was rejected. A small number of rules in relation to users and a large number of editions per page positively correlate with the level of the IDV. On the other hand the ratio of anonymous users to registered users is not related to the level of individualism."

References

  1. ^ Anand, Santhanakrishnan; Arazy, Ofer; Mandayam, Narayan; Nov, Oded (2023-05-01). "A game-theoretic analysis of Wikipedia's peer production: The interplay between community's governance and contributors' interactions". PLOS ONE. 18 (5): e0281725. Bibcode:2023PLoSO..1881725A. doi:10.1371/journal.pone.0281725. ISSN 1932-6203. PMC 10150990. PMID 37126492.
  2. ^ Ball, Caroline (2023-03-21). "Defying easy categorization: Wikipedia as primary, secondary and tertiary resource". UKSG Insights. 36 (1): 7. doi:10.1629/uksg.604. ISSN 2048-7754.
  3. ^ Bürger, Moritz; Schlögl, Stephan; Schmid-Petri, Hannah (2023-01-01). "Conflict dynamics in collaborative knowledge production. A study of network gatekeeping on Wikipedia". Social Networks. 72: 13–21. doi:10.1016/j.socnet.2022.08.002. ISSN 0378-8733. S2CID 251892672. (closed access)
  4. ^ Chou, Huichen; Lin, Donghui; Ishida, Toru (September 2022). "Understanding Open Collaboration of Wikipedia Good Articles with Factor Analysis". Journal of Information & Knowledge Management. 21 (3): 2250030. doi:10.1142/S0219649222500307. ISSN 0219-6492. S2CID 248882372.
  5. ^ Skolik, Sebastian; Karczewska, Anna (2021). Power Distance and Hierarchization in Organizing Virtual Knowledge Sharing in Wikipedia. Coventry, UK: Academic Conferences and Publishing International Limited. pp. 705–715. ISBN 978-1-914587-06-1. (closed access)
  6. ^ Karczewska, Anna; Kukowska, Katarzyna (2021). Cultural Dimension of Femininity: Masculinity in Virtual Organizing Knowledge Sharing. Coventry, UK: Academic Conferences and Publishing International Limited. pp. 414–422. ISBN 978-1-914587-06-1. (closed access)
  7. ^ Kukowska, Katarzyna; Skolik, Sebastian (2021). Wikipedia as a Space for Collective and Individualistic Knowledge Sharing. Coventry, UK: Academic Conferences and Publishing International Limited. pp. 459–466. ISBN 978-1-914587-06-1. (closed access)



Discuss this story

  • those closer to the community’s core are assumed to be more cooperative – hmm... – Joe (talk) 09:08, 22 May 2023 (UTC)
  • I would like to read the Gatekeeping article. Does anyone have a preprint? Sandizer (talk) 15:59, 22 May 2023 (UTC)
    @Sandizer: It's accessible via The Wikipedia Library. – Rhain (he/him) 23:37, 22 May 2023 (UTC)
  • I feel like the game theory paper is interesting, but it also feels like it could be empirically tested to see if its findings hold up in reality, because like Joe Roe I'm not sure their starting assumptions are correct. Best, Barkeep49 (talk) 16:35, 22 May 2023 (UTC)
    Well, the authors actually did extensive empirical testing, on a dataset spanning more than a decade of edits (as I already briefly mentioned in the review but should probably have expanded upon). From the abstract:

    We validate the model’s prediction through an empirical analysis, by studying the interactions of 219,811 distinct contributors that co-produced 864 Wikipedia articles over a decade. The analysis and empirical results suggest that the factor that determines who ends up owning content is the ratio between one’s cooperative/competitive orientation (estimated based on whether a core or peripheral community member) and the contributor’s creator/curator activity profile (proxied through average edit size per sentence). Namely, under the governance mechanisms, the fractional content that is eventually owned by a contributor is higher for curators that have a competitive orientation.

    But I guess what you and Joe are concerned about is rather what is often called construct validity - i.e., do these metrics defined in the paper really capture what we would commonly perceive as an editor's cooperativeness vs. competitiveness, say? And that's a valid question here. (Of course all models are wrong and it's virtually always possible to pick on some shortcoming where this kind of modeling doesn't fully reflect reality. One would need to ask whether and how much it affects the overall conclusions.)
    Relatedly, while the paper's data availability statement claims that "All relevant data are within the paper and its Supporting information files," the dataset provided there appears to be very incomplete. For example, it does not seem to contain the information necessary to calculate the creator/curator metric, and only contains data on 16,383 users instead of the aforementioned 219,811 (i.e. consists of 16,384 rows including the header, a suspiciously round number). It is also anonymized, meaning that one can't evaluate construct validity by inspecting the ratings for some concrete editors in the dataset manually.
    Regards, HaeB (talk) 18:16, 23 May 2023 (UTC)
  • I wonder how good the definition of "owned" is. If I, for example, say "that's a fair enough copy edit which doesn't change meaning", does the other person "own" the sentence that says exactly the same thing as I added? Talpedia 16:18, 6 June 2023 (UTC)

















Wikipedia:Wikipedia Signpost/2023-05-22/Recent_research