The Signpost

Recent research

Wikimedia Commons worth $28.9 billion

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

Estimating the Value of Wikimedia Commons

Reviewed by Isaac Johnson

Though Wikimedia projects like Wikipedia are clearly valuable to people worldwide (as reflected in Wikipedia's status as the fifth most popular website in the world), other facets of this value have been harder to quantify. Anecdotally, content from communities like Wikipedia has been important in the development of natural language processing tools[supp 1] and search engines like Google,[supp 2] and it serves as an important resource when people make life decisions.[supp 3]

This OpenSym 2018 paper, "What is the Commons Worth? Estimating the Value of Wikimedia Imagery by Observing Downstream Use",[1] attempts to quantify the monetary value of Wikimedia Commons, a peer-produced repository of freely reusable images and video that hosts many of the images readers come across on Wikipedia. To do so, the authors pose a counterfactual question: how much revenue would licensing this content generate if Commons operated under a for-profit model such as that of Getty Images? They draw a random sample of 10,000 images from Commons and run a reverse image search on each one to detect how often it is reused across the internet. The domain of each reuse is then evaluated to determine, for instance, whether it belongs to a commercial entity. Applying Getty's licensing fees of USD $175 for commercial use and USD $60 for non-commercial use, they extrapolate from how often, on average, each sampled image is reused (and where) to the whole collection, reaching a total estimate of USD $28.9 billion for Wikimedia Commons.
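The extrapolation step can be illustrated with a simplified Python sketch. It assumes a flat fee per observed reuse (the USD $175 and $60 figures above) and a plain mean-per-image scale-up from the sample to the whole collection. The `SampleImage` record, the `estimate_collection_value` function and its `collection_size` and `paid_fraction` parameters are hypothetical names chosen for illustration; this is not the authors' code, and their actual model may weight reuses differently. The toy counts in the usage lines are invented, not data from the study.

```python
from dataclasses import dataclass

# Flat per-reuse licence fees quoted in the review (Getty-style model, USD).
COMMERCIAL_FEE_USD = 175.0
NON_COMMERCIAL_FEE_USD = 60.0


@dataclass
class SampleImage:
    """Hypothetical record: reuse counts found for one sampled image."""
    commercial_uses: int
    non_commercial_uses: int


def estimate_collection_value(sample, collection_size, paid_fraction=1.0):
    """Scale the mean licence revenue per sampled image up to a whole collection.

    sample: list of SampleImage records (e.g. from a reverse image search).
    collection_size: total number of files in the collection (assumed input).
    paid_fraction: assumed share of reuses that would actually be paid for.
    """
    if not sample:
        raise ValueError("sample must not be empty")
    # Revenue the sample would have earned if every counted reuse were licensed.
    revenue = sum(
        img.commercial_uses * COMMERCIAL_FEE_USD
        + img.non_commercial_uses * NON_COMMERCIAL_FEE_USD
        for img in sample
    )
    mean_revenue_per_image = revenue / len(sample)
    return mean_revenue_per_image * collection_size * paid_fraction


if __name__ == "__main__":
    toy_sample = [SampleImage(2, 1), SampleImage(0, 3)]  # invented counts, not study data
    print(estimate_collection_value(toy_sample, collection_size=1000))  # 295000.0
    print(estimate_collection_value(toy_sample, 1000, paid_fraction=0.5))  # 147500.0
```

With `paid_fraction=1.0` every observed reuse is assumed to be licensed; lowering it shows how directly the headline figure scales with the share of reuses that would realistically be paid for.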

While there are interesting discussions to be had about some of the methodological choices behind the final estimate of USD $28.9 billion for the entirety of Commons (e.g., what is a more reasonable estimate of the proportion of reuses that would actually be paid for if the images were licensed), the general approach and motivation are sound and certainly raise important questions about how we value resources like Wikimedia Commons. This research complements previous estimates of the value of Commons.[supp 4] These are not easy questions, but I look forward to further research that adds to our understanding of the value of these communities' work.

Cf. earlier coverage: "Estimate for economic benefit of Wikipedia: $50 million by 2006 already"

Briefly

Conferences and events

See the research events page on Meta-wiki for upcoming conferences and events, including submission deadlines.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.

Compiled by Tilman Bayer

"Web caching evaluation from Wikipedia request statistics"

From the abstract:[2] "We use publicly available statistics about the top-1000 most popular pages on each day to estimate the efficiency of caches for support of the platform. While the data volumes are moderate, the main goal of Wikipedia caches is to reduce access times for page views and edits. We study the impact of most popular pages on the achievable cache hit rate in comparison to Zipf request distributions and we include daily dynamics in popularity."
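To make the comparison with Zipf request distributions concrete, here is a small Python sketch. It assumes the independent reference model: each request is drawn independently from a fixed Zipf popularity law with exponent `alpha`, so unlike the paper it ignores daily changes in popularity. The function names are hypothetical and this is not the authors' evaluation code. It computes the hit rate of an idealised cache that always holds the `k` most popular pages and, for comparison, simulates a least-recently-used (LRU) cache.

```python
import random
from collections import OrderedDict


def zipf_weights(n_items, alpha):
    """Unnormalised Zipf popularity: the page of rank i gets weight i**-alpha."""
    return [i ** -alpha for i in range(1, n_items + 1)]


def static_topk_hit_rate(n_items, cache_size, alpha):
    """Expected hit rate of a cache that permanently holds the top-k pages."""
    weights = zipf_weights(n_items, alpha)
    k = max(0, min(cache_size, n_items))
    return sum(weights[:k]) / sum(weights)


def simulate_lru_hit_rate(n_items, cache_size, alpha, n_requests, seed=0):
    """Monte Carlo hit rate of an LRU cache fed independent Zipf requests."""
    rng = random.Random(seed)
    weights = zipf_weights(n_items, alpha)
    requests = rng.choices(range(n_items), weights=weights, k=n_requests)
    cache = OrderedDict()
    hits = 0
    for page in requests:
        if page in cache:
            hits += 1
            cache.move_to_end(page)  # mark as most recently used
        elif cache_size > 0:
            cache[page] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / n_requests


if __name__ == "__main__":
    for k in (10, 100, 1000):
        ideal = static_topk_hit_rate(10_000, k, 0.8)
        lru = simulate_lru_hit_rate(10_000, k, 0.8, 50_000)
        print(f"cache size {k}: top-k {ideal:.3f}, LRU {lru:.3f}")
```

Under these assumptions the static top-k figure is the best hit rate any cache of size k can expect, so the simulated LRU rate will typically come out below it; real traffic with shifting daily popularity, as studied in the paper, can behave differently.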

"Hacking Academic Collaboration with GLAM Edit-a-thons"

From the abstract:[3] "At MIT, librarians, archivists, writing instructors, and local Wikipedians have collaborated to host several edit-a-thons with the common goals of addressing content gaps on Wikipedia and offering the public and the MIT community (including students, staff, alumni and faculty) new ways to engage with the institute's archives and special collections. [...] This article shares results from MIT's GLAM edit-a-thons, and argues that approaching projects from the perspective of Wikipedia's collaborative culture can enhance other kinds of academic collaboration."

"Connecting Wikipedia and the Archive"

From the abstract:[4] "The described project that was started in 2015 was collaboratively designed by archivists and historians with the La Guardia & Wagner Archives ("the Archives") and LaGuardia Community College's faculty and librarians, and involves beginning college students in the production of a needed public history of the outbreak and impact of HIV/AIDS in New York City. [...] Utilization of Wikipedia as a non-commercial, public, open access information source also succeeds in raising web traffic, visibility and accessibility for unique and valuable archival collections."

"Nonhuman language agents in online collaborative communities: Comparing Hebrew Wikipedia and Facebook translations"

From the abstract:[5] "This study compared language policies in Hebrew Wikipedia and the Hebrew Facebook translation app. Hebrew Wikipedia designed a strict linguistic guide that promotes a neutral Hebrew register, rejecting both colloquial and high registers, enforced by an algorithm post factum."

"Wikipedia's gaps in coverage: are Wikiprojects a solution? A study of the Cambodian Wikiproject"

From the abstract:[6] "The purpose of this paper is to examine the rather unsuccessful Wikiproject for Cambodia. Despite its lack of success, it is a case that can be used to draw lessons for dealing with the issue of geographical under-representation on Wikipedia as a whole. ... The author takes a broadly qualitative approach to the study of Wikipedia. For this study, the Cambodia Wikiproject main page, as well as the various talk page archives associated with it, was downloaded in November 2016 and subjected to a content analysis. Descriptive statistics are also used when necessary to build the argument. Findings: Wikiproject Cambodia has failed to appreciably improve the coverage of Cambodian topics. This is likely due to its inability to attract for a prolonged period of time a champion able to anchor the project and provide a sense that someone is listening. But the makeup of the project members also suggests that even if a champion could be found, the question of who gets to represent whom remains difficult to deal with. It is unlikely that Cambodia will anytime soon develop a strong community of Wikipedia editors given the economic and social constraints the country imposes on most of its population."

"Representing Metro Manila on Wikipedia"

From the abstract:[7] "While the Wikipedia article on Manila cannot be classified as promotional, it is clear that much of the city remains invisible in this work. Such a puzzle becomes understandable when we examine the urban studies literature where we find that the spatial logic of the city itself helps conceal much from view, so that what we read on Wikipedia is a view from the islands of privilege rather than the oceans of marginalization that make up much of the city's spatial form. If such a spatial structure is to change, representations such as found on Wikipedia need to be challenged."

"How does communicative memory become cultural memory? Negotiation processes on the Wikipedia talk page in case of the White Rose"

"Wie wird kommunikatives zu kulturellem Gedächtnis? Aushandlungsprozesse auf den Wikipedia-Diskussionsseiten am Beispiel der Weißen Rose" (in German)[8]

From the paper (translated): "Finally, the [talk page comments classified in] the category of personal attacks are remarkable because of their insignificant quantitative dimension. In the context of the White Rose, there was only a single incident of this kind. Against the backdrop of widespread hate attacks on the Internet this finding is notable, considering that the resistance against National Socialism has never been uncontroversial."


References

  1. ^ Erickson, Kristofer; Perez, Felix Rodriguez; Perez, Jesus Rodriguez (22 August 2018). "What is the Commons Worth?: Estimating the Value of Wikimedia Imagery by Observing Downstream Use". OpenSym. ACM: 9. doi:10.1145/3233391.3233533.
  2. ^ Hasslinger, G.; Kunbaz, M.; Hasslinger, F.; Bauschert, T. (May 2017). "Web caching evaluation from Wikipedia request statistics". 2017 15th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt). pp. 1–6. doi:10.23919/WIOPT.2017.7959873.
    Freely available version: Web Caching Evaluation from Wikipedia Request Statistics, The 2nd Content Caching and Delivery in Wireless Networks Workshop (CCDWN), 2017
  3. ^ Thorndike-Breeze, Rebecca; Suiter, Greta Kuriger (2017-09-29). "Hacking Academic Collaboration with GLAM Edit-a-thons". WikiStudies. 1 (1): 65–95.
  4. ^ Matsuuchi, Ann (2017-09-25). "Connecting Wikipedia and the Archive". WikiStudies. 1 (1): 40–64.
  5. ^ Vaisman, Carmel L.; Gonen, Illan; Pinter, Yuval (2018-03-01). "Nonhuman language agents in online collaborative communities: Comparing Hebrew Wikipedia and Facebook translations". Discourse, Context & Media. 21 (Supplement C): 10–17. doi:10.1016/j.dcm.2017.10.002. ISSN 2211-6958.
  6. ^ Luyt, Brendan (2018-02-01). "Wikipedia's gaps in coverage: are Wikiprojects a solution? A study of the Cambodian Wikiproject". Online Information Review. 42 (2): 238–249. doi:10.1108/OIR-06-2017-0199. ISSN 1468-4527.
  7. ^ Luyt, Brendan (2017-11-30). "Representing Metro Manila on Wikipedia". Online Information Review. 42 (1): 16–27. doi:10.1108/OIR-10-2016-0308. ISSN 1468-4527.
  8. ^ Heinrich, Horst-Alfred; Gilowsky, Julia (2018). "Wie wird kommunikatives zu kulturellem Gedächtnis? Aushandlungsprozesse auf den Wikipedia-Diskussionsseiten am Beispiel der Weißen Rose". (Digitale) Medien und soziale Gedächtnisse. Soziales Gedächtnis, Erinnern und Vergessen – Memory Studies. Springer VS, Wiesbaden. pp. 143–167. ISBN 9783658195120.
Supplementary references:
  1. ^ Iderhoff, Nicolas. "nlp-datasets". GitHub. Retrieved 26 October 2018.
  2. ^ Singhal, Amit. "Introducing the Knowledge Graph: things, not strings". The Keyword. Google. Retrieved 26 October 2018.
  3. ^ Singer, Philipp; Lemmerich, Florian; West, Robert; Zia, Leila; Wulczyn, Ellery; Strohmaier, Markus; Leskovec, Jure (3 April 2017). "Why We Read Wikipedia". Proceedings of the 26th International Conference on World Wide Web (WWW '17). International World Wide Web Conferences Steering Committee. pp. 1591–1600. doi:10.1145/3038912.3052716.
  4. ^ Heald, Paul J.; Erickson, Kris; Kretschmer, Martin (2015). "The Valuation of Unprotected Works: A Case Study of Public Domain Photographs on Wikipedia". SSRN Electronic Journal. doi:10.2139/ssrn.2560572. ISSN 1556-5068.

Discuss this story


The research on the value of Wikimedia Commons brings to mind this issue's Alexa essay. What's the commercial value of having such snippets, or the Google Knowledge Graph, also 99.99% powered by volunteer contributions? ☆ Bri (talk) 02:28, 30 October 2018 (UTC)

  • All companies which reuse Wikipedia's content are accruing a debt to society and owe the public commons. The law is not a guidebook for what corporations should do, but only a set of minimal expectations for behavior which are far below the level of human decency. Even though it is not illegal to take from wiki without giving back, it is wrong behavior which brings shame to any person or organization associated with it. Commons and Wikimedia projects bring value to all people and organizations. Companies which get money from wiki project should give back. The employees, customers, neighbors, and anyone socially connected to anyone reusing free and open content from wiki or anywhere else has a duty to educate, pressure, and expect that anyone who uses the commons should give back to it. Blue Rasberry (talk) 13:23, 31 October 2018 (UTC)
I have to politely disagree. I have contributed many images to Commons and always understood the CC-BY license to mean what it says when it allows free commercial use. Several of my images have been used that way. Sometimes I have been asked permission, which is courteous, but not necessary. In one case a major publisher had one of my images for sale in their catalog (it was in a paper they published), which is clearly not ok and they removed it on my request. Outside of that and proper attribution I expect nothing in return, nor should any other contributor. It might make sense for the Wikimedia foundation to track commercial use (perhaps with a page to accept reports of such use) and send a friendly fund-raising letter, but that is different from saying commercial users have some duty to support Commons. They don't. That's what free means.--agr (talk)
Measuring the economic value of any free product or free service is often more an exercise in scholarship than a valuation of something tendered, and of course the value of Wikimedia Commons may be more social than economic. But a study that references the waiting time between each use of content as the price of use increases may provide insight into the economic value of Commons, and as the price of use increases the distribution of waiting times would likely follow well known measures in statistics. Then again, twenty-some-odd billion dollars is a good starting point. Tamanoeconomico (talk) 20:47, 21 November 2018 (UTC)

Wikipedia:Wikipedia Signpost/2018-10-28/Recent_research