Wikipedia and collective intelligence; how Wikipedia is tweeted

Recent research

Wikipedia as an example of collective intelligence

An article^[1] in Social Science Computer Review argues that Wikipedia is an example of collective intelligence. It is primarily a theoretical piece, but the author is well informed about Wikipedia's everyday workings and illustrates the theory with that knowledge. The article relies heavily on Pierre Lévy's notion of "humanistic collective intelligence". The author argues that Wikipedia displays some key characteristics of a collective intelligence process: software optimized for stigmergy (a mechanism of indirect coordination between agents or actions, supported by features such as edit histories and talk pages); distributed cognition (such as the existence of bots, and the division of tasks between various tools and individuals, facilitating their actions); and possibly, though it is not possible to prove beyond any doubt, emergence (a process whereby larger entities, patterns, and regularities arise through interactions among smaller or simpler entities that themselves do not exhibit such properties). The author concludes that Wikipedia thus exemplifies a special kind of collective intelligence, the aforementioned humanistic collective intelligence proposed by Lévy.

#Wikipedia and Twitter

review by Kim Osman

This study from OpenSym '15^[2] analysed 2.5 million tweets linking to Wikipedia pages, collected on Twitter over a five-month period. The authors found that tweets in English and Japanese linked to pages from their respective language versions of Wikipedia nearly all the time (97 and 94 percent respectively). Tweets in other languages, however, often linked to a different language version of Wikipedia, roughly one fifth of the time. Interestingly, tweets in Indonesian referenced another language version more than half the time (linking to English Wikipedia in half the tweets), and of the links to English Wikipedia, the authors found that 75% of linked articles did not have an equivalent Indonesian version. The articles linked in the analysed tweets followed a long-tail distribution, with the authors noting certain “events” (like the Gamergate controversy) generating multiple tweets. Of the top 20 Twitter users in the dataset, 19 were bots, with the most prolific tweeter being Wikipedia Stub Bot (@wpstubs). The authors note that their study does not provide enough evidence to support a relationship between “how actively edited a certain article is and its popularity on Twitter.” The study does, however, raise interesting questions about the platform relationship between Wikipedia and Twitter and the role of bots in creating and maintaining this association. The authors note that future research could consider the role of events in popularising Wikipedia articles on Twitter, along with further examining motivations for inter-language linking on Twitter.
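
The central measurement here, how often a tweet's language matches the language edition of the Wikipedia page it links to, can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the sample records and the function names are assumptions, and the language edition is simply read from the subdomain of the linked URL.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def wiki_edition(url):
    """Return the language code of a Wikipedia URL, or None if it is not one."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Accept en.wikipedia.org as well as mobile hosts such as en.m.wikipedia.org.
    if len(parts) >= 3 and parts[-2:] == ["wikipedia", "org"] and parts[0] != "www":
        return parts[0]
    return None


def same_language_share(tweets):
    """Map each tweet language to the fraction of its links that stay in that edition."""
    counts = defaultdict(Counter)
    for tweet_lang, url in tweets:
        edition = wiki_edition(url)
        if edition is not None:
            counts[tweet_lang][edition] += 1
    return {
        lang: editions[lang] / sum(editions.values())
        for lang, editions in counts.items()
    }


# Hypothetical sample data: (tweet language, linked URL).
sample = [
    ("en", "https://en.wikipedia.org/wiki/Twitter"),
    ("id", "https://en.wikipedia.org/wiki/Jakarta"),
    ("id", "https://id.wikipedia.org/wiki/Jakarta"),
    ("ja", "https://ja.m.wikipedia.org/wiki/Tokyo"),
]
shares = same_language_share(sample)
```

A real analysis would also need to detect the tweet language and resolve shortened links before this step, neither of which is shown here.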

Briefly

"As of early 2015, the typical edit [on the English Wikipedia] is made by an account that is over 5 years old."

How old is the account making an average edit? Among other charts recently created by Dragons flight to visualize statistical data about the English Wikipedia community, this one shows that "the long-term trend is for the active community to gain about 6 months in average age for every year of time that passes in real life."
Simplifying sentences by finding their equivalent on Simple Wikipedia: A preprint^[3] by researchers at the University of Washington describes a method to automatically align sentences about the same facts on the English Wikipedia and the Simple English Wikipedia. Besides a hand-annotated dataset of corresponding (and non-corresponding) sentence pairs used to test and adjust the algorithm, their approach uses a "novel similarity metric" between pairs of words which is based on synonym information from Wiktionary, resulting in a weighted graph called "WikNet" that consists of "roughly 177k nodes and 1.15M undirected edges. As expected, our Wiktionary based similarity metric has a higher coverage of 71.8% than WordNet, which has a word coverage of 58.7% in our annotated dataset". These datasets are available online. The following pair of sentences is presented as an example of a good match found by the resulting method:
"The castle was later incorporated into the construction of Ashtown Lodge which was to serve as the official residence of the Under Secretary from 1782" (en:Ashtown Castle) vs.
"After the building was made bigger and improved, it was used as the house for the Under Secretary of Ireland from 1782." (simple:Ashtown Castle)
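
To make the idea of aligning sentences through word-level similarity more concrete, here is a minimal sketch. It is not the method from the preprint: the tiny `SYNONYMS` table merely stands in for synonym information of the kind Wiktionary provides, and the scoring function, the threshold, the unrelated third sentence and all names are assumptions chosen for illustration.

```python
import re

# Hypothetical synonym pairs, standing in for a Wiktionary-derived graph.
SYNONYMS = {
    frozenset({"residence", "house"}),
    frozenset({"castle", "building"}),
}


def tokens(sentence):
    """Lowercase a sentence and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def word_similarity(a, b):
    """1.0 for identical words, 0.5 for listed synonyms, 0.0 otherwise."""
    if a == b:
        return 1.0
    return 0.5 if frozenset({a, b}) in SYNONYMS else 0.0


def sentence_similarity(s1, s2):
    """Average, over the words of both sentences, of each word's best match in the other."""
    t1, t2 = tokens(s1), tokens(s2)
    if not t1 or not t2:
        return 0.0
    best1 = sum(max(word_similarity(a, b) for b in t2) for a in t1)
    best2 = sum(max(word_similarity(a, b) for a in t1) for b in t2)
    return (best1 + best2) / (len(t1) + len(t2))


def align(standard, simple, threshold=0.3):
    """Pair each standard sentence with its best-scoring simple sentence, if above threshold."""
    if not simple:
        return []
    pairs = []
    for s in standard:
        best = max(simple, key=lambda c: sentence_similarity(s, c))
        if sentence_similarity(s, best) >= threshold:
            pairs.append((s, best))
    return pairs


standard = [
    "The castle was later incorporated into the construction of Ashtown Lodge "
    "which was to serve as the official residence of the Under Secretary from 1782",
]
simple = [
    "Dublin is the capital of Ireland.",  # unrelated sentence, added for contrast
    "After the building was made bigger and improved, it was used as the house "
    "for the Under Secretary of Ireland from 1782.",
]
pairs = align(standard, simple)
```

With these inputs the quoted Simple English sentence scores higher than the unrelated one, partly because "castle"/"building" and "residence"/"house" count as partial matches.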

Other recent publications

A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.

"The Virtues of Moderation"^[4] presents "a novel taxonomy of moderation in online communities", including a case study of Wikipedia (p.88).
"Studying the Wikipedia Hyperlink Graph for Relatedness and Disambiguation"^[5] From the abstract: "We show that using the full graph is more effective than just direct links by a large margin, that non-reciprocal links harm performance, and that there is no benefit from categories and infoboxes ..."
"Wikidata through the Eyes of DBpedia"^[6] From the introduction: "All DBpedia data is extracted from Wikipedia and Wikipedia authors thus unconciously also curate the DBpedia knowledge base. Wikidata on the other hand has its own data curation interface ... While DBpedia covers a very large share of Wikipedia at the expense of partially reduced quality, Wikidata covers a significantly smaller share, but due to the manual curation with higher quality and provenance information."
"WikiMirs: A Mathematical Information Retrieval System for Wikipedia"^[7]
"Content Translation: Computer-assisted translation tool for Wikipedia"^[8]
"Peer-production system or collaborative ontology development effort: what is Wikidata?"^[9] (to be presented at the OpenSym 2015 conference in August)
"Big data and Wikipedia research: social science knowledge across disciplinary divides"^[10]
"Comparing language development in Wikipedia in terms of page views per Internet users"^[11] See also Wiki-research-l mailing list discussion
"Understanding Graph Structure of Wikipedia for Query Expansion"^[12]
"Turning Introductory Comparative Politics and Elections Courses into Social Science Research Communities Using Wikipedia: Improving Both Teaching and Research"^[13]
"Utilizing the Wikidata System to Improve the Quality of Medical Content in Wikipedia in Diverse Languages: A Pilot Study"^[14]
"Is it Possible to Enhance our Expert Knowledge from Wikipedia?"^[15] From the English-language abstract: "In September 2013 two different questionnaires about medical issues were given to medical students, resident physicians and one medical specialist. The questioning was about diseases/symptoms, examinations/classifications and conservative therapy/surgery of the department of orthopaedics and traumatology. ... The survey has proven the up-to-dateness of Wikipedia articles and their listing on the first or second position on Google. Wikipedia contains a lot of bibliographical references, high-quality images and video material. Almost half (42,5 %) of all evaluated articles are appropriate for use in medical exams and in the daily clinical work."
"Predicting elections from online information flows: towards theoretically informed models"^[16] From the conclusions: "We have shown good evidence that an 'uncertainty effect' drives much Wikipedia traffic: newer parties which attracted a lot of swing voters received disproportionately high levels of Wikipedia traffic. By contrast, there was no evidence of a 'media effect': there was little correlation between news media mentions and overall Wikipedia traffic patterns. Indeed, the news media and Wikipedia appeared biased towards different things: with news favouring incumbent parties, whilst Wikipedia favoured new ones." (See also coverage of an earlier preprint by the same authors: "Attempt to use Wikipedia pageviews to predict election results in Iran, Germany and the UK")

References

^ Livingstone, Randall M. (2015-06-26). "Models for Understanding Collective Intelligence on Wikipedia". Social Science Computer Review. 34 (4): 497–508. doi:10.1177/0894439315591136. ISSN 0894-4393. S2CID 60657789.
^ Zangerle, Eva; Schmidhammer, Georg; Specht, Günther (2015). "#Wikipedia on Twitter: Analyzing Tweets about Wikipedia" (PDF). OpenSym '15. doi:10.1145/2788993.2789845. S2CID 5959813.
^ William Hwang, Hannaneh Hajishirzi, Mari Ostendorf, and Wei Wu: Aligning Sentences from Standard Wikipedia to Simple Wikipedia. NAACL-HLT, 2015. PDF
^ James Grimmelmann. "The Virtues of Moderation." Yale Journal of Law and Technology. 17.42 (2015) http://yjolt.org/virtues-moderation
^ Agirre, Eneko; Barrena, Ander; Soroa, Aitor (2015-03-05). "Studying the Wikipedia Hyperlink Graph for Relatedness and Disambiguation". arXiv:1503.01655 [cs.CL].
^ Ali Ismayilov, Dimitris Kontokostas, Sören Auer, Jens Lehmann, Sebastian Hellmann. "Wikidata through the Eyes of DBpedia". http://arxiv.org/abs/1507.04180
^ Hu, Xuan; Gao, Liangcai; Lin, Xiaoyan; Tang, Zhi; Lin, Xiaofan; Baker, Josef B. (2013). "WikiMirs: A Mathematical Information Retrieval System for Wikipedia". Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries. JCDL '13. New York, NY, USA: ACM. pp. 11–20. doi:10.1145/2467696.2467699. ISBN 978-1-4503-2077-1.
^ Laxström, Niklas; Giner, Pau; Thottingal, Santhosh (2015-06-05). "Content Translation: Computer-assisted translation tool for Wikipedia articles". arXiv:1506.01914 [cs.CL].
^ Müller-Birn, Claudia; Karran, Benjamin; Lehmann, Janette; Luczak-Rösch, Markus (2015-05-24). "Peer-production system or collaborative ontology development effort: what is Wikidata?". doi:10.1145/2788993.2789836. S2CID 15126336. OpenSym 2015
^ Schroeder, Ralph; Taylor, Linnet (2015-02-24). "Big data and Wikipedia research: social science knowledge across disciplinary divides". Information, Communication & Society. 18 (9): 1039–1056. doi:10.1080/1369118X.2015.1008538. ISSN 1369-118X. S2CID 144817168.
^ Liao, Han-Teng (2015-03-15). "Comparing language development in Wikipedia in terms of page views per Internet users". Blog of Han-teng Liao, Oxford Internet Institute.
^ Guisado-Gámez, Joan; Prat-Pérez, Arnau (2015-05-06). "Understanding Graph Structure of Wikipedia for Query Expansion". Proceedings of the GRADES'15. pp. 1–6. arXiv:1505.01306. doi:10.1145/2764947.2764953. ISBN 9781450336116. S2CID 8058094.
^ Kennedy, Ryan; Forbush, Eric; Keegan, Brian; Lazer, David (April 2015). "Turning Introductory Comparative Politics and Elections Courses into Social Science Research Communities Using Wikipedia: Improving Both Teaching and Research". PS: Political Science & Politics. 48 (2): 378–384. doi:10.1017/S1049096514002157. ISSN 1537-5935. S2CID 147555546. / Author's copy
^ Pfundner, Alexander; Schönberg, Tobias; Horn, John; Boyce, Richard D; Samwald, Matthias (2015-05-05). "Utilizing the Wikidata System to Improve the Quality of Medical Content in Wikipedia in Diverse Languages: A Pilot Study". Journal of Medical Internet Research. 17 (5): 110. doi:10.2196/jmir.4163. ISSN 1438-8871. PMC 4468594. PMID 25944105.
^ Rechenberg, U.; Josten, C.; Klima, S. (2015). "Is it Possible to Enhance our Expert Knowledge from Wikipedia?". Zeitschrift für Orthopädie und Unfallchirurgie. 153 (2): 171–176. doi:10.1055/s-0034-1396207. ISSN 1864-6743. PMID 25874396. S2CID 196457871. (German, with English abstract)
^ Yasseri, Taha; Bright, Jonathan (2015-05-05). "Wikipedia traffic data and electoral prediction: Towards theoretically informed models". EPJ Data Science. 5: 22. arXiv:1505.01818. doi:10.1140/epjds/s13688-016-0083-3. S2CID 256241960.

In this issue

29 July 2015 (all comments)

Discuss this story

Average Account age by year

If we weren't recruiting any new editors the community would be getting a year older each year, so six months means we are still getting new editors. On the face of it this looks very healthy: after our initial period of exponential growth, when the average editor had been here less than a year, we are now getting a bit more experienced. We are still getting lots and lots of newbies, but with community size broadly flat we are only keeping as many as we lose. My suspicion is that if we combine this with other measures we would find that despite a steady inflow of newbies we are broadly stable, but as we have no way to work out our twenty let alone fifty year retention it is hard to know how healthy this is. Logically a new volunteer endeavour whose founding generation skewed very young will continue to get older and more experienced for decades; at some point, if all goes very well indeed, the experience gained annually by the remaining community getting a year older will balance the experience that is dying, retiring or being blocked, and the number of experienced editors lost will match newbies joining. But an organisation barely 14 years old should be decades from such stability, even if we knew editors' ages, or which former editors are still alive, or which "newbies" are anything but. But nice work, thanks for doing this. Ϣere SpielChequers 22:19, 3 August 2015 (UTC)

how charmingly Hegelian. a darker view would be that there needs to be twice as much new editor retention to get to a steady state. this is a long term trend that no amount of happy talk or software change has budged. i don't see any measure of increased productivity of the remainders compensating for the decreased numbers. when the work load and backlogs are growing, it is hard to imagine any reasonable person calling it healthy. actually fixing cultural problems is too hard, and so the community is resigned to the status quo. Duckduckstop (talk) 19:41, 5 August 2015 (UTC)

Errh, aside from the issue that edit filters, faster vandalism reversion, the move of intrawiki links to wikidata and indeed the rise of wikidata all mean that we can't compare current editing levels to past ones and we don't really know when the true peak was; if you do worry about raw edit count, the first 6 months of this year have all had more editors doing more than 100 edits that month than the same month a year earlier. That doesn't mean there aren't hard cultural problems embedded in the community and in the environment in which we now operate. One, two or three months increase year on year could still be consistent with a long term trend - a single month could easily be down to comparing one month with 5 weekends against one with 4. But 6 months in a row showing an increase is not consistent with the theory that "there needs to be twice as much new editor retention to get to a steady state". Ϣere SpielChequers 19:20, 13 August 2015 (UTC)
