The Signpost

Op-ed

Licensed for reuse? Citing open-access sources in Wikipedia articles

The views expressed in this op-ed are those of the author only; responses and critical commentary are invited in the comments section. The Signpost welcomes proposals for op-eds at our opinion desk.
This image of Xanthichthys ringens is sourced from an open-access scholarly article licensed for re-use. Should we make that reusability explicit when citing this source in Wikipedia articles?[1]

It is deeply ironic that two decades after the World Wide Web was started, largely to make it easier to share scholarly research, most of our past and present research publications are still hidden behind paywalls for private profit. The bitter twist is that the vast majority of this research is publicly funded, to the tune of hundreds of billions of dollars worldwide each year.

This has placed Wikipedia in an awkward position with respect to its verifiability policy: "all material in Wikipedia mainspace, including everything in articles, lists and captions, must be verifiable [so that] people reading and editing the encyclopedia can check that the information comes from a reliable source." Combined with the policy on identifying reliable sources, the paywall dilemma faced by editors and readers becomes clearer: "many Wikipedia articles rely on scholarly material. When available, academic and peer-reviewed publications, scholarly monographs, and textbooks are usually the most reliable sources." Not only this, none of the academic journals most cited on the English Wikipedia are open access (PLOS ONE breaks the drought at No. 22 on that list).

WP:PAYWALL advises: "Do not reject sources just because they are hard or costly to access". Yet, commenting on a draft proposal that Wikipedia articles should preferentially cite open-access literature, one editor wrote that "verifiability isn't an option if people are expected to pay in excess of $20 to view a single article ... over closed- or toll-access resources of equivalent scholarly quality". That draft proposal, started in 2007 when the English Wikipedia was half its current age, died quietly like so many others.

But what if we could just mark references as being open, rather than preferentially citing them over closed ones? WikiProject Open Access is currently exploring the options, and the Workgroup on Open Access Metadata and Indicators (OAMI) at the National Information Standards Organization has been working on a set of recommendations for how to provide information about the use and re-use rights of scholarly articles. A draft version was released last week, and public comments are invited until 4 February.

These recommendations boil down to two metadata tags:

The recommendations don't include:

Similar recommendations have been put forward in a more broadly scoped draft report from Jisc, the UK body that supports senior-high-school and higher education. The draft was released for public comment in September, and its final version is still being worked on. A related report from the Confederation of Open Access Repositories looked at components of license clauses in use by scholarly publishers.

One of the organisations involved in the NISO Workgroup is CrossRef, which is working on including the proposed tags in its metadata and making that information available through its API, in collaboration with the Directory of Open Access Journals. The Open Article Gauge, developed by Cottage Labs with support from the Public Library of Science (PLOS), already provides article-level information about licensing terms for a subset of the scholarly literature; PLOS has signalled an interest in implementing a system that would provide licensing information for references cited in articles published in its journals, which are among the best-known open-access journals.

The NISO document contains a scenario quite similar to searching for illustrations for use in Wikipedia articles:

Reference 1 (broken in the NISO document) refers to the November 2012 open-access report (part of the Wikimedia GLAM newsletter), which lists examples of such conflicting licensing statements and served as the basis for a more detailed analysis published and presented last October.

The icon used to signal the Attribution module in Creative Commons licenses.

It is the potential for these kinds of incongruities that motivated the NISO group to opt for signalling only the stable home (the URI) of the licensing terms and not individual use and re-use rights. Many publishers use licensing terms incompatible with Creative Commons licenses, and to understand their implications, Wikipedia users might need legal assistance; this makes it difficult to see how signalling those terms (other than perhaps by way of {{closed access}} or {{subscription required}}) would bring any benefit to those users.

The case is different for Creative Commons licenses: their URI (e.g. http://creativecommons.org/licenses/by/4.0/) already signals re-use rights, making it easy to implement the <license_ref>, while their corresponding <free_to_read> tag can always be set to "yes", and compatibility with the NISO recommendations would be ensured.
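As a rough illustration of the point above, the following Python sketch derives the two indicators from a license URI. It is only a sketch under stated assumptions: the function names and the dictionary layout are hypothetical and do not reproduce the serialization used in the NISO draft, and the host and path check is a simple heuristic for recognising Creative Commons URIs.

```python
from urllib.parse import urlparse

# Assumption: Creative Commons license URIs live on this host under
# /licenses/ or /publicdomain/ (as in the CC BY 4.0 example in the text).
CC_HOST = "creativecommons.org"
CC_PATH_PREFIXES = ("/licenses/", "/publicdomain/")


def is_cc_license(uri):
    """Return True if the URI looks like a Creative Commons license URI."""
    parsed = urlparse(uri)
    host = parsed.netloc.lower()
    return host in (CC_HOST, "www." + CC_HOST) and parsed.path.startswith(CC_PATH_PREFIXES)


def access_indicators(license_uri):
    """Build a hypothetical record holding the two indicators.

    For a CC license the URI itself signals re-use rights, so
    free_to_read is set to "yes". For any other license the URI is
    recorded, but free_to_read is left unknown (None).
    """
    if is_cc_license(license_uri):
        return {"license_ref": license_uri, "free_to_read": "yes"}
    return {"license_ref": license_uri, "free_to_read": None}
```

For a non-CC license the sketch deliberately records only the URI, mirroring the NISO group's choice to signal where the terms live rather than to interpret them.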

On Wikimedia sites, a number of external link icons are already in use that act on certain elements of a URI—for example, a lock icon for HTTPS, as in https://www.eff.org/copyrightweek (which is this week, a period of action around copyright, organised by the Electronic Frontier Foundation). So having the CC BY icon displayed right next to external links that contain the string "http://creativecommons.org/licenses/by/" would be straightforward. Once the licensing information is available via the CrossRef API, a link to the appropriate CC URI could be added automatically to template-based references (e.g. by way of Citation bot, which was migrated to Wikimedia Labs last weekend).
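The string matching described above could look roughly like the following Python sketch. It is not how MediaWiki implements external link icons (those are normally handled in site styling); the icon names and the function are hypothetical and serve only to show the matching logic.

```python
from typing import List

# The string named in the text; the trailing slash means that variants
# such as by-sa or by-nc do not match.
CC_BY_MARKER = "http://creativecommons.org/licenses/by/"


def icons_for_link(url: str) -> List[str]:
    """Return hypothetical icon names to show next to an external link."""
    icons = []
    if url.startswith("https://"):
        icons.append("lock")   # lock icon for HTTPS links
    if CC_BY_MARKER in url:
        icons.append("cc-by")  # CC BY (Attribution) icon
    return icons
```

Called on https://www.eff.org/copyrightweek this yields only the lock icon, while a link to http://creativecommons.org/licenses/by/4.0/ yields only the CC BY icon.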

Since Wikidata enabled phase I support for Wikisource on Tuesday, it would even be possible to link to the full text available from Wikisource (see also the Wikisource vision) and to the corresponding Wikidata entry, as demonstrated in the reference. Of course, there is room to economise on space, such as by linking the icons directly rather than adjacent text bits, and if the article is covered on other Wikimedia platforms (e.g. Wikiquote, Wikinews, Wikispecies), the corresponding links could be included as well.

Currently, Wikidata items can be created for sources supporting statements on Wikidata, but the details of whether and how other sources (e.g. those supporting statements in a Wikipedia or Wikibooks page) are to be handled, and whether Citation bot should be ported to Wikidata, remain to be worked out. Two taskforces have been created to work on this: one for books and one for periodicals.

Irrespective of the details, I think that if Wikipedia articles were to signal the openness of scholarly references they cite, this would go a long way towards raising awareness of open licensing among users of Wikimedia content, amplifying similar efforts by open-access publishers and even Google, whose image search by re-use rights (available since 2009) was simplified this week.


Another image that anyone is allowed to freely reuse, revise, remix, and redistribute for any purpose: Prognathodes aculeatus, out of a total of 202 files on Wikimedia Commons from the same source.[1]

References

  1. ^ a b Williams, J. T.; Carpenter, K. E.; Van Tassell, J. L.; Hoetjes, P.; Toller, W.; Etnoyer, P.; Smith, M. (2010). Gratwicke, Brian (ed.). "Biodiversity Assessment of the Fishes of Saba Bank Atoll, Netherlands Antilles". PLOS ONE. 5 (5): e10676. Bibcode:2010PLoSO...510676W. doi:10.1371/journal.pone.0010676. PMC 2873961. PMID 20505760. CC0 full text media metadata

Discuss this story

  • I agree that Wikipedia articles should signal the openness of references they cite. It doesn't really matter, though, because I still think books are better sources, even though they are practically never open access. The need for the source to convey the desired information should always control, with ease of access a secondary concern. (With most information, an open access source would convey just as much needed information as any other.) And also please be mindful of the UNIX wars when it comes to these so-called "APIs", and the potential downsides of well designed, well implemented, sorely needed interfaces that are nonetheless inferior to other interfaces (possibly due to issues completely unrelated to the use case). I see this as a particular problem with organizations based in the United States, where incompatible, proprietary APIs (usually carried over HTTP and JSON) are commonplace and not seen as a problem, essentially like what happened during the UNIX wars (where APIs were carried over libc and CPU software interrupts but were nonetheless largely incompatible). Int21h (talk) 00:29, 20 January 2014 (UTC)
  • The problem is that very few review articles are published in open access journals. Doc James (talk · contribs · email) (if I write on your page reply on mine) 01:22, 20 January 2014 (UTC)
  • Actually, there is a common convention (esp. medical articles) where the article title is hyperlinked to the online paper if and only if it is freely available. If the article is behind a paywall, then we still have DOI and PMID and the full citation for those with access to locate the paper in a couple of clicks. The difference here is that our readers only care that the paper is freely available, not whether the journal is Open Access. This also helps support those journals that make papers free after a period of time, even if the content is not released under CC, say. This doesn't require any special tags or templates. -- Colin°Talk 08:27, 20 January 2014 (UTC)
  • While I agree that it is frustrating (as a lay editor without good journal access) to click on a source only to be asked for $30 for 6 sheets of A4, we need to remember that nearly all good quality sources cost money. Professional quality books are not cheap to publish nor to buy. They suffer from the same issue that Doc James mentions with review articles (which we want to use rather than the primary research papers, which we usually don't want to use). What is the incentive for an author of a review or textbook to not only give their work away but actually to be charged a large amount of money to do so? Can someone explain to me how Epilepsy: A Comprehensive Textbook can be published Open Access? I would be interested to know if Open Access is generally only for primary research papers. Because if that is the case, then it has very little to offer Wikipedia as a provider of good source material. -- Colin°Talk 08:27, 20 January 2014 (UTC)
Colin, open access primarily refers to academic papers, but there is also talk about applying the concept to other things like textbooks. Other terms, like open educational resources, are also applied to textbooks, but in any case, the proposal is to remove cost as a barrier to accessing the publication. This conversation started with academic papers because they have always been given away by the authors not receiving pay from the publishers to whom they freely gifted their writing to be sold commercially. Textbooks, in contrast, have not traditionally been written for free.
Open access is not just for primary research papers, and since I know you are interested in medicine, perhaps you should be aware of the NIH Public Access Policy which says that persons who receive US Federal Government Funding to do research must apply open access licensing to their research papers. This policy is making a range of papers open.
You asked about the incentive to write a textbook if it is to be given away. Open access advocates want to encourage a marketplace which produces and develops only textbooks better than those that exist now, and of course people must be paid to do this. The issue to be discussed is whether commercial marketing of textbooks is the best way to fund their production, or whether there might be an alternative funding model to grant access to the books without cost and still get the authors paid somehow. There are lots of proposals. In the case of the NIH policy, the argument is that if taxpayers in America fund research then commercial entities ought not receive this for free and then sell it to taxpayer-funded libraries and especially not if they have a high profit margin and there are other ways to manage this. Blue Rasberry (talk) 23:12, 20 January 2014 (UTC)
I can see the government funded research -> publish open model can work. But the book I linked isn't research and although you say people are looking at alternative models, it doesn't sound like anyone has found one yet. In these days of "austerity measures" I'm not optimistic government will step in to pay for the publication of dense academic material a small number of people might want to read, as opposed to publishers getting paid when people actually buy a book. It would probably be more cost-effective to pay for a few productive Wikipedians to read that material and make it accessible to a wider audience. I'm all for imaginative new ideas for publishing material and for lowering costs esp once we don't need physical printing. But we also need to ensure the traditional "editorial standards" are kept and that it doesn't become the academic equivalent of vanity publishing or blogs where anyone can publish any nonsense. Btw, although the researcher wasn't traditionally paid for their paper, publication of the research is the lifeblood of any researcher, so their career depends on it. One could say they are "paid" every time someone cites their paper. -- Colin°Talk 08:51, 21 January 2014 (UTC)
Thanks Colin. I agree that maintaining or raising editorial standards is a prime concern. Perhaps we can talk more sometime. Blue Rasberry (talk) 16:34, 21 January 2014 (UTC)
  • The issues raised so far are focusing largely on text-based information, while I tried to highlight the case for reusability more generally, including of non-text materials. Once you look at the latter, there is no difference in principle between primary and secondary sources any more, nor between books or journal articles: all of them may contain images or multimedia files of use for Wikipedia articles. I think that if we are citing a source and know that it has given rise to a number of files on Commons, it would make sense to add that information to the citation. From there, it is only a small step to displaying license information, which would be helpful, for instance, to those who are looking for illustrations (e.g. for courseware, a conference talk, an open textbook or another Wikipedia article). It could also help raise awareness of the importance of licensing amongst readers of those articles, which for scholarly topics likely includes a good number of scholars (often from other fields) and students. For OA business models, see here for journals and here for books. -- Daniel Mietchen (talk) 23:56, 20 January 2014 (UTC)
  • As an open-access project, Wikipedia should encourage the use of other open-access sources in Wikipedia articles, such as through compiling lists of links to these sources to bring them to the attention of editors. But we should not have a policy where we encourage the preference of open-access sources (or any type of sources) for reasons other than the quality and reliability of those sources. Too often, many Wikipedia editors are reluctant to dig any deeper for sources than what they can Google up, and a small subset of them even react violently to the use of a source they can't instantly access. An openly stated preference for open-access sources would only encourage this disturbing trend. Instead of lamenting the quality sources we can't access, we should encourage editors to pursue the means by which they can access those sources: their local public or campus library, Wikipedia:WikiProject Resource Exchange/Resource Request, and donated database accounts (see Wikipedia:The_Wikipedia_Library). Gamaliel (talk) 18:23, 21 January 2014 (UTC)
On that I disagree. Kinda. I think. It depends on what you mean by policy. Open-access sources should be encouraged for reasons of verifiability--which is a reason "other than the quality and reliability". Sources that one "can't instantly access" negatively affect verifiability, and as such, negatively affect the quality of Wikipedia. Otherwise, I think we're going to have to come to some common understanding on how many thousands of dollars readers/editors should have to spend, and/or how many thousands of miles they should have to travel, to give effect to WP:VERIFY. Keeping in mind readers/editors could be anywhere on Earth--and beyond. Int21h (talk) 03:32, 22 January 2014 (UTC)
On sources that are completely equal wrt reliability and bias then freely-accessible is a bonus point that should be encouraged. But typically they are not. Encouraging freely accessible sources can actually introduce bias -- for example, when some newspapers are freely accessible but others aren't. Many of our high-quality articles on "serious" topics are sourced to books, and I wouldn't want people complaining they fail WP:V because they should use BBC Online instead. I should note that some readers have better libraries than others (my local library is useless) and not everyone is a student or academic with access to university libraries. So sourcing to pay-for media is a significant barrier for many editors. -- Colin°Talk 09:42, 22 January 2014 (UTC)
Verification should never require travel or the expenditure of significant sums of money. In the vast majority of cases, all that is required is getting a library card, or filling out an interlibrary loan request, or making a post on Wikipedia:WikiProject Resource Exchange/Resource Request. We should encourage editors to take those steps instead of worrying about hypotheticals. Any verification that requires significant travel or spending is likely a matter for professional scientists and historians and not amateur encyclopedia authors. Gamaliel (talk) 18:48, 22 January 2014 (UTC)
Let me guess... you're American. In the UK, I can pay anywhere from £4.50 to £15 to borrow a book through interlibrary loan and wait 6-8 weeks for it to arrive and then be asked to return it soon after. And if the item isn't available (perhaps a reference work not for loan) then I can still be charged for the search. Nobody is going to go through such a process unless they are serious about editing the article, not just verifying one fact. I'm afraid your views on verification not requiring significant travel, time or expenditure are not held by any policy and neither should they. -- Colin°Talk 20:21, 22 January 2014 (UTC)
Okay, I deserved that. I really should know better, being a librarian and having recently read a novel in which the main character is a frequent user of interlibrary loan in the UK. But there are other avenues to pursue for verification. And not every editor is going to be able to verify every citation, and there's nothing wrong with that. The alternative is much more unpleasant, that, as you said, we substitute BBC Online for books as sources. No one would take Wikipedia seriously at that point. Gamaliel (talk) 22:19, 22 January 2014 (UTC)
I'm a bit late to the party, but the claim "Not only this, none of the academic journals most cited on the English Wikipedia are open access (PLOS ONE breaks the drought at No. 22 on that list)." is misleading at best. Going back to the 15 January 2014 version of the compilation [1], we see that the Journal of Biological Chemistry is at the top of the list, and is a delayed open access journal (12 months embargo). Likewise for #3 PNAS (6 months embargo), #4 Genome Research (6 months embargo), #6 Cell (12 months embargo). Headbomb {talk / contribs / physics / books} 19:09, 11 February 2016 (UTC)

Wikipedia:Wikipedia Signpost/2014-01-15/Op-ed