The Signpost

[Image: Citation bot cleaning up (generated with Craiyon; public domain)]
Tips and tricks

Cleaning up awful citations with Citation bot

In this column, I will outline some of the possible ways to deal with awful citations on Wikipedia by making use of Citation bot. By 'awful' citations, I don't mean 'awful' in the sense that they are dealing with unreliable sources, but rather in the copy editor's sense, where the information is presented in a reader-unfriendly way. You may be surprised to hear just how little technical skill is required!

Using the bot

The citation expander gadget adds a button to trigger the bot from the edit window! (The exact appearance may differ.)

While the world of bots can be intimidating, making use of Citation bot is actually very simple even if you don't know the first thing about programming. A basic guide is available for readers who want more technical information and less guidance. Here I will focus on the easiest way of using the bot – using the citation expander gadget.

The gadget will add an "Expand citations" option to your sidebar, and a Citations button to your edit window. You can easily enable it in the gadgets tab of your preferences panel. Go to the "Editing" section, and check the box labelled Citation expander.

After that, all you have to do is find a page in need of some love and run the bot. It will automatically try to improve existing citations as best it can. Note that if this is your first time running the bot, you might be asked to grant it permission to make edits on your behalf. You will have to grant these permissions again if you log out or log in on a different machine.

While there are other ways of activating the bot, for the sake of this tutorial I will assume that you use the Citations button of the citation expander. This lets you review the changes made to the article and make sure everything is in order before saving. If you want to use the web interface or click on "Expand citations" in the sidebar, you will need to save the article after the "cleanup step" before running the bot. The bot will then make its changes automatically, and you will need to review them afterwards. While the Citations button should always work, the other methods only work when the bot is unblocked, and they are vulnerable to edit conflicts.

Case 1: Accurate citations with limited usefulness

Let's say you come across a citation that's generally accurate, but is missing some information. Something like

Before cleanup
wikitext {{cite journal |last1=West |first1=Jevin D. |last2=Jacquet |first2=Jennifer |last3=King |first3=Molly M. |last4=Correll |first4=Shelley J. |last5=Bergstrom |first5=Carl T. |year=2013 |title=The Role of Gender in Scholarly Authorship |url=https://doi.org/10.1371%2Fjournal.pone.0066212 |journal=PLOS ONE |volume=8 |issue=7 |pages=e66212}}
output West, Jevin D.; Jacquet, Jennifer; King, Molly M.; Correll, Shelley J.; Bergstrom, Carl T. (2013). "The Role of Gender in Scholarly Authorship". PLOS ONE. 8 (7): e66212.

Strictly speaking, nothing in this citation is wrong, but it could be made a lot more useful to readers if it contained standard identifiers. Since we already have a DOI URL in |url=, we only need to click Citations to run the bot and get

After the bot
wikitext {{cite journal |last1=West |first1=Jevin D. |last2=Jacquet |first2=Jennifer |last3=King |first3=Molly M. |last4=Correll |first4=Shelley J. |last5=Bergstrom |first5=Carl T. |year=2013 |title=The Role of Gender in Scholarly Authorship |url=https://doi.org/10.1371%2Fjournal.pone.0066212 |journal=PLOS ONE |volume=8 |issue=7 |pages=e66212 |arxiv=1211.1759 |bibcode=2013PLoSO...866212W |doi=10.1371/journal.pone.0066212 |doi-access=free |pmc=3718784 |pmid=23894278}}
output West, Jevin D.; Jacquet, Jennifer; King, Molly M.; Correll, Shelley J.; Bergstrom, Carl T. (2013). "The Role of Gender in Scholarly Authorship". PLOS ONE. 8 (7): e66212. arXiv:1211.1759. Bibcode:2013PLoSO...866212W. doi:10.1371/journal.pone.0066212. PMC 3718784. PMID 23894278.

And we have a nice, well-formatted citation, with many ways to find the article. Here the bot even went the extra mile and flagged the DOI with |doi-access=free to indicate the DOI will take you to a full free version of the article. This will not always be automatically detected, in which case you can add |doi-access=free yourself.

This method will work with most URLs that point to a standard identifier, or with citations already using standard identifiers: arXiv, Bibcode, DOI, ISBN, JSTOR, PMCID, and PMID. It will also work with URLs to major repositories, like ScienceDirect and Wiley Online Library, although it will not necessarily be as reliable as the DOI method.
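To illustrate why a URL pointing to a standard identifier is enough, here is a rough Python sketch of how an identifier can be read straight out of such a URL. It is a hypothetical helper written for this column, not Citation bot's actual code, and it only knows about doi.org and adsabs.harvard.edu links.

```python
from urllib.parse import unquote, urlparse


def identifier_from_url(url):
    """Return (kind, value) for a DOI or Bibcode URL, else None.

    Hypothetical illustration only; Citation bot handles many more sites.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host in ("doi.org", "dx.doi.org"):
        # "%2F" in the path is the percent-encoded "/" inside the DOI.
        return ("doi", unquote(parsed.path.lstrip("/")))
    if host == "adsabs.harvard.edu" and parsed.path.startswith("/abs/"):
        # The Bibcode is everything after "/abs/".
        return ("bibcode", unquote(parsed.path[len("/abs/"):]))
    return None


print(identifier_from_url("https://doi.org/10.1371%2Fjournal.pone.0066212"))
# ('doi', '10.1371/journal.pone.0066212')
```

Publisher URLs such as ScienceDirect pages do not carry the identifier in such a predictable place, which is one reason they are less reliable than the DOI route.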

While the bot is not guaranteed to find every DOI and PMID out there, running it will often save you a lot of work. So before doing it all yourself, it's a good idea to run the bot first. Not only will it add missing information, it will also clean up several common mistakes, like |last=Smith, → |last=Smith; |volume=12(3) → |volume=12 |issue=3; or |journal=PLOS GENETICS → |journal=PLOS Genetics. You can then focus on the things the bot was not able to figure out, like fixing poorly formatted or incomplete citations, or hunting down missing DOIs and JSTOR IDs.
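The three example fixes above are simple text transformations. The following Python sketch reproduces them on a dictionary of parameter names and values, purely to show what each fix does. It is not how Citation bot is implemented, and the small journal table is a hypothetical stand-in for a real list of canonical journal names.

```python
import re

# Hypothetical stand-in for a real list of canonical journal names.
CANONICAL_JOURNALS = {"plos genetics": "PLOS Genetics", "plos one": "PLOS ONE"}


def tidy_params(params):
    """Return a copy of citation parameters with three common mistakes fixed."""
    cleaned = {}
    for key, value in params.items():
        # |last=Smith,  ->  |last=Smith  (drop a stray trailing comma)
        if key.startswith("last"):
            value = value.rstrip().rstrip(",")
        cleaned[key] = value

    # |volume=12(3)  ->  |volume=12 |issue=3  (only when no issue is given)
    match = re.fullmatch(r"\s*(\d+)\s*\((\d+)\)\s*", cleaned.get("volume", ""))
    if match and not cleaned.get("issue"):
        cleaned["volume"], cleaned["issue"] = match.group(1), match.group(2)

    # |journal=PLOS GENETICS  ->  |journal=PLOS Genetics
    journal = cleaned.get("journal", "")
    if journal.casefold() in CANONICAL_JOURNALS:
        cleaned["journal"] = CANONICAL_JOURNALS[journal.casefold()]
    return cleaned


print(tidy_params({"last": "Smith,", "volume": "12(3)", "journal": "PLOS GENETICS"}))
# {'last': 'Smith', 'volume': '12', 'journal': 'PLOS Genetics', 'issue': '3'}
```

The volume split is deliberately skipped when |issue= already has a value, since a human may have entered that value on purpose.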

Case 2: Poor plain text citations

While WP:CITEVAR is worth keeping in mind, plain-text citations are often poorly presented, contain typos, and are of limited usefulness. In such cases, WP:CITEVAR does not apply, as no consistent style has been used. Consider, for example, the following

Before cleanup
wikitext G. Coppola + coauthors (2009). "Sérsic galaxy with Sérsic halo models of early-type galaxies: A TOOL FOR N-BODY SIMUILATION". Publications of the ASP. volume 121-879 pp. 437. {{doi|10.1086/599288}}{{bibcode|2009PASP..121..437C}}
output G. Coppola + coauthors (2009). "Sérsic galaxy with Sérsic halo models of early-type galaxies: A TOOL FOR N-BODY SIMUILATION". Publications of the ASP. volume 121-879 pp. 437. doi:10.1086/599288Bibcode:2009PASP..121..437C

There are several things wrong with this one. The coauthors are not listed, and the standard et al. is not used either. The title is inconsistently capitalized and contains a typo (SIMUILATION). The journal's name is abbreviated in a non-standard manner, and the volume, issue, and pages are poorly presented, as are the identifiers. While it is possible to clean this up by hand, doing so would take a lot of time.

A much more efficient way of doing the cleanup is to use the bot. We already have identifiers, so the hard part has been done for us; we simply need to feed them to the bot. We can do this in many ways.

Cleanup step
wikitext (method 1), any of
  • {{cite journal |bibcode=2009PASP..121..437C}}
  • {{cite journal |doi=10.1086/599288}}
  • {{cite journal |bibcode=2009PASP..121..437C |doi=10.1086/599288}}
  • {{cite journal |url=http://adsabs.harvard.edu/abs/2009PASP..121..437C}}
  • {{cite journal |url=https://doi.org/10.1086%2F599288}}
wikitext (method 2), any of
  • <ref>http://adsabs.harvard.edu/abs/2009PASP..121..437C</ref>
  • <ref>https://doi.org/10.1086%2F599288</ref>

The second method is particularly easy to use, since you can just right-click on the identifier, copy the URL, and paste it between the <ref></ref> tags. However, it only works within <ref></ref> tags. The first method is more useful for lists of works and bibliographies, but it also works in <ref></ref> tags, as long as you don't mind typing a few more characters and copy-pasting the identifiers (e.g. <ref>{{cite journal |bibcode=2009PASP..121..437C |doi=10.1086/599288}}</ref>).
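If you prepare many of these cleanup steps at once, for example for a whole bibliography, the wikitext is simple enough to generate. The short Python sketch below builds both forms. The function names are hypothetical, and all they do is string formatting, so the bot still has to be run on the result.

```python
def cleanup_template(template="cite journal", **identifiers):
    """Method 1: a bare citation template holding only identifiers."""
    if not identifiers:
        raise ValueError("give at least one identifier, e.g. doi=...")
    # Keyword arguments keep their order, so parameters appear as given.
    params = "".join(f" |{name}={value}" for name, value in identifiers.items())
    return "{{" + template + params + "}}"


def bare_url_ref(url):
    """Method 2: a plain identifier URL inside <ref></ref> tags."""
    return f"<ref>{url}</ref>"


print(cleanup_template(bibcode="2009PASP..121..437C", doi="10.1086/599288"))
# {{cite journal |bibcode=2009PASP..121..437C |doi=10.1086/599288}}
print(bare_url_ref("https://doi.org/10.1086%2F599288"))
# <ref>https://doi.org/10.1086%2F599288</ref>
```

Wrapping the first result in <ref></ref> gives the combined form used in references.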

Whichever method you prefer, you only need to click Citations to run the bot and get

After the bot
wikitext {{cite journal |last1=Coppola |first1=G. |last2=La Barbera |first2=F. |last3=Capaccioli |first3=M. |year=2009 |title=Sérsic Galaxy with Sérsic Halo Models of Early-type Galaxies: A Tool forN-body Simulations |journal=Publications of the Astronomical Society of the Pacific |volume=121 |issue=879 |pages=437–449 |arxiv=0903.4758 |bibcode=2009PASP..121..437C |doi=10.1086/599288 |s2cid=18540590}}
output Coppola, G.; La Barbera, F.; Capaccioli, M. (2009). "Sérsic Galaxy with Sérsic Halo Models of Early-type Galaxies: A Tool forN-body Simulations". Publications of the Astronomical Society of the Pacific. 121 (879): 437–449. arXiv:0903.4758. Bibcode:2009PASP..121..437C. doi:10.1086/599288. S2CID 18540590.

This is not perfect, but it is very close to the final desired version. We only need to do a bit of retouching (A Tool forN-body Simulations → A Tool for ''N''-body Simulations), and optionally add |doi-access=free, since the DOI link takes us to a free full version of the article, to get

Final text
output Coppola, G.; La Barbera, F.; Capaccioli, M. (2009). "Sérsic Galaxy with Sérsic Halo Models of Early-type Galaxies: A Tool for N-body Simulations". Publications of the Astronomical Society of the Pacific. 121 (879): 437–449. arXiv:0903.4758. Bibcode:2009PASP..121..437C. doi:10.1086/599288. S2CID 18540590.

If a citation is as poorly formatted as the initial version, it is very likely that WP:CITEVAR does not apply. However, suppose this citation had been found in a featured article, as the only poorly formatted citation in an otherwise excellent plain-text reference section. Returning to a plain-text citation is then very straightforward: simply copy-paste the output you get when previewing, with minor modifications

Final wikitext, if WP:CITEVAR applies
wikitext Coppola, G.; La Barbera, F.; Capaccioli, M. (2009). "Sérsic Galaxy with Sérsic Halo Models of Early-type Galaxies: A Tool for ''N''-body Simulations". ''Publications of the Astronomical Society of the Pacific''. '''121''' (879): 437–449. {{arXiv|0903.4758}}. {{bibcode|2009PASP..121..437C}}. {{doi|10.1086/599288}}. {{s2cid|18540590}}.

Case 3: Inaccurate citations

What if you come across a citation that you know is wildly inaccurate, or so outrageously formatted that it barely makes any sense?

Before cleanup
wikitext {{cite journal |last1=Ar≥≥on;Jacobin |first1=New Scientist |year=2018 |title=Deflector Selector says nuke asteroids |journal=Elsevier ScienceDirect |volume=pages3165|issue=3165 |pages=6 |bibcode=2014PhT....67d..48W |doi=10.1016/S0262-4079(18)30281-1}}
output Ar≥≥on;Jacobin, New Scientist (2018). "Deflector Selector says nuke asteroids". Elsevier ScienceDirect. pages3165 (3165): 6. Bibcode:2014PhT....67d..48W. doi:10.1016/S0262-4079(18)30281-1. {{cite journal}}: CS1 maint: multiple names: authors list (link)

Fixing this manually is possible, but you would need to look up the original and reformat pretty much the whole thing. When something is this badly broken, it is good to make sure that whatever is intended to be cited is actually the thing being cited. This could easily be the result of vandalism that left two citations inadequately merged together. Here, Bibcode:2014PhT....67d..48W and doi:10.1016/S0262-4079(18)30281-1 don't even point to the same work! However, once we determine which is the correct citation, we can follow the TNT principle as applied to this particular citation: blow it up and start over!

Let's assume that doi:10.1016/S0262-4079(18)30281-1 is the correct citation. We can then TNT it back to a state that the bot can make sense of

Cleanup step
wikitext {{cite journal |doi=10.1016/S0262-4079(18)30281-1}}

and then we only need to click Citations to run the bot and get

After the bot
wikitext {{cite journal |last1=Aron |first1=Jacob |year=2018 |title=Deflector Selector says nuke asteroids |journal=New Scientist |volume=237 |issue=3165 |pages=6 |bibcode=2018NewSc.237....6A |doi=10.1016/S0262-4079(18)30281-1}}
output Aron, Jacob (2018). "Deflector Selector says nuke asteroids". New Scientist. 237 (3165): 6. Bibcode:2018NewSc.237....6A. doi:10.1016/S0262-4079(18)30281-1.
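The TNT step of Case 3 amounts to throwing away every parameter except the one identifier you have verified. A minimal Python sketch of that reduction follows. The function name is hypothetical, and it splits naively on "|", so it assumes the citation contains no nested templates or piped wikilinks.

```python
def tnt_citation(template, keep="doi"):
    """Reduce a citation template to its name plus one trusted identifier.

    Naive sketch: splitting on "|" breaks on nested templates or [[a|b]] links.
    """
    body = template.strip()
    if not (body.startswith("{{") and body.endswith("}}")):
        raise ValueError("expected a single {{...}} template")
    name, *params = body[2:-2].split("|")
    for param in params:
        key, sep, value = param.partition("=")
        if sep and key.strip() == keep and value.strip():
            return "{{" + name.strip() + " |" + keep + "=" + value.strip() + "}}"
    raise ValueError(f"no non-empty |{keep}= parameter found")


broken = ("{{cite journal |last1=Ar≥≥on;Jacobin |first1=New Scientist "
          "|volume=pages3165|issue=3165 |pages=6 "
          "|doi=10.1016/S0262-4079(18)30281-1}}")
print(tnt_citation(broken))
# {{cite journal |doi=10.1016/S0262-4079(18)30281-1}}
```

The keep argument is there because the decision about which identifier is correct stays with you; the code only does the blowing up.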

Dealing with the bot's imperfections

Sometimes the metadata available to the bot isn't perfect, and the bot will mess up something that it can't know is wrong. If the bot keeps messing up something after you've fixed it, for example if it keeps adding a |series= to a {{cite journal}} template, you can bypass the bot by putting a comment in the problematic parameter

  • {{cite journal ... |series=<!--Deny citation bot, Journal of Physics is not a book series!-->}}

this tells the bot not to touch that specific parameter. Likewise, if it incorrectly converts a {{cite journal}} to a {{cite book}}, you can put a comment in the template's name

  • {{cite journal <!--Deny citation bot, Journal of Physics is not a book--> |last=... }}

this will let the bot know it shouldn't try to touch that citation at all.

You can report bugs and issues at the bot's talk page. You can also suggest improvements to the bot if you have some ideas.

Dealing with timeouts

An alternative way of running the bot is to use the "Expand citations" link in the sidebar

The more citations an article has, the longer the bot will need to process it. If an article only has a few citations, the bot will usually deal with it within a minute. If an article has several hundred citations, the bot can take several minutes to process it, and can even time out. If that happens, you can edit the article section by section to give the bot fewer citations to process at once. Alternatively, you can click on "Expand citations" in the sidebar (see picture on the right), and the bot will eventually make an edit. You might be shown a timeout screen, but just come back to the article after an hour and you should see a successful edit. If an edit conflict occurred, just run the bot again.
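Before splitting the work up, it can help to know roughly where the citations are. The Python sketch below is a hypothetical helper, not part of Citation bot or its gadget: it splits an article's wikitext at level-2 headings and counts the {{cite ...}} and {{citation}} templates in each section. The count is only approximate; for instance, it ignores plain-text citations and bare URLs in <ref></ref> tags.

```python
import re

# A level-2 heading such as "== History ==" on its own line (not "=== ... ===").
LEVEL2_HEADING = re.compile(r"^==(?!=)\s*(.+?)\s*==\s*$", re.MULTILINE)
# The opening of a {{cite ...}} or {{citation |...}} template, in any case.
CITATION_OPEN = re.compile(r"\{\{\s*(?:cite\s|citation\s*\|)", re.IGNORECASE)


def citations_per_section(wikitext):
    """Return (section title, citation template count) pairs, lead first."""
    headings = list(LEVEL2_HEADING.finditer(wikitext))
    # Text before the first heading is the lead; subsections stay in their section.
    lead_end = headings[0].start() if headings else len(wikitext)
    counts = [("(lead)", len(CITATION_OPEN.findall(wikitext[:lead_end])))]
    for i, heading in enumerate(headings):
        end = headings[i + 1].start() if i + 1 < len(headings) else len(wikitext)
        section = wikitext[heading.end():end]
        counts.append((heading.group(1), len(CITATION_OPEN.findall(section))))
    return counts


sample = (
    "Intro.<ref>{{cite journal |doi=10.1086/599288}}</ref>\n"
    "== History ==\n"
    "Text.<ref>{{Cite web |url=https://example.org}}</ref>{{citation needed}}\n"
    "=== Early years ===\n"
    "<ref>{{citation |title=A}}</ref>\n"
    "== See also ==\n"
)
print(citations_per_section(sample))
# [('(lead)', 1), ('History', 2), ('See also', 0)]
```

Note that {{citation needed}} is deliberately not counted, since it is a maintenance tag rather than a citation.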

The "Expand citations" link is also a good way to finish your editing session. Just let the bot run, and go do something else. When you return, the bot will have made an edit if it found something to clean up, and you can continue from there.

Final remarks

While Citation bot is not perfect and doesn't fix everything, used correctly it is a very powerful tool that can save you a ton of headaches and make your editing experience that much easier. I gave examples above using {{cite journal}}, but the bot will also work with {{cite book}}, {{cite web}} and many others (including {{citation}}). I focused on cleanup in this Tips and Tricks column, but you can easily use these methods to add citations to articles. Simply find a good identifier, put it in a citation template (or a plain URL in <ref></ref> tags), and unleash the bot!

I'll note as a disclaimer that I've re-ordered certain parameters to make my examples more legible and more understandable to the reader. In the wild, citation parameter order will depend a lot on what input is given to the bot and which parameters are already present in a citation.

Happy editing!


Tips and Tricks is a general editing advice column written by experienced editors. If you have suggestions for a topic, or want to submit your own advice, follow these links and let us know (or comment below)!


Discuss this story

  • @Le Marteau: Glad you've found it useful! It truly is a time saver and a wonderful tool. It's not perfect, but it gets you 95-98% of the way, and saves you so many headaches. You can focus on content/accuracy instead of manually entering citations and making silly little mistakes that you won't catch because your mind is tired of looking at half a zillion citations. Does J. Phys. Chem. refer to the Journal of Physics and Chemistry or the Journal of Physical Chemistry? Let the bot figure it out! Headbomb {t · c · p · b} 08:27, 7 August 2022 (UTC)
  • Thanks very much for this, Headbomb! Graham (talk) 07:18, 1 September 2022 (UTC)
  • The “case 1” example is better before the “improvements” which only add illegible strings of numbers and letters that anyone who cares could find with a few seconds of effort, but fill the article bibliography with a massive amount of distracting visual clutter. Anyone who cares about "bibcode", "s2cid", "pmc", "pmid", "mr", "isbn", etc. etc. already knows how to look them up, and people who don’t care about them are poorly served by having to hunt past them looking for the actual content of the citation. For an open access paper like this, just one link is already entirely sufficient; for a non-open-access paper a single preprint link can be a big help. But adding every conceivable citation index identifier to every citation is ridiculous. –jacobolus (t) 14:27, 26 September 2022 (UTC)

Wikipedia:Wikipedia Signpost/2022-08-01/Tips_and_tricks