The Signpost

News and notes

US National Archives enshrines Wikipedia in Open Government Plan, plans to upload all holdings to Commons

NARA's logo, created in 2010
David Ferriero, Archivist of the United States

The US National Archives and Records Administration (NARA) has committed to engaging with Wikimedia projects in its newest Open Government Plan. The biennial effort is a roadmap for how the agency will accomplish its goals in the digital age. In the first plan, issued in 2010, Archivist of the United States David Ferriero wrote "the cornerstone of the work that we do every day is the belief that citizens have the right to see, examine, and learn from the records that document the actions of their Government. But in this digital age, we have the opportunity to work and communicate more efficiently, effectively, and in completely new ways."

These "new ways" included reaching out to Wikipedia, starting in 2011 with the hiring of Dominic McDevitt-Parks as a Wikipedian in residence. The position began as a student internship, but McDevitt-Parks has since moved to being a digital content specialist with a specialty in the Wikimedia sites. Ferriero has spoken at multiple Wikimedia events, including the Wikipedia in Higher Education summit in 2011 (see Signpost coverage) and Wikimania 2012 (video; transcript; Signpost coverage). He has been frequently quoted saying varying forms of "if Wikipedia is good enough for the Archivist of the United States, maybe it should be good enough for you."

How has the Wikimedia movement benefited from NARA and McDevitt-Parks' placement? There are three organized projects dedicated to NARA. On Wikisource, NARA has an ongoing initiative that is transcribing US government documents. On Commons, NARA has uploaded over 100,000 images, the most recent of which came a month ago. The English Wikipedia has produced several articles related to images from NARA, such as Desegregation in the United States Marine Corps. The site has also benefited from images uploaded at the request of specific users, such as those of living Medal of Honor recipients like Charles H. Coolidge, and the lead images for three US battleship articles: Pennsylvania-class battleship, USS Arizona (BB-39), and South Carolina-class battleship (Editor's note: the author of this article has made significant contributions to the last three pages).

All of that is in the past, though. The Open Government Plan lays out what NARA wants to accomplish in the next two years; but as a general plan it suffers from a lack of specifics. The Signpost contacted McDevitt-Parks to learn what the inclusion of Wikipedia in this plan will mean for the site.

He told us that there is no quantitative target for a total number of image uploads, because NARA plans to upload all of its holdings to Commons. "The records we have uploaded so far contain some of the most high-value holdings (e.g. Ansel Adams, Mathew Brady, war posters)", he said. "However, we are not limiting ourselves to particular collections. Our approach has always been simply to upload as much as possible ... to make them as widely accessible to the public as possible."

To accomplish this, volunteers are working with NARA on a new upload script to port images to Commons; the work in progress is posted on GitHub. At NARA itself, an API is in development that will make it easier to extract the metadata of the images. McDevitt-Parks says that these efforts will "allow us to more easily upload all of our existing digitized holdings to Wikimedia Commons and similar third-party platforms, and also that in the future upload to platforms like Commons will be the end of all digitization. Looking at it this way, I would say that in a way all of our digitization efforts are also for upload to Wikimedia Commons."

In the meantime, the special requests process—the first pilot launched by NARA when McDevitt-Parks began his tenure—is still available for Wikipedia editors. In the future, they hope that this ad hoc arrangement can be supplemented with a volunteer citizen scanning program that will be able to "generate greater Wikipedian-initiated digitization."

What do the Vietnamese, Waray-Waray, and Swedish Wikipedias all have in common?

The Vietnamese and Philippines-based Waray-Waray Wikipedias have crossed the one million article rubicon—the tenth and eleventh to do so. Just like the Swedish Wikipedia, the sites have attained this symbolic milestone with the help of bots, a process that has divided opinions among Wikimedians from several languages. For example, for a previous Signpost article on the topic, German Wikipedian Achim Raschka pointed us to an entry Denis Diderot wrote for the Encyclopédie, titled "Aguaxima". Diderot lamented that all they knew about the Aguaxima was that it was a plant in Brazil, yet he still had to describe it: "If all the same I mention this plant here, along with several others that are described just as poorly, then it is out of consideration for certain readers who prefer to find nothing in a dictionary article or even to find something stupid than to find no article at all."

In an email to the Wikimedia-l mailing list, Vietnamese Wikipedian Minh Nguyen wrote that some editors on the site shared similar concerns and were "alarmed" at the sharp uptick in bot-created articles. Yet at the same time, crossing the one million article mark with a high proportion of auto-articles led the community to look at its small size (its roughly 1,250 active editors are fewer than those of the Catalan Wikipedia, whose language has almost 60 million fewer speakers), and they are taking steps to ease the learning curve for new editors.

The question of active users is even more pertinent for fellow millionaire Waray-Waray, which has just 71 active users. The related Cebuano Wikipedia, which has also embraced bot-created articles and will soon join the million article club, has even fewer.

Meanwhile, the Swedish Wikipedia's article-creation bot has started editing again. The bot's operator, Lsj, told the Signpost that the source code has been rewritten to use the most recent references, though it is currently operating mostly on the Waray-Waray and Cebuano Wikipedias. Other Wikipedias, such as Farsi (mostly spoken in Iran), have also expressed an interest in the bot's operation. Why have other Wikipedias not adopted similar processes, aside from those (like the English and German) that have philosophical objections? Lsj believes "it is mostly a matter of whether there is somebody who knows both bots and the target language well enough, and is prepared to devote the time required. Small language versions likely do not have such a person."

This article was updated after publication with information and comments from Minh Nguyen.

In brief

Argentina (flag pictured) has very liberal copyright laws (photographs enter the public domain just 25 years after creation and 20 years after first publication) and has therefore been hit harder than other countries by the URAA deletions.

Wikipedia:Wikipedia Signpost/2014-06-25/News_and_notes