The Signpost

News and notes

Bot-created Wikipedia articles covered in the Wall Street Journal, push Cebuano over one million articles

The Philippines has over 120 languages

The Swedish Wikipedia's prolific Lsjbot, which has created a significant proportion of the site's 1.7 million articles and has nearly single-handedly pushed it to being the fourth-largest Wikipedia, was covered in the Wall Street Journal this week.

In its front-page article, the US newspaper reported that the bot has created 2.7 million articles, a figure that apparently includes its work on the Waray-Waray and Cebuano Wikipedias (where Lsjbot is also active), and that "on a good day" it creates 10,000 articles.

The Wall Street Journal's article comes as the Cebuano Wikipedia becomes the twelfth Wikipedia to cross the million-article mark, almost entirely thanks to these formulaic articles. Of the twelve, five (Swedish, Waray-Waray, Cebuano, Vietnamese, and Dutch), or over 40%, have received significant help from automated article creation scripts. Vietnamese has the highest depth of these five, at 18; Swedish follows with 11, and the others are all under ten. By comparison, the German Wikipedia has a depth of 90.

The practice of creating articles by bot has proved controversial among Wikimedians. By way of comment, German Wikipedian Achim Raschka pointed the Signpost to an entry Denis Diderot wrote for the Encyclopédie, titled "Aguaxima". Diderot lamented that all they knew about the Aguaxima was that it was a plant in Brazil, yet he still had to describe it: "If all the same I mention this plant here, along with several others that are described just as poorly, then it is out of consideration for certain readers who prefer to find nothing in a dictionary article or even to find something stupid than to find no article at all."

Disagreement with these edits even led to a proposal last year that would have banned the overuse of bot-created articles on Wikimedia projects.

Still, these are not the first Wikipedias to use bots to augment human article creators: in 2007, Volapük and Lombard were expanded by over 100,000 bot articles each; Tagalog saw a similar rise. Lombard editors later placed a moratorium on new automated articles and deleted most of them; the Lombard Wikipedia currently has around 31,000 articles. Volapük is hovering around 120,000, and the Tagalog Wikipedia has close to 63,000.

Waray-Waray, Cebuano, and Tagalog are three of the largest languages of the Philippines. Volapük is a 19th-century constructed language from Germany, and Lombard is a Romance language from northern Italy. Vietnamese is spoken primarily in Vietnam, while Dutch is spoken in the Netherlands, Belgium, and Suriname.

Wikipedia:Wikipedia Signpost/2014-07-16/News_and_notes