The Signpost

News and notes

Politician defends editing own article, Google translation, Row about a small Wikipedia

British MP defends content removal from his own Wikipedia biography

Following an article in the Sunday Telegraph (see last week's Signpost), British politician Tony Baldry has "strongly defended his decision to make changes to his Wikipedia biography, saying information posted on the web-based encyclopedia was inaccurate and libellous", according to the Banbury Guardian.

Tony Baldry told the tabloid newspaper that "in the run up to the General Election, I was made aware that an anonymous blogger had gone on to Wikipedia and made a number of entries relating to me which were inaccurate, false and defamatory. I asked one of my team to go on to Wikipedia and do no more than make sure the entries were factually correct. My researcher went on under the name of Tony Baldry, you can't get more transparent than that. It was completely clear that the relevant amendments were being requested by or on my behalf."

The contributions in question are listed under Special:Contributions/Tonybaldry. However, a sockpuppet investigation appears to have found that he was simultaneously using another account, User:Panther219, to edit his page.

Google uses machine translation to increase content on smaller Wikipedias

In a presentation at the recent Wikimania conference, a representative of Google described how, for the past 16 months, "Google has been working with the Wikimedia Foundation, students, professors, Google volunteers, paid translators, and members of the Wikipedia community to increase Wikipedia content in Arabic, Indic languages, and Swahili". (See also earlier Signpost coverage of Google's "Kiswahili Wikipedia Challenge".)

Stephen Shankland of CNET wrote on his blog that "Google's mission is to organize the world's information and make it universally accessible, but not necessarily to create it outright. This makes Wikipedia a natural partner." Google plans to expand its services in Arabic, Indic languages, and Swahili, languages for which there is no large corpus of material on the web, particularly in Unicode. More such material would help in training the translator. Google's "Translation Toolkit" has aided the translation project and has received accuracy improvements as a result.

In a statement posted on Wednesday, Google said:

To help Wikipedia become more helpful to speakers of smaller languages, we’re working with volunteers, translators and Wikipedians across India, the Middle East and Africa to translate more than 16 million words for Wikipedia into Arabic, Gujarati, Hindi, Kannada, Swahili, Tamil and Telugu. We began these efforts in 2008, starting with translating Wikipedia articles into Hindi, a language spoken by tens of millions of Internet users. At that time the Hindi Wikipedia had only 3.4 million words across 21,000 articles––while in contrast, the English Wikipedia had 1.3 billion words across 2.5 million articles. We selected the Wikipedia articles using a couple of different sets of criteria. First, we used Google search data to determine the most popular English Wikipedia articles read in India. Using Google Trends, we found the articles that were consistently read over time––and not just temporarily popular. Finally we used Translator Toolkit to translate articles that either did not exist or were placeholder articles or “stubs” in Hindi Wikipedia. In three months, we used a combination of human and machine translation tools to translate 600,000 words from more than 100 articles in English Wikipedia, growing Hindi Wikipedia by almost 20 percent. We’ve since repeated this process for other languages, to bring our total number of words translated to 16 million.

In another Wikimania presentation, immediately following Google's, A. Ravishankar from the Tamil Wikipedia presented a critical view of Google's activities on that project. The concerns described by Ravishankar (and noted in the New York Times) included Google's failure to announce its activities beforehand ("the site’s administrators suddenly noticed articles appearing out of nowhere"), its selection of coverage, and "sloppiness in language and coding". A page on the Tamil Wikipedia describes further "Issues with google translation in Tamil Wikipedia" (in English). On the Bengali Wikipedia, content provided by Google was deleted outright because it did not meet the community's standards.

Acehnese Wikipedians threaten boycott over Muhammad images

Recently, several top contributors on the Acehnese Wikipedia added a protest template to their userpages and, for a time, to the main page of their wiki (see detailed timeline). It called (in English) for the immediate deletion of "images insulting the Prophet Muhammad PBUH" from four pages on the English Wikipedia (for context, see depictions of Muhammad and Signpost coverage: 2006, 2008), and added that the wiki would be prepared for a "boycott" of Wikipedia if a fatwa were issued on the topic. A few days later, the protest was noted on the Foundation's mailing list, where it prompted extensive debate (about 120 messages to date). While there was some sympathy for the contributors – the Arabic Wikipedia already restricts the display of images of Muhammad by local consensus – there was general disapproval of the militancy with which the message was spread. It was noted that while the contributors had the right to fork the wiki, they could not unilaterally shut it down. After two stewards and a global sysop intervened to remove the template from the main page, local admins blocked them and reinserted the message. Eventually all local admins were de-sysopped. Discussion is continuing at meta.

The Acehnese language is primarily spoken in the Indonesian province of Aceh. The Acehnese Wikipedia was started in August 2009 (after a period of incubation beginning in 2008), and a presentation at the recent Wikimania conference described its community as "very small and limited. Less than ten contributors are really active ... Almost all active contributors don’t have internet access at their home."

Briefly

Wikipedia:Wikipedia Signpost/2010-07-19/News_and_notes