Wikipedia:Wikipedia Signpost/2010-07-19/From the editors Wikipedia:Wikipedia Signpost/2010-07-19/Traffic report Wikipedia:Wikipedia Signpost/2010-07-19/In the media
The first technology presentations from last weekend's Wikimania conference in Gdansk have begun to be published. They include "Why your extension will not be enabled on Wikimedia wikis in its current state and what you can do about it", which gave advice to developers, particularly of extensions. The advice covered security and scalability: for example, all inputs should be escaped to prevent SQL injection. Also published were the slides from "Geodata in Wikipedia and Commons", which outlined how Wikimedia is going about geocoding its images and then utilising this data.
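The SQL injection advice can be illustrated with a minimal sketch. It is not taken from the presentation: it uses Python's standard `sqlite3` module rather than MediaWiki's own PHP database layer, it swaps manual escaping for parameter binding (a related technique that keeps input out of the SQL text altogether), and the `users` table and `find_user` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def find_user(conn, name):
    """Look up rows by name using a bound parameter (hypothetical helper)."""
    # The "?" placeholder makes the driver pass `name` as data,
    # so quotes inside it are never interpreted as SQL syntax.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


# Illustrative in-memory database with a hypothetical `users` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

malicious = "alice' OR '1'='1"

# Unsafe: the input is pasted into the SQL text, so the injected
# OR clause becomes part of the query and matches every row.
unsafe_sql = "SELECT id, name FROM users WHERE name = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the same input is treated as a literal name and matches nothing.
safe_rows = find_user(conn, malicious)

print(len(unsafe_rows), len(safe_rows))
```

With the concatenated query the crafted input returns every row in the table; with the bound parameter it returns none, because no user has that literal name.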
Following on from last week's import of Bibliothèque nationale de France (BNF) images, this week saw the technical preparation for a mass upload of map images created by the Ordnance Survey (OS), the government body responsible for mapping in the United Kingdom. The files, provided by the OS as part of their OpenData initiative under their own free licence, will be uploaded to Wikimedia Commons by OrdnanceSurveyBot in both native TIFF form and JPG form for easier use and display across Wikimedia projects.
Unlike with the BNF, however, the release was not the result of a specific partnership, but of years of campaigning by the wider open data community in the UK. The Ordnance Survey website notes that the selection of maps it agreed to release to the public under a free licence on 1 April 2010 represents some "of the most detailed mapping datasets available for Great Britain". The free licence is compatible with the Creative Commons 3.0 licence, which means that all derivative works can be released under a Creative Commons licence.
Note: not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for many weeks.
Following an article in the Sunday Telegraph (see last week's Signpost), British politician Tony Baldry has "strongly defended his decision to make changes to his Wikipedia biography, saying information posted on the web-based encyclopedia was inaccurate and libellous", according to the Banbury Guardian.
Tony Baldry told the tabloid newspaper that "in the run up to the General Election, I was made aware that an anonymous blogger had gone on to Wikipedia and made a number of entries relating to me which were inaccurate, false and defamatory. I asked one of my team to go on to Wikipedia and do no more than make sure the entries were factually correct. My researcher went on under the name of Tony Baldry, you can't get more transparent than that. It was completely clear that the relevant amendments were being requested by or on my behalf."
The contributions in question are listed under Special:Contributions/Tonybaldry. However, it appears that a sockpuppet investigation found he was simultaneously using another account, User:Panther219, to make edits to his page.
In a presentation at the recent Wikimania conference, a representative of Google described how, for the past 16 months, "Google has been working with the Wikimedia Foundation, students, professors, Google volunteers, paid translators, and members of the Wikipedia community to increase Wikipedia content in Arabic, Indic languages, and Swahili". (See also earlier Signpost coverage of Google's "Kiswahili Wikipedia Challenge".)
Stephen Shankland of CNET wrote on his blog that "Google's mission is to organize the world's information and make it universally accessible, but not necessarily to create it outright. This makes Wikipedia a natural partner." Google plans to expand its services in Arabic, Indic languages, and Swahili. These are languages for which there is no large corpus of material on the web, particularly in Unicode, and the availability of such material would help in training Google's translation software. Google's "Translator Toolkit" has aided the translation project, and has received accuracy improvements as a result.
In a statement posted on Wednesday, Google said:
To help Wikipedia become more helpful to speakers of smaller languages, we’re working with volunteers, translators and Wikipedians across India, the Middle East and Africa to translate more than 16 million words for Wikipedia into Arabic, Gujarati, Hindi, Kannada, Swahili, Tamil and Telugu. We began these efforts in 2008, starting with translating Wikipedia articles into Hindi, a language spoken by tens of millions of Internet users. At that time the Hindi Wikipedia had only 3.4 million words across 21,000 articles––while in contrast, the English Wikipedia had 1.3 billion words across 2.5 million articles. We selected the Wikipedia articles using a couple of different sets of criteria. First, we used Google search data to determine the most popular English Wikipedia articles read in India. Using Google Trends, we found the articles that were consistently read over time––and not just temporarily popular. Finally we used Translator Toolkit to translate articles that either did not exist or were placeholder articles or “stubs” in Hindi Wikipedia. In three months, we used a combination of human and machine translation tools to translate 600,000 words from more than 100 articles in English Wikipedia, growing Hindi Wikipedia by almost 20 percent. We’ve since repeated this process for other languages, to bring our total number of words translated to 16 million.
— Google, "Translating Wikipedia"
In another Wikimania presentation, immediately following that of Google, A. Ravishankar from the Tamil Wikipedia presented a critical view of Google's activities on that project. The concerns described by Ravishankar (and noted in the New York Times) included the fact that Google did not announce its activities beforehand ("the site’s administrators suddenly noticed articles appearing out of nowhere"), the selection of coverage, and "sloppiness in language and coding". A page on the Tamil Wikipedia describes further "Issues with google translation in Tamil Wikipedia" (in English). On the Bengali Wikipedia, content provided by Google was even deleted outright because it did not meet the community's standards.
Recently, several top contributors on the Acehnese Wikipedia added a template of protest to their userpages, and for a time to the main page of their wiki (see detailed timeline). It called (in English) for the immediate deletion of "images insulting the Prophet Muhammad PBUH" (for context, see depictions of Muhammad and Signpost coverage: 2006, 2008) from four pages on the English Wikipedia and added that the wiki would be prepared for a "boycott" of Wikipedia if a fatwa were issued on the topic. A few days later, the protest was noted on the Foundation's mailing list, where it caused a great deal of debate (about 120 messages to date). While there was some sympathy for the contributors (the Arabic Wikipedia already restricts the display of images of Muhammad by local consensus), there was general disapproval of the militancy with which the message was spread. It was noted that while the contributors had the right to fork the wiki, they could not unilaterally shut it down. After two stewards and a global sysop intervened to remove the template from the main page, the three were blocked by local admins and the message was reinserted. Eventually all local admins were de-sysopped. Discussion is continuing at Meta.
The Acehnese language is primarily spoken in the Indonesian province of Aceh. The Acehnese Wikipedia was started in August 2009 (after a period of incubation beginning in 2008), and a presentation at the recent Wikimania conference described its community as "very small and limited. Less than ten contributors are really active ... Almost all active contributors don’t have internet access at their home."
The Arbitration Committee opened no cases this week, leaving two open.