On the Wikimedia Techblog, contractor Chad Horohoe announced the first Wikimedia "hack-a-ton", an event at which developers, amateur and professional, get together with the explicit aim of fixing bugs and generally getting "down and dirty with the code". It is designed as a counterpoint to the "MediaWiki Developers' Meetup" in Berlin, which focuses on demonstrations, workshops and small-group discussions. The event is scheduled for October 22–24 in Washington, DC. Bugs for the weekend will be tracked using a new Bugzilla keyword, "bugsmash". MediaWiki has around 4900 open bugs and feature requests out of a total pool of around 25000, though not all of them relate to the core MediaWiki software.
We continue a series of articles about this year's Google Summer of Code (GSoC) with Samuel Lampa, a biotechnology student at Uppsala University. He describes his project to develop a system for the general import and export of RDF metadata to and from the Semantic MediaWiki software.
“Some of you might know Semantic MediaWiki, the MediaWiki extension that lets users annotate facts in articles with a special syntax that makes them "machine readable". It is not currently installed on Wikimedia wikis. Machine-readable facts allow external software tools to do powerful things such as integrating data, querying the data in a bandwidth-saving way, and providing powerful search facilities. For example, in the Stockholm article, one would add [[is capital of::Sweden]]. Annotations are, of course, best embedded in templates such as the infobox on the Methane article. There they can make use of the already formatted information without bothering users with additional syntax.
Apart from Wikipedia, MediaWiki is used by numerous organizations and companies for all kinds of knowledge bases. Fields such as construction and engineering have loads of data available in strictly formalized and standardized document formats. If that data were stored in Semantic MediaWiki, it could be turned into "machine readable", queryable databases, for example simply by adding semantic annotations to the templates. Now, what if this data were exposed in a standardized format that the rest of the web was also using, with everyone using the same identifier for "Stockholm" and "Bosch spark plug no 0001"? This would make it possible to connect all the available data into a big "web of things" instead of a "web of documents". Such a web could be queried much more intelligently. You could ask explicitly for "all cities in Europe" or "all spark plugs that fit a Volvo V70", instead of guessing the keyword combination that returns such a document on a search engine like Google.

Such a format already exists, and it is called RDF. Semantic MediaWiki already allows the static export of articles in RDF, but it does not allow RDF import. Nor does it provide an out-of-the-box way for remote users to select exactly the pieces of data they want. The RDFIO extension, which I built for the Google Summer of Code, addresses these gaps. It provides the ability to import RDF, as well as an interface for both querying and updating facts through a so-called "SPARQL endpoint" (see here for an example), which external tools can easily talk to.

This new ability to update semantic facts remotely opens up some interesting use cases. For example, chemists and biologists using Bioclipse can take their working data and export it to a wiki, where their peers can make corrections, and then import it again for further analysis. This workflow is in fact already possible, as hinted at in this blog post and screencast. It is the focus of my current work (progress documented on the blog).
For a more technical description, as well as download and installation instructions, see the RDFIO extension page. The development of RDFIO, and the thinking behind it, were documented on this blog.
”
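The annotate, export, import and query cycle described in the interview can be illustrated with a short Python sketch. This is not RDFIO or Semantic MediaWiki code, only a minimal illustration under stated assumptions. The regular expression handles only the simple [[property::value]] form. The base URI http://example.org/wiki/ and the helper names (uri, extract_triples, to_ntriples, from_ntriples, match) are hypothetical. The query function answers a single triple pattern, which is a tiny subset of what a real SPARQL endpoint supports.

```python
import re

# Hypothetical, simplified matcher for Semantic MediaWiki-style annotations
# of the form [[property::value]] (an optional "|display text" is ignored).
# Real SMW parsing handles much more: datatypes, multiple values, templates.
ANNOTATION = re.compile(r"\[\[([^:\]|]+)::([^\]|]+)(?:\|[^\]]*)?\]\]")

BASE = "http://example.org/wiki/"  # assumed base URI, not SMW's real scheme


def uri(name: str) -> str:
    """Turn a page or property name into an (assumed) bracketed URI."""
    return "<" + BASE + name.strip().replace(" ", "_") + ">"


def extract_triples(page: str, wikitext: str) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) triples annotated on one page."""
    return [(uri(page), uri("Property:" + prop), uri(value))
            for prop, value in ANNOTATION.findall(wikitext)]


def to_ntriples(triples) -> str:
    """Export: serialize URI-only triples as N-Triples, one per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


def from_ntriples(text: str) -> list[tuple[str, str, str]]:
    """Import: parse the URI-only N-Triples subset produced above."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Drop the trailing " ." terminator, then split into three terms.
        s, p, o = line.rstrip(" .").split(" ", 2)
        triples.append((s, p, o))
    return triples


def match(triples, pattern):
    """Answer one SPARQL-like triple pattern; '?name' terms are variables."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Example: two hypothetical wiki pages with semantic annotations.
pages = {
    "Stockholm": "Stockholm [[is capital of::Sweden]] and lies in [[located in::Europe]].",
    "Oslo": "Oslo [[is capital of::Norway]] and lies in [[located in::Europe]].",
}
store = []
for title, text in pages.items():
    store.extend(extract_triples(title, text))

exported = to_ntriples(store)      # what an RDF export might look like
imported = from_ntriples(exported)  # round-trip it back in

# "All cities in Europe" as a single triple pattern.
hits = match(imported, ("?city", uri("Property:located in"), uri("Europe")))
print(sorted(h["?city"] for h in hits))
```

Representing everything as plain (subject, predicate, object) tuples keeps the round trip easy to check. The same triples that come out of the wiki can go back in unchanged, which mirrors the export, correct and re-import workflow described for Bioclipse users.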
Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for many weeks.