Wikipedia:Wikipedia Signpost/2012-07-02/From the editors Wikipedia:Wikipedia Signpost/2012-07-02/Traffic report Wikipedia:Wikipedia Signpost/2012-07-02/In the media
“[The Wikimedia Foundation's] strategy is to focus on two areas: [testing] automation; and building a testing community. We’re hiring people to coordinate these two areas.”
—WMF QA Lead Engineer Chris McMahon
This week a blog post by WMF engineer Chris McMahon put the spotlight on an area that does not often reach the pages of the Signpost: quality assurance (QA), a diverse remit spanning interface testing, process improvement, and project monitoring.
McMahon is currently the only employee of the foundation with specific responsibility for quality assurance; the WMF is seeking a volunteer QA coordinator and a QA engineer to work alongside him. Their work will centre not only on discovering defects, McMahon writes, but "investigating software to provide valuable information about that software from every point of view [and] examining the process by which the software is created, from design to code to test to release and beyond". If recent experience is anything to go by, McMahon and the two new hires will have their work cut out: many, if not all, Wikipedians can recite a list of bugs that have affected them in the recent past.
What makes QA across MediaWiki (the software that powers Wikimedia wikis) and the day-to-day running of those sites so difficult? "The development process involves so many contributors, with code coming in from so many sources and projects," writes McMahon, who also hints at the problems of being leader rather than follower in the world of rapid website testing. When finished, the processes currently being formulated are "intended to be a reference implementation, an industry standard for high-quality browser test automation".
According to the blog post, the foundation is also cultivating two relationships in the world of QA: the first with crowdsourcing website Weekend Testing; the second with technology non-profit OpenHatch.org, for which MediaWiki testing constitutes a first foray into the world of software testing (the WMF is also employing OpenHatch in an area closer to its expertise – technology education; see previous Signpost coverage). With the WMF QA department still in its infancy, the long-term utility of the measures it is now embarking on is not yet known.
Version 5 of the HTML standard may once again be enabled for use on Wikimedia wikis, well over a year after the first attempt to flick the switch was abandoned almost immediately (see previous Signpost coverage). WMF Director of Platform engineering Rob Lanphier this week expressed renewed interest in the switchover, suggesting a late July date for what would be the second attempt to implement the increasingly common standard (wikitech-l mailing list).
Fundamentally, the change is not a difficult one, requiring only the simple replacement of a single line of code. However, as the Signpost reported in February 2011, changing even that one line has the potential to break any tool reliant on so-called "screen-scraping" – reliant, in other words, on reading a page's HTML rather than a more machine-friendly version, such as that provided by the MediaWiki API. Then, even major tools like Twinkle were vulnerable to such problems; thankfully, all of the big-name tools are now far less reliant on the exact code used to generate the page, and as such will almost certainly survive the switchover. But other less well-maintained tools may not be so lucky, requiring the change to be well-trialed. The other bug raised at the time, relating to citation IDs, looks to have been resolved since, making a July switchover look all the more feasible.
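The difference between screen-scraping and using the API can be illustrated with a minimal Python sketch. It is not taken from any real tool: the two HTML snippets are simplified, hypothetical stand-ins for a page before and after a markup change, the page ID 12345 is invented, and the JSON string only mimics the general shape of a MediaWiki API `action=query` response.

```python
import json
import re

# Hypothetical, heavily simplified page output before the switchover.
OLD_PAGE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    '<html><body><h1 id="firstHeading" class="firstHeading">Perth</h1>'
    '</body></html>'
)

# Hypothetical output after the switchover: new doctype, slightly different markup.
NEW_PAGE = (
    '<!DOCTYPE html>\n'
    '<html><body><h1 id="firstHeading" class="firstHeading">'
    '<span dir="auto">Perth</span></h1></body></html>'
)

# Illustrative JSON in the general shape of an action=query response.
API_RESPONSE = (
    '{"query": {"pages": {"12345": '
    '{"pageid": 12345, "ns": 0, "title": "Perth"}}}}'
)

# A fragile scraper: it assumes the heading contains plain text and nothing else.
FRAGILE_HEADING = re.compile(
    r'<h1 id="firstHeading" class="firstHeading">([^<]+)</h1>'
)


def scrape_title(html):
    """Return the page title by pattern-matching the HTML, or None on failure."""
    match = FRAGILE_HEADING.search(html)
    return match.group(1) if match else None


def api_title(raw_json):
    """Return the page title from the structured API-style response."""
    pages = json.loads(raw_json)["query"]["pages"]
    return next(iter(pages.values()))["title"]


if __name__ == "__main__":
    print(scrape_title(OLD_PAGE))   # works against the old markup
    print(scrape_title(NEW_PAGE))   # silently fails once the markup changes
    print(api_title(API_RESPONSE))  # unaffected by how the page is rendered
```

The scraper depends on the exact characters of the rendered page, so any change to the generated HTML can break it without warning; the API-based function reads a named field and does not care how the page looks.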
Enabling HTML5 mode signals to browsers that they should display Wikimedia wikis in HTML5 mode, complete (once MediaWiki's own support is improved) with <video> tags, canvases and native support for form validation. Users should note that certain, long-deprecated markup will cease to function, most notably <font> and <center> tags, which are common in user signatures and on user pages, despite not being officially supported by MediaWiki itself.
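Editors who want to check their own signature or user page for the affected markup could do so along the following lines. This is a rough sketch using Python's standard `html.parser` module; the function name `find_deprecated` and the sample signatures are hypothetical, and the tag list covers only the two tags named above.

```python
from html.parser import HTMLParser

# Only the two tags mentioned in this report; other deprecated tags are not covered.
DEPRECATED_TAGS = {"font", "center"}


class DeprecatedTagFinder(HTMLParser):
    """Collects the names of deprecated opening tags in the order they appear."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in DEPRECATED_TAGS:
            self.found.append(tag)


def find_deprecated(markup):
    """Return a list of deprecated tag names used in the given markup."""
    finder = DeprecatedTagFinder()
    finder.feed(markup)
    finder.close()
    return finder.found


if __name__ == "__main__":
    old_signature = '<font color="red">Example</font> (<center>talk</center>)'
    new_signature = '<span style="color:red">Example</span> (talk)'
    print(find_deprecated(old_signature))
    print(find_deprecated(new_signature))
```

A signature that relies on `<font>` for colour can usually be rewritten with a `<span>` and a `style` attribute, as in the second sample.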
Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for several weeks.
{{PAGEID}} magic word and the recapitalisation of the language names used in the sidebar. A release to external sites including the same selection of bug fixes and new features is not expected for some time.
Wikipedia:Wikipedia Signpost/2012-07-02/Essay Wikipedia:Wikipedia Signpost/2012-07-02/Opinion
On June 28, the Wikimedia Foundation started a request for comment (RfC) on whether the community feels the foundation should participate in the Internet Defense League, a proposed lobbying organization with the goal of protesting future anti-piracy legislation.
According to the RfC, the organization to be launched aims to build a network of stakeholders interested in activism against legislation such as SOPA and the PROTECT IP Act. League members would be notified if protests such as the SOPA blackout (Signpost coverage) are proposed, but no one would be bound by their membership to take part in any action. The proposed network is a cooperative effort of Mozilla and Fight for the Future. Organizations such as the Electronic Frontier Foundation, WordPress, and Reddit have already joined.
The WMF's legal and community advocacy department published an evaluative statement. While the proposal could turn out to be "very valuable", it says, the initiative involves many uncertainties yet to be clarified, and joining such a network might lead to perceptions that Wikimedia projects are becoming more political.
At the time of writing, two users have supported membership of the league, while more than 30 are opposed. Supporters pointed out that it is possible to deal with the problems raised and that membership would not be politically problematic. Opponents cited reasons such as negative implications for the perceived character of Wikimedia as an educational organization and the calling into question of its neutrality. Eight users were undecided, primarily saying that insufficient information has been presented to judge the merits of such an initiative appropriately. The WMF asks editors to share their views at the RfC's discussion page to ensure wide participation from its communities.
Wikipedia:Wikipedia Signpost/2012-07-02/Serendipity
This piece examines a key question that new Wikimedia projects such as Wikidata are concerned with: how to properly represent knowledge digitally at the most basic level. There is a real danger that an inflexible, prescriptive approach to data will severely limit the scope, capabilities and ultimate utility of the resulting service.
At one level, the textual representation of information and knowledge in books and online can be viewed as simply another serialisation and packaging format for information and knowledge, optimised for human rather than machine consumption. Within the Wikipedia community – Wikidata and elsewhere – there is a perceived utility in using more structured, machine-friendly formats to enable better information sharing and computer-assisted analysis and research. However, there remains a lot of debate about the best approach, to which I will contribute the views I have developed over nearly a decade of research and development projects at the Bodleian Library[1] and before that, through my involvement with knowledge management in the commercial domain.
My first point is that metadata and data are really different aspects of a continuum. In the majority of cases, data acquires much of its meaning only in connection with its context, which is largely contained within so-called metadata. This is especially true for numerical data streams, but holds even for data in the form of text and images: when and where a text was written are often critical elements in understanding the meaning.[2] Data and metadata should be considered not as distinct entities but as complementary facets of a greater whole.
Secondly, there will be no single unifying metadata "standard" (or even a few such standards), so deal with it! For example, biosharing.org lists just under 200 metadata standards for experimental biosciences alone. The notion of a single standard that led to the development of MARC, and latterly RDA, in the library sphere is simply not applicable to the way in which metadata is now used within the field of academic enquiry. This means that any solution to handling digital objects must have a mechanism for handling a multiplicity of standards, and ideally within an individual object – for example, bibliographic, rights and preservation metadata may quite reasonably be encoded using different standards.[3] The corollary of this is that if we have such a mechanism there is no need to abandon existing standards prematurely. This avoidance of over-prescribing and premature decision-making will be familiar to Agile developers. Consequently, Wikidata developers would be ill-advised to aim for a rigid, unitary metadata model – even at a basic level, representing knowledge is too complex and variable for such an approach.
So how do we balance this proliferation of standards with the desire for sharing and interoperability? We can find several key areas in which a consensus view is emerging, not through explicit standard-setting activities but through experience and necessity. This gives us a good indication that these are sensible points on which to base longer-term interoperability.
These common properties are obviously very amenable to storage and manipulation in a relational database. Indeed, for large-scale data ingestion with the subsequent clean-up, de-duplication and merging of records/objects, this is likely to be the best tool for the job. However, once this task has been completed and we delve into the more varied elements of the objects, the advantages of a purely relational database approach are less clear-cut.
Instead, we can treat each object as an independent, web-addressable entity – which in practice is desirable in its own right as a mode of publication and dissemination. In particular, we can use search engines to index across heterogeneous fields – Apache Solr excels at faceting and grouping, while ElasticSearch can index arbitrary XML without schemas (i.e. all of the varied domain-specific metadata). These tools give users ways into the material that are much easier to use and more intuitive.
The objects alone are only a part of the picture – the relationships between objects are critical to the structure of the overall collection. In fact, in many cases (especially in the humanities) a significant proportion of research activity actually involves discovering, analysing and documenting such relationships. The Semantic Web or, more precisely, the ideas behind the Resource Description Framework (RDF) and linked data, provide a mechanism for expressing these relationships in a way that is structured, through the use of defined vocabularies, but also flexible and extensible, through the ability to use multiple vocabularies. While theoretically it is possible to express all metadata in RDF, this is not practical for performance[5] and usability[6] reasons, and is unnecessary.
This model of linked data, combining a mix of standardised fields and less-structured textual content, should not be entirely unfamiliar to people used to working with Semantic MediaWiki, sharing their metadata on Wikidata, or using data boxes in Wikipedia! However, when applying this model to practical research projects it emerges that a critical element is still lacking. Although we can describe relationships between objects using RDF, we are limited to making assertions of the form <subject><predicate/relationship><object> (the RDF "triple"). In practice, relatively few statements of this form can be considered universally and absolutely true. For example: a person may live at a particular address but only for a certain period of time; the copyright on a book may last for 50 years, but only in a particular country. Essentially, what is needed is a mechanism to define the circumstances under which a relationship can be considered valid. A number of possible mechanisms could do this – replacing RDF triples with "quads" that include a context object; annotation of relationships using OAC.
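To make the idea of a "quad" concrete, here is a minimal Python sketch of triples extended with a context object that records when and where a statement holds. It does not follow any particular RDF library or the OAC model: the `Context` and `Quad` classes, the `holds` function, the `ex:` identifiers and all dates are hypothetical and chosen only to mirror the address and copyright examples above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Context:
    """Hypothetical qualifier: the circumstances under which a statement is valid."""
    valid_from: Optional[int] = None    # first year the statement holds, if bounded
    valid_to: Optional[int] = None      # last year the statement holds, if bounded
    jurisdiction: Optional[str] = None  # where the statement holds, if restricted

    def covers(self, year, jurisdiction=None):
        """True if the given year and jurisdiction fall within this context."""
        if self.valid_from is not None and year < self.valid_from:
            return False
        if self.valid_to is not None and year > self.valid_to:
            return False
        if self.jurisdiction is not None and jurisdiction != self.jurisdiction:
            return False
        return True


@dataclass(frozen=True)
class Quad:
    """A subject-predicate-object triple plus the context that qualifies it."""
    subject: str
    predicate: str
    obj: str
    context: Context


def holds(quads, subject, predicate, obj, year, jurisdiction=None):
    """True if some quad asserts the triple and its context covers the query."""
    return any(
        q.subject == subject
        and q.predicate == predicate
        and q.obj == obj
        and q.context.covers(year, jurisdiction)
        for q in quads
    )


# Invented sample data: a time-limited address and a country-limited copyright.
QUADS = [
    Quad("ex:alice", "ex:livesAt", "ex:12_high_street",
         Context(valid_from=2001, valid_to=2008)),
    Quad("ex:some_book", "ex:hasStatus", "ex:in_copyright",
         Context(valid_to=2040, jurisdiction="ex:country_a")),
]

if __name__ == "__main__":
    print(holds(QUADS, "ex:alice", "ex:livesAt", "ex:12_high_street", 2005))
    print(holds(QUADS, "ex:alice", "ex:livesAt", "ex:12_high_street", 2010))
    print(holds(QUADS, "ex:some_book", "ex:hasStatus", "ex:in_copyright",
                2030, "ex:country_b"))
```

With a bare triple, the question "does Alice live at this address?" has only a yes-or-no answer; with the context attached, the answer depends on the year asked about, which is closer to how such facts behave in practice.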
These examples are really just special cases of a more general requirement that is of great interest to scholars. This is the ability to qualify a relationship or assertion to capture an element of provenance. Specifically, we need to know who made an assertion, when, on the basis of what evidence, and under which circumstances it holds. This may be manifested in several ways:
These qualifications become especially important when we try to use computational tools such as analytics and visualisation. Indeed, projects such as Mapping the Republic of Letters (Stanford University) are expending significant effort to find ways of representing uncertainty and omission in visualisations.
I believe there needs to be a subtle change in the mindset when creating reference resources for scholarly purposes (and, arguably, more generally). Rather than always aiming for objective statements of truth we need to realise that a large amount of knowledge is derived via inference from a limited and imperfect evidence base, especially in the humanities. Thus we should aim to accurately represent the state of knowledge about a topic, including omissions, uncertainty and differences of opinion.
Wikipedia:Wikipedia Signpost/2012-07-02/In focus
No cases were closed or opened, leaving the number of open cases at three. One motion was filed this week.
The case concerns alleged misconduct with regard to aggressive responses and harassment by Fæ toward users who question his actions. The case was brought before the committee by MBisanz. The other parties are Michaeldsuarez and Delicious carbuncle. A decision is expected on 6 July.
In response to a workshop proposal calling for the removal of his adminship, Fæ's administrator rights were removed at his request on 18 June; he has declared he will not pursue RfA until June 2013, and that should another user nominate him and he feels confident to run, he will launch a reconfirmation RfA rather than requesting the tools back without community process.
The case was referred to the committee by Timotheus Canens, after TheSoundAndTheFury filed a "voluminous AE request" concerning behavioural issues related to Ohconfucius, Colipon, and Shrigley. The accused deny his claims and have criticised TheSoundAndTheFury for his alleged "POV-pushing". According to TheSoundAndTheFury, the problem lies not with "these editors' points of view per se"; rather, it is "fundamentally about behaviour". A decision is expected on 8 July.
The case, filed by P.T. Aufrette, concerns wheel-warring on the Perth article after a contentious requested move discussion (initiated by the filer) was closed as successful by JHunterJ. The close was a matter of much contention, with allegations that the move was not supported by consensus. After a series of reverts by Deacon of Pndapetzim, Kwamikagami and Gnangarra, the partiality of JHunterJ's decision was discussed, as was the intensity of Deacon of Pndapetzim's academic interests in the topic. Questions were also raised about the suitability of the new move review forum.
In a workshop proposal, uninvolved user Ncmvocalist outlined in proposed principles the need for administrators to lead by example, behave respectfully and civilly in their interactions with other users, learn from experience, and avoid wheel-warring irrespective of the circumstances or nature of the dispute. The proposed principles also state that WikiProjects are not platforms for point-of-view pushing or the pushing of one's own agenda, and that where consensus cannot be reached, other venues of discussion should be sought out. Proposed decisions are due on 12 July.
A motion was filed by arbitrator PhilKnight calling for the removal of Carnildo's administrative tools for "long-term poor judgement" in his use of the tools. Carnildo may regain the tools via a successful request for adminship. At the time of writing, seven arbitrators are in unanimous support of the motion; a majority of eight is needed for the motion to pass.
Wikipedia:Wikipedia Signpost/2012-07-02/Humour