It’s election season again: voting will soon open for the 13th annual election of the English Wikipedia's Arbitration Committee. In a repeat of last year’s election, there are nine vacant seats on the 15-member committee, eight of which will carry two-year terms and one a one-year term. Following two withdrawals, there are 22 candidates, up on last year’s 18; five of them have already served on the Committee for at least one term: LFaraone, GorillaWarfare, Kirill Lokshin, Thryduulf, and Casliber. Many eyes will be on the number of voters, which last year shrank precipitously by more than a third, from 923 to 593.
As in previous ArbCom elections, the electronic interface SecurePoll will be used, with a support–neutral–oppose ternary choice and an unusual S/(S+O) formula, in which a candidate's supports are divided by the sum of supports and opposes. In the 2013 election, the use of this system made a difference to who was elected compared with the number of supports alone, and last year it changed who was given the one-year term.
This year has been one of the most fractious in the history of the Committee: voting patterns on judgments have at times shown mild evidence of the formation of blocs of arbitrators, depending on the theme. There were several drama-infused cases, including the Gamergate case, which attracted unfavourable outside press coverage; gender-related cases appear to be a point of divergent viewpoints among the arbitrators. The current election will influence whether the Committee can regain cohesion and weather external shocks, including emotionally charged cases and critical coverage by external news outlets.
Excerpts from candidate statements:
Am I suitable to work here as an arbitrator? I have no idea.
Well, this is about the last thing I ever thought I’d do here.
Fuck it, I'll be the first to throw my hat in.
I don't want ArbCom to be regarded as a death panel.
So I was an editor, if not a very hardworking or ambitious one.
We must clean our house, lest those who could advise and assist us dismiss Wikipedia as a nest of boobies.
ArbCom has encrusted itself in mock-judicial trappings.
My opinion of the current committee’s Infamous, Thoughtless, Careless and Reckless handling of Gamergate received some attention.
ArbCom requires more innocent merriment, and I’ll do my level best to supply it.
I consider myself pretty up on cultural differences, having spent extensive time working in ... Canada, US, Australia, and Mongolia.
Besides the candidate statements and question pages, voters can consult 15 voter guides written by community members, almost as numerous as the candidates themselves. The Signpost is providing a different angle by presenting the results of an emailed survey to candidates on both their personal qualities and their views on ArbCom-related issues. This is the methodology we have used twice this year in our coverage of WMF Board and FDC elections. With so many candidates, it is a way to provide voters with comparative data gathered on a large scale in isolation, eliminating the "herd" effect in which candidates' responses are influenced by those of their colleagues. This is at the expense of reducing candidates' views to numbers, so we invited short statements to give respondents the opportunity to state more nuanced views, an offer taken up by only a minority. The survey and the writing of this story were designed and supervised by the Signpost's Editorial Board; editor-in-chief Gamaliel was excluded from the process because he is standing in the election.
There are several findings of interest. The candidates overall think that cases take too long, that case procedures should be streamlined, and that it's too hard for community members to extract the important messages from ArbCom’s judgment pages. They are satisfied with the voting system in the election, and believe the WMF should take more responsibility for issues involving minors. There is mild consensus that ArbCom's scope to manage excessive behaviour should not be widened, and that off-wiki outing is never acceptable. Consensus is less clear on transparency issues. The candidates are significantly divided on Gamergate and gender treatment.
We received 18 responses; one candidate, Kirill Lokshin, did not respond; Kelapstick apologised that he's "in a jungle" with bad connectivity; two more refused to participate on the grounds that the questions "require candidates to reveal information that they chose to withhold in their nominations or chose not to reply to the users' questions" (Kudpung), and were not "transparent" (NE Ent). Hullaballoo asked to withdraw his responses well after the announced copy-deadline, and after the data analysis had been done, a request we declined. One of the candidates who responded, Samtar, has since withdrawn.
We used a seven-point Likert scale, with responses ranging from 1 to 7:
Each candidate was invited to put a number against each of 20 propositions, and was informed that blanks would be counted as "4" for statistical purposes (these are marked red in the table). We have abbreviated candidates' usernames for reasons of space; those who have served on the Committee are marked with an asterisk. The full wording of the propositions appears at the bottom of the story. Averages and standard deviations appear first (see note 1); then net positives (5–7) and negatives (1–3), which disregard the strengths of the responses to focus merely on which side of neutral candidates lie as a whole.
Key to the abbreviations of the candidates' names in the first row of the table below:
Proposition | Avg. | StDev | Net pos. | Net neg. | OR | TT | M | G | CL* | W | Cal | KG | MB | K | D | GW* | LF* | H | RF | T* | HW | S |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
(A) I have a record of minimising drama | 4.9 | 1.6 | 13 | 4 | 7 | 6 | 3 | 5 | 6 | 5 | 6 | 4 | 1 | 6 | 6 | 2 | 6 | 5 | 6 | 5 | 3 | 6 |
(B) I favour strong over light sanctions | 3.4 | 1.3 | 6 | 8 | 2 | 2 | 3 | 6 | 3 | 5 | 5 | 3 | 4 | 4 | 4 | 5 | 2 | 2 | 1 | 4 | 3 | 3 |
(C) I'm prepared to manage difficult editors | – | – | – | – | 6 | 7 | 7 | 7 | 6 | 7 | 6 | 7 | 4 | 6 | 5 | 6 | 7 | 7 | 7 | 7 | 6 | 6 |
(D) Drafting judgments in plain, simple language is my strength | – | – | – | – | 5 | 5 | 6 | 6 | 5 | 6 | 6 | 6 | 7 | 6 | 6 | 6 | 5 | 7 | 7 | 3 | 6 | 7 |
(E) ArbCom's scope to manage excessive behaviour needs to be widened | 3.1 | 1.7 | 5 | 11 | 2 | 2 | 6 | 6 | 4 | 5 | 2 | 5 | 2 | 5 | 2 | 1 | 4 | 3 | 1 | 2 | 2 | 2 |
(F) All case evidence should be on-wiki | 3.4 | 2.1 | 7 | 10 | 2 | 6 | 6 | 2 | 2 | 2 | 1 | 1 | 1 | 4 | 3 | 1 | 2 | 5 | 6 | 6 | 6 | 5 |
(G) Arb discussions on cases should be on-wiki | 3.7 | 1.7 | 6 | 9 | 4 | 5 | 5 | 2 | 3 | 3 | 3 | 1 | 1 | 4 | 4 | 2 | 3 | 6 | 7 | 3 | 6 | 5 |
(H) Arb burnout is a major problem to address | 4.9 | 1.3 | 10 | 2 | 6 | 4 | 7 | 7 | 4 | 4 | 6 | 4 | 7 | 3 | 4 | 5 | 6 | 4 | 5 | 5 | 3 | 5 |
(I) Judgment pages: easy enough for community to get the messages | 3.0 | 1.6 | 5 | 13 | 1 | 5 | 5 | 3 | 5 | 3 | 3 | 1 | 1 | 3 | 2 | 3 | 5 | 2 | 2 | 6 | 1 | 3 |
(J) Accept fewer cases, leave more for AN/I etc | 4.3 | 1.6 | 8 | 6 | 5 | 7 | 6 | 5 | 3 | 5 | 2 | 3 | 4 | 4 | 6 | 2 | 2 | 6 | 7 | 4 | 3 | 4 |
(K) Cases need to take less time | 5.7 | 1.2 | 16 | 2 | 7 | 7 | 6 | 5 | 6 | 3 | 5 | 7 | 6 | 6 | 6 | 5 | 6 | 5 | 7 | 6 | 7 | 3 |
(L) Case procedures need streamlining | 5.2 | 1.4 | 13 | 4 | 7 | 6 | 6 | 3 | 6 | 5 | 5 | 7 | 3 | 6 | 6 | 6 | 3 | 5 | 6 | 4 | 7 | 3 |
(M) ArbCom was at its worst in handling Gamergate | 4.2 | 1.6 | 8 | 6 | 4 | 4 | 2 | 6 | 6 | 2 | 5 | 3 | 7 | 5 | 4 | 5 | 2 | 6 | 4 | 2 | 3 | 5 |
(N) I was satisfied with the Gamergate judgment | 3.4 | 1.5 | 5 | 10 | 2 | 4 | 6 | 2 | 3 | 5 | 3 | 3 | 1 | 3 | 5 | 3 | 4 | 2 | 4 | 6 | 1 | 5 |
(O) ArbCom should disregard off-wiki evidence | 3.3 | 1.7 | 4 | 10 | 2 | 6 | 4 | 3 | 2 | 6 | 2 | 1 | 1 | 4 | 2 | 5 | 3 | 6 | 4 | 4 | 1 | 3 |
(P) Off-wiki outing never acceptable | 4.7 | 2.1 | 11 | 5 | 4 | 4 | 7 | 6 | 3 | 7 | 6 | 3 | 1 | 5 | 1 | 7 | 5 | 7 | 6 | 6 | 1 | 6 |
(Q) ArbCom has treated men more sympathetically than women | 4.6 | 1.8 | 8 | 6 | 6 | 4 | 4 | 7 | 3 | 3 | 5 | 7 | 7 | 4 | 4 | 6 | 2 | 7 | 3 | 3 | 6 | 2 |
(R) The WMF should take more responsibility for minors issues | 5.8 | 1.4 | 14 | 2 | 5 | 7 | 7 | 6 | 7 | 6 | 7 | 7 | 4 | 6 | 5 | 7 | 7 | 4 | 3 | 3 | 6 | 7 |
(S) I'm happy with Audit Subcommittee arrangements | 3.1 | 1.5 | 2 | 9 | 2 | 4 | 6 | 4 | 4 | 5 | 1 | 4 | 3 | 2 | 4 | 3 | 1 | 3 | 1 | 1 | 4 | 4 |
(T) I'm satisfied with the ternary voting system for ArbCom elections | 5.4 | 1.5 | 14 | 2 | 6 | 7 | 6 | 5 | 5 | 5 | 7 | 2 | 3 | 6 | 4 | 7 | 6 | 5 | 6 | 7 | 4 | 7 |
Our motivation was mainly to survey attitudes to ArbCom-related issues by the group as a whole (this is a highly relevant cross-section of the community—those who put themselves forward for election). However, before they cast their votes, editors may be interested in scrutinising the responses of individual candidates.
Data interpretation can never be 100% objective, and the Signpost welcomes critical comments and discussion on the talkpage below. Propositions C and D we regard as likely to attract a higher level of public-relations calculation by candidates, which explains the narrow, positive range (who would admit they're unprepared for managing difficult editors or can't write clear judgments?); statistics are less relevant here and are not included. Proposition A might have been in the same class, except that the focus is on evidence (candidates' "record"), with responses ranging from 1 to 7. B is the hanging judge question, which might attract more scrutiny from voters: eight favour light over strong sanctions, four sit on the fence, and six favour strong sanctions (not surprisingly, only one of them going beyond mild agreement).
Let's deal first with the five propositions on which there appears to be clear consensus among the candidates:
On four questions there is only a modest consensus:
Consensus is less clear in two related questions about transparency, especially the second one:
On five questions, the candidates show no consensus:
The two-week voting period will open at midnight on Monday 23 November (although some voters may be confused as to whether this refers to the start or the end of Monday). Voters are advised that the arithmetic of the ternary system means that opposing all candidates they are not supporting, rather than voting neutral for them, gives more weight to their supports. An election feedback page has been established.
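The arithmetic behind the advice on opposing versus voting neutral can be sketched in a few lines of Python. This is an illustration only, assuming the S/(S+O) formula (supports divided by supports plus opposes, with neutral votes ignored); the function name and all tallies are hypothetical and are not drawn from any real election.

```python
def support_ratio(supports: int, opposes: int) -> float:
    """Return S / (S + O). Neutral votes never enter the calculation."""
    total = supports + opposes
    if total == 0:
        raise ValueError("no support or oppose votes cast")
    return supports / total


# Hypothetical tallies just after one extra voter has supported "favoured".
favoured = support_ratio(301, 200)

# The same voter must still decide how to treat a rival candidate.
rival_if_neutral = support_ratio(303, 201)  # neutral: rival's tally is untouched
rival_if_opposed = support_ratio(303, 202)  # oppose: rival's denominator grows

print(f"favoured:           {favoured:.4f}")
print(f"rival (neutral):    {rival_if_neutral:.4f}")
print(f"rival (opposed):    {rival_if_opposed:.4f}")
print("rival ahead if voter is neutral:", rival_if_neutral > favoured)
print("rival ahead if voter opposes:   ", rival_if_opposed > favoured)
```

With these made-up numbers the rival finishes ahead of the favoured candidate if the voter stays neutral, and behind if the voter opposes, which is the effect the advice describes.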
1 Standard deviations are a measure of spread. They can be roughly visualised as the space that contains a third of responses above the average and a third below it. If 5.0 were the average and 0.5 were the standard deviation, roughly two-thirds of responses would fall within the range 4.5–5.5. The larger the standard deviation, the more divergent the candidates' views.
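For readers who want to check the summary columns, the four statistics can be recomputed from any row of the table. The sketch below uses row (A) and Python's standard `statistics` module. The function name `summarise` is ours, and the choice of the sample standard deviation (`stdev`) is an assumption: the story does not say whether the sample or population formula was used, though for this row both round to the same figure.

```python
from statistics import mean, stdev

NEUTRAL = 4  # blanks were counted as 4 for statistical purposes


def summarise(responses):
    """Return (average, standard deviation, net positives, net negatives)."""
    # Treat a blank (None) as the neutral midpoint, as the survey rules state.
    filled = [NEUTRAL if r is None else r for r in responses]
    net_pos = sum(1 for r in filled if r >= 5)  # responses of 5 to 7
    net_neg = sum(1 for r in filled if r <= 3)  # responses of 1 to 3
    return mean(filled), stdev(filled), net_pos, net_neg


# Row (A), "I have a record of minimising drama", in column order.
row_a = [7, 6, 3, 5, 6, 5, 6, 4, 1, 6, 6, 2, 6, 5, 6, 5, 3, 6]

avg, sd, pos, neg = summarise(row_a)
print(f"avg={avg:.1f} sd={sd:.1f} net_pos={pos} net_neg={neg}")
# prints: avg=4.9 sd=1.6 net_pos=13 net_neg=4
```

Neutral responses of 4 count towards the average and the spread but towards neither net column, which is why net positives and net negatives need not add up to 18.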
Discuss this story
In the real world, my local newspaper serves a valuable role in summarizing events and local meetings that citizens are unable to personally attend. On wiki, The Signpost has a long tradition of identifying, collating, and summarizing real world activity related to Wikipedia; this is especially helpful since some such content is paywalled. In fact, I quote a 2012 Signpost article on the importance of informal dispute mechanisms in my candidate statement. However, as the election RFC has provided a mechanism for the community to ask questions of candidates, it's unclear what "value added" a secretive email survey provides. I personally find the notion of a herd mentality ridiculous; as candidates we all have reasonably long track records on wiki, and the electorate would spot pandering to the crowd instantly. (Also, while I'm not really a centuries-old treelike creature living in a forest, I am also not a cow or bull.) In any event, when I received the Signpost email I knew within 45 seconds I would not be participating, given the ridiculous bias of the question set. For the sake of clarity, I fully support the right of the Signpost to ask what they want. NE Ent 23:55, 21 November 2015 (UTC)
(EC) There's a grand tradition among candidates and politicians of blaming everything on the press. Sometimes they (intentionally ambiguous) are correct. I don't think it does any good though for candidates to blame the press - the press does what they have to do, the candidates should do what they have to do. The press, as seen in this case, tries to nail down the candidates' positions, and sometimes the candidates don't like it.
That said, I don't think that questions like A, B, C, D, H, I, K, and L give us a lot of information. But sometimes, you just have to guess what questions will draw people out and produce disagreement. I much prefer the questions later in the alphabet, the ones on the issues.
Hullaballoo - there's a rule for anybody who deals with the press: never give them anything in writing that you wouldn't want to see in print; or if you do, at least start the private sections with [This section not for quotation] and end with [You can quote me again after this]. If the journalist doesn't understand what info you give them is for publication and what isn't, it's not his or her fault - it's your problem - sometimes a very serious problem. Smallbones(smalltalk) 02:31, 22 November 2015 (UTC)
"single-item questions pertaining to a construct are not reliable and should not be used in drawing conclusions."[1] NE Ent 03:56, 22 November 2015 (UTC)[reply]
References
I find this big table kind of hard to read, so here's a matrix/heatmap version, sorted by the standard deviation for each question. I left off Hullaballoo (who withdrew from the survey) and Samtar (who withdrew from the race). Opabinia regalis (talk) 04:58, 22 November 2015 (UTC)
Many thanks to the Signpost for this analysis, and it looks like we're going to be in for a very interesting election cycle. While I've taken the time to read some of the candidate statements (and will be getting to their questions), this analysis really helps like it did for the Board election last May.
That said, I'm disappointed that of the 22 candidates running, only one is not from the Anglosphere. That to me is a travesty, and one that I hope the ArbCom will seek to address in future elections. --Sky Harbor (talk) 15:03, 22 November 2015 (UTC)
In general, though, I think most Wikipedia readers and editors couldn't care less about these elections; I'm sure they are not sure about what ArbCom does, and if they are, I venture to guess that many of them think this is just another popularity contest between insiders. Drmies (talk) 01:43, 23 November 2015 (UTC)
I should likewise point out though to Ed that we had at one point three ArbCom candidates from India in this election; two decided to pull out because there were other, "better" candidates. I don't think it bodes well for the governance of the English Wikipedia, which by virtue of it being the first mover of our movement happens to also be the most international of all of them, if all the members of the ArbCom happen to look at things from only one world view while ignoring everything else. That said, I do think we're spoilt for choice regarding excellent candidates in this cycle, but if we're going to do something about making the project more inclusive and accepting of our movement's own diversity, I certainly would like to think that we can do more, because it seems to me that we're doing nowhere near enough. --Sky Harbor (talk) 02:52, 23 November 2015 (UTC)
Additionally, the "Anglosphere" and "Global North" have, by virtue of mass migration, incredible variety in their cultures - to consider the cultures of Berlin, Birmingham, Dunedin, Durban, Melbourne, Paris, San Francisco and Saskatoon to be the same would be strange. I believe that there is sufficient variety in the lived experience & worldviews of a small group of persons from those places to provide a sufficiently varied, and balanced, set of views.
We should much prefer an ArbCom with a diverse set of views than a diverse set of identities for its own sake.
I do appreciate aspects of Sky Harbor's initial point; content remains, in places, incredibly US-centric, and, in places, supportive of a limited range of viewpoints. This is in part due to the wider community; in part due to the availability of English-language sources; but I cannot concur that it is because of the demographics of our final dispute resolution body. - Ryk72 'c.s.n.s.' 08:13, 23 November 2015 (UTC)
For the main point, I am, of course, not opposed to an ethnically, or otherwise, diverse ArbCom, but do suggest that of the qualities we would wish an ideal ArbCom to possess, diverse would fall far behind clear, communicative, consistent, efficient, and (above all) effective.
I am strongly supportive of the representation of more diverse, more global viewpoints in our articles; and while I might personally consider that ArbCom should be judged not by the colour of their skin, or the shape of their genitals, but by the content of their character, I am happy that editors have a view that a diverse ArbCom is an end goal.
I would, however, ask "Why?".
By this, I do not mean to belittle the idea, but to better understand the perceived benefit to the project. - Ryk72 'c.s.n.s.' 10:00, 23 November 2015 (UTC)
That said, I have a hard time agreeing with Drmies and Ryk72 that being a good Wikipedian is enough, and that their character is enough for them to be given the respect they deserve by the community. We may have some semblance of equality of opportunity (all you need is to write well to catch people's attention, and that's enough), but in reality I'd like to contend that Wikipedians outside our core editing communities (which, in the context of the Wikimania discussion, has since been framed to mean the U.S., Canada and Western Europe, plus Australia and New Zealand for our purposes) will have a harder time on the English Wikipedia. They may be editors, but that's it: very few editors from those outer regions actually get a stake in shaping the culture we want to build as a community (the "hegemonic discourse" that 75.108.94.227 points out). The ArbCom's composition may be a symptom rather than a cause, but I hope this serves as a wake-up call for English Wikipedians to recognize that we exist too. --Sky Harbor (talk) 15:48, 23 November 2015 (UTC)
From a web source, Latin America accounts for 8.6% of the world population. So what? Should we require that 8.6% of the candidates belong to a "Latin America" category, i.e. 1.81 of them? Or should we require that 8.6% of the Arbcom body belong to the category, i.e. 1.29 of them? Or should we count the heads and see how many registered users belong to the category? And then? Should we mimic the ratio among users, or apply an affirmative action factor? And perhaps augment the size of the elected body to minimize rounding problems? And deal with all these horrible questions about where the boundary of the category is? As soon as you think that people will act according to the categories you think they belong to, you are not describing a cooperative process that can be ruled by arbitration. In other words, we should either abstain from categorizing or replace Arbcom by some Govcom (first choice being listed first). Pldx1 (talk) 16:47, 23 November 2015 (UTC)
Statements
Candidates were also asked to submit a 75-word statement. This was almost as silly as the Likert scale questions, for some of which (as @Kevin Gorman: and @Hullaballoo Wolfowitz: have said) a Likert scale was clearly inappropriate. However, none of the candidate statements appear to have informed the article. It’s interesting that nearly half of the pull quote of candidate statements was drawn from one candidate -- me -- who stands no chance of election but who also takes some care when writing these things, despite the absurd restrictions. My 75 words were as follows:
Editing is seldom held in high regard, especially here at Wikipedia where everyone is an editor. Yet it's pretty clear that when an old, experienced and talented editor steps away for a moment to do something like run for ArbCom, the consequences are plain to see. MarkBernstein (talk) 02:34, 22 November 2015 (UTC)
- For the record, as much as I wish I had nothing to do with it, the GamerGate case was decided in 2015 but the case was primarily handled by the 2014 committee. Beeblebrox (talk) 00:19, 25 November 2015 (UTC)
The study is hopelessly flawed, to the degree that the results are invalid. Rating scales (not Likert scales, but never mind, no one ever gets that right), in this case a scale of agreement with each of the statements, measure one continuous variable. Therefore the semantics of the scale should be constructed so as to measure a continuum from the greatest degree of agreement to the greatest degree of disagreement. Someone, who clearly knew nothing about research methodology, continuous v. categorical variables or scale construction, planted a term mid-scale that broke the continuum and invalidates the survey. Any survey using an odd-numbered rating scale anchored with "strongly agree" and "strongly disagree" must have "neither agree nor disagree" as its mid-point anchor statement in order to maintain the continuum. By planting the opt-out/neutral/I don't know point in the center of the scale (which anyone who's had even the most basic course in statistics or research methods would know not to do), the scale is now rendered as two scales: 1-3 measuring disagreement, and 5-7 measuring agreement. There is no way to compare the two, so what we have here, in simple terms, is garbage. Or as we say in academia: garbage in/garbage out. (For next time, a separate data point "no response" that is off the scale is the correct way to allow a respondent to opt out.) Worse, the practice of counting blanks as a four, rather than leaving them blank, further skews the results. What a waste of time; I hope no one makes a decision about a candidate based on this tripe. --Drmargi (talk) 21:52, 28 November 2015 (UTC)