The Signpost

Opinion essay

The monster under the rug

Sven Manguard has been editing Wikipedia for just over a year. He works primarily in the File namespace, but also participates in backlog eliminations and other gnomish tasks. Below, Sven makes a personal plea to the community, asking editors to become more involved in eliminating backlogs. The author would like to thank editors ThatPeskyCommoner, Ironholds, and Fox for offering their support and advice in the creation of this essay.

The views expressed are those of the author only. Responses and critical commentary are invited in the comments section. The Signpost welcomes proposals for op-eds. If you have one in mind, please leave a message at the opinion desk.


The task of encyclopaedia cultivation generates vast amounts of paperwork that, if left unaddressed by volunteers, accumulate into enormous backlogs.

Whatever people may say about declining participation, Wikipedia still generates a lot of new content. We add articles and upload dozens upon dozens of files every day, and that is unquestionably a good thing. However, as a community, we tend to neglect a large variety of problems that have cropped up in older articles. We sweep them under the rug, so to speak, and that is unquestionably a very bad thing.

The fact of the matter is that Wikipedia has swept so many problems under the rug that we now have a monster on our hands. We have backlogs that are in the hundreds, in the thousands, and in a few cases, in the hundreds of thousands, that have sat relatively untackled for months or years. These aren't petty issues either. There are 250,000 articles that need references. By that, I don’t mean that they need more references; I mean that there are, at last count, a quarter million articles that do not have a single citation to support them, and those are just the articles that are tagged as such. Some of these completely unreferenced articles were tagged as far back as October 2006, half a decade ago. There are an additional 250,000 articles that need additional references, and over 200,000 with unsourced statements. Less absurdly high in count but just as important, there are almost 10,000 articles tagged as containing original research, over 8,500 with disputed neutrality, and over 5,500 with disputed accuracy. I am cherry-picking especially important issues with especially high numbers, yes, but there are about two dozen other content-related backlogs with over a thousand items in them (listed at the Wikipedia Contribution Team’s backlog dashboard) that are not listed here.

What am I trying to say by listing all of these massive backlogs? I am saying that we, as a community, are failing our readers. People come to Wikipedia, for the most part, expecting accurate, neutral, well-written articles. In almost a million cases, we cannot with a straight face vouch for the accuracy of the articles we're presenting. It is depressing, it is unacceptable, and unless the community, or significant portions of it, works to tackle these backlogs, the problem will only get worse.

There are a number of factors to blame for this problem. There was a time when ignorance of the problem was a valid claim, but considering the number of times that one backlog or another has been mentioned in a prominent location, I no longer believe ignorance is a passable excuse. Instead, I believe it comes down to our culture. Working in backlogs certainly isn't glamorous, but more importantly, I don't perceive the community as regarding it as especially commendable or even especially valuable. It seems rather rare that a candidate for RfA puts forth their nomination by leading off their credentials with something like "I have spent the last six months clearing out the backlog at Category:Articles that need to differentiate between fact and fiction" (a category with over 3,500 items, by the way). Even worse, I can point to a few cases where someone did put forth backlog work as a credential, only to have it implicitly or explicitly disregarded by people who only seemed to focus on whether the nominee had written "enough" articles or had "enough" good and featured articles. Simply put, until the community decides that working on backlogs is a valuable activity, and shows it not only at RfA, but also in discussions and everyday community interaction, not enough people are going to jump in and start working on clearing backlogs.

This is not to say that no one values backlog work. There are a few groups of editors dedicated to working on clearing out particularly important backlogs. The Guild of Copy Editors and WikiProject Wikify deserve a tremendous amount of respect in particular for keeping the backlogs at Category:Wikipedia articles needing copy edit and Category:Articles that need to be wikified low; by doing so they ensure a great many articles are a great deal more readable than they otherwise would have been. In the area of files, which happens to be where I spend a majority of my time, backlogs are kept low by a combination of exceedingly useful bots, a few organized drives (such as WikiProject Images and Media's recently concluded Move to Commons drive), and a small handful of editors who devote large amounts of time to working with files.

It is, of course, not enough. This brings me to the primary motivation behind my decision to write this opinion piece:

I am asking, no, begging, everyone that reads this piece to go to this page, select a backlog that they think they can help out with, and knock off a few items. Spend an hour on it, devote ten minutes to backlogs once or twice a week, or do whatever else works for you. It doesn't have to take up a lot of time. If you want, show me a few diffs and I'll give you a barnstar; I'd be happy to. If 1,000 people read this, and each of them clears ten items this month, that’s 10,000 items. If everyone does ten items a month for an entire year, 120,000 items will have been cleared. Even distributed among two dozen or more backlogs, that is a formidable number.

I wouldn't go as far as to beg random strangers to do this if I weren't absolutely convinced that this was of vital importance, but here I am begging for all to see. I also wouldn't ask this of the community if I didn't think it were possible to make a noticeable difference. Recently I cleared a 1,500 item backlog in just a month, with the assistance of one other editor. The two of us, in weeks, took out a backlog that had sat untouched for years, and that specific backlog will never come back. While we'll never be able to eliminate maintenance tasks, it is possible to eliminate the massive backlogs that we have now, and return the number of pending cleanup tasks to a reasonable, functional, level. All it takes is work — and editors willing to do that work. Please join me in the coming months. Together we can defeat the monster under the rug.


Discuss this story


Hello there everyone. This essay was actually ready for print last week; however, another editor had a piece waiting, which was ahead of this one in line. I didn't know that it would happen, but a few days after I wrote this essay, the Signpost put out a call for writers, which I responded to by volunteering to write the Discussion report, as well as to coordinate the Opinion Desk. It was a complete accident that the first piece published under my tenure is one that I wrote myself. While it certainly was enjoyable to write this, I won't be writing any more of them for a while; I only have so many interesting opinions, and I've got to save them up.

That means it's time for more shameless begging! If you have a Wikipedia related opinion, are capable of composing it coherently and in your own words, and are willing to share it with everyone else, I want to hear from you. Really, those are pretty much the only requirements that the Signpost has for opinion essays. It's not like they're flowing in so fast that we have to pick and choose; if you bring in a quality submission, the chances that it'll get run are exceedingly high.

I'm also not above going out and finding people who have already written essays on Wikipedia and asking them to run those essays on the Signpost. If I do ask you, please consider it an honor (and say yes).

The opinion section, more than any other area of the Signpost, can't work without members of the community becoming involved. I hope to hear from some of you soon,

Sven Manguard Wha? 09:01, 29 October 2011 (UTC)

This discussion hasn't really made it to the wikis yet, but at the Foundation, some of us are trying to start a movement towards documenting and re-examining how Wikipedia handles workflows. Firstly, the Foundation employees are hired for skill and availability, so they are rarely wiki-insiders, and often unaware of how complicated some of the processes are. Secondly, once you document these workflows, certain weaknesses in them become apparent.

It seems that this pattern comes up over and over again; where things are seriously broken, it's because there's no system to channel resources appropriately. So you need to make appeals for heroic behavior. This is unsustainable.

The wiki model is great when, in one person, you can combine a lot of roles: noticing a problem, doing research, scheduling a time to do the work, and the requisite technical skill. All that comes into play, even if you're just fixing a typo. But when the problems are larger and more difficult, it starts to make sense for there to be different roles and stages to the work, and maybe even different incentives.

A site of our size should not be frightened of a queue of work that is several thousand items long. We just have to figure out how to activate our readers' interest. What do you think?

NeilK (talk) 16:53, 1 November 2011 (UTC)

I was made aware of, right after this went live, a German Wikipedia effort to curb backlogs. It's called Wartungsbausteinwettbewerb, and at this point all I know about it is that it's a massive competition that occurs several times a year and is devoted solely to backlog clearing. Wikipedians, or at least many of them, are competitive. We should harness that competitiveness with a similar competition. Sven Manguard Wha? 17:02, 1 November 2011 (UTC)

I think what it all comes down to is that we are all volunteers here, so we want to be spending our time working on things that will be noticed (like articles). The background stuff, like backlogs, is rarely noticed, and the only obvious benefit is a larger edit count (which does inspire some people to do it).

If you ever want the backlogs to be something that is heavily focused on, you need to have some sort of system that recognizes the work that people do in them. The more "praise-worthy" it is made in terms of rewards, the more people will want to do it. And, hopefully, the higher in regard it will be held in places like RfA, though we all know RfA is thoroughly broken as it is anyway, so I kinda doubt that one. SilverserenC 17:37, 1 November 2011 (UTC)

All the more reason to give Wartungsbausteinwettbewerb a shot, yes? Sven Manguard Wha? 18:21, 1 November 2011 (UTC)
I've worked on backlogs before. I've cleared out the backlog SandyGeorgia links to below several times. I just have a project and a test this week, so I'm a little busy. Ask me this weekend, I should be free then. SilverserenC 18:33, 1 November 2011 (UTC)
OK, I tried, in spite of my limited time, and I gave up because trying to clean out one measly category correctly could take me an entire week: my conclusion is reinforcement of the concern that as Wikipedia has become more known and has attempted to recruit to account for declining editorship, incompetent editors are replacing competent editors in droves. The notion that knowledgeable editors can address these problems is naive (just imagine trying to remove the POV from Chavez-- I've been working on that for six years).

So, I took what looked like a relatively easy category, less than 50 pages, that I thought I should have been able to empty: Category:Pages with missing references list. What I found was massive problems in every article I looked at, such that simply adding a reflist to the article would be irresponsible (and why can't a bot do that, anyway?). Perhaps I should lower my standards and just add the darn reflist parameter, but I'm not going to do that and then have someone come along and say, "look at that, she added the tag and didn't even notice the article was a copyvio". I found most typically Indian editors adding most likely copyvios and indecipherable text, I found text so utterly indecipherable that it would take me hours of research to figure out how to fix any one article, I found incorrect article names, dubious notability, you name it.

The problem is not that established editors are failing-- the problem is that there is simply too much crud coming in from incompetent editors for established editors to have a prayer of keeping up with the routine maintenance. In almost every case, I came to the conclusion that no text would be better than the bad text there: I don't know why we're actively recruiting editors who don't display competence or commitment to Wikipedia via university projects.

It's a nice editorial, but we can't get there from here. SandyGeorgia (Talk) 17:45, 1 November 2011 (UTC)

I have long advocated that Wikipedia needs to shed a few thousand articles; however, my view has little support. Meanwhile, users are able to mass create stubs with bots. I appreciate the ideal that we should strive to cover everything, but I also believe that we should prioritize doing it right, rather than having something on everything now. All of this, however, is slightly off topic. We have the problems sitting around now, we really should try to fix them. For the sake of discussion, however, if articles really are beyond saving, deleting articles tagged with issues does lower the backlogs on those issues. Sven Manguard Wha? 18:13, 1 November 2011 (UTC)
I know of a few thousand articles related to motorcycling topics that have been tagged for years as being unsourced or original research and whatnot. But I also know that these articles, with only a handful of exceptions, get next to no traffic. It makes plenty of sense if you think about it: the reason the monster gets ignored is that nobody ever sees the monster. There are a million or so articles on Wikipedia which are so obscure that their maintenance tags are being seen by next to nobody, so next to nobody reacts to the maintenance tags.

There is a backlog of articles that get a significant amount of traffic which need to be cleaned up, but that backlog is a couple orders of magnitude smaller. So I disagree that Wikipedia is really failing its readers here -- in 99% of these cases there are no readers to fail. As a suggestion, I would want this backlog list sorted by traffic, so that the articles with the most readers are fixed first, and the ones with no readers are fixed last, or never. It would be a terrible waste of limited resources to work on the majority of these articles. --Dennis Bratland (talk) 18:07, 1 November 2011 (UTC)

I think that sorting items in a backlog by hits is a good idea. At the very least, it allows us to save face easily. It does not, however, solve the problem of there not being enough people working to clean up the backlogs. As to the notion that we're not failing our readers because the articles that are tagged are not viewed, you're viewing them, so clearly there are people, however low in number, that do care about those articles. If we're neglecting small numbers of users simply because the articles that they are interested in aren't viewed very often, it doesn't mean that we're not failing our readers, it means we're failing a smaller number of our readers, and that's still unacceptable. Sven Manguard Wha? 18:20, 1 November 2011 (UTC)
I'm sorry, but calling an article tagged for maintenance "unacceptable" is histrionic. Maintenance tags are a normal, healthy part of how Wikipedia works, and how Wikipedia gets better in an orderly fashion. The tags give readers adequate warning. If you want to say that about the 167 unreferenced BLPs, then maybe you have a point. You might even have a point if you were talking about unreferenced articles with no maintenance tags to warn readers. But it is not "unacceptable" that 1984–1985 United States network television schedule (Saturday morning)'s ~200 readers per month have no assurance of the accuracy of the article. Perhaps the statement that "From Jan 12-Feb, The Snorks and Pink Panther and Sons swapped time slots" is utter fabrication, but it doesn't bother me in the least. Not in the least. I have much, much better things to do.

Now the only reason I was viewing them is totally meta -- I was poking around at random on Wikipedia:WikiProject Motorcycling/Cleanup listing for important articles needing cleanup. But that was before Wikipedia:WikiProject Motorcycling/Popular pages became available; now I focus my efforts on what matters and lose no sleep at all over the bad pages with no readers.

If these neglected, unread pages bother you personally, that's fine. But you need a better argument if you're telling me to quit working on a page with thousands of hits per month because some page with 200 hits a month is unsourced. If anything, it would be beneficial to discourage editors from wasting time on pages like that.

But we do agree that the highest traffic pages should be fixed first, so by all means, push ahead on that front. --Dennis Bratland (talk) 18:40, 1 November 2011 (UTC)

You're right, maintenance tags are not, in and of themselves, problems. Having them in the tens and hundreds of thousands, however, is a problem. Having tags from a half decade ago is a problem. Maintenance tags work on the basic operating assumption that at some time in the near to intermediate future, they're going to get maintained. If that's not happening, we have a systemic failure. If tags are just going to pile up and never get fixed, the philosophy that "things don't have to be perfect now because there is no due date to have them done by" falls apart. That philosophy is part of Wikipedia's core. Sven Manguard Wha? 18:51, 1 November 2011 (UTC)
Your argument is ad nauseam: you keep repeating what a problem it is that X number of articles have old tags. X happens to be 250,000 but so what? Why is that a problem? 250,000 out of 3.8 million articles is 6.6%. Why is 6.6% such a problem? What rational reason is there that this number is too high? Should it be 3%? Or 2%? Why? And if only a fraction of that 250,000 get much traffic, why indeed, is it a problem? You could recruit an editor to spend ten hours fixing 20 of these articles, and those 20 could only get a total of 4,000 hits a month. Whereas that same editor could be recruited to improve a popular article that gets 4,000+ hits in a single day, or even a single hour. The difference in impact is staggering, and to thoughtlessly suggest devoting time to such obscure articles when many, many times greater benefit can be had elsewhere is illogical. I am asking for evidence of why this work is more important than other work. If all the articles with 10 or 100 or 10000 times more traffic were flawless, then it might make sense to recruit editors to work the obscure ones. But you seem to think the mere fact that these poor articles exist is inherently a problem. If they are orphans and unread, then they are harming no one, or next to no one. Whereas every {{Citation needed}}-tagged fact in Mick Jagger is going to be potentially misinforming readers at a clip of nearly 500 per hour.

So what if they have been tagged for ten years? Many of them would take ten years to get as many hits as an important article gets in a week.

If you know how to find editors who share your peeve here, that these trees that fall in the forest that nobody hears are a "problem", and if (for some odd reason) these editors would not otherwise contribute, great. At least they're contributing something. But if you want to pull them away from fixing high traffic, or even moderate traffic, articles to fix these pages because they simply bug you, then that harms Wikipedia. --Dennis Bratland (talk) 03:22, 2 November 2011 (UTC)

I'll note that the number of articles that are unsourced, undersourced, or have unsourced statements, as tallied at WP:GBD, is just under 800,000, and that the number of content-related tags on that page is around 1,407,000. Even accounting for articles that have multiple tags, that's well over 25% of our articles. It's clear, however, that we have fundamentally different views of the matter, and that our philosophies on Wikipedia are in conflict with one another. That's fine, and I'm glad that this opinion essay is able to spark discussion on the matter. I don't think, however, that asking users to clear backlogs is in any way harming the encyclopedia, and it saddens me that this debate went to that argument so quickly. Sven Manguard Wha? 03:49, 2 November 2011 (UTC)
Maybe a better sense of proportion would help. --Dennis Bratland (talk) 23:55, 2 November 2011 (UTC)
Since "the Foundation employees are hired for skill and availability, so they are rarely wiki-insiders, and often unaware of how complicated some of the processes are" (and I've certainly encountered quite a few helpful and savvy Foundation employees), perhaps the Foundation needs to re-examine their HR process and possibly consider hiring a few "wiki-insiders" to provide a better mix of skillsets and life experiences. As it is, the positions I see posted for the Foundation are not generally "jobs" as we old fogies understand the concept, but rather temporary/LTE gigs of a few months' duration, which nonetheless require the successful candidate to move to SF. When you select for people who either already live in the Bay Area, or are willing to pull up roots and can afford to move to one of the most costly housing markets in North America for a temp contract, you are not going to get a very representative selection of candidates. --Orange Mike | Talk 19:03, 1 November 2011 (UTC)
Indeed; for many WMF jobs being a wiki-insider is the skill, or should be. No wonder WMF's grasp of wiki-reality often seems tenuous. Work it out - the few lists the article mentions total nearly 800,000 articles. There were only 3,349 editors making 100+ edits on en:wp in September 2011, most of whom won't abandon their usual editing for this. The correlation between tagged articles and articles with actual problems is pretty low in both directions - many of the worst articles aren't tagged & many tags are dubious. Johnbod (talk) 00:05, 2 November 2011 (UTC)
We already have a very large number of wiki-insiders contributing many hours' work for free. Offering to pay a tiny minority of them for that same expertise is unlikely to be the best way to solve any major problems, I think. If we need to hire employees to do jobs that aren't currently done by Wikipedians then the hiring mechanism should concentrate on the skills needed for that job, although proficiency in editing Wikipedia will often be a useful prerequisite. bobrayner (talk) 14:52, 2 November 2011 (UTC)
But is it the "same expertise"? Your own talk page says "I rarely touch articles related to my work", & I and many other editors are just the same. You suggest "proficiency" might be a "useful prerequisite" (what would a useless prerequisite be, I wonder), but more experience than that is apparently a bad idea. It's hard to see why. Fortunately some recent hires by WMF suggest they don't share your views, but the senior levels are still dominated by people with backgrounds in areas that were considered comparable to Wikimedia in a broad way, but actually are very different. Johnbod (talk) 16:07, 2 November 2011 (UTC)

To comment on the Wartungsbausteinwettbewerb: there is already a WikiCup; it ended today. 124 people signed up for 2011, and many more of us are likely to sign up for 2012; I know I will. Its purpose is mainly getting featured material. Maybe we should argue that there should be some lesser reward for getting maintenance tags removed. If it'd get passed, I'd not only take the challenge, but I'd have a chance of making it past round 1! Everybody wins! hewhoamareismyself 23:13, 1 November 2011 (UTC)

  • The English Wikipedia is in great shape. If you think 250,000 unsourced articles out of 3.8 million are a problem, then you should try and take a look at other language versions of Wikipedia. Sure, it sucks that not all articles are perfect and that a lot of them have been tagged for half a decade, but people should not panic. It is going to be better with time. I can only speak for myself, but I don't like to drop in on a completely random topic about which I know nothing, and then start writing. I have just decided to work on one article at a time. --Maitch (talk) 23:29, 1 November 2011 (UTC)
    • Very true, context is key. I went on French Wikipedia and hit random article for 3 minutes before I found a sourced one. But I think we need to be able to fix up our own house before we can help out the other languages. -- Preceding unsigned comment added by Hewhoamareismyself (talk · contribs) 01:37, 2 November 2011
      • Agreed. Also, the downward comparison is never a good idea. Of course other projects have worse problems in some areas, but is that the bar we're setting for ourselves as a project? Not being the worst? --195.14.206.143 (talk) 01:58, 2 November 2011 (UTC)
        • The argument shouldn't be "We're really good, so this isn't a problem", it should be "We're really good, in spite of this problem". The first wording is tragically predominant in everything from Wikipedia to corporate boardrooms to national political discourse; all that it does is hide things by contextualizing them. The statement "The rest of my body is fine, so this failing liver isn't a problem" immediately seems wrong and raises red flags, and yet it uses the same argument, right down to the word structure, as "The rest of Wikipedia is fine, so a quarter million unsourced articles isn't a problem", which seems to be eliciting less alarm, and even less recognition that there is a problem inherent in the statement. We should adopt the thinking "We're really good, in spite of this problem", because rather than discount the problem, that wording acknowledges it as being an issue. This subtle change makes all the difference, because it throws open the door for that problem to be addressed. Sven Manguard Wha? 04:07, 2 November 2011 (UTC)
          • You're almost overthinking it imo. From a "PR" angle (i.e. getting as many people as possible to help with the cleanup), that may be good and sadly even necessary. But the current situation should really never have happened in the first place. This whole backlog problem almost reminds me of the financial crisis in a way. All these tagged articles are a bit like IOUs. They're a stain. Far from "We're really good, so this isn't a problem", and even from "We're really good, in spite of this problem", I wonder how good we really are for letting this get so far out of hand in the first place. But like I said, your comment makes sense, from the perspective of motivating others to help with the backlogs. --195.14.206.143 (talk) 04:27, 2 November 2011 (UTC)
            • I am not saying it is not a problem. I would just like to argue that just because an article is unsourced, doesn't automatically make the article bad or untrue. If you take a look in a paper encyclopedia, then you won't find a reference for each written sentence. The very idea that Wikipedia is a work in progress and that anyone can contribute is what made Wikipedia a success. These days most of the "problem" articles are really esoteric topics. Surely we can do better and surely we will. Article creation drops steadily, which frees up time to work on those we already have created. Just don't panic! --Maitch (talk) 11:15, 2 November 2011 (UTC)
            • In addition, I would like to point out why I choose to go all the way to GA instead of just cleaning a bit up. Let's say I pick a completely unsourced article to work on. If I add 2-4 references, then it will be tagged with "needing additional references". If I add 10-20 references, then it will be tagged with "citation needed" all over the article. I may as well go all the way, so it won't be tagged again. It is a slow process, but you can see the ratio of FAs and GAs going up versus the rest. --Maitch (talk) 12:04, 2 November 2011 (UTC)
"Article creation drops steadily, which frees up time to work on those we already have created" -- Isn't that a bit of a non sequitur though? You're implying that the rate of article improvement is rising at the same time that article creation is dropping. But that is not the case.
Also, nobody is "panicking" and I for one resent that implication as well. We're addressing a veritable problem and thinking of possible solutions.
It's good that you're taking the time and investing the effort to go for GA, but many "articles" don't even resemble an actual article yet (think TV episodes) and many editors are simply not able to or interested in that level of work. Getting some of them to help make those articles meet the bare minimum requirements of our core content policies would be a gigantic step forward. --213.196.210.71 (talk) 13:01, 2 November 2011 (UTC)
Maybe it is because I have been here since 2005 that I have a different perspective on things. In those days most people were working on creating new articles and very few worked on improving the articles themselves. A "brilliant" FA had 20 references and most of Wikipedia was unreferenced. These days, it takes 200+ inline citations to get an article to FA standards and a "bad" article has 30-40 references. Article creation is dropping (just look at the statistics page) and the overall quality of articles is getting better. It only makes sense that people have time to improve articles now that we've pretty much got a page for everything.
Now back to my point. I don't mind people working through backlogs in order to source articles, but usually if you just pick some random article and add a few sources, then it will likely be tagged for something else (additional references, inline citations, etc.). It is better to pick an article that you are knowledgeable about and fix all the problems with the article. Otherwise, you are just pushing the article over to another backlog. Finally, I don't consider 250,000 unsourced articles a failure. It is a great success and will improve over time - you just have to accept that things don't improve rapidly on Wikipedia. --Maitch (talk) 14:38, 2 November 2011 (UTC)
Not only are they not improving at all, it's getting worse over time. I've been here since 2006, and I strongly resent your appeal to authority (i.e. greater experience) to silence valid concerns about the project. If you don't feel anything needs to change, then go about your business, but please stop trying to convince people who are at least as knowledgeable as yourself about the inner workings of the project that everything is fine and dandy. Just consider the rhetorical devices you've employed in your comments here: First you tried to paint those who voice any concerns as "panicking", now you're claiming you have a more relaxed perspective due to more experience. Turns out, you have neither. And clearing the backlogs isn't all that's required either. We need substantial reform, maybe even in the form of a board of chief editors. Not "right now" (as in "panicking"), but things shouldn't go on like they have. Back in the day, the proportion of clueful editors was much greater than it is today. They didn't need hard-and-fast rules; their greater average wisdom enabled them to make clueful judgments about when to ignore a rule and when to follow it. The same is not true for the bulk of editors these days. They need rules, if only to prevent them from making up their own. If your years of experience on Wikipedia have taught you anything, it should be that. --87.79.231.188 (talk) 15:25, 2 November 2011 (UTC)
Ok, I will stop commenting now. I don't have any authority over anybody and if I gave someone that impression, I apologise. I think I have made some valid and consistent points and I can see this discussion completely derailing. Cheers, --Maitch (talk) 15:48, 2 November 2011 (UTC)
Yet another attempt of yours at stifling the discussion. It isn't "derailing". You are deliberately attempting to derail it. You did not make any valid or consistent points. There is a problem with the general way we're handling many things on Wikipedia. If you do not see that, you are part of the problem. --87.78.44.235 (talk) 16:30, 2 November 2011 (UTC)

Working on success

The unreferenced BLP issue eventually got going after a date was established after which certain things were no longer allowed; it eventually was a massive success, clearing over 60,000 unreferenced BLPs. I do think this can be extended to other new article content, and there would be agreement to work forward on clearing backlogs in this fashion. Regards, SunCreator (talk) 18:53, 1 November 2011 (UTC)[reply]

It's a bit like emptying a massive pond. People are not motivated to use a bucket to take water away when they see new water coming to fill it up. But once you stop any new water entering, the action of removing water gives positive feedback to those involved. In that way I reckon that some sort of system to quickly delete (via a prod) articles created after a certain date with no references would prove beneficial. Regards, SunCreator (talk) 19:00, 1 November 2011 (UTC)[reply]
True, the size is an intimidating factor, but at the same time, few backlogs fill up at a rate of more than 15 or 20 items a month. If someone were to decide to choose a specific backlog and clear one item each day from it, that backlog would see a net decrease over time. Sven Manguard Wha? 04:16, 2 November 2011 (UTC)[reply]
But that is not motivational. Why start clearing when someone can come along and create more? That process never worked on BLPs until a WP:BLPPROD process was put in place. If you are going to ignore motivation then don't expect others to join in backlog clearing activity. Regards, SunCreator (talk) 23:43, 3 November 2011 (UTC)[reply]

Citation needed

Editors creating, especially mass-creating completely unreferenced placeholder "articles" in mainspace should simply be banned in droves. That would be one very efficient way to create an incentive for people to write properly, or not at all. --195.14.206.143 (talk) 00:30, 2 November 2011 (UTC)[reply]

There's nothing wrong with creating articles on subjects the Wikipedia community finds automatically notable (such as animal species, places, landmarks, etc.). It would be beneficial if they had references in the first place, but it's the existence of them that's more beneficial, since they can then be worked on by other editors. SilverserenC 02:26, 2 November 2011 (UTC)[reply]
In an ideal world, it might work like that. In reality, that's just wishful thinking. In practice it works more along the lines of the broken windows principle. Bad substubs attract careless additions and even discourage more skilled writers to waste time sparring with subpar editors who are mostly unfamiliar with policies and the MoS, and therefore many of these articles stay bad forever. Articles should be prepared in user subpages and moved only to mainspace when they meet the minimum requirements of our core content policies. --195.14.206.143 (talk) 02:56, 2 November 2011 (UTC)[reply]
I agree with the Anon. We need to require refs from the beginning. No reflist with at least one formatted ref ---> auto-reject new article. Here's a controversial proposal: No new lists until all articles are referenced. -- Ssilvers (talk) 04:10, 2 November 2011 (UTC)[reply]
I'm not sure what one has to do with the other, Ssilvers. My controversial proposal has for a long time been 'delete all unreferenced articles immediately'. I get lots of moral support for that one, but the prospect of actually getting community consensus for it is slim. Sven Manguard Wha? 04:19, 2 November 2011 (UTC)[reply]
Oh, the two things have a lot to do with each other. Ssilvers does raise an important point, an idea I've also had for some time now. The culture of senseless page creation invites the exact behaviors and patterns that lead to the current situation. A moratorium on the creation of new articles (i.e. creation of new pages in mainspace) for at least a week at a time every now and then would actually be a great way to wake people up to the problem. At 6,928,458 articles, Wikipedia could survive for a week without new mainspace pages being created. Also, I maintain that sanctions of some form are needed against the people who create such a mess in the first place. Not only do they leave a mess, they make Wikipedia as a whole less attractive not just for the readers, but especially for more skilled writers. --195.14.206.143 (talk) 04:36, 2 November 2011 (UTC)[reply]

I am an ordinary untrained Wikipedia user and editor. Occasionally I have been persuaded to become involved in a more systematic way - and it has never come to anything. This issue of backlogs is a good example. I expect my experience is fairly typical (whereas my admitting it is unusual and foolish). Here goes:

I decide "ok, let's see if I can help here". The first link in the essay is 250,000 articles that need additional references. Right, that sounds promising, maybe I can help. I follow the link, and find a further link Category:Unreferenced Genetics articles‎, a field I may be able to help with. Then a further link Talk:Dominance (genetics) - I understand dominance, I ought to be able to help. I find myself at the talk page, which does not obviously complain about lack of references, though I do not read every sentence there. Instead, I visit the article itself, and go to the reference list. It has plenty of references. WTF? I came here to deal with a lack of references, and it has plenty. Ah well, I have more constructive uses for my time. Maproom (talk) 13:21, 2 November 2011 (UTC)[reply]

Well it doesn't have that many refs, & their format & style are certainly sub-standard, and the links need updating. Many are bare links, & others not inline, just sitting at the bottom. I very often just remove tags when I think they are unjustified, but not here. Johnbod (talk) 13:40, 2 November 2011 (UTC)[reply]
To the extent that the issue is only a matter of format and style it should not justify a tag. It's just a matter of re-organizing the material. The person complaining that it does not conform with his personal vision of format and style can go ahead and fix it himself. Eclecticology (talk) 22:50, 4 November 2011 (UTC)[reply]

I used to do occasional work with the 'articles needing help' - then the system was changed to produce the looping around set-up described above ... and I am sure there are far more than 25 history pages requiring cleanup. As for references: not all articles need them directly - for example those developing particular aspects of a main topic - thus Pal Maleter and the Hungarian Revolution of 1956.

  1. Possibly a reason for some low-activity areas is that those interested in the topic have moved to area-specific wikis elsewhere in the wiki(a)verse, and those of us with only a passing interest in motorbikes (to borrow the above example) only want to know 'X is a big motorbike and Y is a weedy one' (so don't require references).

A suggestion I have made in the past - with the 'left hand column links' have an entry 'Random page needing clean-up' (which picks up on articles with any of the relevant tags). (I know the answer will probably be along the lines of 'involves far more work than seems obvious to the person suggesting it.') Jackiespeel (talk) 15:39, 2 November 2011 (UTC)[reply]

Uncited medical statements should be treated like BLPs

"My controversial proposal has for a long time been 'delete all unreferenced articles immediately'. I get lots of moral support for that one, but the prospect of actually getting community consensus for it is slim." OK, so how about this for a lesser starting place? Uncited BLPs were treated differently than other uncited articles because of their potential for harm to living persons. Let's do the same for medical statements. Uncited medical statements can cause as much harm as uncited BLPs-- let's shoot uncited medical articles on sight, and empower editors to delete uncited medical statements anywhere as easily as they can uncited BLPs. At least it's a start; in the medical realm, no info is better than bad or dangerous info, and citing medical statements correctly requires knowledge of and access to higher quality sources. SandyGeorgia (Talk) 13:58, 2 November 2011 (UTC)[reply]

Except that we don't immediately delete old BLPs. We have BLPPROD for new BLPs that are made, but that's not even instant, and you can't apply the prod to things made before it was made. Maybe we should start a new article drive, like was done for unreferenced BLPs, get a watchlist notice going and everything. It worked really well before, we got rid of all of them. SilverserenC 14:17, 2 November 2011 (UTC)[reply]
OK, that-- but we are empowered to immediately delete BLP text that is uncited-- we should be able to do same on medical statements-- they can cause as much or more harm as faulty BLP statements. SandyGeorgia (Talk) 15:41, 2 November 2011 (UTC)[reply]
I think this is a really good idea (subject to Silver's comment). Just as with BLPs, we have a wide range of medical content - some whole articles, some mentioned as part of a nonmedical article (ie. the article on some obscure herb may mention its use as a folk-remedy) - of varying quality. Some are good or outstanding, but many others contain false or improbable claims which have genuine potential to cause real-world harm (typically by distracting readers from evidence-based medicine). Wikipedia, and the public, could really benefit from a firmer response to unsourced medical claims; deleting unsourced medical claims on the spot is the obvious answer but maybe there's potential to tweak one or two other processes... something like BLPPROD could be applied for any article which is primarily about a medical treatment. It should be possible to make it retroactive if the community wants that.
Wikipedia really needs to shift from quantity to quality, but it has to be incremental; sadly we can't do it all in one great leap forward. Setting higher standards for BLPs was an important step forward; higher standards for medical claims could be another step forward.
On another front, I dislike the mass-creation of location articles, leaving us with hundreds of thousands of microstubs based only on a single row in an indiscriminate and inaccurate database, and even more that have automatically-added content from censuses &c which is quite unreadable to a normal reader. However, there seems to be strong feeling that any inhabited place is notable by default (personally, I don't think the GNG supports that notion) so we're not going to improve quality of most location articles any time soon. bobrayner (talk) 15:48, 2 November 2011 (UTC)[reply]
I think it is a good idea to treat medical articles the same way we currently treat BLPs. --Maitch (talk) 15:52, 2 November 2011 (UTC)[reply]
Exactly. We shouldn't be trying to say that it all needs to be fixed right now. That's just going to result in the loss of a vast swath of content. Instead, we should be taking steps to fix and improve such articles. I, personally, actually like the idea of a new article PROD for unreferenced articles, but it's something that has to be done slowly. Starting in medical articles seems like a good idea, for new medical articles. And we can start a drive to fix up all the old medical articles that are still unreferenced. Deal with it all a step at a time, not all at once.
And the place notability thing has to do with the Five Pillars and What Wikipedia Is, where gazetteer is one of the definitions, which means places are notable. That trumps the GNG and all of WP:N, which is just a guideline, not a policy. SilverserenC 16:07, 2 November 2011 (UTC)[reply]
What WP:WIS actually says is that "Wikipedia incorporates elements of...gazetteers," among other things. And gazetteers don't necessarily list "any inhabited place" within the area covered; it's not abnormal for them to only list those of a certain size. So yes, *some* places must be notable, but that doesn't necessarily extend to any inhabited place. Claiming this as a "trump" card is Wikilawyering at its most absurd, and makes you look risible. Choess (talk) 08:08, 3 November 2011 (UTC)[reply]
The community has asserted time and again that what gazetteer means within Wikipedia is that as long as a single reliable source can be found that proves the existence of the place, then it is notable. If you want it to mean otherwise, then you'll need to get Wiki-wide consensus for your viewpoint. SilverserenC 16:42, 3 November 2011 (UTC)[reply]
"Encyclopedia" and "gazetteer" are terms that define what fields a particular work will cover (people and things, places), but not the scope within that field. Our practice with placenames is to consider those documented by a single reliable source to enjoy the presumption of notability under the general notability guideline; the community has extended this presumption to a variety of subjects, in addition to places. Community consensus regarding the burden of proof for notability of any of these subjects could change without affecting or being affected by WP:WIS. I don't think the current consensus on places is likely to change anytime soon, but the status quo on places doesn't enjoy any special protection in policy beyond that afforded to any topical notability guideline. Choess (talk) 23:57, 3 November 2011 (UTC)[reply]
Actually, the clinical medical articles are in remarkably good shape, and I think you will find very few of them without references. I just checked 20 from among the stubs and start class articles, and found only one without sources--an uncontroversial anatomical definition. I did find 2 or 3 which I think needed more sourcing, but none were specifically of direct clinical significance. Considering I was deliberately looking at the lowest levels, I consider this excellent performance. For one thing, they are very easy to reference at least to a basic level; there are excellent authoritative sources at a popular level: Merck, NLM, Mayo. For another, there are a number of medically qualified people who actually do watch over these articles, especially the ones that relate to practical applications where people might go to for immediate answers affecting their health. (And there have been professional studies of our quality in this field, which have found it generally excellent.) Most other fields here would do well to imitate this quality. I think raising this topic here is a classic example of Imaginary Problem Designed to Cause Alarm--just as were the unsourced BLPs, which found an infinitesimal number of dangerous articles--if any.
I do see pervasive problems--the worst is our failure to have a consistent plan for revising articles--a very high percentage of them have out of date information and statistics. (This of course was a major failing of print encyclopedias, but we have the technical ability to do better.) And there is a problem with little-seen articles: not primarily error, but promotionalism. DGG ( talk ) 04:18, 4 November 2011 (UTC)[reply]
There are three problems with your sample: 1) small sample; 2) you checked only stub and start; and 3) you checked only clinical medical articles. You would find more problems if you looked at more developed and popular articles, which are subjected to wider editing, and if you went outside of the realm of clinical articles that are typically edited by and watched by our Medicine Project-- I said "medical statements", not just medical articles, and we find incorrect anecdotal unfounded medical statements in all sorts of articles. The most noticeable problems in fact are in the realm of neuropsychiatric disorders. Further, Merck and Mayo are not high-quality sources, and I've encountered instances where they were simply wrong (wrt Tourette syndrome and others), and doing the research to correct these articles from high-quality sources is not easy at all (particularly if one doesn't have journal access-- I wouldn't take my time to source an article to Mayo or Merck since they have known errors). Again, editors should be empowered to delete uncited dubious medical statements anywhere, just as we can with BLPs.

Here's a more typical mess (C-class), with all kinds of citations that are misleading or misused, and uncited statements: Medical cannabis. It would serve our readers better if it were one-third the size, and cleaning out the garbage would make it easier for experienced editors to source it correctly. SandyGeorgia (Talk) 19:06, 4 November 2011 (UTC)[reply]

I don't see anything screamingly terrible in that article, and cutting 2/3 out would make it about 2/3 less useful. It might be reorganized a bit, say, with the summary style review of the individual compounds split out into a different article about cannabinoids, but I certainly don't see the need to throw away big chunks.
In general, the push to "sanitize" BLPs has been Wikipedia at its worst, undemocratic, blustering, full of fear uncertainty and doubt about libel lawsuits, citing only a handful of short-running pranks of little overall importance for its justification. I see absolutely no reason at all to allow that to metastasize any further. Wnt (talk) 03:58, 1 December 2011 (UTC)[reply]

The negativity

I'm seeing a fair number of negative (though realistic) comments above. One backlog that looks relatively easy to tackle is Category:Persondata templates without short description parameter. It is very large, but for most articles, it requires only reading the first paragraph and compressing it to a few words (while also checking the other persondata fields). Chris857 (talk) 02:31, 3 November 2011 (UTC)[reply]

  • What is the purpose of the Persondata template? Is the information that it contains displayed with the article? - I cannot recall ever seeing a "short description" of a person in an article. Is the effort to "fix" the current 724,700 articles worth making? Downsize43 (talk) 04:40, 4 November 2011 (UTC)[reply]
  • Did one - unlikely to get to the rest any time soon. Suggestion: Write a bot to copy the first sentence of the article (or the first x characters thereof) as the "short description", AND to add a new item to the article's talk page explaining what has been done and how to improve it if desired. This should inspire some watchers to consider whether any improvement is warranted. For those not being watched, or left in a "messy" state by their watchers, the result is still a vast improvement over the present situation. Downsize43 (talk) 02:28, 5 November 2011 (UTC)[reply]
    • Agreed. The extraction of persondata information is similar in some ways to that in infoboxes (but not all articles have or should have those). The production of lists is useful, and especially name disambiguation pages. One big effort I'd like to see made is to have a systematic process in place to keep the name lists updated, as it is still distressingly common to find these lists incomplete. Many readers might read these lists and think we don't have an article on a person, when in fact we do, but they are not listed there. The persondata short description could be used to generate such lists. If someone could program a bot to keep hndis 'PAGE NAME (disambiguation)' talk pages updated with a list of all candidate pages not already on the disambiguation page, that would be an immense step forward. Carcharoth (talk) 03:39, 5 November 2011 (UTC)[reply]
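
The bot suggestion above (copy the first sentence, or the first x characters, as the short description) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `short_description` and the 100-character default are invented here, the sentence splitting is deliberately naive (an abbreviation such as "b. 1950" would end the sentence early), and nothing in it reads or writes actual wiki pages.

```python
import re


def short_description(article_text: str, max_chars: int = 100) -> str:
    """Return the first sentence of article_text, cut to at most max_chars.

    Hypothetical helper for illustration; not part of any existing bot.
    """
    # Collapse line breaks and repeated spaces into single spaces.
    text = " ".join(article_text.split())
    # Naive rule: a sentence ends at the first '.', '!' or '?' that is
    # followed by whitespace or the end of the text.
    match = re.search(r"(.+?[.!?])(?:\s|$)", text)
    sentence = match.group(1) if match else text
    if len(sentence) <= max_chars:
        return sentence
    # Too long: cut at the last word boundary inside the limit.
    cut = sentence[:max_chars].rsplit(" ", 1)[0]
    return cut + "..."
```

Any result would still need a human glance, which is why the suggestion pairs the bot edit with a talk page note explaining what was done.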

An idea: tie it to WikiProjects

Specifically on the {{unreferenced}} backlog, I'm wondering if there would be any merit in having WikiProject-based unreferenced backlog categories. I'm not that interested in going through and finding references for articles on topics I'm not that interested in to start with, plus I'm not very good at it. But on specific topics like philosophy, I've got access to books, databases, libraries etc.

Anyone want to make a bot that goes through the references-based backlogs and adds them to categories based on their WikiProjects? I had a crack at doing it with CatScan, but it took so long that my university network's TTL kicked in and stopped the connection.

Surely, WikiProjects could take some collective responsibility here? Instead of crowing about how many GAs and FAs they've got, crow about how they've got a really small unreferenced backlog. Might work. —Tom Morris (talk) 12:30, 3 November 2011 (UTC)[reply]

My advice is to go to the Books namespace, find a book on a subject that interests you, and then go to that book's talk page. In that talk page will be a report that lists all the articles in that book and many of the problem tags those articles have. The report at Book talk:Philosophy shows a dozen articles have citation needed tags, and many articles have other tags. Sven Manguard Wha? 13:07, 3 November 2011 (UTC)[reply]

Ah, I've found a way of doing it using the article lists tool on Toolserver: here is the page for Philosophy. There's 831 articles there. That's far more manageable for me to start plugging away at than 250,000! Heh. —Tom Morris (talk) 13:33, 3 November 2011 (UTC)[reply]

This is one of the things we did for the uBLP cleanup; if we choose a particular backlog to target, then using DashBot to notify the wikiprojects is practical and useful and has some effect. DashBot has a table of 700 or so Wikiprojects, which is one of the key ingredients for such a drive. ϢereSpielChequers 15:54, 4 November 2011 (UTC)[reply]

Criticism

Only a few weeks ago we finally saw the uBLP backlog cleared after an 18 month focus on it, I think the essay should have acknowledged that a little more strongly. The downside of focussing on some backlogs is that others may sit for longer, if we'd chosen to focus on the oldest tagged articles or the oldest unreferenced articles then those backlogs would have seen progress - instead we as a community had a focus on the uBLP backlog and I suspect some others grew whilst that was resolved. Providing we choose wisely there is some logic in focussing on a particular backlog. But if you do that you need to be robust about accepting backlogs elsewhere - of course whilst we were focussing on the unreferenced BLP backlog others were welcome to work on the other backlogs. But the prioritisation of particular backlogs means the deprioritisation of others, which is why in my view if we choose another backlog to prioritise it needs to be the backlog we anticipate having the highest proportion of troublesome articles.

My favourite two are death anomalies and negative BLP statements as I think they are high risk, and consequently I spend some of my time there. I also spend a lot of time fixing typos because that's what I happen to enjoy, but I'm not proposing that we treat that as a priority. However I'm very much aware that we've made huge progress on the death anomalies in the last year or so, and on at least one quality measure things have greatly improved as typos are much harder to find than when I started out. None of those three involve templates, and in my view part of our problem is that a few years ago we shifted from fixing problems to templating them for hypothetical others to fix. I'm not convinced that the growth of templating was that positive a thing, and if there was one easy way to reduce backlogs I'd suggest it would be to reduce or automate some of the templating and detemplating and try and persuade some of our templaters to improve articles instead.

One aspect of the essay that I would strongly dispute is the effect of backlog cleanup at RFA. I've nominated half a dozen successful candidates in the last year or so, and spoken to many others who were considering running. I'm rarely a defender of the RFA process and normally one of the first to consider it broken, but in its respect for people who've cleared backlogs I think RFA has functioned well. I'm aware of a number of candidates who went to RFA during the uBLP cleanup with experience that included clearing uBLP backlogs for a particular wikiproject. Far from RFA not valuing such backlog work I don't remember any such candidate failing. Now that was a backlog where one had to hone two key skills, reliable sourcing and knowing when you couldn't save an article and had to prod or AFD it instead; I'm not convinced the community would be so supportive of someone who had cleared x hundred articles from our "needs an image" backlog - they'd probably need to demonstrate other skills/achievements as well. But in my conversations with people who'd just failed an RFA I often encouraged them to pick a wikiproject that interested them and resolve their uBLP backlog; some who took my advice have run again and are now admins. To me that was for some people the sort of task that separated those who just needed the right sort of experience from those who needed the right aptitude or motivation.

Another issue is that the essay needs the context that standards have risen as has average quality; many articles that were considered perfectly OK when written have now joined the backlogs as we seem inexorably to be drifting from a policy of verifiable to a policy of verified. Raising standards means that backlogs are pretty much guaranteed, as each increase in standards means a proportion of our 3.78 million articles immediately drops from meeting the old standard to not meeting the new one. That isn't an argument against further raising standards, let alone a call for us to lower our standards back to earlier levels, just a practical point that explains much of the backlog.

So whilst I don't agree with everything in the essay I'd like to thank Sven for writing it and for bringing the backlogs issue to the fore. In terms of next steps I would suggest that it is helpful to have a community focus, but let's do it without the negatives that were associated with the uBLP project. If the community is going to focus on a backlog we need the choice of backlog to be a community decision, we need to be robust in defending that focus against those who want to shift the focus back onto their own pet priorities, and we need to avoid contentious ultimatums - a carrot based process not a stick one. There was a suggestion earlier on that we focus on high traffic articles, I can see much merit in that, but would counter that if anything those are probably less likely to contain headline grabbers than either death anomalies or negative BLP statements. Wouldn't it be great if by the middle of 2012 we could confidently say that every mafioso, pornstar or prostitute we have written about is either long dead, clearly a fictional character or reliably sourced? Currently there are loads where the article itself isn't a uBLP but it contains such information. ϢereSpielChequers 17:02, 4 November 2011 (UTC)[reply]

This piece has received a lot of criticism, and much of it is good, thoughtful criticism. While I can't say that I'm exactly thrilled with all the comments that have been made here, I'm glad that there is a robust discussion, and I am quite thrilled that so many people, many of whom I've never crossed paths with before, have come out of the woodwork to point out that they perceive Wikipedia as being in a whole lot better shape than I make it out to be.
Now that I think about it, when I first checked WP:GBD some six months ago, there was a substantial (maybe 50,000 item) backlog of unsourced BLPs. It's down to under 200 now. I don't spend very much time at all editing or even reading about living people, it's not an interest of mine, so I actually didn't know that there was a coordinated effort to tackle unsourced BLPs. I agree that it certainly bears mentioning, and I probably would have included it had I known about it when I wrote this.
As for the other points raised, while I acknowledge that improvements are being made and quality is increasing, I am sticking to the basic assertion that the backlogs constitute a major problem. I suppose you could say that I want to hold Wikipedia to a higher standard than many are willing to hold it to, or at least what many believe that it is practical to hold it to. I don't doubt that WereSpielChequers believes what he says about how backlogs are viewed at RfA, but I don't think his view is accurate. Perhaps uBLPs really are valued there, but if that is the case, WereSpielChequers has found the exception, not the norm.
I look forward to reading the continuing discussion taking place on this page. Keep it going! Sven Manguard Wha? 19:51, 4 November 2011 (UTC)[reply]
I'm not sure that anyone has disputed either that backlogs exist or that backlogs are a problem. Where I think we disagree is in whether or not there is overall progress being made, and in what the solutions are, including for example whether the community should prioritise particular backlogs or simply reaffirm that our self organising structure seems to be working. The importance of including typos, unreferenced BLPs, Category:Uncategorized pages and death anomalies in the mix is that these are four of the areas where we have seen major progress in the last year, in particular with unreferenced BLPs where the referencing or deletion of tens of thousands of articles was a massive amount of work. Leaving out the areas where we have made huge progress risks leaving people with the false impression that we are not making overall progress with the backlogs. As for RFA and its attitude to people who work on the uBLP backlog, I'd rather not name particular editors without first checking with them, but off the top of my head I can think of four successful RFAs in the last 12 months where the candidate had taken part in the uBLP cleanup and at least two RFAs which have tanked at least in part because the candidate had a poor record re uBLPs. I don't personally involve myself in copyvio cleanup but that is another area which the community prioritises and I've seen RFAs swan through or tank because of the candidate's record in that area. I can't recall anyone failing RFA because they worked on clearing backlogs, though I'm sure an otherwise underqualified candidate could fail despite having done useful work on a backlog. As for "holding Wikipedia to a higher standard", no-one is arguing that we lower our standards and for example return to the era where an FA didn't need inline citation or people could get Autopatroller rights despite creating unreferenced BLPs.
Much of the backlog exists because we have repeatedly decided to hold ourselves to higher standards, and each time we decide to raise our standards we turn some previously acceptable articles into parts of the backlog. There's also a problem in measuring backlogs by the number of articles with a particular template, aside from the issue that not everything that qualifies for a particular template may actually have been templated, the templaters rarely check back to see whether the template still applies, and many newish editors fix problems and improve articles but leave it to others to decide if their improvements are sufficient to justify removal of the template. ϢereSpielChequers 01:15, 5 November 2011 (UTC)[reply]

Additional discussion

Thanks for this opinion piece. There has been some additional discussion on the wiki-en-l mailing list, best seen by starting here (there may be related discussion in the original thread as well). The points raised there (so far) appear to be:

  • Mis-tagging or over-zealous tagging can occur (this makes the size of backlog misleading)
  • Tags are not always removed by the person fixing the problems (see OTRS point in particular)
  • It has been suggested that readers be invited to remove the tags if they see no problems
  • It was also suggested that those adding the tags should check back at some later point
It seems there is a bit of a disconnect between those who think tagging is something that is a useful way to create workflows for others to deal with problems, and those who think that people should (eventually) fix the problems they find, or at least help manage the backlogs. With that in mind, does anyone know how much tagging is or has been done 'by rote', or even using bots? If the pace of tagging has always exceeded the pace of cleaning up, that would be one reason why the backlogs are so large. It might not just be because the cleaning up is slow, but it might also be because the tagging is much faster. It might also be an idea to do random sampling to see how accurate the tagging is. Certainly, tags that are years old might be outdated if standards have risen/changed in the interim period. So there is a case to be made for tags that are years old being themselves tagged as needing at least checking by those who do tagging, even if those doing such checking don't feel able to fix the problems identified.

About prioritising, it is essential to do the right stuff on a page first. There is no point wikifying, dealing with dead external links, categories, orphan pages, etc, without some attention being paid first to issues such as notability and copyvio. The priority should be to get articles on a solid footing and foundation before dealing with the other stuff. This is also why I'm reluctant to help out with wikignoming activities such as wiki-linking. I'm prepared to do this for articles that have been vetted and passed to a minimum standard, but I don't want to do such work if the article may be deleted as a copyvio or non-notable. So if the backlogs could be sliced and diced in that way (i.e. put everything through the basic checks first, and then pass them to the other queues), it would help immensely. Carcharoth (talk) 03:31, 5 November 2011 (UTC)[reply]

I agree with the first part of what you say but not the second. Re the first part. In my experience a high proportion of tagging is done at newpage patrol, those taggers almost never watch the article to see if the tags cease to be relevant, and because many are adding a full set of tags they often take shortcuts - the biggest and worst being to use {{Unreferenced}} when they mean {{refimprove}} or {{selfpublished}}. It is rarely constructive to tell a newbie that an article with a poor reference is unreferenced, especially if done by templating the article. It is important to explain neutral sourcing and that self-published sources don't establish notability, but sadly a lot of taggers don't take the time to use the right template, let alone actually fix the problem. The only bot-based tagging that I'm aware of is the now-suspended CorenSearchBot with its wonderful ability to sniff out copyvio, and a bot that looks for the combination of unreferenced and category living people and upgrades unreferenced to unreferencedBLP. There may also be a bot that finds uncategorised articles, which could explain why so many old articles that show up as uncategorised have in fact been vandalised and then tagged as uncategorised.
As for the second part, I see categorisation of new articles as a key part of the process that sifts out the non-notables. I have no interest in sport and no intention of finding out which Martial Arts contests are notable and which are so notable that their annual finalists are notable. But as a categoriser I don't need to worry about that (at least not for martial arts, modelling, heavy metal and Croatia); I just have to accept that many of my categorisation edits will wind up in my deleted contributions. Personally, I'm comfortable that for some articles categorisation is a prelude to deletion, as it brings the article to the attention of those interested in that category and who are best placed to spot hoaxes and non-notables.
This leads us to some difficulty in picking a new backlog to prioritise in the way uBLP has been the focus for the last 18 months. Potentially contentious statements about living people, potential copyvios, articles with high readership, probably dead people and potential non-notables have all been suggested. Clearly I have a preference for some of those over others, but more broadly than that I have a preference for the community picking a particular backlog and running a collaborative campaign to fix it. The advantage of the community deciding to focus on a particular backlog is that you can then slice it by wikiproject and drop all the wikiprojects "their" relevant part (you also get people project tagging the talkpages of the part of the backlog untagged for any wikiproject, in the UBLP drive we wikiproject tagged at least 20,000 of the uBLP backlog). This combined with the sort of signpost coverage and RFA support that the uBLP drive achieved will bring in extra editing time, especially from those who prefer to be part of a collaborative team. ϢereSpielChequers 12:04, 5 November 2011 (UTC)[reply]
I agree that what you propose is excellent, and I hope someone will get a proposal up and running to select a backlog to work on (I see there is a collaboration for October focused on album articles without cover images in the infobox, which might suggest that some guidance is needed in terms of prioritisation). Though I maintain that copyright is still one of the most pressing problems, especially since CorenSearchBot stopped working (unless there is more news on that). I recently stumbled on George Butterworth (psychologist) and no-one who edited that article seems to have picked up the obvious problems. Carcharoth (talk) 17:37, 5 November 2011 (UTC)[reply]

Overtagging is not a significant problem. Any article with a tag that shouldn't be there is overwhelmingly outnumbered by articles that don't currently have tags but should realistically have several. DreamGuy (talk) 15:13, 5 November 2011 (UTC)[reply]

Mis-tagging is a better term, IMO. Overtagging implies that all the articles that need tagging have been tagged, and as you say, they haven't. But mis-tagging is a problem because it creates inefficiencies. It might give some people a feeling that they are doing something if they clear a 'backlog' by removing or fixing tags applied in error or updating outdated tags, but it makes it very hard to get a true handle on the size of the problems and where to allocate resources. Carcharoth (talk) 17:37, 5 November 2011 (UTC)[reply]
Overtagging is a problem because it bites newbies, adds errors to the pedia and disfigures articles. In the narrow sense of whether we rely on the tagged figure as a measure of the backlog, then yes there are some backlogs such as unreferenced and refimprove where the untagged that would qualify for the tag almost certainly outnumber the mistagged by a wide margin. But we need to remember that measuring the backlog is very much an internal function of interest to a small number of Wikipedians and those who study us. The primary role of these tags is to communicate with our readers and editors, so if any tag in mainspace is incorrect then that is a problem, and if thousands are incorrect then that is a serious problem. Another issue to be aware of is that measuring the tagged backlog can give you a completely false idea of the scale of the backlog and our activity in resolving it. In 2009 there were people concerned that the size of the uBLP backlog was quite stable and jumping to the false conclusion that little work was being done and little or no progress being made. In fact there were thousands of old uBLPs being tagged as uBLPs every month, so the untagged uBLP backlog was shrinking by thousands per month, but the referencers and deleters were barely keeping pace with the taggers. Eventually the trawl through mainspace for old uBLPs seems to have completed and subsequently the backlog fell by thousands per month - somewhat assisted by the new BLPprod process. But we need to remember that the number of records with a particular tag is not necessarily as good an indication of the size of the problem as a survey of a random group of articles would be, and we also need to avoid targeting the wrong stage of the process. One of the reasons why it was a mistake to target the tagged uBLP backlog in early 2010 was that the mainspace trawl was still incomplete, so we didn't know the true size of the backlog.
Targeting the tagged part of it meant that those who were trawling mainspace for untagged uBLPs were in effect adding to the perceived problem instead of shifting articles from the unknown to known part of the backlog. ϢereSpielChequers 01:00, 7 November 2011 (UTC)[reply]

Does anyone here think there is merit in the idea of modifying the tagging templates to make clearer to readers and editors that they can remove the tags if they think the problems are fixed? It would be something along the lines of HotCat, where a single click would remove the template. Or put a message on the talk page asking someone to look at the article and update the tag. Something like that would harness a lot more eyes to see if the tagging is actually accurate. Carcharoth (talk) 17:41, 5 November 2011 (UTC)[reply]

I certainly support this. The first time someone added a "cite" tag to material I had written, I added a citation, and was surprised that s/he did not then remove the tag. I concluded (wrongly, as I much later realised) that tags were meaningless, and should be ignored. Maproom (talk) 18:06, 5 November 2011 (UTC)[reply]
I like the idea of encouraging people to remove incorrect templates, but I'm not sure what the best way is to do that in the template without encouraging those who think that a link to a Wikipedia article or MySpace page is sufficient to remove an unreferenced tag. Perhaps one way would be to try and identify the active editors who are referencing articles and drop them a note along the lines of "hi thanks for your work on ******, I've removed the unreferenced tag, if you are going to make similar improvements to other articles please feel free to remove resolved tags yourself". ϢereSpielChequers 01:00, 7 November 2011 (UTC)[reply]

On allegedly failing our readers

There was a comment in the essay something like (not hunting for the exact wording) that we have a million articles we cannot guarantee the accuracy of and that we are therefore failing our readers. The other way to look at it is that we have a million articles that we have warned our readers to be suspicious of the accuracy of and on those million we are *not* failing our readers because we gave them an explicit warning about something that is implicit on all of our content. This whole site is made up of articles contributed by people off the street. Just because an article contains references it doesn't mean that it's accurate, free of bias, etc. We are never going to be perfect, and by not being perfect we are not failing our readers. If we were an encyclopedia with paid editors and contributors and charged people to buy it and had untrustworthy articles *then* we would be failing our readers. We don't owe our readers any more than what they can expect from a website that almost anyone at all is allowed to edit without first passing any test of intelligence, writing ability and objectivity.

Wikipedia has competing values: openness and quality, to name the big ones, are almost diametrically opposed. Allowing anyone to edit is great when you have no content and need to bring more people in, but it's bad when you already have content and don't want it to get worse. About the only thing preventing an unstoppable downward spiral is that thankfully the people who are less skilled in writing an article with any value are also less skilled at figuring out *how* to edit articles. With such a massive dump of largely worthless content on some of the lesser travelled pages, and with the site making it extremely difficult not only to delete articles but to make them stay deleted, if we have volunteers taking their valuable time going around tagging content as low quality, then they should be praised. Demanding that they then go and clean up all that content, despite the fact that there are no processes to ensure that anything fixed will ever *stay* fixed, is asking quite a bit.

If a parent saw someone else picking up a candy wrapper that their thoughtless child dropped on the ground and then berated that stranger for not coming onto the lawn, picking up all the broken toys, arranging them in the garage, washing the car and planting a flower garden, we would think that mother or father was crazy. Instead of complaining about the backlog maybe we should focus on what causes the problem in the first place and thank the people who do anything at all to improve things when the system is so hostile to taking steps toward ensuring real quality. DreamGuy (talk) 15:06, 5 November 2011 (UTC)[reply]

Citations are so complicated, they baffle the experienced

It's a poor wiki-workman who blames his/her/its tools, I know, but the editing interface is a friggin' nightmare when it comes to references; I've been editing since 2004 and I can't figure them out half the time. No wonder so many articles are poorly referenced - it's like real work. (I suspect this is part of the reason participation has dropped; a newbie excitedly clicks on the "edit" button for an article and stares at a screen of apparent gobbledygook that overwhelms the actual words, then flees in terror.)

I believe this is being worked on, and not a moment too soon. - DavidWBrooks (talk) 14:35, 7 November 2011 (UTC)[reply]

Sort of apologies

Dear folks, I want to say sorry for not putting proper sources for most of the articles I create and edit. It's my Achilles heel, I know it. Be sure that I always try to add relevant, true facts to the articles. In other words, I'm giving healthy food to this huge monster.

But I must explain why I do that. First of all, sometimes I feel it unnecessary to write them for each fact. Do I have to quote each time I write "this racecar driver won this championship"? I try to do moderately long articles (3-6 kB for drivers). Adding sources for each little fact takes even more time, whereas I usually put a couple of links to websites that have the little facts in subpages.

I prefer to spend my time trying to clear long lists like this one over others. It's a different way to contribute to Wikipedia. Is it fine with you? --NaBUru38 (talk) 01:13, 28 November 2011 (UTC)[reply]

I wish we had a way to cite ourselves

I know Wikipedia articles are generally a bad source. But (in keeping with summary style) I think a fair number of these unsourced articles summarize material from other articles they link to. Because the editors have just seen the citations at the articles they link, they don't feel the need to re-cite them in the new spot. This is especially true since when done properly, re-citing all the references can end up leading to a summary that seems overwhelmed by the number of inline citations it has inherited from a longer piece of text. I wish that there were a way that we could highlight/mark a Wikilink to serve also as a reference, only in the limited circumstance where the linked article is being summarized in brief. At least for purposes of taking articles off the backlog list. Wnt (talk) 04:04, 1 December 2011 (UTC)[reply]

Why I think there is this problem

Laziness. In my opinion, what happens is that people think "Why spend the energy fixing [whatever it is] when I can just add a tag and someone else can fix it?" Jay8g 01:23, 26 December 2011 (UTC)[reply]

Wikipedia:Wikipedia Signpost/2011-10-31/Opinion_essay