WikiProject Spam, revisited

From the archives

WikiProject Spam, revisited

MER-C was interviewed by Mabeenot for The Signpost's WikiProject report originally published July 18, 2011. We invited them to revisit the report and comment on any changes that have happened since 2011. In 2014 MER-C was given the mop in a unanimous RfA. We will publish MER-C's reactions, followed by the original report. –B

Well, this interview aged quickly. So what has changed? What does spam look like nowadays on Wikipedia?

Firstly, I don't know if linkspam in all its forms has increased or not since them. It is no longer economical for me to spend time pursuing it.

I spend my time dealing with undisclosed paid editing instead. UPE is an imprecise term. A better one is covert advertising – the insertion of advertisements that very closely mimic the format of legitimate encyclopedic articles written by volunteers. It is irrelevant whether disclosure is made per the Terms of Use because there is no indication whatsoever to the casual reader that editors have been paid for in both cases. A reader would need to check all of the page history, the talk page and the user pages of all significant contributors to the article in order to determine whether content is paid for. The disclosure requirement is therefore completely pointless for the casual reader.

The most obvious form of UPE involves the creation of articles that would not otherwise warrant inclusion. Long term contributors may remember when Wikipedia:Conflict of interest was titled Wikipedia:Vanity page. This is exactly the functionality these "articles" serve. Ghostwritten vanity pages are designed explicitly to show up on the first item and the sidebar of a Google search, but are difficult for Wikipedians to find and, if found, to evaluate the notability of their subject. Spam is less about Viagra or Cialis, and more about early-stage startups, businesspeople, motivational speakers, cryptocurrencies and so forth.

There are numerous companies that offer ghostwritten vanity pages for a small amount of money, typically a few hundred dollars. These companies employ freelancers in English speaking Third World countries who have very few opportunities for legitimate employment. In fact, similiar dishonest activities such as running a fake news website or writing for an essay mill turn out to be quite lucrative, in purchasing power parity terms, for the freelancers concerned.^[1]^[2]

The level of abuse is systematic, pervasive, and of increasing sophistication. The worst spammers have taken on characteristics of advanced persistent threats, including the use of compromised computers, VPNs and cloud computing infrastructure to post spam. There are no effective admin tools. Two new page patrollers, who screen newly created articles for notability and other problems, have been blocked for corruptly reviewing spam last week (Meeanaya and Ceethekreator). It is only a matter of time before paid editors systematically infiltrate the admin corps.

Much of the increase in spamming is a consequence of Wikipedia's own success. However, a large portion of the blame lies squarely with the Wikimedia Foundation. The WMF places significant emphasis in materials targeted at donors on crude metrics of content quantity and community size simply because that is what the WMF thinks donors want to hear.^[3] The WMF therefore faces incentives very similar to Facebook and Google. Social media sites tolerate a high level of bots, Russian trolls and spammers because fake accounts pad their key metrics of monthly active users and ad impressions, giving the illusion of growth and making them look good in the eyes of their customers (advertisers) and investors. Similar emphasis is put by the WMF (and Facebook) on outreach efforts in the poor countries that are the source of much of the spam, despite multiple past high-profile failures, again because the WMF thinks donors want to see desperate, impoverished people in sub-Saharan Africa being helped.^[4]^[5] A few extra vanity pages and sockpuppets certainly help the WMF look good in their pitch to donors.

The WMF does not sufficiently care about our admin tools being fit for purpose.^[6] Like Facebook, Youtube and Google before recent scandals, investments in content moderation are seen as purely a cost^[7]^[8] while "initiatives" that provide feel-good anecdotes for donors or increase donor-targeted metrics and hence increase donations are heavily prioritized. The WMF deserves nothing but utter condemnation and scorn for the complete lack of maintenance, let alone investment, in the code underlying the administrator toolset. A seemingly simple task such as adding a checkbox to the delete form that deletes the associated talk page requires nothing less than a fundamental rewrite of the relevant code.

The fight against spam is nothing short of an existential battle against the degeneration of this encyclopedia into a large set of vanity pages about attention-seeking subjects. And we're losing.

^ "Meeting Kosovo's clickbait merchants". BBC News. 10 November 2018. Retrieved 31 May 2019.
^ "The Kenyan ghost writers doing 'lazy' Western students' work". BBC News. 22 October 2019. Retrieved 23 November 2019.
^ "Wikimedia Foundation 2017-18 Annual Report". Wikimedia Foundation. Retrieved 23 November 2019.
^ Wikipedia:India Education Program
^ "Angola's Wikipedia Pirates Are Exposing the Problems With Digital Colonialism". Vice News. 23 March 2016. Retrieved 6 June 2019.
^ Don't take my word for it.
^ "Underpaid and overburdened: the life of a Facebook moderator". The Guardian. 27 May 2017. Retrieved 6 June 2019.
^ "Christchurch shootings: Social media races to stop attack footage". BBC News. 16 March 2019. Retrieved 6 June 2019.

Original WikiProject report – Earn $$$ free pharm4cy WORK FROM HOME replica watches ViAgRa!!!

By Mabeenot, 18 July 2011

This week, we spent some time with WikiProject Spam. The project describes itself as a "voluntary Spam-fighting brigade" which seeks to eliminate the three types of Wikispam: advertisements masquerading as articles, external link spam, and references that serve primarily to promote the author or the work being referenced. WikiProject Spam applies policies regarding what Wikipedia is not and guidelines for external links. The project received some help in February 2007 when the English Wikipedia tagged external links as "NOFOLLOW", preventing search engines from indexing external links and limiting the incentive for many spammers to use Wikipedia as a search engine optimization tool. The project maintains outreach strategies, detailed steps for identifying and removing spam, a variety of search tools, several bots for detecting spam, and a big red button to report spam and spammers. The project was started by Jdavidb in September 2005 and has grown to include 371 members. One of the project's most active members, MER-C, agreed to show us around.

How much time do you typically devote each week to fighting spam?

I find the time commitment required for anti-spam work to be extremely variable. Monitoring the IRC feed isn't particularly taxing; and it isn't too difficult to clean up a few possible copyright problems, edit a few articles or perform non-WP related work or leisure concurrently.

WikiProject Spam is the most active project by edits (including bots) and the second most watched project on Wikipedia. What accounts for this high activity and interest by the Wikipedia community?

This is an illusion. 98% of those edits are from User:COIBot, a spam reporting bot. The remaining 2% are to the project's talk page, which serves as a noticeboard for reporting spam campaigns. A good chunk of the edits to the talk page are from a handful of anti-spam specialists. I can't explain the number of watchers though.

What type of wikispam do you come across most often? Do you use any special tools to detect spam or do you simply remove spam you notice while reading and editing articles?

While reading articles and cleaning out the spam contained within haphazardly works, it doesn't address the cause of the problem. I target the spammers themselves, i.e. identifying domains owned by the spammer and systematically removing spammed links to said domains. To do it properly requires heavy use of tools beyond the usual contribution analysis:

Special:Linksearch and its cross-wiki counterpart
Cross-wiki contributions
User:Versageek and User:Beetstra maintain a database of link additions to all Wikimedia projects. New links are reported to the IRC channel wikipedia-en-spam (don't go there yet, it's not currently working) and others. User:XLinkBot, a spam reversion bot, and User:COIBot use this channel as their source of link additions. Reports are triggered when a small group of users are responsible for a large fraction of link additions to a particular site or can be requested through IRC or User:COIBot/Poke (administrators and trusted users only).
Various external tools, including Whois, reverse DNS lookups, HTML analysis, Google AdSense and Google Analytics databases and a bit of Google-fu.
The Firefox extensions NoScript and RequestPolicy to detect redirects to other domains and protect against the mystery meat nature of spammed sites.
A text editor that has fuzzy find and replace functionality, usually implemented using regular expressions.

I target external link additions, so I encounter vanilla external link spam most frequently. The most annoying and widespread spam campaigns, however, involve multiple spam tactics. That said, I've noticed the following recent spam trends -- note the tendency towards avoiding scrutiny from RC patrollers:

The spreading of spam edits over multiple IP addresses and user accounts; one spam link per IP address/account isn't uncommon.
Spam masquerading as citations. This typically involves the repeated addition of a certain "reference" by a given person, the spammy nature isn't apparent until you look at the big picture.
Replacement of existing links and/or citations
Inline spamming, the insertion of external links into article prose purely for search engine optimization
Misleading edit summaries

Have you had any heated conversations with spammers after removing spam from an article? What are some strategies you've used to resolve these conflicts?

Personal attacks, edit warring and vandalism are surefire ways to expedite blacklisting of the spammer's sites. A couple of months ago, I dealt with a spammer who edit warred to include links to his website. He responded by vandalising my userpage, and so the relevant sites were promptly blacklisted. Apart from a bad faith delisting request, we haven't heard from them since. This is typical; blacklisting is a very effective way of removing spammers from Wikipedia. (Unlike blocks, blacklisting requires money to evade—the spammer needs to purchase new Internet domains.)

Has your experience fighting spam resulted in any humorous stories? Have you heard any amusing excuses and special pleading from spammers trying to defend their edits?

See Wikipedia:Grief for details on the usual routine of spammers.

← Previous "From the archives"

Next "From the archives" →

In this issue

29 November 2019 (all comments)

Discuss this story

These comments are automatically transcluded from this article's talk page. To follow comments, add the page to your watchlist. If your comment has not appeared here, you can try purging the cache.

We have identified BLPs as a special class of articles which have unique rules. We might designate other classes, for example anything that is "living" such as currently-sold products, companies less than X years old, etc.. and hold them to a different threshold of notability due to their status as being prone to abuse by impossible to detect PR efforts. First step is look at a known UPE corpus of articles and classify articles into common classes that are unique enough to set them apart. -- GreenC 01:06, 30 November 2019 (UTC)[reply]
- We've raised the sourcing standards for WP:CORP not too long ago. Spammers don't care about notability and will submit their spam anyway. WP:BLP hasn't stopped UPE spam either. Spam pages need to be evaluated at WP:AFD. This is problematic for two reasons - AFD has a finite capacity and it also has been infiltrated by spammers voting to keep their own articles and delete those of their competitors. That said, I agree with your point in part - we should start by adapting and applying the stringent sourcing requirements of WP:CORP to biographies. MER-C 10:38, 30 November 2019 (UTC)[reply]
  - Regarding apply NCORP-equivalent rules to BLP articles has really obvious arguments in favour of it, there are benefits of the field-specific rules as well (now if you can find some more people to support a future effort to rewrite NSPORTS to be less absurdly generous I'll be the first one there). Regarding GreenC's suggestion, it's not that they're impossible to detect - if they were, AfD (or any equivalent) would be pointless. It's that they take experienced editor time to assess and, if need be, fix - and as we are all too aware, that's Wikipedia's ultimate bottleneck resource. One minor change to start with is just reminding both reviewers and patrollers that products are held to the same rules as NCORP. Nosebagbear (talk) 18:35, 1 December 2019 (UTC)[reply]
    - "... AFD has a finite capacity and it also has been infiltrated by spammers voting to keep their own articles and delete those of their competitors..." Exactly. All it takes is for one editor to have a page on watch when an AfD is proposed and he/she will rally responses. Consensus will then appear to support retention. One example of this was Wikipedia:Articles for deletion/Sew Fast Sew Easy (2nd_nomination): A small neighborhood store with puff added by one of the store's principals. It's been closed for years yet still had a loyal base who rallied to its support. Blue Riband► 13:39, 8 December 2019 (UTC)[reply]

Make sure we cover what matters to you – leave a suggestion.

Home

About