The Signpost

Special report

Revision scoring as a service


Wikipedia relies heavily on tools based on artificial intelligence (AI) to operate at the scale it does today. The use of AI is most apparent in counter-vandalism tools such as ClueBot NG, Huggle and STiki, which are used to revert nearly all the vandalism on the English Wikipedia. These tools use intelligent algorithms to automatically revert vandalism or to triage likely damaging edits for human review. Arguably, they saved the Wikipedia community from being overwhelmed during the massive growth period of 2006–2007.

Unfortunately, developing and implementing such powerful AI is hard. A tool developer needs expertise in statistical classification, natural language processing and advanced programming techniques, as well as access to hardware that can store and process large amounts of data. It is also relatively labor-intensive to maintain these AIs so that they stay up to date with the quality concerns of present-day Wikipedia. Likely because of these difficulties, AI-based quality control tools are available only for the English Wikipedia and a few other large wikis.

Our goal in the Revision Scoring project is to do the hard work of constructing and maintaining powerful AI so that tool developers don't have to. This cross-lingual, machine learning classifier service for edits will support new wiki tools that require edit quality measures.

We'll be making quality scores available via two different strategies:

via our Web interface (for bots and gadgets)

http://ores.wmflabs.org/scores/enwiki?models=reverted&revids=644899628|644897053

{"644899628":
  {"damaging":
    {"prediction": true,
     "probability": {"true": 0.834253, "false": 0.165747}
    }
  },
 "644897053":
  {"damaging":
    {"prediction": false,
     "probability": {"false": 0.95073, "true": 0.04927}
    }
  }
}
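
via our library (batch processing)

from mw import api
from revscoring.extractors import APIExtractor
from revscoring.scorers import MLScorerModel

# Load a trained model; the file is opened in binary mode on the assumption
# that it is a serialized (pickled) model
with open("enwiki.damaging.20150201.model", "rb") as model_file:
    model = MLScorerModel.load(model_file)

# Extract feature values through the English Wikipedia API
api_session = api.Session("https://en.wikipedia.org/w/api.php")
extractor = APIExtractor(api_session, model.language)

# Score each revision and print the result
for rev_id in [644899628, 644897053]:
    feature_values = extractor.extract(rev_id, model.features)
    score = model.score(feature_values)
    print(score)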
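
For bot and gadget authors, the following is a minimal sketch of how a response from the Web interface might be consumed. It only builds a request URL in the shape of the example above and parses the sample response; it does not contact the service, which may no longer be reachable at that address. The helper names `build_scores_url` and `triage`, and the 0.5 threshold, are hypothetical and not part of any published client. The example request asks for the `reverted` model while the sample response is keyed by `damaging`, so the key to read is left as a parameter.

```python
import json
from urllib.parse import urlencode

# Base URL copied from the example request; treat it as an assumption,
# since the service may have moved.
ORES_BASE = "http://ores.wmflabs.org/scores"


def build_scores_url(wiki, model, rev_ids, base=ORES_BASE):
    """Build a scoring URL shaped like the example request (hypothetical helper)."""
    query = urlencode(
        {"models": model, "revids": "|".join(str(r) for r in rev_ids)},
        safe="|",  # keep the pipe separator unescaped, as in the example
    )
    return "{0}/{1}?{2}".format(base, wiki, query)


def triage(scores, model_key="damaging", threshold=0.5):
    """Return (rev_id, probability) pairs at or above the threshold,
    ordered from most to least likely damaging (hypothetical helper)."""
    flagged = []
    for rev_id, models in scores.items():
        p_true = models[model_key]["probability"]["true"]
        if p_true >= threshold:
            flagged.append((int(rev_id), p_true))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Sample response taken from the example above.
SAMPLE_RESPONSE = """
{"644899628": {"damaging": {"prediction": true,
                            "probability": {"true": 0.834253, "false": 0.165747}}},
 "644897053": {"damaging": {"prediction": false,
                            "probability": {"false": 0.95073, "true": 0.04927}}}}
"""

scores = json.loads(SAMPLE_RESPONSE)
print(build_scores_url("enwiki", "reverted", [644899628, 644897053]))
print(triage(scores))
```

A review tool could lower the threshold to surface more edits for human patrollers, or raise it before acting automatically; the right cut-off depends on how costly a false positive is for the tool in question.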

We'll also provide raw labelled data for training new models.

Project status and getting involved

Mockup of the hand-coding interface

We've already completed our first milestone: replicating the state of the art in damage detection for the English, Turkish and Portuguese Wikipedias. In the next two months, we will construct a hand-coding system and ask volunteers to help us categorize random samples of edits as "damaging" and/or "good-faith". These new datasets will help us train better classifiers. If you'd like to help us gather data or extend the scoring system to more languages, please let us know on our talk page.

Wikipedia:Wikipedia Signpost/2015-02-18/Special_report