The Signpost

News and notes

Researchers find that Simple English Wikipedia has "lost its focus"

Readability of Simple English and English Wikipedias called into question

In its September issue, the peer-reviewed journal First Monday published "The readability of Wikipedia", a study which found that the English Wikipedia is struggling to meet the criteria of the Flesch reading ease test, while the Simple English Wikipedia has "lost its focus".

The statistical method developed by Flesch (1948) focuses on two core components of readability: word length and sentence length. The test is widely used in the US, with applications ranging from Pentagon files to life insurance policies, and it has been adapted for other languages, including a German version by Toni Amstad (1978). The Flesch test uses the following formula to indicate the readability of a given text:[1]

$$\text{Reading ease} = 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)$$

Higher scores indicate material that is easier to read, and lower scores indicate material that is more difficult. Artificially constructed, extremely complex or extremely simple sentences can in theory produce scores far outside this range. In practice, however, natural English text typically scores between 0 and 100, and the scores can be interpreted as shown in the table.

Score         Interpretation
90.0–100.0    Very easy
80.0–90.0     Easy
70.0–80.0     Fairly easy
60.0–70.0     Standard
50.0–60.0     Fairly difficult
30.0–50.0     Difficult
0.0–30.0      Very difficult
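The formula and the interpretation bands can be illustrated with a short Python sketch. It uses the standard Flesch constants (206.835, 1.015 and 84.6). The helper names `count_syllables`, `flesch_reading_ease` and `interpret` are hypothetical, and so is the syllable counter, which is a crude vowel-group heuristic. Sentences are split naively on ".", "!" and "?", so abbreviations such as "U.S." would be miscounted. The study's own tool may tokenize and count syllables differently, so its scores would not necessarily match this sketch.

```python
import re

def count_syllables(word: str) -> int:
    """Hypothetical heuristic: count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" (as in "make") as not forming its own
    # syllable, but keep the "-le" ending (as in "table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    """Flesch reading ease using naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

# Lower bound of each band, from easiest to hardest. The table's bands
# share their endpoints; here a boundary score goes to the easier band.
BANDS = [(90, "Very easy"), (80, "Easy"), (70, "Fairly easy"),
         (60, "Standard"), (50, "Fairly difficult"), (30, "Difficult")]

def interpret(score: float) -> str:
    """Map a score to the verbal label used in the table."""
    for lower, label in BANDS:
        if score >= lower:
            return label
    return "Very difficult"

score = flesch_reading_ease("The cat sat on the mat. It was happy.")
print(round(score, 2), interpret(score))  # → 108.27 Very easy
```

The two very short sentences in the usage line score above 100. This matches the point above that very simple constructions can fall outside the usual 0 to 100 range.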

The authors assume that the English Wikipedia should score around 60–70 on average ("standard"), and that Simple English, which explicitly aims at audiences with less advanced literacy skills, should score around 80 ("easy"). An earlier study using the same test, Besten and Dalle (2008), had found that the overall readability of Simple had declined from around 80 in 2003 to just above 70 in 2006.

The 2012 study examined two database dumps from 2010, one of the English Wikipedia and one of Simple. The researchers filtered out lists, redirects, and disambiguation pages, and removed components such as tables, headings, and images. After this filtering, the study covered 88% of the English Wikipedia's articles and 85% of Simple's articles in the respective dumps. In a second step, articles with fewer than six sentences were excluded, because the readability scores of such short texts are likely to fluctuate widely.
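The exclusion of short articles can be sketched in Python together with the kind of summary the study reports: an average score and the share of articles below a target score. The sketch assumes each article has already been cleaned of lists, tables, headings and images and has been scored. The `Article` record, the `MIN_SENTENCES` constant and the `summarize` function are hypothetical. The sample values in the usage line are invented toy numbers, not data from the paper.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Article:
    # Hypothetical record: title, sentence count and a precomputed Flesch score.
    title: str
    sentences: int
    score: float

# The study excluded articles with fewer than six sentences.
MIN_SENTENCES = 6

def summarize(articles: list[Article], target: float) -> tuple[float, float]:
    """Return (mean score, share of articles scoring below target),
    computed only over articles long enough to score reliably."""
    kept = [a for a in articles if a.sentences >= MIN_SENTENCES]
    if not kept:
        raise ValueError("no articles left after filtering")
    average = mean(a.score for a in kept)
    below = sum(1 for a in kept if a.score < target) / len(kept)
    return average, below

sample = [Article("A", 12, 45.0), Article("B", 8, 65.0),
          Article("C", 3, 90.0), Article("D", 20, 52.0)]
print(summarize(sample, target=60))  # → (54.0, 0.6666666666666666)
```

Article "C" is dropped because it has only three sentences, so its high score does not inflate the average.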

The analysis found that English Wikipedia articles scored 51 on average ("fairly difficult"), with more than 70% of all articles scoring below the goal of 60 ("standard"). Simple scored 62 on average ("standard"), with 95% of all entries below the goal of 80 ("easy"). In addition, around 9,600 articles existed in both Wikipedia versions and could be compared directly: on these, Simple scored 61, while the corresponding English Wikipedia articles scored 49.

The paper argues that Simple, created to address the English Wikipedia's readability problems for some audiences, has run into difficulties. Its average reading ease, while still above that of the English Wikipedia, has fallen to 62, down from the figures reported by Besten and Dalle in 2008 (around 80 in 2003 and just above 70 in 2006). Based on this methodology, the authors conclude that Simple has "lost its focus … this version now seems suitable for the average reader, instead of aiming at those with limited language abilities."

The English Wikipedia findings also indicate that the results of a 2010 study on the readability of English Wikipedia articles about cancer (Signpost coverage) cannot be fully generalized. That study found that articles in the targeted topic area scored about 30 on average.

However, both studies show that the English Wikipedia potentially excludes major segments of the English-speaking world, including, for example, large parts of the US public. According to a major 2002 study on literacy in the US, 21–23% (extrapolated: more than 40 million people) "demonstrated skills in the lowest level of prose, document, and quantitative proficiencies".

The authors of the study on readability of Wikipedia have set up a demo site where users can calculate the readability of English and Simple English Wikipedia pages based on the automatic measure they deployed in the paper.

Brief notes

Wikipedia:Wikipedia Signpost/2012-09-10/News_and_notes