The Signpost

News from Diff

Content translation tool helps create one million Wikipedia articles

This article was originally published on Diff on November 16, 2021.

The origin of the tool/initial adoption

The Content Translation tool, which was developed by the Wikimedia Foundation Language team in 2014 to simplify translating Wikipedia articles, recently reached a massive milestone of supporting the creation of one million articles.

The tool plays a key role in closing knowledge gaps on Wikipedia by making it easier to translate Wikipedia’s knowledge into new languages. The tool’s journey has been steady and evolving over the past seven years. It is available by default in 90 Wikipedias, and it exists as a beta feature in the rest. It is used to translate an article every three minutes, and the articles created with the tool are deleted less often than those created from scratch.

We are excited to celebrate this remarkable milestone with the more than 70,000 Wikimedia contributors who helped us get here! As we celebrate, we also want to reflect on the tool’s journey so far and look back at its beginnings and other major moments…

In 2014, the tool was tested in Wikimedia Labs, with a focus on translation from Spanish to Catalan, and it was deployed after receiving positive feedback. This was just the beginning of our success story. The choice of these two languages was influenced by the availability of robust open-source machine translation between them through Apertium, and by the passionate Catalan community of contributors who were eager to participate in the testing and feedback process. The Spanish and Catalan communities were the backbone of the tool, and its chronicle would be incomplete without them.

Getting established: a more solid tool

Following the successful deployment in the Spanish and Catalan Wikipedias, the tool was enabled in January 2015 as a beta feature in six other Wikipedias (Danish, Esperanto, Indonesian, Malay, Norwegian (Bokmål) and Portuguese). The deployment was further extended at the request of most communities to 22 Wikipedias. After three months, 260 users had translated with the tool, and 1,000 users had manually enabled it as a beta feature on their Wikipedia. This success motivated the team to deploy the tool in beta for all Wikipedias, a decision influenced by the positive acceptance and usage of the tool in less than six months of its enablement in eight languages. The results recorded in mid-2015, 1,300 new translators and 3,000 new translations, confirmed our assumptions about the tool’s acceptance. That year was undoubtedly a busy one for the development team and our ardent translators, who also reported dozens of bugs.

Another remarkable, eventful period for the Wikimedia Foundation Language team started in 2018: the revision phase of the Content Translation tool. Based on feedback from translators across different languages about the tool, its impact and its use over the years, the translation tool was ready for a revamp. The change focused on incorporating the more solid VisualEditor editing surface and other major improvements, resulting in Content Translation version 2. By the end of 2019, the Wikimedia Foundation Language team had significantly updated the Content Translation tool, and it could boast of the following:

  • Better guidance for newcomers
  • Improved artificial intelligence to enhance automated steps
  • Quality control mechanisms for machine translation
  • Extended machine translation support service from Yandex, Google Translate, Youdao, Matxin (currently replaced by Elia), and Lingocloud
  • Independent customised systems to improve the quality of content in different Wikipedia communities
  • The achievement of a 500,000-article milestone

Notwithstanding the above achievements, Content Translation's developers had more work to do, with a big theme being to help more communities utilise the translation tool and attract newcomers in emerging communities. Energised by what they had achieved with this tool and eager to support the volunteer communities that are ready to make the sum of all knowledge available to all, the team initiated the Content Translation Boost project.

New ways to translate: sections and mobile

We started research to explore more ways to translate and to make the tool more prominent, resulting in the launch of the Section Translation tool initiative and a process to enable Content Translation by default (out of beta) in Wikipedias that had fewer than 100,000 articles and the potential to grow with translation. With these plans, the Wikimedia Foundation Language team was about to take translation to another dimension.

Section Translation became the primary focus of the Boost project. It expands the capabilities of Content Translation to address key limitations of the tool:

  • prioritising a mobile-friendly tool for phone and tablet users
  • allowing the collaboration of many users to translate articles section by section
  • attracting new contributors by lowering the entry barrier from translating an entire article to just a section
  • the capability to improve existing articles and not only create new ones.
Section Translation on mobile

In plain terms, Section Translation is still a translation tool, one that helps mobile device users translate articles easily, a piece at a time. Now that you know, let’s walk you through this phase.

In early 2020, before the COVID-19 pandemic, the project supported a design exploration to gather interview data about the assumptions behind Section Translation. Prototype development then started, and in the middle of the pandemic, the development of the tool was in full swing. In January 2021, an initial version was ready to be tested in a testing instance by Bengali Wikipedia editors. The Bengali community was chosen because of its interest in the initiative and its participation during the design exploration. The community tested the tool and provided feedback, some of which was adopted immediately.

In February, the Wikimedia Foundation Language team was ready to deploy Section Translation. This marked the beginning of another tool that will further bridge the content gap in small-sized Wikipedias.

Since the first enablement in Bengali Wikipedia, the tool has been improved based on community feedback and takeaways from user research conducted after that deployment. The improvements include additional entry points to increase discoverability and the ability to search for an article of interest.

As for the Section Translation tool, we are still evolving it and learning from the outcomes. Currently, after a feedback and validation process, it is enabled in five more Wikipedias: Igbo, Yoruba, Hausa, Thai, and Kurdish. We are excited about its future and impact. While we evaluate the tool's impact and users' experiences and continue to improve it, we welcome members of other Wikipedias to test Section Translation, provide feedback and indicate interest in having it enabled.

Congratulations and thank you to everyone who has been part of the journey to this one million article milestone!

Discuss this story

  • I think translation should be taken with good care. In some wikis, there are bad machine translations and it's bad for Wikipedia development. Thingofme (talk) 04:01, 29 November 2021 (UTC)
  • This text is rather promotionally worded ("take translation to another dimension", etc.). For balance, should it not have included discussion on the en.wiki restrictions on the tool? AllyD (talk) 08:37, 29 November 2021 (UTC)
    Because English is the source language of the translation, the translation problem is smaller than in a lot of wikis. Some wikis have tighter restrictions on the Content Translation tool, like limiting 85/15% or disabling it completely, like m:Special:Translate; or restrictions only for extended-confirmed users. Machine translation reduces the quality of the works, and many articles in other projects have been deleted because they were machine- or bot-made. Example: Lsjbot. Thingofme (talk) 03:35, 30 November 2021 (UTC)
  • This is a frightful and promotional article, about something that rarely works as well as promised. No one should be translating with tools as they should be consulting the original sources to see if text is actually verified and there is no close paraphrasing or copyvio, etc. You can’t do that with a tool. SandyGeorgia (Talk) 01:12, 30 November 2021 (UTC)
    • @SandyGeorgia: As suggested by @Thingofme: just above this, the problems translating into the English Wikipedia are not as strong as you represent IMHO - for various reasons. Different language versions have different standards. One of the big protections for enWiki in my experience is the stricter requirements for sources. If you're going to translate something into enWiki, you better have some pretty good sources in the original, minimum 3 to oversimplify. So, assuming folks respect our referencing requirements, only the very best of non-enWiki articles are going to be translated into English. Of course the "writer" should have a good knowledge of the original language as well. Given that there are a lot of non-native English speaking editors here with near-native English writing abilities, I'd encourage them to translate from their native languages into English, once they understand our standards - and using machine translation should be a pretty good time-saver. But, of course, just turning on the machine and plopping down the output as a completed article is not going to work.
    • As suggested above (once more) translating from enWiki into another wiki should be technically easier, but has cultural difficulties. Whether the article is about road construction, public health measures, or apple pie, their readers probably don't want an article that is completely from an anglophone POV and ignores their home country conditions. So they have other special conditions and rules. So machine translation these days is a good tool, just not a miracle method. Smallbones(smalltalk) 05:18, 30 November 2021 (UTC)
      So, assuming folks respect our referencing requirements … you see the problem? And, even if they do “respect our referencing requirements”, plagiarism and copyvio (along with poor machine translations) are often the problem. Then, these machine-translated articles hit DYK, where there may not be reviewers who have the language skills to check, and it turns out that a) they aren’t reliable sources, or b) they are poor translations with errors, or c) they have too close paraphrasing or plagiarism. SandyGeorgia (Talk) 13:21, 30 November 2021 (UTC)
  • Translation done well can be a highly efficient way to amplify the impact of volunteers writing in one language, to quickly expand small Wikipedias, and to improve coverage of underrepresented topics on big Wikipedias. I have been very flattered in the past to see much article content I've written, particularly on the topic of Black Mirror, translated into several other languages. However, like those above I am concerned by the potential consequences of low-quality usage. It is a very dangerous tool if used even slightly wrongly: the person using it must have a good understanding of both languages (particularly the one they're translating it into); they must actually read the given sources; they must be taking care over all the normal things they would do when writing an article from scratch. These are the sorts of areas and content writing methods we see overeager Wikimedians causing large disruption in, and it can be very difficult to detect, as SandyGeorgia says above, because of the language barriers. Bilorv (talk) 23:18, 5 December 2021 (UTC)
  • If only we could use DeepL to translate, we would have better translations (from my experience, DeepL is really better than Google t.) Javiermes (talk) 03:36, 14 December 2021 (UTC)
    Doesn't DeepL need a license to use its API? It is good at making translations, in my experience, but I hope that they license it out to WMF at a low cost. Those poor IP users already have enough fundraising banners as it is. Explodicator7331 (talk) 15:20, 16 December 2021 (UTC)

Wikipedia:Wikipedia Signpost/2021-11-29/News_from_Diff