With the rapid developments in machine translation (MT) in recent years - driven, for instance, by globalisation and migration, as well as economic benefits - the need for MT evaluation is also increasing. As new systems are developed, improved evaluation methods are needed in order to determine which translation system is best for a given dataset, language pair and/or domain. Bram Vanroy says: ‘A lot of people are using things like ChatGPT. But we have no idea how good the quality is unless we evaluate it. And it's the same with machine translation systems: if you do not speak the language, then you have no idea whether the quality [of the translation] is good or bad. Users need to know that their chosen MT system is good-quality and safe to use.’
MATEO (MAchine Translation Evaluation Online) is a new, user-friendly tool that answers a growing need for easy and accessible evaluation of machine translation. The evaluation software was developed by Bram Vanroy, Arda Tezcan, Lieve Macken and Michaël Lumingu from Ghent University, and was funded by a CLARIN Bridging Gaps grant after getting seed funding from the European Association for Machine Translation. Aimed at both expert and non-expert user groups, this freely accessible and open-source tool enables users to determine which MT system is best suited to their personal needs by comparing MT suggestions with reference translations.
State-of-the-art metrics are being created to keep up with the increase in MT quality. These often rely on the premise that a machine translation is a good translation if it is close to a human reference translation - so-called ‘reference-based evaluation’. However, using these evaluation methods is not always straightforward. The code to run them is spread across different repositories and, once downloaded, it needs to be installed on a personal device alongside the right dependencies. Users then need to be comfortable with the command line or, most often, Python, and be aware of the input formats expected for their data. This can be a frustrating and time-consuming process.
MATEO makes all these steps unnecessary by providing a user interface that allows users to simply use their keyboard and mouse to specify the relevant evaluation metrics, the translations to evaluate (from up to four different MT systems) and their reference translations. Expert users also have advanced options to set individual parameters of the different metrics. MATEO currently supports the MT evaluation metrics BLEU, ChrF, TER, COMET, BLEURT and BERTScore.
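To make the idea of reference-based evaluation more concrete, the sketch below scores a machine translation by how many character n-grams it shares with a human reference. It is only loosely inspired by ChrF and is not the implementation that MATEO or any metric library uses; the function names, the default parameters and the example sentences are assumptions made purely for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams in a text, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_char_fscore(hypothesis: str, reference: str,
                       max_n: int = 3, beta: float = 2.0) -> float:
    """Simplified character n-gram F-score between 0.0 and 1.0.

    Illustrative only: real ChrF differs in its details (n-gram orders,
    whitespace handling, smoothing), so the scores are not comparable.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one of the texts is too short for this n-gram order
        overlap = sum((hyp & ref).values())  # n-grams shared by both texts
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta score: beta > 1 weights recall more heavily than precision
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# Hypothetical example: two MT outputs compared against one human reference
reference = "the cat sat on the mat"
system_a = "the cat sat on a mat"
system_b = "a dog stood there"
print("System A:", round(simple_char_fscore(system_a, reference), 3))
print("System B:", round(simple_char_fscore(system_b, reference), 3))
```

The output that is closer to the reference receives the higher score, which is the premise behind all reference-based metrics. Neural metrics such as COMET, BLEURT and BERTScore replace the surface overlap with learned representations.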
Vanroy gives an example of how MATEO may be used: ‘A researcher may have a text in 19th century German and they would like MT to translate it. The only thing they need is the text and a short human translation. The more data you can input, the better. Then they can put the German text, and the translation in, say, contemporary English through the system, and then the system will tell them whether these are good translations or not. You could try different machine translation systems to find out which system is best for translating old German, according to the evaluation scores that MATEO provides.’
Expert and Non-Expert Users
Much attention was paid to what the output of the interface should be to cater to different user groups. It is designed so that both experts (industry, researchers) and non-expert users (SSH researchers, students and teachers) can get useful information from it. In the interface, a table is provided that contains the performance of each given system as calculated by the selected metrics. For experts, this table also contains confidence intervals and significance scores, and it can be copied into LaTeX or downloaded as an Excel or tab-separated file. Bar and radar plots are provided to visualise the results of the different systems and, if available, a fine-grained sentence-level scatter plot makes it possible to investigate for each sentence how well or poorly each system scores. These sentence scores can also be downloaded and are particularly useful for in-depth, linguistic analyses.
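For readers wondering where confidence intervals and significance scores come from, a common approach in MT evaluation is bootstrap resampling: the test sentences are repeatedly resampled with replacement and the score is recomputed each time. The article does not say how MATEO computes these values, so the sketch below is an assumption for illustration only. It also simplifies by averaging sentence-level scores, whereas corpus-level metrics such as BLEU are recomputed on each resampled set. All function names and example scores are hypothetical.

```python
import random


def bootstrap_ci(scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean sentence score."""
    rng = random.Random(seed)
    n = len(scores)
    # Mean score of each resampled test set, sorted from low to high
    means = sorted(sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return means[lo_idx], means[hi_idx]


def paired_bootstrap_win_rate(scores_a, scores_b, n_resamples=1000, seed=0):
    """Fraction of resampled test sets on which system A beats system B.

    The same sentence indices are used for both systems in every resample,
    which is what makes the comparison 'paired'.
    """
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return wins / n_resamples


# Hypothetical sentence-level scores for two systems on the same six sentences
system_a_scores = [0.71, 0.64, 0.80, 0.55, 0.69, 0.77]
system_b_scores = [0.66, 0.61, 0.74, 0.58, 0.60, 0.70]
print("95% interval for system A:", bootstrap_ci(system_a_scores))
print("A beats B in this share of resamples:",
      paired_bootstrap_win_rate(system_a_scores, system_b_scores))
```

A win rate close to 1.0 suggests that the difference between the two systems is unlikely to be an accident of which sentences happened to be in the test set. With only a handful of sentences, as in this toy example, the interval will be wide.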
The interface was designed to be accessible to all users. Vanroy says: ‘The basic layout of the tool is focused on the non-expert users. So you get the basic info, it's easy to use, you just click some buttons, and you're done. But then there are also advanced options that are more hidden - you can add more parameters, for example. And then in terms of export options, you also have different options for different users. So it's built from that perspective: it should be easy to use, but if you are more advanced, you can use advanced options.’
MT Literacy and Education
While the evaluation component is the core part of MATEO, it also aims to improve the MT literacy of its users: it contains a page that explains more about machine translation evaluation metrics and the MATEO project. By making MT evaluation accessible to diverse target audiences, it enables even non-experts to easily evaluate machine-generated translations. In addition to the evaluation function, MATEO also encourages users to reflect on the implications of relying on MT systems for their tasks. Vanroy points out that many MT systems are so-called closed-source systems: ‘You have no idea how the system works, what data it was trained on, how it is trained. This is typical for Google Translate, DeepL, or systems like that, they are all closed-source. I'm a big supporter of open source.’
As the project continues to evolve, it seeks to not only streamline MT research but also become a resource for lecturers and students because it emphasises the importance of evaluating language resources. MATEO has already been used in MT classes at Ghent University, where students used the tool for assignments to improve their MT evaluation skills. Indeed, their feedback was gathered to improve the user experience of the interface. Feedback at conferences and online so far has been very positive, drawing interest from teachers and industry alike.
Machine Translation System
For ease of use, a translation component is also implemented in MATEO, so that users can create baseline translations with an open-source machine translation model. The translation system, Meta AI's ‘No Language Left Behind’, allows users to translate from and into around 200 languages. Users who do not have access to a machine translation system of their own can therefore still work with one that is open.
Vanroy believes that for non-critical communication, systems have now reached a level that enables effective communication across many language pairs. However, he points out that MT systems are still not as good as humans and that for critical, more sensitive communication, there is still work to be done: ‘We are getting requests to work on things like medical translation, news translation, or legal translation. And these are important and exciting problems, but also critical problems. It's worth looking into, because if we can get these domains also up to a better level, where it is usable and high quality, then that's for the benefit of everybody.’
Publications and Future Plans
MATEO was informally presented at the annual CLARIN conference in 2022 and at the European conference on machine translation (EAMT 2023), where the team also presented a user and research study. The final version will be presented at the annual CLARIN conference 2023. Interested readers are also invited to an online demo where Vanroy will showcase the tool on October 13th, 2023.
The tool is hosted at the CLARIN B-centre of the Dutch Language Institute (Instituut voor de Nederlandse Taal; INT) and is available online. As such, MATEO has been mentioned in INT’s bimonthly newsletter. In addition, the source code for the complete interface is available under a GPLv3 licence. Interested users can download and run MATEO on their own computer or server with Python or through Docker. And to make the tool even more accessible, anyone can simply ‘clone’ the website to their own, private instance to ensure fast response times.
The active part of the project has concluded, and Vanroy is now engaged with the Horizon 2020 project SignON, which focuses on automatic sign language translation. Nevertheless, Vanroy plans to support MATEO for as long as possible and invites others to contribute to or improve the system: ‘While I intend to support MATEO as long as I can, by making the tool open-source other actors can also contribute, add new evaluation metrics, clone, or maintain it, which ensures the longevity of the evaluation tool for researchers, industry partners and teachers and students alike.’
Thesis title: 'How Creative are Translated Subtitles? Automating the Detection of Creative Shifts in English-to-Dutch Subtitles'
Bram Vanroy (PI), Post-Doctoral Researcher at Ghent University and KU Leuven
Lieve Macken (supervisor), Associate Professor, Department of Translation, Interpreting and Communication, Ghent University
Arda Tezcan (supervisor), Researcher, Department of Translation, Interpreting and Communication, Ghent University
Michaël Lumingu (support staff), Department of Translation, Interpreting and Communication, Ghent University