Research
Rewarding the best projects in language innovation
Imminent was founded to help innovators who share the goal of making it easier for everyone living in our multilingual world to understand and be understood by everyone else. Imminent builds a bridge between the world of research and the corporate world by supporting research through scientific publications, interviews, and annual grants, funding groundbreaking projects in the language industry.
With the Imminent Research Grants project, each year Imminent allocates €100,000 to fund five original research projects, with grants of €20,000 each, to explore the most advanced frontiers in the world of language services. Imminent expects the call to appeal to startup founders, researchers, innovators, authors, university labs, organizations, and companies. A research grant will be assigned to one project in each of the following categories: Language economics – Language data – Machine learning algorithms for translation – Human-computer interaction – Neuroscience of language.
2024 – Projects awarded
Research area – Language Data
CURVATURE-BASED MACHINE TRANSLATION DATASET CURATION
Michalis Korakakis University of Cambridge
Despite recent advances in neural machine translation, data quality continues to play a crucial role in model performance, robustness, and fairness. However, current approaches to curating machine translation datasets rely on domain-specific heuristics and assume that datasets contain only one specific type of problematic instance, such as noise. Consequently, these methods fail to systematically analyse how various types of training instances (such as noisy, atypical, and underrepresented instances) affect model behaviour.
To address this, the project proposes a data curation method that identifies different types of training instances within a dataset by examining the curvature of the loss landscape around an instance, i.e., the magnitude of the eigenvalues of the Hessian of the loss with respect to that instance. Unlike previous approaches, the proposed method offers a comprehensive framework that provides insights into machine translation datasets independent of model architecture and weight initialisation. Additionally, it is applicable to any language pair and to monolingual tasks such as text summarisation.
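The curvature signal can be illustrated on a toy model. The sketch below is only a minimal illustration of the idea, assuming a fixed linear model with squared-error loss in place of an NMT system, and estimating the Hessian of the loss with respect to one instance by central finite differences:

```python
import numpy as np

# Toy illustration of a curvature score. Assumptions: a fixed linear
# model and squared-error loss stand in for an NMT system; the Hessian
# of the loss w.r.t. one instance's input is estimated by central
# finite differences.

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))  # stand-in "model" weights

def loss(x, y):
    # squared error of the linear model on a single instance (x, y)
    return 0.5 * np.sum((W @ x - y) ** 2)

def hessian_wrt_instance(x, y, eps=1e-4):
    # finite-difference estimate of d^2 loss / dx dx for this instance
    d = x.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d); ei[i] = eps
            ej = np.zeros(d); ej[j] = eps
            H[i, j] = (loss(x + ei + ej, y) - loss(x + ei - ej, y)
                       - loss(x - ei + ej, y) + loss(x - ei - ej, y)) / (4 * eps ** 2)
    return H

x, y = rng.normal(size=3), rng.normal(size=3)
H = hessian_wrt_instance(x, y)
# the curation signal: magnitude of the Hessian's eigenvalues
curvature_score = np.max(np.abs(np.linalg.eigvalsh(H)))
```

For this quadratic toy loss the Hessian is the same for every instance; in a real NMT model the loss is highly non-quadratic, so the per-instance curvature varies and can be used to separate noisy, atypical, and underrepresented examples.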
Research area – Language economics
DEVELOPMENT OF A MULTILINGUAL MACHINE TRANSLATOR FOR PHILIPPINE LANGUAGES
Charibeth Cheng De La Salle University
The Philippines is an archipelagic country consisting of more than 7,000 islands, and this has contributed to its vast linguistic diversity. It is home to 175 living, indigenous languages, with Filipino designated as the national language. Within formal education, 28 indigenous languages serve as mediums of instruction, alongside English, which holds official status in business, government, and academia.
The Philippines’ diverse linguistic landscape underscores the need for effective communication bridges. The present project aims to develop a multilingual machine translation system for at least 7 Philippine languages, aligning with efforts to standardize and preserve indigenous languages. Specifically, the project will focus on the following: first, collecting and curating linguistic datasets in collaboration with linguistic experts and native speakers to ensure the accuracy and reliability of the translation system; second, implementing machine learning algorithms and natural language processing techniques to train the translation model, taking into account the low-resource nature of Philippine languages; and finally, evaluating the efficacy of the developed translation system using standardized metrics and human evaluation.
Research area – Neuroscience of Language
REALTIME MULTILINGUAL TRANSLATION FROM BRAIN DYNAMICS
Weihao Xia University College London
This project, Realtime Multilingual Translation from Brain Dynamics, aims to convert brain waves into multiple natural languages. The goal is to develop a novel brain-computer interface capable of open-vocabulary electroencephalographic (EEG)-to-multilingual translation, facilitating seamless communication. The idea is to align EEG waves with pre-aligned embedding vectors from multilingual Large Language Models (LLMs). Because the languages are already aligned in a shared vector space, the model can be trained using a text corpus in only one language. EEG signals are real-time and non-invasive but exhibit significant individual variance, so the challenges lie in EEG-language alignment and cross-user generalization. The learned brain representations are then decoded into the desired language using LLMs such as BLOOM, which produce coherent text that is almost indistinguishable from text written by humans.
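The alignment idea can be sketched in a few lines. This is only a toy illustration: random vectors stand in for multilingual LLM sentence embeddings, and noisy copies stand in for the output of a trained EEG encoder; no real EEG data or LLM is involved.

```python
import numpy as np

# Toy sketch of the EEG-to-embedding alignment idea. Assumptions:
# random vectors stand in for multilingual LLM sentence embeddings,
# and noisy copies stand in for a trained EEG encoder's output.

rng = np.random.default_rng(0)
dim, n_sent = 32, 5

# "pre-aligned" multilingual embeddings: the same meaning in two
# languages maps to (nearly) the same vector
concepts = rng.normal(size=(n_sent, dim))
emb_en = concepts + 0.02 * rng.normal(size=(n_sent, dim))
emb_fr = concepts + 0.02 * rng.normal(size=(n_sent, dim))

# hypothetical EEG encoder output, assumed to land in the same space
eeg = concepts + 0.05 * rng.normal(size=(n_sent, dim))

def nearest(query, bank):
    # cosine-similarity nearest neighbour over the embedding bank
    q = query / np.linalg.norm(query)
    B = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    return int(np.argmax(B @ q))

# retrieval works in either language, even though the alignment only
# ever needed text in one language
print(nearest(eeg[2], emb_en), nearest(eeg[2], emb_fr))
```

Because the two language banks occupy the same vector space, decoding an EEG representation to its nearest embedding retrieves the same sentence meaning regardless of the target language, which is the property that lets the system train on a monolingual corpus.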
Currently, the primary application targets individuals who are unable to speak or type. However, in the future, as brain signals increasingly serve as the control factor for electrical devices, the potential applications will expand to encompass a broader range of scenarios.
Research area – Human-Computer Interaction
HOW CAN MT AND PE HELP LITERATURE CROSS BORDERS AND REACH WIDER AUDIENCES: A CASE STUDY
Vilelmini Sosoni Ionian University
Researchers have studied the usability and creativity of machine translation (MT) in literary texts, focusing on translators’ perceptions and readers’ responses. But what do authors think? Is post-editing of MT output an answer to having more literature translated, especially from lesser-used languages into dominant ones? The study seeks to answer this question by focusing on the book Tango in Blue Nights (2024), a flash story collection about love written by Vassilis Manoussakis, a Greek author, researcher and translator. The book is translated from Greek into English using Translated’s ModernMT system and is then post-edited by second-year Modern Greek students at Boston University who are native English speakers with near-native competence in Greek. They follow detailed post-editing (PE) guidelines developed for literary texts by the researchers.
The author analyses the post-edited version and establishes whether it is fit for publication and how it can be improved. A stylometric analysis is also conducted. The study is the first of its kind and aims to showcase the importance of MT for the dissemination of literature written in lesser-used languages, and to provide a post-editing protocol for the translation of literary texts.
Research area – Machine learning algorithms for translation
LANGUAGE MODELS ARE MORE THAN CLASSIFIERS: RETHINKING INTERPRETABILITY IN THE PRESENCE OF INTRINSIC UNCERTAINTY
Julius Cheng University of Cambridge
Language translation is an intrinsically ambiguous task, where one sentence has many possible translations. This fact, combined with the practice of training neural language models (LMs) with large bitext corpora, leads to the well-documented phenomenon that these models allocate probability mass to many semantically similar yet lexically diverse sentences. Consequently, decoding objectives like minimum Bayes risk (MBR), which aggregate information across the entire output distribution, produce higher quality outputs than beam search.
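The MBR aggregation step described above can be sketched in a few lines. In this toy example, a handful of hand-written candidates stand in for samples drawn from the model, and unigram-overlap (Jaccard) similarity stands in for a real utility metric such as BLEU or COMET:

```python
# Minimal sketch of minimum Bayes risk (MBR) decoding. Assumptions:
# hand-written candidates stand in for model samples, and Jaccard
# token overlap stands in for a real utility metric (BLEU, COMET).

def utility(hyp, ref):
    # Jaccard similarity between the two sentences' token sets
    a, b = set(hyp.split()), set(ref.split())
    return len(a & b) / max(len(a | b), 1)

def mbr_decode(candidates):
    # choose the candidate with the highest expected utility against
    # the whole pool, i.e. the "consensus" translation
    def expected_utility(hyp):
        return sum(utility(hyp, ref) for ref in candidates) / len(candidates)
    return max(candidates, key=expected_utility)

samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a feline rested on the rug",
    "the cat is sitting on the mat",
]
print(mbr_decode(samples))  # prints: the cat sat on a mat
```

Unlike beam search, which returns the single highest-probability string, MBR rewards the candidate that agrees most with the rest of the output distribution, so probability mass spread over lexically diverse paraphrases is pooled rather than wasted.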
Research on interpretability and explainability for natural language generation (NLG) has thus far focused almost exclusively on generating explanations for a single prediction, yet LMs have many plausible high-probability predictions. The proposal aims to adapt interpretability to this context by investigating the question: do similar predictions have similar explanations? This will be addressed by comparing explanations generated by interpretability methods, such as attention-based interpretability, layerwise relevance propagation, and gradient-based attribution, across predictions.
The goal of this project is to advance research in interpretability for NLG, to improve understanding of the generalization capabilities of LMs, and to develop new methods for MBR decoding.

Imminent Research Grants
$100,000 to fund language technology innovators
Imminent was founded to help innovators who share the goal of making it easier for everyone living in our multilingual world to understand and be understood by all others. Each year, Imminent allocates $100,000 to fund five original research projects to explore the most advanced frontiers in the world of language services. Topics: Language economics – Language data – Machine learning algorithms for translation – Human-computer interaction – The neuroscience of language.
2023 – Projects awarded
Research area – Human-computer interaction
USABILITY OF EXPLAINABLE ERROR DETECTION FOR POST-EDITING NEURAL MACHINE TRANSLATION
Gabriele Sarti University of Groningen
Predictive uncertainty and other information extracted from MT models provide reasonable estimates of word-level translation quality. However, there is a lack of public studies investigating the impact of error detection methods on post-editing performance in real-world settings. The present project proposes a user study with professional translators for two language directions sourced from the recent DivEMT dataset. The aim is to assess whether and how error span highlights can improve post-editing productivity while preserving translation quality. A particular focus will be the influence of highlight quality, comparing supervised and unsupervised techniques with best-case estimates obtained from gold human edits, using productivity and enjoyability metrics for evaluation.
This direction could help validate the applicability of error detection techniques aimed at improving human-machine collaboration in translation workflows. The proposal is a reality check for research in interpretability and quality estimation and will likely influence future research in these areas. Moreover, positive outcomes could drive innovation in post-editing practices across the industry.
Research area – Human-computer interaction
HUMANITY OF SPEECH
Pauline Larrouy-Maestri Max Planck Institute
Synthetic speech is everywhere, from our living room to the communication channels that connect humans all over the world. Text-to-speech (TTS) tools and AI voice generators aim at creating intelligible and realistic sounds to be understood by humans. Whereas intelligibility is generally accomplished, the voices do not sound natural and lack “humanity,” which impacts users’ engagement in human-computer interaction.
The present project aims at understanding what a “human” voice is, a crucial issue in all domains related to language, such as the computer, psychological, biological, and social sciences. To do so, 1) the timbral and prosodic features that listeners use to identify human speech will be investigated, and 2) how “humanness” is categorized and transmitted will be determined. Concretely, a series of online experiments using methods from psychophysics is planned. The analysis will cover both the speech signal, through extensive acoustic analyses and manipulation of samples, and the cognitive and social processes involved.
Research area – The neuroscience of language
TRACKING INTERHEMISPHERIC INTERACTIONS AND NEURAL PLASTICITY BETWEEN FRONTAL AREAS IN THE BILINGUAL BRAIN
Simone Battaglia University of Bologna
What is the human brain network that supports excellence in simultaneous spoken-language interpretation? Although there is still no clear answer to this question, recent research in neuroscience suggests that the dorsolateral prefrontal cortex (dlPFC) is consistently involved in bilingual language use and cognitive control, including working memory (WM), which, in turn, is particularly important for simultaneous interpretation and translation. Importantly, preliminary evidence has shown that functional connectivity between prefrontal regions correlates with the efficient processing of a second language.
The present project aims to characterize the spatiotemporal features of interhemispheric interactions between the left and right dlPFC in healthy bilingual adults, divided into two groups: professional simultaneous interpreters and non-expert bilingual individuals. In these two groups, cutting-edge neurophysiological methods, namely TMS-EEG co-registration focused on bilateral dlPFC connectivity, will be used to test the dynamics of cortico-cortical connectivity. The procedure makes it possible to non-invasively stimulate the dlPFC and track signal propagation, characterizing the link between different aspects of language processing, executive functions, and bilateral dlPFC connectivity. The project will provide novel insights into the neural mechanisms of interhemispheric communication in the bilingual brain and characterize the pattern of connectivity associated with proficient simultaneous interpretation.
Research area – Machine learning algorithms for translation
OPEN-SOURCING A RECENT TEXT TO SPEECH PAPER
Phillip Wang
Open-source implementations of scientific papers are one of the essential means by which progress in deep learning is achieved today. Corporate players no longer open-source recent text-to-speech model architectures, often not even the trained models. Instead, they tend to publish a scientific paper, sometimes with details in supplementary material, and an accompanying demo with pre-generated audio snippets.
The proposal involves implementing a recent TTS paper such as Voicebox and open-sourcing the architecture. In addition, as far as possible, efforts will be made to collect training data, train the model, and demonstrate that the open-sourced architecture performs well, for example by illustrating notable features or approximately reproducing some performance results (e.g. CMOS).

2022 – Projects awarded
Research area – Language economics
T-INDEX
Economic Complexity research group Centro Ricerca Enrico Fermi
Understanding which countries and languages dominate online sales is a key question for any company wishing to translate its website. The goal of this research project is to complement the T-Index by developing new tools capable of identifying emerging markets and opportunities, thus predicting which languages will become more relevant in the future for a specific product in a specific country. As a first step, the Economic Fitness and Complexity algorithm will be used to identify countries that are expected to undergo significant economic expansion in the coming years. Subsequently, network science and machine learning techniques will be used to predict the products and services that growing economies are likely to start importing.
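The Economic Fitness and Complexity algorithm mentioned above is an iterative map on a binary country-by-product export matrix: a country's fitness sums the complexity of the products it exports, while a product's complexity is dragged down when low-fitness countries export it. A minimal sketch, assuming a small illustrative matrix rather than real trade data:

```python
import numpy as np

# Toy sketch of the Economic Fitness and Complexity iteration
# (Tacchella et al.). Assumption: the binary country-by-product export
# matrix below is illustrative, not real trade data.

M = np.array([
    [1, 1, 1, 1],  # highly diversified country
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],  # exports only the most ubiquitous product
], dtype=float)

F = np.ones(M.shape[0])  # country fitness
Q = np.ones(M.shape[1])  # product complexity

for _ in range(50):
    F_new = M @ Q                    # fitness: sum of exported products' complexity
    Q_new = 1.0 / (M.T @ (1.0 / F))  # complexity: dragged down by low-fitness exporters
    F = F_new / F_new.mean()         # renormalise at every step
    Q = Q_new / Q_new.mean()

# the diversified country comes out fittest; the product exported by
# everyone comes out least complex
```

The nonlinear reciprocal in the complexity update is what distinguishes this algorithm from simple averaging: a product exported even by the least-fit country is immediately marked as low-complexity, which is what makes the resulting rankings predictive of future growth.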
Research area – The neuroscience of language
THE NEUROSCIENCE OF TRANSLATION. NOVEL AND DEAD METAPHOR PROCESSING IN NATIVE AND SECOND-LANGUAGE SPEAKERS
Martina Ardizzi and Valentina Cuccio
The NET project aims to investigate the embodied nature of a second language, focusing on a specific linguistic element that merges abstract and concrete conceptual domains: metaphors. The idea behind the project fits within the embodied simulation approach to language, which has been widely confirmed in the study of native languages but rarely applied to translation. Specifically, the project will record the brain activity of native Italian speakers and second-language Italian speakers while they read dead or novel Italian metaphors. The two groups are expected to show a different involvement of the sensorimotor cortices in response to the different types of metaphors. The results of NET may provide new insights into how to improve disembodied AI translations.
Research area – Language Data
COLLECTION OF SPEECH DATA (50 HOURS) IN A CROWDSOURCED VERSION FOR THE YORÙBÁ LANGUAGE
Kọ́lá Túbọ̀sún
Yorùbá is one of the most widely spoken languages in Africa, with 46 million first- and second-language speakers. Yet there is hardly any language technology available in Yorùbá, even though its speakers, especially those who are illiterate or visually impaired, would benefit most from it. The present project aims at developing speech technology in Yorùbá so that everyone can be understood.
As a first action, aligned voice and text resources will be recorded professionally in a quality usable to produce text-to-speech systems. After donating this data under a Creative Commons license to the Mozilla Common Voice repository, further speech data will be collected from volunteers online. To increase the quality of the text, a diacritic restoration engine has already been developed.
Research area – Machine learning algorithms for translation
INCREMENTAL PARALLEL INFERENCE FOR MACHINE TRANSLATION
Andrea Santilli La Sapienza University
Machine translation works with a de facto standard neural network called the Transformer, published in 2017 by a team at Google Brain. The traditional way of producing new sentences from the Transformer is one word at a time, left to right; this is hard to speed up and parallelize.
A similar problem was spotted and solved in image generation by using “incremental parallel processing”, a technique that refines an image progressively rather than generating it pixel by pixel, yielding speedups of 2-24×.
The project proposes to port this method to Transformers, using clever linear algebra to make it work. This technique and others like it could make machine translation less expensive, and therefore accessible to more use cases and, ultimately, more people.
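The underlying fixed-point idea can be illustrated with a toy linear analogy. In this sketch, a strictly lower-triangular linear recurrence stands in for the left-to-right dependency of autoregressive decoding; it is an assumption for illustration only, not the project's actual method:

```python
import numpy as np

# Toy sketch of the fixed-point idea behind parallel refinement.
# Assumption: a strictly lower-triangular linear recurrence stands in
# for the left-to-right dependency of autoregressive decoding.

rng = np.random.default_rng(1)
n = 6
A = np.tril(rng.normal(scale=0.3, size=(n, n)), k=-1)  # strictly lower triangular
b = rng.normal(size=n)

# sequential decoding: one position at a time, left to right
y_seq = np.zeros(n)
for i in range(n):
    y_seq[i] = b[i] + A[i] @ y_seq

# parallel refinement: update ALL positions simultaneously each sweep;
# because A is nilpotent this reaches the same fixed point in <= n sweeps
y_par = np.zeros(n)
for _ in range(n):
    y_par = b + A @ y_par

print(np.allclose(y_par, y_seq))  # → True
```

The payoff is that each parallel sweep is a single matrix operation that hardware can batch, and when dependencies between positions are weak the iteration converges in far fewer than n sweeps, which is where the speedup comes from.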
Research area – Human-computer interaction
INVESTIGATING THE POTENTIAL OF SPEECH TECHNOLOGIES – SYNTHESIS AND RECOGNITION – TO IMPROVE THE QUALITY OF PROFESSIONAL AND TRAINEE TRANSLATORS’ WORK.
Dragoș Ciobanu University of Vienna
Translators carry out a cognitively demanding, repetitive task that requires continuous high concentration. When they post-edit neural machine translation drafts, a known source of errors is the “NMT fluency trap”: the target sentence sounds very fluent and error-free, but this can hide infidelities or alterations with respect to the source.
Some promising experimental results show that this situation can be improved by having the source text read aloud using speech synthesis.
The practicality and cognitive impact of this new modality will be evaluated to ensure that it does not slow down the overall translation process. To do this, the translators’ gaze will be tracked while they work, along with their focus and cognitive load. This idea could make the translator’s work easier and reduce errors.

Photo credit: Google Deepmind – Unsplash