Research

Philipp Koehn
Professor at Johns Hopkins University
Philipp Koehn is a professor of computer science at Johns Hopkins University, affiliated with the Center for Language and Speech Processing. His research focuses on machine translation, natural language processing, and cross-language information retrieval. He is known for pioneering work in statistical and phrase-based machine translation, including the development of the widely used Europarl corpus and Moses, an open-source translation system.
Can open-source LLMs provide a viable alternative to proprietary AI models, particularly for language translation and localization?
Yes, in principle. Since there is no secret sauce and the methods for building LLMs are broadly known, I expect that the various existing open-source efforts (funded commercially or by governments) will continue to track the quality of proprietary AI models. For language translation, there is always potential to improve foundation models with customer data or low-resource language data, and thus to surpass the quality of pre-existing models.
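To make that adaptation step concrete, here is a minimal sketch of fine-tuning an open translation model on in-domain parallel data with the Hugging Face transformers and datasets libraries; the Helsinki-NLP/opus-mt-en-de checkpoint, the toy sentence pairs, and the hyperparameters are illustrative assumptions, not a recommended recipe.

```python
# Minimal sketch: adapting an open translation model to in-domain
# (customer) parallel data with Hugging Face transformers/datasets.
# Checkpoint, toy data, and hyperparameters are illustrative only.
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

checkpoint = "Helsinki-NLP/opus-mt-en-de"  # any open seq2seq MT model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Toy in-domain parallel corpus; in practice this would be customer
# translation memories or low-resource language data.
pairs = Dataset.from_dict({
    "src": ["The invoice is attached.", "Please restart the device."],
    "tgt": ["Die Rechnung ist beigefügt.", "Bitte starten Sie das Gerät neu."],
})

def tokenize(batch):
    # text_target tokenizes the reference translations as labels.
    return tokenizer(batch["src"], text_target=batch["tgt"],
                     truncation=True, max_length=128)

train_set = pairs.map(tokenize, batched=True, remove_columns=["src", "tgt"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="mt-finetuned",
        per_device_train_batch_size=2,
        num_train_epochs=1,
    ),
    train_dataset=train_set,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()  # a few steps of domain adaptation on the toy corpus
```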
What level of performance can we expect from open-source systems? What impact does the decentralization of AI research (through open-source models) have on innovation in machine translation?
Over the coming months and years we will see continued improvements in efficiency and quality. Automated translation is good enough for many use cases, but there will always be a gap to true perfection, which requires either human intervention or tolerance of the remaining deficiencies.
Your work on “No Language Left Behind” has made significant strides in low-resource language translation. What role can open-source AI play in ensuring quality translation?
Open-source LLMs are very similar to the machine translation models used for NLLB: they are based on the same Transformer architecture that was originally built for machine translation. Since so much more effort these days goes into building open-source LLMs, this is probably also the vehicle through which we will see improvements in translation quality.
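As a concrete illustration, the NLLB-200 models are openly released and can be run with the Hugging Face transformers library; the sketch below uses the publicly available distilled 600M checkpoint and NLLB's FLORES-200 language codes (the example sentence and target language are arbitrary choices).

```python
# Minimal sketch: translating with the openly released NLLB-200 model
# via Hugging Face transformers. Language codes follow NLLB's
# FLORES-200 scheme (e.g. eng_Latn, fra_Latn).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

inputs = tokenizer(
    "The Transformer was originally built for machine translation.",
    return_tensors="pt",
)
output = model.generate(
    **inputs,
    # Force the decoder to start with the target-language tag (French).
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),
    max_length=64,
)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
```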
How can AI be used to not only translate words but also facilitate cross-cultural knowledge transfer?
The fact that current AI models work worse for languages other than English remains a largely underserved problem. We are still working toward methods that make LLMs independent of the language we use to interact with them. To some degree, current models are able to answer questions in one language even if the required knowledge was only entered in another language. But something gets lost in passing the language barrier, and there is a lot left to do to address this problem.