Research

Graham Neubig
Associate Professor at the Language Technologies Institute of Carnegie Mellon University
Graham Neubig is an associate professor at the Language Technologies Institute of Carnegie Mellon University. His research focuses on natural language processing, with a particular interest in fundamentals, applications, and understanding of large language models for tasks such as question answering, code generation, and multilingual applications. His goal is to make it possible for people to interact seamlessly with each other and with computers in their own language. He is also committed to advancing NLP accessibility through open research publications, comprehensive course materials, video lectures, and open-source software.
Based on your research, how do you see prompt optimization shaping the accuracy and efficiency of large language models in various applications, including machine translation and localization?
I think prompt optimization is definitely important for many LLM applications that have a large number of requirements. For more standard translation tasks it may matter less (because the task is relatively straightforward), but if you have many specific requirements, such as terminology or style, it becomes more important.
What are the limitations of current prompting techniques, and how might they be improved to allow LLMs to retrieve more reliable, context-aware knowledge?
Language models tend to do well when they have appropriate context to disambiguate any ambiguous decisions. For translation, this can include adding the appropriate translations into the context that you provide to the language model. This sort of technique is usually called “retrieval augmented generation (RAG),” and it has actually been around in translation for a while now; a 2017 paper, Search Engine Guided Non-Parametric Neural Machine Translation, already proposed this approach.
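The idea can be sketched in a few lines: retrieve similar sentences from a translation memory and prepend their reference translations to the prompt. This is a minimal illustration, not the method from the 2017 paper; the word-overlap scorer and the prompt template are placeholder assumptions (a real system would use a proper retriever and an actual LLM call).

```python
# Minimal sketch of retrieval-augmented generation (RAG) for translation:
# find the most similar source sentences in a translation memory and put
# their reference translations into the prompt as in-context examples.

def overlap_score(a: str, b: str) -> float:
    """Crude similarity: word overlap (a stand-in for a real retriever)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def build_rag_prompt(source: str, memory: list[tuple[str, str]], k: int = 2) -> str:
    """memory is a list of (source_sentence, reference_translation) pairs."""
    ranked = sorted(memory, key=lambda pair: overlap_score(source, pair[0]), reverse=True)
    lines = ["Translate from English to French. Follow the style of these examples:"]
    for src, tgt in ranked[:k]:
        lines.append(f"English: {src}\nFrench: {tgt}")
    lines.append(f"English: {source}\nFrench:")
    return "\n".join(lines)

memory = [
    ("The contract is signed.", "Le contrat est signé."),
    ("Please review the attached contract.", "Veuillez relire le contrat ci-joint."),
    ("The weather is nice today.", "Il fait beau aujourd'hui."),
]
prompt = build_rag_prompt("The contract must be reviewed.", memory)
```

The resulting prompt contains the two most contract-related examples and ends awaiting the model's translation; in practice the retrieved pairs would come from a terminology database or translation memory matched with a stronger similarity measure.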
Do you believe prompt engineering will remain a human-driven skill, or will advancements in AI automate and optimize the process to a point where human intervention is minimal?
This is a very good question. I think that as language models have gotten better, prompt engineering has evolved from being something of a dark art to now mostly being a matter of clearly specifying the problem that you’re trying to solve. There are two ways to specify this problem: one by prompting, and the other by providing lots of examples. If you are able to provide a lot of examples, then there are methods for automatically optimizing prompts to get the best results; DSPy (“The framework for programming – not prompting – language models”) is a well-known example. However, often you’re not able to prepare a lot of examples beforehand, in which case being able to clearly describe the requirements of your problem in a relatively verbose prompt becomes more important.
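The example-driven optimization described above can be sketched as a simple search loop: score each candidate prompt on labeled examples and keep the best one. This illustrates the general idea behind libraries like DSPy but is not DSPy’s actual API; the toy “model” here just uppercases text so the loop runs without a real LLM.

```python
# Generic sketch of example-driven prompt optimization: evaluate candidate
# prompts against (input, expected_output) examples and pick the winner.

def toy_model(prompt: str, text: str) -> str:
    # Stand-in for an LLM call: only obeys the instruction "UPPERCASE".
    return text.upper() if "UPPERCASE" in prompt else text

def score_prompt(prompt: str, examples: list[tuple[str, str]]) -> float:
    """Accuracy of the prompted model on the labeled examples."""
    return sum(toy_model(prompt, x) == y for x, y in examples) / len(examples)

def optimize(candidates: list[str], examples: list[tuple[str, str]]) -> str:
    """Return the candidate prompt with the highest example accuracy."""
    return max(candidates, key=lambda p: score_prompt(p, examples))

examples = [("hello", "HELLO"), ("world", "WORLD")]
candidates = ["Translate the text.", "UPPERCASE the text.", "Summarize the text."]
best = optimize(candidates, examples)
# best == "UPPERCASE the text."
```

Real optimizers search far larger prompt spaces (often generating candidates with an LLM), but the core loop is the same: without labeled examples there is nothing to score against, which is why a clear, verbose prompt matters when examples are scarce.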
How can LLMs be better integrated into human translation workflows to enhance translation quality, nuance, and cultural adaptation?
That’s also a great question. I’m not working as a professional translator at the moment, but in my occasional amateur translation I use LMs in a variety of ways. Obviously you can do post-editing, where the LM translates first and a translator goes in and checks the results. Alternatively, you can use an LM to post-edit your own translations. I also really like to use LMs to brainstorm different wordings, asking the model to generate several possible options and then picking my favorite one; that’s a good way to improve nuance. One other thing I should note is that at this point each LM kind of has its own “personality,” so it’s worth shopping around for an LM that you think does a good job. For instance, I use ChatGPT for information-gathering tasks, Claude for coding tasks, and DeepSeek for more creative tasks, because I’ve found each to be good at those.
The question about cultural adaptation is an important one, but I don’t think that one in particular has been solved by the language models out there. In fact, one of my major research directions is creating multilingual, culturally sensitive models, both from the point of view of language models that can understand images (Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages) and language models that can generate or translate images for the purposes of localization and culturally sensitive content creation (An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance).