Research
In our Conference Corner, readers will find Imminent’s take on the most important conferences in language research. Each edition highlights the most interesting talks, notable papers, and emerging trends presented at these events. Whether you’re exploring advances in linguistics, NLP, or broader language sciences, our curated summaries provide a clear and engaging snapshot of the ideas and innovations shaping the field.
Empirical Methods in Natural Language Processing 2025
Why EMNLP?
EMNLP is one of the world’s leading conferences on empirical natural language processing, with a strong focus on practical results, large-scale experimentation, and emerging applications of language technologies. Its relevance is immediate: the conference spans machine translation, multimodality, model evaluation, dataset creation, ethical considerations, and the fast-evolving ecosystem of large language models. This year it was held in Suzhou, China.
EMNLP also hosts the Workshop on Machine Translation (WMT), the premier annual workshop on machine translation. Tracking WMT findings has long informed our research directions in quality estimation. Attending EMNLP 2025 therefore offered not only inspiration but also direct insight into the future landscape of applied NLP.
Emerging Trends
One of the clearest cross-cutting themes this year was LLM efficiency. As models grow ever larger, researchers are increasingly focused on making them faster, lighter, and more accessible. Topics such as advanced KV-caching, extreme quantization, and improved distillation methods stood out. These approaches aim to reduce computation, cut memory needs, and make high-performance models usable on modest hardware — an essential development for many applied research teams.
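To make one of these efficiency ideas concrete, here is a minimal sketch of symmetric per-tensor int8 weight quantization, the basic mechanism underlying the more extreme quantization schemes discussed at the conference. The function names are illustrative, not drawn from any specific paper.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float weights to int8 using a single per-tensor scale."""
    scale = np.abs(weights).max() / 127.0  # largest magnitude maps to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.abs(w - w_hat).max())  # bounded by scale / 2
```

Storing int8 instead of float32 cuts weight memory by 4x at the cost of a small, bounded rounding error; production schemes refine this with per-channel or per-group scales.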
Another emerging space is the use of LLMs in every stage of the research pipeline, from synthetic dataset creation to automated evaluation. This shift is reshaping traditional workflows and prompting new questions about how the field measures quality, novelty, and reliability.
Highlights from Talks and Papers
Among the most memorable keynotes was Heng Ji’s “No more processing. Time to discover.” Speaking from the vantage point of drug discovery and the broader “AI for Science” movement, she challenged the community to re-center research around genuine breakthroughs rather than incremental improvements. Her call for models and methods that support true scientific discovery resonated strongly throughout the conference.
From an MT perspective, Longyue Wang (Alibaba International) delivered a timely talk on the evolution of multilingual translation in the LLM era. He described the shift from fine-tuned LLMs to reasoning-driven models and now to LLM-based agents — while also highlighting the persistent gap between academic MT benchmarks and the needs of industrial-scale translation systems.
On the technical front, the paper “S1: Simple Test-time Scaling” drew considerable attention. It presents an open-source method for boosting model performance by increasing compute only at inference time — an ability previously demonstrated by proprietary systems such as OpenAI’s o1. By fine-tuning Qwen2.5-32B on a modest set of just 1,000 questions, the authors achieved competitive gains, opening the door to new experimentation strategies for many labs.
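The control logic behind this kind of test-time scaling can be sketched in a few lines: if the model tries to end its reasoning before a minimum compute budget has been spent, the end-of-reasoning marker is suppressed and a continuation cue is appended so the model keeps thinking. In the toy sketch below, `stub_model` is a hypothetical stand-in for a real LLM; only the budget-forcing loop reflects the technique itself.

```python
END = "</think>"  # hypothetical end-of-reasoning marker

def stub_model(context: list[str]) -> str:
    """Hypothetical stand-in for an LLM: emits one reasoning step
    after each continuation cue, otherwise tries to stop."""
    steps = sum(1 for t in context if t.startswith("step"))
    return f"step{steps + 1}" if context and context[-1] == "Wait" else END

def generate_with_budget(prompt: str, min_steps: int, max_iters: int = 50):
    tokens: list[str] = ["Wait"]  # cue the stub to start reasoning
    for _ in range(max_iters):
        nxt = stub_model(tokens)
        if nxt == END:
            if sum(t.startswith("step") for t in tokens) >= min_steps:
                break  # budget met: let the model stop
            nxt = "Wait"  # budget forcing: replace the stop with a cue
        tokens.append(nxt)
    return [t for t in tokens if t.startswith("step")]

print(generate_with_budget("q", min_steps=3))  # ['step1', 'step2', 'step3']
```

Raising `min_steps` forces more reasoning steps at inference time without touching the model's weights — the essence of trading inference compute for quality.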
Here is a list of other research that stood out during the conference:
- Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index
- Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values?
- Efficient Model Development through Fine-tuning Transfer
- Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models
- Evaluating Language Translation Models by Playing Telephone
- Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement
Conference Main Themes
Beyond efficiency and LLM-centric pipelines, several research threads recurred across workshops and oral sessions.
Among the works that stood out from the sea of incremental research, two common themes emerged: probing the building blocks of LLMs to motivate new architectural choices (e.g. mixtures of heterogeneously sized experts rather than homogeneous ones), and interpreting LLMs’ internal representations to understand their implications for surface-level output.
New Resources
For quality estimation work, several new models with improved capabilities merit attention, including NVIDIA’s Qwen-MQM, Google’s MetricX-25 and GemSpanEval, and an RL-based MT evaluation model presented by researchers from the University of Amsterdam. Together, these resources represent some of the most relevant and forward-looking contributions to EMNLP and WMT 2025.