Translated's Research Center

Conference Corner



Research

In our Conference Corner, readers will find Imminent’s take on the most important conferences in language research. Each edition highlights the most interesting talks, notable papers, and emerging trends presented at these events. Whether you’re exploring advances in linguistics, NLP, or broader language sciences, our curated summaries provide a clear and engaging snapshot of the ideas and innovations shaping the field.


NeurIPS 2025

This flagship conference represents the leading event for cutting-edge empirical and theoretical advances in Machine Learning, AI scaling, Foundation Models, Reinforcement Learning, and Multimodal Systems. 

Attending NeurIPS 2025 provided resources, motivation, and strategic context to refine quality estimation models, integrate advanced architectures into data pipelines, and advance multimodal models.

Emerging Trends

At NeurIPS 2025, several emerging areas gained prominence, particularly LLM efficiency, alignment for evaluation, ethics and reliability, and reasoning capabilities. These topics reflect the field’s shift toward large-scale real-world deployment, where models must overcome issues such as unreliable reasoning, privacy risks, and high computational costs. Research focused on making LLMs more practical through efficiency techniques like deeper architectures and quantization, improving evaluation through better alignment and scrutiny of LLM-as-judge systems, and addressing trust concerns including data memorization, privacy, and the risk of output homogenization across models.

Highlights from Talks and Papers

A standout oral, “Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?”, challenged assumptions about reinforcement learning with verifiable rewards (RLVR), showing that it can improve performance but does not fundamentally extend reasoning beyond the base model’s inherent capabilities.

Scaling models remains a powerful driver of new capabilities. The Best Paper, “1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities”, demonstrated that extreme network depth (up to 1024 layers) can dramatically boost goal-conditioned self-supervised robotic performance. Meanwhile, “Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)” revealed convergence patterns in LLM outputs, underscoring the need for diversity checks in MT and QA tasks.

Efficiency and trust were also key focuses. Talks on scaling data quality and privacy/legal risks in generative AI offered frameworks for high-volume data validation and strategies to mitigate privacy exposure, addressing compliance and production challenges. Posters like “LittleBit: Ultra Low-Bit Quantization via Latent Factorization” and “Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness” introduced ultra-low-bit quantization and rigorous unlearning evaluations, enabling efficient models while safeguarding sensitive information.

Overall, the conference emphasized the dual challenge of pushing LLM capabilities while ensuring reliability, ethical safeguards, and efficient deployment, highlighting that advances in reasoning, scale, and data stewardship are tightly intertwined.

Conference Main Themes

The conference highlighted the gap between benchmark performance and real-world reliability, stressing the need for large models that are efficient, practical, and broadly usable.

Trust, safety, and responsible deployment were key concerns, alongside smarter architectures and evaluation strategies. Human-AI collaboration emerged as a focus, combining human oversight with AI scalability.

New Resources

NeurIPS 2025 highlighted key open-source resources, including the Infinity-Chat dataset for diversity-aware generation, the aforementioned LittleBit ultra-low-bit quantization toolkit for efficient edge models, and the Toloka hybrid human-AI annotation platform for scalable data curation. These tools and datasets represent some of the most impactful contributions for practical NLP and AI research.


EurIPS 2025

EurIPS 2025, held in Copenhagen as the first European satellite event of the flagship NeurIPS conference, provided a unique strategic advantage. By bringing together around 2,000 members of the European AI community and featuring a selective program (38 orals, 241 posters), it offered high-density networking and access to the latest AI research within an EU context.

Emerging Trends

At EurIPS 2025, Sustainable AI was a central theme, focusing on efficiency, adaptability, and environmental responsibility. Emtiyaz Khan’s talk, “Adaptive Bayesian Intelligence and the Road to Sustainable AI”, introduced continual learning methods that reduce the need to retrain models from scratch, while Sepp Hochreiter’s “Sustainable, Low-Energy, and Fast AI Made in Europe” presented xLSTM, an efficient alternative to Transformers that lowers energy use, speeds up inference, and handles longer contexts for practical, low-energy AI deployment.

Highlights from Talks and Papers


One standout talk was The Art of (Artificial) Reasoning by Yejin Choi, who argued that generative AI can be democratized by looking beyond current scaling laws, which imply that extreme scaling of resources is the only path forward, and by innovating instead with unconventional data, algorithms, and collaboration. She supported her position with three works. On the algorithmic side, ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models demonstrates that careful RL can push the reasoning capabilities of small models. On the data side, Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning demonstrates that performance can be scaled by scaling synthetic data generation. As an example of large-scale collaboration, OpenThoughts: Data Recipes for Reasoning Models is a dataset created by coordinating many universities, companies, and startups.

Several other talks and posters focused on techniques directly applicable to language processing:

Inference-Time Hyper-Scaling with KV Cache Compression – boosts reasoning accuracy by compressing the Transformer KV cache so that more tokens can be generated within the same memory footprint. The proposed method, Dynamic Memory Sparsification (DMS), sparsifies the KV cache, achieving up to 8x compression by teaching pre-trained models which tokens can be scheduled for future eviction.

Beyond Oracle: Verifier-Supervision for Instruction Hierarchy in Reasoning and Instruction-Tuned LLMs (paper) – introduces a unified framework that improves instruction-hierarchy adherence in LLMs (e.g., system vs. user instructions) by using programmatically verifiable signals instead of costly oracle labels. A synthesis pipeline creates conflicting instruction pairs together with executable verifiers (i.e., Python functions), and ensures dataset quality through automated unit testing and repair. Applying RL with verifiable rewards to this dataset significantly enhances adherence to complex directives and safety robustness.

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention (paper) – introduces a parallel generation method in which multiple LLM instances collaborate through a shared, concurrently updated KV cache. Workers can agree on a shared strategy and stay synchronized by seeing each other’s memory (KV entries), without requiring additional fine-tuning or recomputation. This promising approach enables effective and efficient collaboration between multiple LLM instances, boosting accuracy on reasoning tasks.
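The verifier-supervision idea described above can be illustrated with a small sketch. It is not the paper's pipeline: the `ConflictExample` class, the lowercase rule, and the `verifier_passes_unit_tests` gate are hypothetical names chosen for illustration, and the binary reward is an assumption about how a verifiable signal might be fed to RL.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ConflictExample:
    """One synthetic item: two instructions that cannot both be obeyed."""
    system_instruction: str            # higher-priority directive
    user_instruction: str              # lower-priority, conflicting directive
    verifier: Callable[[str], bool]    # True if the response obeys the system directive


def is_all_lowercase(response: str) -> bool:
    """Executable check: the response contains letters and none is uppercase."""
    letters = [ch for ch in response if ch.isalpha()]
    return bool(letters) and all(ch.islower() for ch in letters)


def reward(example: ConflictExample, response: str) -> float:
    """Binary verifiable reward: 1.0 when the higher-priority instruction wins."""
    return 1.0 if example.verifier(response) else 0.0


def verifier_passes_unit_tests(
    verifier: Callable[[str], bool],
    should_pass: List[str],
    should_fail: List[str],
) -> bool:
    """Hypothetical quality gate: keep a verifier only if it accepts
    known-good responses and rejects known-bad ones."""
    return all(verifier(r) for r in should_pass) and not any(
        verifier(r) for r in should_fail
    )


example = ConflictExample(
    system_instruction="Always answer in lowercase letters only.",
    user_instruction="ANSWER IN ALL CAPS.",
    verifier=is_all_lowercase,
)

gate_ok = verifier_passes_unit_tests(
    example.verifier,
    should_pass=["paris is the capital of france."],
    should_fail=["PARIS IS THE CAPITAL OF FRANCE.", "12345"],
)
print(gate_ok)
print(reward(example, "the answer is 42."))
print(reward(example, "THE ANSWER IS 42."))
```

The point of the design is that no human or oracle model has to label responses: any candidate answer can be scored by running the function, and the unit-test gate catches faulty verifiers before they reach training.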

Conference Main Themes

Beyond the central theme of Sustainable AI, the conference maintained a strong balance across a broad range of topics. The program featured diverse research areas including Model Optimization and Representation Learning, as well as specialized fields such as Computer Vision, Diffusion Models, Graph Neural Networks, Causal Inference, and Reinforcement Learning.

New Resources

EurIPS 2025 highlighted key open-source resources, including TabArena LivingBenchmark – a living benchmark for ML on Tabular Data, advocating for evolving evaluation benchmarks over static ones. Although applied to tabular data, the specific challenges addressed in the paper are likely applicable to other domains as well.


EMNLP 2025

EMNLP is one of the world’s leading conferences on empirical natural language processing, with a strong focus on practical results, large-scale experimentation, and emerging applications of language technologies. Its relevance is immediate: the conference spans machine translation, multimodality, model evaluation, dataset creation, ethical considerations, and the fast-evolving ecosystem of large language models. This year it was held in Suzhou, China.

EMNLP also hosts the Workshop on Machine Translation (WMT), the premier annual workshop on machine translation. Tracking WMT findings has long informed our research directions in quality estimation. Attending EMNLP 2025 therefore offered not only inspiration but also direct insight into the future landscape of applied NLP.

Emerging Trends

One of the clearest cross-cutting themes this year was LLM efficiency. As models grow ever larger, researchers are increasingly focused on making them faster, lighter, and more accessible. Topics such as advanced KV-caching, extreme quantization, and improved distillation methods stood out. These approaches aim to reduce computation, cut memory needs, and make high-performance models usable on modest hardware — an essential development for many applied research teams.
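To make the memory argument concrete, the back-of-the-envelope sketch below estimates the size of a Transformer KV cache and of quantized weights. The model configuration, the 8B parameter count, and the 8x cache compression ratio are assumptions chosen only for illustration; they do not describe any specific system presented at the conference.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2, batch=1):
    """Size of a Transformer KV cache: one key and one value vector
    per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value * batch


def weight_bytes(n_params, bits_per_weight):
    """Approximate weight storage, ignoring scales and other quantization metadata."""
    return n_params * bits_per_weight / 8


GIB = 1024 ** 3

# Hypothetical model configuration, chosen only for illustration.
cfg = dict(layers=32, kv_heads=8, head_dim=128)

full = kv_cache_bytes(seq_len=32_768, **cfg)   # 16-bit cache, one sequence
compressed = full / 8                          # assumed 8x compression ratio

print(f"KV cache, 32k tokens: {full / GIB:.2f} GiB")
print(f"after 8x compression: {compressed / GIB:.2f} GiB")

for bits in (16, 4, 1):
    size = weight_bytes(8e9, bits) / GIB       # hypothetical 8B-parameter model
    print(f"{bits:>2}-bit weights: {size:.2f} GiB")
```

Under these assumptions the 16-bit cache for a single 32k-token sequence takes 4 GiB, so an 8x smaller cache leaves room for longer generations or larger batches on the same hardware, which is the practical appeal of these methods for teams with modest GPUs.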

Another emerging space is the use of LLMs in every stage of the research pipeline, from synthetic dataset creation to automated evaluation. This shift is reshaping traditional workflows and prompting new questions about how the field measures quality, novelty, and reliability.

Highlights from Talks and Papers

Among the most memorable keynotes was Heng Ji’s “No more processing. Time to discover.” Speaking from the vantage point of drug discovery and the broader “AI for Science” movement, she challenged the community to re-center research around genuine breakthroughs rather than incremental improvements. Her call for models and methods that support true scientific discovery resonated strongly throughout the conference.

From an MT perspective, Longyue Wang (Alibaba International) delivered a timely talk on the evolution of multilingual translation in the LLM era. He described the shift from fine-tuned LLMs to reasoning-driven models and now to LLM-based agents, while also highlighting the persistent gap between academic MT benchmarks and the needs of industrial-scale translation systems.

On the technical front, the paper S1: Simple Test-time Scaling drew considerable attention. It presents an open-source method for boosting model performance by increasing compute only at inference time — an ability previously demonstrated by proprietary systems such as OpenAI’s o1. With a modest finetuning set of just 1,000 questions on Qwen2.5-32B, the authors achieved competitive gains, opening the door to new experimentation strategies for many labs.
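The decoding-time control used in s1, which the paper calls budget forcing, can be sketched roughly as follows. This is a toy illustration rather than the authors' implementation: `toy_model` stands in for a real LLM, and the `</think>` delimiter and the `Wait` nudge are placeholders for whatever tokens a given model uses.

```python
from typing import Callable, List

END_OF_THINKING = "</think>"   # hypothetical delimiter; real models define their own


def budget_forced_thinking(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    min_tokens: int,
    max_tokens: int,
    nudge: str = "Wait",
) -> List[str]:
    """Return a reasoning trace whose length is steered by the token budget.

    - If the model tries to stop before min_tokens, the stop is suppressed
      and a nudge token is appended so that it keeps reasoning.
    - If the trace reaches max_tokens, thinking is cut off.
    """
    trace: List[str] = []
    while len(trace) < max_tokens:
        token = next_token(prompt + trace)
        if token == END_OF_THINKING:
            if len(trace) >= min_tokens:
                break
            token = nudge      # suppress the stop and ask for more reasoning
        trace.append(token)
    return trace + [END_OF_THINKING]


def toy_model(context: List[str]) -> str:
    """Stand-in for an LLM: emits three 'step' tokens, then tries to stop."""
    steps_since_restart = 0
    for tok in reversed(context):
        if tok in ("Wait", "Q"):
            break
        steps_since_restart += 1
    return "step" if steps_since_restart < 3 else END_OF_THINKING


short = budget_forced_thinking(toy_model, ["Q"], min_tokens=0, max_tokens=20)
longer = budget_forced_thinking(toy_model, ["Q"], min_tokens=6, max_tokens=20)
capped = budget_forced_thinking(toy_model, ["Q"], min_tokens=0, max_tokens=2)
print(len(short), len(longer), len(capped))
```

Raising `min_tokens` spends more inference compute on the same question without touching the weights, while `max_tokens` caps the cost; both knobs operate purely at generation time.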


Conference Main Themes

Beyond efficiency and LLM-centric pipelines, several research threads recurred across workshops and oral sessions. Among the works that stood out from the sea of incremental research, two common themes emerged: probing LLM building blocks to motivate new architectural choices (e.g., a mixture of heterogeneously sized experts rather than the homogeneous ones used in practice), and interpreting LLMs’ internal representations to understand their implications for surface-level output.

New Resources

For quality estimation work, several new models offer improved capabilities, including NVIDIA’s Qwen-MQM, Google’s MetricX-25 and GemSpanEval, and an RL-based MT evaluation model presented by researchers from the University of Amsterdam. Together, these resources represent some of the most relevant and forward-looking contributions to EMNLP and WMT 2025.


EACL 2026

EACL is one of the flagship conferences of the European Chapter of the Association for Computational Linguistics (ACL), bringing together researchers and practitioners working on all aspects of natural language processing. The conference is known for combining rigorous scientific contributions with a broad international perspective on language technologies, including machine translation, multilingual NLP, language resources, evaluation, ethics, and recent advances in large language models. Its relevance is especially strong in fostering collaboration across Europe and beyond while highlighting research with real-world impact. In 2026, EACL was held in Rabat, Morocco.

Emerging Trends

EACL 2026 featured a broad spectrum of topics. Several established fields, such as Information Retrieval and Tokenization, continued to attract significant attention, alongside areas that appeared to be gaining notable momentum. Among the latter, for example:

  • Agents & Interaction: highlighting a shift from viewing LLMs as static generators toward autonomous systems capable of planning, acting, and executing tasks within complex environments. Current research focuses on embedding LLMs into agentic pipelines and tool-use frameworks to enable multi-step reasoning and multi-agent cooperation, a critical direction as the industry moves from content generation to functional task automation.
  • Safety & Alignment: addressing how to make models more robust against misuse, adversarial inputs, and misaligned behaviors. Given the continued deployment of LLMs in real-world applications, this line of research sits at the intersection of technical reliability and responsible AI.

Additionally, amplified by EACL’s first edition in Africa, multilingualism and low-resource support were at the forefront of the conference. The agenda highlighted this commitment through the inclusion of various affinity and language groups such as Arabic NLP, SomosNLP, and Muslims in ML. Two of the four keynote addresses were central to this narrative:

  • In Arabic and Technology: A 40-Year Perspective, Nizar Habash traced the development of Arabic NLP from early foundational efforts to today’s large-scale generative systems, highlighting the language’s unique challenges: rich morphology, ambiguous orthography, and extensive dialectal variation. He concluded with a forward-looking vision for a cohesive, sustainable ecosystem, advancing Arabic in AI through stronger collaboration and innovation for the next generation of researchers.
  • In Omnilinguality: Scaling AI to Any Language (paper), Marta R. Costa-jussà presented a vision for extending AI capabilities across the world’s languages, framing “omnilinguality” not as a task-specific challenge, but as a broader arena for advancing general-purpose LLM techniques. While MT has historically driven multilingual progress, scaling to thousands of languages and producing key open resources like FLORES and BOUQuET, she argued that with the rise of LLMs, multilingual coverage should be addressed by design from the very beginning, particularly in early infrastructure choices, rather than treated as a downstream specialization.

This convergence of signals around multilingualism aligns with our long-term vision for Lara models.

Highlights from Talks and Papers

In addition to the previous two keynotes, here are some of the most impactful papers:

  • Humans and Transformer LMs: Abstraction drives language learning
    Winner of the Best Paper Award, this research investigates whether transformer models acquire linguistic categories through abstract feature-based learning or concrete exemplar-based memorization. By tracking a GPT-2 model during training, the study reveals that abstract, class-level behaviors emerge before item-specific ones, demonstrating that language models possess an inductive bias toward forming abstractions.

  • DivMerge: A divergence-based model merging method for multi-tasking
    Winner of the Outstanding Paper Award, this research introduces DivMerge, a reference-free method that combines multiple specialized models into a single multi-task one. The method automatically balances task importance and mitigates interference without requiring explicit target distributions or labeled data, achieving this by minimizing the divergence between the outputs of the original fine-tuned models and the newly merged model. For our team, this is of clear practical interest: we could experiment with efficiently merging different translation models, or combining models trained on entirely distinct tasks into one unified system, while avoiding the high computational costs associated with standard multi-task retraining.

  • Detecting Hallucination in Vision Language Models without generating a single token
    This paper tackles hallucination in VLMs from an interesting angle: rather than detecting errors after generation, it asks whether a model’s internal states, captured in a single forward pass before any token is produced, can already predict whether a hallucination is likely. By training lightweight probes on these representations, the authors show that late query-token states tend to be the most predictive, though the optimal layer varies by architecture. For our work, the prospect of anticipating hallucinations before they occur opens up some genuinely useful directions.

  • Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning
    This paper addresses a genuinely interesting challenge: how to extend reinforcement learning techniques, typically effective in structured, verifiable domains like mathematics, to broader reasoning tasks. The proposed pipeline covers the full process, from data curation and template design to control answer-space complexity, rule-based reward modeling, and data blending strategies. The core finding is that integrating multi-domain data meaningfully improves both reasoning accuracy and token efficiency across diverse benchmarks. The methodology may help improve Lara translation accuracy and contextual understanding.

  • Specialization through Collaboration: Understanding Expert Interaction in Mixture-of-Expert Large Language Models
    In this paper, the authors reframe MoE specialization from individual “subnetworks” to coordinated groups. Using Hierarchical Sparse Dictionary Learning (HSDL), they reveal that models specialize through cross-layer collaborations that map to distinct tasks. This enables a pruning algorithm, which leverages these patterns to prune up to 50% of experts with minimal performance loss, proving that maintaining collaborative structures is more effective than simply preserving high-frequency experts.

  • ReflectiveRAG: Rethinking Adaptivity in Retrieval-Augmented Generation
    The authors address a practical RAG failure mode: performance degradation under noisy retrieval. Rather than scaling model size, they propose two lightweight modules, Self-Reflective Retrieval (SRR) and Contrastive Noise Removal (NR), which first iteratively evaluate evidence sufficiency and reformulate the query, then refine the retrieved content via embedding-based contrastive filtering. This approach offers a fresh perspective on the retrieval problem, injecting reasoning into the process rather than relying on scale. The adaptive query reformulation mechanism could be particularly relevant for improving Lara’s adaptation capabilities.

  • A Family of LLMs Liberated from Static Vocabularies (presented in the BoF Tokenization & Beyond)
    This paper from Aleph Alpha proposes replacing traditional tokenizers with a hierarchical byte-level architecture (HAT), where a small encoder aggregates bytes into standard word embeddings and a decoder converts outputs back to bytes. This approach makes the model inherently more robust to spelling variations and more adaptable to new languages and domains through continued training. Interestingly, the proposed techniques are compatible with pretrained models (hence “liberated”) and are able to match or outperform the original Llama 3.1 performance on most benchmarks.
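Of the papers above, the DivMerge idea of choosing merge coefficients by minimizing output divergence lends itself to a compact illustration. The sketch below is a toy under stated assumptions, not the authors' method: it uses KL divergence, a single interpolation coefficient `lam` found by grid search, and tiny hand-written linear classifiers (`model_a`, `model_b`) in place of fine-tuned language models. Only unlabeled inputs are needed, which mirrors the reference-free property described in the summary.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[Vector]   # one row of weights per output class


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights: Matrix, x: Vector) -> Vector:
    """Class probabilities of a toy linear model."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])


def kl(p: Vector, q: Vector) -> float:
    """KL(p || q); assumes strictly positive probabilities,
    which holds for the toy values used below."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def interpolate(a: Matrix, b: Matrix, lam: float) -> Matrix:
    """Merged weights: lam * a + (1 - lam) * b."""
    return [
        [lam * wa + (1 - lam) * wb for wa, wb in zip(ra, rb)]
        for ra, rb in zip(a, b)
    ]


def merge_loss(lam, model_a, model_b, inputs_a, inputs_b) -> float:
    """Average divergence between each specialist and the merged model,
    measured on that specialist's own unlabeled inputs."""
    merged = interpolate(model_a, model_b, lam)
    loss_a = sum(kl(predict(model_a, x), predict(merged, x)) for x in inputs_a)
    loss_b = sum(kl(predict(model_b, x), predict(merged, x)) for x in inputs_b)
    return loss_a / len(inputs_a) + loss_b / len(inputs_b)


# Hypothetical fine-tuned models and unlabeled inputs for two tasks.
model_a = [[2.0, 0.0], [0.0, 0.5]]
model_b = [[0.5, 0.0], [0.0, 2.0]]
inputs_a = [[1.0, 0.0], [0.8, 0.1]]
inputs_b = [[0.0, 1.0], [0.1, 0.9]]

grid = [i / 20 for i in range(21)]
best_lam = min(
    grid, key=lambda lam: merge_loss(lam, model_a, model_b, inputs_a, inputs_b)
)
print(best_lam, merge_loss(best_lam, model_a, model_b, inputs_a, inputs_b))
```

In a realistic setting the grid search would be replaced by gradient-based optimization over many coefficients, but the objective keeps the same shape: the merged model is rewarded for behaving like each specialist on that specialist's own data.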

Conference Main Themes

Beyond the prominent focus on multilinguality and under-resourced languages, the conference maintained a strong balance across the broader NLP landscape. Research contributions covered a wide range of topics, including inference optimization through techniques such as KV-cache compression and speculative decoding, bias mitigation and fairness, multimodal models, and data curation. This breadth highlighted the community’s continued focus on both advancing model capabilities and addressing practical challenges related to efficiency, reliability, and responsible deployment.