Translated's Research Center

Around the World in 80+ Languages

Localization

A global perspective on humans, artificial intelligence, and localization, co-authored by Translated’s VP of Artificial Intelligence Solutions, John Tinsley; VP of Social Impact, Alessandro Fusacchia; and Chief Operating Officer, Alessandro Cattelan, builds upon insights gleaned from the Imminent Unconferences.



Starting the journey

AI and new operating models are changing the world of localization, opening up immense opportunities but also posing new challenges. Through a series of uniquely formatted events, Translated has brought together communities of experts from across the world to understand the consequences of this transformation. Many questions were raised, some answers were found, and we report on both in this article.

At Translated, “we believe in humans” is more than just a tagline; it’s the philosophy on which we base all of our activities. As an embodiment of this, Imminent, Translated’s Research Center, has held a number of “unconferences” across three continents, with humans at their core.
The choice of locations for these events has an unusual motivation: our boat, Translated9, is participating in the Ocean Globe Race. A key requirement of the race is that the boats, and the technology used aboard, must date from 50 years ago, in honour of the first edition of the race in 1973. So, once again, humans are at the core. The race has four legs, and we held an unconference at three of the stopovers along the way, on three different continents: the first in London and Southampton in the UK, the second in Cape Town, South Africa, and the final one in Punta del Este, Uruguay.

The general theme of these events was the symbiosis of humans, artificial intelligence, and language. However, in keeping with the unconference format (a loosely structured conference), attendees were able to suggest, and ultimately vote on, the specific topics for discussion. At the first event in London, the audience was made up mostly of language and technology experts based in Europe and North America. At the latter two, in Africa and South America, the audience consisted of both international and regional experts, including language specialists, technologists, entrepreneurs, and government and administrative officials.
As a consequence, we saw some common topics emerge across all three events, as well as specific regional issues depending on the location. This gave proceedings a global feel, while retaining a very local flavour on each occasion. 


Common themes across the unconferences

The role of government in regulating AI

One topic raised at all three unconferences was the role of governments in regulating AI. Because people are deeply concerned about the harm caused by those who misuse AI, governments are trying to respond to their citizens’ demands by updating and adapting existing laws and creating new ones to address the most pressing issues. However, governments need to strike a balance between restrictions that protect their citizens and limitations that could curtail the progress of AI. In all cases, we believe that humans are, and must remain, in control of technological developments, and we reject any prophecy claiming that we will end up being governed by machines, unless we decide to be!

Translation and education in the AI era

Given the pace of change, the translation profession, and the way we educate the next generation of translators, should be evolving in the AI era. Translators can benefit massively from this new technology, which helps them manage bigger workloads, diversify their skill sets, and potentially make the profession more lucrative. However, one of the main concerns repeated across all three unconferences was how ill-equipped education is to prepare this next generation of localizers. AI is conspicuous by its absence from university curricula, and educators appear not to be keeping up with a technology that is reshaping the role of translators. This is critical because, while translators are currently present at every stage of the translation process, they will most likely shift to the later stages, working as proofreaders and editors of content that may be produced or translated using AI.


Scholars seem either reluctant to make the necessary changes to curricula or doubtful that it is even possible at the moment. Moreover, they probably do not know where to start. As a result, translators are learning to use these tools by themselves, while also training other translators and students, so that they can leverage the benefits AI offers and avoid becoming obsolete in an ever-changing job market. Translation has always been linked to education, so it is crucial that the two get back in step now.

There is also an issue of cultural attitude: some translators still do not see technology as a valuable asset for their work, or more broadly for any culture-related activity.


REGIONAL TOPICS

Where are we headed?

Living in the midst of such rapid change and evolution driven by AI, people naturally want to know what the impact is going to be. Consequently, a topic on everyone’s lips at the first Unconference in London was – the future! Of course, there is the big picture question of what the future holds for humanity, but more practically, what does the future hold for us as professionals in the localization industry? Where are we headed?
Today, we have a problem: in the long term, localization as it currently works is broken. Content creation is disconnected from localization. Content creators draw on diverse contextual data to produce content in a given language. This content is then effectively “thrown over the wall” to translators, who work on individual text segments and rely on their own perspectives to interpret the context. That interpretation can be incomplete and may not reflect what the content creator originally intended.

There are measures to mitigate these issues, such as style guides, and multiple rounds of revision are usually conducted to address context-related errors. However, in an increasingly fast-paced world, with high volumes of content and shorter time to market, review phases get cut, and that can compromise translation quality. Consequently, localization is often costlier, slower, and/or of lower quality than users would like. All of this happens because the two processes are distinct. But what if they merged? What would this look like, and how would it even work? Enter, AI.

AI can already generate content in multiple languages. So rather than “create and translate”, what if we simply created content in several languages simultaneously, with all the nuances of each target language and locale? Instead of a content creator for each language, AI would create the content, and the roles of content creator and translator would merge into one: multilingual content reviewer.
For businesses, this would mean much more effective messaging, not just written in the language of the consumer, but targeted to the country, the region, or even the individual (more on this later!). For consumers, this means a much better lived experience, more relatable content that people truly identify with. 
But… we’re not quite there yet; we are still talking about the future, after all. Still, given the rapid advances we’ve seen in AI over just the past few years, this idea is no longer a pipe dream. It is very much within reach. A natively multilingual era!


AI for the rest of the world 

Another topic of wide-reaching interest in London was so-called “AI for the next 4 billion people”. The idea was that while the 4 billion people in the “Global North” speak only a fraction of the world’s languages, this is where the vast majority of AI research, development, and investment is focused. But what about the other 4 billion people, and the hundreds, if not thousands, of languages they write and speak? Which of those languages should be prioritised for support, and on what basis? How do we get started (and is it even our role)? Who are the contact points in those regions: business leaders, academia, or governments?

The fundamental idea was to apply the same blueprint that has worked to date for European and other languages – investment in data, hardware, research and development – but a big question mark was raised as to how applicable this approach would be. We learned more once we got our feet on the ground in Africa and South America.


REGIONAL TOPICS

Understanding the breadth of the language challenge

There are so many indigenous languages in Africa that it is a challenge even to count them accurately. With estimates ranging from 1,000 to 2,000 languages, Africa is home to approximately one-third of all the world’s languages. In South Africa alone, where the event was hosted, there are 11 official spoken languages: Afrikaans, English, isiNdebele, isiXhosa, isiZulu, Sesotho, Setswana, siSwati, Tshivenda, Xitsonga, and Sepedi.

Even this is a dynamic situation, and not all languages are created, or exist, equally. Languages evolve and grow all the time, and new ones continue to emerge. Take the unique example of Sheng. In the 1970s, a new medium of communication called Sheng emerged in the slums of Nairobi, Kenya. Originally created as a secret language for Kenyan youth growing up in multicultural, multilingual urban environments, Sheng has become widely recognized as a linguistic phenomenon that goes beyond traditional slang.
Sheng has developed as a way for Kenyan youth to emerge as a distinct group within Kenyan society, with new modes of interaction and socialisation that celebrate the fluidity of culture and identity. While Sheng is based on the grammatical structure of Kiswahili, its lexicon draws on English, Kiswahili, and ethnic languages and, surprisingly, also integrates Hindi, Spanish, and American slang.

Furthermore, some languages exist primarily in spoken form, e.g. Bangime in Mali, and Shabo in Ethiopia, meaning when we consider the question of AI, we need to think not only about solutions for text, but also for speech.

Taking the first steps

The big question in tackling the language challenge is – where do we start? Are the drivers for prioritising one language over another commercial, or should they be based on social impact?
An obvious starting point might be languages with a wider footprint: the more speakers, the bigger the impact. However, this approach risks further marginalising languages spoken by smaller communities. Then there is the question of who should take the lead: technologists who have the experience, or local leaders who understand the needs and specific challenges.

One thing that became clear is that it needs to be a collaborative effort. Technologists from the Global North can offer expertise, infrastructure, and perhaps funding. Leaders in the region bring clarity on the needs, a bottom-up approach rooted in deep connections with local communities, access to people, and, crucially, the ability to gather data.

Key actions for building momentum include networking, connecting people, and building community. This helps guide the development of the technology, but it also builds trust with indigenous stakeholders who may be wary of outside influence and of potential exploitation of their data resources, and it helps demonstrate the potential value of the technology, whatever the proposed scenario.

Not a level playing field

Before we get ahead of ourselves, we need to revisit our blueprint and consider what we’re actually proposing. When it comes to technology and AI, today we’re talking about big data and Large Language Models – the emphasis on large. Consider what this requires:

  • Potentially vast amounts of data, in the order of terabytes – does this even exist or can it be created?
  • Massive computing infrastructure to train models – this is probably achievable with the support of large technology providers.
  • Huge computing infrastructure to operate the models – this is much more of a practical challenge to do locally. And if we’re considering cloud solutions…
  • High-speed, low-latency internet connections are needed, and these are far from guaranteed.

One clear example that emerged from the discussions in South Africa is the need for AI and technology to support indigenous communities with healthcare education, bringing valuable information to people in a medium they can consume, be it written or audio. The challenge is that an AI app for sharing such information may require internet connectivity. If that is not available, the AI models may need to be stored on the device, but in many cases smart devices are not prevalent and storage capacity is limited.


Therefore, we cannot necessarily take today’s approach of building large AI applications and finding the best use cases after the fact. Resources are limited, so we need to be more targeted in the applications we build and more creative in how we design them. There is a precedent: in the past, when building large machine translation models, we also had limited resources, with less storage on our devices and no 4G/5G connectivity. So we pursued lines of research to “distil” models so that they could fit on a device or work offline. As computing power kept growing, and the cloud came to offer effectively unlimited resources over high-speed connections, most of these techniques were dropped.

Perhaps we need to dust some of them off again, in the interest of making AI more accessible globally. Perhaps the era of the small language model is on the horizon.
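The article does not say which distillation techniques were used in the past. As one common illustration, here is a minimal sketch of knowledge distillation, assuming the standard soft-target formulation. A small “student” model is trained to match the temperature-softened output probabilities of a large “teacher” model, and the mismatch is measured with KL divergence. The function names, the example scores, and the temperature value are all hypothetical. A real system would compute this over a trained network’s outputs rather than hand-written lists.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw model scores into probabilities, softened by `temperature`."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Multiplied by temperature**2, a common convention (assumed here) that keeps
    gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the large model
    q = softmax(student_logits, temperature)  # predictions of the small model
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical scores over three candidate target words
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.6]
far_student = [0.5, 1.0, 4.0]
print(distillation_loss(close_student, teacher) < distillation_loss(far_student, teacher))
```

Minimising this loss pushes the small model towards the large model’s behaviour. The aim is a model compact enough to store and run on a modest device without a connection.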


REGIONAL TOPICS

At the unconference in Uruguay, there was also extensive discussion of the unique language challenges in South (and Central) America. However, there was a twist compared to the discussions in Africa. The challenge presented wasn’t so much the variety of different languages as the variation within a single language across countries and regions.
There are, of course, many languages in South America, with Spanish, Portuguese, Quechua, Guaraní, and Aymara the five most spoken, among a range of other European and indigenous languages. Spanish is the most widely spoken language on the continent, and it has the unusual characteristic of being spoken across 19 different countries, most of which share borders. In fact, if you drew a continuous line from Tijuana in the northwest of Mexico to Punta Arenas in the south of Chile (over 14,000 km), every single country it passed through could be a Spanish-speaking one.

As you can imagine, this phenomenon brings a lot of linguistic diversity within a single language, which poses localization challenges for companies that want to do business in South America. Is the solution… LatAm Spanish?

What is “LatAm Spanish”?

When localising into Spanish for South America, one option is to cover all variants. However, this has cost implications. Because the rest of the world also often views South America as a single linguistic block, the concept of a homogeneous or neutral Spanish, so-called LatAm Spanish, became a reality. As a consequence, the rich linguistic diversity of Spanish across countries gets lost, since localization into every Spanish variant is not currently a priority for many businesses.


Whether this is necessarily a bad thing was the subject of some debate at the unconference. There was, however, a universally held view that LatAm Spanish doesn’t really represent anyone. No one identified with it as “their” Spanish. It is perceived as a generic form of Spanish, good for understanding but not so good at creating a sense of belonging. Some attendees were not bothered by this, because they have grown used to it and still generally understand everything. Others felt that their locales, and with them their identities, were being lost, and that there should be a bigger effort to localise for more countries.
It was accepted that the decision is ultimately a commercial one. A company that chooses to localise further may do so only for a select few locales, e.g. Colombia, Mexico, and Argentina, based on its business needs.

However, numerous examples were given where LatAm Spanish would cause confusion, because some terms are unique to particular countries. For example, a clothing brand wanting a generic term for “t-shirt” would struggle, because it is variously translated as remera, chomba, pulóver, camiseta, and playera. Moreover, each of these terms can refer to a different type of t-shirt. Argentinians use camiseta for a long-sleeve t-shirt and chomba for a t-shirt with a polo neck, whereas Uruguayans use camiseta and remera interchangeably. Cubans use pulóver for a t-shirt, yet in most countries pulóver means “sweater”. If such simple terms can create confusion, you can imagine the effect on a brand that relies on user experience.
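A hyperlocalisation workflow could catch this kind of problem with a glossary check. Here is a minimal sketch, assuming a hypothetical glossary that records what each term is understood to mean in each locale, keyed by BCP 47 tags such as es-AR. A check then flags the locales where a chosen term would be misread. The entries follow the examples above. The "*" default encodes the statement that pulóver means “sweater” in most countries, which is a simplifying assumption of this toy model. A real tool would need far more complete, expert-curated data.

```python
# Hypothetical glossary: what each Spanish term is understood to mean per locale.
# "*" is the meaning assumed for locales with no specific entry.
TERM_MEANINGS = {
    "pulóver": {"es-CU": "t-shirt", "*": "sweater"},
    "camiseta": {"es-AR": "long-sleeve t-shirt", "es-UY": "t-shirt"},
    "chomba": {"es-AR": "polo-neck t-shirt"},
    "remera": {"es-UY": "t-shirt"},
}

def meaning_in(term, locale):
    """Return what `term` means in `locale`, or None if the glossary has no entry."""
    senses = TERM_MEANINGS.get(term, {})
    return senses.get(locale, senses.get("*"))

def flag_mismatches(term, intended, locales):
    """List (locale, actual meaning) pairs where `term` would not read as `intended`."""
    flagged = []
    for loc in locales:
        actual = meaning_in(term, loc)
        if actual != intended:
            # None means "unknown in this locale", which also deserves a human check
            flagged.append((loc, actual))
    return flagged

print(flag_mismatches("pulóver", "t-shirt", ["es-CU", "es-MX", "es-AR"]))
# [('es-MX', 'sweater'), ('es-AR', 'sweater')]
```

Unknown entries come back as None and are flagged too. The reasoning is that a term with no recorded meaning in a locale is exactly where a reviewer familiar with that locale should step in.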


AI and Hyperlocalisation

Going into this level of detail for a locale is known as hyperlocalisation: adapting content down to the level of the region, or even lower. It typically requires strong commercial data to justify the investment, but evidence to date suggests that AI could offer a genuinely viable solution to this challenge.

Today’s models have shown a very strong ability to learn the differences between things (in this case, locales) through inference. This means we could have a process in which content is first localised into a generic form of Spanish using processes already in place today. An AI model could then hyperlocalise it into every locale, almost instantly and at almost negligible cost.

As a cautionary tale, however, one curious anecdote illustrated the potential pitfalls of hyperlocalisation. The 2004 film The Incredibles was dubbed into three versions of Spanish: neutral, Mexican, and Rioplatense, the variety of Spanish spoken in Argentina and Uruguay. The Rioplatense version was expected to open up a huge dubbing market in Argentina, but the film was widely rejected for being too narrowly focused. It often used terms specific to Buenos Aires, and it used local place names even though parts of the film were set in well-known locations in the USA. Reviews suggested that a more neutral version of Spanish would have been more palatable. Go figure!

This concept of hyperlocalisation with AI also has the potential to be applied to indigenous languages not only across South America, but also in Africa, and other regions across the world. While the lack of data for many of these languages is still a hurdle to overcome, the fact that models have shown the ability to make inferences between languages, with less data than is needed to train the large foundational models, leaves the door open for potentially huge impact. This is a very exciting vision for the near future!


John Tinsley

Entrepreneur | Technologist | Business Leader | Public Speaker | Marketing | Growth | and all things (Machine) Translation and AI

John Tinsley is an Irish entrepreneur, computer scientist, and translation expert. He founded Iconic Translation Machines, an award-winning language technology software business which pioneered the commercial deployment of Neural Machine Translation technology. John grew the business for almost a decade before selling it to RWS in 2020 in one of the largest technology deals in the language industry. He is now the VP for AI Solutions at Translated. He holds a PhD in Machine Translation and a degree in Applied Computational Linguistics, and is a regular public speaker on topics related to language, translation, and business.

Alessandro Fusacchia

Vice President of Social Impact at Translated

In the last 10 years he has worked for three Ministries and has been a member of Parliament, dealing with startups, education, culture, and AI.

Alessandro Cattelan

Translated COO

Alessandro Cattelan is an experienced manager in the translation industry with a strong focus on technology, automation and process optimisation. He has been leading operations at Translated since 2012, helping to grow the company by scaling the team, improving management practices, and designing and managing some of the company's key products. He graduated summa cum laude in translation studies at the University of Trieste (Italy).