Hundreds of LLM-based applications serve the English-speaking market, but very few exist for localized international markets. Large Language Models (LLMs) are trained on datasets that are predominantly English, which results in subpar performance in less-represented languages, including several South East Asian languages such as Bahasa Indonesia, Thai, and Vietnamese. The shortfall shows up as failure to follow user instructions, generation of irrelevant or fabricated content, and responses that terminate prematurely. In language model research, particularly within the open-source community, there is growing interest in language-specific fine-tuning and in building a distinct LLM for each language. However, these solutions carry a significant financial burden, often requiring expenditures in the millions of dollars. RAG systems, which are textbook examples of LLM applications, face an even harder version of this problem, because the retrieval phase largely depends on popular open-source embedding models that are biased towards English. To overcome this and enable cost-effective deployment of LLMs and RAG systems for non-English languages, particularly in the South East Asian context, we propose adding a translation layer on top of the existing RAG framework. This layer acts as an intermediary that adapts LLMs and RAG systems for local, non-English use, addressing the language disparity in a financially feasible manner.
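As a rough illustration of the proposed translation layer, the following is a minimal sketch and not the implementation presented in the talk. It assumes the underlying RAG pipeline (retrieval plus generation) operates in English, and that translation, retrieval, and generation are supplied as plain functions. Every name here (`TranslatedRAG`, `toy_translate`, `toy_retrieve`, `toy_generate`) is a hypothetical placeholder rather than part of any real library.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical function signatures; swap in real services as needed.
Translate = Callable[[str, str, str], str]  # (text, source_lang, target_lang) -> text
Retrieve = Callable[[str], List[str]]       # English query -> English passages
Generate = Callable[[str, List[str]], str]  # (English query, passages) -> English answer


@dataclass
class TranslatedRAG:
    """Wraps an English-only RAG pipeline with a translation step on each side."""
    translate: Translate
    retrieve: Retrieve
    generate: Generate
    pivot_lang: str = "en"  # assumption: the existing pipeline works best in English

    def answer(self, query: str, user_lang: str) -> str:
        needs_translation = user_lang != self.pivot_lang
        # 1. Translate the user's query into the pivot language.
        pivot_query = (
            self.translate(query, user_lang, self.pivot_lang)
            if needs_translation else query
        )
        # 2. Run the unchanged RAG pipeline in the pivot language.
        passages = self.retrieve(pivot_query)
        pivot_answer = self.generate(pivot_query, passages)
        # 3. Translate the answer back into the user's language.
        if needs_translation:
            return self.translate(pivot_answer, self.pivot_lang, user_lang)
        return pivot_answer


# Toy stand-ins so the sketch runs without external services.
def toy_translate(text: str, source: str, target: str) -> str:
    # Tags the text instead of translating, to make the data flow visible.
    return f"[{source}->{target}] {text}"


def toy_retrieve(query: str) -> List[str]:
    return ["passage about " + query]


def toy_generate(query: str, passages: List[str]) -> str:
    return f"answer to '{query}' using {len(passages)} passage(s)"


rag = TranslatedRAG(toy_translate, toy_retrieve, toy_generate)
print(rag.answer("Apa itu RAG?", "id"))
```

Because the translation step wraps the pipeline rather than modifying it, the embedding model, index, and LLM stay untouched; only the translation function needs to support the target language.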
Key Takeaways
- Localize for South East Asia: By integrating a translation layer into Retrieval Augmented Generation (RAG) systems, LLMs can be used effectively in multiple South East Asian languages, such as Tagalog, Malay, and Burmese, rather than only in English.
- Near-zero cost: The proposed method is an economically viable way around language barriers, making LLM applications more accessible to developers and researchers across South East Asia, where resources for language technology development may be limited.
- Faster mass adoption: This strategy would speed up the work of developers building LLM-based applications for non-English contexts within South East Asia, opening the technology to a more diverse and inclusive user base in the region.
—————————————————————————————————————————
Bio
Raghavan Muthuregunathan | Senior Engineering Manager, Search Artificial Intelligence | LinkedIn | USA
Raghavan Muthuregunathan leads the Search Artificial Intelligence organization at LinkedIn, where he has been instrumental in integrating AI into the core functionalities of the platform, significantly enhancing user experience and business productivity. His most recent accomplishment is contributing to the development of LinkedIn’s premium AI experience, which enhances user interactions and business outcomes.
Raghavan is deeply committed to the open-source community, contributing to projects with the Linux Foundation’s AI organization and Apache Solr. His efforts here reflect his belief in the power of collaborative innovation and the importance of making AI tools accessible and beneficial for a wider audience.

Stage M2_2024