# Discover ALMA: A Fresh Training Approach Amplifying Translation Capabilities in Advanced Language Models
In a groundbreaking development, researchers from Johns Hopkins University and Microsoft have proposed a novel 2-stage fine-tuning method aimed at enhancing the translation capabilities of smaller large language models (LLMs). This approach allows such models to rival, and in some cases surpass, the performance of much larger models like GPT-3.
## The 2-Stage Fine-Tuning Method
The proposed method consists of two stages:
1. **Stage 1: Monolingual Fine-Tuning.** The LLM is first trained on large-scale monolingual text, drawn from sources such as Wikipedia or Common Crawl, in the languages involved in the translation task. This continued pretraining strengthens the model's language proficiency and keeps it adaptable to a wide range of tasks, including translation.
2. **Stage 2: High-Quality Parallel Fine-Tuning.** The model is then fine-tuned on a small amount of high-quality parallel data, such as sentence pairs from the WMT shared tasks. This step specializes the model for translation, adjusting its parameters on labeled source-target pairs to improve the accuracy and fluency of its output (a minimal sketch of both stages follows this list).
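To make the recipe concrete, here is a minimal sketch of both stages using the Hugging Face `transformers` and `datasets` libraries. The base model name, data files, prompt format, and hyperparameters are illustrative assumptions, not the authors' exact setup; the released ALMA training code may differ in detail.

```python
# Minimal sketch of the two-stage fine-tuning recipe (illustrative only).
# Assumptions: a LLaMA-style causal LM, local "monolingual.txt" and
# "parallel.jsonl" files, and simple Trainer settings -- not the ALMA
# authors' exact data or hyperparameters.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "meta-llama/Llama-2-7b-hf"  # assumed base checkpoint

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal-LM labels


def run_stage(model, dataset, output_dir, learning_rate):
    """Fine-tune `model` for one epoch on a dataset with a 'text' column."""
    tokenized = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
        batched=True,
        remove_columns=dataset.column_names,
    )
    args = TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=4,
        learning_rate=learning_rate,
        num_train_epochs=1,
    )
    Trainer(model=model, args=args, train_dataset=tokenized,
            data_collator=collator).train()
    return model


model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Stage 1: continued training on monolingual text in the translation languages.
mono = load_dataset("text", data_files="monolingual.txt")["train"]
model = run_stage(model, mono, "stage1-monolingual", learning_rate=2e-5)

# Stage 2: fine-tuning on a small set of high-quality parallel sentence pairs,
# formatted as translation prompts (fields 'src' and 'tgt' are assumed).
parallel = load_dataset("json", data_files="parallel.jsonl")["train"]
parallel = parallel.map(
    lambda ex: {"text": f"Translate German to English:\n{ex['src']}\n{ex['tgt']}"}
)
model = run_stage(model, parallel, "stage2-parallel", learning_rate=1e-5)
```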
## Strengthening Translation Capabilities
The researchers emphasize the importance of carefully curating the data used for translation fine-tuning: a small amount of high-quality parallel text is more valuable than a large but noisy corpus. They also explore lightweight, layer-wise adaptation (LoRA) alongside full-weight fine-tuning as a way to reach strong translation quality at lower cost.
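As an illustration of the kind of curation involved, the sketch below applies simple heuristic filters (non-empty sides, a bounded length ratio, and de-duplication) to a list of sentence pairs. The thresholds and field layout are assumptions made for this example; the paper's actual data selection criteria differ.

```python
# Heuristic quality filters for parallel data (illustrative assumptions only;
# the thresholds below are not taken from the ALMA paper).

def keep_pair(src: str, tgt: str,
              max_len_ratio: float = 1.5, max_tokens: int = 200) -> bool:
    """Return True if a source/target pair passes basic quality checks."""
    src_tok, tgt_tok = src.split(), tgt.split()
    if not src_tok or not tgt_tok:          # drop pairs with an empty side
        return False
    if len(src_tok) > max_tokens or len(tgt_tok) > max_tokens:
        return False                         # drop overly long sentences
    ratio = len(src_tok) / len(tgt_tok)
    return 1 / max_len_ratio <= ratio <= max_len_ratio  # drop mismatched lengths


def curate(pairs):
    """Filter and de-duplicate (source, target) pairs."""
    seen, kept = set(), []
    for src, tgt in pairs:
        key = (src.strip(), tgt.strip())
        if key in seen:
            continue
        seen.add(key)
        if keep_pair(*key):
            kept.append(key)
    return kept


print(curate([("Guten Morgen", "Good morning"),
              ("Guten Morgen", "Good morning"),   # duplicate, dropped
              ("Hallo", "")]))                    # empty target, dropped
```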
## Achieving Performance Comparable to GPT-3
By combining this two-stage recipe with fine-tuning choices tailored to the specific model architecture and task requirements, such as choosing between full-weight and lightweight (LoRA) fine-tuning, smaller LLMs can achieve translation performance comparable to GPT-3.
This training paradigm sharply reduces the need for massive parallel corpora, making it a significant step forward in the field of machine translation.
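Lightweight (LoRA) fine-tuning adapts only small low-rank matrices injected into the model's layers rather than all of its weights. The sketch below, assuming the `peft` library and illustrative rank, alpha, and target-module settings rather than the paper's exact configuration, shows how such an adapter can be attached before training.

```python
# Attaching a LoRA adapter for lightweight fine-tuning.
# Assumptions: the `peft` library and illustrative hyperparameters;
# these are not the ALMA paper's exact settings.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor for the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights will train
# The wrapped model can be passed to the same Trainer used for full fine-tuning.
```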
## The Impact
The improvements hold across both high- and low-resource languages. For instance, with just 1 billion monolingual tokens and about 18 hours of training, ALMA, the authors' LLaMA-based model, reaches performance on par with the 54B-parameter NLLB model.
Modern open LLMs such as XGLM-7B, OPT-7B, and BLOOM-7B, despite having a similar number of parameters, fall 15-30 BLEU points behind state-of-the-art systems on benchmarks like WMT and Flores-101. ALMA, by contrast, slightly exceeds GPT-3 and NLLB while using far fewer parameters.
The massive 175B-parameter GPT-3 can rival state-of-the-art translation quality, while models at the 7B scale can trail by over 30 BLEU points. ALMA improves on LLaMA's zero-shot translation by more than 12 BLEU and COMET points on average, demonstrating the effectiveness of the proposed method.
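The BLEU and COMET figures cited above come from standard MT evaluation tooling. A minimal sketch of how such scores are typically computed, assuming the `sacrebleu` and `unbabel-comet` packages and tiny placeholder data, is shown below; the exact test sets and metric versions used in the paper may differ.

```python
# Computing BLEU and COMET for a handful of translations (illustrative only;
# the paper's exact test sets and metric configurations may differ).
import sacrebleu
from comet import download_model, load_from_checkpoint

sources    = ["Guten Morgen, wie geht es dir?"]
hypotheses = ["Good morning, how are you?"]        # model outputs
references = ["Good morning, how are you doing?"]  # human references

# Corpus-level BLEU via sacrebleu.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.1f}")

# COMET: a learned metric that also conditions on the source sentence.
comet_model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
comet_data = [{"src": s, "mt": h, "ref": r}
              for s, h, r in zip(sources, hypotheses, references)]
comet_score = comet_model.predict(comet_data, batch_size=8, gpus=0)
print(f"COMET: {comet_score.system_score:.3f}")
```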
This work represents a significant leap forward in unlocking the translation potential in smaller LLMs, potentially making machine translation more accessible and efficient for a wider range of applications.
In short, the 2-stage fine-tuning method shows that, given the right monolingual and high-quality parallel data, modestly sized LLMs can specialize in translation well enough to match or surpass far larger systems like GPT-3.