Contrastive Preference Optimization (CPO)

Overview

Contrastive Preference Optimization (CPO) is a training approach designed to enhance the machine-translation performance of moderate-sized large language models (LLMs). By exploiting specially curated preference data, CPO addresses the limitations of traditional supervised fine-tuning (SFT) and Direct Preference Optimization (DPO), training models to prioritize higher-quality translations while rejecting merely adequate ones.

Architecture

CPO introduces a training objective that moves beyond merely minimizing cross-entropy loss against reference translations. It trains a parameterized policy π_θ and, rather than keeping a separate reference model π_ref as DPO does, approximates π_ref with a uniform prior U, which cancels the reference model out of the loss. The approach also relies on reference-free evaluation models, allowing translation quality to be assessed without depending on potentially flawed gold references. The resulting objective is sketched below.
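
In the paper's notation, the objective combines a DPO-style contrastive term (with the reference model replaced by the uniform prior, so its log-probabilities cancel) and a negative log-likelihood term on the preferred translation. It takes roughly the following form:

```latex
% Contrastive term: the DPO loss with \pi_{ref} approximated by a uniform
% prior U, so the reference-model log-probabilities drop out of the sigmoid.
\mathcal{L}_{\mathrm{prefer}}(\pi_\theta; U) =
  -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
  \Big[ \log \sigma\big( \beta \log \pi_\theta(y_w \mid x)
                       - \beta \log \pi_\theta(y_l \mid x) \big) \Big]

% Behavior-cloning (NLL) term that anchors the policy to the preferred data:
\mathcal{L}_{\mathrm{NLL}} =
  -\,\mathbb{E}_{(x,\,y_w)\sim\mathcal{D}}
  \big[ \log \pi_\theta(y_w \mid x) \big]

% Full CPO objective:
\mathcal{L}_{\mathrm{CPO}} = \mathcal{L}_{\mathrm{prefer}} + \mathcal{L}_{\mathrm{NLL}}
```

Here y_w and y_l are the preferred and dis-preferred translations for source x, and β is a scaling hyperparameter.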

Goals

The primary goals of CPO include:

  • Bridging the performance gap between moderate-sized LLMs (7B or 13B parameters) and state-of-the-art translation models.
  • Training models to discern and prioritize high-quality translations while avoiding suboptimal outputs.
  • Enhancing the overall performance of translation models while questioning the reliability of traditional evaluation methods.

Dataset Information

CPO requires specific datasets to function effectively:

  • Required Dataset Forms: 22K parallel sentences and triplet preference data.
  • Supported Dataset Types: Preference data from the FLORES-200 dataset.
  • Paired Preference Triplets: Constructed using FLORES-200 data, consisting of triplets (y_ref, y_gpt-4, y_alma) derived from 20K paired sentences across 10 translation directions.
  • Preference Data Acquisition: Utilizes specially curated preference data, including 1K internal human-labeled preference examples and selections made by reference-free evaluation models (see the sketch after this list).
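
To make the triplet-to-pair step concrete, here is a minimal, hypothetical sketch: each triplet's candidates are scored with a reference-free quality model, and the highest- and lowest-scoring candidates become the preferred and dis-preferred translations. The `score_fn` and data layout are illustrative stand-ins, not the paper's exact pipeline.

```python
# Hypothetical sketch: turning a (y_ref, y_gpt-4, y_alma) triplet into a
# preference pair using a reference-free quality score.
from dataclasses import dataclass
from typing import Callable

@dataclass
class PreferencePair:
    source: str
    preferred: str      # y_w: highest-scoring candidate
    dispreferred: str   # y_l: lowest-scoring candidate

def build_pair(source: str, candidates: list[str],
               score_fn: Callable[[str, str], float]) -> PreferencePair:
    """score_fn(source, translation) -> float stands in for a
    reference-free metric such as KIWI-XXL or XCOMET."""
    ranked = sorted(candidates, key=lambda y: score_fn(source, y))
    return PreferencePair(source=source,
                          preferred=ranked[-1],
                          dispreferred=ranked[0])

# Usage with one triplet:
# pair = build_pair(src, [y_ref, y_gpt4, y_alma], score_fn=kiwi_score)
```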

Outputs

CPO aims to produce superior translations by training models to refine details and improve overall translation quality. The expected effects include:

  • Marked improvements in translation performance.
  • Enhanced capabilities of models such as ALMA, raising their performance to levels comparable with or surpassing GPT-4 and WMT competition winners.
  • Significant improvements across all translation directions.

Relationship to Other Methods

CPO builds upon existing methodologies, including:

  • Supervised Fine-Tuning (SFT): Acknowledges the limitations of SFT, which caps model performance at the quality level of the training data.
  • Direct Preference Optimization (DPO): Addresses DPO's memory and speed overhead, which comes from keeping a frozen reference model alongside the policy and running extra forward passes through it (see the sketch after this list).
  • In the paper's experiments, CPO delivers significant performance improvements over both SFT and DPO, particularly on translation tasks.
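
To see why CPO is cheaper than DPO, note that the uniform-prior approximation removes the frozen reference model entirely, so each training step needs forward passes through only one network. A minimal PyTorch sketch of the loss, assuming the per-sequence log-probabilities have already been computed, might look like this:

```python
import torch
import torch.nn.functional as F

def cpo_loss(logp_w: torch.Tensor, logp_l: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Sketch of the CPO loss.

    logp_w: log pi_theta(y_w | x) for preferred translations, shape (B,)
    logp_l: log pi_theta(y_l | x) for dis-preferred ones,     shape (B,)
    Unlike DPO, no reference-model log-probabilities are needed: the
    uniform-prior approximation cancels them from the contrastive term.
    """
    prefer = -F.logsigmoid(beta * (logp_w - logp_l)).mean()  # contrastive term
    nll = -logp_w.mean()                                     # NLL anchor on y_w
    return prefer + nll
```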

Techniques and Modules

CPO incorporates several key techniques:

  • Contrastive Preference Optimization: Trains models to avoid generating adequate but imperfect translations, mitigating the shortcomings of SFT.
  • Manually Noised Data: Creates dis-preferred translations through random word deletions and swaps, following the method suggested by Zeng et al. (2023); an illustrative sketch follows this list.
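
As a rough illustration of the noising step (not Zeng et al.'s exact procedure), the sketch below corrupts a good translation with a few random word deletions and adjacent swaps and uses the result as the dis-preferred side of a pair:

```python
import random

def make_dispreferred(translation: str, n_ops: int = 2,
                      seed: int | None = None) -> str:
    """Corrupt a good translation into a dis-preferred one via random
    word deletions and adjacent swaps (illustrative only)."""
    rng = random.Random(seed)
    words = translation.split()
    for _ in range(n_ops):
        if len(words) < 2:
            break
        if rng.random() < 0.5:          # delete a random word
            del words[rng.randrange(len(words))]
        else:                           # swap two adjacent words
            i = rng.randrange(len(words) - 1)
            words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)
```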

Evaluation

CPO's effectiveness is evaluated using various settings and benchmarks:

  • Evaluation Settings: WMT'21 and WMT'22 test sets, scored with reference-free evaluation models such as KIWI-XXL and XCOMET (a scoring sketch follows this list).
  • Base Models Used: Evaluations involve models like ALMA, ALMA-13B-LoRA, and GPT-4.
  • Headline Results: CPO leads to significant performance enhancements, with ALMA-13B-R achieving high scores on KIWI-XXL and XCOMET.
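
For readers who want to try reference-free scoring themselves, the Unbabel COMET library exposes CometKiwi-style models. The sketch below uses a smaller public KIWI checkpoint as a stand-in; the XXL variants reported in the paper are much larger and gated on Hugging Face, so the exact checkpoint name may differ.

```python
# Reference-free scoring sketch with the Unbabel COMET library
# (pip install unbabel-comet). The checkpoint name is an example.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-cometkiwi-da")
model = load_from_checkpoint(model_path)

data = [{"src": "Der Bericht wurde gestern veröffentlicht.",
         "mt": "The report was published yesterday."}]

output = model.predict(data, batch_size=8, gpus=0)
print(output.scores)        # per-segment quality scores
print(output.system_score)  # corpus-level average
```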

Limitations and Open Questions

While CPO demonstrates substantial improvements, it also faces challenges, such as:

  • A modest memory and speed overhead relative to plain SFT, since each training example must carry both a preferred and a dis-preferred sequence (though CPO avoids DPO's additional reference-model cost).
  • Residual gaps in some translation directions, where even CPO-trained models may still slightly trail the strongest systems such as GPT-4.

In summary, Contrastive Preference Optimization presents a robust framework for enhancing machine translation models, addressing key limitations of existing methods, and paving the way for future advancements in translation quality.

Sources

https://arxiv.org/abs/2401.08417v4