Proximal Supervised Fine-Tuning (PSFT)
Overview
Proximal Supervised Fine-Tuning (PSFT) is a training method that strengthens supervised fine-tuning (SFT) by importing trust-region ideas from reinforcement learning. It targets recurring problems in SFT, namely weak generalization, limited exploration, and overfitting, and improves performance on both in-domain and out-of-domain tasks while preserving the model's general capabilities.
Architecture
PSFT builds on established reinforcement learning techniques, specifically Trust-Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), and translates them into the supervised learning setting. It employs a clipped surrogate objective that imposes trust-region-like constraints, keeping each policy update close to the previous policy and stabilizing training (see the sketch below).
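The sketch below illustrates one way such a clipped surrogate can be applied to SFT targets. It is a minimal PyTorch sketch, not the paper's exact formulation: the function name psft_loss, the clipping range eps=0.2, and the treatment of demonstration tokens as unit-advantage actions are assumptions made here for illustration; old_logits are assumed to come from a frozen snapshot of the policy taken before the current update.

```python
import torch

def psft_loss(logits, old_logits, labels, eps=0.2, ignore_index=-100):
    """Clipped surrogate over demonstration tokens (PPO-style clipping applied
    to the SFT objective). old_logits come from a frozen policy snapshot."""
    # Per-token log-probabilities of the demonstration (label) tokens.
    logp = torch.log_softmax(logits, dim=-1)
    old_logp = torch.log_softmax(old_logits, dim=-1)

    mask = labels.ne(ignore_index)
    safe_labels = labels.clamp(min=0).unsqueeze(-1)
    logp_tok = logp.gather(-1, safe_labels).squeeze(-1)
    old_logp_tok = old_logp.gather(-1, safe_labels).squeeze(-1)

    # Importance ratio between current and old policy, as in PPO.
    ratio = torch.exp(logp_tok - old_logp_tok.detach())

    # Demonstration tokens are treated here as unit-advantage actions (A = 1),
    # so the clipped surrogate reduces to min(ratio, clip(ratio, 1-eps, 1+eps)).
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    surrogate = torch.minimum(ratio, clipped)

    return -(surrogate * mask.float()).sum() / mask.float().sum().clamp(min=1.0)
```

In this reading, because the demonstration tokens act as positive-advantage actions, the clip mainly bounds how far a token's probability can be pushed up in a single update, which is what gives the trust-region behavior.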
Key Techniques
- Clipped Surrogate Objective: Stabilizes policy updates and prevents entropy collapse.
- Warm-Up Phase: Aligns the initial policy with the offline dataset, enhancing in-domain performance (see the training-schedule sketch after this list).
- Trust-Region Clipping: Limits policy updates to a small KL-divergence neighborhood, ensuring steady performance improvements.
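As a rough illustration of how the warm-up phase and trust-region clipping could fit into one training schedule, here is a hedged sketch. The hyperparameters warmup_steps, refresh_every, and eps are illustrative, the model is assumed to be an HF-style causal LM whose forward pass returns .logits, and psft_loss refers to the clipped-surrogate sketch in the Architecture section; the actual PSFT schedule may differ.

```python
import copy
import torch
import torch.nn.functional as F

def train_psft(model, loader, optimizer, warmup_steps=200, refresh_every=16, eps=0.2):
    """Illustrative schedule: plain SFT cross-entropy during warm-up, then
    clipped updates against a periodically refreshed policy snapshot."""
    old_model = None
    for step, batch in enumerate(loader):
        # Align logits with next-token labels (causal LM convention).
        logits = model(batch["input_ids"]).logits[:, :-1]
        labels = batch["labels"][:, 1:]

        if step < warmup_steps:
            # Warm-up: ordinary token-level cross-entropy on the offline data.
            loss = F.cross_entropy(logits.flatten(0, 1), labels.flatten(),
                                   ignore_index=-100)
        else:
            # Refresh the frozen "old" policy every few steps so the
            # importance ratio can drift enough for clipping to matter.
            if old_model is None or step % refresh_every == 0:
                old_model = copy.deepcopy(model).eval()
                for p in old_model.parameters():
                    p.requires_grad_(False)
            with torch.no_grad():
                old_logits = old_model(batch["input_ids"]).logits[:, :-1]
            # psft_loss: the clipped-surrogate sketch from the Architecture section.
            loss = psft_loss(logits, old_logits, labels, eps=eps)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```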
Goals
The primary objectives of PSFT include:
- Enhancing generalization and exploration in SFT models.
- Improving performance on target SFT tasks while preserving general capabilities.
- Preventing entropy collapse and addressing overfitting issues.
- Achieving superior performance on targeted math tasks compared to standard SFT.
Dataset Information
PSFT is trained on offline datasets in the usual SFT demonstration form (a data-preparation sketch follows this list):
- Offline datasets such as DAPO-MATH-17k and UltraFeedback.
- High-quality datasets like s1k and LIMO for initial training steps.
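To make the expected data form concrete, below is a hedged data-preparation sketch. The tokenizer checkpoint ("gpt2") and the build_example helper are placeholders chosen so the snippet runs, not part of the PSFT recipe; masking prompt tokens with -100 simply matches the ignore_index convention used in the loss sketches above.

```python
from transformers import AutoTokenizer

# "gpt2" is a stand-in checkpoint so the snippet runs; substitute the actual
# base model being fine-tuned.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def build_example(prompt, response, max_len=2048):
    """Turn one offline (prompt, response) pair into input_ids/labels,
    masking prompt tokens with -100 so the loss covers only the response."""
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    response_ids = tokenizer(response, add_special_tokens=False)["input_ids"]
    response_ids = response_ids + [tokenizer.eos_token_id]

    input_ids = (prompt_ids + response_ids)[:max_len]
    labels = ([-100] * len(prompt_ids) + response_ids)[:max_len]
    return {"input_ids": input_ids, "labels": labels}
```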
Outputs
Training with PSFT yields:
- Improved scores on in-domain and out-of-domain benchmarks.
- Stronger reasoning capabilities and better instruction compliance.
- Robust performance across evaluation benchmarks, including MT-Bench, AlpacaEval, and Arena-Hard.
Evaluation
Settings
PSFT is evaluated across multiple settings:
- In-domain benchmarks
- Out-of-domain benchmarks
- Human alignment datasets
Metrics
Performance is assessed using various metrics, including:
- AIME-24 average scores for in-domain performance.
- GPQA average scores for out-of-domain performance.
- Other benchmarks such as MATH-500 and OlympiadBench.
Findings
- PSFT achieves an AIME-24 score of nearly 20 at step 1300.
- Consistent performance improvements over standard SFT baselines, particularly in in-domain tasks.
- Demonstrates robustness and strong generalization capabilities across different models.
Limitations and Open Questions
While PSFT delivers significant gains, it degrades on some evaluation tasks (e.g., IFEval) relative to SFT and SFT-KL. Further research is needed to understand these limitations and identify potential improvements.
Conclusion
Proximal Supervised Fine-Tuning represents a significant step forward in supervised fine-tuning, combining the strengths of reinforcement learning with standard SFT. Its ability to improve generalization, exploration, and overall performance makes it a valuable addition to the model fine-tuning toolkit.