Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization • Paper 2510.05342 • Published Oct 6, 2025