Energy-Based Fine-Tuning Matches RLVR Performance While Achieving Lower Validation Loss

Sunday, March 15, 2026