Physics Simulators Train LLMs to Solve Olympiad Problems Without Real-World Data

Researchers have shown that data generated by physics simulators can train language models to solve complex physics problems, improving accuracy on International Physics Olympiad (IPhO) questions by 5-10 percentage points using only synthetic simulated data. The work, published by a nine-person team including Mihir Prabhudesai, Aryan Satpathy, and Deepak Pathak, shows zero-shot sim-to-real transfer: models trained exclusively on simulated data improve on real-world physics problems.

Synthetic Data Generation Addresses Physics Reasoning Bottleneck

The research addresses a fundamental limitation in training reasoning-capable models for scientific domains. While mathematics benefits from abundant internet question-answer pairs, physics lacks large-scale QA datasets. The researchers' solution generates random scenes in physics engines, creates synthetic question-answer pairs from simulated interactions, and trains LLMs using reinforcement learning on this synthetic data.
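As a rough illustration of the scene-to-QA idea, the sketch below samples a random toy scene, simulates it step by step, and turns the outcome into a question-answer pair. Everything in it is an assumption made for illustration: the projectile scenario, the function names (random_scene, simulate_range, make_qa_pair), the parameter ranges, and the hand-written integrator standing in for a real physics engine. The article does not say which simulator or question templates the researchers used.

```python
import math
import random


def random_scene(rng):
    """Sample a toy scene: a ball launched from level ground (hypothetical ranges)."""
    return {
        "speed": round(rng.uniform(5.0, 30.0), 1),       # launch speed, m/s
        "angle_deg": round(rng.uniform(15.0, 75.0), 1),  # launch angle, degrees
        "g": 9.81,                                       # gravity, m/s^2
    }


def simulate_range(scene, dt=1e-4):
    """Step the ball forward in time until it lands; return the horizontal distance."""
    angle = math.radians(scene["angle_deg"])
    vx = scene["speed"] * math.cos(angle)
    vy = scene["speed"] * math.sin(angle)
    x, y = 0.0, 0.0
    while True:
        # Semi-implicit Euler step: update velocity first, then position.
        vy_new = vy - scene["g"] * dt
        y_new = y + vy_new * dt
        if y_new < 0.0:
            # The ball crossed the ground during this step: interpolate the landing point.
            frac = y / (y - y_new)
            return x + vx * dt * frac
        x += vx * dt
        vy, y = vy_new, y_new


def make_qa_pair(rng):
    """Turn one simulated scene into a text question with a numeric answer."""
    scene = random_scene(rng)
    answer = simulate_range(scene)
    question = (
        f"A ball is launched from level ground at {scene['speed']} m/s, "
        f"{scene['angle_deg']} degrees above the horizontal. Ignoring air resistance "
        f"and taking g = {scene['g']} m/s^2, how far away does it land, in metres?"
    )
    return {"question": question, "answer": round(answer, 2), "scene": scene}


if __name__ == "__main__":
    qa = make_qa_pair(random.Random(0))
    print(qa["question"])
    print(qa["answer"])
```

The answer comes from the simulation rather than from a closed-form formula, which is the point of the approach: the same loop would still produce ground-truth answers for scenes that have no simple analytic solution.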

This approach suggests a path around the data wall in scientific domains where large QA datasets do not exist naturally. Because simulation can generate new training scenarios on demand, the supply of training data is no longer capped by what can be scraped from the internet.

Training Method Combines Simulation With Reinforcement Learning

The methodology involves four key steps:

Generate random scenes in physics simulators
Create synthetic QA pairs from simulated interactions
Train LLMs using reinforcement learning on synthetic data
Test on real-world physics problems
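Steps three and four both need a way to score a model's free-text answer against a known numeric result. The sketch below shows one plausible way to do that: a binary reward that checks the last number in the response against the reference within a relative tolerance, plus an accuracy helper for evaluation. The function names (numeric_reward, accuracy), the 2% tolerance, and the "last number wins" parsing rule are all assumptions for illustration; the article does not describe the reward design or the reinforcement learning algorithm the researchers used.

```python
import math
import re

# Matches integers, decimals, and scientific notation such as 3.2e1.
NUMBER_PATTERN = re.compile(r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def numeric_reward(model_answer, reference, rel_tol=0.02):
    """Return 1.0 if the last number in the text matches the reference, else 0.0."""
    numbers = NUMBER_PATTERN.findall(model_answer)
    if not numbers:
        return 0.0  # no numeric answer to check
    predicted = float(numbers[-1])
    return 1.0 if math.isclose(predicted, reference, rel_tol=rel_tol) else 0.0


def accuracy(responses, references):
    """Fraction of responses that earn the full reward."""
    rewards = [numeric_reward(r, a) for r, a in zip(responses, references)]
    return sum(rewards) / len(rewards)


if __name__ == "__main__":
    print(numeric_reward("The ball lands about 40.7 m away.", 40.77))
    print(accuracy(["40.7 m", "12 m"], [40.77, 30.0]))
```

During training, a score like this would serve as the reward signal on simulated questions; at test time the same check, applied to real problems with known answers, gives the accuracy figure that percentage-point gains are measured in.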

The results show that training solely on synthetic simulated data improves performance on IPhO problems by 5-10 percentage points across model sizes. The transfer is zero-shot: the models see only simulated data during training, yet they improve on real physics problems.

Code Released for Reproducibility

The research team has made its code publicly available at https://sim2reason.github.io/, enabling other researchers to build on the work. The demonstrated sim-to-real transfer suggests that physics simulators can serve as an alternative source of supervision for training LLMs in physical reasoning, and the approach may apply to other scientific domains facing similar data scarcity.

Key Takeaways

Training LLMs on synthetic physics simulation data improves International Physics Olympiad problem performance by 5-10 percentage points
Models demonstrate zero-shot sim-to-real transfer, performing better on real physics problems despite training only on simulations
Physics simulators provide scalable, effectively unlimited training data for domains lacking large question-answer datasets
The approach addresses the data wall problem by generating synthetic training scenarios rather than relying on internet scraping
Research code is publicly available at https://sim2reason.github.io/
