Researchers released TREX (Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration) on April 15, 2026, introducing the first automated research agent dedicated to LLM fine-tuning. The multi-agent framework autonomously handles the entire training lifecycle—from requirement analysis and literature review to data preparation, model training, and evaluation—matching or surpassing expert-designed pipelines in several real-world scenarios.
TREX Combines Two Core Agents to Automate Training Workflows
The system orchestrates collaboration between two specialized modules. The Researcher interprets the user's requirements, conducts literature reviews, and formulates experimental plans. The Executor, a code agent integrated with GPU clusters, implements these plans by constructing datasets and running model training and evaluation. This division of labor enables end-to-end automation of complex workflows that previously required expert ML knowledge.
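Under stated assumptions, this division of labor can be pictured as a propose-run-record loop. The sketch below is illustrative only: the class names, the `Plan` schema, and the scoring are invented stand-ins, not the paper's implementation (a real Researcher would consult an LLM and the literature, and a real Executor would build datasets and launch GPU training jobs).

```python
from dataclasses import dataclass

@dataclass
class Plan:
    """A training plan proposed by the Researcher (hypothetical schema)."""
    dataset: str
    learning_rate: float
    epochs: int

class Researcher:
    """Interprets the goal and proposes the next experimental plan."""
    def propose(self, goal: str, history: list) -> Plan:
        # Toy policy: halve the learning rate after each completed trial.
        lr = 2e-5 / (2 ** len(history))
        return Plan(dataset=f"data_for_{goal}", learning_rate=lr, epochs=3)

class Executor:
    """Stands in for dataset construction, GPU training, and evaluation."""
    def run(self, plan: Plan) -> float:
        # Toy objective: quality peaks at one particular learning rate.
        return 1.0 - abs(plan.learning_rate - 1e-5) * 1e4

def automate(goal: str, rounds: int = 3) -> tuple[Plan, float]:
    researcher, executor, history = Researcher(), Executor(), []
    best = (None, float("-inf"))
    for _ in range(rounds):
        plan = researcher.propose(goal, history)   # Researcher plans
        score = executor.run(plan)                 # Executor trains + evaluates
        history.append((plan, score))              # results feed the next round
        if score > best[1]:
            best = (plan, score)
    return best
```

The key design point mirrored here is that results flow back into the Researcher's history, so each round's plan can build on earlier trials.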
Monte Carlo Tree Search Efficiently Explores Training Strategies
TREX formulates training optimization as a tree-based search problem, leveraging Monte Carlo Tree Search (MCTS) to efficiently explore the open-ended space of training strategies under constrained computational budgets. The multi-round experimental process is modeled as a search tree, enabling:
- Efficient planning of exploration paths
- Reuse of historical results to avoid redundant computation
- Distillation of high-level insights from iterative trials
- Systematic navigation of complex hyperparameter spaces
- Strategic allocation of limited GPU resources
This approach allows TREX to converge on strong training strategies while making efficient use of limited compute, a critical consideration for practical deployment. Recent work has likewise shown MCTS to be effective for automated machine learning and hyperparameter optimization.
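As a rough illustration of the idea, the sketch below runs a generic MCTS loop (selection, expansion, rollout, backpropagation) over a tiny, invented space of training decisions, caching evaluated configurations so repeated rollouts reuse historical results instead of re-running "training". The search space, scoring function, and constants are hypothetical, not TREX's.

```python
import math
import random

# Hypothetical discrete strategy space (illustrative, not from the paper):
# each tree level fixes one training decision.
SPACE = [("lr", [1e-5, 5e-5, 2e-4]), ("batch", [8, 32]), ("epochs", [1, 3])]

def evaluate(config):
    """Stand-in for a full fine-tuning run: a deterministic toy score."""
    return (1.0 - abs(math.log10(config["lr"]) + 4.3)  # peaks near lr = 5e-5
            + 0.1 * (config["batch"] == 32)
            + 0.05 * config["epochs"])

class Node:
    def __init__(self, depth=0, config=None):
        self.depth, self.config = depth, dict(config or {})
        self.children, self.visits, self.value = [], 0, 0.0

def uct(parent, child, c=1.4):
    """UCB1 score balancing exploitation and exploration."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(
        math.log(parent.visits) / child.visits)

def mcts(iterations=200):
    root, cache, best = Node(), {}, (None, float("-inf"))
    for _ in range(iterations):
        node, path = root, [root]
        # Selection: descend by UCT while children exist.
        while node.children:
            node = max(node.children, key=lambda ch, p=node: uct(p, ch))
            path.append(node)
        # Expansion: one child per choice at the next decision level.
        if node.depth < len(SPACE):
            key, choices = SPACE[node.depth]
            node.children = [Node(node.depth + 1, {**node.config, key: v})
                             for v in choices]
            node = random.choice(node.children)
            path.append(node)
        # Rollout: complete remaining decisions at random, reusing
        # cached results to avoid redundant "training runs".
        config = dict(node.config)
        for key, choices in SPACE[node.depth:]:
            config[key] = random.choice(choices)
        sig = tuple(sorted(config.items()))
        if sig not in cache:
            cache[sig] = evaluate(config)
        score = cache[sig]
        if score > best[1]:
            best = (config, score)
        # Backpropagation: update statistics along the path.
        for n in path:
            n.visits += 1
            n.value += score
    return best
```

The cache plays the role of "reuse of historical results" from the list above: a configuration reached by two different rollouts is only evaluated once, which matters when each evaluation is a GPU training run rather than a cheap function call.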
FT-Bench Provides Real-World Evaluation Across 10 Tasks
The researchers introduced FT-Bench, a dedicated benchmark for evaluating agent-driven LLM training systems. The benchmark comprises 10 tasks derived from real-world scenarios, ranging from optimizing fundamental model capabilities to enhancing performance on domain-specific tasks in chemistry, mathematics, computer science, and other specialized fields. Empirical results show TREX consistently improves model performance across these diverse tasks, in several cases matching or surpassing expert-designed pipelines.
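A minimal sketch of how such a per-task evaluation harness might look; the task names, scores, and model interface below are hypothetical and do not reflect FT-Bench's actual format.

```python
# Hypothetical benchmark harness: score a tuned model against its
# baseline on every task and flag where fine-tuning improved results.
def run_benchmark(tasks, baseline_model, tuned_model):
    results = {}
    for name, score_fn in tasks.items():
        base, new = score_fn(baseline_model), score_fn(tuned_model)
        results[name] = {"baseline": base, "tuned": new, "improved": new > base}
    return results

# Toy "models" that simply map task names to accuracies.
baseline = {"chemistry_qa": 0.61, "math_word_problems": 0.55}
tuned = {"chemistry_qa": 0.70, "math_word_problems": 0.63}
tasks = {name: (lambda model, t=name: model[t]) for name in baseline}

report = run_benchmark(tasks, baseline, tuned)
```

A real harness would invoke model inference per task; the per-task baseline-versus-tuned comparison is the part that carries over.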
TREX Demonstrates Viability of Fully Automated AI Training
The system represents a significant step toward closing the loop: using AI agents to improve AI models. While LLMs have empowered AI research agents to perform isolated scientific tasks, automating complex, real-world workflows like LLM training has remained a significant challenge. TREX successfully navigates decision spaces that previously required expert intervention, demonstrating that:
- Complex multi-step scientific workflows can be automated end-to-end
- Agent-driven exploration can match human expert performance in hyperparameter selection
- Tree-based search enables efficient use of limited GPU budgets
- Historical result reuse accelerates convergence to optimal strategies
Broader Implications for AI Model Development
TREX reduces the need for expert ML knowledge in fine-tuning workflows, enabling broader access to customized model development. The system's ability to autonomously conduct literature reviews, formulate training strategies, and iterate on experimental results opens new possibilities for democratizing advanced model customization. As organizations increasingly require domain-specific AI models, automated training systems like TREX could accelerate deployment while reducing the specialized expertise required.
The full paper is available at arXiv:2604.14116.
Key Takeaways
- TREX is the first automated research agent for complete LLM fine-tuning, autonomously handling requirement analysis, literature review, data preparation, training, and evaluation
- The system uses Monte Carlo Tree Search to efficiently explore training strategies, reusing historical results to optimize GPU budget usage
- Two core modules—the Researcher and the Executor—collaborate to interpret goals and implement experimental plans on GPU clusters
- The FT-Bench benchmark comprises 10 real-world tasks spanning chemistry, mathematics, computer science, and other specialized fields
- TREX matches or surpasses expert-designed pipelines in several scenarios, demonstrating viability of fully automated AI training workflows