Researchers released TREX (Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration) on April 15, 2026, introducing the first automated research agent dedicated to LLM fine-tuning. The multi-agent framework autonomously handles the entire training lifecycle—from requirement analysis and literature review to data preparation, model training, and evaluation—matching or surpassing expert-designed pipelines in several real-world scenarios.
TREX Combines Two Core Agents to Automate Training Workflows
The system orchestrates collaboration between two specialized modules. The Researcher interprets the user's requirements, conducts literature reviews, and formulates experimental plans. The Executor, a code agent integrated with GPU clusters, implements these plans by constructing datasets and running model training and evaluation. This division of labor enables end-to-end automation of complex workflows that previously required expert ML knowledge.
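Under stated assumptions, this division of labor can be pictured as a propose-run-record loop. The sketch below is illustrative only: the class names, the `Plan` schema, and the scoring are invented stand-ins, not the paper's implementation (a real Researcher would consult an LLM and the literature, and a real Executor would build datasets and launch GPU training jobs).

```python
from dataclasses import dataclass

@dataclass
class Plan:
    """A training plan proposed by the Researcher (hypothetical schema)."""
    dataset: str
    learning_rate: float
    epochs: int

class Researcher:
    """Interprets the goal and proposes the next experimental plan."""
    def propose(self, goal: str, history: list) -> Plan:
        # Toy policy: halve the learning rate after each completed trial.
        lr = 2e-5 / (2 ** len(history))
        return Plan(dataset=f"data_for_{goal}", learning_rate=lr, epochs=3)

class Executor:
    """Stands in for dataset construction, GPU training, and evaluation."""
    def run(self, plan: Plan) -> float:
        # Toy objective: quality peaks at one particular learning rate.
        return 1.0 - abs(plan.learning_rate - 1e-5) * 1e4

def automate(goal: str, rounds: int = 3) -> tuple[Plan, float]:
    researcher, executor, history = Researcher(), Executor(), []
    best = (None, float("-inf"))
    for _ in range(rounds):
        plan = researcher.propose(goal, history)   # Researcher plans
        score = executor.run(plan)                 # Executor trains + evaluates
        history.append((plan, score))              # results feed the next round
        if score > best[1]:
            best = (plan, score)
    return best
```

The key design point mirrored here is that results flow back into the Researcher's history, so each round's plan can build on earlier trials.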
Monte Carlo Tree Search Efficiently Explores Training Strategies
TREX formulates training optimization as a tree-based search problem, leveraging Monte Carlo Tree Search (MCTS) to efficiently explore the open-ended space of training strategies under constrained computational budgets. The multi-round experimental process is modeled as a search tree, enabling:
- Efficient planning of exploration paths
- Reuse of historical results to avoid redundant computation
- Distillation of high-level insights from iterative trials
- Systematic navigation of complex hyperparameter spaces
- Strategic allocation of limited GPU resources
This approach allows TREX to converge on strong training strategies while making efficient use of limited compute, a critical consideration for practical deployment. Recent work has likewise shown MCTS to be effective for automated machine learning and hyperparameter optimization.
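As a rough illustration of the idea, the sketch below runs a generic MCTS loop (selection, expansion, rollout, backpropagation) over a tiny, invented space of training decisions, caching evaluated configurations so repeated rollouts reuse historical results instead of re-running "training". The search space, scoring function, and constants are hypothetical, not TREX's.

```python
import math
import random

# Hypothetical discrete strategy space (illustrative, not from the paper):
# each tree level fixes one training decision.
SPACE = [("lr", [1e-5, 5e-5, 2e-4]), ("batch", [8, 32]), ("epochs", [1, 3])]

def evaluate(config):
    """Stand-in for a full fine-tuning run: a deterministic toy score."""
    return (1.0 - abs(math.log10(config["lr"]) + 4.3)  # peaks near lr = 5e-5
            + 0.1 * (config["batch"] == 32)
            + 0.05 * config["epochs"])

class Node:
    def __init__(self, depth=0, config=None):
        self.depth, self.config = depth, dict(config or {})
        self.children, self.visits, self.value = [], 0, 0.0

def uct(parent, child, c=1.4):
    """UCB1 score balancing exploitation and exploration."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(
        math.log(parent.visits) / child.visits)

def mcts(iterations=200):
    root, cache, best = Node(), {}, (None, float("-inf"))
    for _ in range(iterations):
        node, path = root, [root]
        # Selection: descend by UCT while children exist.
        while node.children:
            node = max(node.children, key=lambda ch, p=node: uct(p, ch))
            path.append(node)
        # Expansion: one child per choice at the next decision level.
        if node.depth < len(SPACE):
            key, choices = SPACE[node.depth]
            node.children = [Node(node.depth + 1, {**node.config, key: v})
                             for v in choices]
            node = random.choice(node.children)
            path.append(node)
        # Rollout: complete remaining decisions at random, reusing
        # cached results to avoid redundant "training runs".
        config = dict(node.config)
        for key, choices in SPACE[node.depth:]:
            config[key] = random.choice(choices)
        sig = tuple(sorted(config.items()))
        if sig not in cache:
            cache[sig] = evaluate(config)
        score = cache[sig]
        if score > best[1]:
            best = (config, score)
        # Backpropagation: update statistics along the path.
        for n in path:
            n.visits += 1
            n.value += score
    return best
```

The cache plays the role of "reuse of historical results" from the list above: a configuration reached by two different rollouts is only evaluated once, which matters when each evaluation is a GPU training run rather than a cheap function call.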
FT-Bench Provides Real-World Evaluation Across 10 Tasks
The researchers introduced FT-Bench, a dedicated benchmark for evaluating agent-driven LLM training systems. The benchmark comprises 10 tasks derived from real-world scenarios, ranging from optimizing fundamental model capabilities to enhancing performance on domain-specific tasks in chemistry, mathematics, computer science, and other specialized fields. Empirical results show TREX consistently improves model performance across these diverse tasks, in several cases matching or surpassing expert-designed pipelines.
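A minimal sketch of how such a per-task evaluation harness might look; the task names, scores, and model interface below are hypothetical and do not reflect FT-Bench's actual format.

```python
# Hypothetical benchmark harness: score a tuned model against its
# baseline on every task and flag where fine-tuning improved results.
def run_benchmark(tasks, baseline_model, tuned_model):
    results = {}
    for name, score_fn in tasks.items():
        base, new = score_fn(baseline_model), score_fn(tuned_model)
        results[name] = {"baseline": base, "tuned": new, "improved": new > base}
    return results

# Toy "models" that simply map task names to accuracies.
baseline = {"chemistry_qa": 0.61, "math_word_problems": 0.55}
tuned = {"chemistry_qa": 0.70, "math_word_problems": 0.63}
tasks = {name: (lambda model, t=name: model[t]) for name in baseline}

report = run_benchmark(tasks, baseline, tuned)
```

A real harness would invoke model inference per task; the per-task baseline-versus-tuned comparison is the part that carries over.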
TREX Demonstrates Viability of Fully Automated AI Training
The system represents a significant step toward closing the loop: using AI agents to improve AI models. While LLMs have empowered AI research agents to perform isolated scientific tasks, automating complex, real-world workflows like LLM training has remained a significant challenge. TREX successfully navigates decision spaces that previously required expert intervention, demonstrating that:
- Complex multi-step scientific workflows can be automated end-to-end
- Agent-driven exploration can match human expert performance in hyperparameter selection
- Tree-based search enables efficient use of limited GPU budgets
- Historical result reuse accelerates convergence to optimal strategies
Broader Implications for AI Model Development
TREX reduces the need for expert ML knowledge in fine-tuning workflows, enabling broader access to customized model development. The system's ability to autonomously conduct literature reviews, formulate training strategies, and iterate on experimental results opens new possibilities for democratizing advanced model customization. As organizations increasingly require domain-specific AI models, automated training systems like TREX could accelerate deployment while reducing the specialized expertise required.
The full paper is available at arXiv:2604.14116.
Key Takeaways
- TREX is the first automated research agent for complete LLM fine-tuning, autonomously handling requirement analysis, literature review, data preparation, training, and evaluation
- The system uses Monte Carlo Tree Search to efficiently explore training strategies, reusing historical results to optimize GPU budget usage
- Two core modules—the Researcher and the Executor—collaborate to interpret goals and implement experimental plans on GPU clusters
- The FT-Bench benchmark comprises 10 real-world tasks spanning chemistry, mathematics, computer science, and other specialized fields
- TREX matches or surpasses expert-designed pipelines in several scenarios, demonstrating viability of fully automated AI training workflows