Researchers from Google, UIUC, and UW have introduced SkillOS, a reinforcement learning framework that enables AI agents to continuously improve by learning how to manage their own skill libraries. Published on arXiv on May 7, 2026, SkillOS addresses a fundamental limitation of current LLM-based agents: their tendency to remain one-off problem solvers that fail to learn from past interactions.
SkillOS Separates Skill Execution from Skill Curation
The framework pairs a frozen agent executor with a trainable skill curator. The executor retrieves and applies skills from an external SkillRepo, while the curator learns to update this repository based on accumulated experience. Skills are represented as Markdown files and managed through file I/O operations, much as an operating system manages files, which gives SkillOS its name. The curator decides which skills to add, update, or remove based on task performance patterns.
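As a rough illustration of the file-based design described above, the following minimal Python sketch treats a skill repository as a directory of Markdown files. The class name `SkillRepo` mirrors the article's term, but every method name (`add`, `update`, `remove`, `names`, `retrieve`), the keyword lookup, and the example skills are assumptions made for illustration, not the authors' implementation.

```python
from pathlib import Path
import tempfile


class SkillRepo:
    """Hypothetical skill repository: one Markdown file per skill."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, name: str) -> Path:
        return self.root / f"{name}.md"

    def add(self, name: str, body: str) -> None:
        # Curator action: create a new skill file.
        path = self._path(name)
        if path.exists():
            raise FileExistsError(name)
        path.write_text(body, encoding="utf-8")

    def update(self, name: str, body: str) -> None:
        # Curator action: overwrite an existing skill file.
        path = self._path(name)
        if not path.exists():
            raise FileNotFoundError(name)
        path.write_text(body, encoding="utf-8")

    def remove(self, name: str) -> None:
        # Curator action: delete a skill file.
        self._path(name).unlink()

    def names(self) -> list[str]:
        return sorted(p.stem for p in self.root.glob("*.md"))

    def retrieve(self, query: str) -> list[str]:
        # Executor side: naive keyword match, an assumed stand-in for
        # whatever retrieval the real executor uses.
        words = query.lower().split()
        hits = []
        for name in self.names():
            text = self._path(name).read_text(encoding="utf-8").lower()
            if any(word in text for word in words):
                hits.append(name)
        return hits


def demo() -> list[str]:
    # Illustrative sequence of curator edits on made-up skills.
    with tempfile.TemporaryDirectory() as tmp:
        repo = SkillRepo(Path(tmp) / "skills")
        repo.add("parse_dates", "# Parse dates\nTry ISO 8601 first.\n")
        repo.add("retry_api", "# Retry API calls\nBack off on HTTP 429.\n")
        repo.update("parse_dates", "# Parse dates\nTry ISO 8601 first, then locale formats.\n")
        repo.remove("retry_api")
        return repo.names()
```

Calling `demo()` leaves a single skill, `parse_dates`, in the repository. Keeping the executor read-only (`retrieve`) and routing all writes through the curator's three operations reflects the separation the article describes.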
Training Uses Composite Rewards and Grouped Task Streams
SkillOS trains the curator with a composite reward over grouped task streams, in which tasks are organized by skill-relevant dependencies. Earlier trajectories in a stream update the SkillRepo, while later related tasks evaluate these updates, so the curator receives its learning signal from indirect and delayed feedback. Because the reward depends on how those later tasks go, the curator can learn which skill modifications actually improve performance on subsequent tasks.
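The delayed-feedback idea can be sketched in a few lines of Python. This is only a toy under stated assumptions: the article does not spell out the terms of the composite reward, so the sketch scores a curator by a single quantity, the success rate on the later tasks of one stream. The names `delayed_reward`, `toy_execute`, and `toy_curate`, the dictionary standing in for the SkillRepo, and the example prompts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Repo = dict[str, str]  # skill name -> Markdown body; stand-in for the SkillRepo


@dataclass
class Trajectory:
    prompt: str
    success: bool


Executor = Callable[[str, Repo], Trajectory]
Curator = Callable[[Repo, Trajectory], Repo]


def delayed_reward(stream: list[str], split: int,
                   execute: Executor, curate: Curator) -> float:
    """Earlier tasks update the repo; later tasks score those updates."""
    if not 0 < split < len(stream):
        raise ValueError("split must leave tasks on both sides")
    repo: Repo = {}
    for prompt in stream[:split]:
        # The frozen executor acts, then the curator edits the repo.
        repo = curate(repo, execute(prompt, repo))
    later = stream[split:]
    # The curator's signal arrives only here, from related later tasks.
    return sum(execute(p, repo).success for p in later) / len(later)


def toy_execute(prompt: str, repo: Repo) -> Trajectory:
    # Toy rule: succeed only if some stored skill name appears in the prompt.
    return Trajectory(prompt, any(name in prompt for name in repo))


def toy_curate(repo: Repo, traj: Trajectory) -> Repo:
    # Toy rule: after a failure, store a skill named after the prompt's first word.
    if traj.success:
        return repo
    topic = traj.prompt.split()[0]
    return {**repo, topic: f"# {topic}\nNotes from a failed attempt.\n"}


stream = ["sql join two tables", "sql filter rows", "sql group by month"]
with_curation = delayed_reward(stream, 1, toy_execute, toy_curate)            # 1.0
without_curation = delayed_reward(stream, 1, toy_execute, lambda r, t: r)     # 0.0
```

In this toy stream, the skill written after the first task lets both later tasks succeed, whereas a curator that never edits the repository scores zero. A real training loop would feed such a score into a reinforcement learning update for the curator, which is outside the scope of this sketch.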
The framework consistently outperformed memory-free and strong memory-based baselines across both multi-turn agentic tasks and single-turn reasoning tasks. Notably, the learned skill curator generalizes across different executor backbones and task domains without requiring retraining.
Skills Evolve Into Higher-Level Meta-Skills
One of the most significant findings is that skills evolve over time into more richly structured Markdown files that encode higher-level meta-skills. This emergent behavior suggests that the curator learns not just to store individual solutions, but to abstract patterns and strategies from experience. The framework also produces more targeted skill use than the baselines, indicating that the curator learns meaningful organizing principles rather than simply accumulating examples.
Key Takeaways
- SkillOS pairs a frozen executor with a trainable curator that learns to manage an external skill repository
- Skills are represented as Markdown files and updated through file I/O operations
- Training uses grouped task streams where early tasks update skills and later tasks evaluate those updates
- The learned curator generalizes across different executor models and task domains
- Skills naturally evolve into higher-level meta-skills encoded in structured Markdown over time