Researchers at UC Berkeley have developed a two-stage learning framework that enables robots to peel vegetables with over 90% success rates, starting from just 50-200 demonstration trajectories and refined with preference-based finetuning. The approach combines force-aware imitation learning with human preference feedback to master contact-rich manipulation tasks where success criteria are subjective rather than binary.
Two-Stage Framework Addresses Implicit Success Criteria in Manipulation
Many essential manipulation tasks—including food preparation, surgery, and craftsmanship—remain challenging for robots because success is continuous and context-dependent rather than easily quantifiable. The Berkeley team's framework addresses this through two complementary stages:
Stage 1: Robust initial policy
- Force-aware data collection captures contact dynamics during peeling
- Imitation learning from 50-200 trajectories establishes baseline competence
- Enables generalization across object variations within and across categories
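The core of Stage 1 can be pictured as supervised regression from force-aware observations to expert actions. The sketch below is a deliberately minimal, hypothetical illustration—a linear behavior-cloning policy on synthetic data—not the paper's actual architecture; all dimensions, data, and the linear model are assumptions for clarity.

```python
import numpy as np

# Toy sketch of force-aware behavior cloning: regress actions from
# concatenated proprioceptive + force/torque features. Dimensions and
# data are illustrative assumptions, not taken from the paper.
rng = np.random.default_rng(0)

n_steps, d_proprio, d_force, d_action = 500, 7, 6, 7
proprio = rng.normal(size=(n_steps, d_proprio))  # joint positions (toy)
force = rng.normal(size=(n_steps, d_force))      # wrist force/torque (toy)
obs = np.hstack([proprio, force])                # force-aware observation

# Synthetic "expert" actions from a hidden linear teacher, for illustration.
W_true = rng.normal(size=(d_proprio + d_force, d_action))
actions = obs @ W_true + 0.01 * rng.normal(size=(n_steps, d_action))

# Behavior cloning as least-squares regression obs -> action.
W_bc, *_ = np.linalg.lstsq(obs, actions, rcond=None)
mse = float(np.mean((obs @ W_bc - actions) ** 2))
print(f"BC training MSE: {mse:.5f}")
```

Including the force channel in the observation is what lets the cloned policy condition on contact dynamics rather than vision alone, which matters for a task like peeling where the relevant state is largely felt, not seen.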
Stage 2: Preference-based refinement
- Learned reward model combines quantitative metrics with qualitative human feedback
- Aligns policy behavior with human notions of task quality
- Improves performance by up to 40% over baseline imitation learning
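A standard way to learn a reward model from pairwise human feedback—the mechanism RLHF popularized—is the Bradley-Terry model: the probability that trajectory A is preferred over B is a sigmoid of their reward difference. The sketch below fits a linear reward on synthetic trajectory features; the feature representation, dimensions, and linearity are assumptions for illustration, not details from the paper.

```python
import numpy as np

# Toy sketch of preference-based reward learning (Bradley-Terry model),
# assuming each trajectory is summarized by a feature vector.
rng = np.random.default_rng(1)

n_pairs, d_feat = 200, 5
feats_a = rng.normal(size=(n_pairs, d_feat))  # features of trajectory A
feats_b = rng.normal(size=(n_pairs, d_feat))  # features of trajectory B

# Hidden "true" quality weights generate synthetic human preferences.
w_true = rng.normal(size=d_feat)
prefs = (feats_a @ w_true > feats_b @ w_true).astype(float)  # 1 if A preferred

def sigmoid(x):
    x = np.clip(x, -30.0, 30.0)  # numerical safety
    return 1.0 / (1.0 + np.exp(-x))

# Gradient ascent on the Bradley-Terry log-likelihood:
# P(A preferred over B) = sigmoid(r(A) - r(B)), with r linear in features.
w = np.zeros(d_feat)
lr = 0.5
for _ in range(300):
    margin = (feats_a - feats_b) @ w
    grad = (feats_a - feats_b).T @ (prefs - sigmoid(margin)) / n_pairs
    w += lr * grad

agree = float(np.mean((((feats_a - feats_b) @ w) > 0) == prefs.astype(bool)))
print(f"preference agreement: {agree:.2f}")
```

The learned reward can then score rollouts during finetuning, letting quantitative metrics and qualitative human judgments be folded into a single optimization signal.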
This methodology bridges techniques from language model alignment (RLHF) with physical skill learning, demonstrating that preference-based approaches transfer effectively to robotic manipulation.
System Achieves Strong Generalization Across Produce Categories
The framework demonstrates robust performance across challenging test cases:
- Over 90% success rates on cucumbers, apples, and potatoes
- 40% performance improvement through preference-based finetuning over imitation learning alone
- Zero-shot generalization: Policies trained on one produce category maintain 90%+ success rates on unseen instances of that category
- Cross-category transfer: Strong performance on out-of-distribution produce from different categories
The research team includes Toru Lin, Shuying Deng, Zhao-Heng Yin, Pieter Abbeel, and Jitendra Malik from UC Berkeley.
Implications for Subjective Skill Learning in Robotics
The work demonstrates that robots can learn contact-rich skills with implicit success criteria from minimal demonstration data when the learning framework properly separates robust execution from quality refinement. The force-sensitive manipulation component handles the physical dynamics of peeling, while preference-based alignment captures the subjective aspects of task quality.
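One simple way to connect the two stages—refining a competent base policy toward higher-reward behavior—is reward-weighted regression: re-fit the policy on its own data, weighting each sample by an exponentiated reward score. This is a generic illustration of the refinement idea, not the paper's specific algorithm; all names, dimensions, and the temperature value are assumptions.

```python
import numpy as np

# Illustrative sketch of quality refinement via reward-weighted regression:
# samples scored higher by a (learned) reward model pull the policy harder.
rng = np.random.default_rng(2)

n, d_obs, d_act = 400, 10, 4
obs = rng.normal(size=(n, d_obs))
acts = rng.normal(size=(n, d_act))
reward = rng.uniform(size=n)  # stand-in for learned reward scores

# Exponentiated-reward weights (temperature 5.0 is an arbitrary choice).
weights = np.exp(5.0 * (reward - reward.max()))

# Weighted least squares: argmin_W sum_i w_i * ||obs_i @ W - act_i||^2,
# solved by scaling rows with sqrt(w_i) and using ordinary least squares.
sw = np.sqrt(weights)[:, None]
W, *_ = np.linalg.lstsq(obs * sw, acts * sw, rcond=None)
print("refined policy weights shape:", W.shape)
```

The separation matters: the base policy only needs to be robustly competent, while the reward-weighted update handles the subjective "how well" dimension of the task.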
This approach could extend to other domains where "good enough" varies by context—surgical assistance, craftsmanship, personal care—where task quality exists on a continuum rather than as a binary outcome. The combination of force-aware control with preference learning provides a template for teaching robots skills that require both physical competence and aesthetic judgment.
Key Takeaways
- UC Berkeley's framework achieves over 90% success rates on vegetable peeling using just 50-200 demonstration trajectories
- Two-stage approach combines force-aware imitation learning with preference-based finetuning for continuous quality improvement
- Preference-based refinement improves performance by up to 40% over baseline imitation learning policies
- System demonstrates strong zero-shot generalization within and across produce categories
- Methodology bridges RLHF techniques from language models to physical manipulation tasks with subjective success criteria