Researchers at UC Berkeley have developed a two-stage learning framework that enables robots to peel vegetables with over 90% success rates, starting from just 50-200 demonstration trajectories and refined with preference-based finetuning. The approach combines force-aware imitation learning with human preference feedback to master contact-rich manipulation tasks where success criteria are subjective rather than binary.
Two-Stage Framework Addresses Implicit Success Criteria in Manipulation
Many essential manipulation tasks—including food preparation, surgery, and craftsmanship—remain challenging for robots because success is continuous and context-dependent rather than easily quantifiable. The Berkeley team's framework addresses this through two complementary stages:
Stage 1: Robust initial policy
- Force-aware data collection captures contact dynamics during peeling
- Imitation learning from 50-200 trajectories establishes baseline competence
- Enables generalization across object variations within and across categories
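The core of Stage 1 can be pictured as supervised regression from force-aware observations to expert actions. The sketch below is a deliberately minimal, hypothetical illustration—a linear behavior-cloning policy on synthetic data—not the paper's actual architecture; all dimensions, data, and the linear model are assumptions for clarity.

```python
import numpy as np

# Toy sketch of force-aware behavior cloning: regress actions from
# concatenated proprioceptive + force/torque features. Dimensions and
# data are illustrative assumptions, not taken from the paper.
rng = np.random.default_rng(0)

n_steps, d_proprio, d_force, d_action = 500, 7, 6, 7
proprio = rng.normal(size=(n_steps, d_proprio))  # joint positions (toy)
force = rng.normal(size=(n_steps, d_force))      # wrist force/torque (toy)
obs = np.hstack([proprio, force])                # force-aware observation

# Synthetic "expert" actions from a hidden linear teacher, for illustration.
W_true = rng.normal(size=(d_proprio + d_force, d_action))
actions = obs @ W_true + 0.01 * rng.normal(size=(n_steps, d_action))

# Behavior cloning as least-squares regression obs -> action.
W_bc, *_ = np.linalg.lstsq(obs, actions, rcond=None)
mse = float(np.mean((obs @ W_bc - actions) ** 2))
print(f"BC training MSE: {mse:.5f}")
```

Including the force channel in the observation is what lets the cloned policy condition on contact dynamics rather than vision alone, which matters for a task like peeling where the relevant state is largely felt, not seen.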
Stage 2: Preference-based refinement
- Learned reward model combines quantitative metrics with qualitative human feedback
- Aligns policy behavior with human notions of task quality
- Improves performance by up to 40% over baseline imitation learning
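A standard way to learn a reward model from pairwise human feedback—the mechanism RLHF popularized—is the Bradley-Terry model: the probability that trajectory A is preferred over B is a sigmoid of their reward difference. The sketch below fits a linear reward on synthetic trajectory features; the feature representation, dimensions, and linearity are assumptions for illustration, not details from the paper.

```python
import numpy as np

# Toy sketch of preference-based reward learning (Bradley-Terry model),
# assuming each trajectory is summarized by a feature vector.
rng = np.random.default_rng(1)

n_pairs, d_feat = 200, 5
feats_a = rng.normal(size=(n_pairs, d_feat))  # features of trajectory A
feats_b = rng.normal(size=(n_pairs, d_feat))  # features of trajectory B

# Hidden "true" quality weights generate synthetic human preferences.
w_true = rng.normal(size=d_feat)
prefs = (feats_a @ w_true > feats_b @ w_true).astype(float)  # 1 if A preferred

def sigmoid(x):
    x = np.clip(x, -30.0, 30.0)  # numerical safety
    return 1.0 / (1.0 + np.exp(-x))

# Gradient ascent on the Bradley-Terry log-likelihood:
# P(A preferred over B) = sigmoid(r(A) - r(B)), with r linear in features.
w = np.zeros(d_feat)
lr = 0.5
for _ in range(300):
    margin = (feats_a - feats_b) @ w
    grad = (feats_a - feats_b).T @ (prefs - sigmoid(margin)) / n_pairs
    w += lr * grad

agree = float(np.mean((((feats_a - feats_b) @ w) > 0) == prefs.astype(bool)))
print(f"preference agreement: {agree:.2f}")
```

The learned reward can then score rollouts during finetuning, letting quantitative metrics and qualitative human judgments be folded into a single optimization signal.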
This methodology bridges techniques from language model alignment (RLHF) with physical skill learning, demonstrating that preference-based approaches transfer effectively to robotic manipulation.
System Achieves Strong Generalization Across Produce Categories
The framework demonstrates robust performance across challenging test cases:
- Over 90% success rates on cucumbers, apples, and potatoes
- 40% performance improvement through preference-based finetuning over imitation learning alone
- Zero-shot generalization: Policies trained on one produce category maintain 90%+ success rates on unseen instances of that category
- Cross-category transfer: Strong performance on out-of-distribution produce from different categories
The research team includes Toru Lin, Shuying Deng, Zhao-Heng Yin, Pieter Abbeel, and Jitendra Malik from UC Berkeley.
Implications for Subjective Skill Learning in Robotics
The work demonstrates that robots can learn contact-rich skills with implicit success criteria from minimal demonstration data when the learning framework properly separates robust execution from quality refinement. The force-sensitive manipulation component handles the physical dynamics of peeling, while preference-based alignment captures the subjective aspects of task quality.
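One simple way to connect the two stages—refining a competent base policy toward higher-reward behavior—is reward-weighted regression: re-fit the policy on its own data, weighting each sample by an exponentiated reward score. This is a generic illustration of the refinement idea, not the paper's specific algorithm; all names, dimensions, and the temperature value are assumptions.

```python
import numpy as np

# Illustrative sketch of quality refinement via reward-weighted regression:
# samples scored higher by a (learned) reward model pull the policy harder.
rng = np.random.default_rng(2)

n, d_obs, d_act = 400, 10, 4
obs = rng.normal(size=(n, d_obs))
acts = rng.normal(size=(n, d_act))
reward = rng.uniform(size=n)  # stand-in for learned reward scores

# Exponentiated-reward weights (temperature 5.0 is an arbitrary choice).
weights = np.exp(5.0 * (reward - reward.max()))

# Weighted least squares: argmin_W sum_i w_i * ||obs_i @ W - act_i||^2,
# solved by scaling rows with sqrt(w_i) and using ordinary least squares.
sw = np.sqrt(weights)[:, None]
W, *_ = np.linalg.lstsq(obs * sw, acts * sw, rcond=None)
print("refined policy weights shape:", W.shape)
```

The separation matters: the base policy only needs to be robustly competent, while the reward-weighted update handles the subjective "how well" dimension of the task.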
This approach could extend to other domains where "good enough" varies by context—surgical assistance, craftsmanship, personal care—where task quality exists on a continuum rather than as a binary outcome. The combination of force-aware control with preference learning provides a template for teaching robots skills that require both physical competence and aesthetic judgment.
Key Takeaways
- UC Berkeley's framework achieves over 90% success rates on vegetable peeling using just 50-200 demonstration trajectories
- Two-stage approach combines force-aware imitation learning with preference-based finetuning for continuous quality improvement
- Preference-based refinement improves performance by up to 40% over baseline imitation learning policies
- System demonstrates strong zero-shot generalization within and across produce categories
- Methodology bridges RLHF techniques from language models to physical manipulation tasks with subjective success criteria