Understudy: Teach-by-Demonstration Desktop Agent Debuts on Hacker News

Developer bayes-song has released Understudy, a macOS desktop agent that learns tasks through teach-by-demonstration rather than explicit programming. Posted to Hacker News on March 12, 2026, the project drew 82 points and 24 comments. Its premise: the user performs a task once, and the agent extracts a reusable skill from the screen recording and the semantic events captured alongside it.

Understudy Learns Desktop Workflows by Watching Users Once

Understudy addresses a gap in current AI agents by operating across native desktop apps, browser tabs, terminals, and messaging tools in unified sessions. According to bayes-song, "I built Understudy because a lot of real work still spans native desktop apps, browser tabs, terminals, and chat tools. Most current agents live in only one of those surfaces."

The teach-by-demonstration workflow proceeds in stages. The user performs a task once while Understudy records screen video and semantic events. The agent then extracts the intent behind each action rather than its pixel coordinates, and publishes a reusable skill made up of intent steps, route options, and GUI hints that serve as a fallback. On replay, the agent can take a faster route when one is available instead of mechanically reproducing every GUI action.
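To make the skill format more concrete, here is a minimal Python sketch. It assumes a skill is a list of intent steps, each carrying alternative routes with an estimated cost plus a recorded GUI hint. The names Skill, IntentStep, Route, choose_route and plan are hypothetical and are not Understudy's actual API; the sketch only illustrates the idea of preferring the fastest available route and falling back to the recorded GUI action.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Route:
    """One way to accomplish an intent step (hypothetical structure)."""
    name: str
    cost: float                    # estimated seconds; lower means faster
    available: Callable[[], bool]  # probe: can this route run right now?


@dataclass
class IntentStep:
    intent: str                    # what the step achieves, not where to click
    routes: list[Route] = field(default_factory=list)
    gui_hint: Optional[str] = None # recorded GUI action, used only as a fallback


@dataclass
class Skill:
    name: str
    steps: list[IntentStep]


def choose_route(step: IntentStep) -> str:
    """Pick the cheapest available route; otherwise fall back to the GUI hint."""
    candidates = [r for r in step.routes if r.available()]
    if candidates:
        return min(candidates, key=lambda r: r.cost).name
    if step.gui_hint is not None:
        return f"gui:{step.gui_hint}"
    raise RuntimeError(f"no way to perform step: {step.intent}")


def plan(skill: Skill) -> list[str]:
    """Resolve every intent step to a concrete route for this replay."""
    return [choose_route(s) for s in skill.steps]


# Hypothetical skill: one step has a fast scripted route, the other
# only has an unavailable route and must fall back to the GUI hint.
skill = Skill("export-and-send", [
    IntentStep("export image as PNG",
               [Route("app_script", 2.0, lambda: True),
                Route("menu_shortcut", 5.0, lambda: True)],
               gui_hint="click File > Export"),
    IntentStep("send file to contact",
               [Route("chat_api", 1.0, lambda: False)],
               gui_hint="drag file into chat window"),
])

print(plan(skill))  # ['app_script', 'gui:drag file into chat window']
```

Keeping the GUI hint separate from the routes means the recorded clicks are never the first choice, only the last resort, which matches the article's description of GUI hints as a fallback.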

A demonstration video shows Understudy learning a workflow that involves a Google Image search, downloading a photo, removing the background in Pixelmator Pro, exporting the result, and sending it via Telegram. When asked to repeat the task with a different search query, the agent adapts its approach rather than blindly replaying every action.
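One ingredient of reusing a demonstration with a different query is separating the demonstrated value from the steps themselves. The Python sketch below does this with naive string replacement and the standard library's string.Template; the step texts and the generalize and instantiate helpers are hypothetical, and a real agent would presumably identify parameters semantically rather than by exact text match.

```python
import string

# Steps as they might be recorded during a (hypothetical) demonstration.
recorded = [
    "search images for 'red fox'",
    "download first result",
    "remove background",
    "export as PNG",
    "send result to contact",
]


def generalize(steps: list[str], example_value: str, slot: str) -> list[str]:
    """Replace the demonstrated concrete value with a named placeholder."""
    return [s.replace(example_value, "${" + slot + "}") for s in steps]


def instantiate(template_steps: list[str], **bindings: str) -> list[str]:
    """Fill placeholders for a new run; raises KeyError if a slot is unbound."""
    return [string.Template(s).substitute(bindings) for s in template_steps]


template = generalize(recorded, "red fox", "query")
print(instantiate(template, query="snow leopard")[0])  # search images for 'snow leopard'
```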

Progressive Learning Model Mimics New Hire Training

Understudy's architecture is designed as a layered progression modeled on how a new employee learns. Milestones include:

Day 1: Watches how things are done through demonstration
Week 1: Imitates the process and asks clarifying questions
Month 6: Anticipates needs and acts proactively

Currently, layers 1-2 are working while layers 3-4 remain partial and early-stage. The system is local-first and requires macOS for native GUI automation and the teach-by-demonstration features. Installation is via npm: run npm install -g @understudy-ai/understudy, then understudy wizard.

Community Highlights Edge Case Handling and Robustness Concerns

The top Hacker News comment questioned fundamental assumptions: "Learning to do a thing means handling the edge cases, and you can't exactly do that in one pass." Users raised concerns about what happens when UI elements change, network connections fail, or unexpected dialogs appear during task execution.

bayes-song acknowledged current limitations, stating that "The system becomes slower when recovering from failures and doesn't guarantee successful completion in all scenarios." The developer emphasized that Understudy represents early-stage exploration of the teach-by-demonstration paradigm rather than production-ready automation.
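The developer's point that recovery makes the system slower without guaranteeing success can be illustrated with a simple retry-then-fall-through loop. This Python sketch is an assumption about how such recovery might look in general, not Understudy's implementation; run_with_recovery and StepFailed are hypothetical names. Each failed attempt adds time before the next route is tried, and if every route is exhausted the step fails with a record of what went wrong.

```python
import time
from typing import Callable, Sequence


class StepFailed(Exception):
    """Raised when every route for a step has been exhausted."""


def run_with_recovery(
    attempts: Sequence[tuple[str, Callable[[], None]]],
    retries_per_route: int = 2,
    backoff_s: float = 0.0,  # base delay; doubles on each retry of a route
) -> str:
    """Try routes in order, retrying each; return the name of the one that worked."""
    errors: list[str] = []
    for name, action in attempts:
        for attempt in range(retries_per_route):
            try:
                action()
                return name
            except Exception as exc:  # e.g. network error, unexpected dialog
                errors.append(f"{name}#{attempt + 1}: {exc}")
                time.sleep(backoff_s * (2 ** attempt))
    raise StepFailed("; ".join(errors))


calls: list[str] = []


def flaky_api() -> None:
    calls.append("api")
    raise ConnectionError("network down")


def gui_fallback() -> None:
    calls.append("gui")


print(run_with_recovery([("api", flaky_api), ("gui", gui_fallback)]))  # gui
print(calls)  # ['api', 'api', 'gui']
```

Every retry and fallback costs wall-clock time, and the final StepFailed branch is the case where no route works at all, which is the scenario commenters raised for changed UIs or unexpected dialogs.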

Additional concerns focused on platform lock-in, as the macOS-only requirement limits broader adoption. Despite these limitations, commenters appreciated the novel approach to agent learning and the attempt to bridge multiple application surfaces in a single workflow.

Key Takeaways

Understudy is a macOS desktop agent that learns tasks through teach-by-demonstration by recording screen video and semantic events from a single user performance
The system extracts intent rather than coordinates, enabling the agent to adapt workflows and prefer faster routes on replay instead of mechanically reproducing GUI actions
Architecture follows a progressive learning model mimicking new hire training, with layers 1-2 currently working and layers 3-4 in early development
Community concerns center on edge case handling, robustness when UI changes or failures occur, and recovery speed in unexpected scenarios
The project is macOS-only and acknowledged by the developer as early-stage exploration rather than production-ready automation
