Developer c1b has created an in-browser Proximal Policy Optimization (PPO) training demo that runs entirely in the browser using WebGPU, showing that AI training can be made accessible without local software setup or cloud infrastructure. The demo, available at ppo.gradexp.xyz, was posted to Hacker News on May 14, 2026, where it received 116 points and 29 comments.
Full Training Pipeline Runs in Browser
The platform enables users to start and manage complete reinforcement learning training sessions directly in a browser, monitoring key metrics including rollouts per second, policy and value losses, and episode progress. Users can compare multiple training runs with different hyperparameters, visualize trained agent behavior through policy rollouts, and configure preset learning rate sweeps or custom training configurations.
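A learning rate sweep of the kind described above can be sketched as a list of configurations that differ only in one field. This is a minimal illustration, not the demo's actual configuration format: the `PPOConfig` class, its field names and defaults, and the `lr_sweep` helper are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PPOConfig:
    # Hypothetical hyperparameters; the demo's real settings are not documented here.
    learning_rate: float = 3e-4
    clip_eps: float = 0.2
    epochs_per_update: int = 4


def lr_sweep(base: PPOConfig, low: float, high: float, num: int) -> list[PPOConfig]:
    """Return `num` copies of `base` with log-spaced learning rates in [low, high]."""
    if num < 1 or low <= 0 or high < low:
        raise ValueError("need num >= 1 and 0 < low <= high")
    if num == 1:
        return [replace(base, learning_rate=low)]
    # Constant multiplicative step between consecutive learning rates.
    step = (high / low) ** (1 / (num - 1))
    return [replace(base, learning_rate=low * step**i) for i in range(num)]


if __name__ == "__main__":
    for cfg in lr_sweep(PPOConfig(), 1e-4, 1e-2, 3):
        print(cfg)
```

Log spacing is a common choice for learning rates because useful values often span several orders of magnitude. Each resulting configuration would correspond to one training run to compare.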
The demo displays standard PPO metrics: policy loss (pol), value loss (val), entropy (ent), gradient norm (gn), and KL divergence (kl).
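For readers unfamiliar with these quantities, the sketch below computes them for a single batch using the standard PPO definitions in plain Python. It is an illustration under stated assumptions, not the demo's code: the function name `ppo_metrics`, its argument layout, the clipping range of 0.2, and the simple KL estimator are all assumptions, and the demo may define or scale its metrics differently.

```python
import math


def ppo_metrics(new_logp, old_logp, advantages, values, returns,
                entropies, grads, clip_eps=0.2):
    """Compute common PPO diagnostics for one batch of samples.

    All arguments are equal-length lists of floats, except `grads`, which is
    a flat list of gradient components, and `clip_eps` (assumed default 0.2).
    """
    n = len(new_logp)
    # Probability ratio between the updated policy and the data-collecting policy.
    ratios = [math.exp(nl - ol) for nl, ol in zip(new_logp, old_logp)]

    # Clipped surrogate objective, negated so that lower is better (a loss).
    surrogate = []
    for r, a in zip(ratios, advantages):
        clipped = max(min(r, 1.0 + clip_eps), 1.0 - clip_eps)
        surrogate.append(min(r * a, clipped * a))
    pol = -sum(surrogate) / n

    # Mean squared error between value predictions and empirical returns.
    val = sum((v - ret) ** 2 for v, ret in zip(values, returns)) / n

    # Mean per-sample policy entropy.
    ent = sum(entropies) / n

    # Simple sample-based KL estimate: mean(old_logp - new_logp).
    kl = sum(ol - nl for nl, ol in zip(new_logp, old_logp)) / n

    # Global L2 norm of the gradient.
    gn = math.sqrt(sum(g * g for g in grads))

    return {"pol": pol, "val": val, "ent": ent, "gn": gn, "kl": kl}


if __name__ == "__main__":
    print(ppo_metrics(
        new_logp=[-1.0, -0.5], old_logp=[-1.1, -0.4],
        advantages=[1.0, -1.0], values=[0.5, 0.5], returns=[1.0, 0.0],
        entropies=[0.6, 0.4], grads=[3.0, 4.0],
    ))
```

When the new and old log-probabilities are identical, every ratio is 1 and the KL estimate is 0, which is the expected state at the start of each update. A growing KL value during training indicates the policy is moving far from the one that collected the data.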
tinygrad Compiles to WebGPU Kernels
The technical implementation leverages tinygrad, a neural network framework that can compile to WebGPU kernels for browser execution. According to the Show HN post, the system uses "TinyJit -> WebGPU kernels" to enable in-browser training. The demo is built with Gradient Explorer and requires WebGPU support in the browser.
A GitHub pull request (#7051) demonstrates tinygrad's capability of "turning the browser into a" compute device for WebGPU operations. The WebGPU specification reached general availability in 2024, and by 2026 has achieved widespread browser support.
Democratizing Agent Training Access
Reinforcement learning training typically demands significant computational resources and environment setup, so running a full training workflow in a browser tab is a meaningful accessibility gain. The demo removes the need for local software installation or cloud infrastructure, widening access to agent training. The computation still runs on the user's own GPU, which the browser reaches through WebGPU.
By 2026, browsers have evolved from document viewers into powerful platforms for AI computation, with sophisticated models running directly through WebGPU. This shift enables developers and researchers to experiment with training algorithms without infrastructure barriers.
Key Takeaways
- Developer c1b created an in-browser PPO training demo at ppo.gradexp.xyz that runs complete reinforcement learning training sessions using WebGPU
- The system uses tinygrad's TinyJit to generate WebGPU kernels, enabling neural network training directly in browsers without local software setup or cloud infrastructure
- Users can monitor standard PPO metrics including policy loss, value loss, entropy, gradient norm, and KL divergence while comparing multiple training runs
- WebGPU reached general availability in 2024 and by 2026 has enabled browsers to become powerful AI computation platforms
- The demo represents a significant accessibility achievement, democratizing reinforcement learning experimentation by eliminating infrastructure requirements