StepFun's Step 3.5 Flash Ranks First in Cost-Effectiveness for Agentic Workloads
StepFun's Step 3.5 Flash has claimed the top position in cost-effectiveness for OpenClaw tasks based on 300 battles in the UniClaw Arena benchmark. The sparse Mixture of Experts (MoE) model activates only 11B of its 196B parameters per token, achieving competitive performance at dramatically lower costs than Western alternatives.
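A quick back-of-the-envelope check of the sparsity figure quoted above (this is arithmetic on the reported numbers, not from StepFun's documentation):

```python
# Fraction of Step 3.5 Flash's weights active per token,
# per the reported figures: 11B activated of 196B total.
active_params_b = 11    # billions of parameters activated per token
total_params_b = 196    # billions of parameters in the full model

activation_ratio = active_params_b / total_params_b
print(f"Active per token: {activation_ratio:.1%} of total weights")
```

Roughly 5.6% of the model's weights are exercised on any given token, which is where the low decoding cost comes from.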
Pricing Undercuts Competitors by Up to ~19x
Step 3.5 Flash is priced at $0.10 per million input tokens and $0.30 per million output tokens. StepFun claims a decoding-cost baseline of 1.0x, with competing models ranging from 1.2x to 18.9x for equivalent tasks. The model is also available free on OpenRouter (step-3.5-flash:free), making it accessible for developers testing agentic applications.
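At those list prices, per-run costs are easy to estimate. A minimal sketch, where the workload sizes are illustrative assumptions rather than benchmark data:

```python
# Back-of-the-envelope cost estimate at Step 3.5 Flash's list prices.
INPUT_PRICE = 0.10 / 1_000_000    # USD per input token
OUTPUT_PRICE = 0.30 / 1_000_000   # USD per output token

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one agent run at the published per-token prices."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Hypothetical agent session: 200k input tokens, 20k output tokens.
print(f"${run_cost(200_000, 20_000):.3f}")  # → $0.026
```

Even a long, tool-heavy session stays in the low cents, which is the economics the benchmark results below are measuring.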
Strong Performance on Agent and Web Navigation Benchmarks
The model demonstrates robust capabilities across multiple benchmarks:
- Agent reliability benchmark: 88.2
- BrowseComp (with context manager): 69.0
- PinchBench: 82.9% accuracy (19.06/23 tasks) in 1,334 seconds
- PinchBench (optimized): 83.9% accuracy (19.29/23 tasks) in 1,222 seconds, at a cost of $0.14
These metrics suggest strong web navigation and tool-use capabilities, critical for OpenClaw deployments where models must maintain deep reasoning and consistency during execution.
Open-Source Release Enables Agent Development
Released under the Apache 2.0 license, Step 3.5 Flash is available for commercial use. Community developers are already integrating it with existing tools: one user shared a configuration for using Claude Code with Step 3.5 Flash through OpenRouter: 'ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1" ANTHROPIC_AUTH_TOKEN="$OPEN_ROUTER_API_KEY" ANTHROPIC_MODEL="stepfun/step-3.5-flash:free" claude'.
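Broken out of the inline quote above, the shared configuration reads as follows (reproduced as posted; the environment variables and model slug are the user's, not verified against official documentation):

```shell
# Point Claude Code's Anthropic-compatible client at OpenRouter
# and select the free Step 3.5 Flash endpoint, then launch.
export ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1"
export ANTHROPIC_AUTH_TOKEN="$OPEN_ROUTER_API_KEY"
export ANTHROPIC_MODEL="stepfun/step-3.5-flash:free"
claude
```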
Chinese AI Companies Compete on Agent Economics
This release represents Chinese AI companies competing directly on cost-effectiveness for agentic workloads, an area traditionally dominated by Western models like Claude and GPT-4. The MoE architecture enabling competitive performance at dramatically lower activation costs suggests a viable path for cost-sensitive agent deployments, particularly in scenarios requiring high throughput at scale.
Key Takeaways
- Step 3.5 Flash ranked #1 in cost-effectiveness for OpenClaw tasks across 300 battles in the UniClaw Arena benchmark
- The model uses sparse MoE architecture, activating only 11B of 196B parameters per token, priced at $0.10/$0.30 per million tokens
- Achieved 88.2 on the agent reliability benchmark and 69.0 on BrowseComp, demonstrating strong agentic capabilities
- Released open-source under Apache 2.0 license with free tier available on OpenRouter
- Represents Chinese AI companies competing on cost-effectiveness for agent workloads, with claimed decoding costs up to ~19x lower than competitors