StepFun's Step 3.5 Flash Ranks First in Cost-Effectiveness for Agentic Workloads
StepFun's Step 3.5 Flash has claimed the top position in cost-effectiveness for OpenClaw tasks based on 300 battles in the UniClaw Arena benchmark. The sparse Mixture of Experts (MoE) model activates only 11B of its 196B parameters per token, achieving competitive performance at dramatically lower costs than Western alternatives.
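A quick back-of-the-envelope check of the sparsity figure quoted above (this is arithmetic on the reported numbers, not from StepFun's documentation):

```python
# Fraction of Step 3.5 Flash's weights active per token,
# per the reported figures: 11B activated of 196B total.
active_params_b = 11    # billions of parameters activated per token
total_params_b = 196    # billions of parameters in the full model

activation_ratio = active_params_b / total_params_b
print(f"Active per token: {activation_ratio:.1%} of total weights")
```

Roughly 5.6% of the model's weights are exercised on any given token, which is where the low decoding cost comes from.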
Pricing Undercuts Competitors by Up to ~19x
Step 3.5 Flash is priced at $0.10 per million input tokens and $0.30 per million output tokens. StepFun claims a decoding-cost baseline of 1.0x, with competing models ranging from 1.2x to 18.9x for equivalent tasks. The model is also available free on OpenRouter (step-3.5-flash:free), making it accessible for developers testing agentic applications.
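At those list prices, per-run costs are easy to estimate. A minimal sketch, where the workload sizes are illustrative assumptions rather than benchmark data:

```python
# Back-of-the-envelope cost estimate at Step 3.5 Flash's list prices.
INPUT_PRICE = 0.10 / 1_000_000    # USD per input token
OUTPUT_PRICE = 0.30 / 1_000_000   # USD per output token

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one agent run at the published per-token prices."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Hypothetical agent session: 200k input tokens, 20k output tokens.
print(f"${run_cost(200_000, 20_000):.3f}")  # → $0.026
```

Even a long, tool-heavy session stays in the low cents, which is the economics the benchmark results below are measuring.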
Strong Performance on Agent and Web Navigation Benchmarks
The model demonstrates robust capabilities across multiple benchmarks:
- Agent reliability benchmark: 88.2
- BrowseComp (with context manager): 69.0
- PinchBench: 82.9% accuracy (19.06/23 tasks) in 1,334 seconds
- PinchBench (optimized): 83.9% accuracy (19.29/23 tasks) in 1,222 seconds, at a cost of $0.14
These metrics suggest strong web navigation and tool-use capabilities, critical for OpenClaw deployments where models must maintain deep reasoning and consistency during execution.
Open-Source Release Enables Agent Development
Released under the Apache 2.0 license, Step 3.5 Flash is available for commercial use. Community developers are already integrating it with existing tools: one user shared a configuration for using Claude Code with Step 3.5 Flash through OpenRouter: 'ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1" ANTHROPIC_AUTH_TOKEN="$OPEN_ROUTER_API_KEY" ANTHROPIC_MODEL="stepfun/step-3.5-flash:free" claude'.
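Broken out of the inline quote above, the shared configuration reads as follows (reproduced as posted; the environment variables and model slug are the user's, not verified against official documentation):

```shell
# Point Claude Code's Anthropic-compatible client at OpenRouter
# and select the free Step 3.5 Flash endpoint, then launch.
export ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1"
export ANTHROPIC_AUTH_TOKEN="$OPEN_ROUTER_API_KEY"
export ANTHROPIC_MODEL="stepfun/step-3.5-flash:free"
claude
```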
Chinese AI Companies Compete on Agent Economics
This release represents Chinese AI companies competing directly on cost-effectiveness for agentic workloads, an area traditionally dominated by Western models like Claude and GPT-4. The MoE architecture enabling competitive performance at dramatically lower activation costs suggests a viable path for cost-sensitive agent deployments, particularly in scenarios requiring high throughput at scale.
Key Takeaways
- Step 3.5 Flash ranked #1 in cost-effectiveness for OpenClaw tasks across 300 battles in the UniClaw Arena benchmark
- The model uses sparse MoE architecture, activating only 11B of 196B parameters per token, priced at $0.10/$0.30 per million tokens
- Achieved 88.2 on the agent reliability benchmark and 69.0 on BrowseComp, demonstrating strong agentic capabilities
- Released open-source under Apache 2.0 license with free tier available on OpenRouter
- Represents Chinese AI companies competing on cost-effectiveness for agent workloads, with claimed decoding costs up to ~19x lower than competitors