DeepSeek announced V4 Preview on April 24, 2026, releasing two open-weights models that deliver performance within 3-6 months of the state of the art at a fraction of competitor pricing. The announcement introduces DeepSeek-V4-Pro with 1.6 trillion total parameters and DeepSeek-V4-Flash with 284 billion parameters, both licensed under MIT and available on Hugging Face. Pricing starts at $0.14 per million input tokens for V4-Flash, while V4-Pro undercuts GPT-5.4 by 91% on input tokens and 77% on output tokens.
Two Model Variants Target Different Use Cases
DeepSeek-V4-Pro features 1.6 trillion total parameters with 49 billion active parameters, a 1 million token context window, and weights available as an 865GB download on Hugging Face. The model is priced at $1.74 per million input tokens and $3.48 per million output tokens through DeepSeek's official API. DeepSeek-V4-Flash offers 284 billion total parameters with 13 billion active parameters, maintaining the same 1 million token context window with a 160GB model size. Flash pricing sits at $0.14 per million input tokens and $0.28 per million output tokens. A 75% promotional discount applies to both models through May 2026.
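The listed prices make per-request cost easy to estimate. The sketch below uses only the figures from the announcement; the function name and the example token counts are ours, and the 75% promotional discount is applied as a simple multiplier:

```python
# Per-million-token prices (USD) from the announcement.
PRICES = {
    "v4-pro":   {"input": 1.74, "output": 3.48},
    "v4-flash": {"input": 0.14, "output": 0.28},
}

def estimate_cost(model, input_tokens, output_tokens, promo_discount=0.75):
    """Estimate one request's cost in USD.

    promo_discount models the 75% promotional discount running
    through May 2026; pass 0 for the standard rate.
    """
    p = PRICES[model]
    full_price = (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
    return full_price * (1 - promo_discount)

# A long-context request: 100k tokens in, 10k tokens out on V4-Flash.
print(round(estimate_cost("v4-flash", 100_000, 10_000), 6))  # ≈ $0.0042 at the promo rate
```

At the standard (non-promotional) rate the same request comes to about $0.0168, still orders of magnitude below the anecdotal Claude Opus costs cited later in the article.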
Dramatic Efficiency Improvements Over Previous Generation
DeepSeek achieved significant computational efficiency gains compared to V3.2. At a 1 million token context length, V4-Flash requires only 10% of the single-token FLOPs and 7% of the KV cache size of V3.2. V4-Pro shows similar improvements, requiring only 27% of the inference FLOPs, with KV cache requirements reduced to 10% of the previous generation.
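Inverting these ratios gives a rough sense of what they mean for serving. The back-of-envelope sketch below is our own first-order reading, not a figure from the paper: it assumes compute-bound decoding and memory-bound concurrency, and ignores batching, bandwidth, and scheduling effects.

```python
# Relative per-token costs vs. V3.2 at 1M-token context, from the announcement.
FLOPS_RATIO = {"v4-flash": 0.10, "v4-pro": 0.27}
KV_RATIO    = {"v4-flash": 0.07, "v4-pro": 0.10}

def implied_gains(model):
    """Naive inversion: lower per-token cost implies higher throughput,
    and a smaller KV cache implies more concurrent long contexts
    fitting in the same GPU memory. First-order estimate only."""
    return {
        "throughput_x":  1 / FLOPS_RATIO[model],  # compute-bound speedup
        "concurrency_x": 1 / KV_RATIO[model],     # contexts per unit of KV memory
    }

print(implied_gains("v4-flash"))  # roughly 10x compute, ~14x KV capacity vs. V3.2
```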
Performance Approaches Frontier Models at Fraction of Cost
According to the DeepSeek paper, V4-Pro trails state-of-the-art models by an estimated 3 to 6 months, falling marginally short of GPT-5.4 and Gemini-3.1-Pro on reasoning benchmarks. It is the largest open-weights model released to date, exceeding Kimi K2.6 (1.1 trillion parameters) and GLM-5.1 (754 billion parameters). Developer reception on Hacker News highlighted dramatic cost savings: one developer reported spending $0.09 on tasks that previously cost $9-$13 with Claude Opus, and another, running extensive codebase analysis, spent $0.95 versus an estimated $10+ for Opus 4.7.
Developer Adoption and Concerns
Developers report that DeepSeek V4 delivers Sonnet-level coding capability with better instruction compliance; users note the model "just does what I ask" where other providers' models refuse. However, some concerns emerged around token efficiency, with reports of unusually high reasoning token consumption in edge cases. Data privacy questions about DeepSeek's official API persist, though Azure and third-party providers offer alternatives for users with compliance requirements. Simon Willison called the release "almost on the frontier, a fraction of the price" and highlighted how dramatic cost savings enable entirely new use cases.
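For developers weighing the official API against third-party hosts, the request shape is the main portability concern. The sketch below builds an OpenAI-compatible chat payload, the format DeepSeek's API has historically exposed; the endpoint URL and the model identifier "deepseek-v4-flash" are our assumptions, not confirmed names from the announcement:

```python
import json

# Assumed OpenAI-compatible endpoint; swap BASE_URL for an Azure or
# third-party host if compliance requirements rule out the official API.
BASE_URL = "https://api.deepseek.com/chat/completions"  # assumption

def build_chat_request(prompt, model="deepseek-v4-flash", max_tokens=512):
    """Return the JSON body for a single-turn chat completion request.

    The same body works against any OpenAI-compatible provider;
    only BASE_URL and the bearer token need to change.
    """
    return {
        "model": model,  # hypothetical identifier, not confirmed
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = json.dumps(build_chat_request("Summarize this diff."))
# POST `body` to BASE_URL with an "Authorization: Bearer <key>" header.
```

Because the payload is provider-agnostic, switching hosts for privacy reasons is a configuration change rather than a code change.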
Key Takeaways
- DeepSeek V4-Pro and V4-Flash launched April 24, 2026 with MIT licenses, 1 million token context windows, and pricing starting at $0.14 per million input tokens
- V4-Pro achieves 91% lower input costs and 77% lower output costs compared to GPT-5.4, with performance within 3-6 months of state-of-the-art
- V4-Flash requires only 10% of single-token FLOPs and 7% of KV cache size compared to V3.2 in 1M token contexts
- Developers report spending $0.09-$0.95 on tasks previously costing $9-$13 with competing models
- V4-Pro represents the largest open-weights model at 1.6 trillion total parameters, exceeding previous leaders Kimi K2.6 and GLM-5.1