deepseek llms ai-models mixture-of-experts

DeepSeek V4–almost on the frontier, a fraction of the price

simonwillison.net

May 1, 2026

3 min read

Summary

DeepSeek has released two preview models in its V4 series: DeepSeek-V4-Pro and DeepSeek-V4-Flash. The Pro model has 1.6 trillion total parameters with 49 billion active, while the Flash model has 284 billion total parameters with 13 billion active. Both are Mixture of Experts models with a 1 million token context window, released under the MIT license.
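To put the total and active parameter counts in perspective, here is a minimal sketch that computes what share of each model's parameters is active. The counts are the ones quoted in the summary; the dictionary layout and the `active_fraction` helper are illustrative names, not part of any DeepSeek tooling.

```python
# Parameter counts in billions, as quoted in the summary.
MODELS = {
    "DeepSeek-V4-Pro": {"total_b": 1600, "active_b": 49},
    "DeepSeek-V4-Flash": {"total_b": 284, "active_b": 13},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters that are active (hypothetical helper)."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


if __name__ == "__main__":
    for name, params in MODELS.items():
        share = active_fraction(params["total_b"], params["active_b"])
        print(f"{name}: {share:.2%} of parameters active")
```

By these figures, roughly 3% of the Pro model's parameters and under 5% of the Flash model's are active, which is the usual reason a Mixture of Experts model can be far cheaper to run than its total size suggests.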

Key Takeaways

DeepSeek released two models in the V4 series: DeepSeek-V4-Pro with 1.6 trillion total parameters and DeepSeek-V4-Flash with 284 billion total parameters.
DeepSeek-V4-Pro is the largest open weights model available, surpassing Kimi K2.6 and GLM-5.1.
DeepSeek-V4-Flash is priced at $0.14 per million input tokens and $0.28 per million output tokens, making it the cheapest of the small models.
DeepSeek-V4-Pro demonstrates competitive performance on reasoning benchmarks compared to frontier models but lags behind GPT-5.4 and Gemini-3.1-Pro by approximately 3 to 6 months.
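The per-token pricing above translates into a simple cost estimate. The sketch below uses only the DeepSeek-V4-Flash prices quoted in the takeaways; the function name and the example token counts are hypothetical, and it ignores any caching discounts or other billing details the provider may apply.

```python
# DeepSeek-V4-Flash prices in USD per million tokens, as quoted above.
FLASH_INPUT_PER_M = 0.14
FLASH_OUTPUT_PER_M = 0.28


def estimate_flash_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload (hypothetical helper, list prices only)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * FLASH_INPUT_PER_M
    output_cost = output_tokens / 1_000_000 * FLASH_OUTPUT_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Assumed example workload: 2M input tokens and 500k output tokens.
    cost = estimate_flash_cost(2_000_000, 500_000)
    print(f"Estimated cost: ${cost:.2f}")
```

For the assumed workload, the input side costs about $0.28 and the output side about $0.14, so the total comes to roughly $0.42.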

Community Sentiment

Positive

Positives

DeepSeek V4 Pro offers comparable quality to OpenAI's models at a significantly lower price, making it an attractive option for budget-conscious developers.
DeepSeek's API pricing is highly competitive, giving users a large number of tokens at a fraction of what other providers charge.
Users appreciate the model's ability to perform well in practical applications, such as frontend development, indicating its versatility for various use cases.

Concerns

The reliance on familiar informal evaluations like the pelican test (Simon Willison's "pelican riding a bicycle" SVG prompt) raises concerns about how well the model handles novel challenges.
Some users feel that many AI models, including DeepSeek, have converged on similar outputs, suggesting a lack of differentiation in capabilities.