Themata.AI

Tags: foundation-models, sparse-architectures, ai-agents, ai-reasoning

Step 3.5 Flash – Open-source foundation model, supports deep reasoning at speed


static.stepfun.com

February 19, 2026

16 min read

Summary

Step 3.5 Flash is an open-source foundation model designed for advanced reasoning and agentic capabilities. Utilizing a sparse Mixture of Experts (MoE) architecture, it activates only 11B of its 196B parameters per token, enabling high intelligence density and real-time interaction.
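The sparse-MoE mechanism the summary describes can be sketched in a few lines: a gating network scores every expert for each token, but only the top-k experts actually run, so compute scales with k rather than with total parameter count. A minimal NumPy sketch follows; the toy dimensions, the linear experts, and top_k=2 are illustrative assumptions, not Step 3.5 Flash's actual router.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Sparse Mixture-of-Experts layer: route each token to its
    top-k experts and mix their outputs by renormalized softmax
    gate weights. Unselected experts are never evaluated."""
    logits = x @ gate_w                            # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top[t]]
        w = np.exp(sel - sel.max())
        w /= w.sum()                               # gate weights over top-k only
        for weight, e in zip(w, top[t]):
            out[t] += weight * experts[e](x[t])
    return out

# Toy setup: 8 experts, each a small linear map; 4 tokens of dim 16.
rng = np.random.default_rng(0)
d, n_experts = 16, 8
expert_ws = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(n_experts)]
experts = [lambda v, W=W: v @ W for W in expert_ws]
gate_w = rng.normal(size=(d, n_experts))
x = rng.normal(size=(4, d))
y = moe_forward(x, gate_w, experts, top_k=2)
print(y.shape)  # (4, 16)
```

With top_k=2 of 8 experts, each token touches a quarter of the expert weights; the same principle, at far larger scale, is how a 196B-parameter model can activate only 11B parameters per token.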

Key Takeaways

  • Step 3.5 Flash is an open-source foundation model with a sparse Mixture of Experts architecture, activating only 11B of its 196B parameters per token for enhanced efficiency and reasoning capabilities.
  • The model achieves a generation throughput of 100-300 tokens per second, enabling complex, multi-step reasoning with immediate responsiveness.
  • Step 3.5 Flash supports a 256K context window using a 3:1 Sliding Window Attention ratio, significantly reducing computational overhead while maintaining performance on large datasets.
  • It is optimized for local deployment on high-end consumer hardware, ensuring data privacy and performance in real-world applications.
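The 3:1 Sliding Window Attention ratio in the takeaways is plausibly read as three sliding-window layers for every full-attention layer; under that assumption (the ratio interpretation, the layer count, and the window size here are all illustrative), the corresponding attention masks can be sketched as:

```python
import numpy as np

def attention_mask(seq_len, window=None):
    """Causal attention mask; if `window` is set, each query also
    sees only the last `window` keys (sliding window attention)."""
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    mask = k <= q                        # causal: no attending to the future
    if window is not None:
        mask &= (q - k) < window         # sliding window: bounded lookback
    return mask

def layer_windows(n_layers, window, ratio=3):
    """Hypothetical 3:1 stack: `ratio` sliding-window layers
    followed by one full-attention layer, repeated."""
    return [None if (i % (ratio + 1)) == ratio else window
            for i in range(n_layers)]

masks = [attention_mask(8, w) for w in layer_windows(4, window=4)]
# Attended key counts per layer: capped for SWA layers, quadratic-growth
# causal for the full layer, which is where the compute savings come from.
print([int(m.sum()) for m in masks])  # [26, 26, 26, 36]
```

At a 256K context the gap is dramatic: sliding-window layers attend to a fixed number of keys per query, so only one layer in four pays the full quadratic attention cost.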

Community Sentiment

Mixed

Positives

  • Step 3.5 Flash demonstrates impressive context efficiency, allowing full 256k context streams on a 128GB machine, which enhances its usability for complex tasks.
  • The model achieves good inference speeds on Macs, with reported figures of 36 tokens/s text generation (tg) and 300 tokens/s prompt processing (pp), making it practical for real-time applications.
  • Its 51% score on Terminal-Bench 2.0 indicates a solid capability for handling sophisticated, long-horizon tasks, which is crucial for advanced AI applications.
  • The Mixture of Experts architecture allows selective activation of parameters, optimizing performance while maintaining efficiency, which is a significant advancement in model design.

Concerns

  • The model has a tendency to hallucinate significantly, which raises concerns about its reliability for critical applications and necessitates cautious use.
  • Some users question the relevance of the 51% score on Terminal-Bench 2.0, suggesting it may not adequately reflect the model's stability in handling complex tasks.
  • While parameter counts are often highlighted, top models that lack support for local inference remain impractical for many users.


