github.com
March 22, 2026
6 min read
Summary
Flash-Moe is a pure C/Metal inference engine that runs Qwen3.5-397B-A17B, a 397-billion-parameter Mixture-of-Experts model with roughly 17 billion parameters active per token, on a MacBook Pro with 48GB of RAM at over 4.4 tokens per second. Since the 209GB of weights far exceed available memory, the engine streams them from SSD through a custom Metal compute pipeline, with no dependency on Python or other frameworks.
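The summary shows none of the repo's code, but the core trick it describes, pushing a 209GB checkpoint through 48GB of RAM, is commonly implemented with memory-mapped weights: map the whole file read-only and touch only the expert slices the router selects, so the OS pages in just the active parameters for each token. The C sketch below illustrates that general technique only; the file layout, the sizes, the filename, and every identifier (map_checkpoint, expert_base, load_expert) are hypothetical and not taken from Flash-Moe.

```c
/* Minimal sketch of mmap-based MoE expert streaming. NOT Flash-Moe's code:
 * layout, constants, and names below are invented for illustration.      */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define NUM_EXPERTS  128                  /* hypothetical experts per layer */
#define EXPERT_BYTES (28u * 1024 * 1024)  /* hypothetical bytes per expert  */

static uint8_t *weights;      /* whole checkpoint, mapped read-only */
static size_t   weights_len;

/* Map the checkpoint once; the OS pages data in from SSD on first touch. */
static int map_checkpoint(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return -1; }
    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); close(fd); return -1; }
    weights_len = (size_t)st.st_size;
    weights = mmap(NULL, weights_len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (weights == MAP_FAILED) { perror("mmap"); return -1; }
    /* Router-driven access is effectively random across the file. */
    madvise(weights, weights_len, MADV_RANDOM);
    return 0;
}

/* Hypothetical layout: experts stored contiguously, layer by layer. */
static const uint8_t *expert_base(int layer, int expert) {
    size_t off = ((size_t)layer * NUM_EXPERTS + expert) * EXPERT_BYTES;
    return weights + off;
}

/* Only the pages backing this slice are faulted in from SSD;
 * MADV_WILLNEED asks the kernel to prefetch them asynchronously. */
static const uint8_t *load_expert(int layer, int expert) {
    const uint8_t *p = expert_base(layer, expert);
    madvise((void *)p, EXPERT_BYTES, MADV_WILLNEED);
    return p;  /* hand this pointer to the Metal buffer upload */
}

int main(void) {
    if (map_checkpoint("weights.bin") != 0) return 1;  /* hypothetical file */
    /* e.g. the router picked experts 3 and 71 in layer 0 for this token */
    const uint8_t *e0 = load_expert(0, 3);
    const uint8_t *e1 = load_expert(0, 71);
    printf("expert slices at %p and %p\n", (const void *)e0, (const void *)e1);
    munmap(weights, weights_len);
    return 0;
}
```

Under this scheme only the router-selected experts' pages ever cross the SSD, and the OS page cache keeps recently used experts resident in RAM, which is what makes a 397B-parameter model plausible on a 48GB machine. Whether Flash-Moe actually uses mmap, explicit reads into a staging buffer, or Metal resource heaps is not stated in this summary.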
Community Sentiment
Mixed