
github.com
March 4, 2026
8 min read
61/100
Summary
nCPU is a CPU architecture that operates entirely on GPU, utilizing tensors for registers, memory, flags, and the program counter. All arithmetic operations, including addition, multiplication, bitwise operations, and shifts, are performed through trained neural networks, with specific methods like Kogge-Stone carry-lookahead for addition and learned byte-pair lookup tables for multiplication.
Key Takeaways
Community Sentiment
Positives
Concerns
Flash-MoE: Running a 397B Parameter Model on a Laptop
Mar 22, 2026

Parakeet.cpp – Parakeet ASR inference in pure C++ with Metal GPU acceleration
Feb 27, 2026
BarraCUDA Open-source CUDA compiler targeting AMD GPUs
Feb 17, 2026

We got 207 tok/s with Qwen3.5-27B on an RTX 3090
Apr 20, 2026

Run a 1T parameter model on a 32gb Mac by streaming tensors from NVMe
Mar 24, 2026