Themata.AI

#neural-networks #gpu-computing #developer-tools #ai-hardware

A CPU that runs entirely on GPU

GitHub - robertcprice/nCPU: nCPU: model-native and tensor-optimized CPU research runtimes with organized workloads, tools, and docs

github.com

March 4, 2026

8 min read

Summary

nCPU is a CPU that runs entirely on the GPU, representing registers, memory, flags, and the program counter as tensors. All arithmetic — addition, multiplication, bitwise operations, and shifts — is performed by trained neural networks, using a Kogge-Stone carry-lookahead structure for addition and learned byte-pair lookup tables for multiplication.
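To picture how a Kogge-Stone carry-lookahead adder maps onto tensor operations, here is a minimal sketch in NumPy. This is a classical bit-vector reimplementation for illustration only — nCPU's actual adder is a trained neural network, and none of these names come from the repo:

```python
import numpy as np

def kogge_stone_add(a: int, b: int, bits: int = 8) -> int:
    """Add two unsigned ints with a Kogge-Stone parallel-prefix carry
    network, operating on LSB-first bit vectors ("tensors")."""
    idx = np.arange(bits)
    ai = (a >> idx) & 1
    bi = (b >> idx) & 1
    g = ai & bi            # generate: this bit position produces a carry
    p = ai ^ bi            # propagate: this bit position passes a carry along
    G, P = g.copy(), p.copy()
    d = 1
    while d < bits:        # log2(bits) prefix-combine rounds
        # Shift toward the MSB: zero-fill G (no carries enter from below
        # bit 0), one-fill P (group propagate is unchanged for the lowest
        # d bits, whose groups already reach bit 0).
        Gs = np.concatenate([np.zeros(d, dtype=int), G[:-d]])
        Ps = np.concatenate([np.ones(d, dtype=int), P[:-d]])
        G = G | (P & Gs)
        P = P & Ps
        d *= 2
    # Carry into bit i is the group generate of bits 0..i-1.
    carry = np.concatenate([[0], G[:-1]])
    return int(((p ^ carry) << idx).sum())  # sum bits, weighted by 2^i
```

Every round operates on whole bit vectors at once, which is why the structure vectorizes well on a GPU: the carry chain resolves in log2(bits) parallel steps instead of a serial ripple.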

Key Takeaways

  • The nCPU operates entirely on GPU, utilizing tensors for registers, memory, flags, and the program counter, with all arithmetic performed through trained neural networks.
  • The nCPU achieves 100% accuracy on integer arithmetic, verified by 347 automated tests, and includes 23 models totaling approximately 135 MB.
  • Multiplication in the nCPU is 12 times faster than addition, contrasting with conventional CPUs where multiplication is typically slower than addition.
  • The nCPU benchmarks show execution times ranging from 136 to 262 microseconds per cycle, depending on the instruction mix, with models loading in 60 milliseconds.
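The byte-pair lookup-table approach to multiplication can be sketched as a precomputed 256×256 product table plus shifted partial-product sums. Again, this is a hand-rolled illustration under assumed names — nCPU's tables are learned rather than precomputed — but it shows why multiplication reduces to pure indexing, which GPUs handle very cheaply:

```python
import numpy as np

# 256x256 table of all byte-pair products. nCPU reportedly *learns*
# such tables; here we simply precompute them for illustration.
BYTE_PRODUCTS = (np.arange(256, dtype=np.uint32)[:, None]
                 * np.arange(256, dtype=np.uint32)[None, :])

def lut_mul16(a: int, b: int) -> int:
    """Multiply two 16-bit values using only four byte-pair table
    lookups plus shifts and adds (schoolbook partial products)."""
    a_lo, a_hi = a & 0xFF, a >> 8
    b_lo, b_hi = b & 0xFF, b >> 8
    p00 = int(BYTE_PRODUCTS[a_lo, b_lo])   # low  x low
    p01 = int(BYTE_PRODUCTS[a_lo, b_hi])   # low  x high
    p10 = int(BYTE_PRODUCTS[a_hi, b_lo])   # high x low
    p11 = int(BYTE_PRODUCTS[a_hi, b_hi])   # high x high
    return p00 + ((p01 + p10) << 8) + (p11 << 16)
```

A table lookup is a single gather, so the whole multiply costs four gathers and a few adds — one plausible reading of why multiplication could outpace the multi-round carry network used for addition, inverting the usual CPU cost ordering.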

Community Sentiment

Mixed

Positives

  • The exploration of running a CPU entirely on a GPU opens up intriguing possibilities for future computing architectures, potentially revolutionizing how we approach processing tasks.

Concerns

  • The skepticism about completely replacing CPUs with GPUs highlights fundamental differences in how each handles latency and processing, suggesting that a full transition may not be feasible.

Related Articles

GitHub - danveloper/flash-moe: Running a big model on a small laptop

Flash-MoE: Running a 397B Parameter Model on a Laptop

Mar 22, 2026

GitHub - Frikallo/parakeet.cpp: Ultra fast and portable Parakeet implementation for on-device inference in C++ using Axiom with MPS+Unified Memory and Cuda support

Parakeet.cpp – Parakeet ASR inference in pure C++ with Metal GPU acceleration

Feb 27, 2026

GitHub - Zaneham/BarraCUDA: Open-source CUDA compiler targeting AMD GPUs (and more in the future!). Compiles .cu to GFX11 machine code.

BarraCUDA Open-source CUDA compiler targeting AMD GPUs

Feb 17, 2026

GitHub - t8/hypura: Run models too big for your Mac's memory

Run a 1T parameter model on a 32gb Mac by streaming tensors from NVMe

Mar 24, 2026


Relevance Score

61/100

