Lucebox is an optimization hub for hand-tuned LLM inference on individual consumer hardware. It provides kernels, speculative decoding, and quantization tailored to each target, including the first megakernel for hybrid DeltaNet/Attention LLMs, which achieves 1.87 tokens per joule on a 2020 GPU.
github.com
5 min
4/21/2026
BarraCUDA is an open-source CUDA compiler for AMD GPUs that compiles .cu files directly to GFX11 machine code and emits ELF .hsaco binaries. Written in 15,000 lines of C99, the compiler has no LLVM dependency and aims to support additional architectures in the future.
github.com
6 min
2/18/2026