
github.com
April 20, 2026
Summary
Lucebox is an optimization hub for hand-tuned LLM inference on consumer hardware, with each optimization built for one specific device. It provides kernels, speculative decoding, and quantization tailored to each target. It includes the first megakernel for hybrid DeltaNet/Attention LLMs, which achieves 1.87 tokens per joule on a 2020 GPU.
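For readers unfamiliar with the efficiency metric, tokens per joule is decode throughput divided by average power draw, since one watt is one joule per second. The sketch below shows only that arithmetic. The function name and the input numbers are hypothetical illustrations, not Lucebox code or Lucebox measurements.

```python
def tokens_per_joule(tokens_per_second: float, avg_power_watts: float) -> float:
    """Energy efficiency of inference: tokens generated per joule consumed.

    1 W = 1 J/s, so (tokens/s) / (J/s) = tokens/J.
    """
    if avg_power_watts <= 0:
        raise ValueError("average power must be positive")
    return tokens_per_second / avg_power_watts


# Hypothetical example values, chosen only to make the arithmetic easy to follow.
example = tokens_per_joule(tokens_per_second=400.0, avg_power_watts=200.0)
print(example)  # 2.0
```

A higher value means more output for the same energy, so the metric rewards both faster decoding and lower power draw.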
