Themata.AI


© 2026 Themata.AI • All Rights Reserved


Filtering by tag: cuda
GitHub - Luce-Org/lucebox-hub: Lucebox optimization hub: hand-tuned LLM inference, built for specific consumer hardware.
#llms #developer-tools #optimization #cuda
Tool

We got 207 tok/s with Qwen3.5-27B on an RTX 3090

Lucebox is an optimization hub for hand-tuned LLM inference, built for specific consumer hardware. It provides kernels, speculative decoding, and quantization tailored to each target, including the first megakernel for hybrid DeltaNet/Attention LLMs, which achieves 1.87 tokens per joule on a 2020 GPU.

github.com

🔥🔥🔥🔥🔥

5 min

4/21/2026

BarraCUDA: Open-source CUDA compiler targeting AMD GPUs

BarraCUDA is an open-source CUDA compiler designed for AMD GPUs, capable of compiling .cu files directly to GFX11 machine code and generating ELF .hsaco binaries. The compiler, written in 15,000 lines of C99, has no LLVM dependency and aims to support additional architectures in the future.

github.com

🔥🔥🔥🔥🔥

6 min

2/18/2026
