Themata.AI

#llms #qwen #gpu-computing #ai-experiments

RTX 5080 + RTX 3090 Setup: 80+ Tok/s on Qwen 3.6 27B Q8

imil.net

June 13, 2026

5 min read

Summary

An RTX 5080 paired with an RTX 3090 achieves over 80 tokens per second on the Qwen 3.6 27B Q8 model. Adding the RTX 3090, with its 24GB of memory, significantly improves performance: initial speeds are 30 tokens per second, rising to 50-60 tokens per second with MTP (multi-token prediction).
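A back-of-the-envelope estimate may help show why the second card's memory matters here. The sketch below is not from the original article. It assumes exactly 27 billion parameters, 8.5 bits per weight (the storage cost of llama.cpp's Q8_0 format, which the article's "Q8" may or may not refer to), and 16GB on the RTX 5080. It counts weights only and ignores the KV cache and activations.

```python
# Rough VRAM estimate for the quantized weights alone.
# Assumptions (not from the article): 27e9 parameters, 8.5 bits per weight
# as in llama.cpp's Q8_0 (32 int8 values plus one fp16 scale per block),
# and no allowance for KV cache, activations, or driver overhead.

def weight_gib(params: float, bits_per_weight: float) -> float:
    """Return the size of the quantized weights in GiB."""
    return params * bits_per_weight / 8 / 2**30

weights = weight_gib(27e9, 8.5)
vram_gib = {"RTX 5080": 16, "RTX 3090": 24}  # nominal memory per card

print(f"weights: {weights:.1f} GiB")
for name, gib in vram_gib.items():
    print(f"fits on {name} alone: {weights < gib}")
print(f"fits on both combined: {weights < sum(vram_gib.values())}")
```

Under these assumptions the weights come to roughly 27 GiB, which is more than either card holds on its own but less than the two combined.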

Key Takeaways

  • The RTX 5080 and RTX 3090 setup achieves over 80 tokens per second on the Qwen 3.6 27B Q8 model.
  • An Asus Prime X570-Pro motherboard is used to split the PCIe lanes between the two cards, which is required for both GPUs to work effectively.
  • BIOS settings must be adjusted, including disabling CSM and enabling Above 4G Decoding and ReSize BAR Support, to ensure both GPUs function properly.
  • The NVIDIA driver installation process is complex, especially when using different GPU models, and requires specific configurations to load correctly.
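After the BIOS and driver changes listed above, one way to confirm that both cards are actually visible is to query `nvidia-smi`. The following is a minimal sketch, not the article's procedure: it assumes `nvidia-smi` is on the PATH, and the helper names `parse_gpus` and `query_gpus` are made up for illustration. The query fields used (`index`, `name`, `memory.total`, `pcie.link.width.current`) are standard `--query-gpu` properties.

```python
import subprocess

# Fields requested from nvidia-smi; pcie.link.width.current shows how many
# PCIe lanes each card is currently using after the motherboard splits them.
QUERY = "index,name,memory.total,pcie.link.width.current"

def parse_gpus(csv_text: str) -> list[dict]:
    """Parse `nvidia-smi --format=csv,noheader` output into one dict per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, memory, width = [field.strip() for field in line.split(",")]
        gpus.append(
            {"index": int(index), "name": name, "memory": memory, "pcie_width": width}
        )
    return gpus

def query_gpus() -> list[dict]:
    """Run nvidia-smi and return the parsed GPU list (hypothetical helper)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpus(result.stdout)

if __name__ == "__main__":
    try:
        gpus = query_gpus()
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"could not query nvidia-smi: {exc}")
    else:
        for gpu in gpus:
            print(gpu)
        if len(gpus) < 2:
            print("fewer than two GPUs visible; check BIOS and driver setup")
```

If only one card shows up, the BIOS options and driver configuration mentioned above are the first things to revisit.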

Community Sentiment

Mixed

Positives

  • The RTX 5080 and 3090 combination achieving 80 tokens per second demonstrates impressive performance, showcasing the potential for high-throughput local inference.
  • Users are finding more applications for Qwen 3.6 27B, indicating its versatility in various use cases, especially when answers are already present in the context.
  • Local models like Qwen 3.6 tend to fail in straightforward ways, which makes issues easier to identify than with more complex models like Claude.

Concerns

  • The high cost of hardware and electricity makes local inference setups non-competitive with cloud options, limiting accessibility for many users.
  • Instability in software repositories such as llama.cpp makes it harder to take full advantage of powerful hardware setups.
  • Users express frustration over the lack of clarity regarding model versions and performance consistency when using hosted models like Claude.
