RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8

imil.net

June 13, 2026

5 min read

Summary

An RTX 5080 paired with an RTX 3090 reaches over 80 tokens per second on Qwen 3.6 27B at Q8 quantization. The RTX 3090 and its 24GB of memory make a significant difference: generation starts at about 30 tokens per second and rises to 50-60 tokens per second with MTP.
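A rough back-of-the-envelope check shows why the second card matters. The sketch below assumes llama.cpp's Q8_0 layout of about 8.5 bits per weight (32 int8 values plus one fp16 scale per block) and the RTX 5080's 16GB of VRAM; neither figure is stated in the summary, and the function names are illustrative. It ignores the KV cache and runtime overhead, so real requirements are higher.

```python
# Rough VRAM estimate for quantized weights split across GPUs.
# Assumptions (not from the article): Q8_0 is ~8.5 bits per weight,
# and the RTX 5080 has 16 GB of VRAM. KV cache and overhead are ignored.

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(required_gb: float, vram_gb: list[float]) -> bool:
    """True if the weights fit in the combined VRAM of the listed GPUs."""
    return required_gb <= sum(vram_gb)


weights = weight_gb(27, 8.5)              # roughly 28.7 GB for 27B at Q8_0
fits_3090_alone = fits(weights, [24])     # a single 24 GB card
fits_both = fits(weights, [16, 24])       # 5080 + 3090 combined
print(f"{weights:.1f} GB, 3090 alone: {fits_3090_alone}, both: {fits_both}")
```

Under these assumptions the weights alone exceed either card's memory, which is consistent with the model needing to be split across both GPUs.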

Key Takeaways

The RTX 5080 and RTX 3090 setup achieves over 80 tokens per second on the Qwen 3.6 27B Q8 model.
The build relies on an Asus Prime X570-Pro motherboard, which splits the PCIe lanes so that both GPUs can be used effectively.
BIOS settings must be adjusted, including disabling CSM and enabling Above 4G Decoding and ReSize BAR Support, to ensure both GPUs function properly.
The NVIDIA driver installation process is complex, especially when using different GPU models, and requires specific configurations to load correctly.
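After the BIOS and driver changes, one way to confirm that both cards loaded is to query `nvidia-smi` and inspect what it reports. The following is a minimal sketch, not the article's procedure: it assumes `nvidia-smi` is on the PATH, and the helper names `parse_gpu_csv` and `query_gpus` are hypothetical.

```python
import subprocess

# Fields understood by `nvidia-smi --query-gpu`.
QUERY = "name,memory.total,pcie.link.width.current"


def parse_gpu_csv(text: str) -> list[dict]:
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        # Split from the right so a comma inside a GPU name cannot break parsing.
        name, mem, width = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({"name": name, "memory_mib": int(mem), "pcie_width": int(width)})
    return gpus


def query_gpus() -> list[dict]:
    """Run nvidia-smi and return the parsed GPU list (requires the NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` on the machine should return two entries if both cards initialized. The reported PCIe link width shows how the lanes were actually split; the current width can drop while a card is idle, so it is best read under load.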

Community Sentiment

Mixed

Positives

The RTX 5080 and RTX 3090 combination reaching 80 tokens per second is impressive and shows the potential of high-throughput local inference.
Users are finding more applications for Qwen 3.6 27B, indicating its versatility in various use cases, especially when answers are already present in the context.
Local models like Qwen 3.6 tend to fail in straightforward ways, so issues are easier to identify than with more complex models like Claude.

Concerns

The high cost of hardware and electricity makes local inference setups non-competitive with cloud options, limiting accessibility for many users.
Instability in software repositories such as llama.cpp makes it harder to get full use out of powerful hardware setups.
Users express frustration over the lack of clarity regarding model versions and performance consistency when using hosted models like Claude.