Themata.AI

#llms #amd #inference #ai-performance

Performance per dollar is getting faster and cheaper


wafer.ai

July 3, 2026

5 min read

🔥🔥🔥🔥🔥

59/100

Summary

Performance per dollar for AI inference is improving. GLM5.2 served on AMD MI355X reaches 2626 tokens per second per node and 213 tokens per second in a single stream, at less than half the cost of NVIDIA Blackwell. Demand for inference is growing faster than supply as new frontier models are released frequently.

Key Takeaways

  • AMD MI355X serves inference at less than half the cost of NVIDIA Blackwell while sustaining a throughput of 2626 tok/s/node.
  • The demand for AI inference is increasing rapidly, outpacing supply and causing NVIDIA GPU prices to rise.
  • AMD's MI350 series competes with NVIDIA at the silicon level but lacks the same level of software support, which can delay performance optimization for new models.
  • Wafer achieved 213 tok/s in a single stream on GLM5.2 using AMD MI355X, demonstrating better performance per dollar even though the result does not top the leaderboard.
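
The cost claim in the takeaways reduces to simple arithmetic: cost per million tokens is a node's hourly price divided by the tokens it generates in an hour. Below is a minimal sketch in Python. The 2626 tok/s/node figure comes from the summary; the hourly price is a hypothetical placeholder, since the summary does not quote one, and the function names are illustrative only.

```python
def cost_per_million_tokens(tokens_per_second: float, node_price_per_hour: float) -> float:
    """Dollars spent to generate one million tokens on one node."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return node_price_per_hour / tokens_per_hour * 1_000_000


def tokens_per_dollar(tokens_per_second: float, node_price_per_hour: float) -> float:
    """Tokens generated per dollar of node time (performance per dollar)."""
    if node_price_per_hour <= 0:
        raise ValueError("node_price_per_hour must be positive")
    return tokens_per_second * 3600 / node_price_per_hour


# Throughput reported in the summary for GLM5.2 on AMD MI355X.
MI355X_TOK_S_PER_NODE = 2626
# HYPOTHETICAL hourly node price, used only to make the arithmetic concrete.
HYPOTHETICAL_NODE_PRICE_PER_HOUR = 20.0

if __name__ == "__main__":
    cost = cost_per_million_tokens(MI355X_TOK_S_PER_NODE, HYPOTHETICAL_NODE_PRICE_PER_HOUR)
    print(f"${cost:.2f} per million tokens at the placeholder price")
```

Under these definitions, halving the hourly price at equal throughput halves the cost per million tokens, so a 2x cost advantage can come from price, throughput, or a mix of both.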

Community Sentiment

Mixed

Positives

  • There's a growing interest in AMD's performance per watt, which could shake up the data center landscape, especially for companies outside the US where energy costs are a major factor.
  • The mention of companies like Meta and OpenAI using AMD is a promising sign that they might finally be gaining traction against Nvidia's dominance.

Concerns

  • Quantization to FP4 is often a disaster for accuracy, with many models losing their edge and becoming 'lobotomized', which is not what frontier-level AI needs.
  • Skepticism runs high about AMD's ability to compete; many users are tired of being disappointed and doubt they'll finally see viable competition against Nvidia.
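
To make the FP4 concern concrete, here is a rough sketch of why 4-bit floating point loses precision: the common E2M1 layout can represent only eight magnitudes, so every value is rounded to one of them. The per-tensor absmax scaling and the helper names below are assumptions chosen for brevity; production schemes generally use finer-grained scales, and the summary does not say which precision the benchmarked model was served at.

```python
# Magnitudes representable in FP4 E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit).
FP4_E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_fp4(values):
    """Round values to the nearest FP4 E2M1 value after absmax scaling.

    Returns the dequantized values so they can be compared with the inputs.
    Ties resolve to the smaller magnitude, a simplification of hardware rounding.
    """
    absmax = max((abs(v) for v in values), default=0.0)
    if absmax == 0.0:
        return [0.0 for _ in values]
    # Scale so the largest input magnitude maps to 6.0, the largest FP4 value.
    scale = absmax / FP4_E2M1_MAGNITUDES[-1]
    out = []
    for v in values:
        target = abs(v) / scale
        nearest = min(FP4_E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        out.append((nearest if v >= 0 else -nearest) * scale)
    return out


def max_abs_error(values):
    """Largest absolute difference between inputs and their FP4 round trip."""
    return max((abs(a - b) for a, b in zip(values, quantize_fp4(values))), default=0.0)


if __name__ == "__main__":
    weights = [6.0, 4.9, -1.2, 0.3]
    print(quantize_fp4(weights))
    print(max_abs_error(weights))
```

The gaps between representable values widen toward the top of the range (4.0 to 6.0), so large-magnitude values absorb the largest rounding error. Whether that error translates into a visible quality drop depends on the model and the scaling scheme.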
