
blog.kog.ai
May 29, 2026
18 min read
Summary
Kog AI has launched a tech preview of the Kog Inference Engine (KIE), which reaches 3,000 output tokens per second on 8× AMD MI300X GPUs and 2,100 output tokens per second on 8× NVIDIA H200 GPUs, using FP16 and no speculative decoding. The preview currently supports a 2B-parameter model; Kog AI plans to add support for large third-party MoE models at similar speeds.
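To put the two throughput figures in perspective, here is a minimal Python sketch that converts them into average time per output token and compares the two platforms. It assumes the quoted rates describe a single decode stream, which the summary does not state; if they are aggregate rates across concurrent requests, the per-token times below do not apply to any one request. The names `throughput_tok_s` and `ms_per_token` are illustrative and are not part of KIE.

```python
# Figures quoted in the summary: output tokens per second, FP16,
# no speculative decoding.
throughput_tok_s = {
    "8x AMD MI300X": 3000,
    "8x NVIDIA H200": 2100,
}


def ms_per_token(tokens_per_second: float) -> float:
    """Average milliseconds per output token.

    Assumption: the rate is for one decode stream, not an aggregate
    over many concurrent requests.
    """
    return 1000.0 / tokens_per_second


per_token_ms = {name: ms_per_token(tps) for name, tps in throughput_tok_s.items()}

# How much faster the MI300X figure is relative to the H200 figure.
speed_ratio = throughput_tok_s["8x AMD MI300X"] / throughput_tok_s["8x NVIDIA H200"]

for name, ms in per_token_ms.items():
    print(f"{name}: {ms:.3f} ms per token")
print(f"MI300X / H200 throughput ratio: {speed_ratio:.2f}x")
```

Under that single-stream assumption, both configurations spend well under one millisecond per output token, and the MI300X figure is roughly 1.4 times the H200 figure.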
Key Takeaways
Community Sentiment
Positives
Concerns