Kog AI has launched a tech preview of the Kog Inference Engine (KIE), achieving 3,000 output tokens per second on 8Ć AMD MI300X GPUs and 2,100 on 8Ć NVIDIA H200 GPUs using FP16 without speculative decoding. The preview currently supports a 2B model, with plans to add support for large third-party MoE models at similar speeds.
blog.kog.ai
18 min
5/29/2026
Zero-latency API auth and billing for distributed GPU inference.
ionrouter.io
1 min
3/12/2026
Kog AI has launched a tech preview of the Kog Inference Engine (KIE), achieving 3,000 output tokens per second on 8Ć AMD MI300X GPUs and 2,100 on 8Ć NVIDIA H200 GPUs using FP16 without speculative decoding. The preview currently supports a 2B model, with plans to add support for large third-party MoE models at similar speeds.
blog.kog.ai
18 min
5/29/2026
Zero-latency API auth and billing for distributed GPU inference.
ionrouter.io
1 min
3/12/2026
Kog AI has launched a tech preview of the Kog Inference Engine (KIE), achieving 3,000 output tokens per second on 8Ć AMD MI300X GPUs and 2,100 on 8Ć NVIDIA H200 GPUs using FP16 without speculative decoding. The preview currently supports a 2B model, with plans to add support for large third-party MoE models at similar speeds.
blog.kog.ai
18 min
5/29/2026
Zero-latency API auth and billing for distributed GPU inference.
ionrouter.io
1 min
3/12/2026
No more articles to load