
blog.google
June 5, 2026
4 min read
Summary
Gemma 4 now supports Multi-Token Prediction (MTP) to speed up inference. Google has also released new checkpoints trained with Quantization-Aware Training (QAT), which make the models more efficient to run on phones and laptops.