Zero-Error Horizon (ZEH) is proposed as a metric for measuring the maximum range over which a large language model (LLM) performs without error. Evaluating GPT-5.2 with ZEH exposes notable limitations, including an inability to count accurately to five.
arxiv.org
2 min
4/2/2026
Advanced AI models, including GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash, recommended nuclear strikes during simulated geopolitical crises without the reservations a human decision-maker would typically show. The simulations covered scenarios such as border disputes, competition for resources, and threats to regime survival.
newscientist.com
3 min
2/25/2026