A prediction indicates that a new frontier open-source LLM will be released on December 3, 2026. The analysis compares the performance gap between open-weights and closed-source LLMs by examining historical benchmarks.
blog.doubleword.ai
2 min
1d ago
GLM-5.2 has become the leading open-weights model on the Artificial Analysis Intelligence Index, scoring 51. It matches the size of GLM-5.1, with 744 billion total parameters and 40 billion active parameters, but surpasses it by 11 points on the Intelligence Index v4.1.
artificialanalysis.ai
3 min
6/17/2026
FrontierCode is a new benchmark designed to evaluate the quality of AI-generated code in production environments. It aims to raise standards beyond mere correctness to assess models' ability to produce high-quality code.
cognition.ai
13 min
6/8/2026
CLAUDE OPUS 4.6 IS NERFED. BridgeBench just proved it. Last week Claude Opus 4.6 ranked #2 on the Hallucination benchmark with an accuracy of 83.3%. Today Claude Opus 4.6 was retested and it fell to #10 on the leaderboard with an accuracy of only 68.3%. A 98% increase in hallucination. bridgebench.ai just confirmed that Claude Opus 4.6 has reduced reasoning levels and is nerfed.
twitter.com
1 min
4/12/2026