Themata.AI


© 2026 Themata.AI • All Rights Reserved

#llms #code-generation #ai-performance #developer-tools

Are LLMs not getting better?

Entropic Thoughts

entropicthoughts.com

March 12, 2026

3 min read

🔥🔥🔥🔥🔥

56/100

Summary

LLMs show a significant drop in measured performance when the success criterion shifts from "passes all tests" to "would get approved by the maintainer." The task-length horizon at which LLMs reach a 50% success rate falls from 50 minutes to 8 minutes under the stricter criterion.

Key Takeaways

  • LLMs have not shown improvement in programming abilities for over a year, as indicated by constant merge rates in performance data.
  • The Brier score analysis suggests that models predicting constant merge rates are more accurate than those suggesting a linear growth trend.
  • There is no clear evidence of improved capabilities in newer models from Anthropic and Google, as merge rates have not been measured as rigorously since the METR study.
  • Claims of increased LLM capabilities in recent months lack substantiation, echoing similar claims made throughout 2025 that were later proven false.
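The Brier-score comparison in the takeaways can be sketched concretely. The score is the mean squared error between predicted probabilities and 0/1 outcomes, so a lower score means a better-calibrated model. The data and the two predictor models below are hypothetical stand-ins, not the article's actual figures:

```python
# Hedged sketch: scoring a constant-merge-rate model against a
# linear-growth model with Brier scores, on hypothetical data.

def brier_score(predictions, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(outcomes)

# Hypothetical per-PR outcomes over a year: 1 = merged, 0 = rejected.
outcomes = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1]
n = len(outcomes)

# Model A: merge rate is constant at ~0.4 the whole time.
constant_preds = [0.4] * n

# Model B: merge rate improves linearly from 0.2 to 0.7 over the period.
linear_preds = [0.2 + 0.5 * i / (n - 1) for i in range(n)]

print("constant:", brier_score(constant_preds, outcomes))
print("linear:  ", brier_score(linear_preds, outcomes))
```

If the constant model's score comes out lower, that supports the article's claim that a "no improvement" model fits the merge-rate data better than a growth trend.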

Community Sentiment

Mixed

Positives

  • There is a clear improvement in AI models from Sonnet 3.7 to Opus 4.0 to Sonnet 4.5, indicating progress in AI capabilities despite some skepticism.
  • The introduction of tools like Claude Code represents a significant advancement in AI applications for coding, driving mainstream adoption.

Concerns

  • Many users report that AI tools still require significant oversight and often produce inaccurate results, indicating limited improvement in reliability.
  • Perceived improvements in newer models may be largely a placebo effect, as users struggle to differentiate between versions on practical tasks.
  • Overall, there is a consensus that coding abilities of AI models have not noticeably improved over the past year, raising concerns about stagnation in development.