Themata.AI


© 2026 Themata.AI • All Rights Reserved

#llms #code-generation #ai-performance #developer-tools

Are LLMs not getting better?

Entropic Thoughts

entropicthoughts.com

March 12, 2026

3 min read

🔥🔥🔥🔥🔥

56/100

Summary

LLMs show a significant drop in measured performance when the success criterion shifts from "passes all tests" to "would get approved by the maintainer." The task-length horizon at which LLMs reach a 50% success rate falls from 50 minutes to 8 minutes under the stricter criterion.

Key Takeaways

  • LLMs have not shown improvement in programming abilities for over a year, as indicated by constant merge rates in performance data.
  • The Brier score analysis suggests that models predicting constant merge rates are more accurate than those suggesting a linear growth trend.
  • There is no clear evidence of improved capabilities in newer models from Anthropic and Google, as merge rates have not been measured as rigorously since the METR study.
  • Claims of increased LLM capabilities in recent months lack substantiation, echoing similar claims made throughout 2025 that were later proven false.
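The Brier-score comparison in the takeaways can be sketched concretely. The score is the mean squared error between predicted probabilities and 0/1 outcomes, so a lower score means a better-calibrated model. The data and the two predictor models below are hypothetical stand-ins, not the article's actual figures:

```python
# Hedged sketch: scoring a constant-merge-rate model against a
# linear-growth model with Brier scores, on hypothetical data.

def brier_score(predictions, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(outcomes)

# Hypothetical per-PR outcomes over a year: 1 = merged, 0 = rejected.
outcomes = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1]
n = len(outcomes)

# Model A: merge rate is constant at ~0.4 the whole time.
constant_preds = [0.4] * n

# Model B: merge rate improves linearly from 0.2 to 0.7 over the period.
linear_preds = [0.2 + 0.5 * i / (n - 1) for i in range(n)]

print("constant:", brier_score(constant_preds, outcomes))
print("linear:  ", brier_score(linear_preds, outcomes))
```

If the constant model's score comes out lower, that supports the article's claim that a "no improvement" model fits the merge-rate data better than a growth trend.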

Community Sentiment

Mixed

Positives

  • There is a clear improvement in AI models from Sonnet 3.7 to Opus 4.0 to Sonnet 4.5, indicating progress in AI capabilities despite some skepticism.
  • The introduction of tools like Claude Code represents a significant advancement in AI applications for coding, driving mainstream adoption.

Concerns

  • Many users report that AI tools still require significant oversight and often produce inaccurate results, indicating limited improvement in reliability.
  • Perceived improvements in newer models may be largely a placebo effect, as users struggle to differentiate between versions on practical tasks.
  • Overall, there is a consensus that coding abilities of AI models have not noticeably improved over the past year, raising concerns about stagnation in development.