
metr.org
March 11, 2026
Summary
Approximately 50% of the test-passing SWE-bench Verified pull requests that AI agents produced between mid-2024 and late 2025 would not be merged into the main branch by repository maintainers. The findings do not necessarily indicate a fundamental capability limitation, because the agents worked without iterative feedback.