
openai.com
April 26, 2026
Summary
SWE-bench Verified is becoming less reliable for measuring frontier coding capabilities due to contamination. SWE-bench Pro is recommended as a more accurate alternative for assessing models on autonomous software engineering tasks.