SWE-bench Verified is becoming less reliable for measuring frontier coding capabilities due to contamination. SWE-bench Pro is recommended as a more accurate alternative for assessing models on autonomous software engineering tasks.
openai.com
9 min
4/26/2026