DeepSWE is a long-horizon software engineering benchmark designed to evaluate coding agents on original engineering tasks. It features contamination-free tasks, high diversity across 91 repositories in five programming languages, and real-world applicability.
deepswe.datacurve.ai
20 min
5/26/2026