
rdi.berkeley.edu
April 11, 2026
18 min read
Summary
Automated scanning of AI benchmarks reveals that top models often earn high scores that do not accurately reflect their real capabilities. Because the AI industry relies heavily on these benchmarks, the reported performance of models is frequently misrepresented.