A new benchmark evaluates outcome-driven constraint violations in autonomous AI agents to enhance safety and alignment with human values. This benchmark addresses limitations of existing safety assessments that mainly focus on harmful actions.
arxiv.org
2 min
2/10/2026
A new benchmark evaluates outcome-driven constraint violations in autonomous AI agents to enhance safety and alignment with human values. This benchmark addresses limitations of existing safety assessments that mainly focus on harmful actions.
arxiv.org
2 min
2/10/2026
A new benchmark evaluates outcome-driven constraint violations in autonomous AI agents to enhance safety and alignment with human values. This benchmark addresses limitations of existing safety assessments that mainly focus on harmful actions.
arxiv.org
2 min
2/10/2026
No more articles to load