
arxiv.org
May 2, 2026
Summary
Conversational large language models are fine-tuned for both instruction-following and safety, so they comply with benign requests while refusing harmful ones. The research indicates that this refusal behavior is mediated by a single direction in the models' internal activations.
