
lesswrong.com
February 7, 2026
5 min read
44/100
Summary
Prompt injection in Google Translate can reveal the underlying instruction-following language model. Responses indicate that the model lacks strong boundaries between processing content and following instructions.
Key Takeaways