Themata.AI


Tags: llms, copyright-issues, openai, ai-ethics

AIs can generate near-verbatim copies of novels from training data

arstechnica.com

February 23, 2026

1 min read

🔥🔥🔥🔥🔥

48/100

Summary

Top AI models can generate near-verbatim copies of bestselling novels, indicating that they memorize more of their training data than previously understood. This memorization raises legal concerns around copyright and carries significant implications for AI developers facing infringement claims.

Key Takeaways

  • Top AI models can generate near-verbatim copies of bestselling novels, challenging claims that they do not store copyrighted works.
  • Recent studies indicate that large language models memorize more of their training data than previously understood.
  • AI and legal experts warn that this memorization could impact ongoing copyright lawsuits against AI companies.
  • A study found that models like Gemini 2.5 and Grok 3 can reproduce significant portions of texts from well-known books with high accuracy.

Community Sentiment

Negative

Positives

  • The ability of AI models to generate near-verbatim text highlights their advanced capabilities in language understanding and generation, raising questions about the nature of creativity and authorship.

Concerns

  • The fact that some models must be jailbroken before they will reproduce copyrighted text points to vulnerabilities in AI safety guardrails that could enable copyright infringement, raising significant legal and ethical concerns.
  • Commenters describing AI as "plagiarism software" underscore the ongoing debate over the ethics of training on copyrighted material.