Themata.AI

#llms #victorian-literature #ethical-ai #language-models

Mr. Chatterbox is a (weak) Victorian-era ethically trained model you can run on your own computer

simonwillison.net

March 31, 2026

4 min read

🔥🔥🔥🔥🔥

48/100

Summary

Mr. Chatterbox is a language model trained on over 28,000 Victorian-era British texts published between 1837 and 1899. The texts come from a dataset provided by the British Library, and the model is small enough to run locally on a personal computer.

Key Takeaways

  • Mr. Chatterbox is a language model trained on over 28,000 Victorian-era British texts published between 1837 and 1899, using a dataset from the British Library.
  • The model has approximately 340 million parameters and is trained entirely on historical data, with no inputs from after 1899.
  • Mr. Chatterbox is relatively small at 2.05GB, and its responses are described as reading more like the output of a Markov chain than the replies of a sophisticated conversational partner.
  • The training corpus consists of about 2.93 billion tokens, significantly fewer than the amount usually suggested for training an effective language model.
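
To put the last two figures side by side, here is a rough back-of-the-envelope sketch. It assumes the commonly cited rule of thumb of roughly 20 training tokens per parameter (the "Chinchilla" heuristic); that ratio is an assumption brought in for illustration and is not stated in this summary. The parameter and token counts are the approximate values quoted above.

```python
# Figures quoted in the takeaways above (both approximate).
PARAMS = 340_000_000           # ~340 million parameters
CORPUS_TOKENS = 2_930_000_000  # ~2.93 billion training tokens

# ASSUMPTION: ~20 tokens per parameter, a widely used rule of thumb.
# It is not taken from this summary.
TOKENS_PER_PARAM_HEURISTIC = 20


def suggested_tokens(params: int, tokens_per_param: int = TOKENS_PER_PARAM_HEURISTIC) -> int:
    """Token budget the heuristic would suggest for a model of this size."""
    return params * tokens_per_param


def tokens_per_parameter(tokens: int, params: int) -> float:
    """Ratio of training tokens to parameters actually available."""
    return tokens / params


suggested = suggested_tokens(PARAMS)
actual_ratio = tokens_per_parameter(CORPUS_TOKENS, PARAMS)
shortfall = suggested / CORPUS_TOKENS

print(f"Suggested by heuristic: {suggested / 1e9:.1f}B tokens")
print(f"Available in corpus:    {CORPUS_TOKENS / 1e9:.2f}B tokens")
print(f"Tokens per parameter:   {actual_ratio:.1f}")
print(f"Shortfall factor:       {shortfall:.2f}x")
```

Under that assumption, the corpus supplies fewer than half the tokens the heuristic would call for, which is consistent with the weak output described above.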

Community Sentiment

Mixed

Positives

  • The model's ethical training approach ensures compliance with copyright laws, which is crucial for responsible AI development and deployment.
  • Using this model could help demystify LLMs for the general public, providing a clearer understanding of how text prediction works.

Concerns

  • At 340 million parameters, the model may be too small to generate coherent Victorian speech, raising doubts about how well it captures the intended style.
  • Limiting training data to works published no later than 1899 may still overlook significant copyright issues, potentially affecting the model's applicability and reliability.
