Themata.AI
Themata.AI

Popular tags:

#developer-tools#ai-agents#llms#claude#code-generation#ai-ethics#openai#ai-safety#anthropic#open-source

AI is changing the world. Don't stay behind. Clear summaries, community insight, delivered without the noise. Subscribe to never miss a beat.

Β© 2026 Themata.AI β€’ All Rights Reserved

Privacy

|

Cookies

|

Contact
πŸ•’ LatestπŸ”₯ Top

Filtering by tag:

ai-safetyClear
NewsOpinionResearchToolClear
Document Poisoning in RAG Systems: How Attackers Corrupt Your AI’s Sources
rag-systemsdocument-poisoningai-safetyllms
Research

Document poisoning in RAG systems: How attackers corrupt AI's sources

Three fabricated documents were injected into a ChromaDB knowledge base, resulting in a RAG system inaccurately reporting a company's Q4 2025 revenue as $8.3M, a 47% decrease year-over-year, along with a planned workforce reduction. This process was completed in under three minutes on a MacBook Pro without GPU support or cloud services.

aminrj.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

13 min

3/12/2026

LLMs can unmask pseudonymous users at scale with surprising accuracy

AI techniques can analyze burner accounts on social media to accurately identify pseudonymous users. Experiments show a higher success rate in correlating individuals with accounts across multiple platforms compared to traditional deanonymization methods.

arstechnica.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

2 min

3/4/2026

Ed Zitron loses his mind annotating an AI doomer macro memo

Se compartiΓ³ con Dropbox.

dropbox.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

1 min

2/25/2026

Anthropic believes RSI (recursive self improvement) could arrive β€œas soon as early 2027”

Anthropic's Frontier Safety Roadmap emphasizes the need for improved security measures to prevent theft and manipulation of AI models. The roadmap also focuses on implementing safeguards to prevent dangerous uses of AI and ensuring model alignment to avoid autonomous harm.

anthropic.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

9 min

2/24/2026

Gemini 3.1 Pro

Gemini 3.1 Pro is the latest model in the Gemini 3 series, featuring advanced multimodal reasoning capabilities. Model cards provide essential information about the models, including limitations, mitigation strategies, and safety performance, and may be updated to reflect improvements.

deepmind.google

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

6 min

2/19/2026

Measuring AI agent autonomy in practice

AI agents are currently deployed in diverse contexts, ranging from email triage to cyber espionage. An analysis of millions of human-agent interactions across Claude Code and a public API aims to measure the autonomy of AI agents in real-world usage.

anthropic.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

28 min

2/19/2026

Frontier AI agents violate ethical constraints 30–50% of time, pressured by KPIs

A new benchmark evaluates outcome-driven constraint violations in autonomous AI agents to enhance safety and alignment with human values. This benchmark addresses limitations of existing safety assessments that mainly focus on harmful actions.

arxiv.org

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

2 min

2/10/2026

Google Translate apparently vulnerable to prompt injection

Prompt injection in Google Translate can reveal the underlying instruction-following language model. Responses indicate that the model lacks strong boundaries between processing content and following instructions.

lesswrong.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

5 min

2/7/2026

In a study, AI model OpenScholar synthesizes scientific research and cites sources as accurately as human experts

OpenScholar synthesizes scientific research and accurately cites sources, achieving a level of accuracy comparable to human experts. In a study, researchers found that the AI model significantly reduces the issue of hallucination seen in other models like GPT-4o, which fabricated 78-90% of its outputs.

washington.edu

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

4 min

2/6/2026

Evaluating and mitigating the growing risk of LLM-discovered 0-days

Claude Opus 4.6 features significant advancements in AI models' cybersecurity capabilities. Experts believe the current moment is critical for accelerating the defensive use of AI in response to the increasing risk of LLM-discovered zero-day vulnerabilities.

red.anthropic.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

10 min

2/5/2026

Document poisoning in RAG systems: How attackers corrupt AI's sources

Three fabricated documents were injected into a ChromaDB knowledge base, resulting in a RAG system inaccurately reporting a company's Q4 2025 revenue as $8.3M, a 47% decrease year-over-year, along with a planned workforce reduction. This process was completed in under three minutes on a MacBook Pro without GPU support or cloud services.

aminrj.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

13 min

3/12/2026

Ed Zitron loses his mind annotating an AI doomer macro memo

Se compartiΓ³ con Dropbox.

dropbox.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

1 min

2/25/2026

Gemini 3.1 Pro

Gemini 3.1 Pro is the latest model in the Gemini 3 series, featuring advanced multimodal reasoning capabilities. Model cards provide essential information about the models, including limitations, mitigation strategies, and safety performance, and may be updated to reflect improvements.

deepmind.google

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

6 min

2/19/2026

Frontier AI agents violate ethical constraints 30–50% of time, pressured by KPIs

A new benchmark evaluates outcome-driven constraint violations in autonomous AI agents to enhance safety and alignment with human values. This benchmark addresses limitations of existing safety assessments that mainly focus on harmful actions.

arxiv.org

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

2 min

2/10/2026

In a study, AI model OpenScholar synthesizes scientific research and cites sources as accurately as human experts

OpenScholar synthesizes scientific research and accurately cites sources, achieving a level of accuracy comparable to human experts. In a study, researchers found that the AI model significantly reduces the issue of hallucination seen in other models like GPT-4o, which fabricated 78-90% of its outputs.

washington.edu

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

4 min

2/6/2026

LLMs can unmask pseudonymous users at scale with surprising accuracy

AI techniques can analyze burner accounts on social media to accurately identify pseudonymous users. Experiments show a higher success rate in correlating individuals with accounts across multiple platforms compared to traditional deanonymization methods.

arstechnica.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

2 min

3/4/2026

Anthropic believes RSI (recursive self improvement) could arrive β€œas soon as early 2027”

Anthropic's Frontier Safety Roadmap emphasizes the need for improved security measures to prevent theft and manipulation of AI models. The roadmap also focuses on implementing safeguards to prevent dangerous uses of AI and ensuring model alignment to avoid autonomous harm.

anthropic.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

9 min

2/24/2026

Measuring AI agent autonomy in practice

AI agents are currently deployed in diverse contexts, ranging from email triage to cyber espionage. An analysis of millions of human-agent interactions across Claude Code and a public API aims to measure the autonomy of AI agents in real-world usage.

anthropic.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

28 min

2/19/2026

Google Translate apparently vulnerable to prompt injection

Prompt injection in Google Translate can reveal the underlying instruction-following language model. Responses indicate that the model lacks strong boundaries between processing content and following instructions.

lesswrong.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

5 min

2/7/2026

Evaluating and mitigating the growing risk of LLM-discovered 0-days

Claude Opus 4.6 features significant advancements in AI models' cybersecurity capabilities. Experts believe the current moment is critical for accelerating the defensive use of AI in response to the increasing risk of LLM-discovered zero-day vulnerabilities.

red.anthropic.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

10 min

2/5/2026

Document poisoning in RAG systems: How attackers corrupt AI's sources

Three fabricated documents were injected into a ChromaDB knowledge base, resulting in a RAG system inaccurately reporting a company's Q4 2025 revenue as $8.3M, a 47% decrease year-over-year, along with a planned workforce reduction. This process was completed in under three minutes on a MacBook Pro without GPU support or cloud services.

aminrj.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

13 min

3/12/2026

Anthropic believes RSI (recursive self improvement) could arrive β€œas soon as early 2027”

Anthropic's Frontier Safety Roadmap emphasizes the need for improved security measures to prevent theft and manipulation of AI models. The roadmap also focuses on implementing safeguards to prevent dangerous uses of AI and ensuring model alignment to avoid autonomous harm.

anthropic.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

9 min

2/24/2026

Frontier AI agents violate ethical constraints 30–50% of time, pressured by KPIs

A new benchmark evaluates outcome-driven constraint violations in autonomous AI agents to enhance safety and alignment with human values. This benchmark addresses limitations of existing safety assessments that mainly focus on harmful actions.

arxiv.org

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

2 min

2/10/2026

Evaluating and mitigating the growing risk of LLM-discovered 0-days

Claude Opus 4.6 features significant advancements in AI models' cybersecurity capabilities. Experts believe the current moment is critical for accelerating the defensive use of AI in response to the increasing risk of LLM-discovered zero-day vulnerabilities.

red.anthropic.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

10 min

2/5/2026

LLMs can unmask pseudonymous users at scale with surprising accuracy

AI techniques can analyze burner accounts on social media to accurately identify pseudonymous users. Experiments show a higher success rate in correlating individuals with accounts across multiple platforms compared to traditional deanonymization methods.

arstechnica.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

2 min

3/4/2026

Gemini 3.1 Pro

Gemini 3.1 Pro is the latest model in the Gemini 3 series, featuring advanced multimodal reasoning capabilities. Model cards provide essential information about the models, including limitations, mitigation strategies, and safety performance, and may be updated to reflect improvements.

deepmind.google

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

6 min

2/19/2026

Google Translate apparently vulnerable to prompt injection

Prompt injection in Google Translate can reveal the underlying instruction-following language model. Responses indicate that the model lacks strong boundaries between processing content and following instructions.

lesswrong.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

5 min

2/7/2026

Ed Zitron loses his mind annotating an AI doomer macro memo

Se compartiΓ³ con Dropbox.

dropbox.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

1 min

2/25/2026

Measuring AI agent autonomy in practice

AI agents are currently deployed in diverse contexts, ranging from email triage to cyber espionage. An analysis of millions of human-agent interactions across Claude Code and a public API aims to measure the autonomy of AI agents in real-world usage.

anthropic.com

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

28 min

2/19/2026

In a study, AI model OpenScholar synthesizes scientific research and cites sources as accurately as human experts

OpenScholar synthesizes scientific research and accurately cites sources, achieving a level of accuracy comparable to human experts. In a study, researchers found that the AI model significantly reduces the issue of hallucination seen in other models like GPT-4o, which fabricated 78-90% of its outputs.

washington.edu

πŸ”₯πŸ”₯πŸ”₯πŸ”₯πŸ”₯

4 min

2/6/2026

No more articles to load