Themata.AI | AI news without the noise

Themata.AI

AI is changing the world. Don't stay behind. Clear summaries, community insight, delivered without the noise. Subscribe to never miss a beat.

Privacy

Contact

🕒 Latest 🔥 Top

Filtering by tag:

ai-safetyClear

News Opinion Research Tool Clear

Document Poisoning in RAG Systems: How Attackers Corrupt Your AI’s Sources

rag-systems document-poisoning ai-safety llms

Research

Document poisoning in RAG systems: How attackers corrupt AI's sources

Three fabricated documents were injected into a ChromaDB knowledge base, resulting in a RAG system inaccurately reporting a company's Q4 2025 revenue as $8.3M, a 47% decrease year-over-year, along with a planned workforce reduction. This process was completed in under three minutes on a MacBook Pro without GPU support or cloud services.

aminrj.com

🔥🔥🔥🔥🔥

13 min

3/12/2026

LLMs can unmask pseudonymous users at scale with surprising accuracy

llms privacy ai-safety social-media-analysis

Research

LLMs can unmask pseudonymous users at scale with surprising accuracy

AI techniques can analyze burner accounts on social media to accurately identify pseudonymous users. Experiments show a higher success rate in correlating individuals with accounts across multiple platforms compared to traditional deanonymization methods.

arstechnica.com

🔥🔥🔥🔥🔥

2 min

3/4/2026

Annotation - The Global Intelligence Crisis.pdf

ai-safety global-intelligence ai-ethics developer-tools

Research

Ed Zitron loses his mind annotating an AI doomer macro memo

Se compartió con Dropbox.

dropbox.com

🔥🔥🔥🔥🔥

1 min

2/25/2026

ai-safety anthropic model-alignment ai-governance

Research

Anthropic believes RSI (recursive self improvement) could arrive “as soon as early 2027”

Anthropic's Frontier Safety Roadmap emphasizes the need for improved security measures to prevent theft and manipulation of AI models. The roadmap also focuses on implementing safeguards to prevent dangerous uses of AI and ensuring model alignment to avoid autonomous harm.

anthropic.com

🔥🔥🔥🔥🔥

9 min

2/24/2026

gemini llms ai-safety multimodal-models

Research

Gemini 3.1 Pro

Gemini 3.1 Pro is the latest model in the Gemini 3 series, featuring advanced multimodal reasoning capabilities. Model cards provide essential information about the models, including limitations, mitigation strategies, and safety performance, and may be updated to reflect improvements.

deepmind.google

🔥🔥🔥🔥🔥

6 min

2/19/2026

ai-agents claude anthropic ai-safety

Research

Measuring AI agent autonomy in practice

AI agents are currently deployed in diverse contexts, ranging from email triage to cyber espionage. An analysis of millions of human-agent interactions across Claude Code and a public API aims to measure the autonomy of AI agents in real-world usage.

anthropic.com

🔥🔥🔥🔥🔥

28 min

2/19/2026

A Benchmark for Evaluating Outcome-Driven Constraint Violations in Autonomous AI Agents

ai-agents ai-safety autonomous-systems benchmarks

Research

Frontier AI agents violate ethical constraints 30–50% of time, pressured by KPIs

A new benchmark evaluates outcome-driven constraint violations in autonomous AI agents to enhance safety and alignment with human values. This benchmark addresses limitations of existing safety assessments that mainly focus on harmful actions.

arxiv.org

🔥🔥🔥🔥🔥

2 min

2/10/2026

google-translate llms prompt-injection ai-safety

Research

Google Translate apparently vulnerable to prompt injection

Prompt injection in Google Translate can reveal the underlying instruction-following language model. Responses indicate that the model lacks strong boundaries between processing content and following instructions.

lesswrong.com

🔥🔥🔥🔥🔥

5 min

2/7/2026

openai llms ai-research ai-safety

Research

In a study, AI model OpenScholar synthesizes scientific research and cites sources as accurately as human experts

OpenScholar synthesizes scientific research and accurately cites sources, achieving a level of accuracy comparable to human experts. In a study, researchers found that the AI model significantly reduces the issue of hallucination seen in other models like GPT-4o, which fabricated 78-90% of its outputs.

washington.edu

🔥🔥🔥🔥🔥

4 min

2/6/2026

Evaluating and mitigating the growing risk of LLM-discovered 0-days

llms claude ai-safety cybersecurity

Research

Evaluating and mitigating the growing risk of LLM-discovered 0-days

Claude Opus 4.6 features significant advancements in AI models' cybersecurity capabilities. Experts believe the current moment is critical for accelerating the defensive use of AI in response to the increasing risk of LLM-discovered zero-day vulnerabilities.

red.anthropic.com

🔥🔥🔥🔥🔥

10 min

2/5/2026

rag-systems document-poisoning ai-safety llms

Research

Document poisoning in RAG systems: How attackers corrupt AI's sources

aminrj.com

🔥🔥🔥🔥🔥

13 min

3/12/2026

ai-safety global-intelligence ai-ethics developer-tools

Research

Ed Zitron loses his mind annotating an AI doomer macro memo

Se compartió con Dropbox.

dropbox.com

🔥🔥🔥🔥🔥

1 min

2/25/2026

gemini llms ai-safety multimodal-models

Research

Gemini 3.1 Pro

deepmind.google

🔥🔥🔥🔥🔥

6 min

2/19/2026

ai-agents ai-safety autonomous-systems benchmarks

Research

Frontier AI agents violate ethical constraints 30–50% of time, pressured by KPIs

arxiv.org

🔥🔥🔥🔥🔥

2 min

2/10/2026

openai llms ai-research ai-safety

Research

In a study, AI model OpenScholar synthesizes scientific research and cites sources as accurately as human experts

washington.edu

🔥🔥🔥🔥🔥

4 min

2/6/2026

llms privacy ai-safety social-media-analysis

Research

LLMs can unmask pseudonymous users at scale with surprising accuracy

arstechnica.com

🔥🔥🔥🔥🔥

2 min

3/4/2026

ai-safety anthropic model-alignment ai-governance

Research

Anthropic believes RSI (recursive self improvement) could arrive “as soon as early 2027”

anthropic.com

🔥🔥🔥🔥🔥

9 min

2/24/2026

ai-agents claude anthropic ai-safety

Research

Measuring AI agent autonomy in practice

anthropic.com

🔥🔥🔥🔥🔥

28 min

2/19/2026

google-translate llms prompt-injection ai-safety

Research

Google Translate apparently vulnerable to prompt injection

lesswrong.com

🔥🔥🔥🔥🔥

5 min

2/7/2026

llms claude ai-safety cybersecurity

Research

Evaluating and mitigating the growing risk of LLM-discovered 0-days

red.anthropic.com

🔥🔥🔥🔥🔥

10 min

2/5/2026

rag-systems document-poisoning ai-safety llms

Research

Document poisoning in RAG systems: How attackers corrupt AI's sources

aminrj.com

🔥🔥🔥🔥🔥

13 min

3/12/2026

ai-safety anthropic model-alignment ai-governance

Research

Anthropic believes RSI (recursive self improvement) could arrive “as soon as early 2027”

anthropic.com

🔥🔥🔥🔥🔥

9 min

2/24/2026

ai-agents ai-safety autonomous-systems benchmarks

Research

Frontier AI agents violate ethical constraints 30–50% of time, pressured by KPIs

arxiv.org

🔥🔥🔥🔥🔥

2 min

2/10/2026

llms claude ai-safety cybersecurity

Research

Evaluating and mitigating the growing risk of LLM-discovered 0-days

red.anthropic.com

🔥🔥🔥🔥🔥

10 min

2/5/2026

llms privacy ai-safety social-media-analysis

Research

LLMs can unmask pseudonymous users at scale with surprising accuracy

arstechnica.com

🔥🔥🔥🔥🔥

2 min

3/4/2026

gemini llms ai-safety multimodal-models

Research

Gemini 3.1 Pro

deepmind.google

🔥🔥🔥🔥🔥

6 min

2/19/2026

google-translate llms prompt-injection ai-safety

Research

Google Translate apparently vulnerable to prompt injection

lesswrong.com

🔥🔥🔥🔥🔥

5 min

2/7/2026

ai-safety global-intelligence ai-ethics developer-tools

Research

Ed Zitron loses his mind annotating an AI doomer macro memo

Se compartió con Dropbox.

dropbox.com

🔥🔥🔥🔥🔥

1 min

2/25/2026

ai-agents claude anthropic ai-safety

Research

Measuring AI agent autonomy in practice

anthropic.com

🔥🔥🔥🔥🔥

28 min

2/19/2026

openai llms ai-research ai-safety

Research

In a study, AI model OpenScholar synthesizes scientific research and cites sources as accurately as human experts

washington.edu

🔥🔥🔥🔥🔥

4 min

2/6/2026

No more articles to load