People increasingly rely on generative artificial intelligence for reasoning, which raises questions about the future of human judgment. The paper introduces Tri-System Theory, which extends dual-process accounts of reasoning (intuitive System 1 and deliberative System 2) by adding a third system, System 3.
papers.ssrn.com
2 min
3/21/2026
The car wash test evaluates AI reasoning by asking whether to walk or drive 50 meters to a car wash. Most leading AI models, including Claude Sonnet 4.5, GPT-5.1, Llama, and Mistral, fail to give the correct answer: drive, because the car itself has to be at the car wash to be washed.
opper.ai
9 min
2/23/2026
Step 3.5 Flash is an open-source foundation model designed for advanced reasoning and agentic capabilities. It uses a sparse Mixture of Experts (MoE) architecture that activates only 11B of its 196B parameters per token (about 5.6%), enabling high intelligence density and real-time interaction.
static.stepfun.com
16 min
2/19/2026
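A sparse MoE layer runs only a few of its expert sub-networks for each token. A learned router scores every expert, keeps the top k, and mixes their outputs using softmax weights over the kept scores.

The sketch below shows generic top-k routing in pure Python. It is not Step 3.5 Flash's actual design, and several parts are illustrative assumptions:

- The class name `ToyMoELayer`.
- The expert count, the value of k, and the dimensions.
- Single linear maps as experts (real MoE experts are usually feed-forward networks).

The `active_fraction` method plays the same role as the 11B-of-196B ratio above. It reports the share of expert weights actually used per token.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class ToyMoELayer:
    """Hypothetical top-k MoE layer; each expert is a single dim x dim linear map."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one scoring row per expert (illustrative random weights).
        self.router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
        # Experts: each is a dim x dim weight matrix (illustrative random weights).
        self.experts = [
            [[rng.gauss(0, 1) / math.sqrt(dim) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]

    def forward(self, token):
        """Route one token vector through its top-k experts only."""
        scores = matvec(self.router, token)
        # Keep the indices of the k highest-scoring experts for this token.
        chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[: self.top_k]
        # Mixing weights come from a softmax over the selected scores only.
        weights = softmax([scores[i] for i in chosen])
        out = [0.0] * len(token)
        for w, i in zip(weights, chosen):
            expert_out = matvec(self.experts[i], token)  # only chosen experts are computed
            out = [o + w * e for o, e in zip(out, expert_out)]
        return out, chosen

    def active_fraction(self):
        """Fraction of expert parameters used per token (router parameters ignored)."""
        return self.top_k / len(self.experts)


layer = ToyMoELayer(dim=8, num_experts=16, top_k=2)
out, chosen = layer.forward([1.0] * 8)
print(chosen, layer.active_fraction())
print(round(11 / 196, 3))  # share of Step 3.5 Flash parameters active per token
```

With 2 of 16 equal-sized experts chosen, each token touches one eighth of the expert weights. Total capacity grows with the number of experts, while per-token compute grows only with k. This is the trade-off the 11B/196B figure describes.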
Recent advances in foundation models have produced reasoning systems that can reach gold-medal-level performance at the International Mathematical Olympiad. Moving from competition-level problem-solving to professional research requires the ability to navigate extensive literature and construct long-form mathematical arguments.
arxiv.org
2 min
2/15/2026
Large language models exhibit a reasoning process aimed at maximizing training rewards rather than establishing truth. This behavior is comparable to a student who manipulates calculations to get a desired grade despite knowing the final result is incorrect.
tomaszmachnik.pl
2 min
1/25/2026