OpenAI and Broadcom have unveiled a first-generation LLM-optimized inference chip designed to deliver significantly better performance per watt than current state-of-the-art options. The chip, developed in nine months, is built for current and future large language models and will be deployed at gigawatt scale with data center partners.
openai.com
5 min
3d ago
The AI Compute Extensions (ACE) specification introduces x86 extensions designed to accelerate computation tasks, particularly for matrix multiplication kernels and reduced precision data formats relevant to machine learning workloads. ACE defines new matrix multiplication primitives that enhance AVX and scalar code with features like ACE register state, data processing operations utilizing AVX input, and data move operations for managing tile register state.
x86ecosystem.org
1 min
6/18/2026
Taalas has released an ASIC chip that runs Llama 3.1 8B with an inference rate of 17,000 tokens per second, equivalent to writing approximately 30 A4-sized pages in one second. The chip is claimed to be 10 times cheaper in ownership costs and 10 times more energy-efficient than GPU-based inference systems, while also being 10 times faster than current state-of-the-art inference solutions.
anuragk.com
4 min
2/21/2026
OpenAI and Broadcom have unveiled a first-generation LLM-optimized inference chip designed to deliver significantly better performance per watt than current state-of-the-art options. The chip, developed in nine months, is built for current and future large language models and will be deployed at gigawatt scale with data center partners.
openai.com
5 min
3d ago
Taalas has released an ASIC chip that runs Llama 3.1 8B with an inference rate of 17,000 tokens per second, equivalent to writing approximately 30 A4-sized pages in one second. The chip is claimed to be 10 times cheaper in ownership costs and 10 times more energy-efficient than GPU-based inference systems, while also being 10 times faster than current state-of-the-art inference solutions.
anuragk.com
4 min
2/21/2026
The AI Compute Extensions (ACE) specification introduces x86 extensions designed to accelerate computation tasks, particularly for matrix multiplication kernels and reduced precision data formats relevant to machine learning workloads. ACE defines new matrix multiplication primitives that enhance AVX and scalar code with features like ACE register state, data processing operations utilizing AVX input, and data move operations for managing tile register state.
x86ecosystem.org
1 min
6/18/2026
OpenAI and Broadcom have unveiled a first-generation LLM-optimized inference chip designed to deliver significantly better performance per watt than current state-of-the-art options. The chip, developed in nine months, is built for current and future large language models and will be deployed at gigawatt scale with data center partners.
openai.com
5 min
3d ago
The AI Compute Extensions (ACE) specification introduces x86 extensions designed to accelerate computation tasks, particularly for matrix multiplication kernels and reduced precision data formats relevant to machine learning workloads. ACE defines new matrix multiplication primitives that enhance AVX and scalar code with features like ACE register state, data processing operations utilizing AVX input, and data move operations for managing tile register state.
x86ecosystem.org
1 min
6/18/2026
Taalas has released an ASIC chip that runs Llama 3.1 8B with an inference rate of 17,000 tokens per second, equivalent to writing approximately 30 A4-sized pages in one second. The chip is claimed to be 10 times cheaper in ownership costs and 10 times more energy-efficient than GPU-based inference systems, while also being 10 times faster than current state-of-the-art inference solutions.
anuragk.com
4 min
2/21/2026
No more articles to load