Themata.AI

Popular tags:

#developer-tools #ai-agents #llms #claude #ai-ethics #code-generation #openai #ai-safety #anthropic #open-source

AI is changing the world. Don't stay behind. Clear summaries, community insight, delivered without the noise. Subscribe to never miss a beat.

© 2026 Themata.AI • All Rights Reserved

Filtering by tag: speech-recognition
Telus Uses AI to Alter Call-Agent Accents
#ai-agents #speech-recognition #telecommunications #ethical-ai
News

Telus is using AI technology from Tomato.ai to modify call-centre agents' accents in real time, with the stated aim of reducing "accent-related friction." Labour groups have criticized the practice as deceptive and are calling for mandatory disclosure.

letsdatascience.com

🔥🔥🔥🔥🔥

3 min

5/6/2026

GitHub - microsoft/VibeVoice: Open-Source Frontier Voice AI
Tool

Microsoft VibeVoice: Open-Source Frontier Voice AI

VibeVoice ASR is an open-source speech-to-text model that can transcribe 60 minutes of long-form audio in a single pass, producing structured transcripts with speaker labels, timestamps, and the transcribed text. It is now integrated into the Hugging Face Transformers library, which makes it easier to use in existing projects.

github.com

🔥🔥🔥🔥🔥

4 min

4/28/2026

Cohere Transcribe: Speech Recognition

Cohere has launched Transcribe, an open-source automatic speech recognition (ASR) model built for high accuracy in real-world conditions. Target use cases include meeting transcription, speech analytics, and real-time customer support.

cohere.com

🔥🔥🔥🔥🔥

5 min

3/31/2026

Nvidia PersonaPlex 7B on Apple Silicon: Full-Duplex Speech-to-Speech in Swift

NVIDIA PersonaPlex 7B is a full-duplex speech-to-speech model: it can listen and speak at the same time. The post runs it on Apple Silicon with the qwen3-asr-swift library, which processes incoming audio in real time and streams generated audio chunks back without a multi-step pipeline.

blog.ivan.digital

🔥🔥🔥🔥🔥

5 min

3/5/2026

Parakeet.cpp – Parakeet ASR inference in pure C++ with Metal GPU acceleration

Frikallo/parakeet.cpp is a fast, portable C++ implementation of NVIDIA's Parakeet models for on-device speech recognition. On Apple Silicon GPUs it runs the encoder on 10 seconds of audio in approximately 27 ms, about 96 times faster than CPU processing, and it uses the Axiom tensor library to get automatic Metal GPU acceleration without heavy dependencies.

github.com

🔥🔥🔥🔥🔥

5 min

2/27/2026
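
To put the Parakeet.cpp figures in perspective, the short Python calculation below converts them into real-time factors (RTF, processing time divided by audio duration). It assumes the "96 times faster" figure compares the same encoder workload on the GPU and on the CPU, and it covers the encoder only, so end-to-end transcription, which also includes decoding, will take longer.

```python
# Back-of-the-envelope real-time-factor (RTF) arithmetic for the reported
# Parakeet.cpp numbers. Assumption: the 96x speed-up compares the same
# encoder workload on the Apple Silicon GPU and on the CPU.
GPU_ENCODER_SECONDS = 0.027   # ~27 ms reported encoder time
AUDIO_SECONDS = 10.0          # length of the benchmark audio clip
GPU_VS_CPU_SPEEDUP = 96       # reported speed-up over CPU processing


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds


def main() -> dict:
    # Implied CPU encoder time under the assumption stated above.
    cpu_encoder_seconds = GPU_ENCODER_SECONDS * GPU_VS_CPU_SPEEDUP
    gpu_rtf = real_time_factor(GPU_ENCODER_SECONDS, AUDIO_SECONDS)
    cpu_rtf = real_time_factor(cpu_encoder_seconds, AUDIO_SECONDS)
    results = {
        "cpu_encoder_seconds": cpu_encoder_seconds,
        "gpu_rtf": gpu_rtf,
        "cpu_rtf": cpu_rtf,
        "gpu_x_real_time": 1 / gpu_rtf,  # how many times faster than real time
    }
    for name, value in results.items():
        print(f"{name}: {value:.4f}")
    return results


if __name__ == "__main__":
    main()
```

Under these assumptions the GPU encoder runs at an RTF of about 0.0027 (roughly 370 times faster than real time), and the implied CPU time is about 2.59 s for the same 10 seconds of audio (an RTF of about 0.26).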

Rust implementation of Mistral's Voxtral Mini 4B Realtime runs in your browser

This project is a pure-Rust implementation of Voxtral Mini Realtime, Mistral's streaming speech recognition model, built on the Burn ML framework. It runs in the browser via WASM and WebGPU, and a Q4 GGUF quantized version is available for client-side execution.

github.com

🔥🔥🔥🔥🔥

4 min

2/10/2026
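
A Q4 build is what makes client-side execution of the Rust Voxtral port plausible, since weight storage dominates what a browser has to download and hold in memory. The rough Python estimate below assumes about 4 billion parameters (the nominal "4B" in the model name) and about 4.5 bits per weight, which is what GGUF's Q4_0 format uses (32 four-bit values plus one fp16 scale per block). The project's exact Q4 variant is not stated, and activations, caches, and file metadata are ignored.

```python
# Rough weight-storage estimate for a ~4B-parameter model at different precisions.
# Assumptions: 4e9 parameters (nominal "4B"); Q4_0-style blocks of 32 weights that
# store 32 packed 4-bit values (16 bytes) plus one fp16 scale (2 bytes).
PARAMS = 4_000_000_000

Q4_0_BLOCK_WEIGHTS = 32
Q4_0_BLOCK_BYTES = 16 + 2  # packed 4-bit values + fp16 scale

BITS_PER_WEIGHT = {
    "fp32": 32.0,
    "fp16": 16.0,
    "q4_0": Q4_0_BLOCK_BYTES * 8 / Q4_0_BLOCK_WEIGHTS,  # 4.5 bits per weight
}


def weight_gigabytes(params: int, bits_per_weight: float) -> float:
    """Decimal gigabytes (1e9 bytes) needed to store the weights alone."""
    return params * bits_per_weight / 8 / 1e9


def main() -> dict:
    sizes = {fmt: weight_gigabytes(PARAMS, bits) for fmt, bits in BITS_PER_WEIGHT.items()}
    for fmt, gb in sizes.items():
        print(f"{fmt}: {gb:.2f} GB")
    return sizes


if __name__ == "__main__":
    main()
```

This works out to about 16 GB at fp32, 8 GB at fp16, and 2.25 GB at Q4_0, so 4-bit weights cut storage by roughly 3.5 times relative to fp16.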

Pure C, CPU-only inference with Mistral Voxtral Realtime 4B speech to text model

The GitHub repository provides a pure C implementation of the inference pipeline for Mistral AI's Voxtral Realtime 4B speech-to-text model that depends only on the C standard library. It offers fast inference via MPS (Metal Performance Shaders), uses a chunked audio encoder to keep memory usage bounded, and accepts audio from stdin or live microphone capture.

github.com

🔥🔥🔥🔥🔥

9 min

2/10/2026
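
The chunked design described for the pure C Voxtral port addresses a general problem: feeding long or live audio to an encoder without holding the whole recording in memory. The Python sketch below is not the repository's code; it only illustrates reading raw audio from stdin in fixed-size chunks, and it assumes 16-bit signed little-endian mono PCM input. The `summarize_chunks` function is a hypothetical stand-in for the encoder.

```python
import sys
from array import array
from typing import BinaryIO, Iterator, List, Tuple

BYTES_PER_SAMPLE = 2  # assumed input: 16-bit signed little-endian mono PCM


def iter_pcm_chunks(stream: BinaryIO, chunk_samples: int) -> Iterator[array]:
    """Yield successive chunks of int16 samples; memory use is bounded by
    chunk_samples no matter how long the input stream is.

    Assumes stream.read(n) returns n bytes unless EOF is reached, as
    io.BufferedReader objects such as sys.stdin.buffer do in blocking mode.
    """
    chunk_bytes = chunk_samples * BYTES_PER_SAMPLE
    while True:
        data = stream.read(chunk_bytes)
        if not data:
            break
        # Drop a trailing odd byte at EOF so the buffer holds whole samples.
        data = data[: len(data) - len(data) % BYTES_PER_SAMPLE]
        samples = array("h")
        samples.frombytes(data)
        if sys.byteorder == "big":
            samples.byteswap()  # input is little-endian; convert to native order
        if samples:
            yield samples


def summarize_chunks(stream: BinaryIO, chunk_samples: int) -> List[Tuple[int, int]]:
    """Hypothetical encoder stand-in: record (sample count, peak amplitude) per chunk."""
    summary = []
    for chunk in iter_pcm_chunks(stream, chunk_samples):
        summary.append((len(chunk), max(abs(s) for s in chunk)))
    return summary
```

With 16 kHz input, calling `summarize_chunks(sys.stdin.buffer, 16_000)` would process piped audio one second at a time (a hypothetical chunk size); a real pipeline would run the encoder where the sketch records its summary.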

Voxtral Transcribe 2

Voxtral Transcribe 2 consists of two speech-to-text models: Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live applications. Mistral claims state-of-the-art transcription quality and very low latency. Voxtral Realtime is released as open weights under the Apache 2.0 license.

mistral.ai

🔥🔥🔥🔥🔥

5 min

2/4/2026
