Telus is using AI technology from Tomato.ai to modify call-centre agents' accents in real time, with the stated aim of reducing "accent-related friction." Labour groups have criticized the practice as deceptive and are calling for mandatory disclosure.
letsdatascience.com
3 min
5/6/2026
VibeVoice ASR is an open-source speech-to-text model that processes 60-minute long-form audio in a single pass and produces structured transcriptions containing speaker identification, timestamps, and content. It is now integrated into the Hugging Face Transformers library, which makes it easier to use in projects.
github.com
4 min
4/28/2026
Cohere has launched Transcribe, an open-source automatic speech recognition (ASR) model designed for high accuracy in real-world conditions. The model targets applications such as meeting transcription, speech analytics, and real-time customer support.
cohere.com
5 min
3/31/2026
NVIDIA PersonaPlex 7B enables full-duplex speech-to-speech communication on Apple Silicon, so the model can listen and speak at the same time. The qwen3-asr-swift library processes audio in real time, streaming generated audio chunks without a multi-step pipeline.
blog.ivan.digital
5 min
3/5/2026
Frikallo/parakeet.cpp is a fast, portable C++ implementation of NVIDIA's Parakeet models for on-device speech recognition. It reports approximately 27 ms of encoder inference on Apple Silicon GPUs for 10 seconds of audio, about 96 times faster than on CPU, and uses the Axiom tensor library for automatic Metal GPU acceleration without heavy dependencies.
github.com
5 min
2/27/2026
Voxtral Mini Realtime is a streaming speech recognition model implemented in pure Rust using the Burn ML framework. It runs natively in the browser via WASM and WebGPU, with a Q4 GGUF quantized version available for client-side execution.
github.com
4 min
2/10/2026
This GitHub repository provides a pure C implementation of the inference pipeline for Mistral AI's Voxtral Realtime 4B speech-to-text model, requiring only the C standard library. It features fast MPS inference and an encoder that processes audio in chunks to limit memory usage, and it supports audio input from stdin or live microphone capture.
github.com
9 min
2/10/2026
Voxtral Transcribe 2 comprises two speech-to-text models: Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live applications. Mistral describes them as offering state-of-the-art transcription quality and ultra-low latency. Voxtral Realtime is available with open weights under the Apache 2.0 license.
mistral.ai
5 min
2/4/2026