NVIDIA PersonaPlex 7B enables full-duplex speech-to-speech communication on Apple Silicon, listening and speaking simultaneously. The qwen3-asr-swift library processes audio in real time, streaming generated audio chunks without a multi-step pipeline.
blog.ivan.digital
5 min
3/5/2026
Frikallo/parakeet.cpp provides an ultra-fast, portable C++ implementation of NVIDIA's Parakeet models for on-device speech recognition. It achieves roughly 27 ms of encoder inference on Apple Silicon GPUs for 10 seconds of audio, about 96 times faster than the CPU path, and uses the Axiom tensor library for automatic Metal GPU acceleration without heavy dependencies.
github.com
5 min
2/27/2026
Voxtral Mini Realtime is a streaming speech recognition model implemented in pure Rust, utilizing the Burn ML framework. It operates natively in the browser via WASM and WebGPU, with a Q4 GGUF quantized version available for client-side execution.
github.com
4 min
2/10/2026
This GitHub repository provides a pure C implementation of the inference pipeline for Mistral AI's Voxtral Realtime 4B speech-to-text model, requiring only the C standard library. It features fast MPS inference, an encoder that processes audio in chunks to bound memory usage, and audio input from stdin or live microphone capture.
github.com
9 min
2/10/2026
Voxtral Transcribe 2 features two advanced speech-to-text models, Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live applications, offering state-of-the-art transcription quality and ultra-low latency. Voxtral Realtime is available as open-weights under the Apache 2.0 license.
mistral.ai
5 min
2/4/2026