
github.com
April 28, 2026
Summary
VibeVoice ASR is an open-source speech-to-text model that processes up to 60 minutes of long-form audio in a single pass. It produces structured transcriptions that record who spoke (speaker identification), when (timestamps), and what was said (content). It is now integrated into the Hugging Face Transformers library, which makes it easier to use in existing projects.
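To make the "speaker, timestamp, content" structure concrete, here is a minimal Python sketch of how such a transcription might be represented and rendered downstream. It does not use or reproduce the model's actual output format or its Transformers API: the `Segment` class, its field names, and the `render` helper are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical container for one transcribed utterance (not the model's real schema)."""
    speaker: str   # who spoke
    start: float   # when the utterance starts, in seconds
    end: float     # when the utterance ends, in seconds
    text: str      # what was said


def format_timestamp(seconds: float) -> str:
    """Convert seconds to HH:MM:SS, truncating fractional seconds."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render(segments: list[Segment]) -> str:
    """Render segments as one line per utterance, ordered by start time."""
    lines = []
    for seg in sorted(segments, key=lambda s: s.start):
        span = f"{format_timestamp(seg.start)} - {format_timestamp(seg.end)}"
        lines.append(f"[{span}] {seg.speaker}: {seg.text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the structure.
    sample = [
        Segment("Speaker 2", 75.0, 80.5, "Thanks, glad to be here."),
        Segment("Speaker 1", 0.0, 74.2, "Welcome to the show."),
    ]
    print(render(sample))
```

Keeping each utterance as a small record like this makes it straightforward to sort, filter by speaker, or export to subtitle formats, whatever the model's real output looks like.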

Nvidia PersonaPlex 7B on Apple Silicon: Full-Duplex Speech-to-Speech in Swift
Mar 5, 2026

Voice-AI-for-Beginners – A curated learning path for developers
May 2, 2026
Pure C, CPU-only inference with Mistral Voxtral Realtime 4B speech to text model
Feb 10, 2026

Voxtral Transcribe 2
Feb 4, 2026

Cohere Transcribe: Speech Recognition
Mar 31, 2026