Parakeet.cpp – Parakeet ASR inference in pure C++ with Metal GPU acceleration

speech-recognition nvidia developer-tools

GitHub - Frikallo/parakeet.cpp: Ultra fast and portable Parakeet implementation for on-device inference in C++ using Axiom with MPS+Unified Memory and Cuda support

github.com

February 27, 2026

5 min read

Summary

Frikallo/parakeet.cpp is a portable C++ implementation of NVIDIA's Parakeet models for on-device speech recognition. On Apple Silicon GPUs, encoder inference for 10 seconds of audio takes approximately 27 ms, about 96 times faster than on CPU. The project uses the Axiom tensor library, which provides automatic Metal GPU acceleration without heavy dependencies.
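To put the reported numbers in perspective, the short sketch below works out the real-time factor and the CPU time implied by the 96x figure. The function names are hypothetical and are not part of parakeet.cpp. The figures cover the encoder only, so they should not be read as end-to-end transcription latency.

```cpp
#include <cassert>

// Real-time factor: processing time divided by audio duration.
// Values below 1.0 mean faster than real time.
double real_time_factor(double processing_ms, double audio_ms) {
    assert(audio_ms > 0.0);
    return processing_ms / audio_ms;
}

// CPU time implied by a GPU time and a reported speedup ratio.
double implied_cpu_ms(double gpu_ms, double speedup) {
    assert(speedup > 0.0);
    return gpu_ms * speedup;
}
```

With the summary's figures, `real_time_factor(27.0, 10000.0)` gives 0.0027 and `implied_cpu_ms(27.0, 96.0)` gives 2592 ms, so the implied CPU encoder pass takes roughly 2.6 seconds for the same 10-second clip.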

Key Takeaways

parakeet.cpp runs NVIDIA's Parakeet speech recognition models in C++. With the 110M model, encoder inference takes approximately 27 ms for 10 seconds of audio on Apple Silicon GPUs.
The project is built on Axiom, a lightweight tensor library with automatic Metal GPU acceleration, so no ONNX or Python runtime is needed.
It supports several model families covering offline and streaming speech recognition, multilingual transcription, and speaker diarization for up to four speakers.
Audio is processed through a consistent pipeline in which 16 kHz mono WAV input is converted into 80-bin Mel spectrograms before transcription.
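The front end of that pipeline can be illustrated with a minimal sketch of how a Mel spectrogram's shape and filter bands are derived. This is not parakeet.cpp's code: the function names are hypothetical, the HTK-style mel formula is chosen for simplicity (other conventions exist), and the 25 ms window and 10 ms hop used in the note afterwards are assumptions, not values confirmed by the source.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// HTK-style mel scale conversion (an assumption for this sketch).
double hz_to_mel(double hz) { return 2595.0 * std::log10(1.0 + hz / 700.0); }
double mel_to_hz(double mel) { return 700.0 * (std::pow(10.0, mel / 2595.0) - 1.0); }

// Number of full analysis frames when no padding is applied.
std::size_t num_frames(std::size_t n_samples, std::size_t win, std::size_t hop) {
    assert(hop > 0);
    if (n_samples < win) return 0;
    return 1 + (n_samples - win) / hop;
}

// Band edges in Hz for n_mels triangular filters:
// n_mels + 2 points spaced evenly on the mel scale.
std::vector<double> mel_band_edges(std::size_t n_mels, double f_min, double f_max) {
    assert(n_mels > 0 && f_min >= 0.0 && f_max > f_min);
    const double m_lo = hz_to_mel(f_min);
    const double m_hi = hz_to_mel(f_max);
    std::vector<double> edges(n_mels + 2);
    for (std::size_t i = 0; i < edges.size(); ++i) {
        const double t = static_cast<double>(i) / static_cast<double>(n_mels + 1);
        edges[i] = mel_to_hz(m_lo + (m_hi - m_lo) * t);
    }
    return edges;
}
```

Under those assumptions, 10 seconds of 16 kHz audio is 160,000 samples, and `num_frames(160000, 400, 160)` yields 998 frames, each described by 80 mel bins. `mel_band_edges(80, 0.0, 8000.0)` returns the 82 edge frequencies spanning 0 Hz to the 8 kHz Nyquist limit.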

Community Sentiment

Mixed

Positives

Support for multiple model families, covering tasks such as offline transcription and speaker diarization, shows the versatility of the engine across different applications.
Local inference in C++ is becoming more accessible, letting developers build efficient pipelines for new models quickly.
Users report being impressed with Parakeet's performance on both Windows and Mac.

Concerns

Commenters question whether speech recognition latency below 100 ms is achievable, a threshold that matters for real-time applications and user experience.
Some users suggest that CoreML with the Apple Neural Engine may offer better speed and power efficiency than Metal, raising the question of which backend is best suited to ASR inference.

Relevance Score

51/100
