omi-health/omi-med-stt-v1-gguf overview
Runs locally from ~886.2 MB of disk. The files require a Parakeet ASR runtime rather than llama.cpp (see Compatibility).
Model Details
| Model ID | omi-health/omi-med-stt-v1-gguf |
|---|---|
| Author | omi-health |
| Pipeline | automatic-speech-recognition |
| License | cc-by-4.0 |
| Base model | nvidia/parakeet-tdt-0.6b-v2 |
| Last modified | 2026-06-09T00:37:54.000Z |
Model README
---
license: cc-by-4.0
language:
- en
library_name: gguf
tags:
- automatic-speech-recognition
- medical
- parakeet
- gguf
- parakeet.cpp
- omi-med-stt
pipeline_tag: automatic-speech-recognition
base_model: nvidia/parakeet-tdt-0.6b-v2
---
Omi Med STT v1 GGUF
GGUF export of Omi Med STT v1
for Linux and Windows CPU use through the omi-med-stt CLI.
This is the portability path. If you have Apple Silicon, use the MLX q8 repo. If
you have an NVIDIA GPU, use the canonical NeMo checkpoint.
Quickstart
pip install -U omi-med-stt
omi-med-stt install-cpp --cpp-backend cpu
omi-med-stt audio.wav --runtime cpp
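For more than one recording, the same command can be driven from a script. The following is a minimal sketch, not part of the project: `build_command` and `transcribe_folder` are hypothetical helper names, only the arguments shown in the Quickstart are used, and it assumes the CLI writes the transcript to stdout, which should be checked against the CLI's own documentation.

```python
import subprocess
from pathlib import Path


def build_command(audio_path):
    """Build the Quickstart invocation for a single audio file."""
    return ["omi-med-stt", str(audio_path), "--runtime", "cpp"]


def transcribe_folder(folder):
    """Run the CLI once per .wav file and collect what it prints.

    Assumption: the transcript is written to stdout. If the CLI writes
    to a file instead, read that file here rather than proc.stdout.
    """
    results = {}
    for wav in sorted(Path(folder).glob("*.wav")):
        proc = subprocess.run(
            build_command(wav),
            capture_output=True,
            text=True,
            check=True,  # raise if the CLI exits with a non-zero status
        )
        results[wav.name] = proc.stdout.strip()
    return results
```

Calling `transcribe_folder("recordings")` would return a dictionary mapping each file name to the captured output, under the stdout assumption above.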
Files
| File | Status |
|---|---|
| omi-med-stt-v1-q8_0.gguf | Default CPU artifact, benchmarked |
| omi-med-stt-v1-f16.gguf | Provided for conversion/experimentation; not independently benchmarked |
Evaluation
Full evaluation details: omi.health/research/omi-med-stt.
Benchmark: 7.18h of real and synthetic clinical speech across dialogue, dictation, medication review, procedures/devices/tests, and general speech. Speed is shown as time to process one hour of audio; lower is faster.
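The two speed figures in the tables in this section are two views of the same quantity: a realtime factor of N means one hour of audio takes 3600 / N seconds. The sketch below shows the conversion; the function names are illustrative, and because the reported factors are rounded, a converted value can differ from a table entry by about a second.

```python
def seconds_per_audio_hour(realtime_factor):
    """Seconds needed to process one hour (3600 s) of audio."""
    return 3600.0 / realtime_factor


def format_duration(seconds):
    """Format a duration in the tables' style, e.g. '2m 53s' or '25s'."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m {secs}s" if minutes else f"{secs}s"


print(format_duration(seconds_per_audio_hour(20.8)))   # 2m 53s
print(format_duration(seconds_per_audio_hour(146.3)))  # 25s
```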
NeMo vs Open / Local Models
Local GPU baselines were run on an A10 where applicable; VibeVoice-ASR 9B used an H100.
| Model | WER | M-WER | Drug M-WER | Medical Recall | Speed: time / 1 hour audio (formula-derived x realtime) |
|---|---:|---:|---:|---:|---:|
| VibeVoice-ASR 9B | 11.10% | 1.78% | 1.36% | 98.71% | 5m 20s (11.2x) |
| Omi Med STT v1 NeMo | 8.30% | 2.37% | 4.75% | 97.95% | 25s (146.3x) |
| Qwen3 ASR 1.7B | 10.72% | 3.13% | 6.11% | 97.21% | 44s (81.1x) |
| Whisper Large v3 Turbo (A10) | 11.98% | 3.93% | 5.88% | 96.45% | 1m 19s (45.8x) |
| Cohere Transcribe 03-2026 | 14.88% | 5.05% | 11.09% | 95.16% | 25s (146.3x) |
| Parakeet TDT 0.6B v3 | 15.26% | 8.01% | 9.50% | 96.34% | 23s (157.9x) |
| Parakeet TDT 0.6B v2 base | 16.45% | 8.36% | 8.60% | 96.20% | 23s (153.8x) |
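For readers unfamiliar with the WER column, the sketch below computes a plain word error rate under the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is an illustration only. The text normalization used in this benchmark and the exact definitions of M-WER and Drug M-WER are not given here, so this code makes no attempt to reproduce them; lowercasing and whitespace splitting are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Plain WER: word-level Levenshtein distance / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, a six-word reference with one drug name transcribed incorrectly yields a WER of 1/6, which is one reason a separate medical-term metric is reported alongside the overall figure.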
Runtime Artifacts
These results come from the same internal evaluation as the canonical checkpoint.
| Artifact | WER | M-WER | Drug M-WER | Medical Recall | Speed: time / 1 hour audio (formula-derived x realtime) |
|---|---:|---:|---:|---:|---:|
| NeMo canonical | 8.30% | 2.37% | 4.75% | 97.95% | 25s (146.3x) |
| MLX q8 | 8.61% | 2.75% | 5.20% | 97.63% | 53s (67.4x) |
| GGUF q8_0 | 9.12% | 3.20% | 6.33% | 97.53% | 2m 53s (20.8x) |
The GGUF q8_0 build is useful when CPU portability matters. It is not the
quality-leading artifact.
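To put the trade-off in numbers, the figures from the Runtime Artifacts table can be compared directly. The sketch below copies the table values into a dictionary and reports absolute and relative differences against a baseline; the `compare` helper and the key names are illustrative.

```python
# Values copied from the Runtime Artifacts table (error rates in percent).
ARTIFACTS = {
    "NeMo canonical": {"wer": 8.30, "m_wer": 2.37, "drug_m_wer": 4.75, "realtime": 146.3},
    "MLX q8": {"wer": 8.61, "m_wer": 2.75, "drug_m_wer": 5.20, "realtime": 67.4},
    "GGUF q8_0": {"wer": 9.12, "m_wer": 3.20, "drug_m_wer": 6.33, "realtime": 20.8},
}


def compare(baseline, candidate):
    """Return error-rate deltas and the slowdown of candidate vs baseline."""
    base, cand = ARTIFACTS[baseline], ARTIFACTS[candidate]
    report = {}
    for metric in ("wer", "m_wer", "drug_m_wer"):
        delta = cand[metric] - base[metric]
        report[metric] = {
            "points": delta,                          # percentage points
            "relative_pct": 100.0 * delta / base[metric],
        }
    # How many times slower the candidate is than the baseline.
    report["slowdown"] = base["realtime"] / cand["realtime"]
    return report


report = compare("NeMo canonical", "GGUF q8_0")
print(f"WER: +{report['wer']['points']:.2f} points, "
      f"{report['wer']['relative_pct']:.1f}% relative, "
      f"{report['slowdown']:.1f}x slower")
```

On these figures, GGUF q8_0 adds about 0.82 points of WER over the NeMo checkpoint (roughly 10% relative) and is about 7x slower, noting that the two were measured on different hardware classes (CPU versus GPU).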
Compatibility
These files are not llama.cpp text-model GGUF files. They require a Parakeet
ASR runtime. The supported path is:
omi-med-stt audio.wav --runtime cpp
The CLI installs the patched parakeet.cpp runtime needed for Omi Med STT v1.
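Because these files share the GGUF container with llama.cpp text models, a header check can confirm that a download is an intact GGUF file, but it cannot confirm which runtime the file needs. The sketch below reads the first eight bytes, which in the GGUF format are the ASCII magic `GGUF` followed by a 32-bit version number; it assumes the common little-endian layout, and the function names are illustrative.

```python
import struct

GGUF_MAGIC = b"GGUF"


def parse_gguf_header(data):
    """Return the GGUF version from the first 8 bytes, or None if not GGUF.

    Assumption: little-endian layout, which is the usual case.
    """
    if len(data) < 8 or data[:4] != GGUF_MAGIC:
        return None
    (version,) = struct.unpack("<I", data[4:8])
    return version


def read_gguf_version(path):
    """Read only the header bytes from disk and parse them."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(8))
```

A call such as `read_gguf_version("omi-med-stt-v1-q8_0.gguf")` returning a number says only that the container is valid. Loading the model still requires the Parakeet runtime installed by the CLI.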
Links
- Canonical model: omi-health/omi-med-stt-v1
- Mac q8 default: omi-health/omi-med-stt-v1-mlx-q8
- Runtime CLI: Omi-Health/omi-med-stt-runtime
- Broader evaluation and product context: omi.health/research/omi-med-stt
- parakeet.cpp: mudler/parakeet.cpp
Safety
Omi Med STT v1 is speech-to-text only. It is not a diagnostic, triage,
prescribing, or clinical decision model, and it is not clinically validated.
Transcripts must be reviewed before any clinical use.