Comparing ASR Solutions

Date: 2025-06-10

To create a cost-effective transcription project, I researched the common Automatic Speech Recognition (ASR) solutions available on the market. They are:

Whisper API from OpenAI
gpt-4o-mini-transcribe and gpt-4o-transcribe from OpenAI
Microsoft Azure STT services
whisper and whisper-large-v3-turbo from Cloudflare AI
Self-hosted whisper-asr-webservice, running on a Ryzen 5 3600 and an Ampere 4C. (I should really get an Nvidia graphics card.)
Whisper v3 by lemonfox.ai

Price

Service	Cost per Minute (USD)	Notes
Self-hosted	$0.00	Free, hardware cost not included
Cloudflare whisper-large-v3-turbo	$0.00051	Free for ~4.5 hours/day (10,000 neurons)
lemonfox.ai	$0.00278
Azure	$0.003	Free for 5 audio hours/month
gpt-4o-mini-transcribe	$0.003
gpt-4o-transcribe	$0.006
OpenAI Whisper-1	$0.006
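To make the price gap concrete, here is a minimal sketch that turns the per-minute prices above into a monthly bill. The `monthly_cost` helper and the 100-hour workload are illustrative assumptions, and the free tiers in the Notes column are deliberately ignored, so real costs for Cloudflare and Azure may be lower.

```python
# Per-minute list prices in USD, copied from the price table above.
PRICE_PER_MINUTE = {
    "Cloudflare whisper-large-v3-turbo": 0.00051,
    "lemonfox.ai": 0.00278,
    "Azure": 0.003,
    "gpt-4o-mini-transcribe": 0.003,
    "gpt-4o-transcribe": 0.006,
    "OpenAI Whisper-1": 0.006,
}


def monthly_cost(service: str, hours_per_month: float) -> float:
    """Return the list-price cost in USD for a monthly workload.

    Simplifying assumption: free tiers are ignored.
    """
    return PRICE_PER_MINUTE[service] * hours_per_month * 60


if __name__ == "__main__":
    # Hypothetical workload: 100 hours of audio per month.
    for name in PRICE_PER_MINUTE:
        print(f"{name:36s} ${monthly_cost(name, 100):.2f}")
```

At 100 hours per month the cheapest and most expensive hosted options differ by roughly a factor of twelve, which is why the quality comparison below matters.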

Quality

To compare the quality and speed of the solutions, I ran this sample audio (2:57) through every service in Postman, and the results are here.

I chose this sample audio because Cloudflare's Whisper-V3 had trouble with it. That model usually performs the same as, or slightly worse than, other V3 models.

Reasons:

It tests the ability to recognize the language (Yue and Zh).
It tests robustness: the speech is not very clear, so the model has to make corrections to it.

All transcriptions were run in Postman to ensure the same environment across all services.
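The measurements in this post were taken in Postman, not with a script. For anyone who wants to repeat them from code, the following is a rough sketch of one way to time a single request. It assumes the third-party `requests` package and OpenAI's `/v1/audio/transcriptions` endpoint; `timed` and `transcribe_openai` are hypothetical helper names, not part of any library.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(call: Callable[[], T]) -> Tuple[T, float]:
    """Run call() once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = call()
    return result, time.perf_counter() - start


def transcribe_openai(path: str, api_key: str,
                      model: str = "gpt-4o-mini-transcribe") -> str:
    """Upload one audio file as multipart form data and return the transcript text."""
    import requests  # third-party; imported lazily so timed() works without it

    with open(path, "rb") as audio:
        response = requests.post(
            "https://api.openai.com/v1/audio/transcriptions",
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": audio},
            data={"model": model},
            timeout=600,
        )
    response.raise_for_status()
    return response.json()["text"]
```

A call such as `timed(lambda: transcribe_openai("sample.mp3", key))` (file name and key are placeholders) returns the transcript together with the elapsed time. Like the Postman numbers, that time includes the upload, so it depends on the network as well as on the service.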

Rank	ASR Service	Quality	Comments
0	OpenAI - gpt-4o-mini-transcribe with prompt	Excellent	✅ Minor errors (2) ❌ Outputs Cantonese
2	OpenAI - gpt-4o-mini-transcribe	Excellent	✅ Outputs Cantonese ❌ No punctuation, not even spaces ❌ Minor errors (2)
1	OpenAI - gpt-4o-transcribe	Excellent	✅ Has punctuation ❌ Doesn't output the original language ❌ Minor errors (4)
3	Azure Transcriptions	Very Good	✅ Has punctuation ❌ Simplified Chinese only ❌ Minor errors (4)
3	LemonFox	Very Good	❌ No punctuation ❌ Written Chinese ❌ Minor errors (4)
4	OpenAI - Whisper-1	Good	❌ No punctuation ❌ Minor errors (4)
5	Self-hosted, large, faster-whisper	Good	❌ No punctuation ❌ Hallucination ❌ Similar to Whisper-1, minor errors
6	Self-hosted, base, faster-whisper	Fair	❌ More errors from words with similar pronunciation
7	Cloudflare - whisper-large-v3-turbo	Poor	❌ Simplified Chinese ❌ No punctuation ❌ Somewhat nonsensical
8	Cloudflare - whisper	Unusable	❌ Nonsensical and garbled, infinite loop of "我們去飲品" ❌ 2MB limit

Rank (by Speed)	ASR Service	Run Time
1	LemonFox	6.08s
2	OpenAI - gpt-4o-transcribe	7.76s
3	OpenAI - Whisper-1	7.77s
4	OpenAI - gpt-4o-mini-transcribe	8.42s
5	Azure Transcriptions - Transcribe	8.54s
6	Cloudflare - whisper-large-v3-turbo	8.79s
7	Self-hosted, base, faster-whisper	16.61s
8	Cloudflare - whisper	17.8s
9	Self-hosted, large, faster-whisper	5m 13.62s

Speed

Since the run times for the previous sample are too close to separate the services, I ran a longer sample to test the run time of each ASR service. The sample I used is: Will legal challenges end the trade war? (16:15)

The results are in Sample 2.

Rank (by Speed)	ASR Service	Run Time
1	LemonFox	17.27s
2	Azure Transcriptions - Transcribe	22.49s
3	OpenAI - gpt-4o-mini-transcribe	27.41s
4	OpenAI - Whisper-1	36.48s
5	OpenAI - gpt-4o-transcribe	36.57s
6	Cloudflare - whisper-large-v3-turbo	44.44s
7	Self-hosted, base, faster-whisper	1m37.62s
X	Cloudflare - whisper	Skipped
X	Self-hosted, large, faster-whisper	Skipped
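Raw run times are hard to compare across samples of different length, so it can help to express them as a speed-up over real time (audio length divided by run time). The sketch below does that for the second sample. The helper names are illustrative, and the run times are copied from the table above with 1m37.62s written as 97.62 seconds.

```python
def to_seconds(clock: str) -> float:
    """Convert a 'm:ss' or 'h:mm:ss' string into seconds."""
    total = 0.0
    for part in clock.split(":"):
        total = total * 60 + float(part)
    return total


def speedup(audio_length: str, run_time_s: float) -> float:
    """Return how many times faster than real time the transcription ran."""
    return to_seconds(audio_length) / run_time_s


# Run times in seconds for the 16:15 sample, from the table above.
RUN_TIMES_S = {
    "LemonFox": 17.27,
    "Azure Transcriptions - Transcribe": 22.49,
    "OpenAI - gpt-4o-mini-transcribe": 27.41,
    "OpenAI - Whisper-1": 36.48,
    "OpenAI - gpt-4o-transcribe": 36.57,
    "Cloudflare - whisper-large-v3-turbo": 44.44,
    "Self-hosted, base, faster-whisper": 97.62,
}

if __name__ == "__main__":
    for name, seconds in RUN_TIMES_S.items():
        print(f"{name:38s} {speedup('16:15', seconds):5.1f}x real time")
```

By this measure LemonFox runs at roughly 56 times real time on this sample, while the self-hosted base model runs at roughly 10 times.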

Ease of Use / Functionality

Service	Model	vtt/srt	Word timestamps	Size limit	Other
OpenAI	gpt-4o-transcribe	✅	✅	25MB	1. Realtime transcription 2. Auto Chunking by VAD
OpenAI	gpt-4o-mini-transcribe	✅	✅	25MB	1. Realtime transcription 2. Auto Chunking
OpenAI	Whisper-1	✅	✅	25MB	1. Realtime transcription 2. Auto Chunking
Azure	N/A	❌	✅	2hr / 250MB	speaker diarization
Cloudflare	whisper-large-v3-turbo	✅	✅	N/A
Cloudflare	whisper	✅	✅	2MB Chunk
Self-hosted	customizable	✅	✅	N/A
Lemonfox	Whisper large-v3	✅	✅	N/A	speaker diarization
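The size limits decide whether a recording can be uploaded in one request. As a rough pre-flight check, the sketch below counts how many pieces a file of a given size would need for the services that state a byte limit. It assumes the limits are measured in units of 1024 * 1024 bytes, and `chunks_needed` is a hypothetical helper. It only counts pieces: the actual split should be made on the audio (for example by time), not on raw bytes.

```python
import math

MB = 1024 * 1024  # assumption: the documented limits are treated as MiB

# Byte limits taken from the Size limit column above.
SIZE_LIMIT_BYTES = {
    "OpenAI": 25 * MB,
    "Azure": 250 * MB,
    "Cloudflare whisper": 2 * MB,
}


def chunks_needed(size_bytes: int, service: str) -> int:
    """Return how many pieces a file must be split into to fit the service's limit."""
    limit = SIZE_LIMIT_BYTES[service]
    return max(1, math.ceil(size_bytes / limit))


if __name__ == "__main__":
    size = 30 * MB  # hypothetical 30 MB recording
    for service in SIZE_LIMIT_BYTES:
        print(f"{service:20s} {chunks_needed(size, service)} piece(s)")
```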

Supported formats

Service	Model	Supported formats
OpenAI	gpt-4o-transcribe	flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
OpenAI	gpt-4o-mini-transcribe	flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
OpenAI	Whisper-1	flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
Azure	N/A	not mentioned
Cloudflare	whisper-large-v3-turbo	not mentioned
Cloudflare	whisper	not mentioned
Self-hosted	customizable	mp3 / steamtables. m4a did not work in my test.
Lemonfox	Whisper large-v3	`mp3`, `wav`, `flac`, `aac`, `opus`, `ogg`, `m4a`, `mp4`, `mpeg`, `mov`, `webm`, and more.
