GPT-4o Realtime (Dec '24) vs Qwen3 4B (Non-reasoning)

OpenAI vs Alibaba — side-by-side benchmark comparison

Metric	GPT-4o Realtime (Dec '24)	Qwen3 4B (Non-reasoning)
Intelligence Index	—	12.5
Coding Index	—	—
Math Index	—	—
Output speed (tok/s)	—	103.5
Blended price ($/1M tokens)	—	$0.19
Time to first token (s)	—	1.02s
AIME	—	21.3%
AIME 2025	—	—
GPQA	—	39.8%
HLE	—	3.7%
IFBench	—	—
LCR	—	—
LiveCodeBench	—	23.3%
MATH-500	—	84.3%
MMLU-Pro	—	58.6%
SciCode	—	16.7%
tau2-Bench	—	—
Terminal-Bench Hard	—	—

Benchmark data from Artificial Analysis.