Qwen3.5 2B (Non-reasoning) vs o4-mini (high)

Alibaba vs OpenAI — side-by-side benchmark comparison

Metric	Qwen3.5 2B (Non-reasoning)	o4-mini (high)
Intelligence Index	14.7	33.1
Coding Index	4.9	25.6
Math Index	—	90.7
Output speed (tok/s)	272.0	160.5
Blended price (USD per 1M tokens)	0.04	1.93
Time to first token (s)	0.27	23.07
AIME	—	94.0%
AIME 2025	—	90.7%
GPQA	43.8%	78.4%
Humanity's Last Exam (HLE)	4.9%	17.5%
IFBench	29.1%	68.7%
AA-LCR (long-context reasoning)	13.7%	55.0%
LiveCodeBench	—	85.9%
MATH-500	—	98.9%
MMLU-Pro	—	83.2%
SciCode	7.2%	46.5%
τ²-Bench	81.6%	55.6%
Terminal-Bench Hard	3.8%	15.2%

Benchmark data from Artificial Analysis.
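The speed and latency rows are easier to compare when combined: a rough estimate of how long a reply takes is the time to first token plus the time to stream the remaining output at the measured speed. The Python sketch below uses that first-order model, which ignores network jitter, queueing, and speed variation. It also applies the blended price to every token. This is a simplification, because real billing prices input and output tokens separately. The helper names and the request sizes (500 output tokens, 2,500 total tokens) are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelStats:
    """Figures copied from the comparison table above."""
    name: str
    ttft_s: float               # time to first token, seconds
    output_tok_per_s: float     # output speed, tokens per second
    blended_usd_per_1m: float   # blended price, USD per 1M tokens


def estimated_response_time(m: ModelStats, output_tokens: int) -> float:
    """First-order estimate: wait for the first token, then stream the
    remaining tokens at the measured output speed."""
    return m.ttft_s + output_tokens / m.output_tok_per_s


def estimated_cost(m: ModelStats, total_tokens: int) -> float:
    """Rough cost that charges every token at the blended rate
    (hypothetical simplification; input and output are billed separately)."""
    return m.blended_usd_per_1m * total_tokens / 1_000_000


QWEN = ModelStats("Qwen3.5 2B (Non-reasoning)", 0.27, 272.0, 0.04)
O4_MINI = ModelStats("o4-mini (high)", 23.07, 160.5, 1.93)

if __name__ == "__main__":
    # Hypothetical request: 500 output tokens, 2,500 tokens in total.
    for model in (QWEN, O4_MINI):
        t = estimated_response_time(model, output_tokens=500)
        c = estimated_cost(model, total_tokens=2_500)
        print(f"{model.name}: ~{t:.2f} s, ~${c:.6f}")
```

For 500 output tokens, this model gives about 2.1 s for Qwen3.5 2B and about 26.2 s for o4-mini (high). At this output length, o4-mini's time to first token dominates its total response time, even though its streaming speed is in the same order of magnitude as Qwen's.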