Qwen3.5 2B (Non-reasoning) vs GPT-4o (ChatGPT)

Alibaba vs OpenAI — side-by-side benchmark comparison

Metric	Qwen3.5 2B (Non-reasoning)	GPT-4o (ChatGPT)
Intelligence Index	14.7	14.1
Coding Index	4.9	—
Math Index	—	—
Output speed (tok/s)	272.0	—
Blended price ($/1M)	$0.04	—
Time to first token (s)	0.27	—
AIME	—	10.3%
AIME 2025	—	—
GPQA	43.8%	51.1%
Humanity's Last Exam (HLE)	4.9%	3.7%
IFBench	29.1%	—
AA-LCR (long-context reasoning)	13.7%	53.0%
LiveCodeBench	—	—
MATH-500	—	79.7%
MMLU-Pro	—	77.3%
SciCode	7.2%	33.4%
τ²-Bench	81.6%	—
Terminal-Bench Hard	3.8%	—
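The output-speed and time-to-first-token rows can be combined into a rough end-to-end latency estimate. The sketch below is an approximation, not an Artificial Analysis formula: it assumes the model waits the measured time to first token and then generates every output token at a constant rate. The 500-token response length is a hypothetical example value, and the function name is illustrative. The GPT-4o (ChatGPT) column has no usable speed figure, so only the Qwen3.5 2B numbers are plugged in.

```python
def estimated_response_time(ttft_s: float, output_tok_per_s: float, output_tokens: int) -> float:
    """Rough wall-clock estimate (hypothetical helper): wait for the first
    token, then generate output_tokens at a constant decode speed."""
    if output_tok_per_s <= 0:
        # A zero or missing speed figure cannot support a latency estimate.
        raise ValueError("output speed must be positive")
    return ttft_s + output_tokens / output_tok_per_s


# Qwen3.5 2B (Non-reasoning) figures from the table; the 500-token
# response length is a hypothetical example value.
qwen_estimate = estimated_response_time(ttft_s=0.27, output_tok_per_s=272.0, output_tokens=500)
print(f"{qwen_estimate:.2f} s")
```

Real decode speed varies with prompt length, load, and response length, so treat the result as an order-of-magnitude guide rather than a measurement.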

Benchmark data from Artificial Analysis.
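The "Blended price ($/1M)" row folds input and output token prices into one number. The sketch below assumes the 3:1 input-to-output weighting that Artificial Analysis documents for its blended price; the function name and the per-token prices in the example are hypothetical and are not taken from the table.

```python
def blended_price_per_million(
    input_price: float,
    output_price: float,
    input_weight: float = 3.0,   # assumed 3:1 input:output weighting
    output_weight: float = 1.0,
) -> float:
    """Weighted average of input and output prices, in USD per 1M tokens."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Hypothetical prices (USD per 1M tokens), chosen only to show the arithmetic.
example = blended_price_per_million(input_price=1.00, output_price=4.00)
print(f"${example:.2f} per 1M tokens")
```

Because input tokens carry three times the weight, a model with cheap input but expensive output can still show a low blended price; workloads that generate long outputs should compare the separate input and output prices instead.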