Qwen3 235B A22B 2507 Instruct vs Qwen3 4B 2507 (Reasoning)

Side-by-side benchmark comparison of two Alibaba models

Metric	Qwen3 235B A22B 2507 Instruct	Qwen3 4B 2507 (Reasoning)
Intelligence Index	25.0	18.2
Coding Index	22.1	9.5
Math Index	71.7	82.7
Output speed (tok/s)	57.0	—
Blended price ($/1M tokens)	$0.36	—
Time to first token (s)	1.34	—
AIME	71.7%	—
AIME 2025	71.7%	82.7%
GPQA	75.3%	66.7%
Humanity's Last Exam (HLE)	10.6%	5.9%
IFBench	46.1%	49.8%
LCR (long-context reasoning)	31.2%	37.7%
LiveCodeBench	52.4%	64.1%
MATH-500	98.0%	—
MMLU-Pro	82.8%	74.3%
SciCode	36.0%	25.6%
τ²-Bench	33.3%	25.4%
Terminal-Bench Hard	15.2%	1.5%

Benchmark data from Artificial Analysis.
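To read the per-benchmark rows at a glance, here is a minimal Python sketch that copies the percentage scores from the table into a dictionary and reports, for every benchmark where both models have a score, which model is ahead and by how many percentage points. The `SCORES` dictionary and the `compare` helper are hypothetical names used only for illustration; a row with a dash in either column is treated as missing and skipped.

```python
# Percentage scores copied from the comparison table:
# (Qwen3 235B A22B 2507 Instruct, Qwen3 4B 2507 (Reasoning)).
# None marks a benchmark with no reported score for that model.
SCORES = {
    "AIME": (71.7, None),
    "AIME 2025": (71.7, 82.7),
    "GPQA": (75.3, 66.7),
    "HLE": (10.6, 5.9),
    "IFBench": (46.1, 49.8),
    "LCR": (31.2, 37.7),
    "LiveCodeBench": (52.4, 64.1),
    "MATH-500": (98.0, None),
    "MMLU-Pro": (82.8, 74.3),
    "SciCode": (36.0, 25.6),
    "tau2-bench": (33.3, 25.4),
    "Terminal-Bench Hard": (15.2, 1.5),
}


def compare(scores):
    """Return (benchmark, leader, gap in percentage points) for each
    benchmark where both models have a score; others are skipped."""
    rows = []
    for name, (big, small) in scores.items():
        if big is None or small is None:
            continue  # no head-to-head comparison possible
        if big > small:
            leader = "235B Instruct"
        elif small > big:
            leader = "4B Reasoning"
        else:
            leader = "tie"
        # Round to one decimal to hide floating-point noise in the subtraction.
        rows.append((name, leader, round(abs(big - small), 1)))
    return rows


if __name__ == "__main__":
    results = compare(SCORES)
    for name, leader, gap in results:
        print(f"{name:<20} {leader:<14} {gap:>5.1f} pp")
    wins = sum(1 for _, leader, _ in results if leader == "235B Instruct")
    print(f"235B Instruct leads on {wins} of {len(results)} shared benchmarks")
```

Run as a script, it ends with `235B Instruct leads on 6 of 10 shared benchmarks`; the 4B reasoning model is ahead on AIME 2025, IFBench, LCR and LiveCodeBench. The gaps are plain differences in percentage points, so they do not account for how hard or how noisy each benchmark is.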