Grok 4.20 0309 (Non-reasoning) vs Qwen3 235B A22B 2507 (Reasoning)

xAI vs Alibaba: side-by-side benchmark comparison

Metric	Grok 4.20 0309 (Non-reasoning)	Qwen3 235B A22B 2507 (Reasoning)
Intelligence Index	29.7	29.5
Coding Index	25.4	23.2
Math Index	—	91.0
Output speed (tokens/s)	202.6	62.5
Blended price ($/1M tokens)	$3.00	$0.84
Time to first token (s)	0.50	1.21
AIME	—	94.0%
AIME 2025	—	91.0%
GPQA	78.5%	79.0%
HLE	22.5%	15.0%
IFBench	47.8%	51.2%
LCR	18.0%	67.0%
LiveCodeBench	—	78.8%
MATH-500	—	98.4%
MMLU-Pro	—	84.3%
SciCode	32.2%	42.4%
Tau2-Bench	69.6%	53.2%
Terminal-Bench Hard	22.0%	13.6%
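
The headline trade-offs in the table above can be read off with a little arithmetic. The following Python sketch is illustrative only: the numbers are copied from the table, the names `scores` and `count_wins` are hypothetical, and it assumes that a higher score is better on every benchmark. Rows where one model has no reported score are left out.

```python
# Scores copied from the table as (Grok 4.20 0309, Qwen3 235B A22B 2507).
# Only benchmarks with a reported score for both models are included.
scores = {
    "GPQA": (78.5, 79.0),
    "HLE": (22.5, 15.0),
    "IFBench": (47.8, 51.2),
    "LCR": (18.0, 67.0),
    "SciCode": (32.2, 42.4),
    "Tau2-Bench": (69.6, 53.2),
    "Terminal-Bench Hard": (22.0, 13.6),
}


def count_wins(pairs):
    """Return (grok_wins, qwen_wins), assuming higher is better everywhere."""
    grok_wins = sum(1 for grok, qwen in pairs.values() if grok > qwen)
    qwen_wins = sum(1 for grok, qwen in pairs.values() if qwen > grok)
    return grok_wins, qwen_wins


# Ratios of the operational metrics, Grok relative to Qwen3.
speed_ratio = 202.6 / 62.5   # output speed in tokens per second
price_ratio = 3.00 / 0.84    # blended price in dollars per 1M tokens
ttft_ratio = 1.21 / 0.50     # Qwen3 time to first token relative to Grok

grok_wins, qwen_wins = count_wins(scores)
print(f"Head-to-head benchmarks led: Grok {grok_wins}, Qwen3 {qwen_wins}")
print(f"Grok output speed: {speed_ratio:.2f}x Qwen3")
print(f"Grok blended price: {price_ratio:.2f}x Qwen3")
print(f"Qwen3 time to first token: {ttft_ratio:.2f}x Grok")
```

On these figures, Grok is roughly 3.2 times faster at generating output and about 3.6 times more expensive per million tokens, while Qwen3 leads on four of the seven benchmarks for which both models have a score.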

Benchmark data from Artificial Analysis.