Grok 4.20 0309 (Non-reasoning) vs Qwen3 VL 8B Instruct

xAI vs Alibaba: side-by-side benchmark comparison

Metric	Grok 4.20 0309 (Non-reasoning)	Qwen3 VL 8B Instruct
Intelligence Index	29.7	14.3
Coding Index	25.4	7.3
Math Index	—	27.3
Output speed (tok/s)	202.6	143.8
Blended price (USD per 1M tokens)	3.00	0.31
Time to first token (s)	0.50	0.93
AIME	—	—
AIME 2025	—	27.3%
GPQA	78.5%	42.7%
HLE	22.5%	2.9%
IFBench	47.8%	32.3%
LCR	18.0%	15.3%
LiveCodeBench	—	33.2%
MATH-500	—	—
MMLU-Pro	—	68.6%
SciCode	32.2%	17.4%
τ²-Bench	69.6%	29.2%
Terminal-Bench Hard	22.0%	2.3%

Benchmark data from Artificial Analysis.
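
The summary rows are easier to compare as ratios than as raw numbers. The sketch below is a minimal illustration, not part of the original page: it divides each Grok 4.20 0309 value by the corresponding Qwen3 VL 8B Instruct value, using figures copied from the table. The names `summary` and `ratio` are hypothetical helpers introduced only for this example, and cells the table reports as missing are represented as `None`.

```python
# Values copied from the summary rows of the comparison table.
# Each tuple is (Grok 4.20 0309, Qwen3 VL 8B Instruct).
# None marks a cell the table reports as missing.
summary = {
    "Intelligence Index": (29.7, 14.3),
    "Coding Index": (25.4, 7.3),
    "Math Index": (None, 27.3),
    "Output speed": (202.6, 143.8),        # tokens per second
    "Blended price": (3.00, 0.31),         # USD per 1M tokens
    "Time to first token": (0.50, 0.93),   # seconds
}


def ratio(grok, qwen):
    """Return grok / qwen, or None when either value is missing."""
    if grok is None or qwen is None:
        return None
    return grok / qwen


# Hypothetical helper output: one Grok-to-Qwen ratio per metric.
ratios = {name: ratio(g, q) for name, (g, q) in summary.items()}

for name, r in ratios.items():
    print(f"{name}: " + ("n/a" if r is None else f"{r:.2f}x"))
```

A ratio above 1 means the Grok value is larger, which is favorable for the index scores and output speed but unfavorable for price and time to first token, where lower is better.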