Grok 4.20 0309 (Non-reasoning) vs Qwen3 VL 4B Instruct

xAI vs Alibaba — side-by-side benchmark comparison

Metric	Grok 4.20 0309 (Non-reasoning)	Qwen3 VL 4B Instruct
Intelligence Index	29.7	9.6
Coding Index	25.4	4.6
Math Index	—	37.0
Output speed (tokens/s)	202.6	0.0
Blended price (USD per 1M tokens)	$3.00	$0.00
Time to first token (s)	0.50	0.00
AIME	—	—
AIME 2025	—	37.0%
GPQA	78.5%	37.1%
HLE	22.5%	3.7%
IFBench	47.8%	31.8%
LCR	18.0%	13.0%
LiveCodeBench	—	29.0%
MATH-500	—	—
MMLU-Pro	—	63.4%
SciCode	32.2%	13.7%
Tau2-Bench	69.6%	23.4%
Terminal-Bench Hard	22.0%	0.0%

Benchmark data from Artificial Analysis.