Qwen3.5 0.8B (Non-reasoning) vs Grok 4.20 0309 (Reasoning)

Alibaba vs xAI — side-by-side benchmark comparison

Metric	Qwen3.5 0.8B (Non-reasoning)	Grok 4.20 0309 (Reasoning)
Intelligence Index	9.9	48.5
Coding Index	1.0	42.2
Math Index	—	—
Output speed (tok/s)	96.3	217.8
Blended price (USD per 1M tokens)	$0.02	$3.00
Time to first token (s)	0.26	13.18

Benchmark	Qwen3.5 0.8B (Non-reasoning)	Grok 4.20 0309 (Reasoning)
AIME	—	—
AIME 2025	—	—
GPQA	23.6%	88.5%
Humanity's Last Exam (HLE)	4.9%	30.0%
IFBench	21.6%	82.9%
AA-LCR (long-context reasoning)	6.7%	59.0%
LiveCodeBench	—	—
MATH-500	—	—
MMLU-Pro	—	—
SciCode	2.9%	44.7%
τ²-Bench	65.2%	96.5%
Terminal-Bench Hard	0.0%	40.9%

Benchmark data from Artificial Analysis.
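The table reports time to first token and output speed separately, so neither figure alone gives end-to-end response time. The sketch below combines them under a simple assumption: response time is roughly TTFT plus output tokens divided by output speed, with decoding speed held constant. It also applies the blended price to a total token count as a rough cost proxy. The token counts are hypothetical, the function names are illustrative (not an Artificial Analysis API), and for a reasoning model the number of tokens actually generated may exceed the visible answer length.

```python
# Back-of-the-envelope latency and cost estimates from the table's figures.
# Assumptions (not from Artificial Analysis): decoding speed is constant,
# response time ~= TTFT + output_tokens / output_speed, and the blended
# per-1M-token price is applied to every token as a rough cost proxy.

MODELS = {
    "Qwen3.5 0.8B (Non-reasoning)": {"ttft_s": 0.26, "tok_per_s": 96.3, "usd_per_1m": 0.02},
    "Grok 4.20 0309 (Reasoning)": {"ttft_s": 13.18, "tok_per_s": 217.8, "usd_per_1m": 3.00},
}


def estimated_latency_s(ttft_s: float, tok_per_s: float, output_tokens: int) -> float:
    """Seconds until the last output token, under the constant-speed assumption."""
    return ttft_s + output_tokens / tok_per_s


def estimated_cost_usd(usd_per_1m: float, total_tokens: int) -> float:
    """Cost if the blended price applied uniformly to all tokens (rough proxy)."""
    return usd_per_1m * total_tokens / 1_000_000


def break_even_output_tokens(fast_start: dict, fast_decode: dict) -> float:
    """Output length at which both models finish at the same time.

    fast_start has the lower TTFT; fast_decode has the higher tok/s.
    Solves ttft_a + n / speed_a == ttft_b + n / speed_b for n.
    """
    ttft_gap = fast_decode["ttft_s"] - fast_start["ttft_s"]
    per_token_gap = 1 / fast_start["tok_per_s"] - 1 / fast_decode["tok_per_s"]
    return ttft_gap / per_token_gap


if __name__ == "__main__":
    output_tokens = 500   # hypothetical response length
    total_tokens = 2_000  # hypothetical prompt + response length
    for name, m in MODELS.items():
        latency = estimated_latency_s(m["ttft_s"], m["tok_per_s"], output_tokens)
        cost = estimated_cost_usd(m["usd_per_1m"], total_tokens)
        print(f"{name}: ~{latency:.2f} s, ~${cost:.6f}")

    qwen = MODELS["Qwen3.5 0.8B (Non-reasoning)"]
    grok = MODELS["Grok 4.20 0309 (Reasoning)"]
    print(f"Break-even output length: ~{break_even_output_tokens(qwen, grok):.0f} tokens")
```

With a hypothetical 500-token response, this estimate gives roughly 5.5 s for Qwen3.5 0.8B versus roughly 15.5 s for Grok 4.20. Under the same assumptions, the faster decoding of Grok 4.20 only offsets its longer TTFT beyond roughly 2,230 output tokens.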