Grok 4.20 0309 v2 (Non-reasoning) vs Qwen3 4B 2507 (Reasoning)

xAI vs Alibaba — side-by-side benchmark comparison

Metric                     Grok 4.20 0309 v2 (Non-reasoning)   Qwen3 4B 2507 (Reasoning)
Intelligence Index         29.0                                18.2
Coding Index               22.0                                9.5
Output speed (tok/s)       175.2                               0.0
Blended price ($/1M)       $3.00                               $0.00
Time to first token (s)    0.47s                               0.00s
gpqa                       77.6%                               66.7%
hle                        24.2%                               5.9%
ifbench                    49.3%                               49.8%
lcr                        17.3%                               37.7%
scicode                    32.8%                               25.6%
tau2                       59.9%                               25.4%
terminalbench hard         16.7%                               1.5%

Scores listed for only one of the two models (the model is not identified):

Math Index                 82.7
aime 25                    82.7%
livecodebench              64.1%
mmlu pro                   74.3%

No score listed for either model: aime, math 500

Benchmark data from Artificial Analysis.