Grok 4.20 0309 v2 (Non-reasoning) vs Llama 3.1 Tulu3 405B

xAI vs Allen Institute for AI — side-by-side benchmark comparison

Metric	Grok 4.20 0309 v2 (Non-reasoning)	Llama 3.1 Tulu3 405B
Intelligence Index	29.0	14.1
Coding Index	22.0	—
Math Index	—	—
Output speed (tok/s)	175.2	—
Blended price ($/1M tokens)	$3.00	—
Time to first token (s)	0.47	—
AIME	—	13.3%
AIME 2025	—	—
GPQA	77.6%	51.6%
HLE	24.2%	3.5%
IFBench	49.3%	—
LCR	17.3%	—
LiveCodeBench	—	29.1%
MATH-500	—	77.8%
MMLU-Pro	—	71.6%
SciCode	32.8%	30.2%
tau2	59.9%	—
Terminal-Bench Hard	16.7%	—
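
Only a few rows have a score for both models, so a direct comparison is limited to those. The following is a minimal Python sketch, not part of the original comparison, that computes the point gap on the shared rows. The values are copied by hand from the table; the dictionary layout and the helper name `shared_gaps` are illustrative assumptions, and only a subset of rows is included.

```python
# Scores copied from the table as (Grok 4.20 0309 v2, Llama 3.1 Tulu3 405B).
# None stands for a missing value in the table.
# Labels are capitalized here for readability.
scores = {
    "Intelligence Index": (29.0, 14.1),
    "GPQA": (77.6, 51.6),
    "HLE": (24.2, 3.5),
    "SciCode": (32.8, 30.2),
    "IFBench": (49.3, None),        # only Grok has a score
    "LiveCodeBench": (None, 29.1),  # only Tulu3 has a score
}


def shared_gaps(table):
    """Return {row: grok - tulu} for rows where both models have a score."""
    return {
        name: round(grok - tulu, 1)
        for name, (grok, tulu) in table.items()
        if grok is not None and tulu is not None
    }


gaps = shared_gaps(scores)
for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f} points")
```

Rows with a missing value on either side are skipped rather than treated as zero, so a missing score is never counted as a loss.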

Benchmark data from Artificial Analysis.