Grok 4.3 (medium) vs DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning)

xAI vs Nous Research — side-by-side benchmark comparison

Metric	Grok 4.3 (medium)	DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning)
Intelligence Index	48.8	7.6
Coding Index	35.1	—
Math Index	—	—
Output speed (tok/s)	112.5	0.0
Blended price ($/1M tokens)	$1.56	$0.00
Time to first token (s)	17.68	0.00
AIME	—	0.0%
AIME 2025	—	—
GPQA	89.0%	27.0%
HLE	28.1%	4.3%
IFBench	83.3%	—
LCR	65.0%	—
LiveCodeBench	—	8.5%
MATH-500	—	21.8%
MMLU-Pro	—	36.5%
SciCode	44.6%	9.1%
τ²-Bench	91.2%	—
Terminal-Bench Hard	30.3%	—

Benchmark data from Artificial Analysis.