Olmo 3 7B Think vs Hermes 4 - Llama-3.1 405B (Reasoning)

Allen Institute for AI vs Nous Research — side-by-side benchmark comparison

Metric	Olmo 3 7B Think	Hermes 4 - Llama-3.1 405B (Reasoning)
Intelligence Index	9.4	18.6
Coding Index	7.6	16.0
Math Index	70.7	69.7
Output speed (tok/s)	—	38.6
Blended price ($/1M tokens)	$0.00	$1.50
Time to first token (s)	—	0.79
AIME	—	—
AIME 2025	70.7%	69.7%
GPQA	51.6%	72.7%
HLE	5.7%	10.3%
IFBench	41.5%	32.7%
LCR	0.0%	20.7%
LiveCodeBench	61.7%	68.6%
MATH-500	—	—
MMLU-Pro	65.5%	82.9%
SciCode	21.2%	25.2%
τ²-Bench	0.0%	22.2%
Terminal-Bench Hard	0.8%	11.4%

Benchmark data from Artificial Analysis.