Hermes 4 - Llama-3.1 405B (Non-reasoning) vs GPT-4o (Aug '24)

Nous Research vs OpenAI: side-by-side benchmark comparison

Metric	Hermes 4 - Llama-3.1 405B (Non-reasoning)	GPT-4o (Aug '24)
Intelligence Index	17.6	18.6
Coding Index	18.1	16.6
Math Index	15.3	—
Output speed (tok/s)	40.8	117.5
Blended price ($/1M tokens)	1.50	4.38
Time to first token (s)	0.73	0.60
AIME	—	11.7%
AIME 2025	15.3%	—
GPQA	53.6%	52.1%
HLE	4.2%	2.9%
IFBench	34.8%	36.0%
LCR	20.0%	35.0%
LiveCodeBench	54.6%	31.7%
MATH-500	—	79.5%
MMLU-Pro	72.9%	—
SciCode	34.6%	33.1%
τ²-Bench	26.6%	28.9%
Terminal-Bench Hard	9.8%	8.3%

Benchmark data from Artificial Analysis.
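
The speed and latency rows are easier to compare once they are combined into a single response-time estimate. The sketch below is a rough illustration, not part of the source data: it assumes generation proceeds at the listed output speed for the whole response, and the 500-token response length and the helper names (`estimated_response_time`, `estimate_all`) are hypothetical choices made for this example.

```python
# Time-to-first-token and output-speed figures are copied from the table.
# The 500-token response length is an illustrative assumption only.
MODELS = {
    "Hermes 4 - Llama-3.1 405B (Non-reasoning)": {"ttft_s": 0.73, "tok_per_s": 40.8},
    "GPT-4o (Aug '24)": {"ttft_s": 0.60, "tok_per_s": 117.5},
}


def estimated_response_time(ttft_s: float, tok_per_s: float, output_tokens: int) -> float:
    """Rough end-to-end latency: time to first token plus steady-rate generation."""
    return ttft_s + output_tokens / tok_per_s


def estimate_all(output_tokens: int = 500) -> dict[str, float]:
    """Estimated seconds per model for a response of `output_tokens` tokens."""
    return {
        name: estimated_response_time(m["ttft_s"], m["tok_per_s"], output_tokens)
        for name, m in MODELS.items()
    }


if __name__ == "__main__":
    for name, seconds in estimate_all().items():
        print(f"{name}: {seconds:.1f} s")
```

Under these assumptions, a 500-token response takes roughly 13 seconds on Hermes 4 and roughly 5 seconds on GPT-4o. Real latency varies with provider, load, and prompt length, so treat the result as an order-of-magnitude guide.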