gpt-oss-120b (high) vs Hermes 4 - Llama-3.1 405B (Reasoning)

OpenAI vs. Nous Research: side-by-side benchmark comparison

Metric	gpt-oss-120b (high)	Hermes 4 - Llama-3.1 405B (Reasoning)
Intelligence Index	33.3	18.6
Coding Index	28.6	16.0
Math Index	93.4	69.7
Output speed (tokens/s)	356.8	38.6
Blended price (USD per 1M tokens)	0.26	1.50
Time to first token (s)	0.51	0.79
AIME	n/a	n/a
AIME 2025	93.4%	69.7%
GPQA	78.2%	72.7%
Humanity's Last Exam (HLE)	18.5%	10.3%
IFBench	69.0%	32.7%
AA-LCR (long-context reasoning)	50.7%	20.7%
LiveCodeBench	87.8%	68.6%
MATH-500	n/a	n/a
MMLU-Pro	80.8%	82.9%
SciCode	38.9%	25.2%
τ²-Bench	65.8%	22.2%
Terminal-Bench Hard	23.5%	11.4%

Benchmark data from Artificial Analysis.
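The speed, latency, and price rows can be combined into a rough per-request estimate: generating N output tokens takes about TTFT + N / output speed, and costs roughly the blended price times the token count. The Python sketch below is illustrative only. The function names, the 1,000-token response length, and the assumptions that output speed stays constant and that the blended price applies uniformly to every token are simplifications chosen for this example. They are not Artificial Analysis methodology.

```python
# Figures copied from the comparison table (Artificial Analysis data).
MODELS = {
    "gpt-oss-120b (high)": {"ttft_s": 0.51, "tok_per_s": 356.8, "usd_per_1m": 0.26},
    "Hermes 4 - Llama-3.1 405B (Reasoning)": {"ttft_s": 0.79, "tok_per_s": 38.6, "usd_per_1m": 1.50},
}


def est_response_time_s(ttft_s: float, tok_per_s: float, output_tokens: int) -> float:
    """Rough wall-clock estimate: time to first token plus decode time.

    Assumption: output speed is constant for the whole response (a simplification).
    """
    return ttft_s + output_tokens / tok_per_s


def est_cost_usd(usd_per_1m: float, total_tokens: int) -> float:
    """Cost at the blended per-million-token price.

    Assumption: the blended price applies to every token, ignoring the real
    input/output split of a given request.
    """
    return usd_per_1m * total_tokens / 1_000_000


if __name__ == "__main__":
    n_out = 1_000  # hypothetical response length in output tokens
    for name, m in MODELS.items():
        t = est_response_time_s(m["ttft_s"], m["tok_per_s"], n_out)
        c = est_cost_usd(m["usd_per_1m"], n_out)
        print(f"{name}: ~{t:.1f} s, ~${c:.5f}")
```

Under these assumptions, a 1,000-token response works out to roughly 3.3 s for gpt-oss-120b (high) versus roughly 26.7 s for Hermes 4, at about $0.00026 versus $0.0015.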