gpt-oss-120b (high) vs Hermes 4 - Llama-3.1 405B (Reasoning)

OpenAI vs Nous Research — side-by-side benchmark comparison

Metric                                  gpt-oss-120b (high)  Hermes 4 - Llama-3.1 405B (Reasoning)
Intelligence Index                      33.3                 18.6
Coding Index                            28.6                 16.0
Math Index                              93.4                 69.7
Output speed (tok/s)                    356.8                38.6
Blended price ($/1M)                    $0.26                $1.50
Time to first token (s)                 0.51s                0.79s
aime
aime 25                                 93.4%                69.7%
artificial analysis coding index        28.60                16.00
artificial analysis intelligence index  33.30                18.60
artificial analysis math index          93.40                69.70
gpqa                                    78.2%                72.7%
hle                                     18.5%                10.3%
ifbench                                 69.0%                32.7%
lcr                                     50.7%                20.7%
livecodebench                           87.8%                68.6%
math 500
mmlu pro                                80.8%                82.9%
scicode                                 38.9%                25.2%
tau2                                    65.8%                22.2%
terminalbench hard                      23.5%                11.4%
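
The speed, price, and latency figures are easier to compare as ratios. The following is a minimal sketch of that arithmetic, using only the three operational figures listed for each model; the dictionary keys and the helper name `ratio` are illustrative assumptions, not part of any Artificial Analysis tooling.

```python
# Operational figures copied from the comparison data.
# Key names and the `ratio` helper are hypothetical, chosen for illustration.
summary = {
    "output_speed_tok_s": {"gpt_oss_120b": 356.8, "hermes_4_405b": 38.6},
    "blended_price_usd_per_1m": {"gpt_oss_120b": 0.26, "hermes_4_405b": 1.50},
    "time_to_first_token_s": {"gpt_oss_120b": 0.51, "hermes_4_405b": 0.79},
}


def ratio(metric: str, numerator: str, denominator: str) -> float:
    """Return numerator / denominator for one metric, rounded to 2 places."""
    row = summary[metric]
    return round(row[numerator] / row[denominator], 2)


# How many times faster gpt-oss-120b generates output.
speed_ratio = ratio("output_speed_tok_s", "gpt_oss_120b", "hermes_4_405b")
# How many times more expensive Hermes 4 is per 1M tokens.
price_ratio = ratio("blended_price_usd_per_1m", "hermes_4_405b", "gpt_oss_120b")
# How many times longer Hermes 4 takes to emit its first token.
ttft_ratio = ratio("time_to_first_token_s", "hermes_4_405b", "gpt_oss_120b")

print(speed_ratio, price_ratio, ttft_ratio)
```

On these figures, gpt-oss-120b (high) generates output roughly 9.2 times faster, Hermes 4 costs roughly 5.8 times more per 1M tokens, and its time to first token is roughly 1.5 times longer.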

Benchmark data from Artificial Analysis.