Hermes 4 - Llama-3.1 70B (Reasoning) vs Claude 3.5 Sonnet (June '24)

Nous Research vs Anthropic — side-by-side benchmark comparison

Metric: Hermes 4 - Llama-3.1 70B (Reasoning) vs Claude 3.5 Sonnet (June '24)

Intelligence Index: 16.0 vs 14.2
Coding Index: 14.4 vs 26.0
Math Index: 68.7
Output speed (tok/s): 92.8 vs 0.0
Blended price ($/1M): $0.20 vs $6.56
Time to first token (s): 0.64 vs 0.00

AIME: 9.7%
AIME 25: 68.7%
Artificial Analysis Coding Index: 14.40 vs 26.00
Artificial Analysis Intelligence Index: 16.00 vs 14.20
Artificial Analysis Math Index: 68.70
GPQA: 69.9% vs 56.0%
HLE: 7.9% vs 3.7%
IFBench: 31.3%
LCR: 6.7%
LiveCodeBench: 65.3%
MATH 500: 69.5%
MMLU Pro: 81.1% vs 75.1%
SciCode: 34.1% vs 31.6%
tau2: 22.5%
TerminalBench Hard: 4.5%
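
For readers who want to work with these figures, the following is a minimal Python sketch that computes the percentage-point gap on the four benchmarks where both models report a score, plus the blended price ratio. The names `SCORES`, `BLENDED_PRICE`, `point_gaps`, and `price_ratio` are illustrative assumptions and are not part of the source page. Rows that list only one value are left out because the page does not show which model they belong to.

```python
# Benchmarks from the comparison where both models report a score (in %).
# Tuple order: (Hermes 4 - Llama-3.1 70B (Reasoning), Claude 3.5 Sonnet (June '24)).
SCORES = {
    "GPQA": (69.9, 56.0),
    "HLE": (7.9, 3.7),
    "MMLU Pro": (81.1, 75.1),
    "SciCode": (34.1, 31.6),
}

# Blended price in dollars per 1M tokens, same tuple order as above.
BLENDED_PRICE = (0.20, 6.56)


def point_gaps(scores):
    """Return Hermes-minus-Claude gaps in percentage points, rounded to 1 decimal."""
    return {name: round(hermes - claude, 1) for name, (hermes, claude) in scores.items()}


def price_ratio(prices):
    """Return how many times more expensive Claude's blended price is than Hermes's."""
    hermes, claude = prices
    return round(claude / hermes, 1)


gaps = point_gaps(SCORES)
ratio = price_ratio(BLENDED_PRICE)

for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f} points")
print(f"Blended price ratio (Claude / Hermes): {ratio}x")
```

A positive gap means Hermes 4 scores higher on that benchmark. The index rows are not included here because they are composite scores rather than percentages.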

Benchmark data from Artificial Analysis.