Hermes 4 - Llama-3.1 70B (Reasoning) vs Claude Opus 4.6 (Adaptive Reasoning, Max Effort)

Nous Research vs Anthropic — side-by-side benchmark comparison

Metric | Hermes 4 - Llama-3.1 70B (Reasoning) | Claude Opus 4.6 (Adaptive Reasoning, Max Effort)
Intelligence Index | 16.0 | 52.9
Coding Index | 14.4 | 48.1
Math Index | 68.7 | n/a
Output speed (tok/s) | 92.8 | 54.8
Blended price ($/1M) | $0.20 | $10.94
Time to first token (s) | 0.64 | 11.69
AIME | n/a | n/a
AIME 25 | 68.7% | n/a
Artificial Analysis Coding Index | 14.40 | 48.10
Artificial Analysis Intelligence Index | 16.00 | 52.90
Artificial Analysis Math Index | 68.70 | n/a
GPQA | 69.9% | 89.6%
HLE | 7.9% | 36.7%
IFBench | 31.3% | 53.1%
LCR | 6.7% | 70.7%
LiveCodeBench | 65.3% | n/a
MATH 500 | n/a | n/a
MMLU Pro | 81.1% | n/a
SciCode | 34.1% | 51.9%
Tau2 | 22.5% | 92.1%
TerminalBench Hard | 4.5% | 46.2%
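
The speed, price, and latency rows are easier to compare as ratios than as raw numbers. The following is a minimal sketch of that arithmetic, using only the values listed in the table; the `metrics` dictionary and the `ratio` helper are hypothetical names introduced for illustration and are not part of any Artificial Analysis tooling.

```python
# Values copied from the comparison table, in the order
# (Hermes 4 - Llama-3.1 70B, Claude Opus 4.6).
# The dictionary layout and helper name are illustrative assumptions.
metrics = {
    "Output speed (tok/s)": (92.8, 54.8),
    "Blended price ($/1M)": (0.20, 10.94),
    "Time to first token (s)": (0.64, 11.69),
}


def ratio(pair):
    """Return Claude's value divided by Hermes's value."""
    hermes, claude = pair
    return claude / hermes


# Compute one ratio per metric and print it with two decimals.
ratios = {name: ratio(pair) for name, pair in metrics.items()}
for name, value in ratios.items():
    print(f"{name}: Claude / Hermes = {value:.2f}x")
```

A ratio above 1 means the Claude figure is larger. For price and time to first token, larger means more expensive or slower; for output speed, larger means faster.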

Benchmark data from Artificial Analysis.