Claude Sonnet 4.6 (Non-reasoning, Low Effort) vs Hermes 4 - Llama-3.1 405B (Reasoning)

Anthropic vs Nous Research — side-by-side benchmark comparison

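Metric: Claude Sonnet 4.6 (Non-reasoning, Low Effort) / Hermes 4 - Llama-3.1 405B (Reasoning)
Intelligence Index: 42.6 / 18.6
Coding Index: 43.0 / 16.0
Math Index: 69.7 (one value listed; model not indicated)
Output speed (tok/s): 54.9 / 38.6
Blended price ($/1M tokens): $6.56 / $1.50
Time to first token (s): 1.13 / 0.79

AIME: (no value listed)
AIME 25: 69.7% (one value listed; model not indicated)
Artificial Analysis Coding Index: 43.00 / 16.00
Artificial Analysis Intelligence Index: 42.60 / 18.60
Artificial Analysis Math Index: 69.70 (one value listed; model not indicated)
GPQA: 79.7% / 72.7%
HLE: 10.8% / 10.3%
IFBench: 42.4% / 32.7%
LCR: 58.7% / 20.7%
LiveCodeBench: 68.6% (one value listed; model not indicated)
MATH-500: (no value listed)
MMLU-Pro: 82.9% (one value listed; model not indicated)
SciCode: 44.1% / 25.2%
Tau2: 78.9% / 22.2%
Terminal-Bench Hard: 42.4% / 11.4%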

Benchmark data from Artificial Analysis.