
Hermes 4 - Llama-3.1 70B (Reasoning) vs Qwen3.5 0.8B (Non-reasoning)

Nous Research vs Alibaba — side-by-side benchmark comparison

| Metric | Hermes 4 - Llama-3.1 70B (Reasoning) | Qwen3.5 0.8B (Non-reasoning) |
|---|---|---|
| Intelligence Index | 16.0 | 9.9 |
| Coding Index | 14.4 | 1.0 |
| Math Index | 68.7 | n/a |
| Output speed (tokens/s) | 92.8 | 96.3 |
| Blended price (USD per 1M tokens) | $0.20 | $0.02 |
| Time to first token (s) | 0.64 | 0.26 |
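
The summary metrics can be combined into rough per-request estimates. The sketch below is a simplification and not the data source's methodology. It assumes end-to-end latency is time to first token plus output length divided by output speed, and that a workload's cost is its token count times the blended price. The helper names `response_time_s` and `workload_cost_usd`, the 500-token answer, and the 10M-token workload are hypothetical values chosen for illustration.

```python
# Hypothetical helpers: a simplified latency/cost model, not the
# benchmark provider's own methodology.


def response_time_s(ttft_s: float, output_tokens: int, tokens_per_s: float) -> float:
    """Rough latency: wait for the first token, then stream the rest at a constant rate."""
    return ttft_s + output_tokens / tokens_per_s


def workload_cost_usd(total_tokens: int, blended_price_per_m: float) -> float:
    """Cost if the blended $/1M-token rate is applied uniformly to every token (assumption)."""
    return total_tokens / 1_000_000 * blended_price_per_m


# Figures copied from the summary table above.
MODELS = {
    "Hermes 4 - Llama-3.1 70B (Reasoning)": {"ttft": 0.64, "tps": 92.8, "price": 0.20},
    "Qwen3.5 0.8B (Non-reasoning)": {"ttft": 0.26, "tps": 96.3, "price": 0.02},
}

if __name__ == "__main__":
    for name, m in MODELS.items():
        t = response_time_s(m["ttft"], 500, m["tps"])  # assumed 500-token answer
        c = workload_cost_usd(10_000_000, m["price"])  # assumed 10M-token workload
        print(f"{name}: ~{t:.2f} s per 500-token answer, ${c:.2f} per 10M tokens")
```

Under these assumptions the two models stream at similar speeds, so the 0.38 s time-to-first-token gap matters most for short answers, while the 10x price difference dominates cost at scale. A reasoning model typically emits reasoning tokens before its final answer, so the Hermes 4 latency estimate is likely optimistic.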
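
| Benchmark | Hermes 4 - Llama-3.1 70B (Reasoning) | Qwen3.5 0.8B (Non-reasoning) |
|---|---|---|
| AIME | n/a | n/a |
| AIME 2025 | 68.7% | n/a |
| Artificial Analysis Coding Index | 14.40 | 1.00 |
| Artificial Analysis Intelligence Index | 16.00 | 9.90 |
| Artificial Analysis Math Index | 68.70 | n/a |
| GPQA | 69.9% | 23.6% |
| Humanity's Last Exam (HLE) | 7.9% | 4.9% |
| IFBench | 31.3% | 21.6% |
| AA-LCR (long-context reasoning) | 6.7% | 6.7% |
| LiveCodeBench | 65.3% | n/a |
| MATH-500 | n/a | n/a |
| MMLU-Pro | 81.1% | n/a |
| SciCode | 34.1% | 2.9% |
| τ²-Bench | 22.5% | 65.2% |
| Terminal-Bench Hard | 4.5% | 0.0% |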
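
Only seven benchmarks in the table report a score for both models. The sketch below tallies head-to-head results on those shared benchmarks and ranks them by the size of the gap. The `SHARED` dictionary and the `tally` helper are hypothetical names introduced here; the scores are copied from the benchmark rows above.

```python
# Scores (percent) copied from the benchmark table, as
# (Hermes 4 - Llama-3.1 70B, Qwen3.5 0.8B). Only benchmarks with a
# value for both models are included; the indices are excluded.
SHARED = {
    "GPQA": (69.9, 23.6),
    "HLE": (7.9, 4.9),
    "IFBench": (31.3, 21.6),
    "AA-LCR": (6.7, 6.7),
    "SciCode": (34.1, 2.9),
    "τ²-Bench": (22.5, 65.2),
    "Terminal-Bench Hard": (4.5, 0.0),
}


def tally(scores: dict[str, tuple[float, float]]) -> tuple[int, int, int]:
    """Return (wins for the first model, wins for the second model, ties)."""
    wins_a = wins_b = ties = 0
    for a, b in scores.values():
        if a > b:
            wins_a += 1
        elif b > a:
            wins_b += 1
        else:
            ties += 1
    return wins_a, wins_b, ties


if __name__ == "__main__":
    hermes, qwen, ties = tally(SHARED)
    print(f"Hermes 4 leads on {hermes}, Qwen3.5 on {qwen}, ties: {ties}")
    # Largest absolute gaps first; positive values favor Hermes 4.
    ranked = sorted(SHARED.items(), key=lambda kv: abs(kv[1][0] - kv[1][1]), reverse=True)
    for name, (a, b) in ranked:
        print(f"{name}: {a - b:+.1f} points")
```

A simple win count treats every benchmark equally and ignores margins, so it complements rather than replaces the Intelligence Index. On these rows Hermes 4 leads on five benchmarks, Qwen3.5 leads only on τ²-Bench, and AA-LCR is tied.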

Benchmark data from Artificial Analysis.