Hermes 4 - Llama-3.1 405B (Reasoning) vs Qwen3.5 4B (Non-reasoning)

Nous Research vs Alibaba — side-by-side benchmark comparison

| Metric | Hermes 4 - Llama-3.1 405B (Reasoning) | Qwen3.5 4B (Non-reasoning) |
|---|---|---|
| Intelligence Index | 18.6 | 22.6 |
| Coding Index | 16.0 | 13.7 |
| Math Index | 69.7 | n/a |
| Output speed (tokens/s) | 38.6 | 210.0 |
| Blended price (USD per 1M tokens) | $1.50 | $0.06 |
| Time to first token (s) | 0.79 | 0.23 |
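The speed, time-to-first-token, and price figures can be combined into a rough per-request estimate. The sketch below is a back-of-envelope model, not a method used by Artificial Analysis: it assumes latency is simply time to first token plus output tokens divided by output speed, and that the blended price can be applied to a total token count. Because the blended price mixes input and output token prices, the cost figure is only an approximation. For the reasoning model, any reasoning tokens generated before the answer would presumably count toward `output_tokens` as well, which the sketch does not model separately. The names `ModelStats`, `estimated_latency_s`, and `estimated_cost_usd` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelStats:
    """Serving figures for one model, copied from the summary table."""
    name: str
    output_tok_per_s: float    # output speed, tokens per second
    ttft_s: float              # time to first token, seconds
    blended_usd_per_1m: float  # blended price, USD per 1M tokens


def estimated_latency_s(m: ModelStats, output_tokens: int) -> float:
    """Rough latency (assumed model): wait for the first token, then stream at the measured rate."""
    return m.ttft_s + output_tokens / m.output_tok_per_s


def estimated_cost_usd(m: ModelStats, total_tokens: int) -> float:
    """Rough cost (assumed model): apply the blended per-token price to a total token count."""
    return m.blended_usd_per_1m * total_tokens / 1_000_000


HERMES = ModelStats("Hermes 4 - Llama-3.1 405B (Reasoning)", 38.6, 0.79, 1.50)
QWEN = ModelStats("Qwen3.5 4B (Non-reasoning)", 210.0, 0.23, 0.06)

for model in (HERMES, QWEN):
    print(f"{model.name}: ~{estimated_latency_s(model, 500):.2f} s for 500 output tokens, "
          f"~${estimated_cost_usd(model, 10_000_000):.2f} per 10M tokens")
```

With these inputs the sketch prints about 13.74 s and $15.00 for Hermes 4 versus about 2.61 s and $0.60 for Qwen3.5 4B, a 25x difference in blended price.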
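| Benchmark | Hermes 4 - Llama-3.1 405B (Reasoning) | Qwen3.5 4B (Non-reasoning) |
|---|---|---|
| AIME | n/a | n/a |
| AIME 2025 | 69.7% | n/a |
| Artificial Analysis Coding Index | 16.00 | 13.70 |
| Artificial Analysis Intelligence Index | 18.60 | 22.60 |
| Artificial Analysis Math Index | 69.70 | n/a |
| GPQA | 72.7% | 71.2% |
| HLE (Humanity's Last Exam) | 10.3% | 7.5% |
| IFBench | 32.7% | 33.3% |
| LCR (long-context reasoning) | 20.7% | 28.3% |
| LiveCodeBench | 68.6% | n/a |
| MATH-500 | n/a | n/a |
| MMLU-Pro | 82.9% | n/a |
| SciCode | 25.2% | 18.3% |
| τ²-Bench (tau2) | 22.2% | 87.7% |
| Terminal-Bench Hard | 11.4% | 11.4% |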
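On the seven benchmarks where both models report a score (GPQA, HLE, IFBench, LCR, SciCode, τ²-Bench, Terminal-Bench Hard), a simple win count splits evenly. The sketch below tallies wins using those paired scores copied from the table; benchmarks with a value for only one model are deliberately left out. The `head_to_head` function is a hypothetical helper, and an unweighted win count ignores margins: the τ²-Bench gap is 65.5 points, while the GPQA gap is 1.5.

```python
# Paired scores (percent) for benchmarks where both models have a value.
# Order in each tuple: (Hermes 4 - Llama-3.1 405B, Qwen3.5 4B).
SCORES: dict[str, tuple[float, float]] = {
    "GPQA": (72.7, 71.2),
    "HLE": (10.3, 7.5),
    "IFBench": (32.7, 33.3),
    "LCR": (20.7, 28.3),
    "SciCode": (25.2, 18.3),
    "tau2": (22.2, 87.7),
    "Terminal-Bench Hard": (11.4, 11.4),
}


def head_to_head(scores: dict[str, tuple[float, float]]) -> dict[str, int]:
    """Count, per model, the benchmarks on which it scores strictly higher."""
    tally = {"hermes": 0, "qwen": 0, "tie": 0}
    for hermes, qwen in scores.values():
        if hermes > qwen:
            tally["hermes"] += 1
        elif qwen > hermes:
            tally["qwen"] += 1
        else:
            tally["tie"] += 1
    return tally


print(head_to_head(SCORES))  # {'hermes': 3, 'qwen': 3, 'tie': 1}
```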

Benchmark data from Artificial Analysis.