Hermes 4 - Llama-3.1 405B (Reasoning) vs Qwen3 4B 2507 (Reasoning)

Nous Research vs Alibaba: side-by-side benchmark comparison

Metric                                    Hermes 4 - Llama-3.1 405B (Reasoning)   Qwen3 4B 2507 (Reasoning)
Intelligence Index                        18.6                                    18.2
Coding Index                              16.0                                    9.5
Math Index                                69.7                                    82.7
Output speed (tok/s)                      38.6                                    0.0
Blended price ($/1M)                      $1.50                                   $0.00
Time to first token (s)                   0.79s                                   0.00s

aime
aime 25                                   69.7%                                   82.7%
artificial analysis coding index          16.00                                   9.50
artificial analysis intelligence index    18.60                                   18.20
artificial analysis math index            69.70                                   82.70
gpqa                                      72.7%                                   66.7%
hle                                       10.3%                                   5.9%
ifbench                                   32.7%                                   49.8%
lcr                                       20.7%                                   37.7%
livecodebench                             68.6%                                   64.1%
math 500
mmlu pro                                  82.9%                                   74.3%
scicode                                   25.2%                                   25.6%
tau2                                      22.2%                                   25.4%
terminalbench hard                        11.4%                                   1.5%

Benchmark data from Artificial Analysis.