Hermes 4 - Llama-3.1 70B (Reasoning) vs Qwen3 4B 2507 (Reasoning)

Nous Research vs Alibaba — side-by-side benchmark comparison

Metric                                  Hermes 4 - Llama-3.1 70B (Reasoning)  Qwen3 4B 2507 (Reasoning)
Intelligence Index                      16.0                                  18.2
Coding Index                            14.4                                  9.5
Math Index                              68.7                                  82.7
Output speed (tok/s)                    92.8                                  0.0
Blended price ($/1M tokens)             $0.20                                 $0.00
Time to first token (s)                 0.64s                                 0.00s

Benchmark                               Hermes 4 - Llama-3.1 70B (Reasoning)  Qwen3 4B 2507 (Reasoning)
aime                                    -                                     -
aime 25                                 68.7%                                 82.7%
artificial analysis coding index        14.40                                 9.50
artificial analysis intelligence index  16.00                                 18.20
artificial analysis math index          68.70                                 82.70
gpqa                                    69.9%                                 66.7%
hle                                     7.9%                                  5.9%
ifbench                                 31.3%                                 49.8%
lcr                                     6.7%                                  37.7%
livecodebench                           65.3%                                 64.1%
math 500                                -                                     -
mmlu pro                                81.1%                                 74.3%
scicode                                 34.1%                                 25.6%
tau2                                    22.5%                                 25.4%
terminalbench hard                      4.5%                                  1.5%

Benchmark data from Artificial Analysis.