Hermes 4 - Llama-3.1 70B (Reasoning) vs Qwen3 235B A22B 2507 (Reasoning)

Nous Research vs Alibaba: side-by-side benchmark comparison

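| Metric | Hermes 4 - Llama-3.1 70B (Reasoning) | Qwen3 235B A22B 2507 (Reasoning) |
| --- | --- | --- |
| Intelligence Index | 16.0 | 29.5 |
| Coding Index | 14.4 | 23.2 |
| Math Index | 68.7 | 91.0 |
| Output speed (tok/s) | 92.8 | 62.5 |
| Blended price ($/1M tokens) | $0.20 | $0.84 |
| Time to first token (s) | 0.64 | 1.21 |
| aime 25 | 68.7% | 91.0% |
| artificial analysis coding index | 14.40 | 23.20 |
| artificial analysis intelligence index | 16.00 | 29.50 |
| artificial analysis math index | 68.70 | 91.00 |
| gpqa | 69.9% | 79.0% |
| hle | 7.9% | 15.0% |
| ifbench | 31.3% | 51.2% |
| lcr | 6.7% | 67.0% |
| livecodebench | 65.3% | 78.8% |
| mmlu pro | 81.1% | 84.3% |
| scicode | 34.1% | 42.4% |
| tau2 | 22.5% | 53.2% |
| terminalbench hard | 4.5% | 13.6% |

Scores listed for only one of the two models (the source does not indicate which): aime 94.0%, math 500 98.4%.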

Benchmark data from Artificial Analysis.