Hermes 4 - Llama-3.1 70B (Non-reasoning) vs Qwen3 235B A22B 2507 (Reasoning)

Nous Research vs Alibaba — side-by-side benchmark comparison

Metric                                  Hermes 4 - Llama-3.1 70B (Non-reasoning)  Qwen3 235B A22B 2507 (Reasoning)
Intelligence Index                      12.6                                      29.5
Coding Index                            9.2                                       23.2
Math Index                              11.3                                      91.0
Output speed (tok/s)                    94.3                                      62.5
Blended price ($/1M tokens)             $0.20                                     $0.84
Time to first token (s)                 0.61                                      1.21
aime 25                                 11.3%                                     91.0%
artificial analysis coding index        9.20                                      23.20
artificial analysis intelligence index  12.60                                     29.50
artificial analysis math index          11.30                                     91.00
gpqa                                    49.1%                                     79.0%
hle                                     3.6%                                      15.0%
ifbench                                 29.0%                                     51.2%
lcr                                     2.0%                                      67.0%
livecodebench                           26.9%                                     78.8%
mmlu pro                                66.4%                                     84.3%
scicode                                 27.7%                                     42.4%
tau2                                    21.6%                                     53.2%
terminalbench hard                      0.0%                                      13.6%

Reported for one model only (the model is not identified): aime 94.0%, math 500 98.4%.
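
The headline figures above can be compared as simple ratios. The following is a minimal Python sketch, not part of the Artificial Analysis data: the `metrics` dictionary and the `ratio` helper are hypothetical names introduced for illustration, and the numbers are copied from the comparison rows above. Each ratio is the Qwen3 235B A22B 2507 value divided by the Hermes 4 - Llama-3.1 70B value.

```python
# Values copied from the comparison above, as (Hermes 4 70B, Qwen3 235B A22B 2507).
# The dictionary and helper names are illustrative assumptions, not an official API.
metrics = {
    "Intelligence Index": (12.6, 29.5),
    "Coding Index": (9.2, 23.2),
    "Math Index": (11.3, 91.0),
    "Output speed (tok/s)": (94.3, 62.5),
    "Blended price ($/1M)": (0.20, 0.84),
    "Time to first token (s)": (0.61, 1.21),
}


def ratio(metric: str) -> float:
    """Return the Qwen3 value divided by the Hermes 4 value for one metric."""
    hermes, qwen = metrics[metric]
    return qwen / hermes


if __name__ == "__main__":
    # Print every ratio to two decimal places.
    for name in metrics:
        print(f"{name}: {ratio(name):.2f}x")
```

A ratio above 1 means the Qwen3 figure is larger. For the index rows that indicates a higher score, while for price and time to first token it indicates higher cost and higher latency.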

Benchmark data from Artificial Analysis.