
Hermes 4 - Llama-3.1 70B (Reasoning) vs Qwen3 VL 235B A22B (Reasoning)

Nous Research vs Alibaba — side-by-side benchmark comparison

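| Metric | Hermes 4 - Llama-3.1 70B (Reasoning) | Qwen3 VL 235B A22B (Reasoning) |
| --- | --- | --- |
| Intelligence Index | 16.0 | 27.6 |
| Coding Index | 14.4 | 20.9 |
| Math Index | 68.7 | 88.3 |
| Output speed (tok/s) | 92.8 | 35.6 |
| Blended price ($/1M tokens) | 0.20 | 2.17 |
| Time to first token (s) | 0.64 | 5.14 |

| Benchmark | Hermes 4 - Llama-3.1 70B (Reasoning) | Qwen3 VL 235B A22B (Reasoning) |
| --- | --- | --- |
| AIME | | |
| AIME 25 | 68.7% | 88.3% |
| Artificial Analysis Coding Index | 14.40 | 20.90 |
| Artificial Analysis Intelligence Index | 16.00 | 27.60 |
| Artificial Analysis Math Index | 68.70 | 88.30 |
| GPQA | 69.9% | 77.2% |
| HLE | 7.9% | 10.1% |
| IFBench | 31.3% | 56.5% |
| LCR | 6.7% | 58.7% |
| LiveCodeBench | 65.3% | 64.6% |
| MATH 500 | | |
| MMLU Pro | 81.1% | 83.6% |
| SciCode | 34.1% | 39.9% |
| tau2 | 22.5% | 54.1% |
| TerminalBench Hard | 4.5% | 11.4% |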
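
The speed and latency figures above can be combined into a rough per-response time estimate. The sketch below is illustrative only: the function name `estimated_response_time` and the 1,000-token response length are assumptions, not part of the source data, and it assumes the reported output speed holds steady for the whole response, including any reasoning tokens.

```python
# Figures copied from the comparison above:
# (time to first token in seconds, output speed in tokens per second)
MODELS = {
    "Hermes 4 - Llama-3.1 70B (Reasoning)": (0.64, 92.8),
    "Qwen3 VL 235B A22B (Reasoning)": (5.14, 35.6),
}


def estimated_response_time(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Rough wall-clock estimate: wait for the first token, then stream the rest.

    Hypothetical helper for illustration; real latency varies with load and provider.
    """
    return ttft_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    for name, (ttft, speed) in MODELS.items():
        seconds = estimated_response_time(ttft, speed, output_tokens=1000)
        print(f"{name}: ~{seconds:.1f} s for 1,000 output tokens")
```

Under these assumptions, a 1,000-token response takes roughly 11.4 s on Hermes 4 and roughly 33.2 s on Qwen3 VL, so the latency gap in practice is driven by both time to first token and streaming speed.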

Benchmark data from Artificial Analysis.