Hermes 3 - Llama-3.1 70B vs Qwen3 235B A22B 2507 Instruct

Nous Research vs Alibaba — side-by-side benchmark comparison

Metric	Hermes 3 - Llama-3.1 70B	Qwen3 235B A22B 2507 Instruct
Intelligence Index	10.6	25.0
Coding Index	—	22.1
Math Index	—	71.7
Output speed (tok/s)	33.2	57.0
Blended price ($/1M)	$0.30	$0.36
Time to first token (s)	0.38	1.34
AIME	2.3%	71.7%
AIME 25	—	71.7%
GPQA	40.1%	75.3%
HLE	4.1%	10.6%
IFBench	—	46.1%
LCR	—	31.2%
LiveCodeBench	18.8%	52.4%
MATH 500	53.8%	98.0%
MMLU Pro	57.1%	82.8%
SciCode	23.1%	36.0%
Tau2	—	33.3%
TerminalBench Hard	—	15.2%
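
The headline rows are easier to compare as ratios. The snippet below is a minimal sketch, not part of the original comparison: it copies four rows from the table and divides the Qwen3 value by the Hermes 3 value. The names `ROWS`, `ratio`, and `summarize` are hypothetical helpers introduced only for this illustration.

```python
# Values copied from the table: (Hermes 3 - Llama-3.1 70B, Qwen3 235B A22B 2507 Instruct).
# Only rows where both models have a value are included.
ROWS = {
    "Intelligence Index": (10.6, 25.0),
    "Output speed (tok/s)": (33.2, 57.0),
    "Blended price ($/1M)": (0.30, 0.36),
    "Time to first token (s)": (0.38, 1.34),
}


def ratio(hermes: float, qwen: float) -> float:
    """Return the Qwen3 value divided by the Hermes 3 value."""
    return qwen / hermes


def summarize(rows: dict) -> dict:
    """Map each metric name to its Qwen3 / Hermes 3 ratio, rounded to 2 places."""
    return {name: round(ratio(h, q), 2) for name, (h, q) in rows.items()}


if __name__ == "__main__":
    for name, r in summarize(ROWS).items():
        print(f"{name}: Qwen3 / Hermes 3 = {r:.2f}x")
```

A ratio above 1 means the Qwen3 figure is larger. That is favorable for the Intelligence Index and output speed, but unfavorable for price and time to first token, where lower is better.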

Benchmark data from Artificial Analysis.