Hermes 3 - Llama-3.1 70B vs Qwen3 VL 4B (Reasoning)

Nous Research vs Alibaba — side-by-side benchmark comparison

Metric	Hermes 3 - Llama-3.1 70B	Qwen3 VL 4B (Reasoning)
Intelligence Index	10.6	13.7
Coding Index	—	6.7
Math Index	—	25.7
Output speed (tok/s)	33.2	—
Blended price ($/1M tokens)	$0.30	—
Time to first token (s)	0.38	—
AIME	2.3%	—
AIME 2025	—	25.7%
GPQA	40.1%	49.4%
HLE	4.1%	4.4%
IFBench	—	36.6%
AA-LCR	—	21.3%
LiveCodeBench	18.8%	32.0%
MATH-500	53.8%	—
MMLU-Pro	57.1%	70.0%
SciCode	23.1%	17.1%
τ²-Bench	—	15.5%
Terminal-Bench Hard	—	1.5%

Benchmark data from Artificial Analysis.