Hermes 4 - Llama-3.1 405B (Non-reasoning) vs QwQ 32B

Nous Research vs Alibaba — side-by-side benchmark comparison

Metric	Hermes 4 - Llama-3.1 405B (Non-reasoning)	QwQ 32B
Intelligence Index	17.6	19.7
Coding Index	18.1	—
Math Index	15.3	29.0
Output speed (tok/s)	40.8	31.3
Blended price ($/1M tokens)	$1.50	$0.74
Time to first token (s)	0.73	0.45
AIME	—	78.0%
AIME 2025	15.3%	29.0%
GPQA	53.6%	59.3%
HLE (Humanity's Last Exam)	4.2%	8.2%
IFBench	34.8%	38.8%
LCR	20.0%	25.0%
LiveCodeBench	54.6%	63.1%
MATH-500	—	95.7%
MMLU-Pro	72.9%	76.4%
SciCode	34.6%	35.8%
Tau2	26.6%	—
Terminal-Bench Hard	9.8%	—

Benchmark data from Artificial Analysis.
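
To read the table as per-benchmark gaps rather than raw scores, the percentage rows can be subtracted pairwise. The following is a minimal sketch, not part of the original comparison: the `scores` dictionary and the `point_gaps` helper are hypothetical names, the values are copied by hand from the table above (benchmark names in their usual capitalization), and rows where either model has no score are skipped rather than guessed.

```python
# Scores copied by hand from the table: (Hermes 4 405B, QwQ 32B), in percent.
# None marks a value the table does not report.
scores = {
    "AIME 2025": (15.3, 29.0),
    "GPQA": (53.6, 59.3),
    "HLE": (4.2, 8.2),
    "IFBench": (34.8, 38.8),
    "LCR": (20.0, 25.0),
    "LiveCodeBench": (54.6, 63.1),
    "MMLU-Pro": (72.9, 76.4),
    "SciCode": (34.6, 35.8),
    "Tau2": (26.6, None),
}


def point_gaps(table):
    """Return {benchmark: QwQ minus Hermes} in percentage points.

    Rows with a missing score for either model are left out.
    """
    return {
        name: round(qwq - hermes, 1)
        for name, (hermes, qwq) in table.items()
        if hermes is not None and qwq is not None
    }


gaps = point_gaps(scores)

# Print the largest gaps first; a positive number favors QwQ 32B.
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} {gap:+.1f}")
```

The gaps are percentage points, not relative improvements, and they say nothing about the price, speed, or latency rows, which use different units.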