Hermes 4 - Llama-3.1 405B (Reasoning) vs Magistral Medium 1

Nous Research vs Mistral — side-by-side benchmark comparison

Metric	Hermes 4 - Llama-3.1 405B (Reasoning)	Magistral Medium 1
Intelligence Index	18.6	18.8
Coding Index	16.0	16.0
Math Index	69.7	40.3
Output speed (tok/s)	38.6	—
Blended price ($/1M tokens)	$1.50	—
Time to first token (s)	0.79	—
AIME	—	70.0%
AIME 2025	69.7%	40.3%
GPQA	72.7%	67.9%
HLE	10.3%	9.5%
IFBench	32.7%	25.1%
LCR	20.7%	0.0%
LiveCodeBench	68.6%	52.7%
MATH-500	—	91.7%
MMLU-Pro	82.9%	75.3%
SciCode	25.2%	29.7%
tau2	22.2%	23.1%
Terminal-Bench Hard	11.4%	9.1%

Benchmark data from Artificial Analysis.