Magistral Small 1.2 vs Hermes 4 - Llama-3.1 70B (Reasoning)

Mistral vs Nous Research — side-by-side benchmark comparison

Metric	Magistral Small 1.2	Hermes 4 - Llama-3.1 70B (Reasoning)
Intelligence Index	18.2	16.0
Coding Index	14.8	14.4
Math Index	80.3	68.7
Output speed (tok/s)	111.1	92.8
Blended price ($/1M tokens)	$0.75	$0.20
Time to first token (s)	0.38	0.64
AIME	n/a	n/a
AIME 2025	80.3%	68.7%
GPQA	66.3%	69.9%
HLE	6.1%	7.9%
IFBench	44.4%	31.3%
LCR	16.3%	6.7%
LiveCodeBench	72.3%	65.3%
MATH-500	n/a	n/a
MMLU-Pro	76.8%	81.1%
SciCode	35.2%	34.1%
τ²-Bench	27.8%	22.5%
Terminal-Bench Hard	4.5%	4.5%

Benchmark data from Artificial Analysis.