Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) vs DeepSeek R1 Distill Qwen 32B

NVIDIA vs DeepSeek — side-by-side benchmark comparison

Metric	Llama 3.1 Nemotron Ultra 253B v1 (Reasoning)	DeepSeek R1 Distill Qwen 32B
Intelligence Index	15.0	17.2
Coding Index	13.1	—
Math Index	63.7	63.0
Output speed (tok/s)	52.3	—
Blended price ($/1M tokens)	$0.90	—
Time to first token (s)	0.76	—
AIME	74.7%	68.7%
AIME 2025	63.7%	63.0%
GPQA	72.8%	61.5%
HLE	8.1%	5.5%
IFBench	38.2%	22.9%
LCR	7.3%	9.7%
LiveCodeBench	64.1%	27.0%
MATH-500	95.2%	94.1%
MMLU-Pro	82.5%	73.9%
SciCode	34.7%	37.6%
Tau2	11.4%	—
Terminal-Bench Hard	2.3%	—

Benchmark data from Artificial Analysis.
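
To read the per-benchmark rows as head-to-head gaps, the percentages can be subtracted directly. The snippet below is a minimal sketch, not part of the original comparison: the `SCORES` dictionary and the `deltas` and `leaders` helpers are hypothetical names introduced for illustration, and only rows where both models have a score are included.

```python
# Scores in percent, copied from the table: (Nemotron Ultra 253B, R1 Distill Qwen 32B).
# Rows with a missing value for either model (Tau2, Terminal-Bench Hard) are left out.
SCORES = {
    "AIME": (74.7, 68.7),
    "AIME 2025": (63.7, 63.0),
    "GPQA": (72.8, 61.5),
    "HLE": (8.1, 5.5),
    "IFBench": (38.2, 22.9),
    "LCR": (7.3, 9.7),
    "LiveCodeBench": (64.1, 27.0),
    "MATH-500": (95.2, 94.1),
    "MMLU-Pro": (82.5, 73.9),
    "SciCode": (34.7, 37.6),
}


def deltas(scores):
    """Return {benchmark: Nemotron minus R1 Distill}, in percentage points."""
    return {name: round(a - b, 1) for name, (a, b) in scores.items()}


def leaders(scores):
    """Split benchmark names by which model has the higher score."""
    gaps = deltas(scores)
    nemotron = [name for name, gap in gaps.items() if gap > 0]
    distill = [name for name, gap in gaps.items() if gap < 0]
    return nemotron, distill


if __name__ == "__main__":
    for name, gap in sorted(deltas(SCORES).items(), key=lambda kv: -kv[1]):
        print(f"{name:15s} {gap:+.1f} pp")
```

Under these assumptions the gaps are simple differences in percentage points. They say nothing about statistical significance or about how each benchmark was run.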