Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) vs Hermes 4 - Llama-3.1 70B (Non-reasoning)

NVIDIA vs Nous Research — side-by-side benchmark comparison

| Metric | Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | Hermes 4 - Llama-3.1 70B (Non-reasoning) |
| --- | --- | --- |
| Intelligence Index | 15.0 | 12.6 |
| Coding Index | 13.1 | 9.2 |
| Math Index | 63.7 | 11.3 |
| Output speed (tok/s) | 52.3 | 94.3 |
| Blended price ($/1M) | $0.90 | $0.20 |
| Time to first token (s) | 0.76 | 0.61 |
| AIME | 74.7% | |
| AIME 25 | 63.7% | 11.3% |
| Artificial Analysis Coding Index | 13.10 | 9.20 |
| Artificial Analysis Intelligence Index | 15.00 | 12.60 |
| Artificial Analysis Math Index | 63.70 | 11.30 |
| GPQA | 72.8% | 49.1% |
| HLE | 8.1% | 3.6% |
| IFBench | 38.2% | 29.0% |
| LCR | 7.3% | 2.0% |
| LiveCodeBench | 64.1% | 26.9% |
| MATH 500 | 95.2% | |
| MMLU Pro | 82.5% | 66.4% |
| SciCode | 34.7% | 27.7% |
| Tau2 | 11.4% | 21.6% |
| TerminalBench Hard | 2.3% | 0.0% |
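
To make the trade-offs in the figures above easier to read, the following is a minimal Python sketch that computes ratios and percentage-point gaps between the two models. The values are copied from the comparison; the dictionary keys and the helper names `ratio` and `point_gap` are hypothetical and chosen only for illustration.

```python
# Values copied from the comparison above, ordered as
# (Llama 3.1 Nemotron Ultra 253B v1, Hermes 4 - Llama-3.1 70B).
# The key names are illustrative assumptions, not official metric identifiers.
METRICS = {
    "output_speed_tok_s": (52.3, 94.3),
    "blended_price_usd_per_1m": (0.90, 0.20),
    "time_to_first_token_s": (0.76, 0.61),
    "gpqa_pct": (72.8, 49.1),
    "livecodebench_pct": (64.1, 26.9),
    "tau2_pct": (11.4, 21.6),
}


def ratio(metric: str) -> float:
    """Return the Nemotron value divided by the Hermes value."""
    nemotron, hermes = METRICS[metric]
    return nemotron / hermes


def point_gap(metric: str) -> float:
    """Return the Nemotron value minus the Hermes value (percentage points for *_pct keys)."""
    nemotron, hermes = METRICS[metric]
    return nemotron - hermes


if __name__ == "__main__":
    for name in METRICS:
        print(f"{name}: ratio={ratio(name):.2f}, gap={point_gap(name):+.2f}")
```

Read this way, Nemotron Ultra costs about 4.5 times as much per million tokens and generates output at a little over half the speed, while leading on most accuracy benchmarks; Tau2 is the one listed benchmark where Hermes 4 scores higher.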

Benchmark data from Artificial Analysis.