
Hermes 4 - Llama-3.1 405B (Non-reasoning) vs Grok 4.20 0309 (Non-reasoning)

Nous Research vs xAI — side-by-side benchmark comparison

Values are listed as Hermes 4 - Llama-3.1 405B (Non-reasoning) / Grok 4.20 0309 (Non-reasoning). Rows with a single value list only one score, without indicating the model; rows with no value list no scores.

Intelligence Index: 17.6 / 29.7
Coding Index: 18.1 / 25.4
Math Index: 15.3
Output speed (tok/s): 40.8 / 202.6
Blended price ($/1M): $1.50 / $3.00
Time to first token (s): 0.73 / 0.50

aime:
aime 25: 15.3%
artificial analysis coding index: 18.10 / 25.40
artificial analysis intelligence index: 17.60 / 29.70
artificial analysis math index: 15.30
gpqa: 53.6% / 78.5%
hle: 4.2% / 22.5%
ifbench: 34.8% / 47.8%
lcr: 20.0% / 18.0%
livecodebench: 54.6%
math 500:
mmlu pro: 72.9%
scicode: 34.6% / 32.2%
tau2: 26.6% / 69.6%
terminalbench hard: 9.8% / 22.0%

Benchmark data from Artificial Analysis.