Gemma 4 31B (Non-reasoning) vs Hermes 4 - Llama-3.1 405B (Reasoning)

Google vs Nous Research — side-by-side benchmark comparison

Metric: Gemma 4 31B (Non-reasoning) / Hermes 4 - Llama-3.1 405B (Reasoning)
Intelligence Index: 32.3 / 18.6
Coding Index: 33.9 / 16.0
Math Index: 69.7
Output speed (tok/s): 18.0 / 38.6
Blended price ($/1M): $0.20 / $1.50
Time to first token (s): 0.60s / 0.79s
aime: n/a
aime 25: 69.7%
artificial analysis coding index: 33.90 / 16.00
artificial analysis intelligence index: 32.30 / 18.60
artificial analysis math index: 69.70
gpqa: 76.3% / 72.7%
hle: 11.5% / 10.3%
ifbench: 53.5% / 32.7%
lcr: 36.0% / 20.7%
livecodebench: 68.6%
math 500: n/a
mmlu pro: 82.9%
scicode: 41.1% / 25.2%
tau2: 65.5% / 22.2%
terminalbench hard: 30.3% / 11.4%

Benchmark data from Artificial Analysis.
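
The speed, price, and latency figures above are easier to compare as ratios. The following is a minimal Python sketch, not part of the original comparison, that divides the Hermes 4 value by the Gemma 4 value for each of those three metrics. The names `metrics`, `ratio`, and `ratios` are illustrative, and the numbers are copied from the figures listed on this page.

```python
# Values copied from the comparison above, ordered as
# (Gemma 4 31B, Hermes 4 - Llama-3.1 405B). Names are illustrative only.
metrics = {
    "Output speed (tok/s)": (18.0, 38.6),
    "Blended price ($/1M)": (0.20, 1.50),
    "Time to first token (s)": (0.60, 0.79),
}


def ratio(first, second):
    """Return second / first: how many times larger the second value is."""
    return second / first


# Round to two decimals, since the inputs carry at most two.
ratios = {name: round(ratio(a, b), 2) for name, (a, b) in metrics.items()}

for name, r in ratios.items():
    print(f"{name}: Hermes 4 / Gemma 4 = {r}x")
```

On these figures, Hermes 4 generates output roughly twice as fast, while its blended price is 7.5 times higher and its time to first token is about a third longer.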