Hermes 4 - Llama-3.1 405B (Non-reasoning) vs Gemini 2.5 Flash (Reasoning)

Nous Research vs Google — side-by-side benchmark comparison

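| Metric | Hermes 4 - Llama-3.1 405B (Non-reasoning) | Gemini 2.5 Flash (Reasoning) |
| --- | --- | --- |
| Intelligence Index | 17.6 | 27.0 |
| Coding Index | 18.1 | 22.2 |
| Math Index | 15.3 | 73.3 |
| Output speed (tok/s) | 40.8 | 205.5 |
| Blended price ($/1M tokens) | $1.50 | $0.85 |
| Time to first token (s) | 0.73 | 10.67 |
| AIME 2025 | 15.3% | 73.3% |
| Artificial Analysis Coding Index | 18.10 | 22.20 |
| Artificial Analysis Intelligence Index | 17.60 | 27.00 |
| Artificial Analysis Math Index | 15.30 | 73.30 |
| GPQA | 53.6% | 79.0% |
| HLE | 4.2% | 11.1% |
| IFBench | 34.8% | 50.3% |
| LCR | 20.0% | 61.7% |
| LiveCodeBench | 54.6% | 69.5% |
| MMLU-Pro | 72.9% | 83.2% |
| SciCode | 34.6% | 39.4% |
| tau2 | 26.6% | 31.6% |
| Terminal-Bench Hard | 9.8% | 13.6% |

Two benchmarks are listed with a single value each, and the source does not indicate which model it belongs to: AIME (82.3%) and MATH-500 (98.1%).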

Benchmark data from Artificial Analysis.
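
The speed and latency rows pull in opposite directions: Hermes 4 starts answering sooner, while Gemini 2.5 Flash generates tokens faster once it starts. A rough way to combine them is to estimate total response time as time to first token plus output tokens divided by output speed. The sketch below is illustrative only. It assumes a constant output speed and that the reported time to first token already covers any delay before the first answer token; the function names are hypothetical and not part of any Artificial Analysis tooling.

```python
def total_seconds(ttft_s: float, tokens_per_s: float, n_tokens: int) -> float:
    """Estimated wall-clock time to receive n_tokens of output."""
    return ttft_s + n_tokens / tokens_per_s


def break_even_tokens(ttft_a: float, speed_a: float,
                      ttft_b: float, speed_b: float) -> float:
    """Output length at which models A and B finish at the same time.

    Solves ttft_a + n / speed_a == ttft_b + n / speed_b for n.
    """
    return (ttft_b - ttft_a) / (1 / speed_a - 1 / speed_b)


# Values taken from the comparison table above.
HERMES = {"ttft_s": 0.73, "tokens_per_s": 40.8}
GEMINI = {"ttft_s": 10.67, "tokens_per_s": 205.5}

for n in (100, 500, 2000):
    hermes_t = total_seconds(HERMES["ttft_s"], HERMES["tokens_per_s"], n)
    gemini_t = total_seconds(GEMINI["ttft_s"], GEMINI["tokens_per_s"], n)
    print(f"{n:>5} tokens: Hermes {hermes_t:6.2f}s, Gemini {gemini_t:6.2f}s")

crossover = break_even_tokens(
    HERMES["ttft_s"], HERMES["tokens_per_s"],
    GEMINI["ttft_s"], GEMINI["tokens_per_s"],
)
print(f"Break-even at roughly {crossover:.0f} output tokens")
```

Under these assumptions the two models finish at about the same time for responses of roughly 500 output tokens; shorter responses favor Hermes 4 and longer ones favor Gemini 2.5 Flash. Real latency varies with load, prompt length, and provider, so treat this as a back-of-the-envelope estimate.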