Llama 3.1 Tulu3 405B vs Qwen3 235B A22B 2507 (Reasoning)

Allen Institute for AI vs Alibaba — side-by-side benchmark comparison

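| Metric | Llama 3.1 Tulu3 405B | Qwen3 235B A22B 2507 (Reasoning) |
|---|---|---|
| Intelligence Index | 14.1 | 29.5 |
| Coding Index | n/a | 23.2 |
| Math Index | n/a | 91.0 |
| Output speed (tokens/s) | 0.0 | 62.5 |
| Blended price (USD per 1M tokens) | $0.00 | $0.84 |
| Time to first token (s) | 0.00 | 1.21 |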
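The speed, latency and price rows can be combined into a rough per-request estimate. The sketch below rests on simplifying assumptions. A response is assumed to take about the time to first token plus the output token count divided by output speed. Cost is assumed to scale linearly with the blended price. Real throughput varies by provider and load, and a blended price only matches a bill when the workload's input-to-output token mix matches the one the blend assumes. For a reasoning model, hidden reasoning tokens presumably count toward the output tokens as well. The function names and token counts are hypothetical, chosen for illustration.

```python
def estimate_response_time(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Approximate wall-clock seconds: wait for the first token, then stream at a constant rate."""
    if tokens_per_s <= 0:
        # A 0.0 tokens/s entry (as in the Tulu3 column) gives no usable rate.
        raise ValueError("output speed must be positive")
    return ttft_s + output_tokens / tokens_per_s


def estimate_cost_usd(blended_price_per_1m: float, total_tokens: int) -> float:
    """Approximate cost: token count times the blended USD-per-1M-token price."""
    return total_tokens / 1_000_000 * blended_price_per_1m


if __name__ == "__main__":
    # Qwen3 235B A22B 2507 (Reasoning) values from the table
    ttft_s, speed, price = 1.21, 62.5, 0.84
    for n in (500, 2_000, 8_000):  # hypothetical output lengths
        print(f"{n:>5} output tokens: ~{estimate_response_time(ttft_s, speed, n):.1f} s")
    print(f"10M tokens: ~${estimate_cost_usd(price, 10_000_000):.2f}")
```

With the Qwen3 figures, 2,000 output tokens work out to about 1.21 + 2000 / 62.5 = 33.2 s. The Tulu3 column cannot be plugged in, because its 0.0 tokens/s entry gives no usable rate.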
| Benchmark | Llama 3.1 Tulu3 405B | Qwen3 235B A22B 2507 (Reasoning) |
|---|---|---|
| AIME | 13.3% | 94.0% |
| AIME 2025 | n/a | 91.0% |
| GPQA | 51.6% | 79.0% |
| Humanity's Last Exam (HLE) | 3.5% | 15.0% |
| IFBench | n/a | 51.2% |
| AA-LCR (long-context reasoning) | n/a | 67.0% |
| LiveCodeBench | 29.1% | 78.8% |
| MATH-500 | 77.8% | 98.4% |
| MMLU-Pro | 71.6% | 84.3% |
| SciCode | 30.2% | 42.4% |
| τ²-Bench | n/a | 53.2% |
| Terminal-Bench Hard | n/a | 13.6% |
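To read the benchmark results at a glance, the gap between the two models can be expressed in percentage points. The snippet below copies only the rows where both models have a score. The helper name `gaps` is hypothetical. A raw point gap also says nothing about whether a difference is statistically meaningful.

```python
# Scores (%) for benchmarks where both models report a value: (Tulu3 405B, Qwen3 235B 2507 Reasoning)
scores: dict[str, tuple[float, float]] = {
    "AIME": (13.3, 94.0),
    "GPQA": (51.6, 79.0),
    "HLE": (3.5, 15.0),
    "LiveCodeBench": (29.1, 78.8),
    "MATH-500": (77.8, 98.4),
    "MMLU-Pro": (71.6, 84.3),
    "SciCode": (30.2, 42.4),
}


def gaps(table: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Return (benchmark, Qwen minus Tulu3 in percentage points), largest gap first."""
    pairs = [(name, round(qwen - tulu, 1)) for name, (tulu, qwen) in table.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, gap in gaps(scores):
        print(f"{name:<14} +{gap:.1f} pp")
```

Absolute point gaps and relative ratios rank the rows differently. HLE shows the smallest point gap (11.5), yet Qwen3's score there is roughly 4.3 times Tulu3's, because Tulu3 sits close to the floor.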

Benchmark data from Artificial Analysis.