Llama 3.1 Tulu3 405B vs Qwen3 VL 30B A3B Instruct

Allen Institute for AI vs Alibaba — side-by-side benchmark comparison

Metric	Llama 3.1 Tulu3 405B	Qwen3 VL 30B A3B Instruct
Intelligence Index	14.1	16.0
Coding Index	—	14.3
Math Index	—	72.3
Output speed (tok/s)	—	123.5
Blended price ($/1M)	—	$0.30
Time to first token (s)	—	1.07s
AIME	13.3%	—
AIME 2025	—	72.3%
GPQA	51.6%	69.5%
HLE	3.5%	6.4%
IFBench	—	33.1%
LCR	—	23.7%
LiveCodeBench	29.1%	47.6%
MATH-500	77.8%	—
MMLU-Pro	71.6%	76.4%
SciCode	30.2%	30.8%
τ²-Bench	—	19.0%
Terminal-Bench Hard	—	6.1%

Benchmark data from Artificial Analysis.