Olmo 3.1 32B Instruct vs Qwen3 VL 32B (Reasoning)

Allen Institute for AI vs Alibaba — side-by-side benchmark comparison

Metric	Olmo 3.1 32B Instruct	Qwen3 VL 32B (Reasoning)
Intelligence Index	12.2	24.7
Coding Index	5.6	14.5
Math Index	—	84.7
Output speed (tok/s)	—	96.3
Blended price ($/1M tokens)	—	$2.63
Time to first token (s)	—	1.12
AIME	—	—
AIME 2025	—	84.7%
GPQA	53.9%	73.3%
HLE	4.9%	9.6%
IFBench	39.2%	59.4%
LCR	0.0%	55.3%
LiveCodeBench	—	73.8%
MATH-500	—	—
MMLU-Pro	—	81.8%
SciCode	16.7%	28.5%
Tau2-Bench	21.3%	45.6%
Terminal-Bench Hard	0.0%	7.6%

Benchmark data from Artificial Analysis.