Qwen3.5 0.8B (Non-reasoning) vs Devstral Small (Jul '25)

Alibaba vs Mistral — side-by-side benchmark comparison

	Qwen3.5 0.8B (Non-reasoning)	Devstral Small (Jul '25)
Intelligence Index	9.9	15.2
Coding Index	1.0	12.1
Math Index	—	29.3
Output speed (tok/s)	96.3	183.4
Blended price ($/1M tokens)	$0.02	$0.15
Time to first token (s)	0.26	0.40
AIME	—	0.3%
AIME 2025	—	29.3%
Artificial Analysis Coding Index	1.00	12.10
Artificial Analysis Intelligence Index	9.90	15.20
Artificial Analysis Math Index	—	29.30
GPQA	23.6%	41.4%
HLE	4.9%	3.7%
IFBench	21.6%	34.6%
LCR	6.7%	17.0%
LiveCodeBench	—	25.4%
MATH-500	—	63.5%
MMLU-Pro	—	62.2%
SciCode	2.9%	24.3%
τ²-Bench	65.2%	28.4%
Terminal-Bench Hard	0.0%	6.1%
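
The headline rows are easier to compare as ratios than as raw numbers. The sketch below is one way to do that arithmetic in Python, using only values copied from the table above. The dictionary keys, the `ratio` helper, and the "index points per dollar" figure are illustrative assumptions and not metrics published by Artificial Analysis.

```python
# Headline values copied from the comparison table.
# Tuple order: (Qwen3.5 0.8B (Non-reasoning), Devstral Small (Jul '25)).
# The key names are hypothetical labels chosen for this sketch.
HEADLINE = {
    "intelligence_index": (9.9, 15.2),
    "output_speed_tok_s": (96.3, 183.4),
    "blended_price_usd_per_1m": (0.02, 0.15),
    "time_to_first_token_s": (0.26, 0.40),
}


def ratio(metric: str) -> float:
    """Return Devstral's value divided by Qwen's value for one metric."""
    qwen, devstral = HEADLINE[metric]
    return devstral / qwen


def index_points_per_dollar() -> tuple[float, float]:
    """Illustrative value figure: Intelligence Index divided by blended price.

    This is an assumption made for the sketch, not an official metric.
    """
    q_idx, d_idx = HEADLINE["intelligence_index"]
    q_price, d_price = HEADLINE["blended_price_usd_per_1m"]
    return q_idx / q_price, d_idx / d_price


if __name__ == "__main__":
    for name in HEADLINE:
        print(f"{name}: Devstral / Qwen = {ratio(name):.2f}x")
    qwen_value, devstral_value = index_points_per_dollar()
    print(f"index points per $ (Qwen): {qwen_value:.1f}")
    print(f"index points per $ (Devstral): {devstral_value:.1f}")
```

Read this way, Devstral Small costs 7.5 times as much per million tokens while scoring roughly 1.5 times higher on the Intelligence Index, so the smaller Qwen model comes out ahead on this particular per-dollar figure even though it trails on most individual benchmarks.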

Benchmark data from Artificial Analysis.