Qwen3.5 4B (Non-reasoning) vs DeepSeek V3.1 Terminus (Reasoning)

Alibaba vs DeepSeek — side-by-side benchmark comparison

Metric	Qwen3.5 4B (Non-reasoning)	DeepSeek V3.1 Terminus (Reasoning)
Intelligence Index	22.6	33.9
Coding Index	13.7	33.7
Math Index	—	89.7
Output speed (tok/s)	210.0	—
Blended price ($/1M tokens)	$0.06	$1.91
Time to first token (s)	0.23	—
AIME	—	—
AIME 2025	—	89.7%
GPQA	71.2%	79.2%
HLE (Humanity's Last Exam)	7.5%	15.2%
IFBench	33.3%	57.0%
LCR (long-context reasoning)	28.3%	65.0%
LiveCodeBench	—	79.8%
MATH-500	—	—
MMLU-Pro	—	85.1%
SciCode	18.3%	40.6%
τ²-Bench	87.7%	37.1%
Terminal-Bench Hard	11.4%	30.3%
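
The per-benchmark gaps in the table can be computed directly from the cell values. The following is a minimal sketch, not part of the original comparison: the names `ROWS`, `parse_cell`, and `deltas` are hypothetical, the rows are copied from the table above, and a cell containing "—" is assumed to mean that no score was reported.

```python
from typing import Optional

# A few rows copied from the table above:
# (benchmark, Qwen3.5 4B score, DeepSeek V3.1 Terminus score).
ROWS = [
    ("gpqa", "71.2%", "79.2%"),
    ("hle", "7.5%", "15.2%"),
    ("ifbench", "33.3%", "57.0%"),
    ("lcr", "28.3%", "65.0%"),
    ("livecodebench", "—", "79.8%"),
    ("scicode", "18.3%", "40.6%"),
    ("tau2", "87.7%", "37.1%"),
    ("terminalbench hard", "11.4%", "30.3%"),
]


def parse_cell(cell: str) -> Optional[float]:
    """Convert a cell such as '71.2%', '$0.06' or '0.23s' to a float.

    Assumption: '—' (or an empty cell) means no value was reported.
    """
    text = cell.strip().lstrip("$").rstrip("%s")
    if text in ("", "—"):
        return None
    return float(text)


def deltas(rows):
    """Return {benchmark: DeepSeek minus Qwen} for rows where both scores exist."""
    out = {}
    for name, qwen, deepseek in rows:
        q, d = parse_cell(qwen), parse_cell(deepseek)
        if q is not None and d is not None:
            out[name] = round(d - q, 1)
    return out


if __name__ == "__main__":
    for name, diff in deltas(ROWS).items():
        print(f"{name}: {diff:+.1f} points")
```

Rows with a missing score on either side are skipped rather than treated as zero, so livecodebench does not appear in the result. A negative difference, as on tau2, means the Qwen model scored higher on that benchmark.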

Benchmark data from Artificial Analysis.