DeepSeek V3.1 Terminus (Non-reasoning) vs Qwen2 Instruct 72B

DeepSeek vs Alibaba — side-by-side benchmark comparison

Metric	DeepSeek V3.1 Terminus (Non-reasoning)	Qwen2 Instruct 72B
Intelligence Index	28.5	11.7
Coding Index	31.9	—
Math Index	53.7	—
Output speed (tok/s)	—	—
Blended price ($/1M)	$0.45	$0.00
Time to first token (s)	—	—
AIME	—	14.7%
AIME 2025	53.7%	—
Artificial Analysis Coding Index	31.90	—
Artificial Analysis Intelligence Index	28.50	11.70
Artificial Analysis Math Index	53.70	—
GPQA	75.1%	37.1%
HLE	8.4%	3.7%
IFBench	41.2%	—
LCR	43.3%	—
LiveCodeBench	52.9%	15.9%
MATH-500	—	70.1%
MMLU-Pro	83.6%	62.2%
SciCode	32.1%	22.9%
τ²-Bench	37.1%	—
Terminal-Bench Hard	31.8%	—

Benchmark data from Artificial Analysis.
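
Only some rows report a value for both models, so a direct comparison has to skip the rest. The following is a minimal Python sketch of that arithmetic, not part of the original page. It copies a subset of the rows by hand, uses lowercase row labels as keys, and marks a missing value as `None`. The names `SCORES` and `shared_gaps` are hypothetical and chosen only for illustration.

```python
# Scores copied by hand from the table (DeepSeek V3.1 Terminus, Qwen2 Instruct 72B).
# None marks a cell where the table shows no reported value.
SCORES = {
    "intelligence index": (28.5, 11.7),
    "aime": (None, 14.7),
    "aime 25": (53.7, None),
    "gpqa": (75.1, 37.1),
    "hle": (8.4, 3.7),
    "livecodebench": (52.9, 15.9),
    "math 500": (None, 70.1),
    "mmlu pro": (83.6, 62.2),
    "scicode": (32.1, 22.9),
}


def shared_gaps(scores):
    """Return {row: first - second} for rows where both models have a value."""
    return {
        name: round(first - second, 1)  # round away floating-point noise
        for name, (first, second) in scores.items()
        if first is not None and second is not None
    }


gaps = shared_gaps(SCORES)

# Print the shared rows, largest gap first.
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:20s} {gap:+.1f}")
```

The gaps are in the units of each row (index points for the Intelligence Index, percentage points for the benchmark rows), so they are not directly comparable across rows.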