DeepSeek R1 Distill Llama 70B vs Qwen3 VL 235B A22B (Reasoning)

DeepSeek vs. Alibaba: side-by-side benchmark comparison

Metric	DeepSeek R1 Distill Llama 70B	Qwen3 VL 235B A22B (Reasoning)
Artificial Analysis Intelligence Index	16.0	27.6
Artificial Analysis Coding Index	11.4	20.9
Artificial Analysis Math Index	53.7	88.3
Output speed (tokens/s)	46.8	35.6
Blended price (USD per 1M tokens)	0.79	2.17
Time to first token (s)	0.33	5.14
AIME	67.0%	n/a
AIME 2025	53.7%	88.3%
GPQA	40.2%	77.2%
Humanity's Last Exam (HLE)	6.1%	10.1%
IFBench	27.6%	56.5%
AA-LCR (long-context reasoning)	11.0%	58.7%
LiveCodeBench	26.6%	64.6%
MATH-500	93.5%	n/a
MMLU-Pro	79.5%	83.6%
SciCode	31.3%	39.9%
τ²-Bench	21.9%	54.1%
Terminal-Bench Hard	1.5%	11.4%

Benchmark data from Artificial Analysis.