Phi-4 Multimodal Instruct vs DeepSeek R1 0528 Qwen3 8B

Microsoft vs DeepSeek — side-by-side benchmark comparison

Metric	Phi-4 Multimodal Instruct	DeepSeek R1 0528 Qwen3 8B
Intelligence Index	10.0	16.4
Coding Index	—	7.8
Math Index	—	63.7
Output speed (tok/s)	16.6	0.0
Blended price ($/1M)	$0.00	$0.00
Time to first token (s)	1.33	0.00
AIME	9.3%	65.0%
AIME 2025	—	63.7%
GPQA	31.5%	61.2%
HLE	4.4%	5.6%
IFBench	—	19.9%
LCR	—	13.0%
LiveCodeBench	13.1%	51.3%
MATH-500	69.3%	93.2%
MMLU-Pro	48.5%	73.9%
SciCode	11.0%	20.4%
tau2	—	0.0%
Terminal-Bench Hard	—	1.5%

Benchmark data from Artificial Analysis.