Claude 3.7 Sonnet (Non-reasoning) vs Qwen3 4B 2507 Instruct

Anthropic vs Alibaba: side-by-side benchmark comparison

Metric	Claude 3.7 Sonnet (Non-reasoning)	Qwen3 4B 2507 Instruct
Intelligence Index	30.8	12.9
Coding Index	26.7	9.0
Math Index	21.0	52.3
Output speed (tok/s)	—	—
Blended price ($/1M tokens)	$6.56	$0.00
Time to first token (s)	—	—
AIME	22.3%	—
AIME 2025	21.0%	52.3%
GPQA	65.6%	51.7%
Humanity's Last Exam (HLE)	4.8%	4.7%
IFBench	44.0%	33.5%
AA-LCR (long-context reasoning)	48.3%	7.3%
LiveCodeBench	39.4%	37.7%
MATH-500	85.0%	—
MMLU-Pro	80.3%	67.2%
SciCode	37.6%	18.1%
τ²-Bench	50.0%	26.6%
Terminal-Bench Hard	21.2%	4.5%
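
To read the per-benchmark gaps at a glance, here is a minimal Python sketch. It transcribes the percentage rows reported for both models, skips rows where one model has no score ("—"), and sorts the rest by Claude 3.7 Sonnet's lead in percentage points. The `SCORES` dictionary and the `gaps` helper are hypothetical illustrations. They are not part of any Artificial Analysis tooling.

```python
# Hypothetical helper: percentage scores transcribed from the benchmark rows.
# Each value is (Claude 3.7 Sonnet, Qwen3 4B 2507); None marks a missing score.
SCORES = {
    "AIME": (22.3, None),
    "AIME 2025": (21.0, 52.3),
    "GPQA": (65.6, 51.7),
    "HLE": (4.8, 4.7),
    "IFBench": (44.0, 33.5),
    "AA-LCR": (48.3, 7.3),
    "LiveCodeBench": (39.4, 37.7),
    "MATH-500": (85.0, None),
    "MMLU-Pro": (80.3, 67.2),
    "SciCode": (37.6, 18.1),
    "τ²-Bench": (50.0, 26.6),
    "Terminal-Bench Hard": (21.2, 4.5),
}


def gaps(scores):
    """Return (benchmark, claude - qwen) pairs, largest Claude lead first.

    Benchmarks missing a score for either model are skipped.
    Differences are rounded to one decimal to hide float noise.
    """
    rows = [
        (name, round(claude - qwen, 1))
        for name, (claude, qwen) in scores.items()
        if claude is not None and qwen is not None
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, gap in gaps(SCORES):
        # A positive gap means Claude leads. A negative gap means Qwen3 leads.
        print(f"{name:<20} {gap:+.1f}")
```

On these numbers, AA-LCR shows the widest gap in Claude's favor (41.0 points). AIME 2025 is the only benchmark row where Qwen3 leads, by 31.3 points.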

Benchmark data from Artificial Analysis.