Claude 3.7 Sonnet (Non-reasoning) vs Qwen3 4B 2507 Instruct

Anthropic vs Alibaba — side-by-side benchmark comparison

| Metric | Claude 3.7 Sonnet (Non-reasoning) | Qwen3 4B 2507 Instruct |
| --- | --- | --- |
| Intelligence Index | 30.8 | 12.9 |
| Coding Index | 26.7 | 9.0 |
| Math Index | 21.0 | 52.3 |
| Output speed (tok/s) | 0.0 | 0.0 |
| Blended price ($/1M tokens) | $6.56 | $0.00 |
| Time to first token (s) | 0.00 | 0.00 |
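| Benchmark | Claude 3.7 Sonnet (Non-reasoning) | Qwen3 4B 2507 Instruct |
| --- | --- | --- |
| AIME 25 | 21.0% | 52.3% |
| Artificial Analysis Coding Index | 26.70 | 9.00 |
| Artificial Analysis Intelligence Index | 30.80 | 12.90 |
| Artificial Analysis Math Index | 21.00 | 52.30 |
| GPQA | 65.6% | 51.7% |
| HLE | 4.8% | 4.7% |
| IFBench | 44.0% | 33.5% |
| LCR | 48.3% | 7.3% |
| LiveCodeBench | 39.4% | 37.7% |
| MMLU-Pro | 80.3% | 67.2% |
| SciCode | 37.6% | 18.1% |
| tau2 | 50.0% | 26.6% |
| TerminalBench Hard | 21.2% | 4.5% |

Two benchmarks list a single value, and the source does not show which model it belongs to: AIME 22.3%; MATH 500 85.0%.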

Benchmark data from Artificial Analysis.