Claude 3.7 Sonnet (Reasoning) vs Qwen3 235B A22B 2507 (Reasoning)

Anthropic vs Alibaba — side-by-side benchmark comparison

| Metric | Claude 3.7 Sonnet (Reasoning) | Qwen3 235B A22B 2507 (Reasoning) |
| --- | --- | --- |
| Intelligence Index | 34.7 | 29.5 |
| Coding Index | 27.6 | 23.2 |
| Math Index | 56.3 | 91.0 |
| Output speed (tok/s) | 0.0 | 62.5 |
| Blended price ($/1M) | $0.00 | $0.84 |
| Time to first token (s) | 0.00 | 1.21 |
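| Benchmark | Claude 3.7 Sonnet (Reasoning) | Qwen3 235B A22B 2507 (Reasoning) |
| --- | --- | --- |
| AIME | 48.7% | 94.0% |
| AIME 25 | 56.3% | 91.0% |
| Artificial Analysis Coding Index | 27.60 | 23.20 |
| Artificial Analysis Intelligence Index | 34.70 | 29.50 |
| Artificial Analysis Math Index | 56.30 | 91.00 |
| GPQA | 77.2% | 79.0% |
| HLE | 10.3% | 15.0% |
| IFBench | 48.3% | 51.2% |
| LCR | 60.7% | 67.0% |
| LiveCodeBench | 47.3% | 78.8% |
| MATH-500 | 94.7% | 98.4% |
| MMLU-Pro | 83.7% | 84.3% |
| SciCode | 40.3% | 42.4% |
| tau2 | 54.7% | 53.2% |
| Terminal-Bench Hard | 21.2% | 13.6% |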
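
The per-benchmark gaps can be read off the percentage scores with a little arithmetic. The sketch below is illustrative only: the scores are copied from the comparison, the benchmark display names are normalized capitalizations, and the names `SCORES`, `score_gaps`, and `leaders` are hypothetical helpers rather than anything from Artificial Analysis. The three composite indices are left out because they are not percentage scores.

```python
# Percentage scores copied from the comparison, as
# (Claude 3.7 Sonnet, Qwen3 235B A22B 2507), both in reasoning mode.
# Display names are normalized for readability (an assumption of this sketch).
SCORES = {
    "AIME": (48.7, 94.0),
    "AIME 25": (56.3, 91.0),
    "GPQA": (77.2, 79.0),
    "HLE": (10.3, 15.0),
    "IFBench": (48.3, 51.2),
    "LCR": (60.7, 67.0),
    "LiveCodeBench": (47.3, 78.8),
    "MATH-500": (94.7, 98.4),
    "MMLU-Pro": (83.7, 84.3),
    "SciCode": (40.3, 42.4),
    "tau2": (54.7, 53.2),
    "Terminal-Bench Hard": (21.2, 13.6),
}


def score_gaps(scores):
    """Return (benchmark, qwen - claude) pairs, largest Qwen lead first."""
    gaps = [
        (name, round(qwen - claude, 1))
        for name, (claude, qwen) in scores.items()
    ]
    return sorted(gaps, key=lambda item: item[1], reverse=True)


def leaders(scores):
    """Split benchmark names by which model has the higher score."""
    claude_leads = [n for n, (c, q) in scores.items() if c > q]
    qwen_leads = [n for n, (c, q) in scores.items() if q > c]
    return claude_leads, qwen_leads


if __name__ == "__main__":
    # Positive gap: Qwen scores higher; negative gap: Claude scores higher.
    for name, gap in score_gaps(SCORES):
        print(f"{name:<22}{gap:+.1f}")
```

On these figures, Qwen scores higher on ten of the twelve benchmarks, with the widest gaps on AIME (45.3 points) and AIME 25 (34.7 points), while Claude scores higher on tau2 and Terminal-Bench Hard.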

Benchmark data from Artificial Analysis.