Claude 2.0 vs Qwen3 235B A22B (Reasoning)

Anthropic vs Alibaba: side-by-side benchmark comparison

Metric	Claude 2.0	Qwen3 235B A22B (Reasoning)
Intelligence Index	9.1	19.8
Coding Index	12.9	17.4
Math Index	—	82.0
Output speed (tok/s)	0.0	58.3
Blended price ($/1M tokens)	$0.00	$2.63
Time to first token (s)	0.00s	1.37s
AIME	0.0%	84.0%
AIME 25	—	82.0%
GPQA	34.4%	70.0%
HLE	—	11.7%
IFBench	—	38.7%
LCR	—	0.0%
LiveCodeBench	17.1%	62.2%
MATH-500	—	93.0%
MMLU-Pro	48.6%	82.8%
SciCode	19.4%	39.9%
Tau2-Bench	—	24.0%
Terminal-Bench Hard	—	6.1%

Benchmark data from Artificial Analysis.