Qwen3.5 9B (Non-reasoning) vs Claude 4.1 Opus (Reasoning)

Alibaba vs Anthropic: side-by-side benchmark comparison

Metric	Qwen3.5 9B (Non-reasoning)	Claude 4.1 Opus (Reasoning)
Intelligence Index	27.3	42.0
Coding Index	21.3	36.5
Math Index	—	80.3
Output speed (tokens/s)	—	44.5
Blended price ($/1M tokens)	—	$32.81
Time to first token (s)	—	8.55
AIME	—	—
AIME 2025	—	80.3%
GPQA	78.6%	80.9%
HLE	8.6%	11.9%
IFBench	37.8%	55.4%
AA-LCR	38.0%	66.3%
LiveCodeBench	—	65.4%
MATH-500	—	—
MMLU-Pro	—	88.0%
SciCode	27.7%	40.9%
τ²-Bench	85.1%	71.4%
Terminal-Bench Hard	18.2%	34.3%

Benchmark data from Artificial Analysis.
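
One way to read the table is to compute the per-benchmark gap in percentage points and skip any row where either model has no score. The following is a minimal Python sketch of that arithmetic, using only figures copied from the table. The names `scores` and `point_gaps` are hypothetical and chosen for illustration; they are not part of any Artificial Analysis tooling, and the keys are just short lowercase benchmark identifiers.

```python
# Percent scores copied from the table: (Qwen3.5 9B, Claude 4.1 Opus).
# None marks a value the table shows as missing.
scores = {
    "gpqa": (78.6, 80.9),
    "hle": (8.6, 11.9),
    "ifbench": (37.8, 55.4),
    "lcr": (38.0, 66.3),
    "livecodebench": (None, 65.4),
    "scicode": (27.7, 40.9),
    "tau2": (85.1, 71.4),
    "terminalbench hard": (18.2, 34.3),
}


def point_gaps(table):
    """Return {benchmark: claude - qwen} in percentage points.

    Rows with a missing score on either side are skipped, since no
    gap can be computed for them.
    """
    return {
        name: round(claude - qwen, 1)
        for name, (qwen, claude) in table.items()
        if qwen is not None and claude is not None
    }


gaps = point_gaps(scores)

# Print from the largest Qwen lead to the largest Claude lead.
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    leader = "Claude 4.1 Opus" if gap > 0 else "Qwen3.5 9B"
    print(f"{name:20s} {gap:+6.1f} pts  ({leader} ahead)")
```

A positive gap means Claude 4.1 Opus scores higher and a negative gap means Qwen3.5 9B scores higher. With the figures above, tau2 is the only benchmark where the gap is negative.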