Command-R (Mar '24) vs Qwen3 4B 2507 (Reasoning)

Cohere vs Alibaba: side-by-side benchmark comparison

Metric	Command-R (Mar '24)	Qwen3 4B 2507 (Reasoning)
Intelligence Index	7.4	18.2
Coding Index	—	9.5
Math Index	—	82.7
Output speed (tok/s)	0.0	0.0
Blended price ($/1M tokens)	$0.75	$0.00
Time to first token (s)	0.00	0.00
AIME	0.7%	—
AIME 2025	—	82.7%
GPQA	28.4%	66.7%
HLE	4.8%	5.9%
IFBench	—	49.8%
LCR	—	37.7%
LiveCodeBench	4.8%	64.1%
MATH-500	16.4%	—
MMLU-Pro	33.8%	74.3%
SciCode	6.3%	25.6%
τ²-Bench	—	25.4%
Terminal-Bench Hard	—	1.5%

Benchmark data from Artificial Analysis.