K2 Think V2 vs Claude 4.5 Sonnet (Reasoning)

MBZUAI Institute of Foundation Models vs Anthropic — side-by-side benchmark comparison

Metric	K2 Think V2	Claude 4.5 Sonnet (Reasoning)
Intelligence Index	24.1	43.0
Coding Index	15.5	38.6
Math Index	—	88.0
Output speed (tok/s)	—	55.0
Blended price ($/1M)	$0.00	$6.56
Time to first token (s)	—	7.02
AIME	—	—
AIME 2025	—	88.0%
GPQA	71.3%	83.4%
HLE (Humanity's Last Exam)	9.5%	17.3%
IFBench	62.8%	57.3%
LCR (long context reasoning)	52.7%	65.7%
LiveCodeBench	—	71.4%
MATH-500	—	—
MMLU-Pro	—	87.5%
SciCode	33.0%	44.7%
τ²-Bench	25.4%	78.1%
Terminal-Bench Hard	6.8%	35.6%

Benchmark data from Artificial Analysis.
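
To read the table as head-to-head gaps, the short Python sketch below subtracts the K2 Think V2 score from the Claude 4.5 Sonnet (Reasoning) score for each benchmark where both models have a value. The scores are copied from the table. The dictionary keys, the `gaps` helper, and the use of `None` for missing entries are illustrative choices, not part of the Artificial Analysis data.

```python
# Scores copied from the comparison table, in percent.
# Tuple order: (K2 Think V2, Claude 4.5 Sonnet (Reasoning)).
# None marks a value the table reports as missing.
scores = {
    "gpqa": (71.3, 83.4),
    "hle": (9.5, 17.3),
    "ifbench": (62.8, 57.3),
    "lcr": (52.7, 65.7),
    "livecodebench": (None, 71.4),
    "scicode": (33.0, 44.7),
    "tau2": (25.4, 78.1),
    "terminalbench hard": (6.8, 35.6),
}


def gaps(table):
    """Return {benchmark: claude - k2} in percentage points.

    Rows where either model has no score are skipped.
    """
    return {
        name: round(claude - k2, 1)
        for name, (k2, claude) in table.items()
        if k2 is not None and claude is not None
    }


result = gaps(scores)

# Benchmarks where K2 Think V2 scores higher (negative gap).
k2_leads = [name for name, gap in result.items() if gap < 0]

# Print the largest Claude lead first.
for name, gap in sorted(result.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {gap:+.1f}")
```

On these numbers the widest gap is on tau2 and the only benchmark where K2 Think V2 leads is ifbench. Rows with a missing score, such as livecodebench, are left out instead of being treated as zero.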