Gemma 3 4B Instruct vs Claude 4.1 Opus (Reasoning)

Google vs Anthropic — side-by-side benchmark comparison

Benchmark	Gemma 3 4B Instruct	Claude 4.1 Opus (Reasoning)
Intelligence Index	6.3	42.0
Coding Index	2.9	36.5
Math Index	12.7	80.3
Output speed (tok/s)	—	44.5
Blended price ($/1M tokens)	$0.05	$32.81
Time to first token (s)	—	8.55
AIME	6.3%	—
AIME 2025	12.7%	80.3%
GPQA	29.1%	80.9%
HLE	5.2%	11.9%
IFBench	28.3%	55.4%
LCR	5.7%	66.3%
LiveCodeBench	11.2%	65.4%
MATH-500	76.6%	—
MMLU-Pro	41.7%	88.0%
SciCode	7.3%	40.9%
Tau2	5.0%	71.4%
Terminal-Bench Hard	0.8%	34.3%

Benchmark data from Artificial Analysis.