GPT-5.1 Codex (high) vs Gemma 3 4B Instruct

OpenAI vs Google: side-by-side benchmark comparison

Metric	GPT-5.1 Codex (high)	Gemma 3 4B Instruct
Intelligence Index	43.1	6.3
Coding Index	36.6	2.9
Math Index	95.7	12.7
Output speed (tok/s)	182.1	0.0
Blended price ($/1M tokens)	$3.44	$0.05
Time to first token (s)	5.42	0.00
AIME	—	6.3%
AIME 25	95.7%	12.7%
GPQA	86.0%	29.1%
HLE	23.4%	5.2%
IFBench	70.0%	28.3%
LCR	67.3%	5.7%
LiveCodeBench	84.9%	11.2%
MATH-500	—	76.6%
MMLU-Pro	86.0%	41.7%
SciCode	40.2%	7.3%
Tau2	83.0%	5.0%
Terminal-Bench Hard	34.8%	0.8%

Benchmark data from Artificial Analysis.
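
For readers who want the gaps rather than the raw scores, the short Python sketch below computes the percentage-point difference between the two models for a few rows copied from the table. The `scores` dictionary and the `point_gaps` helper are illustrative names, not part of the source data, and rows where one model has no score (shown as a dash in the table) are skipped rather than treated as zero.

```python
# Scores copied by hand from the table above, as (GPT-5.1 Codex (high), Gemma 3 4B Instruct).
# None marks a benchmark with no reported score for that model.
scores = {
    "GPQA": (86.0, 29.1),
    "AIME 25": (95.7, 12.7),
    "MATH-500": (None, 76.6),
    "Terminal-Bench Hard": (34.8, 0.8),
}


def point_gaps(rows):
    """Return the percentage-point gap (first model minus second) per benchmark.

    Benchmarks missing a score for either model are left out.
    """
    gaps = {}
    for name, (first, second) in rows.items():
        if first is None or second is None:
            continue
        gaps[name] = round(first - second, 1)
    return gaps


gaps = point_gaps(scores)

# Print the largest gaps first.
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {gap:+.1f} points")
```

A difference in percentage points is used here instead of a ratio because several of the smaller model's scores are close to zero, which would make ratios unstable.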