GPT-5.4 mini (xhigh) vs Qwen3 VL 235B A22B (Reasoning)

OpenAI vs Alibaba: side-by-side benchmark comparison

Metric	GPT-5.4 mini (xhigh)	Qwen3 VL 235B A22B (Reasoning)
Intelligence Index	48.9	27.6
Coding Index	51.5	20.9
Math Index	—	88.3
Output speed (tok/s)	182.8	35.6
Blended price ($/1M tokens)	$1.69	$2.17
Time to first token (s)	4.25	5.14
AIME	—	—
AIME 2025	—	88.3%
GPQA	87.5%	77.2%
HLE	26.6%	10.1%
IFBench	73.3%	56.5%
LCR	69.3%	58.7%
LiveCodeBench	—	64.6%
MATH-500	—	—
MMLU-Pro	—	83.6%
SciCode	49.9%	39.9%
τ²-Bench	83.3%	54.1%
Terminal-Bench Hard	52.3%	11.4%

Benchmark data from Artificial Analysis.