Trinity Large Thinking vs Qwen3 VL 4B (Reasoning)

Arcee AI vs Alibaba — side-by-side benchmark comparison

Metric	Trinity Large Thinking	Qwen3 VL 4B (Reasoning)
Intelligence Index	31.9	13.7
Coding Index	27.2	6.7
Math Index	—	25.7
Output speed (tok/s)	171.4	—
Blended price ($/1M)	$0.40	$0.00
Time to first token (s)	0.67	—
AIME	—	—
AIME 2025	—	25.7%
GPQA	75.2%	49.4%
HLE (Humanity's Last Exam)	14.7%	4.4%
IFBench	56.3%	36.6%
LCR (long-context reasoning)	33.0%	21.3%
LiveCodeBench	—	32.0%
MATH-500	—	—
MMLU-Pro	—	70.0%
SciCode	36.1%	17.1%
τ²-Bench	90.1%	15.5%
Terminal-Bench Hard	22.7%	1.5%

Benchmark data from Artificial Analysis.