Model Comparison

prism-ml/bonsai-8b-gguf vs tetf/gemma-3-1b-it-qat-q4_0-gguf

Side-by-side comparison of prism-ml/bonsai-8b-gguf and tetf/gemma-3-1b-it-qat-q4_0-gguf: downloads, license, context length, tasks, and benchmarks.

prism-ml/bonsai-8b-gguf

prism-ml · text-generation

End-to-end 1-bit language model for llama.cpp (CUDA, Metal, CPU). 14.1x smaller than FP16, 6.2x faster on RTX 4090, 4-5x lower energy per token.
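
As a rough sanity check on the 14.1x size figure above, the sketch below converts it into effective bits per weight and an approximate file size. The parameter count of 8e9 is an assumption inferred only from the "8b" in the model name; the summary does not state it, and the function names are illustrative.

```python
# Back-of-envelope reading of the "14.1x smaller than FP16" claim.
FP16_BITS_PER_WEIGHT = 16
SIZE_REDUCTION = 14.1  # taken from the model summary

# Assumption: about 8e9 parameters, guessed from "8b" in the model name.
ASSUMED_PARAMS = 8e9


def effective_bits_per_weight(reduction: float) -> float:
    """Average bits stored per weight if FP16 is shrunk by `reduction`."""
    return FP16_BITS_PER_WEIGHT / reduction


def size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


bits = effective_bits_per_weight(SIZE_REDUCTION)
fp16_gb = size_gb(ASSUMED_PARAMS, FP16_BITS_PER_WEIGHT)
compressed_gb = size_gb(ASSUMED_PARAMS, bits)

print(f"effective bits/weight: {bits:.2f}")
print(f"FP16 size: {fp16_gb:.1f} GB, compressed size: {compressed_gb:.2f} GB")
```

Under these assumptions the claim works out to slightly more than 1 bit per weight, which would be consistent with a 1-bit format that also stores some per-group metadata; the actual file layout is not described here.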

tetf/gemma-3-1b-it-qat-q4_0-gguf

tetf · text-generation

Model Page: Gemma. Note: This repository corresponds to the 1B instruction-tuned version of the Gemma 3 model in GGUF format using Quantization Aware Training (QAT). The GGUF corresponds to Q4_0 quantization. Thanks to QAT, the model is able to preserve simila…
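
To put the Q4_0 label in concrete terms, here is a minimal sketch of the storage cost per weight. It relies on the Q4_0 block layout used by llama.cpp (32 weights per block, 4 bits each, plus one 16-bit scale). The parameter count of 1e9 is an assumption taken from the nominal "1B" in the name, and real GGUF files may keep some tensors at other precisions, so the size is only an estimate.

```python
# Storage cost of the Q4_0 block format used in GGUF files.
Q4_0_BLOCK_WEIGHTS = 32      # weights grouped into one block
Q4_0_BITS_PER_QUANT = 4      # each weight stored as a 4-bit value
Q4_0_SCALE_BITS = 16         # one FP16 scale shared by the block

# Assumption: about 1e9 parameters, guessed from "1b" in the model name.
ASSUMED_PARAMS = 1e9


def q4_0_bits_per_weight() -> float:
    """Average bits per weight, including the per-block scale."""
    block_bits = Q4_0_BLOCK_WEIGHTS * Q4_0_BITS_PER_QUANT + Q4_0_SCALE_BITS
    return block_bits / Q4_0_BLOCK_WEIGHTS


def estimated_size_gb(params: float) -> float:
    """Rough size if every tensor were stored as Q4_0 (a simplification)."""
    return params * q4_0_bits_per_weight() / 8 / 1e9


print(f"Q4_0 bits/weight: {q4_0_bits_per_weight()}")
print(f"estimated size: {estimated_size_gb(ASSUMED_PARAMS):.4f} GB")
```

The per-block scale is why Q4_0 costs 4.5 bits per weight rather than exactly 4.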

Side-by-side Specifications

	prism-ml/bonsai-8b-gguf	tetf/gemma-3-1b-it-qat-q4_0-gguf
Author	prism-ml	tetf
Pipeline Task	text-generation	text-generation
Library	llama.cpp	—
Downloads	83,309	28,500
Likes	618	0
License	Unknown	Unknown
Context Length	—	—
Created	2026-03-18	2025-04-11
Last Modified	2026-04-16	2025-04-11
Tags	llama.cpp, gguf, 1-bit, llama-cpp, cuda, metal, on-device, prismml, bonsai, text-generation	gguf, gemma, gemma3, text-generation, arxiv:1905.07830, arxiv:1905.10044, arxiv:1911.11641, arxiv:1904.09728, arxiv:1705.03551, arxiv:1911.01547
