Model Comparison

prism-ml/bonsai-8b-ggufvsunsloth/qwen3-14b-gguf

Side-by-side comparison of prism-ml/bonsai-8b-gguf and unsloth/qwen3-14b-gguf: downloads, license, context length, tasks, and benchmarks.

prism-ml/bonsai-8b-gguf

prism-ml · text-generation

End-to-end 1-bit language model for llama.cpp (CUDA, Metal, CPU) > **14.1x** smaller than FP16 | **6.2x** faster on RTX 4090 | **4-5x** lower energy/token

If you are using llama.cpp, Ollama, Open WebUI etc., you can add /think and /no_think to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations. Here is an example of mu…

Side-by-side Specifications

	prism-ml/bonsai-8b-gguf	unsloth/qwen3-14b-gguf
Author	prism-ml	unsloth
Pipeline Task	text-generation	text-generation
Library	llama.cpp	transformers
Downloads	83,309	39,267
Likes	618	123
License	Unknown	Unknown
Context Length	—	—
Created	2026-03-18	2025-04-28
Last Modified	2026-04-16	2025-06-08
Tags	llama.cppgguf1-bitllama-cppcudametalon-deviceprismmlbonsaitext-generation	transformersggufqwen3text-generationqwenunslothenarxiv:2309.00071base_model:Qwen/Qwen3-14Bbase_model:quantized:Qwen/Qwen3-14B

View full details: prism-ml/bonsai-8b-gguf · unsloth/qwen3-14b-gguf