What is Dhptl/Qwen3-8B-GGUF?

Dhptl/Qwen3-8B-GGUF is a set of quantized GGUF builds of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B), published by Dhptl under the apache-2.0 license for text generation and made with [quant-kit](https://github.com/DhruvalPtl/quant-kit). The files work with llama.cpp, Ollama, LM Studio, Open WebUI, and Jan.

What license applies to Dhptl/Qwen3-8B-GGUF?

License: apache-2.0. Verify terms on Hugging Face before commercial use.

How do I run Dhptl/Qwen3-8B-GGUF locally?

Download a GGUF file from this page and load it in guIDE or llama.cpp. Pipeline task: text-generation.

Model Intelligence Sheet

Dhptl/Qwen3-8B-GGUF overview

License: apache-2.0 · Base model: Qwen/Qwen3-8B · Pipeline tag: text-generation

Tags: transformers, gguf, license:apache-2.0, arxiv:2309.00071, base_model:Qwen/Qwen3-8B-Base, base_model:finetune:Qwen/Qwen3-8B-Base, region:us, deploy:azure, qwen3, quantized, text-generation-inference, safetensors, arxiv:2505.09388, text-generation, conversational, en, base_model:Qwen/Qwen3-8B, base_model:quantized:Qwen/Qwen3-8B, endpoints_compatible

Runs locally with llama.cpp or guIDE, starting from about 3.06 GB on disk (the Q2_K file), which targets GPUs in the 4 GB VRAM class.

Pipeline

text-generation

Author

Dhptl

Repository Files & Downloads

10 GGUF files detected

Direct downloads for local inference

File	Type	Quantization	Size	Link
Qwen3-8B-Q2_K.gguf	GGUF	Q2_K	3.06 GB	Download
Qwen3-8B-Q3_K_L.gguf	GGUF	Q3_K_L	4.13 GB	Download
Qwen3-8B-Q3_K_M.gguf	GGUF	Q3_K_M	3.84 GB	Download
Qwen3-8B-Q3_K_S.gguf	GGUF	Q3_K_S	3.51 GB	Download
Qwen3-8B-Q4_K_M.gguf	GGUF	Q4_K_M	4.68 GB	Download
Qwen3-8B-Q4_K_S.gguf	GGUF	Q4_K_S	4.47 GB	Download
Qwen3-8B-Q5_K_M.gguf	GGUF	Q5_K_M	5.45 GB	Download
Qwen3-8B-Q5_K_S.gguf	GGUF	Q5_K_S	5.33 GB	Download
Qwen3-8B-Q6_K.gguf	GGUF	Q6_K	6.26 GB	Download
Qwen3-8B-Q8_0.gguf	GGUF	Q8_0	8.11 GB	Download
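
To fetch one of the files listed above from a script rather than the browser, something like the following sketch could work. It assumes the `huggingface_hub` package is installed and that the repository keeps the `Qwen3-8B-<QUANT>.gguf` naming shown in the table. The helper names `gguf_filename` and `download_quant` are illustrative, not part of any library.

```python
# Quantization names taken from the file table above.
QUANTS = (
    "Q2_K", "Q3_K_S", "Q3_K_M", "Q3_K_L", "Q4_K_S",
    "Q4_K_M", "Q5_K_S", "Q5_K_M", "Q6_K", "Q8_0",
)
REPO_ID = "Dhptl/Qwen3-8B-GGUF"


def gguf_filename(quant: str) -> str:
    """Build the file name for a quantization (hypothetical helper)."""
    if quant not in QUANTS:
        raise ValueError(f"unknown quantization: {quant!r}")
    return f"Qwen3-8B-{quant}.gguf"


def download_quant(quant: str = "Q4_K_M", local_dir: str = ".") -> str:
    """Download one GGUF file and return its local path.

    Assumes huggingface_hub is installed; the import is deferred so the
    rest of this sketch works without it.
    """
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=gguf_filename(quant),
        local_dir=local_dir,
    )
```

Calling `download_quant("Q4_K_M")` would place the file in the current directory and return its path, which can then be passed to llama.cpp or another GGUF runtime.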

Model Details

Model ID	Dhptl/Qwen3-8B-GGUF
Author	Dhptl
Pipeline	text-generation
License	apache-2.0
Base model	Qwen/Qwen3-8B
Last modified	2026-06-11T11:53:00.000Z

Model README

---
license: apache-2.0
base_model: Qwen/Qwen3-8B
pipeline_tag: text-generation
tags:
- license:apache-2.0
- arxiv:2309.00071
- base_model:Qwen/Qwen3-8B-Base
- base_model:finetune:Qwen/Qwen3-8B-Base
- region:us
- transformers
- deploy:azure
- qwen3
- quantized
- text-generation-inference
- gguf
- safetensors
- arxiv:2505.09388
- text-generation
- conversational
language:
- en
---

Qwen3-8B — GGUF Quantizations

[Model on HF](https://huggingface.co/Dhptl/Qwen3-8B-GGUF) · [Original Model](https://huggingface.co/Qwen/Qwen3-8B) · [quant-kit](https://github.com/DhruvalPtl/quant-kit)

Quantized GGUF versions of Qwen/Qwen3-8B

Works with llama.cpp · Ollama · LM Studio · Open WebUI · Jan

Quantized by Dhptl on June 11, 2026 using quant-kit

---

⚖️ The Pareto Frontier — Efficiency vs Intelligence

> Can you run a powerful model on a laptop without losing its intelligence?

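These quantizations target the efficiency-quality Pareto frontier using llama.cpp's K-quant format. The README puts quality retention at 97-99% of the original model at a fraction of the size, a figure that fits the higher-bit files better than Q2_K or Q3_K_S, which it describes as having significant quality loss.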

---

📦 Available Files

| File | Size | RAM (est.) | Quantization | Quality | Notes |
|---|---|---|---|---|---|
| Qwen3-8B-Q2_K.gguf | 3.06 GB | ~4.6 GB | Q2_K | ⭐ | Extreme compression, significant quality loss. |
| Qwen3-8B-Q3_K_L.gguf | 4.13 GB | ~5.6 GB | Q3_K_L | ⭐⭐⭐ | Slightly better than Q3_K_M, still a compromise. |
| Qwen3-8B-Q3_K_M.gguf | 3.84 GB | ~5.3 GB | Q3_K_M | ⭐⭐⭐ | Very small file. Quality drop noticeable. |
| Qwen3-8B-Q3_K_S.gguf | 3.51 GB | ~5.0 GB | Q3_K_S | ⭐⭐ | Very high compression, high quality loss. |
| Qwen3-8B-Q4_K_M.gguf | 4.68 GB | ~6.2 GB | Q4_K_M ✅ Recommended | ⭐⭐⭐⭐ | Best balance of size and quality. Recommended for most users. |
| Qwen3-8B-Q4_K_S.gguf | 4.47 GB | ~6.0 GB | Q4_K_S | ⭐⭐⭐½ | Good speed/size balance, slight quality loss. |
| Qwen3-8B-Q5_K_M.gguf | 5.45 GB | ~6.9 GB | Q5_K_M | ⭐⭐⭐⭐½ | Better quality than Q4, slightly larger. Great if you have the RAM. |
| Qwen3-8B-Q5_K_S.gguf | 5.33 GB | ~6.8 GB | Q5_K_S | ⭐⭐⭐⭐ | Large but accurate. |
| Qwen3-8B-Q6_K.gguf | 6.26 GB | ~7.8 GB | Q6_K | ⭐⭐⭐⭐⭐ | Near-perfect quality, very large. |
| Qwen3-8B-Q8_0.gguf | 8.11 GB | ~9.6 GB | Q8_0 | ⭐⭐⭐⭐⭐ | Closest to original quality. Use when RAM is not a concern. |

💡 Which file should I download?

Most users: Qwen3-8B-Q4_K_M.gguf — best balance of size and quality
High RAM (32GB+): Qwen3-8B-Q8_0.gguf — near-original quality
Low RAM (8GB): Qwen3-8B-Q3_K_M.gguf — fits in 8GB with room to spare
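
The recommendations above can be turned into a small selection rule: pick the largest file whose memory estimate fits the machine. The sketch below assumes the third column of the Available Files table is an approximate RAM requirement in GB, and the `headroom_gb` parameter (memory left for the OS and context) is an illustrative assumption, not a figure from the README.

```python
# (quantization, file size in GB, approximate RAM in GB), copied from the
# Available Files table. Treating the third value as RAM is an assumption.
FILES = [
    ("Q2_K", 3.06, 4.6),
    ("Q3_K_S", 3.51, 5.0),
    ("Q3_K_M", 3.84, 5.3),
    ("Q3_K_L", 4.13, 5.6),
    ("Q4_K_S", 4.47, 6.0),
    ("Q4_K_M", 4.68, 6.2),
    ("Q5_K_S", 5.33, 6.8),
    ("Q5_K_M", 5.45, 6.9),
    ("Q6_K", 6.26, 7.8),
    ("Q8_0", 8.11, 9.6),
]


def largest_fit(ram_budget_gb, headroom_gb=0.0):
    """Return the quantization with the largest RAM estimate that still fits.

    headroom_gb is memory reserved for everything else (hypothetical knob).
    Returns None when no file fits.
    """
    usable = ram_budget_gb - headroom_gb
    fitting = [entry for entry in FILES if entry[2] <= usable]
    if not fitting:
        return None
    return max(fitting, key=lambda entry: entry[2])[0]


print(largest_fit(8.0, headroom_gb=2.5))  # Q3_K_M
print(largest_fit(32.0))                  # Q8_0
```

With 2.5 GB held back on an 8 GB machine the rule lands on Q3_K_M, which agrees with the low-RAM suggestion above; with no headroom it would pick Q6_K, which leaves almost nothing for the rest of the system.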

---

⚡ Speed Benchmarks

Run python benchmark.py --model Qwen3-8B to generate speed results.

---

🧠 Quality Benchmarks

Run kaggle_bench.ipynb on Kaggle to benchmark this model.

---

🚀 How to Use

Ollama

ollama run dhptl/qwen3-8b

LM Studio / Jan / Open WebUI

Search for Dhptl/Qwen3-8B in the model browser.

llama.cpp CLI

# Download the binary from https://github.com/ggerganov/llama.cpp/releases
./llama-cli \
  -m Qwen3-8B-Q4_K_M.gguf \
  -p "You are a helpful assistant." \
  --conversation \
  -n 512

Python — llama-cpp-python

from llama_cpp import Llama

llm = Llama(
    model_path="./Qwen3-8B-Q4_K_M.gguf",
    n_gpu_layers=-1,   # -1 = offload everything to GPU
    n_ctx=4096,
)

response = llm.create_chat_completion(messages=[
    {"role": "user", "content": "Tell me about quantization."}
])
print(response["choices"][0]["message"]["content"])
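
The llama.cpp CLI example passes a system prompt and a 512-token limit, while the Python example sends only a user message. A minimal sketch of carrying both over to llama-cpp-python follows. `build_messages` and `ask` are hypothetical helper names, and `llm` is assumed to be a `Llama` instance created as in the example above.

```python
def build_messages(user_text, system_prompt="You are a helpful assistant.", history=None):
    """Assemble a chat message list: system prompt, prior turns, new user turn."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_text})
    return messages


def ask(llm, user_text, max_tokens=512):
    """Send one user turn and return the assistant's reply text.

    llm is expected to expose create_chat_completion, as llama_cpp.Llama does.
    """
    response = llm.create_chat_completion(
        messages=build_messages(user_text),
        max_tokens=max_tokens,  # mirrors -n 512 in the CLI example
    )
    return response["choices"][0]["message"]["content"]
```

For example, `print(ask(llm, "Tell me about quantization."))` would send the same question as the example above, this time with the system prompt attached.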

---

🔍 About GGUF Quantization

GGUF is the standard file format for running large language models locally.

Quantization reduces the number of bits per weight:

| Format | Bits per weight | Size vs. 16-bit | Quality |
|---|---|---|---|
| Q2_K | ~2.6 | 16% | ⭐ |
| Q3_K_M | ~3.3 | 21% | ⭐⭐⭐ |
| Q4_K_M | ~4.5 | 28% | ⭐⭐⭐⭐ ← sweet spot |
| Q5_K_M | ~5.6 | 35% | ⭐⭐⭐⭐½ |
| Q8_0 | ~8.5 | 53% | ⭐⭐⭐⭐⭐ |
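
The percentages in this table appear to be the bits-per-weight figure divided by 16, that is, the size relative to a 16-bit (FP16/BF16) copy of the same weights. That baseline is an inference from the numbers rather than something the README states. A short check:

```python
# Approximate bits per weight, copied from the table above.
BITS_PER_WEIGHT = {
    "Q2_K": 2.6,
    "Q3_K_M": 3.3,
    "Q4_K_M": 4.5,
    "Q5_K_M": 5.6,
    "Q8_0": 8.5,
}
BASELINE_BITS = 16  # assumption: the unquantized reference uses 16-bit weights


def relative_size_percent(bits_per_weight, baseline_bits=BASELINE_BITS):
    """Size of the quantized weights as a percentage of the baseline."""
    return 100 * bits_per_weight / baseline_bits


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name}: {relative_size_percent(bpw):.0f}%")
```

Rounded to whole numbers, this reproduces the 16%, 21%, 28%, 35%, and 53% values listed in the table.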

---

💬 Community & Feedback

Found an issue? Have a question? Open a Discussion in the Community tab above.

If these quantizations were useful, please consider:

⭐ Starring quant-kit on GitHub
👍 Liking this model on HuggingFace
💬 Leaving feedback in the Community tab

Run Dhptl/Qwen3-8B-GGUF with guIDE

Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.


Source: Hugging Face