Model Intelligence Sheet
ShahzebKhoso/Qwen3Guard-Gen-8B-GGUF overview
This repository provides GGUF quantized versions of Qwen3Guard-Gen-8B, converted with llama.cpp. The base model was first exported from Hugging Face format to GGUF (FP16) and then quantized into multiple formats. These variants offer different trade-offs between model size, inference speed, and output quality.
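
As a rough illustration of the size side of this trade-off, the sketch below estimates bits per weight for each file from the sizes listed under Repository Files & Downloads. It assumes the FP16 export stores about 16 bits per weight and that file size scales linearly with bits per weight, which ignores GGUF metadata and any tensors kept at higher precision. The names `SIZES_GB` and `approx_bits_per_weight` are illustrative and not part of llama.cpp.

```python
# File sizes in GB, copied from this sheet's file table.
SIZES_GB = {
    "FP16": 15.26,
    "Q2_K": 3.06,
    "Q3_K_M": 3.84,
    "Q4_0": 4.45,
    "Q4_K_M": 4.68,
    "Q5_K_M": 5.45,
    "Q6_K": 6.26,
    "Q8_0": 8.11,
}


def approx_bits_per_weight(name: str) -> float:
    """Scale 16 bits by the file's size ratio to the FP16 export.

    Assumption: size is proportional to bits per weight; metadata and
    mixed-precision tensors are ignored, so this is only an estimate.
    """
    return 16.0 * SIZES_GB[name] / SIZES_GB["FP16"]


# Print one summary row per file.
for name, size_gb in SIZES_GB.items():
    saved = 1.0 - size_gb / SIZES_GB["FP16"]
    print(
        f"{name:7s} {size_gb:6.2f} GB  "
        f"~{approx_bits_per_weight(name):4.1f} bits/weight  "
        f"{saved:5.1%} smaller than FP16"
    )
```

Under these assumptions Q8_0 works out to roughly 8.5 bits per weight and Q4_K_M to roughly 4.9. The estimate says nothing about speed or output quality, which would have to be measured separately.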
| Field | Value |
|---|---|
| Downloads | 105 |
| Likes | 1 |
| Pipeline | (none) |
| Library | llama.cpp |
| Visibility | Public |
| Access | Open |
Repository Files & Downloads
8 files detected
Direct downloads for all repository files
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| Qwen3Guard-Gen-8B-FP16.gguf | GGUF | — | 15.26 GB | Download |
| Qwen3Guard-Gen-8B-Q2_K.gguf | GGUF | Q2_K | 3.06 GB | Download |
| Qwen3Guard-Gen-8B-Q3_K_M.gguf | GGUF | Q3_K_M | 3.84 GB | Download |
| Qwen3Guard-Gen-8B-Q4_0.gguf | GGUF | Q4_0 | 4.45 GB | Download |
| Qwen3Guard-Gen-8B-Q4_K_M.gguf | GGUF | Q4_K_M | 4.68 GB | Download |
| Qwen3Guard-Gen-8B-Q5_K_M.gguf | GGUF | Q5_K_M | 5.45 GB | Download |
| Qwen3Guard-Gen-8B-Q6_K.gguf | GGUF | Q6_K | 6.26 GB | Download |
| Qwen3Guard-Gen-8B-Q8_0.gguf | GGUF | Q8_0 | 8.11 GB | Download |
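
The table above can be turned into a small helper that picks the largest file fitting a size budget and builds a direct download URL for it. This is a minimal sketch: the URL follows the usual Hugging Face `resolve` pattern, the `main` revision is an assumption, and `largest_file_within` and `download_url` are hypothetical helper names. The budget is compared against file size only, so runtime memory for context and buffers needs extra headroom.

```python
from typing import Optional
from urllib.parse import quote

REPO_ID = "ShahzebKhoso/Qwen3Guard-Gen-8B-GGUF"
REVISION = "main"  # assumption: files live on the default branch

# File sizes in GB, copied from the table above.
FILES_GB = {
    "Qwen3Guard-Gen-8B-FP16.gguf": 15.26,
    "Qwen3Guard-Gen-8B-Q2_K.gguf": 3.06,
    "Qwen3Guard-Gen-8B-Q3_K_M.gguf": 3.84,
    "Qwen3Guard-Gen-8B-Q4_0.gguf": 4.45,
    "Qwen3Guard-Gen-8B-Q4_K_M.gguf": 4.68,
    "Qwen3Guard-Gen-8B-Q5_K_M.gguf": 5.45,
    "Qwen3Guard-Gen-8B-Q6_K.gguf": 6.26,
    "Qwen3Guard-Gen-8B-Q8_0.gguf": 8.11,
}


def largest_file_within(budget_gb: float) -> Optional[str]:
    """Return the largest listed file whose size fits the budget, or None."""
    fitting = [(size, name) for name, size in FILES_GB.items() if size <= budget_gb]
    return max(fitting)[1] if fitting else None


def download_url(filename: str, repo_id: str = REPO_ID, revision: str = REVISION) -> str:
    """Build a direct download URL using the Hub's resolve path."""
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


choice = largest_file_within(5.0)
if choice is not None:
    print(choice, download_url(choice))
```

With a 5 GB budget the helper selects the Q4_K_M file. Using `hf_hub_download` from `huggingface_hub`, as in the repository README, is an alternative that also handles local caching.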
Model Details
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"language": "en",
"tags": [
"gguf",
"quantization",
"llama.cpp",
"qwen"
],
"license": "apache-2.0",
"base_model": "Qwen/Qwen3Guard-Gen-8B",
"library_name": "llama.cpp",
"frontmatter": {
"language": "en",
"tags": [
"gguf",
"quantization",
"llama.cpp",
"qwen"
],
"license": "apache-2.0",
"base_model": "Qwen/Qwen3Guard-Gen-8B",
"library_name": "llama.cpp"
},
"hero_image_url": "",
"summary": "This repository provides **GGUF quantized versions** of Qwen3Guard-Gen-8B, converted with llama.cpp. The base model was first exported from Hugging Face format to GGUF (FP16) and then quantized into multiple formats. These variants offer different trade-offs between **model size, inference speed, and output quality**. ---",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nlanguage: en\ntags:\n- gguf\n- quantization\n- llama.cpp\n- qwen\nlicense: apache-2.0\nbase_model: Qwen/Qwen3Guard-Gen-8B\nlibrary_name: llama.cpp\n---\n\n# Qwen3Guard-Gen-8B - GGUF Quantized Versions\n\nThis repository provides **GGUF quantized versions** of [Qwen3Guard-Gen-8B](https://huggingface.co/Qwen/Qwen3Guard-Gen-8B), converted with [llama.cpp](https://github.com/ggerganov/llama.cpp). \n\nThe base model was first exported from Hugging Face format to GGUF (FP16) and then quantized into multiple formats. These variants offer different trade-offs between **model size, inference speed, and output quality**.\n\n---\n\n## 🔧 Model Details\n- **Base model:** [Qwen/Qwen3Guard-Gen-8B](https://huggingface.co/Qwen/Qwen3Guard-Gen-8B) \n- **Architecture:** Qwen 3 (8B parameters) \n- **Format:** GGUF \n- **Intended use:** Guardrail / safety-aligned text generation \n- **Conversion tool:** `convert_hf_to_gguf.py` (from llama.cpp) \n- **Quantization tool:** `llama-quantize` \n\n---\n\n## 📊 Quantized Versions\n\n| Quantization | Filename | Size (MiB) | Notes |\n|--------------|----------|------------|-------|\n| **FP16** | `Qwen3Guard-Gen-8B-FP16.gguf` | ~15623 | Full precision (baseline) |\n| **Q2_K** | `Qwen3Guard-Gen-8B-Q2_K.gguf` | ~3204 | Smallest, lowest accuracy |\n| **Q3_K_M** | `Qwen3Guard-Gen-8B-Q3_K_M.gguf` | ~4027 | Balanced small size |\n| **Q4_0** | `Qwen3Guard-Gen-8B-Q4_0.gguf` | ~4662 | Good balance, faster |\n| **Q4_K_M** | `Qwen3Guard-Gen-8B-Q4_K_M.gguf` | ~4909 | Standard, widely used |\n| **Q5_K_M** | `Qwen3Guard-Gen-8B-Q5_K_M.gguf` | ~5713 | Better accuracy |\n| **Q6_K** | `Qwen3Guard-Gen-8B-Q6_K.gguf` | ~6568 | High accuracy |\n| **Q8_0** | `Qwen3Guard-Gen-8B-Q8_0.gguf` | ~8505 | Near FP16 quality |\n\n---\n\n## 🚀 Usage\n\n### 🖥️ llama.cpp\nDownload a quantized file and run:\n```bash\n./main -m Qwen3Guard-Gen-8B-Q4_K_M.gguf -p \"Hello, Qwen!\"\n```\n\n### 🐍 Python\nDirectly download from hub, and use with llama-cpp-python.\n```python\nfrom huggingface_hub import hf_hub_download\nfrom llama_cpp import Llama\n\nmodel_path = hf_hub_download(\n repo_id=\"ShahzebKhoso/Qwen3Guard-Gen-8B-GGUF\",\n filename=\"Qwen3Guard-Gen-8B-Q4_K_M.gguf\"\n)\n\nllm = Llama(model_path=model_path)\n\noutput = llm.create_chat_completion(\n messages=[\n {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n {\"role\": \"user\", \"content\": \"Hello, Qwen!\"}\n ],\n max_tokens=100\n)\n\nprint(output[\"choices\"][0][\"message\"][\"content\"])\n```\n\n\nThese GGUF versions are optimized for **fast inference** with CPU/GPU runtimes like `llama.cpp`, `Ollama`, and `LM Studio`.\n",
"related_quantizations": []
},
"tags": [
"llama.cpp",
"gguf",
"quantization",
"qwen",
"en",
"base_model:Qwen/Qwen3Guard-Gen-8B",
"base_model:quantized:Qwen/Qwen3Guard-Gen-8B",
"license:apache-2.0",
"endpoints_compatible",
"region:us",
"conversational"
],
"likes": 1,
"downloads": 105,
"gated": false,
"private": false,
"last_modified": "2025-09-24T09:51:43.000Z",
"created_at": "2025-09-24T06:21:07.000Z",
"pipeline_tag": "",
"library_name": "llama.cpp"
}
Source payload excerpt (from Hugging Face API)
{
"_id": "68d38dd3a199e6d8007edd59",
"id": "ShahzebKhoso/Qwen3Guard-Gen-8B-GGUF",
"modelId": "ShahzebKhoso/Qwen3Guard-Gen-8B-GGUF",
"sha": "65a68f6e8cc0f285604cf8e8ad4be22221a77ccb",
"createdAt": "2025-09-24T06:21:07.000Z",
"lastModified": "2025-09-24T09:51:43.000Z",
"author": "ShahzebKhoso",
"downloads": 105,
"likes": 1,
"gated": false,
"private": false,
"pipeline_tag": "",
"library_name": "llama.cpp",
"siblings_count": 10
}