Model Intelligence Sheet

shahzebkhoso/qwen3guard-gen-8b-gguf overview

This repository provides GGUF quantized versions of Qwen3Guard-Gen-8B, converted with llama.cpp. The base model was first exported from Hugging Face format to GGUF (FP16) and then quantized into multiple formats. These variants offer different trade-offs between model size, inference speed, and output quality.
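The two-stage flow described above (Hugging Face checkpoint to FP16 GGUF, then FP16 GGUF to each quantized file) can be sketched as a small command builder. This is a minimal sketch, not the author's actual script: the directory names are hypothetical, the helper `build_commands` is invented for illustration, and the flags reflect llama.cpp's `convert_hf_to_gguf.py` and `llama-quantize` tools as commonly documented, so check each tool's `--help` output for the version you have installed.

```python
from pathlib import Path


def build_commands(hf_model_dir, out_dir, quant_types):
    """Return argv lists for the HF -> GGUF FP16 -> quantized GGUF flow.

    Nothing is executed here; the lists can be passed to subprocess.run
    once the paths and tool locations match the local llama.cpp checkout.
    """
    out = Path(out_dir)
    fp16 = out / "Qwen3Guard-Gen-8B-FP16.gguf"

    # Stage 1: export the Hugging Face checkpoint to a single FP16 GGUF file.
    commands = [[
        "python", "convert_hf_to_gguf.py", str(hf_model_dir),
        "--outfile", str(fp16), "--outtype", "f16",
    ]]

    # Stage 2: derive every quantized variant from that same FP16 file.
    for quant in quant_types:
        target = out / f"Qwen3Guard-Gen-8B-{quant}.gguf"
        commands.append(["llama-quantize", str(fp16), str(target), quant])
    return commands


if __name__ == "__main__":
    # Hypothetical local paths, used only to show the shape of the commands.
    for cmd in build_commands("./Qwen3Guard-Gen-8B", "./gguf", ["Q4_K_M", "Q8_0"]):
        print(" ".join(cmd))
```

Quantizing every variant from the single FP16 export, rather than re-quantizing an already quantized file, keeps each output one lossy step away from the original weights.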

llama.cpp, gguf, quantization, qwen, en, base_model:Qwen/Qwen3Guard-Gen-8B, base_model:quantized:Qwen/Qwen3Guard-Gen-8B, license:apache-2.0, endpoints_compatible, region:us, conversational

Downloads

105

Likes

1

Pipeline

—

Library

llama.cpp

Visibility

Public

Access

Open

Repository Files & Downloads

8 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
Qwen3Guard-Gen-8B-FP16.gguf	GGUF	—	15.26 GB	Download
Qwen3Guard-Gen-8B-Q2_K.gguf	GGUF	Q2_K	3.06 GB	Download
Qwen3Guard-Gen-8B-Q3_K_M.gguf	GGUF	Q3_K_M	3.84 GB	Download
Qwen3Guard-Gen-8B-Q4_0.gguf	GGUF	Q4_0	4.45 GB	Download
Qwen3Guard-Gen-8B-Q4_K_M.gguf	GGUF	Q4_K_M	4.68 GB	Download
Qwen3Guard-Gen-8B-Q5_K_M.gguf	GGUF	Q5_K_M	5.45 GB	Download
Qwen3Guard-Gen-8B-Q6_K.gguf	GGUF	Q6_K	6.26 GB	Download
Qwen3Guard-Gen-8B-Q8_0.gguf	GGUF	Q8_0	8.11 GB	Download
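
One way to use the size column is to pick the largest variant that fits a memory budget. The sketch below copies the sizes from the table as listed; the helper names and the 1.0 GB default headroom are assumptions for illustration only. Actual memory use also depends on context length, KV cache, and runtime, so treat the result as a starting point rather than a guarantee.

```python
# File sizes copied from the table above, in the units shown there.
FILE_SIZES_GB = {
    "FP16": 15.26,
    "Q2_K": 3.06,
    "Q3_K_M": 3.84,
    "Q4_0": 4.45,
    "Q4_K_M": 4.68,
    "Q5_K_M": 5.45,
    "Q6_K": 6.26,
    "Q8_0": 8.11,
}


def pick_variant(budget_gb, headroom_gb=1.0):
    """Return the largest variant whose file size plus headroom fits the budget.

    headroom_gb is an assumed allowance for KV cache and runtime buffers,
    not a measured figure. Returns None when nothing fits.
    """
    fitting = {
        name: size
        for name, size in FILE_SIZES_GB.items()
        if size + headroom_gb <= budget_gb
    }
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def ratio_to_fp16(name):
    """Return a variant's file size as a fraction of the FP16 baseline."""
    return FILE_SIZES_GB[name] / FILE_SIZES_GB["FP16"]


if __name__ == "__main__":
    for budget in (4.0, 6.0, 8.0, 32.0):
        print(budget, pick_variant(budget))
    print(f"Q4_K_M is {ratio_to_fp16('Q4_K_M'):.0%} of the FP16 file size")
```

Choosing by file size alone favors the highest-precision variant that fits; a deployment with long contexts would likely need a larger headroom value.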

Model Details

Model Slug

shahzebkhoso/qwen3guard-gen-8b-gguf

Author

ShahzebKhoso

Pipeline Task

—

Library

llama.cpp

Created

2025-09-24

Last Modified

2025-09-24

Gated

No

Private

No

HF SHA

65a68f6e8cc0f285604cf8e8ad4be22221a77ccb

License

apache-2.0

Language

en

Base Model

Qwen/Qwen3Guard-Gen-8B

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "language": "en",
    "tags": [
      "gguf",
      "quantization",
      "llama.cpp",
      "qwen"
    ],
    "license": "apache-2.0",
    "base_model": "Qwen/Qwen3Guard-Gen-8B",
    "library_name": "llama.cpp",
    "frontmatter": {
      "language": "en",
      "tags": [
        "gguf",
        "quantization",
        "llama.cpp",
        "qwen"
      ],
      "license": "apache-2.0",
      "base_model": "Qwen/Qwen3Guard-Gen-8B",
      "library_name": "llama.cpp"
    },
    "hero_image_url": "",
    "summary": "This repository provides **GGUF quantized versions** of Qwen3Guard-Gen-8B, converted with llama.cpp. The base model was first exported from Hugging Face format to GGUF (FP16) and then quantized into multiple formats. These variants offer different trade-offs between **model size, inference speed, and output quality**. ---",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlanguage: en\ntags:\n- gguf\n- quantization\n- llama.cpp\n- qwen\nlicense: apache-2.0\nbase_model: Qwen/Qwen3Guard-Gen-8B\nlibrary_name: llama.cpp\n---\n\n# Qwen3Guard-Gen-8B - GGUF Quantized Versions\n\nThis repository provides **GGUF quantized versions** of [Qwen3Guard-Gen-8B](https://huggingface.co/Qwen/Qwen3Guard-Gen-8B), converted with [llama.cpp](https://github.com/ggerganov/llama.cpp).  \n\nThe base model was first exported from Hugging Face format to GGUF (FP16) and then quantized into multiple formats. These variants offer different trade-offs between **model size, inference speed, and output quality**.\n\n---\n\n## 🔧 Model Details\n- **Base model:** [Qwen/Qwen3Guard-Gen-8B](https://huggingface.co/Qwen/Qwen3Guard-Gen-8B)  \n- **Architecture:** Qwen 3 (8B parameters)  \n- **Format:** GGUF  \n- **Intended use:** Guardrail / safety-aligned text generation  \n- **Conversion tool:** `convert_hf_to_gguf.py` (from llama.cpp)  \n- **Quantization tool:** `llama-quantize`  \n\n---\n\n## 📊 Quantized Versions\n\n| Quantization | Filename | Size (MiB) | Notes |\n|--------------|----------|------------|-------|\n| **FP16**     | `Qwen3Guard-Gen-8B-FP16.gguf` | ~15623 | Full precision (baseline) |\n| **Q2_K**     | `Qwen3Guard-Gen-8B-Q2_K.gguf` | ~3204 | Smallest, lowest accuracy |\n| **Q3_K_M**   | `Qwen3Guard-Gen-8B-Q3_K_M.gguf` | ~4027 | Balanced small size |\n| **Q4_0**     | `Qwen3Guard-Gen-8B-Q4_0.gguf` | ~4662 | Good balance, faster |\n| **Q4_K_M**   | `Qwen3Guard-Gen-8B-Q4_K_M.gguf` | ~4909 | Standard, widely used |\n| **Q5_K_M**   | `Qwen3Guard-Gen-8B-Q5_K_M.gguf` | ~5713 | Better accuracy |\n| **Q6_K**     | `Qwen3Guard-Gen-8B-Q6_K.gguf` | ~6568 | High accuracy |\n| **Q8_0**     | `Qwen3Guard-Gen-8B-Q8_0.gguf` | ~8505 | Near FP16 quality |\n\n---\n\n## 🚀 Usage\n\n### 🖥️ llama.cpp\nDownload a quantized file and run:\n```bash\n./main -m Qwen3Guard-Gen-8B-Q4_K_M.gguf -p \"Hello, Qwen!\"\n```\n\n### 🐍 Python\nDirectly download from hub, 
and use with llama-cpp-python.\n```python\nfrom huggingface_hub import hf_hub_download\nfrom llama_cpp import Llama\n\nmodel_path = hf_hub_download(\n    repo_id=\"ShahzebKhoso/Qwen3Guard-Gen-8B-GGUF\",\n    filename=\"Qwen3Guard-Gen-8B-Q4_K_M.gguf\"\n)\n\nllm = Llama(model_path=model_path)\n\noutput = llm.create_chat_completion(\n    messages=[\n        {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n        {\"role\": \"user\", \"content\": \"Hello, Qwen!\"}\n    ],\n    max_tokens=100\n)\n\nprint(output[\"choices\"][0][\"message\"][\"content\"])\n```\n\n\nThese GGUF versions are optimized for **fast inference** with CPU/GPU runtimes like `llama.cpp`, `Ollama`, and `LM Studio`.\n",
    "related_quantizations": []
  },
  "tags": [
    "llama.cpp",
    "gguf",
    "quantization",
    "qwen",
    "en",
    "base_model:Qwen/Qwen3Guard-Gen-8B",
    "base_model:quantized:Qwen/Qwen3Guard-Gen-8B",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 1,
  "downloads": 105,
  "gated": false,
  "private": false,
  "last_modified": "2025-09-24T09:51:43.000Z",
  "created_at": "2025-09-24T06:21:07.000Z",
  "pipeline_tag": "",
  "library_name": "llama.cpp"
}
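
The top-level `tags` list in the metadata above mixes plain labels with `key:value` entries such as `license:apache-2.0`. The sketch below separates the two, assuming the first colon marks the key boundary; that reading is inferred from how the tags look here, not from a documented schema, and `parse_tags` is a hypothetical helper.

```python
# Tags copied from the normalized metadata above.
TAGS = [
    "llama.cpp",
    "gguf",
    "quantization",
    "qwen",
    "en",
    "base_model:Qwen/Qwen3Guard-Gen-8B",
    "base_model:quantized:Qwen/Qwen3Guard-Gen-8B",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us",
    "conversational",
]


def parse_tags(tags):
    """Split tags into plain labels and a dict of key -> list of values.

    Only the first colon is treated as a separator, so a value such as
    "quantized:Qwen/Qwen3Guard-Gen-8B" is kept intact.
    """
    labels, facets = [], {}
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep:
            facets.setdefault(key, []).append(value)
        else:
            labels.append(tag)
    return labels, facets


if __name__ == "__main__":
    labels, facets = parse_tags(TAGS)
    print("labels:", labels)
    print("license:", facets["license"])
    print("base_model:", facets["base_model"])
```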

Source payload excerpt (from Hugging Face API)

{
  "_id": "68d38dd3a199e6d8007edd59",
  "id": "ShahzebKhoso/Qwen3Guard-Gen-8B-GGUF",
  "modelId": "ShahzebKhoso/Qwen3Guard-Gen-8B-GGUF",
  "sha": "65a68f6e8cc0f285604cf8e8ad4be22221a77ccb",
  "createdAt": "2025-09-24T06:21:07.000Z",
  "lastModified": "2025-09-24T09:51:43.000Z",
  "author": "ShahzebKhoso",
  "downloads": 105,
  "likes": 1,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "llama.cpp",
  "siblings_count": 10
}