daniloreddy/qwen3.5-4b_gguf Q4_K_S GGUF - Free GGUF Download is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes. See related models in the same shard below.
Model Intelligence Sheet
daniloreddy/qwen3.5-4b_gguf overview
This repository provides GGUF quantized versions of the Qwen/Qwen3.5-4B model, optimized for local execution using llama.cpp and compatible ecosystems.
Downloads
259
Likes
0
Pipeline
text-generation
Library
—
Visibility
Public
Access
Open
Repository Files & Downloads
7 files detected
Direct downloads for all repository files
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| Qwen3.5-4B_Q4_K_M.gguf | GGUF | Q4_K_M | 2.52 GB | Download |
| Qwen3.5-4B_Q4_K_S.gguf | GGUF | Q4_K_S | 2.38 GB | Download |
| Qwen3.5-4B_Q5_K_M.gguf | GGUF | Q5_K_M | 2.90 GB | Download |
| Qwen3.5-4B_Q5_K_S.gguf | GGUF | Q5_K_S | 2.78 GB | Download |
| Qwen3.5-4B_Q8_0.gguf | GGUF | — | 4.17 GB | Download |
| Qwen3.5-4B_fp16.gguf | GGUF | — | 7.85 GB | Download |
| mmproj-model-f16.gguf | GGUF | F16 | 641.27 MB | Download |
Model Details Live
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"license": "apache-2.0",
"base_model": "Qwen/Qwen3.5-4B",
"tags": [
"llama.cpp",
"gguf",
"quantized",
"text-generation",
"lightweight",
"lmstudio",
"jan",
"cobalt",
"text-generation-webui"
],
"frontmatter": {
"license": "apache-2.0",
"base_model": "Qwen/Qwen3.5-4B",
"tags": [
"llama.cpp",
"gguf",
"quantized",
"text-generation",
"lightweight",
"lmstudio",
"jan",
"cobalt",
"text-generation-webui"
]
},
"hero_image_url": "",
"summary": "This repository provides **GGUF** quantized versions of the Qwen/Qwen3.5-4B model, optimized for local execution using llama.cpp and compatible ecosystems.",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nlicense: apache-2.0\nbase_model: Qwen/Qwen3.5-4B\ntags:\n- llama.cpp\n- gguf\n- quantized\n- text-generation\n- lightweight\n- lmstudio\n- jan\n- cobalt\n- text-generation-webui\n---\n\n# Qwen3.5-4B - GGUF High-Quality Quantizations\n\nThis repository provides **GGUF** quantized versions of the [Qwen/Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B) model, optimized for local execution using `llama.cpp` and compatible ecosystems.\n\n## 📌 Version Notes\nAll quantizations were generated from the official **FP16** weights.\n- **Target:** Efficient execution on consumer hardware, mobile/edge devices, and systems with limited memory.\n- **Performance:** The output quality (reasoning, coherence, and accuracy) is strictly dependent on the base model's parameter scale (4B).\n\n## 📊 Quantization Table\n\n| File | Method | Bit | Description |\n| :--- | :--- | :--- | :--- |\n| **fp16.gguf** | FP16 | 16-bit | **Original Weights.** No quantization applied. Maximum fidelity. |\n| **Q8_0.gguf** | Q8_0 | 8-bit | **Near-lossless.** Practically identical to the original model with lower memory footprint. |\n| **Q5_K_M.gguf** | Q5_K_M | 5-bit | **High Precision.** Minimizes quantization error for critical tasks. |\n| **Q4_K_M.gguf** | Q4_K_M | 4-bit | **Recommended.** Best balance between speed and performance. |\n| **Q4_K_S.gguf** | Q4_K_S | 4-bit | **Fast/Small.** Optimized for maximum throughput and low RAM usage. |\n\n## 🛠️ Technical Details\n- **Quantization Date:** 2026-03-07\n- **Tool used:** `llama-quantize` (llama.cpp)\n- **Method:** K-Quantization (optimized for AVX2/AVX-512 and modern GPU architectures).\n\n## 🚀 How to Use\n# Start a local OpenAI-compatible server with a web UI:\n\n### llama.cpp (CLI) using model from HuggingFace\n```bash\n./llama-cli -hf daniloreddy/Qwen3.5-4B_GGUF:Q4_K_M -p \"User: Hello! Assistant:\" -n 512 --temp 0.7\n```\n\n### llama.cpp (CLI) using downloaded model\n```bash\n./llama-cli -m path/to/Qwen3.5-4B_Q4_K_M.gguf -p \"User: Hello! Assistant:\" -n 512 --temp 0.7\n```\n\n### llama.cpp (SERVER) using model from HuggingFace\n```bash\n./llama-server -hf daniloreddy/Qwen3.5-4B_GGUF:Q4_K_M --port 8080 -c 4096\n```\n\n### llama.cpp (SERVER) using downloaded model\n```bash\n./llama-server -m /path/to/Qwen3.5-4B_Q4_K_M.gguf --port 8080 -c 4096\n```",
"related_quantizations": []
},
"tags": [
"gguf",
"llama.cpp",
"quantized",
"text-generation",
"lightweight",
"lmstudio",
"jan",
"cobalt",
"text-generation-webui",
"base_model:Qwen/Qwen3.5-4B",
"base_model:quantized:Qwen/Qwen3.5-4B",
"license:apache-2.0",
"endpoints_compatible",
"region:us",
"conversational"
],
"likes": 0,
"downloads": 259,
"gated": false,
"private": false,
"last_modified": "2026-03-14T17:45:39.000Z",
"created_at": "2026-03-07T23:16:03.000Z",
"pipeline_tag": "text-generation",
"library_name": ""
}
Source payload excerpt (from Hugging Face API)
{
"_id": "69acb1b3e49df7e34954b40f",
"id": "daniloreddy/Qwen3.5-4B_GGUF",
"modelId": "daniloreddy/Qwen3.5-4B_GGUF",
"sha": "2ad48a1ed88ea20da2d5eef59d96ffee260cbcb0",
"createdAt": "2026-03-07T23:16:03.000Z",
"lastModified": "2026-03-14T17:45:39.000Z",
"author": "daniloreddy",
"downloads": 259,
"likes": 0,
"gated": false,
"private": false,
"pipeline_tag": "text-generation",
"library_name": "",
"siblings_count": 9
}