daniloreddy/qwen3.5-4b_gguf Q4_K_S GGUF - Free GGUF Download is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes. See related models in the same shard below.

Model Intelligence Sheet

daniloreddy/qwen3.5-4b_gguf overview

This repository provides GGUF quantized versions of the Qwen/Qwen3.5-4B model, optimized for local execution using llama.cpp and compatible ecosystems.

ggufllama.cppquantizedtext-generationlightweightlmstudiojancobalttext-generation-webuibase_model:Qwen/Qwen3.5-4Bbase_model:quantized:Qwen/Qwen3.5-4Blicense:apache-2.0endpoints_compatibleregion:usconversational

Downloads

259

Likes

Pipeline

text-generation

Library

—

Visibility

Public

Access

Open

Repository Files & Downloads

7 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
Qwen3.5-4B_Q4_K_M.gguf	GGUF	Q4_K_M	2.52 GB	Download
Qwen3.5-4B_Q4_K_S.gguf	GGUF	Q4_K_S	2.38 GB	Download
Qwen3.5-4B_Q5_K_M.gguf	GGUF	Q5_K_M	2.90 GB	Download
Qwen3.5-4B_Q5_K_S.gguf	GGUF	Q5_K_S	2.78 GB	Download
Qwen3.5-4B_Q8_0.gguf	GGUF	—	4.17 GB	Download
Qwen3.5-4B_fp16.gguf	GGUF	—	7.85 GB	Download
mmproj-model-f16.gguf	GGUF	F16	641.27 MB	Download

Model Details Live

Model Slug

daniloreddy/qwen3.5-4b_gguf

Author

daniloreddy

Pipeline Task

text-generation

Library

—

Created

2026-03-07

Last Modified

2026-03-14

Gated

Private

HF SHA

2ad48a1ed88ea20da2d5eef59d96ffee260cbcb0

License

apache-2.0

Language

Unknown

Base Model

Qwen/Qwen3.5-4B

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "license": "apache-2.0",
    "base_model": "Qwen/Qwen3.5-4B",
    "tags": [
      "llama.cpp",
      "gguf",
      "quantized",
      "text-generation",
      "lightweight",
      "lmstudio",
      "jan",
      "cobalt",
      "text-generation-webui"
    ],
    "frontmatter": {
      "license": "apache-2.0",
      "base_model": "Qwen/Qwen3.5-4B",
      "tags": [
        "llama.cpp",
        "gguf",
        "quantized",
        "text-generation",
        "lightweight",
        "lmstudio",
        "jan",
        "cobalt",
        "text-generation-webui"
      ]
    },
    "hero_image_url": "",
    "summary": "This repository provides **GGUF** quantized versions of the Qwen/Qwen3.5-4B model, optimized for local execution using llama.cpp and compatible ecosystems.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlicense: apache-2.0\nbase_model: Qwen/Qwen3.5-4B\ntags:\n- llama.cpp\n- gguf\n- quantized\n- text-generation\n- lightweight\n- lmstudio\n- jan\n- cobalt\n- text-generation-webui\n---\n\n# Qwen3.5-4B - GGUF High-Quality Quantizations\n\nThis repository provides **GGUF** quantized versions of the [Qwen/Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B) model, optimized for local execution using `llama.cpp` and compatible ecosystems.\n\n## 📌 Version Notes\nAll quantizations were generated from the official **FP16** weights.\n- **Target:** Efficient execution on consumer hardware, mobile/edge devices, and systems with limited memory.\n- **Performance:** The output quality (reasoning, coherence, and accuracy) is strictly dependent on the base model's parameter scale (4B).\n\n## 📊 Quantization Table\n\n| File | Method | Bit | Description |\n| :--- | :--- | :--- | :--- |\n| **fp16.gguf** | FP16 | 16-bit | **Original Weights.** No quantization applied. Maximum fidelity. |\n| **Q8_0.gguf** | Q8_0 | 8-bit | **Near-lossless.** Practically identical to the original model with lower memory footprint. |\n| **Q5_K_M.gguf** | Q5_K_M | 5-bit | **High Precision.** Minimizes quantization error for critical tasks. |\n| **Q4_K_M.gguf** | Q4_K_M | 4-bit | **Recommended.** Best balance between speed and performance. |\n| **Q4_K_S.gguf** | Q4_K_S | 4-bit | **Fast/Small.** Optimized for maximum throughput and low RAM usage. |\n\n## 🛠️ Technical Details\n- **Quantization Date:** 2026-03-07\n- **Tool used:** `llama-quantize` (llama.cpp)\n- **Method:** K-Quantization (optimized for AVX2/AVX-512 and modern GPU architectures).\n\n## 🚀 How to Use\n# Start a local OpenAI-compatible server with a web UI:\n\n### llama.cpp (CLI) using model from HuggingFace\n```bash\n./llama-cli -hf daniloreddy/Qwen3.5-4B_GGUF:Q4_K_M -p \"User: Hello! Assistant:\" -n 512 --temp 0.7\n```\n\n### llama.cpp (CLI) using downloaded model\n```bash\n./llama-cli -m path/to/Qwen3.5-4B_Q4_K_M.gguf -p \"User: Hello! Assistant:\" -n 512 --temp 0.7\n```\n\n### llama.cpp (SERVER) using model from HuggingFace\n```bash\n./llama-server -hf daniloreddy/Qwen3.5-4B_GGUF:Q4_K_M --port 8080 -c 4096\n```\n\n### llama.cpp (SERVER) using downloaded model\n```bash\n./llama-server -m /path/to/Qwen3.5-4B_Q4_K_M.gguf --port 8080 -c 4096\n```",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "llama.cpp",
    "quantized",
    "text-generation",
    "lightweight",
    "lmstudio",
    "jan",
    "cobalt",
    "text-generation-webui",
    "base_model:Qwen/Qwen3.5-4B",
    "base_model:quantized:Qwen/Qwen3.5-4B",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 0,
  "downloads": 259,
  "gated": false,
  "private": false,
  "last_modified": "2026-03-14T17:45:39.000Z",
  "created_at": "2026-03-07T23:16:03.000Z",
  "pipeline_tag": "text-generation",
  "library_name": ""
}

Source payload excerpt (from Hugging Face API)

{
  "_id": "69acb1b3e49df7e34954b40f",
  "id": "daniloreddy/Qwen3.5-4B_GGUF",
  "modelId": "daniloreddy/Qwen3.5-4B_GGUF",
  "sha": "2ad48a1ed88ea20da2d5eef59d96ffee260cbcb0",
  "createdAt": "2026-03-07T23:16:03.000Z",
  "lastModified": "2026-03-14T17:45:39.000Z",
  "author": "daniloreddy",
  "downloads": 259,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "",
  "siblings_count": 9
}

daniloreddy/qwen3.5-4b_gguf overview

Repository Files & Downloads

Model Details Live

Metadata Inspector

More models in this shard