GraySoft

daniloreddy/qwen3.5-4b_gguf (Q4_K_S GGUF) is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. Use this page to pick a local model for guIDE or other runtimes. See related models in the same shard below.

Model Intelligence Sheet

daniloreddy/qwen3.5-4b_gguf overview

This repository provides GGUF quantized versions of the Qwen/Qwen3.5-4B model, optimized for local execution using llama.cpp and compatible ecosystems.

Tags: gguf, llama.cpp, quantized, text-generation, lightweight, lmstudio, jan, cobalt, text-generation-webui, base_model:Qwen/Qwen3.5-4B, base_model:quantized:Qwen/Qwen3.5-4B, license:apache-2.0, endpoints_compatible, region:us, conversational
Downloads: 259
Likes: 0
Pipeline: text-generation
Library: (none listed)
Visibility: Public
Access: Open

Repository Files & Downloads

7 files detected
Direct downloads for all repository files
File                      Type  Quantization  Size
Qwen3.5-4B_Q4_K_M.gguf    GGUF  Q4_K_M        2.52 GB
Qwen3.5-4B_Q4_K_S.gguf    GGUF  Q4_K_S        2.38 GB
Qwen3.5-4B_Q5_K_M.gguf    GGUF  Q5_K_M        2.90 GB
Qwen3.5-4B_Q5_K_S.gguf    GGUF  Q5_K_S        2.78 GB
Qwen3.5-4B_Q8_0.gguf      GGUF  Q8_0          4.17 GB
Qwen3.5-4B_fp16.gguf      GGUF  FP16          7.85 GB
mmproj-model-f16.gguf     GGUF  F16           641.27 MB
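
The file sizes above allow a rough comparison of the quantization levels. The following is a minimal sketch, not a measurement: it estimates effective bits per weight by scaling each file against the fp16 file, assuming that file stores about 16 bits per weight and ignoring GGUF metadata overhead. The helper names and the 1 GB headroom default are hypothetical, chosen only for illustration. The mmproj file is left out because it is a separate projector file, not a quantization of the main weights.

```python
# File sizes in GB, copied from the repository file table.
SIZES_GB = {
    "Qwen3.5-4B_Q4_K_S.gguf": 2.38,
    "Qwen3.5-4B_Q4_K_M.gguf": 2.52,
    "Qwen3.5-4B_Q5_K_S.gguf": 2.78,
    "Qwen3.5-4B_Q5_K_M.gguf": 2.90,
    "Qwen3.5-4B_Q8_0.gguf": 4.17,
    "Qwen3.5-4B_fp16.gguf": 7.85,
}
FP16_FILE = "Qwen3.5-4B_fp16.gguf"


def effective_bits(filename: str) -> float:
    """Approximate bits per weight, assuming the fp16 file is ~16 bits/weight."""
    return 16.0 * SIZES_GB[filename] / SIZES_GB[FP16_FILE]


def largest_fitting(budget_gb: float, headroom_gb: float = 1.0):
    """Return the largest file that fits in budget_gb minus headroom_gb, or None.

    headroom_gb is an assumed allowance for the KV cache and runtime buffers,
    not a figure taken from the repository.
    """
    usable = budget_gb - headroom_gb
    fitting = [name for name, size in SIZES_GB.items() if size <= usable]
    return max(fitting, key=SIZES_GB.get) if fitting else None


if __name__ == "__main__":
    for name in SIZES_GB:
        print(f"{name}: ~{effective_bits(name):.2f} bits/weight")
    print("Best fit for 4 GB:", largest_fitting(4.0))
```

Actual memory use also depends on context length and the runtime, so treat the result as a starting point for choosing a file.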

Model Details

Model Slug: daniloreddy/qwen3.5-4b_gguf
Author: daniloreddy
Pipeline Task: text-generation
Library: (none listed)
Created: 2026-03-07
Last Modified: 2026-03-14
Gated: No
Private: No
HF SHA: 2ad48a1ed88ea20da2d5eef59d96ffee260cbcb0
License: apache-2.0
Language: Unknown
Base Model: Qwen/Qwen3.5-4B

Metadata Inspector

Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "license": "apache-2.0",
    "base_model": "Qwen/Qwen3.5-4B",
    "tags": [
      "llama.cpp",
      "gguf",
      "quantized",
      "text-generation",
      "lightweight",
      "lmstudio",
      "jan",
      "cobalt",
      "text-generation-webui"
    ],
    "frontmatter": {
      "license": "apache-2.0",
      "base_model": "Qwen/Qwen3.5-4B",
      "tags": [
        "llama.cpp",
        "gguf",
        "quantized",
        "text-generation",
        "lightweight",
        "lmstudio",
        "jan",
        "cobalt",
        "text-generation-webui"
      ]
    },
    "hero_image_url": "",
    "summary": "This repository provides **GGUF** quantized versions of the Qwen/Qwen3.5-4B model, optimized for local execution using llama.cpp and compatible ecosystems.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlicense: apache-2.0\nbase_model: Qwen/Qwen3.5-4B\ntags:\n- llama.cpp\n- gguf\n- quantized\n- text-generation\n- lightweight\n- lmstudio\n- jan\n- cobalt\n- text-generation-webui\n---\n\n# Qwen3.5-4B - GGUF High-Quality Quantizations\n\nThis repository provides **GGUF** quantized versions of the [Qwen/Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B) model, optimized for local execution using `llama.cpp` and compatible ecosystems.\n\n## 📌 Version Notes\nAll quantizations were generated from the official **FP16** weights.\n- **Target:** Efficient execution on consumer hardware, mobile/edge devices, and systems with limited memory.\n- **Performance:** The output quality (reasoning, coherence, and accuracy) is strictly dependent on the base model's parameter scale (4B).\n\n## 📊 Quantization Table\n\n| File | Method | Bit | Description |\n| :--- | :--- | :--- | :--- |\n| **fp16.gguf** | FP16 | 16-bit | **Original Weights.** No quantization applied. Maximum fidelity. |\n| **Q8_0.gguf** | Q8_0 | 8-bit | **Near-lossless.** Practically identical to the original model with lower memory footprint. |\n| **Q5_K_M.gguf** | Q5_K_M | 5-bit | **High Precision.** Minimizes quantization error for critical tasks. |\n| **Q4_K_M.gguf** | Q4_K_M | 4-bit | **Recommended.** Best balance between speed and performance. |\n| **Q4_K_S.gguf** | Q4_K_S | 4-bit | **Fast/Small.** Optimized for maximum throughput and low RAM usage. |\n\n## 🛠️ Technical Details\n- **Quantization Date:** 2026-03-07\n- **Tool used:** `llama-quantize` (llama.cpp)\n- **Method:** K-Quantization (optimized for AVX2/AVX-512 and modern GPU architectures).\n\n## 🚀 How to Use\n# Start a local OpenAI-compatible server with a web UI:\n\n### llama.cpp (CLI) using model from HuggingFace\n```bash\n./llama-cli -hf daniloreddy/Qwen3.5-4B_GGUF:Q4_K_M -p \"User: Hello! 
Assistant:\" -n 512 --temp 0.7\n```\n\n### llama.cpp (CLI) using downloaded model\n```bash\n./llama-cli -m path/to/Qwen3.5-4B_Q4_K_M.gguf -p \"User: Hello! Assistant:\" -n 512 --temp 0.7\n```\n\n### llama.cpp (SERVER) using model from HuggingFace\n```bash\n./llama-server -hf daniloreddy/Qwen3.5-4B_GGUF:Q4_K_M --port 8080 -c 4096\n```\n\n### llama.cpp (SERVER) using downloaded model\n```bash\n./llama-server -m /path/to/Qwen3.5-4B_Q4_K_M.gguf --port 8080 -c 4096\n```",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "llama.cpp",
    "quantized",
    "text-generation",
    "lightweight",
    "lmstudio",
    "jan",
    "cobalt",
    "text-generation-webui",
    "base_model:Qwen/Qwen3.5-4B",
    "base_model:quantized:Qwen/Qwen3.5-4B",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 0,
  "downloads": 259,
  "gated": false,
  "private": false,
  "last_modified": "2026-03-14T17:45:39.000Z",
  "created_at": "2026-03-07T23:16:03.000Z",
  "pipeline_tag": "text-generation",
  "library_name": ""
}
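
The top-level tag list mixes plain tags with prefixed `key:value` tags such as `license:apache-2.0`. As an illustration only, the sketch below separates the two kinds. The function name `split_tags` is hypothetical, and splitting on the first colon is an assumption about how these tags are meant to be read, not a documented rule.

```python
# Tag list copied from the normalized metadata.
TAGS = [
    "gguf", "llama.cpp", "quantized", "text-generation", "lightweight",
    "lmstudio", "jan", "cobalt", "text-generation-webui",
    "base_model:Qwen/Qwen3.5-4B",
    "base_model:quantized:Qwen/Qwen3.5-4B",
    "license:apache-2.0",
    "endpoints_compatible", "region:us", "conversational",
]


def split_tags(tags):
    """Split tags into plain tags and a dict of key -> list of values.

    Only the first colon separates key from value, so a value such as
    'quantized:Qwen/Qwen3.5-4B' keeps its inner colon.
    """
    plain, keyed = [], {}
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep:
            keyed.setdefault(key, []).append(value)
        else:
            plain.append(tag)
    return plain, keyed


if __name__ == "__main__":
    plain, keyed = split_tags(TAGS)
    print("plain:", plain)
    print("keyed:", keyed)
```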
Source payload excerpt (from Hugging Face API)
{
  "_id": "69acb1b3e49df7e34954b40f",
  "id": "daniloreddy/Qwen3.5-4B_GGUF",
  "modelId": "daniloreddy/Qwen3.5-4B_GGUF",
  "sha": "2ad48a1ed88ea20da2d5eef59d96ffee260cbcb0",
  "createdAt": "2026-03-07T23:16:03.000Z",
  "lastModified": "2026-03-14T17:45:39.000Z",
  "author": "daniloreddy",
  "downloads": 259,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "",
  "siblings_count": 9
}
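
The `id` and `sha` fields in the payload are enough to build a download URL pinned to this exact revision. The sketch below assumes the Hugging Face Hub's usual `resolve` URL layout and assumes the file exists under that name at that commit. The function name `resolve_url` is hypothetical.

```python
from urllib.parse import quote

# Values copied from the source payload excerpt.
REPO_ID = "daniloreddy/Qwen3.5-4B_GGUF"
REVISION = "2ad48a1ed88ea20da2d5eef59d96ffee260cbcb0"


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a Hub download URL for one file at a given revision.

    Assumes the layout https://huggingface.co/<repo>/resolve/<revision>/<file>.
    """
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


if __name__ == "__main__":
    print(resolve_url(REPO_ID, "Qwen3.5-4B_Q4_K_S.gguf", REVISION))
```

Pinning to the commit SHA rather than `main` means the URL keeps pointing at the same file even if the repository is updated later.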