GraySoft
Model Intelligence Sheet

Dhptl/Qwen2.5-VL-3B-Instruct-GGUF overview


Tags: transformers · gguf · arxiv:2308.12966 · en · text-generation-inference · arxiv:2309.00071 · arxiv:2409.12191 · conversational · multimodal · image-text-to-text · vlm · safetensors · region:us · qwen2_5_vl · quantized · vision · eval-results · deploy:azure · base_model:Qwen/Qwen2.5-VL-3B-Instruct · base_model:quantized:Qwen/Qwen2.5-VL-3B-Instruct · license:other · endpoints_compatible

Runs locally with llama.cpp or guIDE on 4 GB VRAM-class GPUs. The smallest text backbone (Q2_K) is 1.19 GB on disk; image input also needs the 1.25 GB mmproj file.

Downloads: 0
Likes: 0
Pipeline: image-text-to-text
Author: Dhptl

Repository Files & Downloads

11 GGUF files detected
Direct downloads for local inference
| File | Type | Quantization | Size |
|---|---|---|---|
| Qwen2.5-VL-3B-Instruct-Q2_K.gguf | GGUF | Q2_K | 1.19 GB |
| Qwen2.5-VL-3B-Instruct-Q3_K_L.gguf | GGUF | Q3_K_L | 1.59 GB |
| Qwen2.5-VL-3B-Instruct-Q3_K_M.gguf | GGUF | Q3_K_M | 1.48 GB |
| Qwen2.5-VL-3B-Instruct-Q3_K_S.gguf | GGUF | Q3_K_S | 1.35 GB |
| Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf | GGUF | Q4_K_M | 1.80 GB |
| Qwen2.5-VL-3B-Instruct-Q4_K_S.gguf | GGUF | Q4_K_S | 1.71 GB |
| Qwen2.5-VL-3B-Instruct-Q5_K_M.gguf | GGUF | Q5_K_M | 2.07 GB |
| Qwen2.5-VL-3B-Instruct-Q5_K_S.gguf | GGUF | Q5_K_S | 2.02 GB |
| Qwen2.5-VL-3B-Instruct-Q6_K.gguf | GGUF | Q6_K | 2.36 GB |
| Qwen2.5-VL-3B-Instruct-Q8_0.gguf | GGUF | Q8_0 | 3.06 GB |
| Qwen2.5-VL-3B-Instruct-mmproj-f16.gguf | GGUF | F16 | 1.25 GB |

Model Details

Model ID: Dhptl/Qwen2.5-VL-3B-Instruct-GGUF
Author: Dhptl
Pipeline: image-text-to-text
License: other
Base model: Qwen/Qwen2.5-VL-3B-Instruct
Last modified: 2026-06-11T05:09:35.000Z

Model README

---
license: other
license_name: qwen-research
license_link: https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct/blob/main/LICENSE
base_model: Qwen/Qwen2.5-VL-3B-Instruct
pipeline_tag: image-text-to-text
tags:
- arxiv:2308.12966
- en
- text-generation-inference
- arxiv:2309.00071
- arxiv:2409.12191
- conversational
- multimodal
- image-text-to-text
- vlm
- safetensors
- region:us
- qwen2_5_vl
- quantized
- transformers
- vision
- eval-results
- gguf
- deploy:azure
language:
- en
---

<div align="center">

Qwen2.5-VL-3B-Instruct — GGUF Quantizations (VLM)

[Model on HF](https://huggingface.co/Dhptl/Qwen2.5-VL-3B-Instruct-GGUF)

[Original Model](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)

[quant-kit](https://github.com/DhruvalPtl/quant-kit)

Quantized GGUF versions of Qwen/Qwen2.5-VL-3B-Instruct

This is a Vision-Language Model (VLM) — it can understand both text and images.

Works with llama.cpp · LM Studio · Jan · Ollama

Quantized by Dhptl on June 11, 2026 using quant-kit

</div>

---

> [!IMPORTANT]

> This VLM requires TWO files — a text backbone GGUF and the mmproj vision encoder GGUF.

> Download one text backbone (e.g. Q4_K_M) and the mmproj file. Both must be in the same folder.
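
The two-file requirement above can be checked before launching a runtime. The following is a minimal sketch, assuming the vision encoder is identified by the substring `mmproj` in its filename (as it is in this repository); the helper names `find_vlm_files` and `is_ready` are hypothetical and not part of any tool mentioned here.

```python
from pathlib import Path

MMPROJ_MARKER = "mmproj"  # assumption: the vision encoder filename contains this substring


def find_vlm_files(folder):
    """Split the .gguf files in `folder` into (text backbones, mmproj files)."""
    ggufs = sorted(Path(folder).glob("*.gguf"))
    mmproj = [p for p in ggufs if MMPROJ_MARKER in p.name.lower()]
    backbones = [p for p in ggufs if MMPROJ_MARKER not in p.name.lower()]
    return backbones, mmproj


def is_ready(folder):
    """True when the folder holds at least one text backbone and one mmproj file."""
    backbones, mmproj = find_vlm_files(folder)
    return bool(backbones) and bool(mmproj)
```

Calling `is_ready("./models")` returns `False` until both a backbone such as `Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf` and `Qwen2.5-VL-3B-Instruct-mmproj-f16.gguf` are present in that folder.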

---

📦 Available Files

🔤 Text Backbone (quantized — pick ONE)

| Filename | Size | RAM Required | Quant | Quality | Best For |
|---|---|---|---|---|---|
| Qwen2.5-VL-3B-Instruct-Q2_K.gguf | 1.19 GB | ~2.7 GB | Q2_K | ⭐ | Extreme compression, significant quality loss. |
| Qwen2.5-VL-3B-Instruct-Q3_K_L.gguf | 1.59 GB | ~3.1 GB | Q3_K_L | ⭐⭐⭐ | Slightly better than Q3_K_M, still a compromise. |
| Qwen2.5-VL-3B-Instruct-Q3_K_M.gguf | 1.48 GB | ~3.0 GB | Q3_K_M | ⭐⭐⭐ | Very small file. Quality drop noticeable. |
| Qwen2.5-VL-3B-Instruct-Q3_K_S.gguf | 1.35 GB | ~2.9 GB | Q3_K_S | ⭐⭐ | Very high compression, high quality loss. |
| Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf | 1.80 GB | ~3.3 GB | Q4_K_M (Recommended) | ⭐⭐⭐⭐ | Best balance of size and quality. Recommended for most users. |
| Qwen2.5-VL-3B-Instruct-Q4_K_S.gguf | 1.71 GB | ~3.2 GB | Q4_K_S | ⭐⭐⭐½ | Good speed/size balance, slight quality loss. |
| Qwen2.5-VL-3B-Instruct-Q5_K_M.gguf | 2.07 GB | ~3.6 GB | Q5_K_M | ⭐⭐⭐⭐½ | Better quality than Q4, slightly larger. Great if you have the RAM. |
| Qwen2.5-VL-3B-Instruct-Q5_K_S.gguf | 2.02 GB | ~3.5 GB | Q5_K_S | ⭐⭐⭐⭐ | Large but accurate. |
| Qwen2.5-VL-3B-Instruct-Q6_K.gguf | 2.36 GB | ~3.9 GB | Q6_K | ⭐⭐⭐⭐⭐ | Near-perfect quality, very large. |
| Qwen2.5-VL-3B-Instruct-Q8_0.gguf | 3.06 GB | ~4.6 GB | Q8_0 | ⭐⭐⭐⭐⭐ | Closest to original quality. Use when RAM is not a concern. |
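
Picking a backbone from the table above amounts to choosing the largest file whose RAM estimate fits the machine. The sketch below hard-codes the table's size and RAM columns; the function name `largest_quant_that_fits` is hypothetical, and the table does not say whether its RAM figures include the mmproj file, so treat the result as a starting point rather than a guarantee.

```python
# (quant, file size in GB, approximate RAM required in GB), copied from the table above
QUANTS = [
    ("Q2_K", 1.19, 2.7),
    ("Q3_K_S", 1.35, 2.9),
    ("Q3_K_M", 1.48, 3.0),
    ("Q3_K_L", 1.59, 3.1),
    ("Q4_K_S", 1.71, 3.2),
    ("Q4_K_M", 1.80, 3.3),
    ("Q5_K_S", 2.02, 3.5),
    ("Q5_K_M", 2.07, 3.6),
    ("Q6_K", 2.36, 3.9),
    ("Q8_0", 3.06, 4.6),
]


def largest_quant_that_fits(ram_budget_gb):
    """Return the largest listed quant whose RAM estimate fits the budget, or None."""
    fitting = [q for q in QUANTS if q[2] <= ram_budget_gb]
    if not fitting:
        return None
    # Larger files keep more precision, so prefer the biggest one that fits.
    return max(fitting, key=lambda q: q[1])[0]


print(largest_quant_that_fits(3.4))  # Q4_K_M
```

In every row the RAM estimate is roughly the file size plus 1.5 GB, so the same rule of thumb can be applied to the file size alone.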

🖼️ Vision Encoder — mmproj (always required, always F16)

| Filename | Size | Notes |
|---|---|---|
| Qwen2.5-VL-3B-Instruct-mmproj-f16.gguf | 1.25 GB | Always F16; the vision encoder is not quantized |

> ⚠️ You need BOTH files — one text backbone + the mmproj — to run this VLM.
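
Fetching the pair can be scripted. This is a minimal sketch, assuming the `huggingface_hub` package is installed and that the filenames follow the pattern listed in the tables above; `filenames_for` and `download_pair` are hypothetical helper names, and `./models` is an arbitrary destination.

```python
REPO_ID = "Dhptl/Qwen2.5-VL-3B-Instruct-GGUF"
MODEL_NAME = "Qwen2.5-VL-3B-Instruct"


def filenames_for(quant="Q4_K_M"):
    """Return the two filenames needed for one quant: text backbone plus mmproj."""
    return [f"{MODEL_NAME}-{quant}.gguf", f"{MODEL_NAME}-mmproj-f16.gguf"]


def download_pair(quant="Q4_K_M", dest="./models"):
    """Download both files into the same folder and return their local paths."""
    # Imported lazily so the module loads even without the package installed.
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    return [
        hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=dest)
        for name in filenames_for(quant)
    ]
```

`download_pair("Q5_K_M")` would place the Q5_K_M backbone and the mmproj file side by side in `./models`.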

---

⚡ Speed Benchmarks

No benchmark results are published yet. Run `python benchmark.py --model Qwen2.5-VL-3B-Instruct` to generate them.
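
For a quick number without that script, generation speed can be timed directly. The sketch below is not the `benchmark.py` referenced above; it is a hypothetical helper that times any callable returning the number of tokens it generated.

```python
import time


def tokens_per_second(generate, prompt, max_tokens=128):
    """Time one call to generate(prompt, max_tokens), which must return a token count."""
    start = time.perf_counter()
    n_tokens = generate(prompt, max_tokens)
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on very coarse clocks.
    return n_tokens / elapsed if elapsed > 0 else float("inf")
```

With llama-cpp-python, one possible `generate` is `lambda p, n: llm(p, max_tokens=n)["usage"]["completion_tokens"]`, where `llm` is an already loaded `Llama` instance. A single run includes prompt processing time, so averaging several runs gives a steadier figure.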

---

🚀 How to Use

LM Studio (Easiest — GUI)

  1. Search for Dhptl/Qwen2.5-VL-3B-Instruct-GGUF in LM Studio
  2. Download the Q4_K_M text file and the mmproj file
  3. Load the model. LM Studio uses both files automatically.

Ollama

ollama run dhptl/qwen2.5-vl-3b-instruct

llama.cpp CLI — Text + Image

# Download both files to the same directory, then run llama.cpp's multimodal CLI.
# Recent llama.cpp builds ship it as llama-mtmd-cli, which replaced llama-llava-cli.
./llama-mtmd-cli \
  -m Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf \
  --mmproj Qwen2.5-VL-3B-Instruct-mmproj-f16.gguf \
  --image /path/to/your/image.jpg \
  -p "Describe this image in detail." \
  -n 512

llama.cpp CLI — Text only (no image)

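# In recent llama.cpp builds, -sys sets the system prompt; -p would be sent as the first user message.
./llama-cli \
  -m Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf \
  -sys "You are a helpful assistant." \
  --conversation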

Python — llama-cpp-python

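from llama_cpp import Llama
# Qwen2.5-VL has its own chat template, so use the Qwen-specific handler rather than the
# LLaVA 1.6 one. This assumes a llama-cpp-python release that ships Qwen25VLChatHandler.
from llama_cpp.llama_chat_format import Qwen25VLChatHandler

# Load the vision encoder (mmproj) and attach it to the text backbone
chat_handler = Qwen25VLChatHandler(clip_model_path="./Qwen2.5-VL-3B-Instruct-mmproj-f16.gguf")
llm = Llama(
    model_path="./Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_gpu_layers=-1,  # offload all layers to the GPU when one is available
    n_ctx=4096,       # image embeddings consume context, so keep this generous
)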

# Text + image inference
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
                {"type": "text",      "text":      "What do you see in this image?"}
            ]
        }
    ]
)
print(response["choices"][0]["message"]["content"])
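
The example above loads the image from a remote URL. For a file on disk, llama-cpp-python also accepts a base64 `data:` URI in the same `url` field. The helper below is a small sketch; the name `image_to_data_uri` is hypothetical.

```python
import base64
import mimetypes


def image_to_data_uri(path):
    """Encode a local image file as a base64 data URI for the image_url field."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        mime = "application/octet-stream"  # fallback when the extension is unknown
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

It slots into the message as `{"type": "image_url", "image_url": {"url": image_to_data_uri("/path/to/your/image.jpg")}}`.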

---

🔍 VLM Architecture

This model uses a two-component architecture:

| Component | File | Purpose |
|---|---|---|
| Text Backbone | Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf | Language understanding & generation |
| Vision Encoder (mmproj) | Qwen2.5-VL-3B-Instruct-mmproj-f16.gguf | Image feature extraction (always F16) |

> Why is mmproj always F16?
>
> The vision encoder maps image pixels to token embeddings. Quantizing it degrades
> image understanding, so it stays at F16 (half precision), which is already compact
> at roughly 1-2 GB for most models.

---

🔍 About GGUF Quantization

| Format | Bits/weight | Quality |
|---|---|---|
| Q3_K_M | ~3.3 | ⭐⭐⭐ |
| Q4_K_M | ~4.5 | ⭐⭐⭐⭐ ← recommended |
| Q5_K_M | ~5.6 | ⭐⭐⭐⭐½ |
| Q8_0 | ~8.5 | ⭐⭐⭐⭐⭐ |
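
To see why fewer bits per weight cost quality, it helps to quantize some numbers and measure the round-trip error. The sketch below uses plain round-to-nearest with one scale per block of 32 values. That resembles Q8_0 (32 eight-bit values plus one 16-bit scale, hence 8.5 bits per weight), but it is a deliberate simplification and not the K-quant scheme used by the other formats. The random weights are synthetic, not taken from the model.

```python
import random


def quantize_roundtrip(values, bits, block_size=32):
    """Quantize each block to signed integers with one scale per block, then dequantize."""
    qmax = 2 ** (bits - 1) - 1  # largest representable magnitude, e.g. 127 for 8 bits
    out = []
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        scale = max(abs(v) for v in block) / qmax or 1.0  # avoid dividing by zero
        out.extend(round(v / scale) * scale for v in block)
    return out


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]  # synthetic stand-in for real weights
for bits in (3, 4, 5, 8):
    err = rms_error(weights, quantize_roundtrip(weights, bits))
    print(f"{bits}-bit: RMS error {err:.4f}")
```

The error shrinks as the bit width grows, which is the trade-off the star ratings in the table summarize.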

---

💬 Community & Feedback

Found an issue? Open a Discussion in the Community tab.

If useful, please:

  • ⭐ Star quant-kit on GitHub
  • 👍 Like this model on HuggingFace


Source: Hugging Face