Dhptl/Qwen2.5-VL-3B-Instruct-GGUF overview
Runs locally: the smallest text backbone needs ~1.19 GB of disk, plus 1.25 GB for the required mmproj vision encoder (4 GB VRAM class GPUs with llama.cpp / guIDE).
Repository Files & Downloads
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| Qwen2.5-VL-3B-Instruct-Q2_K.gguf | GGUF | Q2_K | 1.19 GB | Download |
| Qwen2.5-VL-3B-Instruct-Q3_K_L.gguf | GGUF | Q3_K_L | 1.59 GB | Download |
| Qwen2.5-VL-3B-Instruct-Q3_K_M.gguf | GGUF | Q3_K_M | 1.48 GB | Download |
| Qwen2.5-VL-3B-Instruct-Q3_K_S.gguf | GGUF | Q3_K_S | 1.35 GB | Download |
| Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf | GGUF | Q4_K_M | 1.80 GB | Download |
| Qwen2.5-VL-3B-Instruct-Q4_K_S.gguf | GGUF | Q4_K_S | 1.71 GB | Download |
| Qwen2.5-VL-3B-Instruct-Q5_K_M.gguf | GGUF | Q5_K_M | 2.07 GB | Download |
| Qwen2.5-VL-3B-Instruct-Q5_K_S.gguf | GGUF | Q5_K_S | 2.02 GB | Download |
| Qwen2.5-VL-3B-Instruct-Q6_K.gguf | GGUF | Q6_K | 2.36 GB | Download |
| Qwen2.5-VL-3B-Instruct-Q8_0.gguf | GGUF | Q8_0 | 3.06 GB | Download |
| Qwen2.5-VL-3B-Instruct-mmproj-f16.gguf | GGUF | F16 | 1.25 GB | Download |
Model Details
| Model ID | Dhptl/Qwen2.5-VL-3B-Instruct-GGUF |
|---|---|
| Author | Dhptl |
| Pipeline | image-text-to-text |
| License | other |
| Base model | Qwen/Qwen2.5-VL-3B-Instruct |
| Last modified | 2026-06-11T05:09:35.000Z |
Model README
---
license: other
license_name: qwen-research
license_link: https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct/blob/main/LICENSE
base_model: Qwen/Qwen2.5-VL-3B-Instruct
pipeline_tag: image-text-to-text
tags:
- arxiv:2308.12966
- en
- text-generation-inference
- arxiv:2309.00071
- arxiv:2409.12191
- conversational
- multimodal
- image-text-to-text
- vlm
- safetensors
- region:us
- qwen2_5_vl
- quantized
- transformers
- vision
- eval-results
- gguf
- deploy:azure
language:
- en
---
<div align="center">
Qwen2.5-VL-3B-Instruct — GGUF Quantizations (VLM)



Quantized GGUF versions of Qwen/Qwen2.5-VL-3B-Instruct
This is a Vision-Language Model (VLM) — it can understand both text and images.
Works with llama.cpp · LM Studio · Jan · Ollama
Quantized by Dhptl on June 11, 2026 using quant-kit
</div>
---
> [!IMPORTANT]
> This VLM requires TWO files — a text backbone GGUF and the mmproj vision encoder GGUF.
> Download one text backbone (e.g. Q4_K_M) and the mmproj file. Both must be in the same folder.
---
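Before launching a runtime, it can help to confirm that the folder really holds both files. The sketch below assumes the naming used in this repository (the vision encoder has "mmproj" in its filename, and every other `.gguf` is a text backbone). `find_model_pair` is a hypothetical helper name, not part of any library.

```python
from pathlib import Path


def find_model_pair(folder):
    """Return (backbone, mmproj) GGUF paths found in `folder`.

    Assumes the vision encoder has "mmproj" in its filename and that
    every other .gguf file in the folder is a text backbone.
    """
    ggufs = sorted(Path(folder).glob("*.gguf"))
    mmproj = [p for p in ggufs if "mmproj" in p.name]
    backbones = [p for p in ggufs if "mmproj" not in p.name]
    if not mmproj:
        raise FileNotFoundError("no mmproj GGUF found in " + str(folder))
    if not backbones:
        raise FileNotFoundError("no text backbone GGUF found in " + str(folder))
    # If several backbones are present, the first one alphabetically is used.
    return backbones[0], mmproj[0]
```

Calling `find_model_pair("models")` either returns the two paths to pass to your runtime or raises an error naming the missing piece.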
📦 Available Files
🔤 Text Backbone (quantized — pick ONE)
| Filename | Size | RAM Required | Quant | Quality | Best For |
|---|---|---|---|---|---|
| Qwen2.5-VL-3B-Instruct-Q2_K.gguf | 1.19 GB | ~2.7 GB | Q2_K | ⭐ | Extreme compression, significant quality loss. |
| Qwen2.5-VL-3B-Instruct-Q3_K_L.gguf | 1.59 GB | ~3.1 GB | Q3_K_L | ⭐⭐⭐ | Slightly better than Q3_K_M, still a compromise. |
| Qwen2.5-VL-3B-Instruct-Q3_K_M.gguf | 1.48 GB | ~3.0 GB | Q3_K_M | ⭐⭐⭐ | Very small file. Quality drop noticeable. |
| Qwen2.5-VL-3B-Instruct-Q3_K_S.gguf | 1.35 GB | ~2.9 GB | Q3_K_S | ⭐⭐ | Very high compression, high quality loss. |
| Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf | 1.80 GB | ~3.3 GB | Q4_K_M ✅ Recommended | ⭐⭐⭐⭐ | Best balance of size and quality. Recommended for most users. |
| Qwen2.5-VL-3B-Instruct-Q4_K_S.gguf | 1.71 GB | ~3.2 GB | Q4_K_S | ⭐⭐⭐½ | Good speed/size balance, slight quality loss. |
| Qwen2.5-VL-3B-Instruct-Q5_K_M.gguf | 2.07 GB | ~3.6 GB | Q5_K_M | ⭐⭐⭐⭐½ | Better quality than Q4, slightly larger. Great if you have the RAM. |
| Qwen2.5-VL-3B-Instruct-Q5_K_S.gguf | 2.02 GB | ~3.5 GB | Q5_K_S | ⭐⭐⭐⭐ | Large but accurate. |
| Qwen2.5-VL-3B-Instruct-Q6_K.gguf | 2.36 GB | ~3.9 GB | Q6_K | ⭐⭐⭐⭐⭐ | Near-perfect quality, very large. |
| Qwen2.5-VL-3B-Instruct-Q8_0.gguf | 3.06 GB | ~4.6 GB | Q8_0 | ⭐⭐⭐⭐⭐ | Closest to original quality. Use when RAM is not a concern. |
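As a rough aid for choosing a row from the table above, the following sketch picks the largest quantization whose "RAM Required" figure fits a given budget. The numbers are copied from the table. The card does not say whether those RAM figures include the mmproj file, so treat the result as a starting point only. `largest_quant_that_fits` is a hypothetical helper name.

```python
# (quant, file size in GB, approximate RAM required in GB), copied from the table above.
QUANTS = [
    ("Q2_K", 1.19, 2.7),
    ("Q3_K_S", 1.35, 2.9),
    ("Q3_K_M", 1.48, 3.0),
    ("Q3_K_L", 1.59, 3.1),
    ("Q4_K_S", 1.71, 3.2),
    ("Q4_K_M", 1.80, 3.3),
    ("Q5_K_S", 2.02, 3.5),
    ("Q5_K_M", 2.07, 3.6),
    ("Q6_K", 2.36, 3.9),
    ("Q8_0", 3.06, 4.6),
]


def largest_quant_that_fits(ram_budget_gb):
    """Return the largest listed quant whose RAM estimate fits the budget, or None."""
    fitting = [q for q in QUANTS if q[2] <= ram_budget_gb]
    if not fitting:
        return None
    # Larger files correspond to higher-precision quantizations.
    return max(fitting, key=lambda q: q[1])[0]


print(largest_quant_that_fits(3.4))  # prints Q4_K_M
```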
🖼️ Vision Encoder — mmproj (always required, always F16)
| Filename | Size | Notes |
|---|---|---|
| Qwen2.5-VL-3B-Instruct-mmproj-f16.gguf | 1.25 GB | Always F16 — vision encoder is not quantized |
> ⚠️ You need BOTH files — one text backbone + the mmproj — to run this VLM.
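One way to fetch the pair into a single folder is the `huggingface_hub` package. The sketch below assumes that package is installed and that the filenames match the tables above. `files_for` and `download_pair` are illustrative names, and the `models` folder is an arbitrary choice.

```python
REPO_ID = "Dhptl/Qwen2.5-VL-3B-Instruct-GGUF"
MMPROJ_FILE = "Qwen2.5-VL-3B-Instruct-mmproj-f16.gguf"


def files_for(quant):
    """Return the two filenames needed for a quant label such as "Q4_K_M"."""
    return ["Qwen2.5-VL-3B-Instruct-" + quant + ".gguf", MMPROJ_FILE]


def download_pair(quant="Q4_K_M", local_dir="models"):
    """Download the text backbone and the mmproj into the same folder."""
    # Imported here so files_for() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return [
        hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=local_dir)
        for name in files_for(quant)
    ]
```

`download_pair()` returns the two local paths, which can then be passed to llama.cpp or llama-cpp-python.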
---
⚡ Speed Benchmarks
Run `python benchmark.py --model Qwen2.5-VL-3B-Instruct` to generate results.
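The `benchmark.py` script itself is not shown in this card. As a stand-in, here is a minimal sketch of how a tokens-per-second figure could be measured. `generate` is a hypothetical callable that takes a prompt and a token limit and returns the number of tokens it produced. This is not the author's script.

```python
import time


def tokens_per_second(generate, prompt, max_tokens=128):
    """Time one call to `generate` and return generated tokens per second.

    `generate` is a hypothetical callable taking (prompt, max_tokens) and
    returning the number of tokens it produced.
    """
    start = time.perf_counter()
    n_tokens = generate(prompt, max_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed if elapsed > 0 else float("inf")
```

With llama-cpp-python, `generate` could wrap `llm.create_completion(prompt, max_tokens=max_tokens)` and return `["usage"]["completion_tokens"]` from the response. The measured time then includes prompt processing, so short generations will understate decoding speed.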
---
🚀 How to Use
LM Studio (Easiest — GUI)
- Search for `Dhptl/Qwen2.5-VL-3B-Instruct` in LM Studio
- Download the Q4_K_M text file and the mmproj file
- Load the model — LM Studio automatically uses both files
Ollama
ollama run dhptl/qwen2.5-vl-3b-instruct
llama.cpp CLI — Text + Image
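# Download both files to the same directory, then run the multimodal CLI.
# Recent llama.cpp builds name this binary llama-mtmd-cli; older builds shipped llama-llava-cli or llama-qwen2vl-cli instead.
./llama-mtmd-cli \
  -m Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf \
  --mmproj Qwen2.5-VL-3B-Instruct-mmproj-f16.gguf \
  --image /path/to/your/image.jpg \
  -p "Describe this image in detail." \
  -n 512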
llama.cpp CLI — Text only (no image)
./llama-cli \
  -m Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf \
  -p "You are a helpful assistant." \
  --conversation
Python — llama-cpp-python
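from llama_cpp import Llama
# Qwen25VLChatHandler applies the Qwen2.5-VL prompt format. This assumes your
# installed llama-cpp-python release includes it; older releases may not.
from llama_cpp.llama_chat_format import Qwen25VLChatHandler

# Load the vision encoder (mmproj) through the chat handler
chat_handler = Qwen25VLChatHandler(clip_model_path="./Qwen2.5-VL-3B-Instruct-mmproj-f16.gguf")

# Load the quantized text backbone and attach the handler
llm = Llama(
    model_path="./Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_gpu_layers=-1,  # offload all layers to the GPU when one is available
    n_ctx=4096,       # the image embedding also occupies context
)

# Text + image inference
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
                {"type": "text", "text": "What do you see in this image?"},
            ],
        }
    ]
)
print(response["choices"][0]["message"]["content"])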
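The example above loads the image from a URL. For a local file, llama-cpp-python also accepts base64 data URIs in the `image_url` field. The helper below is a small sketch of that conversion. `image_to_data_uri` is an illustrative name, and the fallback MIME type is an assumption.

```python
import base64
import mimetypes


def image_to_data_uri(path):
    """Encode a local image file as a base64 data URI."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        # Fallback when the extension is unknown; an assumption, adjust as needed.
        mime_type = "application/octet-stream"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return "data:" + mime_type + ";base64," + encoded
```

To use it, replace the URL string in the message with `image_to_data_uri("/path/to/your/image.jpg")`.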
---
🔍 VLM Architecture
This model uses a two-component architecture:
| Component | File | Purpose |
|---|---|---|
| Text Backbone | Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf | Language understanding & generation |
| Vision Encoder (mmproj) | Qwen2.5-VL-3B-Instruct-mmproj-f16.gguf | Image feature extraction (always F16) |
> Why is mmproj always F16?
> The vision encoder maps image pixels to token embeddings. Quantizing it can
> degrade image understanding, so it stays at F16 (half precision, 2 bytes per weight),
> which is still compact: 1.25 GB for this model.
---
🔍 About GGUF Quantization
| Format | Bits/weight | Quality |
|---|---|---|
| Q3_K_M | ~3.3 | ⭐⭐⭐ |
| Q4_K_M | ~4.5 | ⭐⭐⭐⭐ ← recommended |
| Q5_K_M | ~5.6 | ⭐⭐⭐⭐½ |
| Q8_0 | ~8.5 | ⭐⭐⭐⭐⭐ |
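The bits/weight figures above are nominal. A rough cross-check against the actual files is to scale the Q8_0 figure by the ratio of file sizes, as sketched below. The sizes are copied from the file table in this card. The estimate assumes Q8_0 really averages about 8.5 bits per weight and that file size is dominated by weights, so the results are approximate.

```python
# File sizes in GB, copied from the file table in this card.
SIZES_GB = {"Q3_K_M": 1.48, "Q4_K_M": 1.80, "Q5_K_M": 2.07, "Q8_0": 3.06}
Q8_0_BITS = 8.5  # nominal bits/weight for Q8_0, from the table above


def effective_bits_per_weight(quant):
    """Scale the Q8_0 figure by the file-size ratio (a rough estimate)."""
    return Q8_0_BITS * SIZES_GB[quant] / SIZES_GB["Q8_0"]


for name in SIZES_GB:
    print(name, round(effective_bits_per_weight(name), 2))
```

The estimates for the smaller quants come out somewhat above the nominal values. One likely reason is that K-quant mixes keep some tensors at higher precision than the label suggests.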
---
💬 Community & Feedback
Found an issue? Open a Discussion in the Community tab.
If useful, please:
- ⭐ Star quant-kit on GitHub
- 👍 Like this model on HuggingFace
Source: Hugging Face