Question 1

What is cstr/qwen3-vl-2b-crispembed-gguf?

Accepted Answer

---
license: apache-2.0
tags:
- gguf
- ocr
- vision-language
- crispembed
---

Qwen3-VL-2B — CrispEmbed GGUF

Models

| File | Quantization | Size | Notes |
|------|-------|------|-------------|
| qwen3-vl-2b-q4_k.gguf | Q4_K | 1.5 GB | Good quality/size balance |
| qwen3-vl-2b-q8_0.gguf | Q8_0 | 2.2 GB | Best quality |

Features

- DeepStack vision fusion: intermediate vision-encoder features are injected into LLM decoder layers
- Fused flash attention: uses ggml_flash_attn_ext for efficient inference
- Backend KV cache: decode stays on the GPU (Metal/CUDA), with no per-token CPU transfer
- Interleaved mRoPE: improved position encoding compared with Qwen2.5-VL
- QK RMSNorm: per-head query/key normalization
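To make the "QK RMSNorm" item concrete, here is a minimal pure-Python sketch of per-head RMS normalization applied to query (or key) vectors. It is an illustration of the general technique, not code from CrispEmbed or Qwen3-VL: the function names, the `eps` value, and the choice to share one gain vector across heads are all assumptions.

```python
import math

def rms_norm(vec, weight, eps=1e-6):
    """Scale vec so its root-mean-square is ~1, then apply a per-dimension gain."""
    mean_sq = sum(x * x for x in vec) / len(vec)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [x * inv_rms * w for x, w in zip(vec, weight)]

def qk_rms_norm(heads, weight, eps=1e-6):
    """Normalize each attention head's query (or key) vector independently."""
    return [rms_norm(h, weight, eps) for h in heads]

# Two heads with head_dim = 4. Sharing one gain vector across heads is an
# assumption made for this sketch.
queries = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
gain = [1.0, 1.0, 1.0, 1.0]
normed = qk_rms_norm(queries, gain)
```

Because each head is normalized on its own, the two heads above end up with nearly identical values even though their raw magnitudes differ by a factor of ten, which is the point of normalizing queries and keys before the attention dot product.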

Converted with tooling from CrispEmbed.

Question 2

What license applies to cstr/qwen3-vl-2b-crispembed-gguf?

Accepted Answer

License: apache-2.0. Verify terms on Hugging Face before commercial use.

Question 3

How do I run cstr/qwen3-vl-2b-crispembed-gguf locally?

Accepted Answer

Download one of the GGUF files listed for this model (for example qwen3-vl-2b-q4_k.gguf) and load it in guIDE or llama.cpp. Pipeline task: text-generation.
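The download step can be scripted. The sketch below assumes the `huggingface_hub` package is installed and that the file names match the ones listed for this model. `download_gguf` and `read_gguf_header` are hypothetical helper names. The header check relies only on the GGUF convention that a file starts with the four bytes `GGUF` followed by a little-endian 32-bit version number.

```python
import struct

REPO_ID = "cstr/qwen3-vl-2b-crispembed-gguf"

def download_gguf(filename="qwen3-vl-2b-q4_k.gguf"):
    """Fetch one file from the Hub into the local cache and return its path."""
    # Imported lazily so the header check below works without the package.
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub
    return hf_hub_download(repo_id=REPO_ID, filename=filename)

def read_gguf_header(path):
    """Return the GGUF version if the file starts with the GGUF magic, else None."""
    with open(path, "rb") as f:
        head = f.read(8)
    if len(head) < 8 or head[:4] != b"GGUF":
        return None
    (version,) = struct.unpack("<I", head[4:8])
    return version

# Example (requires network access):
#   path = download_gguf()
#   print(path, read_gguf_header(path))
```

A `None` result from `read_gguf_header` usually means the download was truncated or the file is not GGUF at all, which is worth ruling out before pointing a runtime at it.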

Question 4

How much VRAM or disk space does cstr/qwen3-vl-2b-crispembed-gguf need?

Accepted Answer

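Disk use depends on the file you pick: 1.48 GB (Q4_K), 2.13 GB (Q8_0), or 4.55 GB (F16). The quantized files target 4 GB VRAM class GPUs with llama.cpp / guIDE. The 37.4 MB qwen3-vl-2b-diff-ref.gguf is too small to hold the full 2B-parameter weights and appears to be a reference file rather than a runnable model.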

| File | Type | Quantization | Size |
|------|------|--------------|------|
| qwen3-vl-2b-diff-ref.gguf | GGUF | GGUF | 37.4 MB |
| qwen3-vl-2b-f16.gguf | GGUF | F16 | 4.55 GB |
| qwen3-vl-2b-q4_k.gguf | GGUF | Q4_K | 1.48 GB |
| qwen3-vl-2b-q8_0.gguf | GGUF | Q8_0 | 2.13 GB |
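As a rough way to choose among the files in the table above, the sketch below picks the largest one that fits a memory budget. The sizes are copied from the table. The 1 GB `overhead_gb` default is a hypothetical placeholder for KV cache and activations, not a measured figure, and `pick_file` is an illustrative helper name. The diff-ref file is left out on the assumption that it is not a standalone model.

```python
# Sizes in GB, copied from the file table above.
FILES = {
    "qwen3-vl-2b-q4_k.gguf": 1.48,
    "qwen3-vl-2b-q8_0.gguf": 2.13,
    "qwen3-vl-2b-f16.gguf": 4.55,
}

def pick_file(budget_gb, overhead_gb=1.0):
    """Return the largest listed file whose size plus overhead fits the budget."""
    fitting = [
        (size, name)
        for name, size in FILES.items()
        if size + overhead_gb <= budget_gb
    ]
    # max() compares sizes first, so the biggest fitting file wins.
    return max(fitting)[1] if fitting else None
```

With these assumptions, a 4 GB budget selects the Q8_0 file and an 8 GB budget selects F16. Actual memory use grows with context length and image resolution, so treat the result as a starting point.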

Model ID: cstr/qwen3-vl-2b-crispembed-gguf
Author: cstr
Pipeline: not listed
License: apache-2.0
Base model: not listed
Last modified: 2026-06-20T04:50:47.000Z