cstr/qwen3-vl-2b-crispembed-gguf overview
Qwen3-VL-2B — CrispEmbed GGUF: GGUF conversions of Qwen/Qwen3-VL-2B-Instruct (https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct) for use with CrispEmbed (https://gi…)
Runs locally from ~37.4 MB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).
Model README
---
license: apache-2.0
tags:
- gguf
- ocr
- vision-language
- crispembed
---
Qwen3-VL-2B — CrispEmbed GGUF
GGUF conversions of Qwen/Qwen3-VL-2B-Instruct for use with CrispEmbed.
Models
| File | Quant | Size | Description |
|------|-------|------|-------------|
| qwen3-vl-2b-q4_k.gguf | Q4_K | 1.5 GB | Good quality/size balance |
| qwen3-vl-2b-q8_0.gguf | Q8_0 | 2.2 GB | Best quality |
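To make the Quant column concrete, here is a minimal sketch of Q8_0 block quantization. It assumes the ggml `block_q8_0` layout: 32 weights share one fp16 scale, and each weight is stored as a signed 8-bit code. The function names are hypothetical and this is not CrispEmbed's converter. The sketch only shows why Q8_0 costs 8.5 bits per weight and how closely it reconstructs the original values.

```python
import struct

BLOCK_SIZE = 32  # Q8_0 quantizes weights in blocks of 32 (assumed ggml layout)


def to_fp16(x):
    """Round a float through IEEE half precision, as the stored scale would be."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_q8_0_block(weights):
    """Quantize one block of 32 floats to (fp16 scale, 32 int8 codes)."""
    if len(weights) != BLOCK_SIZE:
        raise ValueError("Q8_0 blocks hold exactly 32 weights")
    amax = max(abs(w) for w in weights)
    scale = to_fp16(amax / 127.0)  # the largest magnitude maps to +/-127
    inv = 1.0 / scale if scale else 0.0
    codes = [max(-127, min(127, round(w * inv))) for w in weights]
    return scale, codes


def dequantize_q8_0_block(scale, codes):
    """Recover approximate weights: w ~= scale * code."""
    return [scale * q for q in codes]


def bytes_per_block():
    """Storage for one block: a 2-byte fp16 scale plus 32 one-byte codes."""
    return struct.calcsize("<e") + BLOCK_SIZE * struct.calcsize("<b")


if __name__ == "__main__":
    block = [((i * 37) % 23 - 11) / 7.0 for i in range(BLOCK_SIZE)]
    scale, codes = quantize_q8_0_block(block)
    restored = dequantize_q8_0_block(scale, codes)
    print("bits per weight:", bytes_per_block() * 8 / BLOCK_SIZE)  # 8.5
    print("max abs error:", max(abs(a - b) for a, b in zip(block, restored)))
```

The per-weight error is bounded by about half the block scale. Q4_K, the other format in the table, is a k-quant built on 256-weight super-blocks and is not modeled here.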
Features
- DeepStack vision fusion: intermediate vision-encoder features injected into LLM decoder layers
- Fused flash attention: uses ggml_flash_attn_ext for efficient inference
- Backend KV cache: decode stays on GPU (Metal/CUDA), no per-token CPU transfer
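- Interleaved mRoPE: multimodal rotary position encoding that interleaves the time, height, and width components across the rotary frequencies, instead of using the contiguous per-axis chunks of Qwen2.5-VL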
- QK RMSNorm: per-head query/key normalization
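The QK RMSNorm item can be sketched in a few lines. This assumes the Qwen3-style arrangement, where the projected query (or key) vector is split into heads and each head slice is RMS-normalized with a learned weight of length `head_dim` that all heads share. The helper names are hypothetical, and this is plain Python rather than the ggml graph CrispEmbed builds.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: scale x by 1/sqrt(mean(x^2) + eps), then by a learned per-channel weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * g for v, g in zip(x, weight)]


def qk_norm(projected, num_heads, head_dim, weight, eps=1e-6):
    """Normalize each attention head's slice of a projected query or key vector separately.

    `weight` has length head_dim and is shared by all heads (an assumption
    based on Qwen3-style q_norm/k_norm layers).
    """
    if len(projected) != num_heads * head_dim:
        raise ValueError("projected vector must be num_heads * head_dim long")
    out = []
    for h in range(num_heads):
        head = projected[h * head_dim:(h + 1) * head_dim]
        out.extend(rms_norm(head, weight, eps))
    return out


if __name__ == "__main__":
    q = [float(i) for i in range(1, 9)]  # two toy heads of width 4
    normed = qk_norm(q, num_heads=2, head_dim=4, weight=[1.0] * 4)
    for h in range(2):
        seg = normed[h * 4:(h + 1) * 4]
        print(h, math.sqrt(sum(v * v for v in seg) / 4))  # each close to 1.0
```

Because every head is normalized on its own, the magnitude of one head's queries and keys (and so its attention logits) stays bounded no matter how large the raw projection gets. That is the usual motivation for this layer.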
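The fused flash attention item refers to `ggml_flash_attn_ext`. Flash-attention-style kernels avoid materializing the full score matrix: they visit keys in tiles and keep only a running maximum, a running softmax denominator, and a running weighted sum. The sketch below shows that online-softmax idea for a single query in plain Python and checks it against ordinary attention. It is an illustration of the technique, not ggml's kernel, and the function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q, keys, values):
    """Reference: materialize every score, softmax, then weight the values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def tiled_attention(q, keys, values, tile=2):
    """Flash-attention-style pass: visit keys tile by tile, keeping only a
    running max, a running softmax denominator, and a running output."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), tile):
        scores = [scale * dot(q, k) for k in keys[start:start + tile]]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial sums to the new max (exp(-inf) == 0.0 on the first tile).
        correction = math.exp(running_max - new_max)
        denom *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, values[start:start + tile]):
            p = math.exp(s - new_max)
            denom += p
            acc = [a + p * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]


if __name__ == "__main__":
    q = [0.5, -1.0, 2.0]
    keys = [[1.0, 0.0, 0.5], [0.2, 0.3, -1.0], [2.0, 1.0, 1.0],
            [-0.5, 0.5, 0.0], [0.0, -2.0, 1.5]]
    values = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5], [-2.0, 1.0], [0.5, 0.5]]
    print(naive_attention(q, keys, values))
    print(tiled_attention(q, keys, values))  # same values up to rounding
```

The result does not depend on the tile size. A real kernel gains its speed by fusing these steps on the GPU and keeping each tile in fast memory.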
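The DeepStack item can be pictured with toy decoder layers. As we understand the reference Qwen3-VL implementation (an assumption; CrispEmbed's ggml graph may differ), features taken from several vision-encoder levels are added residually to the hidden states at image-token positions, one level after each of the first few decoder layers. All names below are hypothetical, and the "layers" are placeholders for real transformer blocks.

```python
def add_at_positions(hidden, positions, feats):
    """Return a copy of `hidden` with feats[i] added to the vector at positions[i]."""
    out = [list(h) for h in hidden]
    for pos, f in zip(positions, feats):
        out[pos] = [a + b for a, b in zip(out[pos], f)]
    return out


def deepstack_decode(hidden, visual_mask, layers, deepstack_feats):
    """Run toy decoder layers, injecting one level of vision features after each early layer.

    hidden:          per-token hidden vectors (lists of floats)
    visual_mask:     True where the token is an image token
    layers:          hypothetical callables standing in for decoder layers
    deepstack_feats: deepstack_feats[i] holds one vector per image token,
                     added after layer i (only the first len(deepstack_feats) layers)
    """
    positions = [i for i, is_visual in enumerate(visual_mask) if is_visual]
    for idx, layer in enumerate(layers):
        hidden = layer(hidden)
        if idx < len(deepstack_feats):
            if len(deepstack_feats[idx]) != len(positions):
                raise ValueError("need one feature vector per image token")
            hidden = add_at_positions(hidden, positions, deepstack_feats[idx])
    return hidden


def identity_layer(hidden):
    """Placeholder for a real transformer block."""
    return hidden


if __name__ == "__main__":
    tokens = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]  # text, image, image
    mask = [False, True, True]
    feats = [[[0.1, 0.1], [0.2, 0.2]],   # injected after layer 0
             [[1.0, 0.0], [0.0, 1.0]]]   # injected after layer 1
    print(deepstack_decode(tokens, mask, [identity_layer] * 3, feats))
```

Text tokens pass through untouched. Image tokens receive finer-grained visual detail at several depths rather than only once at the input embedding.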
Usage
Converted with from CrispEmbed.
Source: Hugging Face