DeepSeek-OCR2 — CrispEmbed GGUF

GGUF conversion of deepseek-ai/DeepSeek-OCR-2 for use with CrispEmbed.

Architecture

SAM-ViT-B (12L, 768d) → Qwen2 encoder (24L, 896d, bidirectional) → Linear projector (896→1280) → DeepSeek-V2 MoE decoder (12L, 1280d, 64 experts top-6 + 2 shared, layer 0 dense) → lm_head
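The "64 experts top-6 + 2 shared" part of the decoder can be illustrated with a small NumPy sketch. It is not the CrispEmbed implementation: the softmax-then-top-k gate, the single-matrix experts, and the toy width `d = 8` are assumptions made for illustration (real experts are MLPs, and the decoder width in the source is 1280). Only the expert counts come from the architecture line.

```python
import numpy as np

# Counts taken from the architecture line; everything else is illustrative.
N_EXPERTS, TOP_K, N_SHARED = 64, 6, 2


def route_top_k(router_logits, k=TOP_K):
    """Pick the k highest-probability experts for each token."""
    shifted = router_logits - router_logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    idx = np.argsort(-probs, axis=-1)[:, :k]           # (tokens, k) expert ids
    weights = np.take_along_axis(probs, idx, axis=-1)  # matching gate weights
    return idx, weights


def moe_layer(x, router_w, routed_w, shared_w):
    """x: (tokens, d); router_w: (d, N_EXPERTS);
    routed_w: (N_EXPERTS, d, d); shared_w: (N_SHARED, d, d)."""
    idx, weights = route_top_k(x @ router_w)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e, w in zip(idx[t], weights[t]):
            out[t] += w * (x[t] @ routed_w[e])  # only the selected experts run
        for s in range(shared_w.shape[0]):
            out[t] += x[t] @ shared_w[s]        # shared experts see every token
    return out


rng = np.random.default_rng(0)
d = 8  # toy width; the decoder in the source uses 1280
x = rng.standard_normal((3, d))
router_w = rng.standard_normal((d, N_EXPERTS))
routed_w = rng.standard_normal((N_EXPERTS, d, d))
shared_w = rng.standard_normal((N_SHARED, d, d))
idx, weights = route_top_k(x @ router_w)
y = moe_layer(x, router_w, routed_w, shared_w)
```

Per token, only 6 of the 64 routed expert matrices are touched, which is the access pattern an indexed matrix multiply such as `ggml_mul_mat_id` is designed for.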

Models

| File | Quant | Size | Description |
|------|-------|------|-------------|
| deepseek-ocr2-f16.gguf | F16 | 6.4 GB | Full precision |
| deepseek-ocr2-q8_0.gguf | Q8_0 | ~3.4 GB | Best quality/size balance |
| deepseek-ocr2-q4_k.gguf | Q4_K | ~2.0 GB | Smallest, good quality |

Performance features

- Per-row embedding dequant (saves ~655 MB peak RSS vs full table expansion)
- MoE decoder on Metal via ggml_mul_mat_id
- SAM patch-embed + neck on Metal via ggml_conv_2d
- Qwen2 encoder on Metal graph
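To make the per-row dequantization idea concrete, here is a minimal NumPy sketch. It uses a Q8_0-style layout (blocks of 32 int8 values sharing one float16 scale) and is not the CrispEmbed code. The function names are hypothetical, and the vocabulary size of 128,000 rows is an assumption that is not stated in the source; it is used only because it reproduces the ~655 MB figure for a 1280-wide float32 table.

```python
import numpy as np

BLOCK = 32  # Q8_0 groups values into blocks of 32 that share one scale


def quantize_q8_0(table):
    """Quantize a (rows, dim) float32 table to int8 blocks plus fp16 scales."""
    rows, dim = table.shape
    blocks = table.reshape(rows, dim // BLOCK, BLOCK)
    scales = np.abs(blocks).max(axis=-1) / 127.0
    safe = np.where(scales == 0, 1.0, scales)  # avoid dividing by zero
    q = np.round(blocks / safe[..., None]).astype(np.int8)
    return q, scales.astype(np.float16)


def dequant_rows(q, scales, token_ids):
    """Expand only the rows that are actually looked up."""
    sel_q = q[token_ids].astype(np.float32)
    sel_s = scales[token_ids].astype(np.float32)[..., None]
    return (sel_q * sel_s).reshape(len(token_ids), -1)


def full_table_bytes(rows, dim, bytes_per_value=4):
    """Size of the float32 table that full expansion would allocate."""
    return rows * dim * bytes_per_value


rng = np.random.default_rng(0)
table = rng.standard_normal((1000, 1280)).astype(np.float32)  # toy vocabulary
q, scales = quantize_q8_0(table)
token_ids = np.array([3, 17, 999])
rows = dequant_rows(q, scales, token_ids)

# Hypothetical vocabulary size; the source does not state it.
assumed_vocab = 128_000
saved = full_table_bytes(assumed_vocab, 1280)  # bytes never allocated
```

The lookup touches three rows here, so the float32 working set is three rows rather than the whole table.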

Converted with models/convert-deepseek-ocr2-to-gguf.py from CrispEmbed.

Question 2

What license applies to cstr/deepseek-ocr2-crispembed-GGUF?

Accepted Answer

License: apache-2.0. Verify terms on Hugging Face before commercial use.

Question 3

How do I run cstr/deepseek-ocr2-crispembed-GGUF locally?

Accepted Answer

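Download one of the GGUF files listed on this page (F16, Q8_0, or Q4_K) and load it in guIDE or llama.cpp; the README itself describes the files as a conversion for use with CrispEmbed. Pipeline task: text-generation.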
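As a rough sketch of the download step, the snippet below fetches one file with the `huggingface_hub` package and checks that it starts with a valid GGUF header before handing it to a runtime. The header layout assumed here is GGUF version 2 or later (4-byte magic, uint32 version, then two uint64 counts). The helper names `download` and `read_gguf_header` are hypothetical, and loading the model itself is left to the runtime you choose.

```python
import struct

REPO_ID = "cstr/deepseek-ocr2-crispembed-GGUF"


def read_gguf_header(path):
    """Return version and counts from a GGUF file, or raise if it is not GGUF."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"{path} is not a GGUF file (magic={magic!r})")
        (version,) = struct.unpack("<I", f.read(4))
        # Assumes GGUF v2+, where both counts are little-endian uint64.
        tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    return {"version": version, "tensors": tensor_count, "metadata_kv": kv_count}


def download(filename="deepseek-ocr2-q4_k.gguf"):
    """Fetch one file from the repository into the local Hugging Face cache."""
    # Third-party dependency: pip install huggingface_hub
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=filename)
```

A typical use would be `read_gguf_header(download())`, which fails early on a truncated or mislabeled download.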

Question 4

How much VRAM or disk space does cstr/deepseek-ocr2-crispembed-GGUF need?

Accepted Answer

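The smallest file, deepseek-ocr2-q4_k.gguf, takes about 2.07 GB of disk space and targets GPUs in the 4 GB VRAM class with llama.cpp or guIDE. The Q8_0 and F16 files take 3.36 GB and 6.32 GB respectively.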
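One way to turn these sizes into a choice is sketched below: pick the largest file that fits a memory budget. The file sizes come from the repository listing on this page. The 0.5 GB headroom for activations and runtime buffers is a hypothetical placeholder, not a measured value, and `largest_fit` is an illustrative helper name.

```python
# File sizes in GB, as shown in the repository file listing.
FILES_GB = {
    "deepseek-ocr2-f16.gguf": 6.32,
    "deepseek-ocr2-q8_0.gguf": 3.36,
    "deepseek-ocr2-q4_k.gguf": 2.07,
}


def largest_fit(budget_gb, overhead_gb=0.5):
    """Return the largest file that fits the budget, or None if none does.

    overhead_gb is an assumed allowance for runtime buffers, not a measurement.
    """
    candidates = [
        (size, name)
        for name, size in FILES_GB.items()
        if size + overhead_gb <= budget_gb
    ]
    return max(candidates)[1] if candidates else None
```

For example, `largest_fit(4.0)` selects the Q8_0 file under this headroom assumption, while `largest_fit(3.0)` falls back to Q4_K.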

cstr/deepseek-ocr2-crispembed-GGUF overview

Repository Files & Downloads


| File | Type | Quantization | Size |
|------|------|--------------|------|
| deepseek-ocr2-f16.gguf | GGUF | F16 | 6.32 GB |
| deepseek-ocr2-q4_k.gguf | GGUF | Q4_K | 2.07 GB |
| deepseek-ocr2-q8_0.gguf | GGUF | Q8_0 | 3.36 GB |

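Model Details

| Field | Value |
|-------|-------|
| Model ID | cstr/deepseek-ocr2-crispembed-GGUF |
| Author | cstr |
| Pipeline | not specified |
| License | apache-2.0 |
| Base model | not specified |
| Last modified | 2026-06-20T05:04:54.000Z |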