GraySoft
Model Intelligence Sheet

cstr/smoldocling-GGUF overview


Tags: gguf, ocr, document-understanding, doctags, vision-language, crispembed, ggml, en, base_model:docling-project/SmolDocling-256M-preview, base_model:quantized:docling-project/SmolDocling-256M-preview, license:apache-2.0, region:us

Runs locally from ~154.2 MB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).

Downloads: 0
Likes: 0

Repository Files & Downloads

3 GGUF files detected
Direct downloads for local inference
| File | Type | Quantization | Size |
|------|------|--------------|------|
| smoldocling-f16.gguf | GGUF | F16 | 490.9 MB |
| smoldocling-q4_k.gguf | GGUF | Q4_K | 154.2 MB |
| smoldocling-q8_0.gguf | GGUF | Q8_0 | 262.3 MB |

Model Details

Model ID: cstr/smoldocling-GGUF
Author: cstr
License: apache-2.0
Base model: ds4sd/SmolDocling-256M-preview
Last modified: 2026-06-19T13:22:04.000Z

Model README

---
license: apache-2.0
base_model: ds4sd/SmolDocling-256M-preview
tags:
- ocr
- document-understanding
- doctags
- vision-language
- gguf
- crispembed
- ggml
language:
- en
---

SmolDocling-256M GGUF

GGUF conversions of ds4sd/SmolDocling-256M-preview for CrispEmbed inference.

An ultra-compact document conversion model (256M parameters). It generates DocTags structured markup from page images, covering OCR, layout, tables, formulas, code, and charts.

Model variants

| File | Quant | Size | Notes |
|------|-------|------|-------|
| smoldocling-f16.gguf | F16 | 491 MB | Full precision |
| smoldocling-q8_0.gguf | Q8_0 | 261 MB | Recommended |
| smoldocling-q4_k.gguf | Q4_K | 153 MB | Max compression |
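
As a rough sanity check on the variants table, the file sizes can be converted into an approximate bits-per-weight figure. This is only a sketch under stated assumptions: it treats 1 MB as 10^6 bytes, uses the 256M parameter count quoted in this card, and ignores GGUF metadata and any tensors kept at a different precision, so the results are indicative at best. The helper name `bits_per_weight` is hypothetical.

```python
# Sizes (MB) are copied from the variants table; PARAMS is the 256M figure from this card.
PARAMS = 256e6
SIZES_MB = {"F16": 491, "Q8_0": 261, "Q4_K": 153}

def bits_per_weight(size_mb: float, params: float = PARAMS) -> float:
    """Approximate average bits per parameter, assuming 1 MB = 10**6 bytes."""
    return size_mb * 1e6 * 8 / params

for quant, size_mb in SIZES_MB.items():
    ratio = SIZES_MB["F16"] / size_mb  # how many times larger the F16 file is
    print(f"{quant}: ~{bits_per_weight(size_mb):.1f} bits/weight, F16 is {ratio:.1f}x this size")
```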

Architecture

  • Vision: SigLIP ViT (12L, 768d, 12 heads, patch=16, 512px)
  • Connector: Pixel shuffle (scale=4, 1024→64 tokens) + Linear(12288→576)
  • LLM: SmolLM2-135M (30L, 576d, GQA 9/3, SwiGLU, RoPE)
  • Parameters: 256M total (93M vision + 135M LLM + connector)
  • Output: DocTags (structured XML-like document markup)
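
The connector numbers in the list above follow from simple shape arithmetic, which the sketch below reproduces. It uses only the values quoted in the list (512px input, patch size 16, 768-dimensional vision features, pixel-shuffle scale 4, 576-dimensional LLM); the function name `connector_shapes` is hypothetical and this is not CrispEmbed code.

```python
# Values below are taken from the architecture list; the helper name is hypothetical.
def connector_shapes(image_px=512, patch=16, vision_dim=768, scale=4, llm_dim=576):
    """Return (patch_tokens, merged_tokens, merged_dim, llm_dim) for the connector."""
    grid = image_px // patch              # patches per side: 512 / 16 = 32
    patch_tokens = grid * grid            # 32 * 32 = 1024 vision tokens
    merged_grid = grid // scale           # pixel shuffle merges scale x scale patches
    merged_tokens = merged_grid ** 2      # 8 * 8 = 64 tokens passed to the LLM
    merged_dim = vision_dim * scale ** 2  # 768 * 16 = 12288 features per merged token
    return patch_tokens, merged_tokens, merged_dim, llm_dim

print(connector_shapes())  # (1024, 64, 12288, 576)
```

The final linear layer then maps each of the 64 merged tokens from 12288 to 576 features, matching the LLM width.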

Parity vs HF reference: vision cos=0.9998, connector cos=0.9999.
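
The parity figures above are cosine similarities between the GGUF outputs and the Hugging Face reference. How they were measured is not stated here; a minimal sketch of the metric itself, assuming both outputs are flattened into equal-length vectors, might look like the following. The sample vectors are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

reference = [0.12, -0.50, 0.33, 0.91]  # hypothetical reference activations
converted = [0.11, -0.49, 0.35, 0.90]  # hypothetical GGUF activations
print(f"cos={cosine_similarity(reference, converted):.4f}")
```

A value close to 1.0 means the two outputs point in nearly the same direction, regardless of overall scale.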

Usage

# CLI
./crispembed -m smoldocling-q8_0.gguf --ocr document.png

# Server
./crispembed-server --ocr smoldocling-q8_0.gguf --port 8080
curl -X POST http://localhost:8080/math/ocr -F "image=@document.png"

# Python
from crispembed import CrispMathOcr

ocr = CrispMathOcr("smoldocling-q8_0.gguf")
doctags = ocr.recognize("document.png")
print(doctags)  # <doctag><text>...</text>...</doctag>
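
Once the DocTags string is available, it usually needs to be split into elements. The sketch below is one hedged way to do that with the standard library only. It assumes the flat shape shown in the comment above (`<doctag><text>...</text>...</doctag>`) and assumes that location tokens, if present, look like `<loc_N>`; real output may nest elements or use other tokens, in which case a proper DocTags parser is the better choice. The function name and the sample string are hypothetical.

```python
import re

# Assumed shape, based on the usage comment: <doctag><text>...</text>...</doctag>.
ELEMENT_RE = re.compile(r"<(?P<tag>\w+)>(?P<body>.*?)</(?P=tag)>", re.DOTALL)
LOC_RE = re.compile(r"<loc_\d+>")  # assumed form of location tokens

def iter_elements(doctags: str):
    """Yield (tag, text) pairs for the top-level elements inside <doctag>."""
    outer = re.search(r"<doctag>(.*)</doctag>", doctags, re.DOTALL)
    inner = outer.group(1) if outer else doctags
    for match in ELEMENT_RE.finditer(inner):
        text = LOC_RE.sub("", match.group("body")).strip()
        yield match.group("tag"), text

# Made-up sample; not actual model output.
sample = (
    "<doctag><text>First paragraph.</text>"
    "<text><loc_10><loc_20>Second paragraph.</text></doctag>"
)
for tag, text in iter_elements(sample):
    print(f"{tag}: {text}")
```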

License

Apache-2.0 — same as the base model.

Credits

Original model by Docling Team, IBM Research. GGUF conversion and inference engine by CrispEmbed.

Source: Hugging Face