Model Intelligence Sheet

cstr/uni-mumer-qwen3-vl-2b-GGUF overview


Tags: gguf, math-ocr, handwritten-math, latex, hmer, vision-language, crispembed, ggml, qwen3-vl, en, arxiv:2505.23566, base_model:phxember/Uni-MuMER-Qwen3-VL-2B, base_model:quantized:phxember/Uni-MuMER-Qwen3-VL-2B, license:apache-2.0, region:us

Runs locally: the smallest file (Q2_K) takes about 1.17 GB on disk, which suits 4 GB VRAM-class GPUs with llama.cpp / guIDE.

Downloads: 0
Likes: 0

Repository Files & Downloads

5 GGUF files detected
Direct downloads for local inference

| File | Type | Quantization | Size |
|------|------|--------------|------|
| uni-mumer-qwen3-vl-2b-f16.gguf | GGUF | F16 | 4.55 GB |
| uni-mumer-qwen3-vl-2b-q2_k.gguf | GGUF | Q2_K | 1.17 GB |
| uni-mumer-qwen3-vl-2b-q3_k.gguf | GGUF | Q3_K | 1.30 GB |
| uni-mumer-qwen3-vl-2b-q4_k.gguf | GGUF | Q4_K | 1.48 GB |
| uni-mumer-qwen3-vl-2b-q8_0.gguf | GGUF | Q8_0 | 2.13 GB |
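Choosing among these files mostly comes down to available memory. The following is a minimal sketch of that choice, using the sizes listed above. The helper name `pick_file` and the 1 GB of headroom reserved for the KV cache and activations are assumptions for illustration, not figures from the repository.

```python
# Sizes in GB, copied from the file list above.
GGUF_FILES = {
    "uni-mumer-qwen3-vl-2b-f16.gguf": 4.55,
    "uni-mumer-qwen3-vl-2b-q8_0.gguf": 2.13,
    "uni-mumer-qwen3-vl-2b-q4_k.gguf": 1.48,
    "uni-mumer-qwen3-vl-2b-q3_k.gguf": 1.30,
    "uni-mumer-qwen3-vl-2b-q2_k.gguf": 1.17,
}


def pick_file(budget_gb, headroom_gb=1.0):
    """Return the largest file whose size plus headroom fits the budget, or None.

    headroom_gb is an assumed allowance for runtime buffers, not a measured value.
    """
    fitting = {
        name: size
        for name, size in GGUF_FILES.items()
        if size + headroom_gb <= budget_gb
    }
    if not fitting:
        return None
    # Larger files keep more precision, so prefer the biggest one that fits.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (2.0, 4.0, 8.0):
        print(budget, "GB ->", pick_file(budget))
```

Under these assumptions a 4 GB budget selects the Q8_0 file. Measure actual memory use on your own hardware before relying on the headroom value.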

Model Details

Model ID: cstr/uni-mumer-qwen3-vl-2b-GGUF
Author: cstr
License: apache-2.0
Base model: phxember/Uni-MuMER-Qwen3-VL-2B
Last modified: 2026-06-20T20:34:04.000Z

Model README

---
license: apache-2.0
base_model: phxember/Uni-MuMER-Qwen3-VL-2B
tags:
- math-ocr
- handwritten-math
- latex
- hmer
- vision-language
- gguf
- crispembed
- ggml
- qwen3-vl
language:
- en
---

Uni-MuMER-Qwen3-VL-2B GGUF

GGUF conversions of phxember/Uni-MuMER-Qwen3-VL-2B for CrispEmbed inference.

Handwritten Mathematical Expression Recognition (HMER) model. Converts images of handwritten math into LaTeX. Fine-tuned from Qwen3-VL-2B-Instruct using multi-task training (recognition + symbol counting + position identification).

Based on the Uni-MuMER paper (NeurIPS 2025 Spotlight).

Model variants

| File | Quant | Size | Notes |
|------|-------|------|-------|
| uni-mumer-qwen3-vl-2b-f16.gguf | F16 | 4.7 GB | Unquantized (16-bit float) |
| uni-mumer-qwen3-vl-2b-q8_0.gguf | Q8_0 | 2.2 GB | Recommended |
| uni-mumer-qwen3-vl-2b-q4_k.gguf | Q4_K | 1.5 GB | Smallest listed here; the repository also ships Q3_K and Q2_K |
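To compare the variants, the file sizes can be turned into average bits per weight. This is a rough sketch that uses the 2.1B parameter count from the Architecture section and assumes 1 GB = 10^9 bytes. Both figures are rounded, and GGUF files also carry metadata, so treat the results as estimates only. The function name `bits_per_weight` is illustrative.

```python
PARAMS = 2.1e9  # rounded parameter count from the Architecture section

# Sizes in GB, copied from the variants table.
SIZES_GB = {"F16": 4.7, "Q8_0": 2.2, "Q4_K": 1.5}


def bits_per_weight(size_gb, params=PARAMS):
    """Average bits stored per parameter, assuming 1 GB = 1e9 bytes."""
    return size_gb * 1e9 * 8 / params


if __name__ == "__main__":
    for quant, size in SIZES_GB.items():
        ratio = SIZES_GB["F16"] / size
        print(f"{quant}: {bits_per_weight(size):.1f} bits/weight, "
              f"{ratio:.1f}x compression vs F16")
```

The F16 estimate comes out slightly above 16 bits because the parameter count is rounded.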

Architecture

  • Vision: Qwen3-VL ViT (24L, 1024d, 16 heads, patch=16, learned pos embed)
  • DeepStack: Multi-layer feature concat at layers [5, 11, 17]
  • Merger: LayerNorm + GELU MLP (4096d intermediate)
  • LLM: Qwen3 decoder (28L, 2048d, GQA 16/8, SwiGLU, interleaved mRoPE)
  • Parameters: 2.1B
  • mRoPE sections: [24, 20, 20]
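The figures in this list can be cross-checked with a little arithmetic. The sketch below assumes that each head's width is the hidden size divided by the head count, and that the KV cache is stored in F16 (2 bytes per value). Neither assumption is stated in the README, and the helper names are illustrative.

```python
# Figures copied from the Architecture list.
VIT_DIM, VIT_HEADS, PATCH = 1024, 16, 16
LLM_LAYERS, LLM_DIM, Q_HEADS, KV_HEADS = 28, 2048, 16, 8
MROPE_SECTIONS = [24, 20, 20]

# Assumption: head width is hidden size divided by head count.
vit_head_dim = VIT_DIM // VIT_HEADS   # 64
llm_head_dim = LLM_DIM // Q_HEADS     # 128
gqa_group = Q_HEADS // KV_HEADS       # 2 query heads share each KV head

# Rotary embeddings rotate pairs of dimensions, so the three mRoPE
# sections should together cover half of the head width.
assert sum(MROPE_SECTIONS) == llm_head_dim // 2


def patch_grid(width, height, patch=PATCH):
    """Patches along each side for an image whose sides are multiples of the patch size."""
    return width // patch, height // patch


def kv_cache_bytes_per_token(bytes_per_value=2):
    """Bytes of K and V across all layers and KV heads for one token (F16 assumed)."""
    return 2 * LLM_LAYERS * KV_HEADS * llm_head_dim * bytes_per_value


if __name__ == "__main__":
    print("patch grid for 448x224:", patch_grid(448, 224))
    print("KV cache per token:", kv_cache_bytes_per_token(), "bytes")
```

Under these assumptions the KV cache costs 114,688 bytes (112 KiB) per token, which helps when budgeting context length on small GPUs.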

Usage

# CLI: auto-detects the math OCR prompt from the model name
./crispembed -m uni-mumer-qwen3-vl-2b-q4_k.gguf --ocr equation.png

# Server
./crispembed-server --ocr uni-mumer-qwen3-vl-2b-q4_k.gguf --port 8080
curl -X POST http://localhost:8080/math/ocr -F "image=@equation.png"

# Python API
from crispembed import CrispMathOcr

ocr = CrispMathOcr("uni-mumer-qwen3-vl-2b-q4_k.gguf")
latex = ocr.recognize("equation.png")
print(latex)  # x ^ { 2 } + 2 x y + y ^ { 2 } = 0
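The curl request to the server can also be made from Python with only the standard library. This is a sketch: it assumes the `/math/ocr` endpoint accepts the same multipart field name (`image`) that the curl example uses. The response format is not documented here, so the raw body is returned unparsed. `build_multipart` and `math_ocr` are hypothetical helper names, not part of CrispEmbed.

```python
import os
import urllib.request
import uuid


def build_multipart(field, filename, data, boundary=None):
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def math_ocr(image_path, url="http://localhost:8080/math/ocr"):
    """POST an image to a running crispembed-server and return the raw response text."""
    with open(image_path, "rb") as f:
        body, content_type = build_multipart(
            "image", os.path.basename(image_path), f.read()
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        # The response schema is an unknown here, so no parsing is attempted.
        return response.read().decode("utf-8")
```

With the server from the example above running, `math_ocr("equation.png")` sends the same request as the curl command.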

Accuracy

  • ~82% ExpRate on CROHME (handwritten math benchmark)
  • Replaces CC BY-NC-SA models (PosFormer/BTTR/HMER at 57%) with an Apache-2.0 licensed alternative
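ExpRate (expression recognition rate) is commonly defined as the fraction of predictions that match the ground-truth LaTeX exactly. The README does not describe its evaluation protocol, so the following is only a sketch of that common definition. The whitespace normalization and the names `normalize` and `exp_rate` are assumptions.

```python
def normalize(latex):
    """Collapse runs of whitespace so token spacing does not affect the comparison."""
    return " ".join(latex.split())


def exp_rate(predictions, references):
    """Fraction of predictions that exactly match their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)


if __name__ == "__main__":
    preds = ["x ^ { 2 } + 2 x y + y ^ { 2 } = 0", "a + b"]
    refs = ["x ^ { 2 } + 2 x y + y ^ { 2 } = 0", "a - b"]
    print(f"ExpRate: {exp_rate(preds, refs):.0%}")
```

Because the match is all-or-nothing, a single wrong symbol makes the whole expression count as a miss.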

License

Apache-2.0 — same as the base model.

Credits

Original model by BFlameSwift/Uni-MuMER (phxember on Hugging Face). GGUF conversion and inference engine by CrispEmbed.

Source: Hugging Face