GraySoft
Model Intelligence Sheet

cstr/uni-mumer-qwen2.5-vl-3b-GGUF overview

Uni-MuMER-Qwen2.5-VL-3B GGUF: GGUF conversions of phxember/Uni-MuMER-Qwen2.5-VL-3B (https://huggingface.co/phxember/Uni-MuMER-Qwen2.5-VL-3B) for CrispEmbed https:…

Tags: gguf · math-ocr · handwritten-math · latex · hmer · vision-language · crispembed · ggml · qwen2.5-vl · en · arxiv:2505.23566 · base_model:phxember/Uni-MuMER-Qwen2.5-VL-3B · base_model:quantized:phxember/Uni-MuMER-Qwen2.5-VL-3B · license:apache-2.0 · region:us

Runs locally from about 2.75 GB of disk for the Q4_K file (4 GB VRAM-class GPUs with llama.cpp / guIDE).

Downloads: 0
Likes: 0

Repository Files & Downloads

3 GGUF files detected
Direct downloads for local inference
| File | Type | Quantization | Size |
|------|------|--------------|------|
| uni-mumer-qwen2.5-vl-3b-f16.gguf | GGUF | F16 | 8.74 GB |
| uni-mumer-qwen2.5-vl-3b-q4_k.gguf | GGUF | Q4_K | 2.75 GB |
| uni-mumer-qwen2.5-vl-3b-q8_0.gguf | GGUF | Q8_0 | 4.19 GB |

Model Details

Model ID: cstr/uni-mumer-qwen2.5-vl-3b-GGUF
Author: cstr
License: apache-2.0
Base model: phxember/Uni-MuMER-Qwen2.5-VL-3B
Last modified: 2026-06-20T18:11:45.000Z

Model README

---
license: apache-2.0
base_model: phxember/Uni-MuMER-Qwen2.5-VL-3B
tags:
- math-ocr
- handwritten-math
- latex
- hmer
- vision-language
- gguf
- crispembed
- ggml
- qwen2.5-vl
language:
- en
---

Uni-MuMER-Qwen2.5-VL-3B GGUF

GGUF conversions of phxember/Uni-MuMER-Qwen2.5-VL-3B for CrispEmbed inference.

A Handwritten Mathematical Expression Recognition (HMER) model that converts images of handwritten math into LaTeX. It is fine-tuned from Qwen2.5-VL-3B-Instruct using multi-task training (recognition, symbol counting, and position identification).

Based on the Uni-MuMER paper (NeurIPS 2025 Spotlight). This is the highest-accuracy variant, at 82.25% ExpRate on CROHME.

Model variants

| File | Quant | Size | Notes |
|------|-------|------|-------|
| uni-mumer-qwen2.5-vl-3b-f16.gguf | F16 | 8.9 GB | Unquantized (half precision) |
| uni-mumer-qwen2.5-vl-3b-q8_0.gguf | Q8_0 | 4.2 GB | Recommended |
| uni-mumer-qwen2.5-vl-3b-q4_k.gguf | Q4_K | 2.6 GB | Max compression |
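As a rough way to choose among the three files, the sketch below picks the largest one that fits a given memory budget. The sizes are copied from the table above; the function name `pick_variant`, the `headroom_gb` parameter, and its 1 GB default are illustrative assumptions rather than anything provided by CrispEmbed, and real memory use also depends on context length and image size.

```python
from typing import Optional

# File sizes in GB, copied from the model variants table.
VARIANTS = {
    "uni-mumer-qwen2.5-vl-3b-f16.gguf": 8.9,
    "uni-mumer-qwen2.5-vl-3b-q8_0.gguf": 4.2,
    "uni-mumer-qwen2.5-vl-3b-q4_k.gguf": 2.6,
}


def pick_variant(budget_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest file that fits in budget_gb minus headroom_gb.

    headroom_gb is a hypothetical allowance for the KV cache and
    activations; tune it for your own workload.
    """
    usable = budget_gb - headroom_gb
    fitting = [(size, name) for name, size in VARIANTS.items() if size <= usable]
    # max() on (size, name) tuples selects the largest file that still fits.
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for budget in (4, 6, 12):
        print(budget, "GB ->", pick_variant(budget))
```

Under these assumptions a 4 GB budget selects the Q4_K file, which is consistent with the usage examples that load `uni-mumer-qwen2.5-vl-3b-q4_k.gguf`.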

Architecture

  • Vision: Qwen2.5-VL ViT (32L, 1280d, 16 heads, patch=14, windowed attn)
  • Merger: RMSNorm + SwiGLU MLP (4:1 spatial merge)
  • LLM: Qwen2.5 decoder (36L, 2048d, GQA 16/2, SwiGLU, mRoPE)
  • Parameters: 3.4B
  • mRoPE sections: [16, 24, 24]
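The figures in the list above allow some back-of-the-envelope arithmetic. The sketch below estimates how many vision tokens an image contributes and how much KV cache one token costs. It assumes that the 4:1 spatial merge groups 2x2 patches, that the attention head dimension equals the hidden size divided by the number of query heads, and that the image has already been resized to multiples of 28 pixels; the function names are illustrative and not part of any library.

```python
# Figures taken from the architecture list.
PATCH = 14                    # ViT patch size in pixels
MERGE = 4                     # 4:1 spatial merge (assumed to be a 2x2 patch group)
LLM_LAYERS = 36
HIDDEN = 2048
Q_HEADS, KV_HEADS = 16, 2     # GQA 16/2
MROPE_SECTIONS = [16, 24, 24]

HEAD_DIM = HIDDEN // Q_HEADS  # assumption: head_dim = hidden size / query heads


def vision_tokens(width: int, height: int) -> int:
    """Tokens handed to the LLM for an image sized to multiples of 28 pixels."""
    side = PATCH * 2  # one merged token covers a 2x2 block of patches
    if width % side or height % side:
        raise ValueError("width and height must be multiples of 28")
    return (width // PATCH) * (height // PATCH) // MERGE


def kv_cache_bytes_per_token(bytes_per_value: int = 2) -> int:
    """K and V entries for one token across all decoder layers (F16 by default)."""
    return 2 * LLM_LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value


if __name__ == "__main__":
    print("vision tokens for 448x448:", vision_tokens(448, 448))
    print("KV cache bytes per token:", kv_cache_bytes_per_token())
    print("mRoPE sections sum:", sum(MROPE_SECTIONS), "of head_dim", HEAD_DIM)
```

With these assumptions a 448x448 image yields 256 vision tokens, the KV cache costs 36,864 bytes per token at F16, and the mRoPE sections sum to 64, which is half of the 128-wide head dimension.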

Usage

# CLI — auto-detects math OCR prompt from model name
./crispembed -m uni-mumer-qwen2.5-vl-3b-q4_k.gguf --ocr equation.png

# Server
./crispembed-server --ocr uni-mumer-qwen2.5-vl-3b-q4_k.gguf --port 8080
curl -X POST http://localhost:8080/math/ocr -F "image=@equation.png"

# Python
from crispembed import CrispMathOcr

ocr = CrispMathOcr("uni-mumer-qwen2.5-vl-3b-q4_k.gguf")
latex = ocr.recognize("equation.png")
print(latex)  # x ^ { 2 } + 2 x y + y ^ { 2 } = 0
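The sample output in the comment above is separated token by token with spaces. If compact LaTeX is preferred, a small post-processing step along these lines may help. `compact_latex` is a hypothetical helper, not part of CrispEmbed; it assumes the output is math-mode LaTeX with whitespace-separated tokens, and it will also remove spaces inside text-mode arguments such as `\text{...}`.

```python
import re

# A control word is a backslash followed by letters, e.g. \alpha or \frac.
_CONTROL_WORD = re.compile(r"\\[A-Za-z]+$")


def compact_latex(tokenized: str) -> str:
    """Join whitespace-separated LaTeX tokens, keeping only required spaces.

    A space is kept after a control word when the next token starts with a
    letter, so that "\\alpha x" does not collapse into "\\alphax".
    """
    out = []
    previous = ""
    for token in tokenized.split():
        if _CONTROL_WORD.search(previous) and token[0].isalpha():
            out.append(" ")
        out.append(token)
        previous = token
    return "".join(out)


if __name__ == "__main__":
    print(compact_latex("x ^ { 2 } + 2 x y + y ^ { 2 } = 0"))
```

Applied to the sample output, this returns `x^{2}+2xy+y^{2}=0`.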

Accuracy

  • 82.25% ExpRate on CROHME (handwritten math benchmark)
  • Highest accuracy among the Uni-MuMER variants
  • Replaces CC BY-NC-SA models (PosFormer/BTTR/HMER at 57%) with an Apache-2.0-licensed alternative
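ExpRate is commonly understood as the percentage of expressions whose predicted LaTeX matches the ground truth exactly. The sketch below illustrates that definition, assuming both sides are compared as whitespace-separated token sequences. `exp_rate` is an illustrative name, and the official CROHME evaluation may normalize expressions differently.

```python
from typing import Sequence


def exp_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of predictions that match their reference token for token."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one expression is required")
    # Splitting on whitespace makes the comparison insensitive to spacing.
    correct = sum(p.split() == r.split() for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    preds = ["x ^ { 2 } = 0", "a + b"]
    refs = ["x ^ { 2 } = 0", "a - b"]
    print(exp_rate(preds, refs))
```

Because a single wrong token makes the whole expression count as incorrect, ExpRate is a strict metric, which is worth keeping in mind when comparing the 82.25% and 57% figures.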

License

Apache-2.0 — same as the base model.

Credits

Original model by BFlameSwift/Uni-MuMER (phxember on Hugging Face). GGUF conversion and inference engine by CrispEmbed.


Source: Hugging Face