cstr/uni-mumer-qwen2.5-vl-3b-GGUF overview
Runs locally from ~2.75 GB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).
---
license: apache-2.0
base_model: phxember/Uni-MuMER-Qwen2.5-VL-3B
tags:
- math-ocr
- handwritten-math
- latex
- hmer
- vision-language
- gguf
- crispembed
- ggml
- qwen2.5-vl
language:
- en
---
Uni-MuMER-Qwen2.5-VL-3B GGUF
GGUF conversions of phxember/Uni-MuMER-Qwen2.5-VL-3B for CrispEmbed inference.
Handwritten Mathematical Expression Recognition (HMER) model. Converts images of handwritten math into LaTeX. Fine-tuned from Qwen2.5-VL-3B-Instruct using multi-task training (recognition + symbol counting + position identification).
Based on the Uni-MuMER paper (NeurIPS 2025 Spotlight). This is the highest-accuracy Uni-MuMER variant, at 82.25% ExpRate on CROHME.
Model variants
| File | Quant | Size | Notes |
|------|-------|------|-------|
| uni-mumer-qwen2.5-vl-3b-f16.gguf | F16 | 8.9 GB | Unquantized (16-bit float) |
| uni-mumer-qwen2.5-vl-3b-q8_0.gguf | Q8_0 | 4.2 GB | Recommended |
| uni-mumer-qwen2.5-vl-3b-q4_k.gguf | Q4_K | 2.6 GB | Max compression |
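Choosing between the three files is mostly a question of how much memory is available. The sketch below picks the largest variant from the table that fits a given budget. The `pick_variant` helper and the 1 GB default headroom are illustrative assumptions, not part of CrispEmbed. File size is only a lower bound on runtime memory, since activations and the KV cache come on top.

```python
from typing import Optional

# (file name, quantization, size in GB), copied from the table above.
VARIANTS = [
    ("uni-mumer-qwen2.5-vl-3b-f16.gguf", "F16", 8.9),
    ("uni-mumer-qwen2.5-vl-3b-q8_0.gguf", "Q8_0", 4.2),
    ("uni-mumer-qwen2.5-vl-3b-q4_k.gguf", "Q4_K", 2.6),
]


def pick_variant(budget_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest file that fits in budget_gb minus headroom_gb.

    headroom_gb is an assumed allowance for activations and the KV cache;
    tune it for your own hardware. Returns None if nothing fits.
    """
    usable = budget_gb - headroom_gb
    for name, _quant, size_gb in sorted(VARIANTS, key=lambda v: v[2], reverse=True):
        if size_gb <= usable:
            return name
    return None


print(pick_variant(6.0))  # uni-mumer-qwen2.5-vl-3b-q8_0.gguf
print(pick_variant(4.0))  # uni-mumer-qwen2.5-vl-3b-q4_k.gguf
```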
Architecture
- Vision: Qwen2.5-VL ViT (32L, 1280d, 16 heads, patch=14, windowed attn)
- Merger: RMSNorm + SwiGLU MLP (4:1 spatial merge)
- LLM: Qwen2.5 decoder (36L, 2048d, GQA 16/2, SwiGLU, mRoPE)
- Parameters: 3.4B
- mRoPE sections: [16, 24, 24]
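The numbers in this list can be cross-checked with a little arithmetic. The sketch below assumes that the head dimension equals the hidden size divided by the number of query heads, that the three mRoPE sections count rotary frequency pairs, and that the 4:1 merge groups 2×2 patches into one token. These are my reading of the list above, not statements from the CrispEmbed code. The `visual_tokens` helper is hypothetical and ignores whatever resizing the real preprocessor applies.

```python
HIDDEN_SIZE = 2048            # LLM width
NUM_Q_HEADS = 16              # query heads (the "16" in GQA 16/2)
NUM_KV_HEADS = 2              # key/value heads (the "2" in GQA 16/2)
MROPE_SECTIONS = [16, 24, 24]
PATCH = 14                    # ViT patch size in pixels
MERGE = 2                     # 4:1 spatial merge read as 2x2 patches per token

HEAD_DIM = HIDDEN_SIZE // NUM_Q_HEADS    # 128
Q_PER_KV = NUM_Q_HEADS // NUM_KV_HEADS   # 8 query heads share each KV head

# Rotary embeddings rotate pairs of channels, so the sections should
# add up to half the head dimension.
assert sum(MROPE_SECTIONS) == HEAD_DIM // 2


def visual_tokens(height: int, width: int) -> int:
    """Tokens passed to the LLM for an image already sized to multiples of 28 px."""
    unit = PATCH * MERGE
    if height % unit or width % unit:
        raise ValueError(f"height and width must be multiples of {unit}")
    patches = (height // PATCH) * (width // PATCH)
    return patches // (MERGE * MERGE)


print(HEAD_DIM, Q_PER_KV)        # 128 8
print(visual_tokens(448, 448))   # 256
```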
Usage
```bash
# CLI: auto-detects the math OCR prompt from the model name
./crispembed -m uni-mumer-qwen2.5-vl-3b-q4_k.gguf --ocr equation.png

# Server
./crispembed-server --ocr uni-mumer-qwen2.5-vl-3b-q4_k.gguf --port 8080
curl -X POST http://localhost:8080/math/ocr -F "image=@equation.png"
```
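The same request can be sent from Python without extra dependencies. This is a minimal sketch using only the standard library. It mirrors the `curl` call above (a multipart form with one `image` field), but the response format is not documented here, so the body is returned as raw text. The helper names `build_multipart` and `ocr_via_server` are hypothetical.

```python
import os
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field: str, filename: str, payload: bytes,
                    content_type: str = "image/png") -> Tuple[bytes, str]:
    """Encode a single file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def ocr_via_server(image_path: str,
                   url: str = "http://localhost:8080/math/ocr",
                   timeout: float = 60.0) -> str:
    """POST an image to a running crispembed-server and return the raw response text.

    Assumption: the response body is UTF-8 text; parse it further once you
    know what your server version actually returns.
    """
    with open(image_path, "rb") as fh:
        payload = fh.read()
    body, ctype = build_multipart("image", os.path.basename(image_path), payload)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": ctype}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server from the command above running, `ocr_via_server("equation.png")` sends the same form field that `curl -F "image=@equation.png"` does.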
```python
from crispembed import CrispMathOcr

ocr = CrispMathOcr("uni-mumer-qwen2.5-vl-3b-q4_k.gguf")
latex = ocr.recognize("equation.png")
print(latex)  # x ^ { 2 } + 2 x y + y ^ { 2 } = 0
```
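The sample output above is a space-separated token stream, which renders correctly but is awkward to read or diff. The sketch below compacts such a string, assuming the output is always whitespace-tokenized as in that sample. The `compact_latex` helper is hypothetical and not part of CrispEmbed. It keeps a space only where removing it would fuse a control word with a following letter (for example `\alpha x`).

```python
def compact_latex(spaced: str) -> str:
    """Join whitespace-separated LaTeX tokens without changing their meaning."""
    out = []
    for token in spaced.split():
        if out:
            prev = out[-1]
            # A control word is a backslash followed by letters, e.g. \alpha.
            prev_is_control_word = prev.startswith("\\") and prev[-1].isalpha()
            if prev_is_control_word and token[0].isalpha():
                out.append(" ")  # "\alpha x" must not become "\alphax"
        out.append(token)
    return "".join(out)


print(compact_latex("x ^ { 2 } + 2 x y + y ^ { 2 } = 0"))  # x^{2}+2xy+y^{2}=0
print(compact_latex(r"\frac { a } { b }"))                 # \frac{a}{b}
print(compact_latex(r"\alpha x"))                          # \alpha x
```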
Accuracy
- 82.25% ExpRate on CROHME (handwritten math benchmark)
- Highest accuracy among the Uni-MuMER variants
- Replaces CC BY-NC-SA models (PosFormer/BTTR/HMER at 57%) with an Apache-2.0-licensed alternative
License
Apache-2.0 — same as the base model.
Credits
Original model by BFlameSwift/Uni-MuMER (phxember on Hugging Face). GGUF conversion and inference engine by CrispEmbed.