cstr/uni-mumer-qwen3-vl-2b-GGUF overview
Runs locally: the smallest file (Q2_K) takes about 1.17 GB of disk, which suits 4 GB VRAM class GPUs with llama.cpp / guIDE.
Repository Files & Downloads
| File | Type | Quantization | Size |
|---|---|---|---|
| uni-mumer-qwen3-vl-2b-f16.gguf | GGUF | F16 | 4.55 GB |
| uni-mumer-qwen3-vl-2b-q2_k.gguf | GGUF | Q2_K | 1.17 GB |
| uni-mumer-qwen3-vl-2b-q3_k.gguf | GGUF | Q3_K | 1.30 GB |
| uni-mumer-qwen3-vl-2b-q4_k.gguf | GGUF | Q4_K | 1.48 GB |
| uni-mumer-qwen3-vl-2b-q8_0.gguf | GGUF | Q8_0 | 2.13 GB |
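As a rough illustration of choosing among the files listed above, the sketch below picks the largest file that fits a given disk budget. The sizes are copied from the table; the `pick_file` helper and the budget values are hypothetical, and file size is only a lower bound on the memory needed at run time.

```python
# Sizes in GB, copied from the file table above.
FILES = {
    "uni-mumer-qwen3-vl-2b-f16.gguf": 4.55,
    "uni-mumer-qwen3-vl-2b-q2_k.gguf": 1.17,
    "uni-mumer-qwen3-vl-2b-q3_k.gguf": 1.30,
    "uni-mumer-qwen3-vl-2b-q4_k.gguf": 1.48,
    "uni-mumer-qwen3-vl-2b-q8_0.gguf": 2.13,
}


def pick_file(budget_gb, files=FILES):
    """Return the name of the largest file that fits in budget_gb, or None."""
    fitting = {name: size for name, size in files.items() if size <= budget_gb}
    if not fitting:
        return None
    # Larger file means less aggressive quantization, so prefer the biggest fit.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    print(pick_file(2.5))  # uni-mumer-qwen3-vl-2b-q8_0.gguf
```

Preferring the largest file that fits is only one possible policy; it assumes that less aggressive quantization is worth the extra space whenever it is available.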
Model Details
Model README
---
license: apache-2.0
base_model: phxember/Uni-MuMER-Qwen3-VL-2B
tags:
- math-ocr
- handwritten-math
- latex
- hmer
- vision-language
- gguf
- crispembed
- ggml
- qwen3-vl
language:
- en
---
Uni-MuMER-Qwen3-VL-2B GGUF
GGUF conversions of phxember/Uni-MuMER-Qwen3-VL-2B for CrispEmbed inference.
A Handwritten Mathematical Expression Recognition (HMER) model that converts images of handwritten math into LaTeX. It is fine-tuned from Qwen3-VL-2B-Instruct with multi-task training (recognition + symbol counting + position identification).
Based on the Uni-MuMER paper (NeurIPS 2025 Spotlight).
Model variants
| File | Quant | Size | Notes |
|------|-------|------|-------|
| uni-mumer-qwen3-vl-2b-f16.gguf | F16 | 4.7 GB | Unquantized 16-bit floats |
| uni-mumer-qwen3-vl-2b-q8_0.gguf | Q8_0 | 2.2 GB | Recommended |
| uni-mumer-qwen3-vl-2b-q4_k.gguf | Q4_K | 1.5 GB | Smallest of these three; the repository also lists smaller Q3_K and Q2_K files |
Architecture
- Vision: Qwen3-VL ViT (24L, 1024d, 16 heads, patch=16, learned pos embed)
- DeepStack: Multi-layer feature concat at layers [5, 11, 17]
- Merger: LayerNorm + GELU MLP (4096d intermediate)
- LLM: Qwen3 decoder (28L, 2048d, GQA 16/8, SwiGLU, interleaved mRoPE)
- Parameters: 2.1B
- mRoPE sections: [24, 20, 20]
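The numbers in the list above can be cross-checked with a little arithmetic. The sketch below is an illustration, not part of CrispEmbed. It assumes the per-head dimension is the LLM width divided by the number of query heads, and that the mRoPE sections partition the rotary frequency pairs of one head; neither is stated in this card. The `patch_grid` helper and the 448x448 example image are hypothetical.

```python
# Values copied from the architecture list above.
VISION_PATCH = 16
VISION_WIDTH = 1024
MERGER_INTERMEDIATE = 4096
LLM_WIDTH = 2048
Q_HEADS, KV_HEADS = 16, 8
MROPE_SECTIONS = [24, 20, 20]


def patch_grid(height, width, patch=VISION_PATCH):
    """Return (rows, cols) of ViT patches for an image whose sides divide by patch."""
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    return height // patch, width // patch


# Assumption: head_dim = LLM width / query heads (not stated in the card).
head_dim = LLM_WIDTH // Q_HEADS
# Rotary embeddings rotate pairs of channels, so each head has head_dim / 2 pairs.
rotary_pairs = head_dim // 2
# GQA 16/8: how many query heads share one key/value head.
queries_per_kv = Q_HEADS // KV_HEADS

if __name__ == "__main__":
    print(head_dim)                                 # 128
    print(sum(MROPE_SECTIONS), rotary_pairs)        # 64 64
    print(queries_per_kv)                           # 2
    print(MERGER_INTERMEDIATE // VISION_WIDTH)      # 4
    print(patch_grid(448, 448))                     # (28, 28)
```

Under these assumptions the three mRoPE sections add up to exactly the 64 rotary pairs of a 128-wide head, and the merger's intermediate width is four times the vision width, which would be consistent with merging 2x2 neighbouring patches (again an inference, not something the card states).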
Usage
# CLI: the math OCR prompt is auto-detected from the model name
./crispembed -m uni-mumer-qwen3-vl-2b-q4_k.gguf --ocr equation.png
# Server
./crispembed-server --ocr uni-mumer-qwen3-vl-2b-q4_k.gguf --port 8080
curl -X POST http://localhost:8080/math/ocr -F "image=@equation.png"
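The `curl` request above can also be sent from Python with only the standard library. This is a minimal sketch: the URL and the `image` form field come from the `curl` line, while the helper names `build_multipart` and `math_ocr` are hypothetical. The response format is not documented in this card, so the sketch returns the raw response text without parsing it.

```python
import os
import urllib.request
import uuid


def build_multipart(field, filename, payload):
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def math_ocr(image_path, url="http://localhost:8080/math/ocr"):
    """POST an image to the OCR server and return the raw response text."""
    with open(image_path, "rb") as fh:
        body, content_type = build_multipart(
            "image", os.path.basename(image_path), fh.read()
        )
    request = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")


# Usage (needs the server started as shown above):
#   print(math_ocr("equation.png"))
```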
# Python API
from crispembed import CrispMathOcr
ocr = CrispMathOcr("uni-mumer-qwen3-vl-2b-q4_k.gguf")
latex = ocr.recognize("equation.png")
print(latex) # x ^ { 2 } + 2 x y + y ^ { 2 } = 0
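The sample output above is a sequence of space-separated LaTeX tokens. If a compact string is preferred, something like the following sketch could post-process it. The `compact_latex` helper is hypothetical and not part of CrispEmbed; it only handles the common case of keeping a space after a control word such as `\alpha` when a letter follows.

```python
import re

# A control word is a backslash followed by letters, e.g. \frac or \alpha.
_CONTROL_WORD_AT_END = re.compile(r"\\[A-Za-z]+$")


def compact_latex(spaced):
    """Join whitespace-separated LaTeX tokens, keeping a space only where
    dropping it would glue a control word to the letters that follow it."""
    out = ""
    for token in spaced.split():
        if out and _CONTROL_WORD_AT_END.search(out) and token[0].isalpha():
            out += " "
        out += token
    return out


if __name__ == "__main__":
    print(compact_latex("x ^ { 2 } + 2 x y + y ^ { 2 } = 0"))  # x^{2}+2xy+y^{2}=0
    print(compact_latex(r"\frac { a } { b }"))                   # \frac{a}{b}
    print(compact_latex(r"\alpha x"))                            # \alpha x
```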
Accuracy
- ~82% ExpRate on CROHME (handwritten math benchmark)
- Replaces CC BY-NC-SA models (PosFormer/BTTR/HMER at 57%) with an Apache-2.0-licensed alternative
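ExpRate (expression recognition rate) is the fraction of test expressions whose predicted LaTeX matches the reference exactly. The sketch below shows one way to compute it, comparing whitespace-separated token sequences. The `exp_rate` helper and the sample strings are hypothetical, and the normalization used to produce the figures above may differ.

```python
def exp_rate(predictions, references):
    """Fraction of expressions whose token sequence equals the reference exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one expression")
    # Splitting on whitespace makes the comparison insensitive to extra spaces.
    hits = sum(p.split() == r.split() for p, r in zip(predictions, references))
    return hits / len(references)


if __name__ == "__main__":
    predicted = ["x ^ { 2 }", "a + b", "y = 0"]
    expected = ["x ^ { 2 }", "a - b", "y  =  0"]
    print(round(exp_rate(predicted, expected), 3))  # 0.667
```

Because the match is all-or-nothing per expression, a single wrong symbol counts the whole expression as a miss, which is why ExpRate is a strict metric.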
License
Apache-2.0, the same license as the base model.
Credits
Original model by BFlameSwift/Uni-MuMER (phxember on Hugging Face). GGUF conversion and inference engine by CrispEmbed.
Source: Hugging Face