FreedomAISVR/Ministral-3-14B-Instruct-2512-MXFP4-GGUF overview
Ministral 3 14B Instruct 2512 — MXFP4 GGUF MXFP4 quantization of mistralai/Ministral 3 14B Instruct 2512 https://huggingface.co/mistralai/Ministral 3 14B Instr…
Runs locally from ~837.4 MB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).
Repository Files & Downloads
Model Details
| Model ID | FreedomAISVR/Ministral-3-14B-Instruct-2512-MXFP4-GGUF |
|---|---|
| Author | FreedomAISVR |
| Pipeline | — |
| License | apache-2.0 |
| Base model | mistralai/Ministral-3-14B-Instruct-2512 |
| Last modified | 2026-06-24T04:14:46.000Z |
Model README
---
language:
- en
- multilingual
tags:
- mistral
- ministral
- mxfp4
- gguf
- vision
- multimodal
- coding
- instruct
license: apache-2.0
base_model: mistralai/Ministral-3-14B-Instruct-2512
---
Ministral 3 14B Instruct-2512 — MXFP4 GGUF
MXFP4 quantization of mistralai/Ministral-3-14B-Instruct-2512, a 14B parameter coding and vision model from Mistral AI.
About the Model
Ministral 3 14B is a dense transformer with 40 layers, 5120 hidden dimension, and 24-layer Pixtral ViT vision encoder. It supports:
- Code generation and debugging across multiple languages
- Vision understanding via multimodal image input
- Tool calling with native function calling support
- 131K context window
- 393K maximum context length
Quantization
This GGUF was quantized from the FP8_E4M3 source weights using llama.cpp (build 537). The source safetensors were dequantized to F16 during conversion, then quantized to MXFP4 format.
MXFP4 (Microscaling FP4) uses block-wise quantization with shared exponents per block, providing better precision than standard FP4 for the same memory footprint.
Files
| File | Size | Description |
|------|------|-------------|
| ministral-3-14b-instruct-2512-mxfp4.gguf | ~6.9 GB | MXFP4 quantized model weights |
| mmproj-ministral-3-14b-instruct-2512-f16.gguf | ~878 MB | Vision projector (F16, unquantized) |
Usage
llama.cpp
# Server mode with OpenAI-compatible API
llama-server \
-m ministral-3-14b-instruct-2512-mxfp4.gguf \
--mmproj mmproj-ministral-3-14b-instruct-2512-f16.gguf \
-ngl 99 \
--host 0.0.0.0 \
--port 8080
# Direct inference
llama-cli \
-m ministral-3-14b-instruct-2512-mxfp4.gguf \
--mmproj mmproj-ministral-3-14b-instruct-2512-f16.gguf \
-ngl 99 \
-p "Write a Python function to compute fibonacci numbers"
LM Studio
- Download both files from this repository
- Load the main GGUF file in LM Studio
- Load the mmproj file for vision support
- Set GPU offload layers to maximum
Architecture
- Parameters: 14B (dense transformer)
- Layers: 40
- Hidden dimension: 5120
- Attention heads: 32 (8 KV heads for GQA)
- Vision encoder: 24-layer Pixtral ViT
- Context: 131K (native), 393K (extended)
- Vocabulary: Mistral Tekken tokenizer
Hardware Requirements
- Minimum: 8 GB VRAM for text-only, 10 GB for vision
- Recommended: 16 GB VRAM for full GPU offload
- Disk: ~7.8 GB for model + mmproj
Quantization Details
| Metric | Value |
|--------|-------|
| Source format | FP8_E4M3 (safetensors) |
| Intermediate | F16 GGUF |
| Output format | MXFP4 |
| Approximate BPW | ~4.3 |
| Quantized with | llama.cpp build 537 |
License
Apache 2.0 — same as the base model.
Run FreedomAISVR/Ministral-3-14B-Instruct-2512-MXFP4-GGUF with guIDE
Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.
Source: Hugging Face · Compare models