Krasnopjorovs/gemma-4-31b-it-Imatrix-GGUF overview
Model Details
| Model ID | Krasnopjorovs/gemma-4-31b-it-Imatrix-GGUF |
|---|---|
| Author | Krasnopjorovs |
| Pipeline | text-generation |
| License | apache-2.0 |
| Base model | google/gemma-4-31B-it |
| Last modified | 2026-06-06T20:01:47.000Z |
Model README
---
license: apache-2.0
base_model: google/gemma-4-31B-it
pipeline_tag: text-generation
library_name: gguf
tags:
- gguf
- imatrix
- llama.cpp
- quantized
- verified
quantized_by: Krasnopjorovs
---
gemma-4-31b-it: Imatrix GGUF
> Verified quants only. Sub-4-bit variants are excluded from this release because they produce degenerate output at this model size, and there is no point shipping broken files.
GGUF imatrix builds of google/gemma-4-31B-it.
Built with llama.cpp and importance-matrix calibration on a public multilingual + code + math corpus. Every quant is loaded, prompted, and visually checked before publication.
---
From the author
Hi local LLM enthusiasts! You may have noticed a few quants from me over the past months; those were one-off experiments. This release is the first one from a fully automated pipeline.
Please try them out, share feedback, and let me know which quants you find useful or which ones you wish I had made. For now, anything below 4-bit produces degraded output on this architecture, so I am holding off on those until they actually work well. More quants and more models coming.
---
Pick a quant
| Quant | Size | What it's for |
|---|---|---|
| Q8_0 | 31G | Almost the original. Pick this if RAM isn't a concern. |
| Q6_K_L | 24G | Near-lossless with Q8_0 embeddings. Best of the K-quants. |
| Q6_K | 24G | Near-lossless. Excellent fidelity at smaller size than Q8_0. |
| Q5_K_L | 21G | Q5_K_M with Q8_0 embeddings. High quality, small overhead. |
| Q5_K_M | 21G | Sweet spot between Q4 and Q6. Solid all-rounder. |
| Q5_K_S | 20G | Slightly smaller than Q5_K_M, virtually identical output. |
| Q4_K_L | 18G | Q4_K_M with Q8_0 embeddings. The smart 4-bit choice. |
| Q4_K_M | 18G | The default. Most-downloaded quant for a reason. |
| Q4_K_S | 17G | A bit smaller than Q4_K_M, with a tiny step down in quality. |
| IQ4_NL | 17G | ARM-optimized 4-bit. For Raspberry Pi & friends. |
| IQ4_XS | 16G | Tightest 4-bit format. Quality close to Q4_K_S, smaller. |
The *_L variants keep the output tensor and token embeddings at Q8_0: a small disk cost for better output stability.
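As a rough sanity check on the sizes above, a file size can be converted into bits per weight. This sketch assumes that "G" in the table means GiB (2^30 bytes) and uses a nominal 31e9 weights. Neither figure is stated in this README, and the sizes are rounded, so treat the result as a ballpark. The `bpw` helper is hypothetical.

```shell
# Hypothetical helper: approximate bits per weight implied by a file size.
# Assumptions: "G" means GiB (2^30 bytes) and the model has ~31e9 weights.
# Usage: bpw <size_in_G> [param_count]
bpw() {
  local size_g="$1" params="${2:-31e9}"
  # bytes * 8 bits / number of weights, rounded to two decimals
  awk -v g="$size_g" -v p="$params" 'BEGIN { printf "%.2f\n", g * 2^30 * 8 / p }'
}
```

`bpw 31` (Q8_0) prints 8.59 and `bpw 18` (Q4_K_M) prints 4.99. The Q8_0 figure is close to that format's nominal 8.5 bits per weight (32 int8 values plus one fp16 scale per block), which lends some support to the GiB reading.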
TL;DR:
- Q4_K_M: start here unless you have a reason not to
- Q5_K_M: a small quality bump if you have RAM to spare
- Q8_0: pick this if you're not RAM-constrained and want maximum quality
- IQ4_XS: when the other 4-bit quants are too big
- IQ4_NL: when running on ARM (Pi, phones)
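The TL;DR list can be read as a simple lookup: take the highest-quality pick whose file fits your memory budget. This is a hypothetical `pick_quant` helper and is not shipped in the repo. The sizes come from the table. The default 4 G of headroom for context and runtime overhead is an assumption, not a measured number. Budgets must be whole G values because bash arithmetic is integer-only.

```shell
# Hypothetical helper: highest-quality TL;DR pick that fits a memory budget.
# Sizes are the table's G values; the default 4 G headroom for context and
# runtime overhead is an assumption. Arguments must be whole numbers.
pick_quant() {
  local budget_g="$1" headroom_g="${2:-4}"
  local entry name size
  # Ordered from highest to lowest quality, as in the TL;DR list.
  for entry in Q8_0:31 Q5_K_M:21 Q4_K_M:18 IQ4_XS:16; do
    name="${entry%%:*}"
    size="${entry##*:}"
    if (( size + headroom_g <= budget_g )); then
      printf '%s\n' "$name"
      return 0
    fi
  done
  return 1  # nothing fits; sub-4-bit quants are not part of this release
}
```

For example, `pick_quant 24` prints Q4_K_M, and `pick_quant 24 0` (no headroom) prints Q5_K_M. On ARM devices, the IQ4_NL line above still applies and this lookup does not cover it.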
---
Prompt format
<bos><start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
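If you drive the model through a raw completion interface, a small helper can fill in the template above. This is a minimal single-turn sketch: `gemma_prompt` is a hypothetical name, and multi-turn chats are out of scope. If you use llama-server's `/v1/chat/completions` endpoint instead, the server should apply the chat template for you. Before sending a hand-built prompt, check whether your client already adds `<bos>`.

```shell
# Hypothetical helper: wrap one user message in the turn markers shown above.
gemma_prompt() {
  # The message goes through %s, so any % or backslash in it stays literal.
  # The model turn is left open so generation continues from there.
  printf '<bos><start_of_turn>user\n%s<end_of_turn>\n<start_of_turn>model\n' "$1"
}
```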
---
Download
Single file (recommended):
hf download Krasnopjorovs/gemma-4-31b-it-Imatrix-GGUF \
--include "gemma-4-31b-it-Q4_K_M.gguf" --local-dir .
Whole repo:
hf download Krasnopjorovs/gemma-4-31b-it-Imatrix-GGUF --local-dir ./gemma-4-31b-it-gguf
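To fetch a different quant, only the file name changes. This sketch assumes every file follows the same `gemma-4-31b-it-<QUANT>.gguf` pattern as the Q4_K_M example above. If a download matches nothing, check the file list on Hugging Face. `quant_filename` and `fetch_quant` are hypothetical helpers.

```shell
REPO="Krasnopjorovs/gemma-4-31b-it-Imatrix-GGUF"

# Assumption: every quant is named like the Q4_K_M file above.
quant_filename() {
  printf 'gemma-4-31b-it-%s.gguf\n' "$1"
}

# Download one quant (e.g. Q5_K_M) into the current directory.
fetch_quant() {
  hf download "$REPO" --include "$(quant_filename "$1")" --local-dir .
}
```

Usage: `fetch_quant Q5_K_M`.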
---
Run
./llama-server \
-m gemma-4-31b-it-Q4_K_M.gguf \
-c 32768 -ngl 99 \
--host 0.0.0.0 --port 8080
Then point any OpenAI-compatible chat client at http://localhost:8080/v1.
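Before wiring up a client, you can confirm that the server answers by sending one request with curl. This is a minimal sketch that assumes the server runs on localhost:8080 as configured above. `chat_payload`, `chat`, and the `gemma-4-31b-it` model name are placeholders; with a single model loaded, the name should not matter. Because the JSON is built with printf, the helper is only safe for messages without quotes, backslashes, or newlines. A JSON tool such as jq is the robust option.

```shell
# Build a one-message chat request body (naive JSON, see caveat above).
chat_payload() {
  printf '{"model":"gemma-4-31b-it","messages":[{"role":"user","content":"%s"}]}\n' "$1"
}

# POST it to the OpenAI-compatible endpoint and print the raw JSON reply.
chat() {
  curl -s http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "$(chat_payload "$1")"
}
```

Usage: `chat "What is an importance matrix?"`.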
---
Calibration
The imatrix was generated from reapmix, a community calibration mix (~400K tokens of multilingual text, code, and math). This is the same class of public calibration data used by other community publishers; this release makes no claim of unique calibration.
The output tensor and token embeddings have a disproportionate effect on the quality of quantized output. The *_L variants keep these at Q8_0, trading a small disk cost for noticeably better stability at low bit rates.
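For readers who want to reproduce a build, here is a sketch of the two steps described above, using llama.cpp's `llama-imatrix` and `llama-quantize` tools. It is not the author's pipeline. The input file names are hypothetical. Mapping the *_L variants to `--output-tensor-type q8_0 --token-embedding-type q8_0` on top of the base K-quant is an assumption drawn from the table descriptions.

```shell
# Hypothetical file names: an unquantized GGUF export and a plain-text calibration file.
SRC="gemma-4-31b-it-F16.gguf"
CALIB="calibration.txt"
IMATRIX="imatrix.dat"

# Base llama-quantize type for a table entry (*_L maps to its base K-quant).
base_type() {
  case "$1" in
    Q4_K_L) printf '%s\n' Q4_K_M ;;
    Q5_K_L) printf '%s\n' Q5_K_M ;;
    Q6_K_L) printf '%s\n' Q6_K ;;
    *)      printf '%s\n' "$1" ;;
  esac
}

# Extra options for *_L variants (assumed): output tensor and token embeddings at q8_0.
extra_flags() {
  case "$1" in
    Q4_K_L|Q5_K_L|Q6_K_L)
      printf '%s\n' "--output-tensor-type q8_0 --token-embedding-type q8_0" ;;
  esac
}

# Step 1 (GPU): collect activation statistics over the calibration text.
make_imatrix() {
  ./llama-imatrix -m "$SRC" -f "$CALIB" -o "$IMATRIX" -ngl 99
}

# Step 2 (CPU): quantize one variant, using the imatrix to weight quantization error.
quantize() {
  local q="$1"
  # $(extra_flags ...) is unquoted on purpose so it splits into separate options.
  ./llama-quantize --imatrix "$IMATRIX" $(extra_flags "$q") \
    "$SRC" "gemma-4-31b-it-${q}.gguf" "$(base_type "$q")"
}
```

Run `make_imatrix` once, then call `quantize Q4_K_L`, `quantize IQ4_XS`, and so on for each row of the table.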
---
Build info
- Source: google/gemma-4-31B-it
- llama.cpp: latest mainline
- Quantization: CPU
- Imatrix calibration: GPU
- Generated: 2026-06-06