What is Krasnopjorovs/gemma-4-31b-it-Imatrix-GGUF?

Krasnopjorovs/gemma-4-31b-it-Imatrix-GGUF is a set of GGUF quantizations of google/gemma-4-31B-it (Apache 2.0). The files were built with llama.cpp and importance-matrix (imatrix) calibration on a public multilingual + code + math corpus. Only verified quants are published: every file is loaded, prompted, and checked before release. Sub-4-bit variants are excluded because they produce degenerate output on this model size.

What license applies to Krasnopjorovs/gemma-4-31b-it-Imatrix-GGUF?

License: apache-2.0. Verify terms on Hugging Face before commercial use.

How do I run Krasnopjorovs/gemma-4-31b-it-Imatrix-GGUF locally?

Download a GGUF file from this page and load it in guIDE or llama.cpp. Pipeline task: text-generation.

Model Intelligence Sheet

Krasnopjorovs/gemma-4-31b-it-Imatrix-GGUF overview


Tags: gguf, imatrix, llama.cpp, quantized, verified, text-generation, base_model:google/gemma-4-31B-it, base_model:finetune:google/gemma-4-31B-it, license:apache-2.0, region:us


Pipeline: text-generation

Author: Krasnopjorovs


Model Details

Model ID	Krasnopjorovs/gemma-4-31b-it-Imatrix-GGUF
Author	Krasnopjorovs
Pipeline	text-generation
License	apache-2.0
Base model	google/gemma-4-31B-it
Last modified	2026-06-06T20:01:47.000Z

Model README

---
license: apache-2.0
base_model: google/gemma-4-31B-it
pipeline_tag: text-generation
library_name: gguf
tags:
- gguf
- imatrix
- llama.cpp
- quantized
- verified
quantized_by: Krasnopjorovs
---

🐋 gemma-4-31b-it — Imatrix GGUF

> Verified quants only. Sub-4-bit variants are excluded from this release because they produce degenerate output on this model size — no point shipping broken files.

GGUF imatrix builds of google/gemma-4-31B-it.

Built with llama.cpp and importance-matrix calibration on a public multilingual + code + math corpus. Every quant is loaded, prompted, and visually checked before publication.

---

👋 From the author

Hi local LLM enthusiasts! You may have noticed a few quants from me over the past months — those were one-off experiments. This release is the first one from a fully automated pipeline.

Please try them out, share feedback, and let me know which quants you find useful or which ones you wish I had made. For now, anything below 4-bit produces degraded output on this architecture, so I am holding off on those until they actually work well. More quants and more models coming.

---

🎯 Pick a quant

| Quant | Size | What it's for |
|---|---|---|
| Q8_0 | 31G | Almost the original. Pick this if RAM isn't a concern. |
| Q6_K_L | 24G | Near-lossless with Q8_0 embeddings. Best of the K-quants. |
| Q6_K | 24G | Near-lossless. Excellent fidelity at smaller size than Q8_0. |
| Q5_K_L | 21G | Q5_K_M with Q8_0 embeddings. High quality, small overhead. |
| Q5_K_M | 21G | Sweet spot between Q4 and Q6. Solid all-rounder. |
| Q5_K_S | 20G | Slightly smaller than Q5_K_M, virtually identical output. |
| Q4_K_L | 18G | Q4_K_M with Q8_0 embeddings. The smart 4-bit choice. |
| Q4_K_M | 18G | The default. Most-downloaded quant for a reason. |
| Q4_K_S | 17G | A bit smaller than Q4_K_M, a tiny step down. |
| IQ4_NL | 17G | ARM-optimized 4-bit. For Raspberry Pi & friends. |
| IQ4_XS | 16G | Tightest 4-bit format. Quality close to Q4_K_S, smaller. |

*_L variants override the output tensor and embeddings to Q8_0 — small disk cost, better output stability.

TL;DR:

🟢 Q4_K_M: start here unless you have a reason not to
🟡 Q5_K_M: small quality bump if you have RAM to spare
🔵 Q8_0: pick this if you're not RAM-constrained and want max quality
🟠 IQ4_XS: when Q4_K_M is too big
🟣 IQ4_NL: running on ARM (Pi, phones)
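As a rough aid for the table above, the sketch below picks the largest listed quant whose file fits a memory budget. It makes several assumptions:

- The sizes are the table's rounded figures in GB.
- pick_quant is a hypothetical helper name.
- The default 4 GB of headroom for KV cache and runtime buffers is a guess, not a measurement. A long context such as -c 32768 can need noticeably more.

```sh
#!/bin/sh
# Sketch: choose the largest quant from the table that fits a memory budget.
# Sizes are the table's rounded file sizes in GB, largest first; equal sizes keep table order.
# The default 4 GB headroom (KV cache, buffers) is an assumption, not a measured value.
QUANT_SIZES="Q8_0:31 Q6_K_L:24 Q6_K:24 Q5_K_L:21 Q5_K_M:21 Q5_K_S:20 Q4_K_L:18 Q4_K_M:18 Q4_K_S:17 IQ4_NL:17 IQ4_XS:16"

# pick_quant BUDGET_GB [HEADROOM_GB]: print the first quant whose size + headroom fits.
pick_quant() {
  local budget="$1" headroom="${2:-4}" entry name size
  for entry in $QUANT_SIZES; do
    name="${entry%%:*}"   # text before the colon
    size="${entry##*:}"   # text after the colon
    if [ $((size + headroom)) -le "$budget" ]; then
      echo "$name"
      return 0
    fi
  done
  echo "no verified quant fits in ${budget} GB" >&2
  return 1
}
```

For example, pick_quant 24 prints Q5_K_S, and pick_quant 64 prints Q8_0. Because sizes are rounded, each *_L variant ties with its base quant and wins only by table order.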

---

💬 Prompt format

<bos><start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
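As a minimal sketch, a small shell function can produce the template above. This is mainly useful for llama-server's raw /completion endpoint. The OpenAI-compatible /v1/chat/completions endpoint normally applies the chat template stored in the GGUF for you.

Some notes on the sketch:

- gemma_prompt is a hypothetical helper name.
- The commented example assumes curl and jq are installed.
- If the server warns that the prompt now starts with two BOS tokens, drop the literal <bos>.

```sh
#!/bin/sh
# Sketch: wrap a single user message in the Gemma turn template shown above.
# gemma_prompt is a hypothetical helper, not part of llama.cpp.
gemma_prompt() {
  # The message is a printf argument (not the format string),
  # so characters such as % are printed verbatim.
  # The final \n leaves the model turn open for generation.
  printf '<bos><start_of_turn>user\n%s<end_of_turn>\n<start_of_turn>model\n' "$1"
}

# Example (needs a running llama-server plus curl and jq).
# jq -Rs reads the whole prompt, trailing newline included, as one JSON string:
#   gemma_prompt "Why is the sky blue?" \
#     | jq -Rs '{prompt: ., n_predict: 128}' \
#     | curl -s http://localhost:8080/completion \
#         -H 'Content-Type: application/json' --data-binary @- \
#     | jq -r '.content'
```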

---

📥 Download

Single file (recommended):

hf download Krasnopjorovs/gemma-4-31b-it-Imatrix-GGUF \
  --include "gemma-4-31b-it-Q4_K_M.gguf" --local-dir .

Whole repo:

hf download Krasnopjorovs/gemma-4-31b-it-Imatrix-GGUF --local-dir ./gemma-4-31b-it-gguf
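To fetch a quant other than Q4_K_M, a small wrapper can map the quant name to a file before calling hf download. This sketch assumes every file follows the same gemma-4-31b-it-<QUANT>.gguf naming as the Q4_K_M example above. If a download fails, check the repo's file list. gguf_file and fetch_quant are hypothetical helper names.

```sh
#!/bin/sh
# Sketch: download one quant by name.
# Assumption: all files use the naming of the Q4_K_M example (gemma-4-31b-it-<QUANT>.gguf).
REPO="Krasnopjorovs/gemma-4-31b-it-Imatrix-GGUF"
QUANTS="Q8_0 Q6_K_L Q6_K Q5_K_L Q5_K_M Q5_K_S Q4_K_L Q4_K_M Q4_K_S IQ4_NL IQ4_XS"

# Print the expected file name, or fail for quants this release does not ship.
gguf_file() {
  case " $QUANTS " in
    *" $1 "*) echo "gemma-4-31b-it-$1.gguf" ;;
    *) echo "unknown or unpublished quant: $1" >&2; return 1 ;;
  esac
}

# fetch_quant QUANT [DIR]: download a single file into DIR (default: current directory).
fetch_quant() {
  local file
  file="$(gguf_file "$1")" || return 1
  hf download "$REPO" --include "$file" --local-dir "${2:-.}"
}
```

For example, fetch_quant Q5_K_M ./models should download only gemma-4-31b-it-Q5_K_M.gguf, provided that file name exists in the repo. Sub-4-bit names such as Q2_K are rejected before any network call.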

---

▶ Run

./llama-server \
  -m gemma-4-31b-it-Q4_K_M.gguf \
  -c 32768 -ngl 99 \
  --host 0.0.0.0 --port 8080

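-ngl 99 offloads every layer to the GPU (lower it if the model does not fit in VRAM), and -c 32768 sets a 32K-token context window. Then point your favorite chat client at http://localhost:8080/v1.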
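Here is a minimal sketch of talking to that endpoint from the shell. It assumes the server above is running on localhost:8080 and that curl and jq are installed. ask, chat_body and wait_ready are hypothetical helper names; /health and /v1/chat/completions are llama-server endpoints.

```sh
#!/bin/sh
# Sketch: query the llama-server started above through its OpenAI-compatible API.
# Assumes curl and jq are installed; the helper names are hypothetical.
BASE_URL="${BASE_URL:-http://localhost:8080}"

# Build the request body with jq so quotes and newlines in the prompt are escaped.
chat_body() {
  jq -c -n --arg content "$1" \
    '{messages: [{role: "user", content: $content}], max_tokens: 256}'
}

# /health answers with an error status until the model has loaded;
# curl -f turns that into a non-zero exit code, so we keep polling.
wait_ready() {
  until curl -sf "$BASE_URL/health" >/dev/null; do sleep 2; done
}

# Send one user message and print only the assistant's reply text.
ask() {
  chat_body "$1" |
    curl -s "$BASE_URL/v1/chat/completions" \
      -H 'Content-Type: application/json' --data-binary @- |
    jq -r '.choices[0].message.content'
}
```

Usage: wait_ready && ask "Explain in one sentence what an importance matrix is for." Large quants can take a while to load, which is why the sketch polls /health first.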

---

🔬 Calibration

Imatrix generated from reapmix — community calibration mix (~400K tokens, multilingual + code + math). Same class of public calibration data used by other community publishers; this release makes no unique calibration claim.

The output tensor and token embeddings have an outsized effect on output quality once quantized. The *_L variants keep these at Q8_0, a small disk cost for noticeably better stability at low bit-rates.
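For readers who want to reproduce a similar build, here is a sketch of the two llama.cpp steps involved. This is not the author's actual pipeline. It makes these assumptions:

- The full-precision source file name (gemma-4-31b-it-BF16.gguf) and calibration.txt (a plain-text dump of a corpus such as reapmix) are hypothetical.
- The llama.cpp binaries are on your PATH.
- The imatrix file naming and format can differ between llama.cpp versions, so check llama-imatrix --help.

The *_L handling mirrors the description above: the base type plus q8_0 overrides for the output tensor and token embeddings.

```sh
#!/bin/sh
# Sketch only, not the author's pipeline. Hypothetical inputs:
#   SRC             full-precision GGUF converted from google/gemma-4-31B-it
#   calibration.txt plain-text calibration corpus
SRC="${SRC:-gemma-4-31b-it-BF16.gguf}"
QUANTIZE="${QUANTIZE:-llama-quantize}"   # overridable, e.g. for a dry run

# Map a release name to the llama-quantize type it is built from.
base_type() {
  case "$1" in
    Q6_K_L) echo Q6_K ;;
    Q5_K_L) echo Q5_K_M ;;
    Q4_K_L) echo Q4_K_M ;;
    *) echo "$1" ;;
  esac
}

# Step 1 (run once): collect importance statistics over the calibration text.
make_imatrix() {
  llama-imatrix -m "$SRC" -f calibration.txt -o imatrix.dat -ngl 99
}

# Step 2 (per quant): quantize with the imatrix; *_L names also pin
# the output tensor and token embeddings to Q8_0.
make_quant() {
  local name="$1"
  if [ "${name%_L}" != "$name" ]; then
    "$QUANTIZE" --imatrix imatrix.dat \
      --output-tensor-type q8_0 --token-embedding-type q8_0 \
      "$SRC" "gemma-4-31b-it-$name.gguf" "$(base_type "$name")"
  else
    "$QUANTIZE" --imatrix imatrix.dat \
      "$SRC" "gemma-4-31b-it-$name.gguf" "$(base_type "$name")"
  fi
}
```

Usage: make_imatrix && for q in Q4_K_M Q4_K_L; do make_quant "$q"; done. Setting QUANTIZE=echo prints the llama-quantize command lines instead of running them.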

---

✅ Build info

Source: google/gemma-4-31B-it
llama.cpp: latest mainline
Quantization: CPU
Imatrix calibration: GPU
Generated: 2026-06-06

---

🙏 Credits

Source model — Google (Apache 2.0)
Calibration corpus — reapmix by eaddario
Quantization tooling — llama.cpp

Source: Hugging Face