GraySoft
Projects Models Compare Cloud benchmarks FAQ Download guIDE →
Model Intelligence Sheet

AtomicChat/gemma-4-12b-it-GGUF overview

<center <div style="display:flex; justify content:center; align items:center; gap:10px; flex wrap:wrap;" <a href="https://atomic.chat" <img src="https://huggin…

ggufatomic-chatgemmagemma4imatrixquantizedllama.cppimage-text-to-textenruesfrdezhjaptitkoarhilicense:gemmaendpoints_compatibleregion:usconversational

Runs locally from ~167.0 MB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).

Downloads
0
Likes
0
Pipeline
image-text-to-text

Repository Files & Downloads

13 GGUF files detected
Direct downloads for local inference
FileTypeQuantizationSizeLink
atomic-chat-gemma412-IQ3_M.ggufGGUFIQ3_M5.34 GBDownload
atomic-chat-gemma412-IQ4_XS.ggufGGUFIQ4_XS6.18 GBDownload
atomic-chat-gemma412-Q2_K.ggufGGUFQ2_K4.50 GBDownload
atomic-chat-gemma412-Q3_K_L.ggufGGUFQ3_K_L6.12 GBDownload
atomic-chat-gemma412-Q3_K_M.ggufGGUFQ3_K_M5.67 GBDownload
atomic-chat-gemma412-Q4_K_M.ggufGGUFQ4_K_M6.87 GBDownload
atomic-chat-gemma412-Q4_K_S.ggufGGUFQ4_K_S6.54 GBDownload
atomic-chat-gemma412-Q5_K_M.ggufGGUFQ5_K_M7.96 GBDownload
atomic-chat-gemma412-Q5_K_S.ggufGGUFQ5_K_S7.07 GBDownload
atomic-chat-gemma412-Q6_K.ggufGGUFQ6_K9.11 GBDownload
atomic-chat-gemma412-Q8_0.ggufGGUFQ8_011.80 GBDownload
atomic-chat-gemma412-UD-Q4_K_XL.ggufGGUFQ4_K_XL7.10 GBDownload
mmproj-gemma4-12b-f16.ggufGGUFF16167.0 MBDownload

Model Details

Model IDAtomicChat/gemma-4-12b-it-GGUF
AuthorAtomicChat
Pipelineimage-text-to-text
Licensegemma
Base modelgoogle/gemma-4-12b-it
Last modified2026-06-10T17:40:58.000Z

Model README

---

license: gemma

license_link: https://ai.google.dev/gemma/terms

thumbnail: https://huggingface.co/AtomicChat/gemma-4-12b-it-GGUF/resolve/main/gemma_banner.png

base_model:

  • google/gemma-4-12b-it

base_model_relation: quantized

quantized_by: AlexAtomic

language:

  • en
  • ru
  • es
  • fr
  • de
  • zh
  • ja
  • pt
  • it
  • ko
  • ar
  • hi

pipeline_tag: image-text-to-text

library_name: gguf

tags:

  • atomic-chat
  • gemma
  • gemma4
  • gguf
  • imatrix
  • quantized
  • llama.cpp

---

<center>

<div style="display:flex; justify-content:center; align-items:center; gap:10px; flex-wrap:wrap;">

<a href="https://atomic.chat"><img src="https://huggingface.co/AtomicChat/gemma-4-12b-it-GGUF/resolve/main/pill_atomic_v3.png" alt="Atomic Chat" width="186"></a>

<a href="https://discord.gg/8wGSsvmg4V"><img src="https://huggingface.co/AtomicChat/gemma-4-12b-it-GGUF/resolve/main/pill_discord_v3.png" alt="Join Discord" width="184"></a>

<a href="https://github.com/AtomicBot-ai/Atomic-Chat"><img src="https://huggingface.co/AtomicChat/gemma-4-12b-it-GGUF/resolve/main/pill_github_v3.png" alt="GitHub" width="141"></a>

</div>

<br/>

<img src="https://huggingface.co/AtomicChat/gemma-4-12b-it-GGUF/resolve/main/gemma_banner.png" alt="Gemma 4" style="width:100%; max-width:100%; height:auto; margin-bottom:0.6em;"/>

<div style="display:flex; justify-content:center; gap:0.5em;">

<a href="https://huggingface.co/google/gemma-4-12b-it"><strong>Base model: google/gemma-4-12b-it</strong></a>

</div>

</center>

Gemma 4 12B, self-quantized to GGUF by Atomic Chat. Built straight from Google's original bf16 weights with a per-tensor importance matrix, so every file stays close to full precision. Runs fully offline.

Highlights

Gemma 4 12B is Google DeepMind's encoder-free model that projects raw inputs straight into the LLM embedding space. It punches well above its size on reasoning, code and long context while staying small enough for a laptop.

  • Reasoning and code at a level usually reserved for much larger models.
  • 256K context for long documents and codebases.
  • Full quant ladder from Q2_K to Q8_0, plus a dynamic UD-Q4_K_XL.
  • Importance matrix on every quant, computed over the standard calibration_datav3 corpus, so low-bit files lose far less quality.
  • Open weights, fully offline through Atomic Chat, llama.cpp, Ollama, LM Studio or Jan.

> [!NOTE]

> These GGUFs are self-quantized from Google's original bf16 weights, not a repack. The importance matrix keeps low-bit quants closer to the full-precision model.

> [!IMPORTANT]

> Always pass --jinja so the Gemma 4 chat template is applied. Without it the model can emit malformed turns.

Model Overview

| Property | Value |

|---|---|

| Base model | google/gemma-4-12b-it |

| Total parameters | 11.95B |

| Layers | 48 |

| Context length | 256K (262,144) |

| Vocabulary | 262K |

| Architecture | gemma4 |

| This repo | GGUF quants (imatrix) + vision/audio mmproj |

> [!NOTE]

> Gemma 4 is natively multimodal (text, image, audio). This repo ships the mmproj-gemma4-12b-f16.gguf projector for vision and audio. With -hf the projector is pulled automatically; otherwise pass it via --mmproj. Use llama-mtmd-cli or llama-server to feed images and audio.

<img src="https://huggingface.co/AtomicChat/gemma-4-12b-it-GGUF/resolve/main/benchmark.png" alt="Gemma 4 12B benchmark scores" style="width:100%; max-width:760px;"/>

Scores are Google's published results for the base gemma-4-12b-it. Quantization preserves the large majority of this; Q4_K_M and up sit within a point or two of full precision.

Choosing a quant

| Quant | Size | Notes |

|---|---|---|

| Q2_K | 4.5 GB | Smallest. Minimal RAM, clear quality drop. |

| IQ3_M | 5.4 GB | Beats Q3 at similar size thanks to imatrix. Best low-RAM pick. |

| Q3_K_M | 5.7 GB | Low quality but usable. |

| Q3_K_L | 6.2 GB | A step above Q3_K_M. |

| IQ4_XS | 6.2 GB | Excellent quality for size. Recommended low-bit. |

| Q4_K_S | 6.6 GB | Compact Q4, fast. |

| Q4_K_M | 6.9 GB | Recommended default. Best balance of size, speed and quality. |

| UD-Q4_K_XL | 7.2 GB | Dynamic. Embeddings and output kept at Q8_0 for higher quality at a Q4 footprint. |

| Q5_K_S | 7.1 GB | Higher quality. |

| Q5_K_M | 8.0 GB | Higher quality, low loss. |

| Q6_K | 9.2 GB | Near lossless. |

| Q8_0 | 12.0 GB | Effectively lossless, reference quality. |

> [!TIP]

> Pick the largest file that fits your (V)RAM with room for context. Q4_K_M or UD-Q4_K_XL is the sweet spot for most setups; Q6_K or Q8_0 for maximum fidelity.

Get started

Run Gemma 4 12B locally with:

  • Atomic Chat: the easiest path. Open the app, search AtomicChat/gemma-4-12b-it-GGUF, pick a quant, hit Use this model.
  • llama.cpp: llama-server -hf AtomicChat/gemma-4-12b-it-GGUF:Q4_K_M --jinja -c 8192 (build steps in the section below).
  • Ollama: ollama run hf.co/AtomicChat/gemma-4-12b-it-GGUF:Q4_K_M
  • LM Studio: search the repo id, download any quant.
  • Jan: search the repo id, download any quant.

Best practices

Gemma 4 works well with its standard sampling defaults:

| Parameter | Value |

|---|---|

| temperature | 1.0 |

| top_k | 64 |

| top_p | 0.95 |

| min_p | 0.0 |

| repeat_penalty | 1.0 |

Drop temperature to 0.6 or 0.7 for code and math where you want determinism.

Run in llama.cpp

Build llama.cpp, then point llama-server straight at this repo:

apt-get update
apt-get install build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggerganov/llama.cpp
cmake llama.cpp -B llama.cpp/build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-server llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp
./llama.cpp/llama-server \
    -hf AtomicChat/gemma-4-12b-it-GGUF:UD-Q4_K_XL \
    --jinja -ngl 99 -c 8192 -fa on

Set -DGGML_CUDA=OFF for CPU or Metal builds.

How these were made

  1. Download google/gemma-4-12b-it (bf16).
  2. Convert to f16 GGUF with llama.cpp.
  3. Build an importance matrix over calibration_datav3 (100 chunks).
  4. Quantize the full ladder with --imatrix.
  5. UD-Q4_K_XL additionally pins the token-embedding and output tensors to Q8_0.

License

These weights are derived from Gemma and stay governed by the Gemma Terms of Use. By downloading you agree to those terms. Original model by Google DeepMind. Quantized by Atomic Chat.

Run AtomicChat/gemma-4-12b-it-GGUF with guIDE

Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.

Download guIDE → · Browse 524k+ models · Compare models

Source: Hugging Face · Compare models