What license applies to TheStageAI/gemma-4-E2B-it-qat-GGUF?

License: MIT. As a derivative of Gemma, the weights are also subject to the Gemma Terms of Use. Verify terms on Hugging Face before commercial use.

How do I run TheStageAI/gemma-4-E2B-it-qat-GGUF locally?

Download a GGUF file from this page and load it in guIDE or llama.cpp. Pipeline task: text-generation.

Model Intelligence Sheet

TheStageAI/gemma-4-E2B-it-qat-GGUF overview


Tags: llama.cpp, gguf, gemma, gemma-4, qat, quantization, text-generation, en, multilingual, base_model:google/gemma-4-E2B-it-qat-q4_0-unquantized, base_model:quantized:google/gemma-4-E2B-it-qat-q4_0-unquantized, license:mit, endpoints_compatible, region:us

Runs locally from ~2.32 GB on disk (the smallest file, M), which targets 4 GB VRAM-class GPUs with llama.cpp or guIDE.

Pipeline

text-generation

Author

TheStageAI

Repository Files & Downloads

3 GGUF files detected

Direct downloads for local inference

File	Type	Quantization	Size	Link
gemma-4-E2B-it-qat-GGUF-L.gguf	GGUF	GGUF	2.53 GB	Download
gemma-4-E2B-it-qat-GGUF-M.gguf	GGUF	GGUF	2.32 GB	Download
gemma-4-E2B-it-qat-GGUF-W4-uniform.gguf	GGUF	GGUF	2.50 GB	Download
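The three files differ only slightly in size. If you want to script the choice of file, the helper below picks a variant by disk budget. It is a hypothetical sketch, not part of this repo, and rests on these assumptions:

- The sizes are copied from the table above.
- The preference order (L first, then M) follows the README's Files section, which describes L as the higher-quality target and M as the compact one.
- Only file size is checked. Runtime memory for the KV cache and activations comes on top.

```python
# Hypothetical helper: pick a GGUF variant from this repo by disk budget.
# Sizes are copied from the file table (GB, as listed on the page).
from typing import Optional

FILE_SIZES_GB = {
    "L": 2.53,
    "M": 2.32,
    "W4-uniform": 2.50,
}

# Assumed preference order: L is described as the higher-quality target,
# M as the compact target; W4-uniform is a uniform 4-bit baseline.
PREFERENCE = ["L", "M"]


def gguf_filename(variant: str) -> str:
    """Build the file name used in this repo for a given variant."""
    if variant not in FILE_SIZES_GB:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"gemma-4-E2B-it-qat-GGUF-{variant}.gguf"


def pick_variant(budget_gb: float) -> Optional[str]:
    """Return the most preferred variant whose file fits in budget_gb.

    Only the file size is checked; KV cache, activations and runtime
    overhead come on top of this and are not modelled here.
    """
    for variant in PREFERENCE:
        if FILE_SIZES_GB[variant] <= budget_gb:
            return variant
    return None
```

For example, `pick_variant(2.4)` returns `"M"`, and `gguf_filename("M")` gives `gemma-4-E2B-it-qat-GGUF-M.gguf`.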

Model Details

Model ID	TheStageAI/gemma-4-E2B-it-qat-GGUF
Author	TheStageAI
Pipeline	text-generation
License	mit
Base model	google/gemma-4-E2B-it-qat-q4_0-unquantized
Last modified	2026-06-11T11:22:08.000Z

Model README

---
license: mit
base_model:
- google/gemma-4-E2B-it-qat-q4_0-unquantized
base_model_relation: quantized
library_name: llama.cpp
pipeline_tag: text-generation
tags:
- gguf
- llama.cpp
- gemma
- gemma-4
- qat
- quantization
language:
- en
- multilingual
---

TheStageAI/gemma-4-E2B-it-qat-GGUF

A portable GGUF release of Google's Gemma 4 E2B instruction model, compressed from Google's QAT-trained BF16 weights and emitted as standard llama.cpp-compatible .gguf files.

Run it with: llama.cpp or other GGUF-compatible runtimes
Compression source: google/gemma-4-E2B-it-qat-q4_0-unquantized
BF16 reference: google/gemma-4-E2B-it
Smaller native release: TheStageAI/gemma-4-E2B-it-qat

Use this repo when deployment portability matters most. If you can run our native MLX runtime and want the smallest artifacts, use the edge-lm sibling release.

Why this exists

The native edge-lm checkpoints use custom codecs for both decoder weights and PLE tables, which is why they are smaller at comparable quality. Many deployments, however, need standard GGUF files that work with llama.cpp-compatible tooling.

This repo keeps the production bit-width schedules from our native compression pipeline, but maps the weights into GGUF-compatible quantization formats. The result is larger than the native release, but portable.

How it was compressed

We start from Google's QAT-trained BF16 checkpoint and reuse the production M and L schedules from the native release.

- Transformer blocks - the M and L files follow our RCO-selected production bit-width schedules, then emit the weights in GGUF-compatible K-quant layouts with the required group sizes and symmetric/asymmetric modes for each tensor family.
- PLE tables - stored with GGUF-compatible Q4 scalar quantization instead of the native AQLM PLE codec, so the files stay portable across GGUF runtimes.
- Token embeddings / LM head - quantized through the same GGUF-compatible path as the rest of the model.
- W4-uniform - a conservative uniform 4-bit GGUF variant with the same Q4 PLE path.
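The list above mentions per-tensor-family group sizes, symmetric/asymmetric modes, and Q4 scalar quantization. The sketch below illustrates those two 4-bit modes on fixed-size groups of weights.

It is generic and simplified. It is not the GGUF K-quant format, which adds super-blocks with quantized sub-block scales and bit packing, and it is not TheStageAI's pipeline. The group size of 32 is an assumption.

```python
# Illustrative 4-bit group quantization, symmetric vs asymmetric.
# Simplified: float scales, no bit packing, not the exact GGUF block layout.
from typing import List, Tuple

GROUP_SIZE = 32  # assumption; real GGUF types use their own block sizes


def quantize_symmetric(group: List[float]) -> Tuple[float, List[int]]:
    """Absmax scaling to 4-bit signed codes; amax/7 keeps codes in [-7, 7]."""
    amax = max(abs(x) for x in group)
    scale = amax / 7 if amax > 0 else 1.0
    codes = [max(-8, min(7, round(x / scale))) for x in group]
    return scale, codes


def dequantize_symmetric(scale: float, codes: List[int]) -> List[float]:
    # Zero always maps back to exactly zero in the symmetric mode.
    return [q * scale for q in codes]


def quantize_asymmetric(group: List[float]) -> Tuple[float, float, List[int]]:
    """Min/scale: unsigned codes in [0, 15] span the group's [min, max]."""
    lo, hi = min(group), max(group)
    scale = (hi - lo) / 15 if hi > lo else 1.0
    codes = [max(0, min(15, round((x - lo) / scale))) for x in group]
    return scale, lo, codes


def dequantize_asymmetric(scale: float, lo: float, codes: List[int]) -> List[float]:
    return [lo + q * scale for q in codes]


def quantize_tensor(values: List[float], group_size: int = GROUP_SIZE,
                    symmetric: bool = True) -> List[float]:
    """Quantize a flat list group by group and return the reconstruction."""
    out: List[float] = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        if symmetric:
            s, q = quantize_symmetric(group)
            out.extend(dequantize_symmetric(s, q))
        else:
            s, lo, q = quantize_asymmetric(group)
            out.extend(dequantize_asymmetric(s, lo, q))
    return out
```

In both modes the reconstruction error per weight is at most half a quantization step for its group.

- The symmetric mode spends no bits on an offset. It suits weights centred on zero.
- The asymmetric mode stores a per-group minimum. It uses the 16 levels more fully when a group's values are skewed to one side.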

Operating points

|---|---|---:|---:|---|---|

Usage

Use a recent upstream llama.cpp build. Example:

```shell
llama-completion \
  -m gemma-4-E2B-it-qat-GGUF-L.gguf \
  -p "Explain gravity in one sentence." \
  -n 64
```
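For scripting, the same invocation can be driven from Python. This is a hypothetical wrapper, not something shipped with the repo, and it assumes:

- `llama-completion` from a recent llama.cpp build is on PATH.
- The `huggingface_hub` package is installed for the download step.

The `-m`, `-p`, and `-n` flags come from the command above. The optional `-ngl` (GPU layer offload) is an added assumption about your setup.

```python
# Hypothetical wrapper around the llama-completion example.
# Assumes llama-completion is on PATH; huggingface_hub is only needed
# for download_gguf and is imported lazily.
import shutil
import subprocess
from typing import List, Optional

REPO_ID = "TheStageAI/gemma-4-E2B-it-qat-GGUF"


def build_completion_cmd(model_path: str, prompt: str, n_predict: int = 64,
                         gpu_layers: Optional[int] = None) -> List[str]:
    """Assemble argv for llama-completion (-m model, -p prompt, -n tokens)."""
    cmd = ["llama-completion", "-m", model_path, "-p", prompt,
           "-n", str(n_predict)]
    if gpu_layers is not None:
        # -ngl: number of layers to offload to the GPU
        cmd += ["-ngl", str(gpu_layers)]
    return cmd


def download_gguf(filename: str) -> str:
    """Fetch one .gguf file from the Hub and return its local cached path."""
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=REPO_ID, filename=filename)


def run_completion(model_path: str, prompt: str, n_predict: int = 64) -> str:
    """Run llama-completion and return its stdout (raises on non-zero exit)."""
    if shutil.which("llama-completion") is None:
        raise FileNotFoundError("llama-completion not found on PATH")
    result = subprocess.run(build_completion_cmd(model_path, prompt, n_predict),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Typical use is `run_completion(download_gguf("gemma-4-E2B-it-qat-GGUF-L.gguf"), "Explain gravity in one sentence.")`. Passing argv as a list rather than a shell string avoids quoting problems with prompts that contain spaces or quotes.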

Benchmarks

For quality evaluation, GGUF checkpoints are converted through the same dequantized BF16 evaluation path used for the native release, so every checkpoint is scored on the same backend. IFEval p/i means prompt-level strict / instruction-level strict accuracy, using the corrected public recipe with max_gen_toks=1280.
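IFEval checks each response against one or more verifiable instructions per prompt. The two numbers aggregate those checks differently:

- Prompt-level strict accuracy counts a prompt only if every instruction in it is followed.
- Instruction-level strict accuracy counts instructions individually.

This is why the first number is the lower one in each row of the benchmark table. The sketch assumes the per-instruction pass/fail results have already been produced by the IFEval checkers. It only shows how the two aggregates relate, not the evaluation harness itself.

```python
# Prompt-level vs instruction-level strict accuracy (IFEval-style).
# Input: one list per prompt, one bool per instruction (True = followed).
from typing import List, Tuple


def ifeval_accuracy(results: List[List[bool]]) -> Tuple[float, float]:
    """Return (prompt_level, instruction_level) accuracy in percent."""
    prompts_ok = sum(all(r) for r in results)     # every instruction followed
    instr_total = sum(len(r) for r in results)
    instr_ok = sum(sum(r) for r in results)       # True counts as 1
    prompt_level = 100.0 * prompts_ok / len(results)
    instruction_level = 100.0 * instr_ok / instr_total
    return prompt_level, instruction_level
```

For `[[True, True], [True, False], [True]]`, two of three prompts pass fully (about 66.7) while four of five instructions pass (80.0).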

| Checkpoint | Size | Compression | MMLU-Pro | IFEval p/i |
|---|---:|---:|---:|---:|
| BF16 reference | 10.21 GB | 1.0x | 61.85 | 75.23 / 82.37 |
| GGUF M | 2.47 GB | 4.1x | 53.79 | 72.64 / 81.29 |
| GGUF L | 2.68 GB | 3.8x | 57.12 | 73.38 / 81.65 |
| GGUF W4-uniform | 2.69 GB | 3.8x | 56.91 | 74.68 / 82.61 |

MMLU-Pro is the official checkpoint-wise vLLM route with Gemma chat formatting and thinking enabled.

The .gguf files in this repo also passed generation smoke tests with upstream llama.cpp.
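The third column of the benchmark table (1.0x, 4.1x, 3.8x) matches the BF16 reference size divided by each checkpoint's size. Here $S$ denotes the sizes in GB from that table. This is a reading of the reported numbers, not a formula stated by the authors.

```latex
r = \frac{S_{\mathrm{BF16}}}{S_{\mathrm{GGUF}}}, \qquad
r_{M} = \frac{10.21}{2.47} \approx 4.13, \quad
r_{L} = \frac{10.21}{2.68} \approx 3.81, \quad
r_{\mathrm{W4}} = \frac{10.21}{2.69} \approx 3.80
```

Rounded to one decimal, these give the 4.1x, 3.8x, and 3.8x shown.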

Files

| File | Contents |
|---|---|
| gemma-4-E2B-it-qat-GGUF-M.gguf | Compact GGUF target |
| gemma-4-E2B-it-qat-GGUF-L.gguf | Higher-quality GGUF target |
| gemma-4-E2B-it-qat-GGUF-W4-uniform.gguf | Uniform W4 GGUF baseline |

License

Released under the MIT License.

As a derivative of Gemma, the weights are also subject to the Gemma Terms of Use.

Citation

If you use these checkpoints, please cite the Gemma 4 release and the methods we build on (GPTQ, QEP, AQLM, RCO); see the references in the edge-lm write-up.

Source: Hugging Face