ayodele01/gemma-4-21b-a4b-it-reap-gguf Q5_K_M GGUF - Free GGUF Download
This model is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes.
Model Intelligence Sheet
ayodele01/gemma-4-21b-a4b-it-reap-gguf overview
GGUF quantized versions of 0xSero/gemma-4-21b-a4b-it-REAP.
| Field | Value |
|---|---|
| Downloads | 1,502 |
| Likes | 0 |
| Pipeline | text-generation |
| Library | gguf |
| Visibility | Public |
| Access | Open |
Repository Files & Downloads
5 GGUF files detected (the Hugging Face API reports 7 repository files in total)
Direct downloads for each GGUF file:
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| gemma-4-21b-a4b-it-REAP-Q3_K_M.gguf | GGUF | Q3_K_M | 10.22 GB | Download |
| gemma-4-21b-a4b-it-REAP-Q4_K_M.gguf | GGUF | Q4_K_M | 12.88 GB | Download |
| gemma-4-21b-a4b-it-REAP-Q5_K_M.gguf | GGUF | Q5_K_M | 14.67 GB | Download |
| gemma-4-21b-a4b-it-REAP-Q8_0.gguf | GGUF | Q8_0 | 20.59 GB | Download |
| gemma-4-21b-a4b-it-REAP.gguf | GGUF | BF16 (unquantized) | 38.72 GB | Download |
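To choose among the files above, a script can take the largest one that fits a memory budget and build its direct-download URL. The sketch below is a minimal illustration under stated assumptions: the `QUANT_FILES` mapping, the helper names, and the flat 2 GB runtime overhead (a stand-in for context and KV-cache memory) are all hypothetical, and the sizes are the approximate values from the table. The URL follows the `/resolve/main/` pattern used by the `wget` command in the repository README.

```python
from typing import Optional

# Hypothetical helper data: file names and approximate sizes copied from the file table.
REPO_ID = "Ayodele01/gemma-4-21b-a4b-it-REAP-GGUF"
QUANT_FILES = {
    "Q3_K_M": ("gemma-4-21b-a4b-it-REAP-Q3_K_M.gguf", 10.22),
    "Q4_K_M": ("gemma-4-21b-a4b-it-REAP-Q4_K_M.gguf", 12.88),
    "Q5_K_M": ("gemma-4-21b-a4b-it-REAP-Q5_K_M.gguf", 14.67),
    "Q8_0": ("gemma-4-21b-a4b-it-REAP-Q8_0.gguf", 20.59),
    "BF16": ("gemma-4-21b-a4b-it-REAP.gguf", 38.72),
}


def pick_quant(budget_gb: float, overhead_gb: float = 2.0) -> Optional[str]:
    """Return the largest quant whose size plus an assumed overhead fits the budget."""
    fitting = [
        (size, name)
        for name, (_, size) in QUANT_FILES.items()
        if size + overhead_gb <= budget_gb
    ]
    if not fitting:
        return None  # nothing fits, not even Q3_K_M
    return max(fitting)[1]  # tuples compare by size first


def resolve_url(quant: str) -> str:
    """Build a direct-download URL with the README's /resolve/main/ pattern."""
    filename, _ = QUANT_FILES[quant]
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{filename}"


if __name__ == "__main__":
    choice = pick_quant(budget_gb=16.0)
    print(choice)
    if choice is not None:
        print(resolve_url(choice))
```

With a 16 GB budget the sketch returns `Q4_K_M`: Q5_K_M (14.67 GB) plus the assumed 2 GB overhead exceeds the budget. Real memory use also depends on context length and runtime settings, so treat the overhead as a tunable guess.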
Model Details
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"license": "gemma",
"library_name": "gguf",
"base_model": "0xSero/gemma-4-21b-a4b-it-REAP",
"tags": [
"gemma",
"gemma-4",
"moe",
"pruning",
"reap",
"gguf",
"llama-cpp"
],
"language": [
"en"
],
"pipeline_tag": "text-generation",
"frontmatter": {
"license": "gemma",
"library_name": "gguf",
"base_model": "0xSero/gemma-4-21b-a4b-it-REAP",
"tags": [
"gemma",
"gemma-4",
"moe",
"pruning",
"reap",
"gguf",
"llama-cpp"
],
"language": [
"en"
],
"pipeline_tag": "text-generation"
},
"hero_image_url": "",
"summary": "GGUF quantized versions of 0xSero/gemma-4-21b-a4b-it-REAP.",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nlicense: gemma\nlibrary_name: gguf\nbase_model: 0xSero/gemma-4-21b-a4b-it-REAP\ntags:\n - gemma\n - gemma-4\n - moe\n - pruning\n - reap\n - gguf\n - llama-cpp\nlanguage:\n - en\npipeline_tag: text-generation\n---\n\n# Gemma-4 21B-A4B-it REAP - GGUF\n\nGGUF quantized versions of [0xSero/gemma-4-21b-a4b-it-REAP](https://huggingface.co/0xSero/gemma-4-21b-a4b-it-REAP).\n\n## Model Description\n\nThis is a **20% expert-pruned** version of Google's Gemma-4 26B-A4B-it using **[Cerebras REAP](https://github.com/cerebras/reap)** (Router-weighted Expert Activation Pruning).\n\n### Key Specifications\n\n| Metric | Original (26B) | This Model (21B) |\n|--------|----------------|------------------|\n| Total params | ~26B | **21.34B** |\n| Experts/layer | 128 | **103** |\n| Active params/token | ~4B | ~4B |\n| Disk size | ~52GB | **~43GB** |\n\nREAP removes 20% of MoE experts (25 of 128 per layer) while preserving the model's routing behavior. The active parameter count per token is unchanged since the router still selects 8 experts per token from the remaining pool.\n\n### Architecture\n- **30 transformer layers**\n- **Sliding attention** (window=1024) for 25 layers, **full attention** every 6th layer\n- **MoE FFN** with 103 experts per layer, 8 active per token\n- **Thinking model** -- uses `<|channel>thought` / `<|channel>response` channels\n- **Multimodal** -- supports text and vision inputs\n- **Context window:** 262,144 tokens\n\n## Available Quantizations\n\n| Filename | Quant Type | Size | Description |\n|----------|------------|------|-------------|\n| `gemma-4-21b-a4b-it-REAP.gguf` | BF16 | ~43GB | Full precision, best quality |\n| `gemma-4-21b-a4b-it-REAP-Q8_0.gguf` | Q8_0 | ~23GB | High quality |\n| `gemma-4-21b-a4b-it-REAP-Q5_K_M.gguf` | Q5_K_M | ~15GB | Balanced (recommended) |\n| `gemma-4-21b-a4b-it-REAP-Q4_K_M.gguf` | Q4_K_M | ~13GB | Good quality, smaller |\n| `gemma-4-21b-a4b-it-REAP-Q3_K_M.gguf` | Q3_K_M | ~10GB | Smallest |\n\n## Usage with llama.cpp\n\n```bash\n# Download a quantized model\nwget https://huggingface.co/Ayodele01/gemma-4-21b-a4b-it-REAP-GGUF/resolve/main/gemma-4-21b-a4b-it-REAP-Q5_K_M.gguf\n\n# Run with llama.cpp\n./llama-cli -m gemma-4-21b-a4b-it-REAP-Q5_K_M.gguf \\\n -p \"Write a quicksort in Python.\" \\\n -n 2048\n```\n\n## Usage with Ollama\n\nCreate a Modelfile:\n```\nFROM ./gemma-4-21b-a4b-it-REAP-Q5_K_M.gguf\n\nTEMPLATE \"\"\"<bos><start_of_turn>user\n{{ .Prompt }}<end_of_turn>\n<start_of_turn>model\n\"\"\"\n\nPARAMETER stop \"<end_of_turn>\"\nPARAMETER temperature 0.7\n```\n\nThen:\n```bash\nollama create gemma4-21b-reap -f Modelfile\nollama run gemma4-21b-reap\n```\n\n## Benchmark Results (from original REAP model)\n\n| Task | Original (26B) | REAP 21B |\n|------|----------------|----------|\n| Elementary Math | 92% | 90% |\n| Philosophy | 92% | 88% |\n| GSM8K | 86% | 84% |\n\nGeneration quality is \"essentially indistinguishable from the original\" according to the REAP authors.\n\n## License\n\nThis model is released under the [Gemma License](https://ai.google.dev/gemma/terms).\n\n## Credits\n\n- **Original model:** [google/gemma-4-26b-a4b-it](https://huggingface.co/google/gemma-4-26b-a4b-it)\n- **REAP pruning:** [0xSero/gemma-4-21b-a4b-it-REAP](https://huggingface.co/0xSero/gemma-4-21b-a4b-it-REAP)\n- **REAP paper:** [arxiv.org/abs/2510.13999](https://arxiv.org/abs/2510.13999)\n- **GGUF conversion:** Ayodele01\n",
"related_quantizations": []
},
"tags": [
"gguf",
"gemma",
"gemma-4",
"moe",
"pruning",
"reap",
"llama-cpp",
"text-generation",
"en",
"arxiv:2510.13999",
"base_model:0xSero/gemma-4-21b-a4b-it-REAP",
"base_model:quantized:0xSero/gemma-4-21b-a4b-it-REAP",
"license:gemma",
"endpoints_compatible",
"region:us",
"conversational"
],
"likes": 0,
"downloads": 1502,
"gated": false,
"private": false,
"last_modified": "2026-04-05T19:55:40.000Z",
"created_at": "2026-04-05T19:36:03.000Z",
"pipeline_tag": "text-generation",
"library_name": "gguf"
}
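The README stored in `readme_markdown` above gives the pruned model's layout: 30 layers, sliding attention (window 1024) on 25 of them with full attention every 6th layer, and 103 of the original 128 experts kept per layer with 8 active per token. The sketch below reproduces that arithmetic. The function names are hypothetical, and reading "every 6th layer" as 1-based layers 6, 12, ..., 30 is an assumption that happens to give the stated 25/5 split.

```python
from typing import Dict, List

# Figures copied from the README; helper names are hypothetical.
NUM_LAYERS = 30
FULL_ATTENTION_PERIOD = 6
ORIGINAL_EXPERTS = 128
KEPT_EXPERTS = 103
ACTIVE_EXPERTS = 8


def attention_layout(num_layers: int = NUM_LAYERS,
                     period: int = FULL_ATTENTION_PERIOD) -> List[str]:
    # Assumption: 1-based layers 6, 12, ..., 30 use full attention; the rest slide.
    return ["full" if (i + 1) % period == 0 else "sliding" for i in range(num_layers)]


def pruning_summary(original: int = ORIGINAL_EXPERTS,
                    kept: int = KEPT_EXPERTS,
                    active: int = ACTIVE_EXPERTS,
                    num_layers: int = NUM_LAYERS) -> Dict[str, float]:
    removed = original - kept
    return {
        "removed_per_layer": removed,
        "removed_total": removed * num_layers,
        "removed_fraction": removed / original,      # 25/128, described as "20%"
        "active_fraction_before": active / original,  # share of the expert pool used per token
        "active_fraction_after": active / kept,
    }


if __name__ == "__main__":
    layout = attention_layout()
    print("sliding:", layout.count("sliding"), "full:", layout.count("full"))  # sliding: 25 full: 5
    print("full-attention layers (0-based):",
          [i for i, kind in enumerate(layout) if kind == "full"])
    print(pruning_summary())
```

The exact pruning ratio is 25/128 ≈ 19.5%, which the README rounds to 20%. Because the router still picks 8 experts, each token uses a larger share of the remaining pool (8/103 ≈ 7.8% versus 8/128 = 6.25%) while the active parameter count stays around 4B.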
Source payload excerpt (from Hugging Face API)
{
"_id": "69d2b9a3bbf83b1a4bb86ec2",
"id": "Ayodele01/gemma-4-21b-a4b-it-REAP-GGUF",
"modelId": "Ayodele01/gemma-4-21b-a4b-it-REAP-GGUF",
"sha": "307bdc5f39e06b6fc40bae168d2082326bb93fd9",
"createdAt": "2026-04-05T19:36:03.000Z",
"lastModified": "2026-04-05T19:55:40.000Z",
"author": "Ayodele01",
"downloads": 1502,
"likes": 0,
"gated": false,
"private": false,
"pipeline_tag": "text-generation",
"library_name": "gguf",
"siblings_count": 7
}
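The README in the metadata above suggests an Ollama Modelfile whose `TEMPLATE` wraps each prompt in `<bos><start_of_turn>user ... <end_of_turn>` markers and stops on `<end_of_turn>`. The sketch below mirrors that string formatting for scripts that call a raw completion endpoint. It is only an illustration of that one template: the helper names are hypothetical, the template does not handle the `<|channel>thought` / `<|channel>response` channels the README mentions, and whether it matches the chat template embedded in the GGUF files is not confirmed here.

```python
# Mirrors the TEMPLATE and PARAMETER stop lines from the README's Modelfile.
# Helper names are hypothetical; thinking channels are not handled.
BOS = "<bos>"
STOP = "<end_of_turn>"


def render_prompt(user_message: str) -> str:
    """Render a single-turn prompt the same way the Modelfile TEMPLATE does."""
    return f"{BOS}<start_of_turn>user\n{user_message}{STOP}\n<start_of_turn>model\n"


def trim_at_stop(generated: str, stop: str = STOP) -> str:
    """Cut generated text at the first stop sequence, like PARAMETER stop."""
    idx = generated.find(stop)
    return generated if idx == -1 else generated[:idx]


if __name__ == "__main__":
    print(repr(render_prompt("Write a quicksort in Python.")))
    print(trim_at_stop("def quicksort(xs): ...<end_of_turn>extra"))
```

If a runtime applies the GGUF's own chat template automatically, sending a pre-rendered prompt like this could duplicate the markers, so use it only with raw-completion calls.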