mannix-ita/gemma-4-a4b-98e-v3-it-gguf (Q3_K_S GGUF) is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. Use this page to pick a local model for guIDE or other runtimes.
Model Intelligence Sheet
mannix-ita/gemma-4-a4b-98e-v3-it-gguf overview
GGUF quantizations of ManniX-ITA/gemma-4-A4B-98e-v3-it — Gemma 4 26B pruned to 98 experts per layer (from 128). Zero GPQA degradation despite dropping 30 experts per layer (23.4% of MoE capacity). All quants made using imatrix with calibration data v5. Includes ContribDynamic (CD) quants with per-layer dynamic quantization based on expert contribution analysis.
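The 23.4% figure follows directly from the expert counts quoted above. A minimal check, assuming "MoE capacity" here means the share of experts per layer (the variable names are illustrative):

```python
# Expert counts taken from the overview above.
ORIGINAL_EXPERTS = 128
KEPT_EXPERTS = 98

# Experts removed per layer, and their share of the original count.
dropped = ORIGINAL_EXPERTS - KEPT_EXPERTS
dropped_share = dropped / ORIGINAL_EXPERTS

print(f"dropped {dropped} experts per layer ({dropped_share:.1%})")
# prints: dropped 30 experts per layer (23.4%)
```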
Downloads: 10,673
Likes: 4
Pipeline: not set
Library: not set
Visibility: Public
Access: Open
Repository Files & Downloads
29 files detected
Direct downloads for all repository files
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| gemma-4-A4B-98e-v3-it-CD-Q2_K.gguf | GGUF | CD-Q2_K | 7.97 GB | Download |
| gemma-4-A4B-98e-v3-it-CD-Q3_K_M.gguf | GGUF | CD-Q3_K_M | 9.48 GB | Download |
| gemma-4-A4B-98e-v3-it-CD-Q4_K_M.gguf | GGUF | CD-Q4_K_M | 10.12 GB | Download |
| gemma-4-A4B-98e-v3-it-CD-Q5_K_M.gguf | GGUF | CD-Q5_K_M | 12.32 GB | Download |
| gemma-4-A4B-98e-v3-it-CD-Q6_K.gguf | GGUF | CD-Q6_K | 14.39 GB | Download |
| gemma-4-A4B-98e-v3-it-IQ2_M.gguf | GGUF | IQ2_M | 7.66 GB | Download |
| gemma-4-A4B-98e-v3-it-IQ2_S.gguf | GGUF | IQ2_S | 7.29 GB | Download |
| gemma-4-A4B-98e-v3-it-IQ2_XS.gguf | GGUF | IQ2_XS | 7.24 GB | Download |
| gemma-4-A4B-98e-v3-it-IQ2_XXS.gguf | GGUF | IQ2_XXS | 6.86 GB | Download |
| gemma-4-A4B-98e-v3-it-IQ3_M.gguf | GGUF | IQ3_M | 9.15 GB | Download |
| gemma-4-A4B-98e-v3-it-IQ3_XS.gguf | GGUF | IQ3_XS | 8.58 GB | Download |
| gemma-4-A4B-98e-v3-it-IQ3_XXS.gguf | GGUF | IQ3_XXS | 8.33 GB | Download |
| gemma-4-A4B-98e-v3-it-IQ4_NL.gguf | GGUF | IQ4_NL | 10.63 GB | Download |
| gemma-4-A4B-98e-v3-it-IQ4_XS.gguf | GGUF | IQ4_XS | 10.25 GB | Download |
| gemma-4-A4B-98e-v3-it-Q3_K_L.gguf | GGUF | Q3_K_L | 10.19 GB | Download |
| gemma-4-A4B-98e-v3-it-Q3_K_M.gguf | GGUF | Q3_K_M | 9.79 GB | Download |
| gemma-4-A4B-98e-v3-it-Q3_K_S.gguf | GGUF | Q3_K_S | 9.01 GB | Download |
| gemma-4-A4B-98e-v3-it-Q3_K_XL.gguf | GGUF | Q3_K_XL | 9.96 GB | Download |
| gemma-4-A4B-98e-v3-it-Q4_0.gguf | GGUF | Q4_0 | 10.63 GB | Download |
| gemma-4-A4B-98e-v3-it-Q4_1.gguf | GGUF | Q4_1 | 11.75 GB | Download |
| gemma-4-A4B-98e-v3-it-Q4_K_L.gguf | GGUF | Q4_K_L | 12.50 GB | Download |
| gemma-4-A4B-98e-v3-it-Q4_K_M.gguf | GGUF | Q4_K_M | 12.33 GB | Download |
| gemma-4-A4B-98e-v3-it-Q4_K_S.gguf | GGUF | Q4_K_S | 11.37 GB | Download |
| gemma-4-A4B-98e-v3-it-Q5_K_L.gguf | GGUF | Q5_K_L | 14.20 GB | Download |
| gemma-4-A4B-98e-v3-it-Q5_K_M.gguf | GGUF | Q5_K_M | 14.04 GB | Download |
| gemma-4-A4B-98e-v3-it-Q5_K_S.gguf | GGUF | Q5_K_S | 13.21 GB | Download |
| gemma-4-A4B-98e-v3-it-Q6_K.gguf | GGUF | Q6_K | 16.58 GB | Download |
| gemma-4-A4B-98e-v3-it-Q6_K_L.gguf | GGUF | Q6_K_L | 16.75 GB | Download |
| gemma-4-A4B-98e-v3-it-Q8_0.gguf | GGUF | Q8_0 | 19.71 GB | Download |
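When choosing among the files above, a common rule of thumb is to take the largest quant that fits in available memory with some room left for the context cache. The sketch below illustrates that idea only. The 2 GB headroom is an assumed placeholder, not a measured figure for this model, and `pick_quant` is a hypothetical helper; the sizes are copied from a subset of the table.

```python
from typing import Optional

# File sizes in GB, copied from a subset of the table above.
QUANT_SIZES_GB = {
    "IQ2_XXS": 6.86,
    "IQ3_XXS": 8.33,
    "Q3_K_S": 9.01,
    "IQ4_XS": 10.25,
    "Q4_K_M": 12.33,
    "Q5_K_M": 14.04,
    "Q6_K": 16.58,
    "Q8_0": 19.71,
}


def pick_quant(memory_gb: float, headroom_gb: float = 2.0) -> Optional[str]:
    """Return the largest listed quant whose file fits in memory_gb minus headroom_gb.

    headroom_gb is an assumed allowance for context cache and runtime overhead.
    Returns None when nothing fits.
    """
    budget = memory_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget}
    if not fitting:
        return None
    # Largest file that still fits the budget.
    return max(fitting, key=fitting.get)


print(pick_quant(12.0))  # Q3_K_S
print(pick_quant(16.0))  # Q4_K_M
print(pick_quant(8.0))   # None
```

File size is only a lower bound on memory use. Longer contexts need more headroom, so treat the result as a starting point and adjust after a test load.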
Model Details
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"base_model": "ManniX-ITA/gemma-4-A4B-98e-v3-it",
"tags": [
"gguf",
"imatrix",
"quantized",
"gemma4",
"moe",
"expert-pruning"
],
"license": "other",
"license_name": "gemma",
"frontmatter": {
"base_model": "ManniX-ITA/gemma-4-A4B-98e-v3-it",
"tags": [
"gguf",
"imatrix",
"quantized",
"gemma4",
"moe",
"expert-pruning"
],
"license": "other",
"license_name": "gemma"
},
"hero_image_url": "",
"summary": "GGUF quantizations of ManniX-ITA/gemma-4-A4B-98e-v3-it — Gemma 4 26B pruned to 98 experts per layer (from 128). **Zero GPQA degradation despite dropping 30 experts per layer (23.4% of MoE capacity).** All quants made using imatrix with calibration data v5. Includes ContribDynamic (CD) quants with per-layer dynamic quantization based on expert contribution analysis.",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nbase_model: ManniX-ITA/gemma-4-A4B-98e-v3-it\ntags:\n - gguf\n - imatrix\n - quantized\n - gemma4\n - moe\n - expert-pruning\nlicense: other\nlicense_name: gemma\n---\n\n# gemma-4-A4B-98e-v3-it-GGUF\n\nGGUF quantizations of [ManniX-ITA/gemma-4-A4B-98e-v3-it](https://huggingface.co/ManniX-ITA/gemma-4-A4B-98e-v3-it) — Gemma 4 26B pruned to 98 experts per layer (from 128).\n\n**Zero GPQA degradation despite dropping 30 experts per layer (23.4% of MoE capacity).**\n\nAll quants made using imatrix with [calibration data v5](https://gist.github.com/bartowski1182/82ae9b520227f57d79ba04add13d0d0d). Includes ContribDynamic (CD) quants with per-layer dynamic quantization based on expert contribution analysis.\n\n## GPQA Diamond (198 questions, Q6_K, full CoT reasoning)\n\n| Model | Experts/Layer | GPQA Diamond (flex) | Delta |\n|---|---|---|---|\n| Gemma 4 26B-A4B-it (original) | 128 | 75.25% | baseline |\n| [109e v3](https://huggingface.co/ManniX-ITA/gemma-4-A4B-109e-v3-it-GGUF) | 109 | 71.72% | -3.53 pp |\n| **98e v3 (this model)** | **98** | **75.25%** | **+0.00 pp** |\n\n## Available Quantizations\n\n### Standard Quants\n\n| Quantization | File | Size |\n|---|---|---|\n| Q8_0 | gemma-4-A4B-98e-v3-it-Q8_0.gguf | 19.71 GB |\n| Q6_K_L | gemma-4-A4B-98e-v3-it-Q6_K_L.gguf | 16.75 GB |\n| Q6_K | gemma-4-A4B-98e-v3-it-Q6_K.gguf | 16.58 GB |\n| Q5_K_L | gemma-4-A4B-98e-v3-it-Q5_K_L.gguf | 14.20 GB |\n| Q5_K_M | gemma-4-A4B-98e-v3-it-Q5_K_M.gguf | 14.04 GB |\n| Q5_K_S | gemma-4-A4B-98e-v3-it-Q5_K_S.gguf | 13.21 GB |\n| Q4_K_L | gemma-4-A4B-98e-v3-it-Q4_K_L.gguf | 12.50 GB |\n| Q4_K_M | gemma-4-A4B-98e-v3-it-Q4_K_M.gguf | 12.33 GB |\n| Q4_1 | gemma-4-A4B-98e-v3-it-Q4_1.gguf | 11.75 GB |\n| Q4_K_S | gemma-4-A4B-98e-v3-it-Q4_K_S.gguf | 11.37 GB |\n| Q4_0 | gemma-4-A4B-98e-v3-it-Q4_0.gguf | 10.63 GB |\n| IQ4_NL | gemma-4-A4B-98e-v3-it-IQ4_NL.gguf | 10.63 GB |\n| IQ4_XS | gemma-4-A4B-98e-v3-it-IQ4_XS.gguf | 10.25 GB |\n| Q3_K_XL | 
gemma-4-A4B-98e-v3-it-Q3_K_XL.gguf | 9.96 GB |\n| Q3_K_L | gemma-4-A4B-98e-v3-it-Q3_K_L.gguf | 10.19 GB |\n| Q3_K_M | gemma-4-A4B-98e-v3-it-Q3_K_M.gguf | 9.79 GB |\n| IQ3_M | gemma-4-A4B-98e-v3-it-IQ3_M.gguf | 9.15 GB |\n| Q3_K_S | gemma-4-A4B-98e-v3-it-Q3_K_S.gguf | 9.01 GB |\n| IQ3_XS | gemma-4-A4B-98e-v3-it-IQ3_XS.gguf | 8.58 GB |\n| IQ3_XXS | gemma-4-A4B-98e-v3-it-IQ3_XXS.gguf | 8.33 GB |\n| IQ2_M | gemma-4-A4B-98e-v3-it-IQ2_M.gguf | 7.66 GB |\n| IQ2_S | gemma-4-A4B-98e-v3-it-IQ2_S.gguf | 7.29 GB |\n| IQ2_XS | gemma-4-A4B-98e-v3-it-IQ2_XS.gguf | 7.24 GB |\n| IQ2_XXS | gemma-4-A4B-98e-v3-it-IQ2_XXS.gguf | 6.86 GB |\n\n### ContribDynamic (CD) Quants\n\nPer-layer dynamic quantization based on expert contribution analysis. Important layers get higher precision.\n\n| Quantization | File | Size |\n|---|---|---|\n| CD-Q6_K | gemma-4-A4B-98e-v3-it-CD-Q6_K.gguf | 14.39 GB |\n| CD-Q5_K_M | gemma-4-A4B-98e-v3-it-CD-Q5_K_M.gguf | 12.32 GB |\n| CD-Q4_K_M | gemma-4-A4B-98e-v3-it-CD-Q4_K_M.gguf | 10.12 GB |\n| CD-Q3_K_M | gemma-4-A4B-98e-v3-it-CD-Q3_K_M.gguf | 9.48 GB |\n| CD-Q2_K | gemma-4-A4B-98e-v3-it-CD-Q2_K.gguf | 7.97 GB |\n\n### Skipped Quantizations (failed sanity check)\n\nThe following quantizations were attempted but **failed the sanity check** (3 capital city questions answered incorrectly or incoherently). These quants are intentionally not published:\n\n- Q2_K_L, Q2_K\n\n## Recommended Usage\n\n```bash\nllama-server -m gemma-4-A4B-98e-v3-it-Q4_K_M.gguf -c 32768 -ngl 99 --no-warmup \\\n --reasoning-format deepseek --reasoning-budget 8192\n```\n\n## Method\n\nExpert dropping via per-layer contribution analysis using teacher-force importance mapping (fp32 accumulation). Script: `expert_drop.py` in the [model repo](https://huggingface.co/ManniX-ITA/gemma-4-A4B-98e-v3-it).\n\n## License\n\nGemma license, inherited from the base model.\n",
"related_quantizations": []
},
"tags": [
"gguf",
"imatrix",
"quantized",
"gemma4",
"moe",
"expert-pruning",
"base_model:ManniX-ITA/gemma-4-A4B-98e-v3-it",
"base_model:quantized:ManniX-ITA/gemma-4-A4B-98e-v3-it",
"license:other",
"endpoints_compatible",
"region:us",
"conversational"
],
"likes": 4,
"downloads": 10673,
"gated": false,
"private": false,
"last_modified": "2026-04-13T06:03:07.000Z",
"created_at": "2026-04-12T21:54:37.000Z",
"pipeline_tag": "",
"library_name": ""
}
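The normalized metadata can be read with any JSON parser. The sketch below works on a trimmed copy of the record and shows one plausible way to derive the Visibility and Access labels from the `private` and `gated` flags, and to recover the base model from the prefixed tags. The label mapping is an assumption; the logic GraySoft actually uses is not shown on this page.

```python
import json

# Trimmed copy of the normalized metadata above (only the fields used here).
record = json.loads("""
{
  "tags": [
    "gguf",
    "imatrix",
    "base_model:quantized:ManniX-ITA/gemma-4-A4B-98e-v3-it",
    "license:other"
  ],
  "gated": false,
  "private": false
}
""")

# Assumed mapping from the API flags to the labels in the summary panel.
visibility = "Private" if record["private"] else "Public"
access = "Gated" if record["gated"] else "Open"

# Tags of the form "key:value" carry structured hints; extract the quantized base model.
PREFIX = "base_model:quantized:"
base_models = [tag[len(PREFIX):] for tag in record["tags"] if tag.startswith(PREFIX)]

print(visibility, access, base_models)
# prints: Public Open ['ManniX-ITA/gemma-4-A4B-98e-v3-it']
```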
Source payload excerpt (from Hugging Face API)
{
"_id": "69dc149de7d734b4eee3aadf",
"id": "ManniX-ITA/gemma-4-A4B-98e-v3-it-GGUF",
"modelId": "ManniX-ITA/gemma-4-A4B-98e-v3-it-GGUF",
"sha": "489e467ffc0eaf5a7a2dd3cc1d10e6b211985a05",
"createdAt": "2026-04-12T21:54:37.000Z",
"lastModified": "2026-04-13T06:03:07.000Z",
"author": "ManniX-ITA",
"downloads": 10673,
"likes": 4,
"gated": false,
"private": false,
"pipeline_tag": "",
"library_name": "",
"siblings_count": 32
}
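The `id` field in the payload is enough to build a direct file link. The sketch below assumes the standard Hugging Face `resolve` URL layout and that the files sit on the `main` branch; `gguf_download_url` is a hypothetical helper, not part of any library.

```python
from urllib.parse import quote


def gguf_download_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct download URL for one file in a Hugging Face model repo.

    Assumes the usual https://huggingface.co/<repo>/resolve/<revision>/<file> layout.
    """
    return (
        f"https://huggingface.co/{repo_id}"
        f"/resolve/{quote(revision, safe='')}/{quote(filename)}"
    )


url = gguf_download_url(
    "ManniX-ITA/gemma-4-A4B-98e-v3-it-GGUF",  # "id" from the payload above
    "gemma-4-A4B-98e-v3-it-Q3_K_S.gguf",      # file name from the table
)
print(url)
```

Pinning `revision` to the commit in the payload's `sha` field instead of `main` would make the link reproducible if the repository is updated later.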