GraySoft

mannix-ita/gemma-4-a4b-98e-v3-it-gguf (Q3_K_S GGUF, free download) is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes. Related models in the same shard are listed below.

Model Intelligence Sheet

mannix-ita/gemma-4-a4b-98e-v3-it-gguf overview

GGUF quantizations of ManniX-ITA/gemma-4-A4B-98e-v3-it, a Gemma 4 26B model pruned from 128 to 98 experts per layer. Despite dropping 30 experts per layer (23.4% of MoE capacity), it shows zero GPQA Diamond degradation (75.25%, the same as the original, measured at Q6_K). All quants were made using imatrix with calibration data v5. Also included are ContribDynamic (CD) quants, which apply per-layer dynamic quantization based on expert contribution analysis.
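The pruning figures are easy to check: keeping 98 of 128 experts drops 30, and 30/128 is 23.4%. The sketch below pairs that arithmetic with a generic keep-top-k selection, assuming each expert in a layer already has a scalar contribution score. The scores are made up, and `keep_top_experts` and `pruned_fraction` are hypothetical helpers; the actual analysis (teacher-forced importance mapping in `expert_drop.py`, per the model card) is not reproduced here.

```python
from typing import List, Sequence

TOTAL_EXPERTS = 128  # experts per MoE layer in the original model (per the model card)
KEPT_EXPERTS = 98    # experts per MoE layer in this 98e variant

def keep_top_experts(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring experts, sorted by index.

    `scores` is a hypothetical per-expert contribution score for one layer.
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of experts")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def pruned_fraction(total: int, kept: int) -> float:
    """Fraction of a layer's experts that were removed."""
    return (total - kept) / total

# Toy scores for one layer (made up): expert i simply gets score i.
toy_scores = [float(i) for i in range(TOTAL_EXPERTS)]
kept = keep_top_experts(toy_scores, KEPT_EXPERTS)
print(len(kept), TOTAL_EXPERTS - len(kept))                   # 98 30
print(f"{pruned_fraction(TOTAL_EXPERTS, KEPT_EXPERTS):.1%}")  # 23.4%
```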

Tags: gguf, imatrix, quantized, gemma4, moe, expert-pruning, base_model:ManniX-ITA/gemma-4-A4B-98e-v3-it, base_model:quantized:ManniX-ITA/gemma-4-A4B-98e-v3-it, license:other, endpoints_compatible, region:us, conversational
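The tag list mixes plain tags with namespaced `key:value` tags such as `license:other` and `region:us`. On Hugging Face, `base_model:quantized:<repo>` records that this repository is a quantization of the named base model. A minimal sketch for separating the two kinds, using a hypothetical `split_tags` helper:

```python
from typing import Dict, List, Tuple

def split_tags(tags: List[str]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Separate plain tags from namespaced 'key:value' tags.

    Only the first colon splits, so 'base_model:quantized:X'
    becomes key 'base_model' with value 'quantized:X'.
    """
    plain: List[str] = []
    namespaced: Dict[str, List[str]] = {}
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep:
            namespaced.setdefault(key, []).append(value)
        else:
            plain.append(tag)
    return plain, namespaced

TAGS = [
    "gguf", "imatrix", "quantized", "gemma4", "moe", "expert-pruning",
    "base_model:ManniX-ITA/gemma-4-A4B-98e-v3-it",
    "base_model:quantized:ManniX-ITA/gemma-4-A4B-98e-v3-it",
    "license:other", "endpoints_compatible", "region:us", "conversational",
]

plain, namespaced = split_tags(TAGS)
print(plain)
print(namespaced["base_model"])
```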
Downloads: 10,673
Likes: 4
Pipeline: (none)
Library: (none)
Visibility: Public
Access: Open

Repository Files & Downloads

29 GGUF files detected (the Hugging Face API reports 32 files in the repository)
Direct downloads for every GGUF file in the repository
| File | Type | Quantization | Size |
|---|---|---|---|
| gemma-4-A4B-98e-v3-it-CD-Q2_K.gguf | GGUF | CD-Q2_K | 7.97 GB |
| gemma-4-A4B-98e-v3-it-CD-Q3_K_M.gguf | GGUF | CD-Q3_K_M | 9.48 GB |
| gemma-4-A4B-98e-v3-it-CD-Q4_K_M.gguf | GGUF | CD-Q4_K_M | 10.12 GB |
| gemma-4-A4B-98e-v3-it-CD-Q5_K_M.gguf | GGUF | CD-Q5_K_M | 12.32 GB |
| gemma-4-A4B-98e-v3-it-CD-Q6_K.gguf | GGUF | CD-Q6_K | 14.39 GB |
| gemma-4-A4B-98e-v3-it-IQ2_M.gguf | GGUF | IQ2_M | 7.66 GB |
| gemma-4-A4B-98e-v3-it-IQ2_S.gguf | GGUF | IQ2_S | 7.29 GB |
| gemma-4-A4B-98e-v3-it-IQ2_XS.gguf | GGUF | IQ2_XS | 7.24 GB |
| gemma-4-A4B-98e-v3-it-IQ2_XXS.gguf | GGUF | IQ2_XXS | 6.86 GB |
| gemma-4-A4B-98e-v3-it-IQ3_M.gguf | GGUF | IQ3_M | 9.15 GB |
| gemma-4-A4B-98e-v3-it-IQ3_XS.gguf | GGUF | IQ3_XS | 8.58 GB |
| gemma-4-A4B-98e-v3-it-IQ3_XXS.gguf | GGUF | IQ3_XXS | 8.33 GB |
| gemma-4-A4B-98e-v3-it-IQ4_NL.gguf | GGUF | IQ4_NL | 10.63 GB |
| gemma-4-A4B-98e-v3-it-IQ4_XS.gguf | GGUF | IQ4_XS | 10.25 GB |
| gemma-4-A4B-98e-v3-it-Q3_K_L.gguf | GGUF | Q3_K_L | 10.19 GB |
| gemma-4-A4B-98e-v3-it-Q3_K_M.gguf | GGUF | Q3_K_M | 9.79 GB |
| gemma-4-A4B-98e-v3-it-Q3_K_S.gguf | GGUF | Q3_K_S | 9.01 GB |
| gemma-4-A4B-98e-v3-it-Q3_K_XL.gguf | GGUF | Q3_K_XL | 9.96 GB |
| gemma-4-A4B-98e-v3-it-Q4_0.gguf | GGUF | Q4_0 | 10.63 GB |
| gemma-4-A4B-98e-v3-it-Q4_1.gguf | GGUF | Q4_1 | 11.75 GB |
| gemma-4-A4B-98e-v3-it-Q4_K_L.gguf | GGUF | Q4_K_L | 12.50 GB |
| gemma-4-A4B-98e-v3-it-Q4_K_M.gguf | GGUF | Q4_K_M | 12.33 GB |
| gemma-4-A4B-98e-v3-it-Q4_K_S.gguf | GGUF | Q4_K_S | 11.37 GB |
| gemma-4-A4B-98e-v3-it-Q5_K_L.gguf | GGUF | Q5_K_L | 14.20 GB |
| gemma-4-A4B-98e-v3-it-Q5_K_M.gguf | GGUF | Q5_K_M | 14.04 GB |
| gemma-4-A4B-98e-v3-it-Q5_K_S.gguf | GGUF | Q5_K_S | 13.21 GB |
| gemma-4-A4B-98e-v3-it-Q6_K.gguf | GGUF | Q6_K | 16.58 GB |
| gemma-4-A4B-98e-v3-it-Q6_K_L.gguf | GGUF | Q6_K_L | 16.75 GB |
| gemma-4-A4B-98e-v3-it-Q8_0.gguf | GGUF | Q8_0 | 19.71 GB |
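One rough heuristic for choosing a file is to take the largest quant that still fits in available memory after leaving room for the KV cache and runtime buffers. The sketch below assumes a flat, hypothetical 2 GB overhead and uses a subset of the sizes from the table; the real overhead grows with context length (`-c`) and depends on the backend and on how many layers are offloaded (`-ngl`). `pick_largest_fitting` is a hypothetical helper.

```python
from typing import Dict, Optional

# File sizes in GB, copied from the file table (subset only).
QUANT_SIZES_GB: Dict[str, float] = {
    "Q8_0": 19.71, "Q6_K": 16.58, "Q5_K_M": 14.04, "Q4_K_M": 12.33,
    "CD-Q4_K_M": 10.12, "IQ4_XS": 10.25, "Q3_K_S": 9.01, "IQ3_XXS": 8.33,
    "IQ2_M": 7.66, "IQ2_XXS": 6.86,
}

def pick_largest_fitting(budget_gb: float, overhead_gb: float = 2.0) -> Optional[str]:
    """Return the largest quant whose file size plus a fixed overhead fits the budget.

    overhead_gb is an assumed allowance for KV cache and buffers, not a measured value.
    Returns None when nothing fits.
    """
    usable = budget_gb - overhead_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= usable}
    if not fitting:
        return None
    return max(fitting, key=lambda name: fitting[name])

print(pick_largest_fitting(16.0))  # Q4_K_M
print(pick_largest_fitting(8.0))   # None
```

File size is only a proxy for quality. The CD quants are smaller than the standard quant of the same name (CD-Q4_K_M is 10.12 GB versus 12.33 GB for Q4_K_M), presumably because only the layers judged important are kept at higher precision.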

Model Details

Model Slug: mannix-ita/gemma-4-a4b-98e-v3-it-gguf
Author: ManniX-ITA
Pipeline Task: (none)
Library: (none)
Created: 2026-04-12
Last Modified: 2026-04-13
Gated: No
Private: No
HF SHA: 489e467ffc0eaf5a7a2dd3cc1d10e6b211985a05
License: other (Gemma license, inherited from the base model)
Language: Unknown
Base Model: ManniX-ITA/gemma-4-A4B-98e-v3-it
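The HF SHA pins an exact repository revision. Hugging Face serves individual files at `https://huggingface.co/<repo_id>/resolve/<revision>/<filename>`, and a commit SHA is accepted as the revision, so a pinned download URL can be built as in this sketch. `resolve_url` is a hypothetical helper, and the repo id uses the casing returned by the API (`ManniX-ITA/gemma-4-A4B-98e-v3-it-GGUF`) rather than the lowercase slug.

```python
from urllib.parse import quote

REPO_ID = "ManniX-ITA/gemma-4-A4B-98e-v3-it-GGUF"
REVISION = "489e467ffc0eaf5a7a2dd3cc1d10e6b211985a05"  # the HF SHA listed above

def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a Hugging Face 'resolve' URL for one file at a given revision."""
    # Percent-encode the revision fully and the filename with '/' kept as a path separator.
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )

print(resolve_url(REPO_ID, "gemma-4-A4B-98e-v3-it-Q3_K_S.gguf", REVISION))
```

With the `huggingface_hub` package installed, `hf_hub_download(repo_id=REPO_ID, filename=..., revision=REVISION)` fetches the same pinned file into the local cache instead of returning a URL.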

Metadata Inspector

Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "base_model": "ManniX-ITA/gemma-4-A4B-98e-v3-it",
    "tags": [
      "gguf",
      "imatrix",
      "quantized",
      "gemma4",
      "moe",
      "expert-pruning"
    ],
    "license": "other",
    "license_name": "gemma",
    "frontmatter": {
      "base_model": "ManniX-ITA/gemma-4-A4B-98e-v3-it",
      "tags": [
        "gguf",
        "imatrix",
        "quantized",
        "gemma4",
        "moe",
        "expert-pruning"
      ],
      "license": "other",
      "license_name": "gemma"
    },
    "hero_image_url": "",
    "summary": "GGUF quantizations of ManniX-ITA/gemma-4-A4B-98e-v3-it — Gemma 4 26B pruned to 98 experts per layer (from 128). **Zero GPQA degradation despite dropping 30 experts per layer (23.4% of MoE capacity).** All quants made using imatrix with calibration data v5. Includes ContribDynamic (CD) quants with per-layer dynamic quantization based on expert contribution analysis.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nbase_model: ManniX-ITA/gemma-4-A4B-98e-v3-it\ntags:\n  - gguf\n  - imatrix\n  - quantized\n  - gemma4\n  - moe\n  - expert-pruning\nlicense: other\nlicense_name: gemma\n---\n\n# gemma-4-A4B-98e-v3-it-GGUF\n\nGGUF quantizations of [ManniX-ITA/gemma-4-A4B-98e-v3-it](https://huggingface.co/ManniX-ITA/gemma-4-A4B-98e-v3-it) — Gemma 4 26B pruned to 98 experts per layer (from 128).\n\n**Zero GPQA degradation despite dropping 30 experts per layer (23.4% of MoE capacity).**\n\nAll quants made using imatrix with [calibration data v5](https://gist.github.com/bartowski1182/82ae9b520227f57d79ba04add13d0d0d). Includes ContribDynamic (CD) quants with per-layer dynamic quantization based on expert contribution analysis.\n\n## GPQA Diamond (198 questions, Q6_K, full CoT reasoning)\n\n| Model | Experts/Layer | GPQA Diamond (flex) | Delta |\n|---|---|---|---|\n| Gemma 4 26B-A4B-it (original) | 128 | 75.25% | baseline |\n| [109e v3](https://huggingface.co/ManniX-ITA/gemma-4-A4B-109e-v3-it-GGUF) | 109 | 71.72% | -3.53 pp |\n| **98e v3 (this model)** | **98** | **75.25%** | **+0.00 pp** |\n\n## Available Quantizations\n\n### Standard Quants\n\n| Quantization | File | Size |\n|---|---|---|\n| Q8_0 | gemma-4-A4B-98e-v3-it-Q8_0.gguf | 19.71 GB |\n| Q6_K_L | gemma-4-A4B-98e-v3-it-Q6_K_L.gguf | 16.75 GB |\n| Q6_K | gemma-4-A4B-98e-v3-it-Q6_K.gguf | 16.58 GB |\n| Q5_K_L | gemma-4-A4B-98e-v3-it-Q5_K_L.gguf | 14.20 GB |\n| Q5_K_M | gemma-4-A4B-98e-v3-it-Q5_K_M.gguf | 14.04 GB |\n| Q5_K_S | gemma-4-A4B-98e-v3-it-Q5_K_S.gguf | 13.21 GB |\n| Q4_K_L | gemma-4-A4B-98e-v3-it-Q4_K_L.gguf | 12.50 GB |\n| Q4_K_M | gemma-4-A4B-98e-v3-it-Q4_K_M.gguf | 12.33 GB |\n| Q4_1 | gemma-4-A4B-98e-v3-it-Q4_1.gguf | 11.75 GB |\n| Q4_K_S | gemma-4-A4B-98e-v3-it-Q4_K_S.gguf | 11.37 GB |\n| Q4_0 | gemma-4-A4B-98e-v3-it-Q4_0.gguf | 10.63 GB |\n| IQ4_NL | gemma-4-A4B-98e-v3-it-IQ4_NL.gguf | 10.63 GB |\n| IQ4_XS | gemma-4-A4B-98e-v3-it-IQ4_XS.gguf | 10.25 GB |\n| Q3_K_XL | gemma-4-A4B-98e-v3-it-Q3_K_XL.gguf | 9.96 GB |\n| Q3_K_L | gemma-4-A4B-98e-v3-it-Q3_K_L.gguf | 10.19 GB |\n| Q3_K_M | gemma-4-A4B-98e-v3-it-Q3_K_M.gguf | 9.79 GB |\n| IQ3_M | gemma-4-A4B-98e-v3-it-IQ3_M.gguf | 9.15 GB |\n| Q3_K_S | gemma-4-A4B-98e-v3-it-Q3_K_S.gguf | 9.01 GB |\n| IQ3_XS | gemma-4-A4B-98e-v3-it-IQ3_XS.gguf | 8.58 GB |\n| IQ3_XXS | gemma-4-A4B-98e-v3-it-IQ3_XXS.gguf | 8.33 GB |\n| IQ2_M | gemma-4-A4B-98e-v3-it-IQ2_M.gguf | 7.66 GB |\n| IQ2_S | gemma-4-A4B-98e-v3-it-IQ2_S.gguf | 7.29 GB |\n| IQ2_XS | gemma-4-A4B-98e-v3-it-IQ2_XS.gguf | 7.24 GB |\n| IQ2_XXS | gemma-4-A4B-98e-v3-it-IQ2_XXS.gguf | 6.86 GB |\n\n### ContribDynamic (CD) Quants\n\nPer-layer dynamic quantization based on expert contribution analysis. Important layers get higher precision.\n\n| Quantization | File | Size |\n|---|---|---|\n| CD-Q6_K | gemma-4-A4B-98e-v3-it-CD-Q6_K.gguf | 14.39 GB |\n| CD-Q5_K_M | gemma-4-A4B-98e-v3-it-CD-Q5_K_M.gguf | 12.32 GB |\n| CD-Q4_K_M | gemma-4-A4B-98e-v3-it-CD-Q4_K_M.gguf | 10.12 GB |\n| CD-Q3_K_M | gemma-4-A4B-98e-v3-it-CD-Q3_K_M.gguf | 9.48 GB |\n| CD-Q2_K | gemma-4-A4B-98e-v3-it-CD-Q2_K.gguf | 7.97 GB |\n\n### Skipped Quantizations (failed sanity check)\n\nThe following quantizations were attempted but **failed the sanity check** (3 capital city questions answered incorrectly or incoherently). These quants are intentionally not published:\n\n- Q2_K_L, Q2_K\n\n## Recommended Usage\n\n```bash\nllama-server -m gemma-4-A4B-98e-v3-it-Q4_K_M.gguf -c 32768 -ngl 99 --no-warmup \\\n    --reasoning-format deepseek --reasoning-budget 8192\n```\n\n## Method\n\nExpert dropping via per-layer contribution analysis using teacher-force importance mapping (fp32 accumulation). Script: `expert_drop.py` in the [model repo](https://huggingface.co/ManniX-ITA/gemma-4-A4B-98e-v3-it).\n\n## License\n\nGemma license, inherited from the base model.\n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "imatrix",
    "quantized",
    "gemma4",
    "moe",
    "expert-pruning",
    "base_model:ManniX-ITA/gemma-4-A4B-98e-v3-it",
    "base_model:quantized:ManniX-ITA/gemma-4-A4B-98e-v3-it",
    "license:other",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 4,
  "downloads": 10673,
  "gated": false,
  "private": false,
  "last_modified": "2026-04-13T06:03:07.000Z",
  "created_at": "2026-04-12T21:54:37.000Z",
  "pipeline_tag": "",
  "library_name": ""
}
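The README stored in `readme_markdown` describes the ContribDynamic quants only as per-layer dynamic quantization in which important layers get higher precision. The sketch below shows one generic way to turn per-layer importance scores into a quantization plan: rank the layers and give the top fraction a higher-precision type. The scores, the 25% boost fraction, the Q3_K/Q5_K pairing, and the `assign_layer_types` helper are all hypothetical, not the settings behind the published CD files.

```python
from typing import Dict, Sequence

def assign_layer_types(importance: Sequence[float], base: str, boosted: str,
                       boost_fraction: float = 0.25) -> Dict[int, str]:
    """Map each layer index to a quant type, boosting the most important layers.

    importance[i] is an assumed per-layer contribution score (higher = more important).
    """
    n_boost = round(len(importance) * boost_fraction)
    # Rank layers from most to least important and boost the first n_boost.
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    boosted_layers = set(ranked[:n_boost])
    return {i: (boosted if i in boosted_layers else base) for i in range(len(importance))}

plan = assign_layer_types([0.9, 0.1, 0.5, 0.3], base="Q3_K", boosted="Q5_K")
print(plan)  # {0: 'Q5_K', 1: 'Q3_K', 2: 'Q3_K', 3: 'Q3_K'}
```

A fixed fraction keeps the resulting file size predictable; a threshold on the score itself would adapt to how concentrated the importance is, at the cost of a less predictable size.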
Source payload excerpt (from Hugging Face API)
{
  "_id": "69dc149de7d734b4eee3aadf",
  "id": "ManniX-ITA/gemma-4-A4B-98e-v3-it-GGUF",
  "modelId": "ManniX-ITA/gemma-4-A4B-98e-v3-it-GGUF",
  "sha": "489e467ffc0eaf5a7a2dd3cc1d10e6b211985a05",
  "createdAt": "2026-04-12T21:54:37.000Z",
  "lastModified": "2026-04-13T06:03:07.000Z",
  "author": "ManniX-ITA",
  "downloads": 10673,
  "likes": 4,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 32
}
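The payload excerpt corresponds to the public model-info endpoint at `https://huggingface.co/api/models/<repo_id>`. That endpoint returns a `siblings` list of repository files; `siblings_count` appears to be a count derived by the indexer. Comparing it with the 29 GGUF rows in the file table shows that 3 repository files are not GGUF quants (presumably including the README). The payload below is a trimmed copy, and `model_api_url` is a hypothetical helper.

```python
import json

API_BASE = "https://huggingface.co/api/models"

def model_api_url(repo_id: str) -> str:
    """URL of the public Hugging Face model-info endpoint for repo_id."""
    return f"{API_BASE}/{repo_id}"

# Trimmed copy of the payload excerpt above.
payload = json.loads("""{
  "id": "ManniX-ITA/gemma-4-A4B-98e-v3-it-GGUF",
  "sha": "489e467ffc0eaf5a7a2dd3cc1d10e6b211985a05",
  "downloads": 10673,
  "siblings_count": 32
}""")

GGUF_FILES_LISTED = 29  # rows in the file table
non_gguf = payload["siblings_count"] - GGUF_FILES_LISTED
print(model_api_url(payload["id"]))
print(non_gguf)  # 3
```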