djlougen/nemotron-h-120b-reap-50pct-gguf is indexed on GraySoft with repository links, GGUF quant files (including Q4_K_M), and Hugging Face metadata. Use this page to decide whether the model suits guIDE or another local runtime. Related models in the same shard are listed below.

Model Intelligence Sheet

djlougen/nemotron-h-120b-reap-50pct-gguf overview

GGUF quantizations of the REAP 50%-pruned Nemotron-H 120B model for use with llama.cpp and compatible tools.

llama.cpp, gguf, nemotron_h, reap, pruned, sparse-moe, mamba, quantized, text-generation, arxiv:2510.13999, base_model:0xSero/NVIDIA-Nemotron-3-Super-120B-A12B-BF16-REAP-50pct-draft, base_model:quantized:0xSero/NVIDIA-Nemotron-3-Super-120B-A12B-BF16-REAP-50pct-draft, license:other, endpoints_compatible, region:us, conversational

Downloads

991

Likes

0

Pipeline

text-generation

Library

llama.cpp

Visibility

Public

Access

Open

Repository Files & Downloads

3 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
Nemotron-H-120B-REAP-50pct-BF16.gguf	GGUF	BF16	119.78 GB	Download
Nemotron-H-120B-REAP-50pct-Q4_K_M.gguf	GGUF	Q4_K_M	42.25 GB	Download
Nemotron-H-120B-REAP-50pct-Q8_0.gguf	GGUF	Q8_0	63.71 GB	Download
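
The sizes in this table differ from the README's quantization table (128.6 GB, 68.4 GB, 45.4 GB for the same three files). One plausible explanation, not confirmed by either source, is that the table above reports binary gigabytes (GiB) under a GB label. The sketch below checks that assumption and then picks the largest listed file that fits a memory budget. The helper names and the `headroom_gib` default are hypothetical; actual memory use also depends on context length and runtime overhead.

```python
# Sizes as listed in the file table above (labelled "GB" on the page).
LISTED_SIZES = {
    "BF16": 119.78,
    "Q8_0": 63.71,
    "Q4_K_M": 42.25,
}

GIB = 2**30  # bytes in a binary gigabyte
GB = 10**9   # bytes in a decimal gigabyte


def gib_to_gb(size_gib: float) -> float:
    """Convert binary gigabytes (GiB) to decimal gigabytes (GB)."""
    return size_gib * GIB / GB


def largest_fit(budget_gib: float, headroom_gib: float = 4.0):
    """Return the largest listed quant whose file fits in the budget.

    headroom_gib is an assumed allowance for KV cache and runtime
    buffers; it is a placeholder, not a measured value.
    """
    usable = budget_gib - headroom_gib
    fitting = {q: s for q, s in LISTED_SIZES.items() if s <= usable}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for quant, size in LISTED_SIZES.items():
        print(f"{quant}: {size} listed -> {gib_to_gb(size):.1f} decimal GB")
    print(largest_fit(64.0))
```

Under the GiB assumption the converted values round to 128.6, 68.4, and 45.4, which matches the README figures. With a 64 GiB budget and the assumed 4 GiB of headroom, only the Q4_K_M file fits.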

Model Details Live

Model Slug

djlougen/nemotron-h-120b-reap-50pct-gguf

Author

DJLougen

Pipeline Task

text-generation

Library

llama.cpp

Created

2026-03-28

Last Modified

2026-03-28

Gated

No

Private

No

HF SHA

a7c5011b4c39c714c6de3904649ee2605e955535

License

other

Language

Unknown

Base Model

0xSero/NVIDIA-Nemotron-3-Super-120B-A12B-BF16-REAP-50pct-draft, nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "license": "other",
    "license_name": "nvidia-open-model-license",
    "base_model": [
      "0xSero/NVIDIA-Nemotron-3-Super-120B-A12B-BF16-REAP-50pct-draft",
      "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
    ],
    "library_name": "llama.cpp",
    "pipeline_tag": "text-generation",
    "tags": [
      "nemotron_h",
      "reap",
      "pruned",
      "gguf",
      "sparse-moe",
      "mamba",
      "quantized"
    ],
    "frontmatter": {
      "license": "other",
      "license_name": "nvidia-open-model-license",
      "base_model": [
        "0xSero/NVIDIA-Nemotron-3-Super-120B-A12B-BF16-REAP-50pct-draft",
        "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
      ],
      "library_name": "llama.cpp",
      "pipeline_tag": "text-generation",
      "tags": [
        "nemotron_h",
        "reap",
        "pruned",
        "gguf",
        "sparse-moe",
        "mamba",
        "quantized"
      ]
    },
    "hero_image_url": "",
    "summary": "GGUF quantizations of the REAP 50%-pruned Nemotron-H 120B model for use with llama.cpp and compatible tools.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlicense: other\nlicense_name: nvidia-open-model-license\nbase_model:\n- 0xSero/NVIDIA-Nemotron-3-Super-120B-A12B-BF16-REAP-50pct-draft\n- nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16\nlibrary_name: llama.cpp\npipeline_tag: text-generation\ntags:\n- nemotron_h\n- reap\n- pruned\n- gguf\n- sparse-moe\n- mamba\n- quantized\n---\n\n> **Credit:** This is a GGUF quantization of [0xSero/NVIDIA-Nemotron-3-Super-120B-A12B-BF16-REAP-50pct-draft](https://huggingface.co/0xSero/NVIDIA-Nemotron-3-Super-120B-A12B-BF16-REAP-50pct-draft), a REAP expert-pruned checkpoint derived from [nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16). All credit for the base model goes to NVIDIA, and the REAP pruning work to [0xSero](https://huggingface.co/0xSero).\n\n# NVIDIA Nemotron-H 120B REAP 50% — GGUF\n\nGGUF quantizations of the REAP 50%-pruned Nemotron-H 120B model for use with [llama.cpp](https://github.com/ggml-org/llama.cpp) and compatible tools.\n\n## Available Quantizations\n\n| File | Quant | Size | BPW |\n|------|-------|------|-----|\n| `Nemotron-H-120B-REAP-50pct-BF16.gguf` | BF16 | 128.6 GB | 16.01 |\n| `Nemotron-H-120B-REAP-50pct-Q8_0.gguf` | Q8_0 | 68.4 GB | 8.52 |\n| `Nemotron-H-120B-REAP-50pct-Q6_K.gguf` | Q6_K | 59.7 GB | 7.43 |\n| `Nemotron-H-120B-REAP-50pct-Q4_K_M.gguf` | Q4_K_M | 45.4 GB | 5.65 |\n\n## Model Details\n\n| Property | Value |\n|----------|-------|\n| Architecture | NemotronH (hybrid Mamba + MoE + Attention) |\n| Total Blocks | 88 (40 Mamba, 40 MoE, 8 Attention) |\n| Original Parameters | ~120B (64B after 50% expert pruning) |\n| Experts per MoE Layer | 256 (pruned from 512) |\n| Routed Experts per Token | 22 |\n| Context Length | 262,144 tokens |\n| Vocab Size | 131,072 |\n\n## Usage\n\n```bash\n# With llama.cpp\nllama-cli -m Nemotron-H-120B-REAP-50pct-Q4_K_M.gguf -p \"Hello\" -n 128\n\n# With ollama (create a Modelfile first)\nollama create nemotron-h-reap -f Modelfile\n```\n\n## About REAP Pruning\n\nThis model was pruned using the REAP method ([arXiv:2510.13999](https://arxiv.org/abs/2510.13999)), which selectively removes 50% of MoE experts based on layerwise activation observations. This reduces memory footprint while preserving quality for the most commonly activated expert pathways.\n\n- Source: [0xSero/reap-expert-swap](https://github.com/0xSero/reap-expert-swap)\n- Research funding: [donate.sybilsolutions.ai](https://donate.sybilsolutions.ai)\n\n## Draft Caveats\n\nThis is a **draft** derived checkpoint from the original author. Full serving benchmarks and quality evaluations have not been completed. Evaluate accordingly.\n\n## License\n\nDistributed under the NVIDIA Open Model License. See the [original model](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16) for full terms.\n\n## Quantized by\n\n[DJLougen](https://huggingface.co/DJLougen) using llama.cpp on DGX Spark\n",
    "related_quantizations": []
  },
  "tags": [
    "llama.cpp",
    "gguf",
    "nemotron_h",
    "reap",
    "pruned",
    "sparse-moe",
    "mamba",
    "quantized",
    "text-generation",
    "arxiv:2510.13999",
    "base_model:0xSero/NVIDIA-Nemotron-3-Super-120B-A12B-BF16-REAP-50pct-draft",
    "base_model:quantized:0xSero/NVIDIA-Nemotron-3-Super-120B-A12B-BF16-REAP-50pct-draft",
    "license:other",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 0,
  "downloads": 991,
  "gated": false,
  "private": false,
  "last_modified": "2026-03-28T16:07:41.000Z",
  "created_at": "2026-03-28T14:08:28.000Z",
  "pipeline_tag": "text-generation",
  "library_name": "llama.cpp"
}
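
The README embedded in this record lists a size and a bits-per-weight (BPW) figure for each quantization and states roughly 64B parameters after pruning. As a rough consistency check, file size in bits divided by BPW should imply about the same parameter count for every row. The sketch below does that arithmetic. It assumes the README sizes are decimal gigabytes and ignores GGUF header and metadata overhead; the function name is illustrative.

```python
# (quant, size in GB, bits per weight), copied from the README table.
README_ROWS = [
    ("BF16", 128.6, 16.01),
    ("Q8_0", 68.4, 8.52),
    ("Q6_K", 59.7, 7.43),
    ("Q4_K_M", 45.4, 5.65),
]


def implied_params_billion(size_gb: float, bpw: float) -> float:
    """Parameter count (in billions) implied by file size and BPW."""
    total_bits = size_gb * 1e9 * 8  # assumes decimal GB
    return total_bits / bpw / 1e9


if __name__ == "__main__":
    for quant, size_gb, bpw in README_ROWS:
        params = implied_params_billion(size_gb, bpw)
        print(f"{quant}: ~{params:.1f}B parameters")
```

Each row comes out between 64.2 and 64.3 billion, in line with the README's "64B after 50% expert pruning", so the size and BPW columns are at least internally consistent.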

Source payload excerpt (from Hugging Face API)

{
  "_id": "69c7e0dc17242a3b23740340",
  "id": "DJLougen/Nemotron-H-120B-REAP-50pct-GGUF",
  "modelId": "DJLougen/Nemotron-H-120B-REAP-50pct-GGUF",
  "sha": "a7c5011b4c39c714c6de3904649ee2605e955535",
  "createdAt": "2026-03-28T14:08:28.000Z",
  "lastModified": "2026-03-28T16:07:41.000Z",
  "author": "DJLougen",
  "downloads": 991,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "llama.cpp",
  "siblings_count": 5
}
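
A payload like the excerpt above can be retrieved from the Hugging Face model API and inspected with the standard library. The sketch below is one minimal way to do that, not the method this page uses: the helper names are hypothetical, the network call is left unexecuted, and the sample dictionary copies only the two timestamp fields from the excerpt.

```python
import json
from datetime import datetime
from urllib.request import urlopen

REPO_ID = "DJLougen/Nemotron-H-120B-REAP-50pct-GGUF"

# Two fields copied from the payload excerpt above.
SAMPLE = {
    "createdAt": "2026-03-28T14:08:28.000Z",
    "lastModified": "2026-03-28T16:07:41.000Z",
}


def api_url(repo_id: str) -> str:
    """Build the public model-info URL for a repository id."""
    return f"https://huggingface.co/api/models/{repo_id}"


def fetch_model_info(repo_id: str) -> dict:
    """Fetch live metadata (requires network access)."""
    with urlopen(api_url(repo_id), timeout=30) as resp:
        return json.load(resp)


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z'."""
    # fromisoformat() accepts a bare "Z" only on Python 3.11+.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def seconds_between(payload: dict) -> float:
    """Seconds from repository creation to last modification."""
    created = parse_ts(payload["createdAt"])
    modified = parse_ts(payload["lastModified"])
    return (modified - created).total_seconds()


if __name__ == "__main__":
    print(api_url(REPO_ID))
    print(seconds_between(SAMPLE))
```

For the timestamps in the excerpt, the gap between creation and last modification is just under two hours, which is why the Created and Last Modified dates above are the same day.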

More models in this shard