beinsezii/nemotron-3-super-120b-a12b-gguf-halo (q6k_ffn GGUF) is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. Use this page to choose a local model for guIDE or other runtimes.

Model Intelligence Sheet

beinsezii/nemotron-3-super-120b-a12b-gguf-halo overview

Quant optimized for quality and speed on a Strix Halo 128GiB system. Possibly also beneficial on DGX Spark and similar systems. TL;DR: this quant achieves both higher quality and higher speed than homogeneous Q6_K. See the GLM version for more details on theory and comparisons.

Tags: gguf, base_model:nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16, base_model:quantized:nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16, endpoints_compatible, region:us, imatrix, conversational

Downloads

290

Likes

3

Pipeline

—

Library

—

Visibility

Public

Access

Open

Repository Files & Downloads

2 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
imatrix.gguf	GGUF	—	295.44 MB	Download
nemotron-3-120b-a12b-q80-q6k_ffn.gguf	GGUF	—	91.45 GB	Download
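
The "Download" links in the table were not preserved as URLs. As a minimal sketch, the two files can be addressed through the usual Hugging Face `resolve` URL pattern. The `main` revision is an assumption; the commit SHA listed on this page could be passed instead. The helper name `resolve_url` is hypothetical and not part of any library.

```python
from urllib.parse import quote

# Repository id as reported in the Hugging Face API payload on this page.
REPO_ID = "Beinsezii/Nemotron-3-Super-120B-A12B-GGUF-HALO"

# File names as listed in the table above.
FILES = ["imatrix.gguf", "nemotron-3-120b-a12b-q80-q6k_ffn.gguf"]


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for one repository file.

    Assumption: the file lives at the repository root and the
    revision (branch name or commit SHA) exists.
    """
    return (
        "https://huggingface.co/"
        f"{quote(repo_id)}/resolve/{quote(revision, safe='')}/{quote(filename)}"
    )


if __name__ == "__main__":
    for name in FILES:
        print(resolve_url(REPO_ID, name))
```

The main model file is listed at 91.45 GB, so a resumable downloader is advisable over a plain browser download.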

Model Details

Model Slug

beinsezii/nemotron-3-super-120b-a12b-gguf-halo

Author

Beinsezii

Pipeline Task

—

Library

—

Created

2026-03-12

Last Modified

2026-03-12

Gated

No

Private

No

HF SHA

e8ca47998de42310967dc7ff47c186993cd79733

License

Unknown

Language

Unknown

Base Model

nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "base_model": [
      "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
    ],
    "frontmatter": {
      "base_model": [
        "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
      ]
    },
    "hero_image_url": "",
    "summary": "Quant optimized for quality / speed on a Strix Halo 128GiB system. Possibly also beneficial on DGX Spark and similar systems. The TL;DR is this quant achieves both superior quality and speed compared to homogenous Q6_K. See the GLM version for more details on theory and comparisons.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nbase_model:\n- nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16\n---\n\nQuant optimized for quality / speed on a Strix Halo 128GiB system. Possibly also beneficial on DGX Spark and similar systems.\n\nThe TL;DR is this quant achieves both superior quality and speed compared to homogenous Q6_K.\n\nSee the [GLM version](https://huggingface.co/Beinsezii/GLM-4.6V-GGUF-HALO) for more details on theory and comparisons.\n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "base_model:nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16",
    "base_model:quantized:nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16",
    "endpoints_compatible",
    "region:us",
    "imatrix",
    "conversational"
  ],
  "likes": 3,
  "downloads": 290,
  "gated": false,
  "private": false,
  "last_modified": "2026-03-12T07:04:14.000Z",
  "created_at": "2026-03-12T06:33:41.000Z",
  "pipeline_tag": "",
  "library_name": ""
}
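
The `tags` array mixes plain labels (`gguf`, `imatrix`) with `key:value` entries such as `base_model:quantized:...`. A small sketch of how a consumer might separate the two follows. The function name `split_tags` and the choice to split on the last colon are assumptions made for illustration, not a documented Hugging Face convention.

```python
from collections import defaultdict

# Tags copied from the normalized metadata above.
TAGS = [
    "gguf",
    "base_model:nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16",
    "base_model:quantized:nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16",
    "endpoints_compatible",
    "region:us",
    "imatrix",
    "conversational",
]


def split_tags(tags):
    """Separate plain labels from key:value tags.

    Splitting on the LAST colon keeps 'base_model:quantized' together
    as a key, with the upstream model id as its value.
    """
    plain = []
    keyed = defaultdict(list)
    for tag in tags:
        key, sep, value = tag.rpartition(":")
        if sep:
            keyed[key].append(value)
        else:
            plain.append(tag)
    return plain, dict(keyed)


if __name__ == "__main__":
    plain, keyed = split_tags(TAGS)
    print("labels:", plain)
    print("quantized from:", keyed.get("base_model:quantized"))
```

Read this way, the `base_model:quantized` key identifies the BF16 checkpoint this GGUF was quantized from, which matches the "Base Model" field above.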

Source payload excerpt (from Hugging Face API)

{
  "_id": "69b25e45895b2d5026f80142",
  "id": "Beinsezii/Nemotron-3-Super-120B-A12B-GGUF-HALO",
  "modelId": "Beinsezii/Nemotron-3-Super-120B-A12B-GGUF-HALO",
  "sha": "e8ca47998de42310967dc7ff47c186993cd79733",
  "createdAt": "2026-03-12T06:33:41.000Z",
  "lastModified": "2026-03-12T07:04:14.000Z",
  "author": "Beinsezii",
  "downloads": 290,
  "likes": 3,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 5
}
