GraySoft

GraySoft indexes beinsezii/nemotron-3-super-120b-a12b-gguf-halo (q6k_ffn GGUF) with repository links, GGUF quant files, and Hugging Face metadata. Use this page to pick a local model for guIDE or another runtime. Related models in the same shard are listed below.

Model Intelligence Sheet

beinsezii/nemotron-3-super-120b-a12b-gguf-halo overview

Quant optimized for the quality/speed trade-off on a Strix Halo 128 GiB system. It may also be beneficial on DGX Spark and similar systems. The TL;DR: this quant achieves both higher quality and higher speed than homogeneous Q6_K. See the GLM version (https://huggingface.co/Beinsezii/GLM-4.6V-GGUF-HALO) for more details on theory and comparisons.

Tags: gguf, base_model:nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16, base_model:quantized:nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16, endpoints_compatible, region:us, imatrix, conversational
Downloads: 290
Likes: 3
Pipeline: (not set)
Library: (not set)
Visibility: Public
Access: Open

Repository Files & Downloads

2 files detected
Direct downloads for all repository files
File | Type | Quantization | Size
imatrix.gguf | GGUF | (not listed) | 295.44 MB
nemotron-3-120b-a12b-q80-q6k_ffn.gguf | GGUF | (not listed) | 91.45 GB
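
The listed file size can be turned into a rough average bits-per-weight figure and a memory-fit check for a 128 GiB machine. The sketch below is only an estimate under two assumptions that this page does not state: that "GB" means 10^9 bytes, and that the model holds a nominal 120e9 weights, as the "120B" in its name suggests. The helper names are hypothetical.

```python
# Rough size arithmetic for the listed GGUF file.
# Assumptions (not stated on this page): "GB" means 10**9 bytes, and the
# model has a nominal 120e9 weights, as the "120B" in its name suggests.

GB = 10**9
GIB = 2**30


def bits_per_weight(file_bytes: float, n_weights: float) -> float:
    """Average bits stored per weight, ignoring GGUF metadata overhead."""
    return file_bytes * 8 / n_weights


def size_in_gib(file_bytes: float) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return file_bytes / GIB


file_bytes = 91.45 * GB                  # size listed for the main GGUF file
bpw = bits_per_weight(file_bytes, 120e9)
gib = size_in_gib(file_bytes)
headroom_gib = 128 - gib                 # before KV cache, OS, and runtime overhead

print(f"{bpw:.2f} bits/weight, {gib:.1f} GiB on disk, {headroom_gib:.1f} GiB nominal headroom")
```

The headroom figure is an upper bound: context (KV cache), the operating system, and any memory reserved away from the GPU all reduce what is actually available.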

Model Details

Model Slug: beinsezii/nemotron-3-super-120b-a12b-gguf-halo
Author: Beinsezii
Pipeline Task: (not set)
Library: (not set)
Created: 2026-03-12
Last Modified: 2026-03-12
Gated: No
Private: No
HF SHA: e8ca47998de42310967dc7ff47c186993cd79733
License: Unknown
Language: Unknown
Base Model: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
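
Because the page records the commit SHA, a download can be pinned to that exact revision so that later changes to the repository do not alter what is fetched. The sketch below builds such a URL using the Hub's usual `/resolve/<revision>/<filename>` pattern; the `resolve_url` helper is hypothetical, the repository id is taken from the API payload further down this page, and the URL shape should be treated as an assumption to verify against the repository itself.

```python
from urllib.parse import quote


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a Hugging Face 'resolve' URL for one file in a model repo.

    Hypothetical helper: it only formats a string and makes no network call.
    """
    return (
        "https://huggingface.co/"
        f"{quote(repo_id)}/resolve/{quote(revision, safe='')}/{quote(filename)}"
    )


# Values copied from this page; the SHA pins the download to one commit.
REPO_ID = "Beinsezii/Nemotron-3-Super-120B-A12B-GGUF-HALO"
SHA = "e8ca47998de42310967dc7ff47c186993cd79733"

url = resolve_url(REPO_ID, "nemotron-3-120b-a12b-q80-q6k_ffn.gguf", revision=SHA)
print(url)
```

The resulting URL can be handed to any downloader that supports resuming, which matters for a file of roughly 91 GB.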

Metadata Inspector

Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "base_model": [
      "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
    ],
    "frontmatter": {
      "base_model": [
        "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
      ]
    },
    "hero_image_url": "",
    "summary": "Quant optimized for quality / speed on a Strix Halo 128GiB system. Possibly also beneficial on DGX Spark and similar systems. The TL;DR is this quant achieves both superior quality and speed compared to homogenous Q6_K. See the GLM version for more details on theory and comparisons.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nbase_model:\n- nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16\n---\n\nQuant optimized for quality / speed on a Strix Halo 128GiB system. Possibly also beneficial on DGX Spark and similar systems.\n\nThe TL;DR is this quant achieves both superior quality and speed compared to homogenous Q6_K.\n\nSee the [GLM version](https://huggingface.co/Beinsezii/GLM-4.6V-GGUF-HALO) for more details on theory and comparisons.\n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "base_model:nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16",
    "base_model:quantized:nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16",
    "endpoints_compatible",
    "region:us",
    "imatrix",
    "conversational"
  ],
  "likes": 3,
  "downloads": 290,
  "gated": false,
  "private": false,
  "last_modified": "2026-03-12T07:04:14.000Z",
  "created_at": "2026-03-12T06:33:41.000Z",
  "pipeline_tag": "",
  "library_name": ""
}
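
The `tags` array above mixes plain labels (such as `gguf`) with `key:value` entries (such as `region:us`). As a minimal sketch of how a client might separate the two, the hypothetical `split_tags` helper below splits each tag at its first colon only, so `base_model:quantized:...` keeps `quantized:` as part of its value. This is an illustration, not the way GraySoft or Hugging Face necessarily process tags.

```python
def split_tags(tags):
    """Separate plain tags from 'key:value' tags.

    Only the first colon splits key from value; repeated keys collect
    their values into a list, in the order they appear.
    """
    plain, keyed = [], {}
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep:
            keyed.setdefault(key, []).append(value)
        else:
            plain.append(tag)
    return plain, keyed


# Tags copied from the normalized metadata above.
tags = [
    "gguf",
    "base_model:nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16",
    "base_model:quantized:nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16",
    "endpoints_compatible",
    "region:us",
    "imatrix",
    "conversational",
]

plain, keyed = split_tags(tags)
print(plain)
print(keyed)
```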
Source payload excerpt (from Hugging Face API)
{
  "_id": "69b25e45895b2d5026f80142",
  "id": "Beinsezii/Nemotron-3-Super-120B-A12B-GGUF-HALO",
  "modelId": "Beinsezii/Nemotron-3-Super-120B-A12B-GGUF-HALO",
  "sha": "e8ca47998de42310967dc7ff47c186993cd79733",
  "createdAt": "2026-03-12T06:33:41.000Z",
  "lastModified": "2026-03-12T07:04:14.000Z",
  "author": "Beinsezii",
  "downloads": 290,
  "likes": 3,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 5
}
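
The `createdAt` and `lastModified` timestamps in the payload can be compared to see how long the repository was edited after creation. The sketch below parses the two ISO 8601 strings with the standard library; the `parse_utc` helper is hypothetical, and the trailing `Z` is rewritten to `+00:00` because older Python versions do not accept `Z` in `datetime.fromisoformat`.

```python
from datetime import datetime


def parse_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 timestamp ending in 'Z' as a timezone-aware datetime."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


# Timestamps copied from the payload excerpt above.
created = parse_utc("2026-03-12T06:33:41.000Z")
modified = parse_utc("2026-03-12T07:04:14.000Z")

elapsed = modified - created
print(f"Last modified {elapsed.total_seconds():.0f} seconds after creation")
```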