GraySoft
Model Intelligence Sheet

avar6/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-gguf overview


Tags: gguf · optimized · mixed-gguf · base_model:nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16 · base_model:quantized:nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16 · endpoints_compatible · region:us · imatrix

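Requires all six IQ4_XS shards, about 258.61 GB on disk in total. The 31.71 GB figure is only the size of the last shard. For practical inference speed with llama.cpp, the weights need roughly that much combined VRAM and system RAM, so a single 32 GB GPU is not enough on its own.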
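The disk requirement follows from summing the six IQ4_XS shard sizes listed in the file table. Here is a minimal sketch of that arithmetic. The sizes are copied from this page, and the unit is whatever "GB" means on the page. The page does not say whether that is decimal gigabytes or GiB.

```python
# Shard sizes in GB, copied from the file table on this page
# (IQ4_XS, parts 00001 through 00006).
SHARD_SIZES_GB = [44.83, 45.59, 45.50, 45.59, 45.39, 31.71]


def total_size_gb(sizes):
    """Return the combined size of all shards, rounded to 2 decimals."""
    return round(sum(sizes), 2)


total = total_size_gb(SHARD_SIZES_GB)
smallest = min(SHARD_SIZES_GB)

print(f"total: {total} GB across {len(SHARD_SIZES_GB)} shards")
print(f"smallest shard: {smallest} GB (not the size of the model)")
```

The output shows that the 31.71 GB figure is just the smallest (final) shard. The whole model is about eight times larger.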

Downloads: 655 · Likes: 0

Repository Files & Downloads

6 GGUF files detected
Direct downloads for local inference
| File | Type | Quantization | Size |
|---|---|---|---|
| IQ4_XS/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16-IQ4_XS.gguf-00001-of-00006.gguf | GGUF | IQ4_XS | 44.83 GB |
| IQ4_XS/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16-IQ4_XS.gguf-00002-of-00006.gguf | GGUF | IQ4_XS | 45.59 GB |
| IQ4_XS/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16-IQ4_XS.gguf-00003-of-00006.gguf | GGUF | IQ4_XS | 45.50 GB |
| IQ4_XS/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16-IQ4_XS.gguf-00004-of-00006.gguf | GGUF | IQ4_XS | 45.59 GB |
| IQ4_XS/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16-IQ4_XS.gguf-00005-of-00006.gguf | GGUF | IQ4_XS | 45.39 GB |
| IQ4_XS/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16-IQ4_XS.gguf-00006-of-00006.gguf | GGUF | IQ4_XS | 31.71 GB |
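The six files are parts of one split GGUF. They share a prefix and differ only in a `-NNNNN-of-NNNNN.gguf` suffix. Below is a minimal sketch, assuming the shards are downloaded into a local `IQ4_XS` folder, that rebuilds the expected names and reports which ones are missing. The helper names `shard_names` and `missing_shards` are hypothetical, written for this illustration only.

```python
from pathlib import Path

# Prefix shared by every shard in the IQ4_XS folder (from the file table).
PREFIX = "NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16-IQ4_XS.gguf"
N_SHARDS = 6


def shard_names(prefix: str, n: int) -> list[str]:
    """Build names following the '<prefix>-NNNNN-of-NNNNN.gguf' pattern."""
    return [f"{prefix}-{i:05d}-of-{n:05d}.gguf" for i in range(1, n + 1)]


def missing_shards(directory: Path, prefix: str, n: int) -> list[str]:
    """Return the expected shard names that are not files in `directory`."""
    return [
        name
        for name in shard_names(prefix, n)
        if not (directory / name).is_file()
    ]


names = shard_names(PREFIX, N_SHARDS)
missing = missing_shards(Path("IQ4_XS"), PREFIX, N_SHARDS)

print("first shard:", names[0])
if missing:
    print(f"{len(missing)} of {N_SHARDS} shards missing:")
    for name in missing:
        print("  ", name)
else:
    print("all shards present")
```

llama.cpp's split-GGUF loader is normally pointed at the first shard (`...-00001-of-00006.gguf`) and locates the rest by this naming scheme. Check that behaviour against the llama.cpp build you use, since the double `.gguf` in these names is unusual.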

Model Details

Model ID: avar6/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-gguf
Author: avar6
Pipeline: (none listed)
License: (none listed)
Base model: nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16
Last modified: 2026-06-20T18:08:34.000Z

Model README

---
base_model:
- nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16
tags:
- gguf
- optimized
- mixed-gguf
---

llama.cpp mainline-compatible GGUF quants of nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16. Note that this is NOT the instruct-tuned model; it's NVIDIA's base checkpoint. These quants use the same scheme as Aes Sedai's optimized quants for Nemotron Ultra Instruct, and he graciously provided the imatrix and the commands used to make them. Be aware that, as of this commit (10 June 2026), llama.cpp still needs to be patched in order to make Nemotron Ultra GGUFs.

With chat completions, the model produces some artifacts and stray strings in the chat, but it responds well enough for turn-based chatting. With text completions and the instruct JSON below, the model responds normally. Tested in SillyTavern with thinking disabled.

{
    "input_sequence": "<|im_start|>user",
    "output_sequence": "<|im_start|>assistant\n<think></think>{{name}}:",
    "last_output_sequence": "",
    "system_sequence": "<|im_start|>system",
    "stop_sequence": "<|im_end|>",
    "wrap": true,
    "macro": true,
    "names_behavior": "none",
    "activation_regex": "",
    "first_output_sequence": "",
    "skip_examples": false,
    "output_suffix": "<|im_end|>\n",
    "input_suffix": "<|im_end|>\n",
    "system_suffix": "<|im_end|>\n",
    "user_alignment_message": "",
    "system_same_as_user": false,
    "last_system_sequence": "",
    "first_input_sequence": "",
    "last_input_sequence": "",
    "sequences_as_stop_strings": true,
    "story_string_prefix": "",
    "story_string_suffix": "",
    "extensions": {},
    "name": "ChatML - Super3 No Reasoning"
}
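To make the template concrete, here is a rough sketch of how its fields could be stitched into a raw text-completion prompt. It also shows how the output can be trimmed at stop strings. It is an approximation, not SillyTavern's actual code, and it rests on three assumptions. First, `wrap: true` is taken to mean a newline between a sequence and the message text. Second, the `{{name}}` macro is replaced by the character name. Third, with `sequences_as_stop_strings` enabled, the input and system sequences are assumed to act as extra stop strings. Exact whitespace handling in SillyTavern may differ. The helper names are hypothetical.

```python
# Subset of the instruct template above; only the fields this sketch uses.
TEMPLATE = {
    "system_sequence": "<|im_start|>system",
    "system_suffix": "<|im_end|>\n",
    "input_sequence": "<|im_start|>user",
    "input_suffix": "<|im_end|>\n",
    "output_sequence": "<|im_start|>assistant\n<think></think>{{name}}:",
    "output_suffix": "<|im_end|>\n",
    "stop_sequence": "<|im_end|>",
    "wrap": True,
}


def wrap_turn(sequence, content, suffix, wrap):
    """Format one turn: sequence, optional newline, content, suffix."""
    sep = "\n" if wrap else ""
    return f"{sequence}{sep}{content}{suffix}"


def build_prompt(tpl, system, turns, char_name):
    """turns: list of (role, text) pairs, role is 'user' or 'assistant'.
    Ends with an open assistant header (empty think block plus
    'Name:') so the model continues speaking as char_name."""
    output_seq = tpl["output_sequence"].replace("{{name}}", char_name)
    parts = []
    if system:
        parts.append(wrap_turn(tpl["system_sequence"], system,
                               tpl["system_suffix"], tpl["wrap"]))
    for role, text in turns:
        if role == "user":
            parts.append(wrap_turn(tpl["input_sequence"], text,
                                   tpl["input_suffix"], tpl["wrap"]))
        else:
            parts.append(wrap_turn(output_seq, text,
                                   tpl["output_suffix"], tpl["wrap"]))
    parts.append(output_seq)
    return "".join(parts)


def trim_at_stop(text, stops):
    """Cut generated text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Assumed stop list: stop_sequence plus the turn-opening sequences.
STOPS = [TEMPLATE["stop_sequence"], TEMPLATE["input_sequence"],
         TEMPLATE["system_sequence"]]

prompt = build_prompt(TEMPLATE, "You are Ada.", [("user", "Hi!")], "Ada")
raw_output = " Hello there.<|im_end|>\n<|im_start|>user"
reply = trim_at_stop(raw_output, STOPS)

print(repr(prompt))
print(repr(reply))
```

The empty `<think></think>` pair in the assistant header is what keeps the model from opening a reasoning block. That matches the "No Reasoning" name of the preset.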

Source: Hugging Face