avar6/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-gguf overview
llama.cpp mainline-compatible GGUF quants of nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16. Note that this is NOT the instruct-tuned model; it is NVIDIA's ba…
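The IQ4_XS quant is split into six shards totaling about 258.61 GB on disk (the final shard alone is 31.71 GB), and all six files are needed to run it locally with llama.cpp / guIDE. A single 32 GB GPU cannot hold the weights, so expect to spread them across several GPUs and/or system RAM.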
Repository Files & Downloads
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| IQ4_XS/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16-IQ4_XS.gguf-00001-of-00006.gguf | GGUF | IQ4_XS | 44.83 GB | Download |
| IQ4_XS/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16-IQ4_XS.gguf-00002-of-00006.gguf | GGUF | IQ4_XS | 45.59 GB | Download |
| IQ4_XS/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16-IQ4_XS.gguf-00003-of-00006.gguf | GGUF | IQ4_XS | 45.50 GB | Download |
| IQ4_XS/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16-IQ4_XS.gguf-00004-of-00006.gguf | GGUF | IQ4_XS | 45.59 GB | Download |
| IQ4_XS/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16-IQ4_XS.gguf-00005-of-00006.gguf | GGUF | IQ4_XS | 45.39 GB | Download |
| IQ4_XS/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16-IQ4_XS.gguf-00006-of-00006.gguf | GGUF | IQ4_XS | 31.71 GB | Download |
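
As a quick sanity check on the table above, the following sketch rebuilds the six shard file names and sums their listed sizes. The sizes and the name pattern are copied from the table; the variable and function names are illustrative only and are not part of the repository.

```python
# Shard sizes in GB, copied from the file table above.
SHARD_SIZES_GB = [44.83, 45.59, 45.50, 45.59, 45.39, 31.71]

# File name prefix as listed in the repository
# (the ".gguf" before the shard index is part of the listed name).
PREFIX = "IQ4_XS/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16-IQ4_XS.gguf"


def shard_names(prefix: str, count: int) -> list[str]:
    """Return the shard paths in order, e.g. '<prefix>-00001-of-00006.gguf'."""
    return [f"{prefix}-{i:05d}-of-{count:05d}.gguf" for i in range(1, count + 1)]


names = shard_names(PREFIX, len(SHARD_SIZES_GB))
total_gb = sum(SHARD_SIZES_GB)

for name, size in zip(names, SHARD_SIZES_GB):
    print(f"{size:6.2f} GB  {name}")
print(f"total: {total_gb:.2f} GB")
```

The total comes to roughly 258.61 GB, which is the figure to budget for on disk and across GPU memory plus system RAM. llama.cpp generally picks up the remaining shards when pointed at the first one, provided all six sit in the same directory.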
Model Details
Model README
---
base_model:
- nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16
tags:
- gguf
- optimized
- mixed-gguf
---
llama.cpp mainline-compatible GGUF quants of nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16. Note that this is NOT the instruct-tuned model; it is NVIDIA's base checkpoint. These quants were made using the same scheme as Aes Sedai's optimized quants for Nemotron Ultra instruct. He graciously provided the imatrix and the commands used for these. As of this commit (10 June 2026), however, llama.cpp still needs to be patched in order to make Nemotron Ultra GGUFs.
With chat completions, the model produces some artifacts and stray strings in the chat, but it responds well enough to turn-based chatting. Using text completions with the instruct JSON below, the model actually responds normally. Tested in SillyTavern without thinking.
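
For anyone driving a text-completion endpoint directly rather than through SillyTavern, the sketch below approximates the raw prompt that the instruct template produces. The sequence strings are copied from the JSON that follows; the `build_prompt` helper, its arguments, and the exact newline placement are assumptions, and SillyTavern's own whitespace handling may differ slightly.

```python
# Sequence strings copied from the instruct template in this README.
INPUT_SEQUENCE = "<|im_start|>user"
OUTPUT_SEQUENCE = "<|im_start|>assistant\n<think></think>{{name}}:"
SYSTEM_SEQUENCE = "<|im_start|>system"
SUFFIX = "<|im_end|>\n"  # input, output and system suffixes are identical
STOP_SEQUENCE = "<|im_end|>"  # pass this as a stop string to the backend


def build_prompt(system: str, turns: list[tuple[str, str]], char_name: str) -> str:
    """Hypothetical helper: render (role, text) turns into one prompt string.

    role is "user" or "assistant"; the prompt ends with an open assistant
    header (empty think block included) so the model writes the next reply.
    """
    output_seq = OUTPUT_SEQUENCE.replace("{{name}}", char_name)
    parts = []
    if system:
        parts.append(f"{SYSTEM_SEQUENCE}\n{system}{SUFFIX}")
    for role, text in turns:
        seq = INPUT_SEQUENCE if role == "user" else output_seq
        # "wrap": true is approximated by a newline between header and text.
        parts.append(f"{seq}\n{text}{SUFFIX}")
    parts.append(output_seq)  # left open for the completion
    return "".join(parts)


prompt = build_prompt(
    "You are a helpful assistant.",
    [("user", "Hello!")],
    "Assistant",
)
print(prompt)
```

The empty `<think></think>` pair in the assistant header is what keeps the model out of its reasoning mode, which matches the "No Reasoning" name of the template. The template itself follows.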
{
"input_sequence": "<|im_start|>user",
"output_sequence": "<|im_start|>assistant\n<think></think>{{name}}:",
"last_output_sequence": "",
"system_sequence": "<|im_start|>system",
"stop_sequence": "<|im_end|>",
"wrap": true,
"macro": true,
"names_behavior": "none",
"activation_regex": "",
"first_output_sequence": "",
"skip_examples": false,
"output_suffix": "<|im_end|>\n",
"input_suffix": "<|im_end|>\n",
"system_suffix": "<|im_end|>\n",
"user_alignment_message": "",
"system_same_as_user": false,
"last_system_sequence": "",
"first_input_sequence": "",
"last_input_sequence": "",
"sequences_as_stop_strings": true,
"story_string_prefix": "",
"story_string_suffix": "",
"extensions": {},
"name": "ChatML - Super3 No Reasoning"
}
Run avar6/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-gguf with guIDE
Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.
Source: Hugging Face