GraySoft
Model Intelligence Sheet

NexusProjectsAI/Nemotron-3-Nano-30B-A3B-Nexus-Agents-GGUF overview

Nemotron 3 Nano 30B A3B Nexus Agents GGUF: a LoRA fine-tune of nvidia/NVIDIA Nemotron 3 Nano 30B A3B (hybrid Mamba/Transformer MoE, ~3B active), specialized for …

Tags: gguf, tool-calling, function-calling, agents, lora, nexus, nemotron, text-generation, dataset:NexusProjectsAI/Nexus-Agents-ToolCalling, base_model:nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16, base_model:adapter:nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16, license:other, endpoints_compatible, region:us, imatrix, conversational

Runs locally: the smallest file (Q4_K_M) takes ~22.83 GB on disk and targets 24 GB VRAM-class GPUs with llama.cpp or guIDE.

Downloads: 314
Likes: 0
Pipeline: text-generation

Repository Files & Downloads

3 GGUF files detected
Direct downloads for local inference
| File | Type | Quantization | Size |
|---|---|---|---|
| Nemotron-3-Nano-30B-A3B-Nexus-Agents-Q4_K_M.gguf | GGUF | Q4_K_M | 22.83 GB |
| Nemotron-3-Nano-30B-A3B-Nexus-Agents-Q6_K.gguf | GGUF | Q6_K | 31.21 GB |
| Nemotron-3-Nano-30B-A3B-Nexus-Agents-Q8_0.gguf | GGUF | Q8_0 | 31.28 GB |

Model Details

Model ID: NexusProjectsAI/Nemotron-3-Nano-30B-A3B-Nexus-Agents-GGUF
Author: NexusProjectsAI
Pipeline: text-generation
License: other
Base model: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
Last modified: 2026-06-11T15:48:04.000Z

Model README

---
license: other
license_name: nvidia-open-model-license
license_link: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
base_model: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
datasets:
- NexusProjectsAI/Nexus-Agents-ToolCalling
tags:
- gguf
- tool-calling
- function-calling
- agents
- lora
- nexus
- nemotron
pipeline_tag: text-generation
---

Nemotron-3-Nano-30B-A3B — Nexus Agents (GGUF)

A LoRA fine-tune of nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B (hybrid Mamba/Transformer MoE, ~3B active parameters) specialized for the Nexus Projects agent stack.

Links: the exact training + verification data → Nexus-Agents-ToolCalling · the tool that generated the data and trained, quantized, and evaluated this model → Nexus Training Studio · the app these agents power → Nexus Projects client

  • Setup interview: infers industry/platforms/objectives from a free-text idea ("I want to sell lemonade" → Food & Beverage), asks instead of guessing when the input is vague or ambiguous ("a lemonade stand" → asks Food & Beverage or Retail?), and looks libraries up on the internet (pub.dev/GitHub) to pick current ones before finishing.
  • Discovery: builds a user-story tree (As a … I want … so that …).
  • Task generation: turns setup + stories into concrete, stack-specific tasks with acceptance criteria and verification commands.

How it was trained

Two LoRA stages (rank 16 / scale 32; applied to attention, the Mamba mixer, and the always-on shared_experts MLP, never attention-only) were trained on schema-verified synthetic tool-calling conversations that carry the agents' real tool schemas, so the training format matches the serving format (train == serve). Both corpora are published in the dataset repo:

| Stage | Dataset config | Rows | What it taught |
|---|---|---|---|
| 1 (full LoRA) | stage1 | 60,185 | the three core skills |
| 2 (additive recovery) | stage2-recovery | 16,015 | error-recovery discipline (resume a broken board state, re-ask nothing, finalize), with ~50% stage-1 replay |

Results — base vs fine-tuned

Behavioral interview eval (27 end-to-end agent scenarios, greedy decoding)

Each scenario is a full multi-turn setup interview driven against the served GGUF with the agents' real tool schemas. A case passes only if the model **covers every required field, never re-asks an answered question, and cleanly finalizes**. Full transcripts for every case (base + fine-tuned) are published in the dataset repo's verification/ folder.
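The pass rule above can be sketched as a small check. This is only an illustration of the three stated conditions, not the project's actual evaluator: the `InterviewTrace` structure and its field names are hypothetical, and a "re-ask" is approximated here as any second question about a field that was already asked about.

```python
from dataclasses import dataclass, field


@dataclass
class InterviewTrace:
    """Hypothetical summary of one scenario transcript (all names are assumptions)."""
    required_fields: set[str]
    filled_fields: set[str]
    # The field each question was about, in the order the model asked.
    questions_asked: list[str] = field(default_factory=list)
    finalized: bool = False


def redundant_reasks(trace: InterviewTrace) -> int:
    """Count questions about a field that had already been asked about."""
    seen: set[str] = set()
    count = 0
    for asked_field in trace.questions_asked:
        if asked_field in seen:
            count += 1
        seen.add(asked_field)
    return count


def scenario_passes(trace: InterviewTrace) -> bool:
    """Pass only if every required field is covered, nothing is re-asked, and the run finalizes."""
    covers_all = trace.required_fields <= trace.filled_fields
    return covers_all and redundant_reasks(trace) == 0 and trace.finalized


good = InterviewTrace(
    required_fields={"industry", "platforms"},
    filled_fields={"industry", "platforms", "objectives"},
    questions_asked=["industry", "platforms"],
    finalized=True,
)
repeated = InterviewTrace(
    required_fields={"industry"},
    filled_fields={"industry"},
    questions_asked=["industry", "industry"],
    finalized=True,
)
print(scenario_passes(good), scenario_passes(repeated))  # True False
```

Keeping the re-ask count separate from the pass/fail decision mirrors the results table, which reports both a pass count and an average number of redundant re-asks.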

| Metric | Base (no LoRA) | Fine-tuned |
|---|---|---|
| Scenarios passed | 13 / 27 | 27 / 27 |
| Asks each question at most once | 52% | 100% |
| Completes (finalizes) the interview | 92% | 100% |
| Avg redundant re-asks per interview | 4.92 | 0.89 |

Tool-call accuracy (BFCL-style, 150 held-out calls)

| Metric | Base (no LoRA) | Fine-tuned | Lift (percentage points) |
|---|---|---|---|
| Function-name exact | 35.3% | 95.3% | +60.0 |
| Argument-keys match | 32.7% | 95.3% | +62.6 |
| Arguments exact | 12.0% | 64.0% | +52.0 |

The base model emitted a parseable tool call in only 109 of 150 cases; the fine-tune did so in 147 of 150. In short, fine-tuning roughly tripled function-name and argument-key accuracy on the Nexus tool set and raised exact-argument accuracy from 12.0% to 64.0%.
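For readers who want to reproduce numbers of this shape, here is a minimal sketch of how the three metrics could be computed. The scoring rules are assumptions, since the model card does not spell them out: an unparseable prediction (represented as `None`) counts as a miss on every metric, and the two argument metrics are only credited when the function name also matches.

```python
from typing import Optional

# A call is represented as {"name": str, "arguments": dict}; this shape is an assumption.
Call = dict


def score_calls(predicted: list[Optional[Call]], expected: list[Call]) -> dict[str, float]:
    """Return the three accuracies as percentages of all expected calls."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("expected must not be empty")

    name_hits = key_hits = arg_hits = 0
    for pred, gold in zip(predicted, expected):
        # Unparseable output or a wrong function name scores zero on all metrics.
        if pred is None or pred["name"] != gold["name"]:
            continue
        name_hits += 1
        if set(pred["arguments"]) == set(gold["arguments"]):
            key_hits += 1
        if pred["arguments"] == gold["arguments"]:
            arg_hits += 1

    total = len(expected)
    return {
        "function_name_exact": 100.0 * name_hits / total,
        "argument_keys_match": 100.0 * key_hits / total,
        "arguments_exact": 100.0 * arg_hits / total,
    }


# Hypothetical tool names and values, for illustration only.
expected = [
    {"name": "set_industry", "arguments": {"industry": "Food & Beverage"}},
    {"name": "set_platforms", "arguments": {"platforms": "web"}},
    {"name": "finalize", "arguments": {}},
    {"name": "ask_user", "arguments": {"question": "Which platform?"}},
]
predicted = [
    {"name": "set_industry", "arguments": {"industry": "Food & Beverage"}},
    {"name": "set_platforms", "arguments": {"platforms": "mobile"}},  # right keys, wrong value
    None,  # no parseable tool call
    {"name": "finalize", "arguments": {}},  # wrong function
]
print(score_calls(predicted, expected))
```

In this toy run two of four names match (50%), both of those have matching keys (50%), and one has fully matching arguments (25%).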

Quantizations

All quants are imatrix-calibrated.

| File | Bits | Size |
|---|---|---|
| Nemotron-3-Nano-30B-A3B-Nexus-Agents-Q8_0.gguf | 8-bit | 31.3 GB |
| Nemotron-3-Nano-30B-A3B-Nexus-Agents-Q6_K.gguf | 6-bit | 31.2 GB |
| Nemotron-3-Nano-30B-A3B-Nexus-Agents-Q4_K_M.gguf | 4-bit | 22.8 GB |

imatrix.dat (the calibration importance matrix) is included for re-quantizing.
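As a rough aid for choosing among the files listed above, the sketch below picks the largest quantization whose file fits a given memory budget. It is a simplification under stated assumptions: file size is used as a stand-in for memory use, the 1 GB default headroom is an arbitrary placeholder, and KV cache and context length are ignored, so real requirements will be higher. The sizes are the ones from the file listing.

```python
from typing import Optional

# File sizes in GB, taken from the repository file listing.
QUANT_SIZES_GB = {
    "Q8_0": 31.28,
    "Q6_K": 31.21,
    "Q4_K_M": 22.83,
}


def pick_quant(memory_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest quant that fits in memory_gb minus headroom, or None.

    headroom_gb is an assumed allowance for everything besides the weights;
    tune it for your context length and runtime.
    """
    budget = memory_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


print(pick_quant(24.0))  # Q4_K_M
print(pick_quant(48.0))  # Q8_0
print(pick_quant(16.0))  # None
```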

⚠️ Serving requirements

  1. Disable thinking. Nemotron-3-Nano is a reasoning model; with thinking on, it reasons in prose instead of calling tools. Serve with enable_thinking=false (which renders <think></think>). It is also prompt-sensitive, so use the agent's real system prompt.
  2. Tool-call format is Nemotron's <tool_call><function=NAME><parameter=key>value</parameter>…</function></tool_call>. Parse that (not JSON) on the client.
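A client-side parser for the tag format described above might look like the following. It is a minimal sketch rather than the Nexus client's actual parser: it assumes one function per `<tool_call>` block, returns every parameter value as a stripped string, and does not handle nested or malformed tags. The tool name `set_industry` in the sample is hypothetical.

```python
import re

# One <tool_call> block wrapping a single <function=NAME>...</function>.
_CALL_RE = re.compile(
    r"<tool_call>\s*<function=([^>]+)>(.*?)</function>\s*</tool_call>",
    re.DOTALL,
)
# Each <parameter=key>value</parameter> pair inside a function body.
_PARAM_RE = re.compile(r"<parameter=([^>]+)>(.*?)</parameter>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Extract tool calls as {"name": ..., "arguments": {...}} dicts.

    Values stay as strings; converting them to numbers, booleans, or
    structured data is left to the caller, since the model card does not
    specify value typing.
    """
    calls = []
    for name, body in _CALL_RE.findall(text):
        arguments = {
            key.strip(): value.strip() for key, value in _PARAM_RE.findall(body)
        }
        calls.append({"name": name.strip(), "arguments": arguments})
    return calls


sample = (
    "<tool_call><function=set_industry>"
    "<parameter=industry>Food & Beverage</parameter>"
    "</function></tool_call>"
)
print(parse_tool_calls(sample))
# [{'name': 'set_industry', 'arguments': {'industry': 'Food & Beverage'}}]
```

Returning an empty list when no block is found lets the caller treat plain-prose output as "no tool call" without special-casing.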

License

Inherits the NVIDIA Open Model License from the base model.


Source: Hugging Face