NexusProjectsAI/Nemotron-3-Nano-30B-A3B-Nexus-Agents-GGUF overview
Nemotron 3 Nano 30B A3B — Nexus Agents GGUF A LoRA fine tune of nvidia/NVIDIA Nemotron 3 Nano 30B A3B hybrid Mamba/Transformer MoE, ~3B active specialized for …
Runs locally from ~22.83 GB disk (24 GB VRAM class GPUs with llama.cpp / guIDE).
Repository Files & Downloads
Model Details
| Model ID | NexusProjectsAI/Nemotron-3-Nano-30B-A3B-Nexus-Agents-GGUF |
|---|---|
| Author | NexusProjectsAI |
| Pipeline | text-generation |
| License | other |
| Base model | nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 |
| Last modified | 2026-06-11T15:48:04.000Z |
Model README
---
license: other
license_name: nvidia-open-model-license
license_link: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
base_model: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
datasets:
- NexusProjectsAI/Nexus-Agents-ToolCalling
tags:
- gguf
- tool-calling
- function-calling
- agents
- lora
- nexus
- nemotron
pipeline_tag: text-generation
---
Nemotron-3-Nano-30B-A3B — Nexus Agents (GGUF)
A LoRA fine-tune of nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B (hybrid Mamba/Transformer
MoE, ~3B active) specialized for the Nexus Projects agent stack.
Links: the exact training + verification data →
the tool that generated the data, trained, quantized, and evaluated this model →
the app these agents power →
- Setup interview — infers industry/platforms/objectives from a free-text idea
("I want to sell lemonade" → Food & Beverage), asks instead of guessing when the
input is vague or ambiguous ("a lemonade stand" → asks Food & Beverage or Retail?),
and looks libraries up on the internet (pub.dev/GitHub) to pick current ones before
finishing.
- Discovery — builds a user-story tree (
As a … I want … so that …). - Task generation — turns setup + stories into concrete, stack-specific tasks with
acceptance criteria and verification commands.
How it was trained
Two LoRA stages (rank 16/scale 32, attention + Mamba mixer + always-on
shared_experts MLP — never attention-only), on schema-verified synthetic tool-calling
conversations carrying the agents' real tool schemas — train == serve. Both corpora are
published in the dataset repo:
| Stage | Dataset config | Rows | What it taught |
|---|---|---|---|
| 1 — full LoRA | stage1 | 60,185 | the three core skills |
| 2 — additive recovery | stage2-recovery | 16,015 | error-recovery discipline (resume a broken board state, re-ask nothing, finalize), with ~50% stage-1 replay |
Results — base vs fine-tuned
Behavioral interview eval (27 end-to-end agent scenarios, greedy decoding)
Each scenario is a full multi-turn setup interview driven against the served GGUF with
the agents' real tool schemas. A case passes only if the model **covers every required
field, never re-asks an answered question, and cleanly finalizes**. Full transcripts for
every case (base + fine-tuned) are published in the dataset repo's
folder.
| Metric | Base (no LoRA) | Fine-tuned |
|---|---|---|
| Scenarios passed | 13 / 27 | 27 / 27 |
| Asks each question at most once | 52% | 100% |
| Completes (finalizes) the interview | 92% | 100% |
| Avg redundant re-asks per interview | 4.92 | 0.89 |
Tool-call accuracy (BFCL-style, 150 held-out calls)
| Metric | Base (no LoRA) | Fine-tuned | Lift |
|---|---|---|---|
| Function-name exact | 35.3% | 95.3% | +60.0 |
| Argument-keys match | 32.7% | 95.3% | +62.6 |
| Arguments exact | 12.0% | 64.0% | +52.0 |
The base model only emitted a parseable tool call 109/150 times; the fine-tune did so
147/150. In short, fine-tuning ~tripled tool-use accuracy on the Nexus tool set.
Quantizations
All quants are imatrix-calibrated.
| File | Bits | Size |
|---|---|---|
| Nemotron-3-Nano-30B-A3B-Nexus-Agents-Q8_0.gguf | 8-bit | 31.3 GB |
| Nemotron-3-Nano-30B-A3B-Nexus-Agents-Q6_K.gguf | 6-bit | 31.2 GB |
| Nemotron-3-Nano-30B-A3B-Nexus-Agents-Q4_K_M.gguf | 4-bit | 22.8 GB |
imatrix.dat (the calibration importance matrix) is included for re-quantizing.
⚠️ Serving requirements
- Disable thinking. Nemotron-3-Nano is a reasoning model; with thinking on it reasons
in prose instead of calling tools. Serve with enable_thinking=false (renders
<think></think>). It is also prompt-sensitive — use the agent's real system prompt.
- Tool-call format is Nemotron's
<tool_call><function=NAME><parameter=key>value</parameter>…</function></tool_call>
— parse that (not JSON) on the client.
License
Inherits the NVIDIA Open Model License from the base model.
Run NexusProjectsAI/Nemotron-3-Nano-30B-A3B-Nexus-Agents-GGUF with guIDE
Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.
Source: Hugging Face · Compare models