NVIDIA cloud models
18 models tracked via Artificial Analysis. Compare cloud performance, then find local GGUF versions in the GraySoft model catalog.
| Model | Intelligence | Speed (tok/s) |
|---|---|---|
| Nemotron 3 Ultra 550B A55B (Reasoning) | 47.7 | 171.82 |
| NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | 36 | 217.675 |
| Nemotron Cascade 2 30B A3B | 28.4 | 0 |
| NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | 24.3 | 115.627 |
| Nemotron 3 Nano Omni 30B A3B Reasoning | 21.4 | 302.289 |
| Llama Nemotron Super 49B v1.5 (Reasoning) | 18.7 | 50.346 |
| Llama 3.3 Nemotron Super 49B v1 (Reasoning) | 18.5 | 0 |
| Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | 15 | 51.481 |
| NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | 14.9 | 288.725 |
| NVIDIA Nemotron Nano 9B V2 (Reasoning) | 14.8 | 114.324 |
| NVIDIA Nemotron 3 Nano 4B | 14.7 | 0 |
| Llama Nemotron Super 49B v1.5 (Non-reasoning) | 14.6 | 48.537 |
| Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | 14.4 | 0 |
| Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) | 14.3 | 0 |
| Llama 3.1 Nemotron Instruct 70B | 13.4 | 305.943 |
| NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) | 13.2 | 56.801 |
| NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | 13.2 | 152.58 |
| NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | 10.1 | 217.512 |
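
One way to compare the rows above is to rank them by speed after filtering on a minimum intelligence score. The following is a minimal sketch, not part of the catalog itself: the `fastest` helper and the `min_intelligence` parameter are hypothetical names, only a handful of rows are copied in, and it assumes that a speed of 0 means no measurement is available rather than a real throughput of zero.

```python
# A few rows copied from the table above: (model, intelligence, speed in tok/s).
ROWS = [
    ("Nemotron 3 Ultra 550B A55B (Reasoning)", 47.7, 171.82),
    ("NVIDIA Nemotron 3 Super 120B A12B (Reasoning)", 36.0, 217.675),
    ("Nemotron Cascade 2 30B A3B", 28.4, 0.0),
    ("Nemotron 3 Nano Omni 30B A3B Reasoning", 21.4, 302.289),
    ("Llama 3.1 Nemotron Instruct 70B", 13.4, 305.943),
]


def fastest(rows, min_intelligence=0.0):
    """Return rows that have a measured speed, fastest first."""
    # Assumption: a speed of 0 marks a missing measurement, so skip those rows.
    measured = [r for r in rows if r[2] > 0 and r[1] >= min_intelligence]
    return sorted(measured, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    for name, intelligence, speed in fastest(ROWS, min_intelligence=20):
        print(f"{name}: intelligence {intelligence}, {speed} tok/s")
```

Raising `min_intelligence` narrows the list to stronger models before ranking by speed. If the zero entries turn out to be genuine values rather than gaps, the `r[2] > 0` filter should be dropped.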
Run models locally with guIDE
Download guIDE, the AI-native code editor with local LLM inference and 69 built-in tools.