GraySoft
Model Intelligence Sheet

ZeZZm/aero-deuce-GGUF overview

Aero-Deuce, GGUF Q4_K_M: a fine-tuned Gemma 4 12B instruction-following model. This is the GGUF quantized version (~7 GB) that runs locally on CPU or GPU with no…

gguf · q4_k_m · gemma · instruction-following · text-generation · llama.cpp · license:apache-2.0 · region:us · conversational

Runs locally from a ~6.87 GB file on disk (suited to 8 GB VRAM-class GPUs with llama.cpp or guIDE).

Downloads: 0
Likes: 0
Pipeline: text-generation
Author: ZeZZm

Repository Files & Downloads

1 GGUF file detected
Direct downloads for local inference
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| aero-deuce-q4km.gguf | GGUF | Q4_K_M | 6.87 GB | Download |

Model Details

Model ID: ZeZZm/aero-deuce-GGUF
Author: ZeZZm
Pipeline: text-generation
License: apache-2.0
Base model: google/gemma-4-12b-it
Last modified: 2026-06-07T22:01:09.000Z

Model README

---
license: apache-2.0
base_model: google/gemma-4-12b-it
tags:
  - gguf
  - q4_k_m
  - gemma
  - instruction-following
  - text-generation
  - llama.cpp
inference: false
---

Aero-Deuce — GGUF Q4_K_M

A fine-tuned Gemma 4 12B instruction-following model. This is the GGUF quantized version (~7 GB) that runs locally on CPU or GPU with no Python required.

Download

Click the Files and versions tab above and download aero-deuce-q4km.gguf. That's the only file you need.

Which format should I use?

| Format | Best for | Link |
|---|---|---|
| GGUF ← you are here | Local inference, llama.cpp, LM Studio, GPT4All | This repo |
| MLX 4-bit | Apple Silicon (Mac) | ZeZZm/aero-deuce-MLX |
| LoRA Adapter | Merging with base model, further fine-tuning | ZeZZm/aero-deuce |

Quick Start

LM Studio (easiest — GUI app):

  1. Download LM Studio
  2. Search for ZeZZm/aero-deuce-GGUF
  3. Click download, then chat

llama.cpp:

# Download
wget https://huggingface.co/ZeZZm/aero-deuce-GGUF/resolve/main/aero-deuce-q4km.gguf

# Run
llama-cli -m aero-deuce-q4km.gguf -c 4096 --conversation
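
An interrupted or redirected download can leave an HTML error page or a truncated file under the expected name. As a minimal sanity check, assuming only that valid GGUF files begin with the four ASCII bytes `GGUF`, a small helper can be run before loading the model. The function name `is_gguf` is hypothetical, and the check does not verify file integrity beyond the header.

```shell
# is_gguf FILE
# Succeeds only if FILE exists and its first four bytes are the GGUF magic.
# This is a header check, not a checksum: a truncated file can still pass.
is_gguf() {
  [ -f "$1" ] || return 1
  # Strip NUL bytes so the shell does not warn when reading binary data.
  magic=$(head -c 4 "$1" | tr -d '\000')
  [ "$magic" = "GGUF" ]
}
```

Usage: `is_gguf aero-deuce-q4km.gguf && echo "looks like a GGUF file"`.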

Ollama:

# After downloading the GGUF file:
echo 'FROM ./aero-deuce-q4km.gguf
SYSTEM "You are Aero-Deuce, developed by the Aero-Deuce team."
PARAMETER stop "<|end_of_turn>"
PARAMETER stop "<|start_of_turn>"' > Modelfile

ollama create aero-deuce -f Modelfile
ollama run aero-deuce

GPT4All:

  1. Download GPT4All
  2. File → Open → select aero-deuce-q4km.gguf
  3. Start chatting

Model Details

| Property | Value |
|---|---|
| Base Model | google/gemma-4-12b-it (12B params) |
| Training Method | QLoRA + Muon optimizer |
| Training Data | 30K instruction-following samples |
| Training Steps | 2,000 |
| Quantization | Q4_K_M (~4.95 bits per weight) |
| File Size | ~7 GB |
| Context Length | 4,096 tokens |
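
The file size in the table follows roughly from the parameter count and the bits per weight. The sketch below does that arithmetic with `awk`, treating "12B" as exactly 12 billion parameters, which is an assumption: the true count, the mix of quantization types across tensors, and the GGUF metadata all shift the real figure slightly. The helper name `estimate_size` is hypothetical.

```shell
# estimate_size PARAMS_IN_BILLIONS BITS_PER_WEIGHT
# Prints an approximate file size in decimal GB and binary GiB.
estimate_size() {
  awk -v p="$1" -v b="$2" 'BEGIN {
    bytes = p * 1e9 * b / 8          # parameters * bits per weight / 8 bits per byte
    printf "%.1f GB (%.1f GiB)\n", bytes / 1e9, bytes / 1073741824
  }'
}

estimate_size 12 4.95   # prints: 7.4 GB (6.9 GiB)
```

The estimate is in the same range as both the ~7 GB stated here and the 6.87 GB shown in the file listing.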

System Prompt

A system prompt identifying the model as Aero-Deuce is embedded in the GGUF chat template. It works automatically in most frontends. For llama-cli, pass -sys "You are Aero-Deuce." for best results.
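
For llama-cli, the system prompt can be combined with the run command from the Quick Start. The wrapper below is a sketch that uses only the flags quoted in this README (`-m`, `-c`, `--conversation`, `-sys`); the function name `run_aero` and the `MODEL` and `SYSTEM_PROMPT` variables are hypothetical, and flag support may vary between llama.cpp builds.

```shell
# Defaults can be overridden from the environment.
MODEL="${MODEL:-aero-deuce-q4km.gguf}"
SYSTEM_PROMPT="${SYSTEM_PROMPT:-You are Aero-Deuce.}"

# run_aero: start an interactive chat, failing early with a clear message
# if llama-cli or the model file is missing.
run_aero() {
  if ! command -v llama-cli >/dev/null 2>&1; then
    echo "llama-cli not found on PATH" >&2
    return 127
  fi
  if [ ! -f "$MODEL" ]; then
    echo "model file not found: $MODEL" >&2
    return 1
  fi
  llama-cli -m "$MODEL" -c 4096 --conversation -sys "$SYSTEM_PROMPT"
}
```

Call `run_aero` from the directory that contains the downloaded GGUF file, or set `MODEL` to its full path first.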

License

Apache 2.0


Source: Hugging Face