ZeZZm/aero-deuce-GGUF overview
Runs locally from a ~6.87 GB file, sized for 8 GB-VRAM-class GPUs with llama.cpp or guIDE.
Repository Files & Downloads
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| aero-deuce-q4km.gguf | GGUF | Q4_K_M | 6.87 GB | Download |
Model Details
| Property | Value |
|---|---|
| Model ID | ZeZZm/aero-deuce-GGUF |
| Author | ZeZZm |
| Pipeline | text-generation |
| License | apache-2.0 |
| Base model | google/gemma-4-12b-it |
| Last modified | 2026-06-07T22:01:09.000Z |
Model README
---
license: apache-2.0
base_model: google/gemma-4-12b-it
tags:
- gguf
- q4_k_m
- gemma
- instruction-following
- text-generation
- llama.cpp
inference: false
---
Aero-Deuce — GGUF Q4_K_M
A fine-tuned Gemma 4 12B instruction-following model. This is the GGUF quantized version (~7 GB) that runs locally on CPU or GPU with no Python required.
Download
Click the Files and versions tab above and download aero-deuce-q4km.gguf. That's the only file you need.
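Every GGUF file begins with the four ASCII bytes `GGUF`, so a quick sanity check after downloading (for example, to catch an HTML error page that was saved under the `.gguf` name) is to look at those bytes. The following is a minimal bash sketch; `is_gguf` is a hypothetical helper name, not part of llama.cpp or any other tool, and it only checks the magic bytes, not the integrity of the whole file.

```bash
#!/usr/bin/env bash
# is_gguf FILE: hypothetical helper, not part of llama.cpp or any other tool.
# Succeeds (exit status 0) only if FILE exists and its first four bytes are
# the ASCII magic "GGUF" that every GGUF file starts with.
is_gguf() {
  local file="$1"
  [ -f "$file" ] || return 1
  # Read exactly the first 4 bytes and compare them with the magic string.
  [ "$(head -c 4 "$file")" = "GGUF" ]
}

# Usage: check the file downloaded in the step above.
if is_gguf aero-deuce-q4km.gguf; then
  echo "aero-deuce-q4km.gguf looks like a GGUF file"
else
  echo "aero-deuce-q4km.gguf is missing or not a GGUF file; try downloading it again"
fi
```

A passing check does not prove the download is complete; a truncated file still starts with the same four bytes.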
Which format should I use?
| Format | Best for | Link |
|---|---|---|
| GGUF ← you are here | Local inference, llama.cpp, LM Studio, GPT4All | This repo |
| MLX 4-bit | Apple Silicon (Mac) | ZeZZm/aero-deuce-MLX |
| LoRA Adapter | Merging with base model, further fine-tuning | ZeZZm/aero-deuce |
Quick Start
LM Studio (easiest — GUI app):
- Download LM Studio
- Search for ZeZZm/aero-deuce-GGUF
- Click download, then chat
llama.cpp:
```bash
# Download
wget https://huggingface.co/ZeZZm/aero-deuce-GGUF/resolve/main/aero-deuce-q4km.gguf
# Run
llama-cli -m aero-deuce-q4km.gguf -c 4096 --conversation
```
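The command above does not say where the model's layers run. In llama.cpp, `-ngl` (`--n-gpu-layers`) sets how many layers are offloaded to the GPU, which only has an effect on a build with GPU support (CUDA, Metal, Vulkan, etc.). A minimal sketch of a wrapper that adds `-ngl` together with the `-sys` system prompt recommended in the System Prompt section; `run_aero_deuce` and the `MODEL` variable are hypothetical names, and whether every layer fits on an 8 GB card at a 4,096-token context is an assumption based on the sizing note at the top of this page.

```bash
#!/usr/bin/env bash
# run_aero_deuce [GPU_LAYERS]: hypothetical wrapper around llama-cli.
# MODEL defaults to the file name used in this README; override it via the env var.
run_aero_deuce() {
  local model="${MODEL:-aero-deuce-q4km.gguf}"
  # Layers to offload to the GPU: 0 keeps everything on the CPU;
  # a large value such as 99 asks llama.cpp to offload every layer it has.
  local gpu_layers="${1:-0}"
  llama-cli -m "$model" \
    -c 4096 \
    -ngl "$gpu_layers" \
    -sys "You are Aero-Deuce." \
    --conversation
}

# Example (GPU build of llama.cpp required for offloading):
#   run_aero_deuce 99
```

If the model does not fit in VRAM, lowering the layer count keeps part of the model on the CPU at the cost of speed.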
Ollama:
```bash
# After downloading the GGUF file:
echo 'FROM ./aero-deuce-q4km.gguf
SYSTEM "You are Aero-Deuce, developed by the Aero-Deuce team."
PARAMETER stop "<|end_of_turn>"
PARAMETER stop "<|start_of_turn>"' > Modelfile
ollama create aero-deuce -f Modelfile
ollama run aero-deuce
```
GPT4All:
- Download GPT4All
- File → Open → select aero-deuce-q4km.gguf
- Start chatting
Model Details
| Property | Value |
|---|---|
| Base Model | google/gemma-4-12b-it (12B params) |
| Training Method | QLoRA + Muon optimizer |
| Training Data | 30K instruction-following samples |
| Training Steps | 2,000 |
| Quantization | Q4_K_M (~4.95 bits per weight) |
| File Size | ~7 GB |
| Context Length | 4,096 tokens |
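As a rough cross-check of the File Size row, a quantized model's on-disk size is approximately parameters × bits per weight ÷ 8. Assuming exactly 12 × 10⁹ parameters, all quantized at a uniform 4.95 bits (in practice Q4_K_M mixes quantization types across tensors, and "12B" is a nominal count):

$$
\frac{12 \times 10^{9} \times 4.95\ \text{bits}}{8\ \text{bits/byte}} \approx 7.4 \times 10^{9}\ \text{bytes} \approx 6.9\ \text{GiB}
$$

This is in line with the ~6.87 GB file listed in the downloads table.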
System Prompt
A system prompt identifying the model as Aero-Deuce is embedded in the GGUF chat template, so it is applied automatically in most frontends. For llama-cli, pass `-sys "You are Aero-Deuce."` for best results.
License
Apache 2.0
Run ZeZZm/aero-deuce-GGUF with guIDE
Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.
Source: Hugging Face