NobodyWho/Google_Gemma4-12B-GGUF overview
NobodyWho/Google Gemma4 12B GGUF Overview GGUF quantization of Google's Gemma 4 12B Unified model, re hosted for NobodyWho https://github.com/nobodywho ooo/nob…
Runs locally from ~167.0 MB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).
Repository Files & Downloads
Model Details
| Model ID | NobodyWho/Google_Gemma4-12B-GGUF |
|---|---|
| Author | NobodyWho |
| Pipeline | image-text-to-text |
| License | apache-2.0 |
| Base model | google/gemma-4-12B |
| Last modified | 2026-06-16T04:37:10.000Z |
Model README
---
license: apache-2.0
base_model: google/gemma-4-12B
tags:
- gguf
- nobodywho
- tool-calling
- vision
- gemma
pipeline_tag: image-text-to-text
library_name: gguf
---
NobodyWho/Google_Gemma4-12B-GGUF
Overview
GGUF quantization of Google's Gemma 4 12B (Unified) model, re-hosted for
NobodyWho. The unsloth build already ships a
tool-calling setup and recommended sampling metadata (general.sampling: temp 1.0,
top_k 64, top_p 0.95), so nothing needs patching — the model is verified with NobodyWho's test
suite. The 12B Unified variant is the laptop-class Gemma 4 — stronger reasoning and multimodal
capability than the edge (E2B/E4B) models while staying well below the larger MoE/dense variants
in memory. Multimodal (text + image), multilingual, Apache 2.0.
Model Capabilities
- Text generation — instruction-following chat, stronger reasoning
- Tool calling — native function calling with grammar-constrained output
- Vision — ⚠️ the 12B
mmproj(vision + audio encoder) currently **fails to load in
NobodyWho** (llama.cpp MTMD/CLIP init error); needs a newer llama.cpp. Text + tool calling are
unaffected. For vision today, use Gemma 4 E2B/E4B (verified working)
- Long context — 256k tokens
- Multilingual — 140+ languages
Available Quantizations
| File | Approach | Tool-calling tests |
|------|----------|--------------------|
| gemma-4-12b-it-BF16.gguf | Sampling embedded upstream | not separately run |
| gemma-4-12b-it-Q8_0.gguf | Sampling embedded upstream | 14/14 |
| gemma-4-12b-it-Q4_K_M.gguf | Sampling embedded upstream | 14/14 |
| mmproj-BF16.gguf | Vision projection — ⚠️ does not load in NobodyWho yet | — |
> Tool calling verified on Q8_0 and Q4_K_M (14/14 each, June 2026; BF16 hosted but not separately tested — 24 GB).
> Vision: the 12B mmproj fails to load in the current NobodyWho build (llama.cpp MTMD/CLIP
> init error) — Gemma 4 E2B/E4B vision is verified working. Quant names follow the unsloth gemma-4-12b-it-GGUF repo.
Quick Start
Using the NobodyWho library:
from nobodywho import Chat
chat = Chat("huggingface:NobodyWho/Google_Gemma4-12B-GGUF/gemma-4-12b-it-Q4_K_M.gguf")
response = chat.ask("What is the capital of Denmark?").completed()
print(response) # The capital of Denmark is Copenhagen.
Vision
> ⚠️ Not working yet on 12B: the mmproj fails to load in the current NobodyWho build. The
> snippet below is the intended API (it works for Gemma 4 E2B/E4B today).
from nobodywho import Model, Chat, Prompt, Image, Text
model = Model(
"huggingface:NobodyWho/Google_Gemma4-12B-GGUF/gemma-4-12b-it-Q4_K_M.gguf",
projection_model_path="huggingface:NobodyWho/Google_Gemma4-12B-GGUF/mmproj-BF16.gguf",
)
chat = Chat(model=model, system_prompt="You are a helpful assistant.")
response = chat.ask(Prompt([
Text("What is in this image?"),
Image("./photo.png"),
])).completed()
print(response)
llama-cpp-python
from llama_cpp import Llama
llm = Llama.from_pretrained(
repo_id="NobodyWho/Google_Gemma4-12B-GGUF",
filename="gemma-4-12b-it-Q4_K_M.gguf",
)
Model Specifications
- Parameters: 12B (Unified)
- Context length: 262,144 tokens (256K)
- License: Apache 2.0
- Base model: google/gemma-4-12B
- Architecture: gemma4 (vision-capable)
Licensing / Credits
Licensed under Apache 2.0 (unchanged from upstream). All model credit belongs to Google
DeepMind. GGUF quantizations provided by unsloth.
Run NobodyWho/Google_Gemma4-12B-GGUF with guIDE
Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.
Source: Hugging Face · Compare models