What license applies to NobodyWho/Google_Gemma4-12B-GGUF?

License: apache-2.0. Verify terms on Hugging Face before commercial use.

How do I run NobodyWho/Google_Gemma4-12B-GGUF locally?

Download a GGUF file from this page and load it in guIDE or llama.cpp. Pipeline task: image-text-to-text.

Model Intelligence Sheet

NobodyWho/Google_Gemma4-12B-GGUF overview

NobodyWho/Google Gemma4 12B GGUF Overview GGUF quantization of Google's Gemma 4 12B Unified model, re hosted for NobodyWho https://github.com/nobodywho ooo/nob…

ggufnobodywhotool-callingvisiongemmaimage-text-to-textbase_model:google/gemma-4-12Bbase_model:quantized:google/gemma-4-12Blicense:apache-2.0endpoints_compatibleregion:usconversational

Runs locally from ~167.0 MB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).

Downloads

Likes

Pipeline

image-text-to-text

Author

NobodyWho

Repository Files & Downloads

4 GGUF files detected

Direct downloads for local inference

File	Type	Quantization	Size	Link
gemma-4-12b-it-BF16.gguf	GGUF	BF16	22.20 GB	Download
gemma-4-12b-it-Q4_K_M.gguf	GGUF	Q4_K_M	6.63 GB	Download
gemma-4-12b-it-Q8_0.gguf	GGUF	Q8_0	11.80 GB	Download
mmproj-BF16.gguf	GGUF	BF16	167.0 MB	Download

Model Details

Model ID	NobodyWho/Google_Gemma4-12B-GGUF
Author	NobodyWho
Pipeline	image-text-to-text
License	apache-2.0
Base model	google/gemma-4-12B
Last modified	2026-06-16T04:37:10.000Z

Model README

---

license: apache-2.0

base_model: google/gemma-4-12B

tags:

- gguf

- nobodywho

- tool-calling

- vision

- gemma

pipeline_tag: image-text-to-text

library_name: gguf

---

NobodyWho/Google_Gemma4-12B-GGUF

Overview

GGUF quantization of Google's Gemma 4 12B (Unified) model, re-hosted for

NobodyWho. The unsloth build already ships a

tool-calling setup and recommended sampling metadata (general.sampling: temp 1.0,

top_k 64, top_p 0.95), so nothing needs patching — the model is verified with NobodyWho's test

suite. The 12B Unified variant is the laptop-class Gemma 4 — stronger reasoning and multimodal

capability than the edge (E2B/E4B) models while staying well below the larger MoE/dense variants

in memory. Multimodal (text + image), multilingual, Apache 2.0.

Model Capabilities

Text generation — instruction-following chat, stronger reasoning
Tool calling — native function calling with grammar-constrained output
Vision — ⚠️ the 12B mmproj (vision + audio encoder) currently **fails to load in

NobodyWho** (llama.cpp MTMD/CLIP init error); needs a newer llama.cpp. Text + tool calling are

unaffected. For vision today, use Gemma 4 E2B/E4B (verified working)

Long context — 256k tokens
Multilingual — 140+ languages

Available Quantizations

| File | Approach | Tool-calling tests |

|------|----------|--------------------|

| gemma-4-12b-it-BF16.gguf | Sampling embedded upstream | not separately run |

| gemma-4-12b-it-Q8_0.gguf | Sampling embedded upstream | 14/14 |

| gemma-4-12b-it-Q4_K_M.gguf | Sampling embedded upstream | 14/14 |

| mmproj-BF16.gguf | Vision projection — ⚠️ does not load in NobodyWho yet | — |

> Tool calling verified on Q8_0 and Q4_K_M (14/14 each, June 2026; BF16 hosted but not separately tested — 24 GB).

> Vision: the 12B mmproj fails to load in the current NobodyWho build (llama.cpp MTMD/CLIP

> init error) — Gemma 4 E2B/E4B vision is verified working. Quant names follow the unsloth gemma-4-12b-it-GGUF repo.

Quick Start

Using the NobodyWho library:

from nobodywho import Chat

chat = Chat("huggingface:NobodyWho/Google_Gemma4-12B-GGUF/gemma-4-12b-it-Q4_K_M.gguf")
response = chat.ask("What is the capital of Denmark?").completed()
print(response)  # The capital of Denmark is Copenhagen.

Vision

> ⚠️ Not working yet on 12B: the mmproj fails to load in the current NobodyWho build. The

> snippet below is the intended API (it works for Gemma 4 E2B/E4B today).

from nobodywho import Model, Chat, Prompt, Image, Text

model = Model(
    "huggingface:NobodyWho/Google_Gemma4-12B-GGUF/gemma-4-12b-it-Q4_K_M.gguf",
    projection_model_path="huggingface:NobodyWho/Google_Gemma4-12B-GGUF/mmproj-BF16.gguf",
)
chat = Chat(model=model, system_prompt="You are a helpful assistant.")
response = chat.ask(Prompt([
    Text("What is in this image?"),
    Image("./photo.png"),
])).completed()
print(response)

llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="NobodyWho/Google_Gemma4-12B-GGUF",
    filename="gemma-4-12b-it-Q4_K_M.gguf",
)

Model Specifications

Parameters: 12B (Unified)
Context length: 262,144 tokens (256K)
License: Apache 2.0
Base model: google/gemma-4-12B
Architecture: gemma4 (vision-capable)

Licensing / Credits

Licensed under Apache 2.0 (unchanged from upstream). All model credit belongs to Google

DeepMind. GGUF quantizations provided by unsloth.

Run NobodyWho/Google_Gemma4-12B-GGUF with guIDE

Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.

Download guIDE → · Browse 524k+ models · Compare models

Source: Hugging Face · Compare models