GraySoft
Projects Models Compare Cloud benchmarks FAQ Download guIDE →
Model Intelligence Sheet

Dzluck/gemma4-e4b-claude-coder-GGUF overview

Gemma 4 Claude Coder — local model family A family of custom models built on Gemma 4 edge variants E2B and E4B , tuned to act as autonomous coding and administ…

ggufollamaclaude-codecodingagentfunction-callinggemmatext-generationenlicense:apache-2.0endpoints_compatibleregion:us

Runs locally from ~8.95 GB disk (12 GB VRAM class GPUs with llama.cpp / guIDE).

Downloads
0
Likes
0
Pipeline
text-generation
Author

Repository Files & Downloads

1 GGUF files detected
Direct downloads for local inference
FileTypeQuantizationSizeLink
gemma4-e4b-claude-coder.Q4_K_M.ggufGGUFGGUF8.95 GBDownload

Model Details

Model IDDzluck/gemma4-e4b-claude-coder-GGUF
AuthorDzluck
Pipelinetext-generation
Licenseapache-2.0
Base modelgoogle/gemma-3n-e4b
Last modified2026-06-22T04:43:18.000Z

Model README

---

license: apache-2.0

base_model: google/gemma-3n-e4b

library_name: gguf

tags:

- gguf

- ollama

- claude-code

- coding

- agent

- function-calling

- gemma

language:

- en

pipeline_tag: text-generation

---

Gemma 4 Claude Coder — local model family

A family of custom models built on Gemma 4 (edge variants E2B and E4B), tuned to act as

autonomous coding and administration agents. The models speak the Anthropic-compatible API,

so they drive Claude Code fully locally — your code never leaves your machine and cloud token

cost drops to zero.

Each model ships with a system prompt focused on real work inside a codebase: use tools instead

of guessing, make minimal and precise code changes, return complete and runnable output, and

verify after acting. Sampling follows Google's official Gemma 4 recommendation

(temperature 1.0, top_k 64, top_p 0.95), with thinking mode enabled for better planning before

a tool call.

The idea

The whole point of this family is to run Claude Code on small, popular, consumer-grade hardware.

No datacenter GPU, no cloud bill — just an everyday Mac Mini (or similar 16 GB machine) acting as a

fully local, agentic coding assistant. These models make that practical: light enough to fit, smart

enough to drive real tool-calling agent loops.

In a time of RAM shortages and the big tech giants tightening usage limits and quotas, owning a

capable agent that runs entirely on your own modest hardware stops being a hobby and becomes

leverage: no rate limits, no surprise pricing, no dependency on someone else's quota.

Models in the family

| Model | Base | Context | Purpose |

|---|---|---|---|

| gemma4-e2b-claude-coder | Gemma 4 E2B (eff. 2B / 5.1B with embeddings) | 64K | Fast everyday coding agent — edits, autocomplete, short agent loops. Lightest on memory. |

| gemma4-e4b-claude-coder | Gemma 4 E4B (eff. 4B / 8B with embeddings) | 64K | Stronger coding agent — better reasoning and tool use on larger tasks. |

| gemma4-e4b-claude-coder-admin | Gemma 4 E4B | 32K | Administration and system tasks (scripts, shell, devops). Smaller context fits 100% in GPU for higher, stable throughput. |

What it's for

  • Driving Claude Code locally (ollama launch claude --model <name>).
  • Agentic code writing and editing with native function calling / tool use.
  • Administration and devops tasks on a server (the admin variant).
  • Full privacy and offline operation — no code sent to the cloud.

Context

  • Coders (E2B / E4B): 64K tokens — matching Claude Code's recommendation (64K minimum).
  • Admin (E4B): 32K tokens — a deliberate trade-off for 16 GB hardware that keeps the model

entirely on the GPU.

  • Base Gemma 4 E2B/E4B natively supports up to 128K, so context can be raised on stronger hardware.

Test hardware

The models were built and tested on:

  • Mac Mini (Apple Silicon, M-series), 16 GB RAM, macOS 15.6
  • Ollama 0.24, GPU (Metal) inference

Measured performance (16 GB RAM)

| Model | Placement | Speed | Tool calling |

|---|---|---|---|

| gemma4-e2b-claude-coder | 100% GPU | ~55 tok/s | ✅ valid JSON |

| gemma4-e4b-claude-coder (64K) | 39% GPU / 61% CPU | ~27 tok/s (drops under load) | ✅ |

| gemma4-e4b-claude-coder-admin (32K) | 100% GPU | ~30 tok/s (stable) | ✅ |

All three passed an end-to-end test through Claude Code: real turns with tool calls and correct

responses (HTTP 200 on /v1/messages).

How they were made

These models were designed, built and tested with the help of Claude Opus 4.8 — the best

coding model in the world. Their system prompts, parameter choices and context configuration draw

directly on its knowledge. In other words: the world's best coding model prepared local models

that take that work over right on your desk.

License

Apache 2.0 (inherited from the base Gemma 4).

Run Dzluck/gemma4-e4b-claude-coder-GGUF with guIDE

Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.

Download guIDE → · Browse 524k+ models · Compare models

Source: Hugging Face · Compare models