Dzluck/gemma4-e2b-claude-coder-GGUF overview
Gemma 4 Claude Coder — local model family A family of custom models built on Gemma 4 edge variants E2B and E4B , tuned to act as autonomous coding and administ…
Runs locally from ~6.67 GB disk (8 GB VRAM class GPUs with llama.cpp / guIDE).
Repository Files & Downloads
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| gemma4-e2b-claude-coder.Q4_K_M.gguf | GGUF | GGUF | 6.67 GB | Download |
Model Details
| Model ID | Dzluck/gemma4-e2b-claude-coder-GGUF |
|---|---|
| Author | Dzluck |
| Pipeline | text-generation |
| License | apache-2.0 |
| Base model | google/gemma-3n-e4b |
| Last modified | 2026-06-22T05:17:58.000Z |
Model README
---
license: apache-2.0
base_model: google/gemma-3n-e4b
library_name: gguf
tags:
- gguf
- ollama
- claude-code
- coding
- agent
- function-calling
- gemma
language:
- en
pipeline_tag: text-generation
---
Gemma 4 Claude Coder — local model family
A family of custom models built on Gemma 4 (edge variants E2B and E4B), tuned to act as
autonomous coding and administration agents. The models speak the Anthropic-compatible API,
so they drive Claude Code fully locally — your code never leaves your machine and cloud token
cost drops to zero.
Each model ships with a system prompt focused on real work inside a codebase: use tools instead
of guessing, make minimal and precise code changes, return complete and runnable output, and
verify after acting. Sampling follows Google's official Gemma 4 recommendation
(temperature 1.0, top_k 64, top_p 0.95), with thinking mode enabled for better planning before
a tool call.
The idea
The whole point of this family is to run Claude Code on small, popular, consumer-grade hardware.
No datacenter GPU, no cloud bill — just an everyday Mac Mini (or similar 16 GB machine) acting as a
fully local, agentic coding assistant. These models make that practical: light enough to fit, smart
enough to drive real tool-calling agent loops.
In a time of RAM shortages and the big tech giants tightening usage limits and quotas, owning a
capable agent that runs entirely on your own modest hardware stops being a hobby and becomes
leverage: no rate limits, no surprise pricing, no dependency on someone else's quota.
Models in the family
| Model | Base | Context | Purpose |
|---|---|---|---|
| gemma4-e2b-claude-coder | Gemma 4 E2B (eff. 2B / 5.1B with embeddings) | 64K | Fast everyday coding agent — edits, autocomplete, short agent loops. Lightest on memory. |
| gemma4-e4b-claude-coder | Gemma 4 E4B (eff. 4B / 8B with embeddings) | 64K | Stronger coding agent — better reasoning and tool use on larger tasks. |
| gemma4-e4b-claude-coder-admin | Gemma 4 E4B | 32K | Administration and system tasks (scripts, shell, devops). Smaller context fits 100% in GPU for higher, stable throughput. |
What it's for
- Driving Claude Code locally (
ollama launch claude --model <name>). - Agentic code writing and editing with native function calling / tool use.
- Administration and devops tasks on a server (the admin variant).
- Full privacy and offline operation — no code sent to the cloud.
Context
- Coders (E2B / E4B): 64K tokens — matching Claude Code's recommendation (64K minimum).
- Admin (E4B): 32K tokens — a deliberate trade-off for 16 GB hardware that keeps the model
entirely on the GPU.
- Base Gemma 4 E2B/E4B natively supports up to 128K, so context can be raised on stronger hardware.
Test hardware
The models were built and tested on:
- Mac Mini (Apple Silicon, M-series), 16 GB RAM, macOS 15.6
- Ollama 0.24, GPU (Metal) inference
Measured performance (16 GB RAM)
| Model | Placement | Speed | Tool calling |
|---|---|---|---|
| gemma4-e2b-claude-coder | 100% GPU | ~55 tok/s | ✅ valid JSON |
| gemma4-e4b-claude-coder (64K) | 39% GPU / 61% CPU | ~27 tok/s (drops under load) | ✅ |
| gemma4-e4b-claude-coder-admin (32K) | 100% GPU | ~30 tok/s (stable) | ✅ |
All three passed an end-to-end test through Claude Code: real turns with tool calls and correct
responses (HTTP 200 on /v1/messages).
How they were made
These models were designed, built and tested with the help of Claude Opus 4.8 — the best
coding model in the world. Their system prompts, parameter choices and context configuration draw
directly on its knowledge. In other words: the world's best coding model prepared local models
that take that work over right on your desk.
License
Apache 2.0 (inherited from the base Gemma 4).
Run Dzluck/gemma4-e2b-claude-coder-GGUF with guIDE
Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.
Source: Hugging Face · Compare models