GraySoft
Projects Models Compare Cloud benchmarks FAQ Download guIDE โ†’
Model Intelligence Sheet

llmfan46/gemma-4-12B-coder-fable5-composer2.5-v1-uncensored-heretic-GGUF overview

<div style="background color: ff4444; color: white; padding: 20px; border radius: 10px; text align: center; margin: 20px 0;" <h2 style="color: white; margin: 0โ€ฆ

transformersggufgemma4codingcodereasoningthinkingsafetensorshereticuncensoreddecensoredabliteratedtext-generationbase_model:llmfan46/gemma-4-12B-coder-fable5-composer2.5-v1-uncensored-hereticbase_model:quantized:llmfan46/gemma-4-12B-coder-fable5-composer2.5-v1-uncensored-hereticlicense:apache-2.0endpoints_compatibleregion:usconversational

Runs locally from ~116.4 MB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).

Downloads
0
Likes
4
Pipeline
text-generation
Author

Repository Files & Downloads

8 GGUF files detected
Direct downloads for local inference
FileTypeQuantizationSizeLink
gemma-4-12B-coder-fable5-composer2.5-v1-uncensored-heretic-F16.ggufGGUFF1622.20 GBDownload
gemma-4-12B-coder-fable5-composer2.5-v1-uncensored-heretic-Q4_K_M.ggufGGUFQ4_K_M6.87 GBDownload
gemma-4-12B-coder-fable5-composer2.5-v1-uncensored-heretic-Q4_K_S.ggufGGUFQ4_K_S6.54 GBDownload
gemma-4-12B-coder-fable5-composer2.5-v1-uncensored-heretic-Q5_K_M.ggufGGUFQ5_K_M7.96 GBDownload
gemma-4-12B-coder-fable5-composer2.5-v1-uncensored-heretic-Q5_K_S.ggufGGUFQ5_K_S7.77 GBDownload
gemma-4-12B-coder-fable5-composer2.5-v1-uncensored-heretic-Q6_K.ggufGGUFQ6_K9.11 GBDownload
gemma-4-12B-coder-fable5-composer2.5-v1-uncensored-heretic-Q8_0.ggufGGUFQ8_011.80 GBDownload
gemma-4-12B-coder-fable5-composer2.5-v1-uncensored-heretic-mmproj-F16.ggufGGUFF16116.4 MBDownload

Model Details

Model IDllmfan46/gemma-4-12B-coder-fable5-composer2.5-v1-uncensored-heretic-GGUF
Authorllmfan46
Pipelinetext-generation
Licenseapache-2.0
Base modelllmfan46/gemma-4-12B-coder-fable5-composer2.5-v1-uncensored-heretic
Last modified2026-06-21T23:54:52.000Z

Model README

---

license: apache-2.0

base_model:

  • llmfan46/gemma-4-12B-coder-fable5-composer2.5-v1-uncensored-heretic

library_name: transformers

pipeline_tag: text-generation

tags:

  • gemma4
  • coding
  • code
  • reasoning
  • thinking
  • safetensors
  • transformers
  • heretic
  • uncensored
  • decensored
  • abliterated

---

<div style="background-color: #ff4444; color: white; padding: 20px; border-radius: 10px; text-align: center; margin: 20px 0;">

<h2 style="color: white; margin: 0 0 10px 0;">๐Ÿšจโš ๏ธ I HAVE REACHED HUGGING FACE'S FREE STORAGE LIMIT โš ๏ธ๐Ÿšจ</h2>

<p style="font-size: 18px; margin: 0 0 15px 0;">I can no longer upload new models unless I can cover the cost of additional storage.<br>I host <b>70+ free models</b> as an independent contributor and this work is unpaid.<br><b>Without your support, no more new models can be uploaded.</b></p>

<p style="font-size: 20px; margin: 0;">

<a href="https://patreon.com/LLMfan46" style="color: white; text-decoration: underline;">๐ŸŽ‰ Patreon (Monthly)</a> &nbsp;|&nbsp;

<a href="https://ko-fi.com/llmfan46" style="color: white; text-decoration: underline;">โ˜• Ko-fi (One-time)</a>

</p>

<p style="font-size: 16px; margin: 10px 0 0 0;">Every contribution goes directly toward Hugging Face storage fees to keep models free for everyone.</p>

</div>

---

91% fewer refusals (9/100 Uncensored vs 100/100 Original) while preserving model quality (0.0467 KL divergence).

โค๏ธ Support My Work

Creating these models takes significant time, work and compute. If you find them useful consider supporting me:

!image/png

| Platform | Link | What you get |

|----------|------|--------------|

| ๐ŸŽ‰ Patreon | Monthly support | Priority model requests |

| โ˜• Ko-fi | One-time tip | My eternal gratitude |

Your help will motivate me and would go into further improving my workflow and coverings fees for storage, compute and may even help uncensoring bigger model with rental Cloud GPUs.

-----

GGUF quantizations of llmfan46/gemma-4-12B-coder-fable5-composer2.5-v1-uncensored-heretic.

This is a decensored version of yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF, made using Heretic v1.4.0 with a variant of the Magnitude-Preserving Orthogonal Ablation (MPOA) method

Abliteration parameters

| Parameter | Value |

| :-------- | :---: |

| direction_index | 28.81 |

| attn.o_proj.max_weight | 0.93 |

| attn.o_proj.max_weight_position | 28.53 |

| attn.o_proj.min_weight | 0.87 |

| attn.o_proj.min_weight_distance | 25.58 |

| mlp.down_proj.max_weight | 1.37 |

| mlp.down_proj.max_weight_position | 33.47 |

| mlp.down_proj.min_weight | 0.28 |

| mlp.down_proj.min_weight_distance | 26.41 |

Targeted components

* attn.o_proj

* mlp.down_proj

Performance

| Metric | This model | Original model (gemma-4-12B-coder-fable5-composer2.5-v1-GGUF) |

| :----- | :--------: | :---------------------------: |

| KL divergence | <span style="color:darkgoldenrod">0.0467</span> | 0 (by definition) |

| Refusals | โœ… <span style="color:darkgreen">9/100</span> | โŒ <span style="color:blue">100/100</span> |

MMLU test results:

<span style="color:blue">Original:</span>

============================================================

  • Total questions: 7021
  • Correct: 5316
  • Accuracy: 0.7572 (75.72%)
  • Parse failures: 90

============================================================

Tested subject scores:

  • professional_law: 0.5987 (470/785)
  • moral_scenarios: 0.6606 (292/442)
  • miscellaneous: 0.8642 (331/383)
  • professional_psychology: 0.8070 (255/316)
  • high_school_psychology: 0.9111 (246/270)
  • high_school_macroeconomics: 0.8325 (164/197)
  • elementary_mathematics: 0.8152 (150/184)
  • moral_disputes: 0.7874 (137/174)
  • prehistory: 0.8488 (146/172)
  • philosophy: 0.7862 (125/159)
  • high_school_biology: 0.9211 (140/152)
  • professional_accounting: 0.6434 (92/143)
  • clinical_knowledge: 0.8286 (116/140)
  • high_school_microeconomics: 0.8750 (119/136)
  • nutrition: 0.7926 (107/135)
  • professional_medicine: 0.7910 (106/134)
  • conceptual_physics: 0.8125 (104/128)
  • high_school_mathematics: 0.3622 (46/127)
  • human_aging: 0.7155 (83/116)
  • security_studies: 0.7857 (88/112)
  • high_school_statistics: 0.7027 (78/111)
  • marketing: 0.9266 (101/109)
  • high_school_world_history: 0.8774 (93/106)
  • sociology: 0.8641 (89/103)
  • high_school_government_and_politics: 0.9109 (92/101)
  • high_school_geography: 0.8990 (89/99)
  • high_school_chemistry: 0.7216 (70/97)
  • high_school_us_history: 0.8526 (81/95)
  • virology: 0.4944 (44/89)
  • college_medicine: 0.8068 (71/88)
  • world_religions: 0.8295 (73/88)
  • high_school_physics: 0.5952 (50/84)
  • electrical_engineering: 0.6667 (54/81)
  • astronomy: 0.8481 (67/79)
  • logical_fallacies: 0.8158 (62/76)
  • high_school_european_history: 0.8219 (60/73)
  • anatomy: 0.7465 (53/71)
  • college_biology: 0.8750 (56/64)
  • human_sexuality: 0.8281 (53/64)
  • formal_logic: 0.6094 (39/64)
  • public_relations: 0.7213 (44/61)
  • international_law: 0.8833 (53/60)
  • college_physics: 0.4912 (28/57)
  • college_mathematics: 0.4364 (24/55)
  • econometrics: 0.7037 (38/54)
  • jurisprudence: 0.7736 (41/53)
  • high_school_computer_science: 0.9038 (47/52)
  • machine_learning: 0.7308 (38/52)
  • medical_genetics: 0.8824 (45/51)
  • global_facts: 0.5098 (26/51)
  • management: 0.9400 (47/50)
  • us_foreign_policy: 0.9400 (47/50)
  • college_chemistry: 0.4255 (20/47)
  • abstract_algebra: 0.5532 (26/47)
  • business_ethics: 0.7174 (33/46)
  • college_computer_science: 0.7333 (33/45)
  • computer_security: 0.7907 (34/43)

<span style="color:darkgreen">Heretic:</span>

============================================================

  • Total questions: 7021
  • Correct: 5276
  • Accuracy: 0.7515 (75.15%)
  • Parse failures: 97

============================================================

Tested subject scores:

  • professional_law: 0.5847 (459/785)
  • moral_scenarios: 0.6335 (280/442)
  • miscellaneous: 0.8642 (331/383)
  • professional_psychology: 0.7975 (252/316)
  • high_school_psychology: 0.9148 (247/270)
  • high_school_macroeconomics: 0.8274 (163/197)
  • elementary_mathematics: 0.8152 (150/184)
  • moral_disputes: 0.7931 (138/174)
  • prehistory: 0.8547 (147/172)
  • philosophy: 0.7799 (124/159)
  • high_school_biology: 0.9079 (138/152)
  • professional_accounting: 0.6294 (90/143)
  • clinical_knowledge: 0.8143 (114/140)
  • high_school_microeconomics: 0.8676 (118/136)
  • nutrition: 0.8000 (108/135)
  • professional_medicine: 0.7537 (101/134)
  • conceptual_physics: 0.7891 (101/128)
  • high_school_mathematics: 0.3622 (46/127)
  • human_aging: 0.7241 (84/116)
  • security_studies: 0.8125 (91/112)
  • high_school_statistics: 0.6847 (76/111)
  • marketing: 0.9174 (100/109)
  • high_school_world_history: 0.8774 (93/106)
  • sociology: 0.8835 (91/103)
  • high_school_government_and_politics: 0.9109 (92/101)
  • high_school_geography: 0.8889 (88/99)
  • high_school_chemistry: 0.7216 (70/97)
  • high_school_us_history: 0.8421 (80/95)
  • virology: 0.4607 (41/89)
  • college_medicine: 0.7841 (69/88)
  • world_religions: 0.8182 (72/88)
  • high_school_physics: 0.5595 (47/84)
  • electrical_engineering: 0.6543 (53/81)
  • astronomy: 0.8734 (69/79)
  • logical_fallacies: 0.8553 (65/76)
  • high_school_european_history: 0.8082 (59/73)
  • anatomy: 0.7606 (54/71)
  • college_biology: 0.8906 (57/64)
  • human_sexuality: 0.8125 (52/64)
  • formal_logic: 0.5938 (38/64)
  • public_relations: 0.6721 (41/61)
  • international_law: 0.9000 (54/60)
  • college_physics: 0.5263 (30/57)
  • college_mathematics: 0.4182 (23/55)
  • econometrics: 0.7037 (38/54)
  • jurisprudence: 0.7736 (41/53)
  • high_school_computer_science: 0.9038 (47/52)
  • machine_learning: 0.7115 (37/52)
  • medical_genetics: 0.8627 (44/51)
  • global_facts: 0.5686 (29/51)
  • management: 0.9200 (46/50)
  • us_foreign_policy: 0.9600 (48/50)
  • college_chemistry: 0.3830 (18/47)
  • abstract_algebra: 0.6170 (29/47)
  • business_ethics: 0.7609 (35/46)
  • college_computer_science: 0.7556 (34/45)
  • computer_security: 0.7907 (34/43)

MMLU - Massive Multitask Language Understanding, multiple-choice questions across 57 subjects (math, history, law, medicine, etc.).

-----

๐Ÿ’ป Gemma4-12B-Coder (GGUF) โ€” Composer 2.5 ร— Fable 5 โœจ

๐Ÿฃ Tiny footprint, big brain โ€” a local coding model for everyone

> No matter your GPU. No matter your RAM. If you've got ~4.5 GB of VRAM or unified memory free,

> you can run your own private, offline coding assistant right now. ๐Ÿš€

> This is the v1 / code edition โ€” distilled from real chain-of-thought so it thinks through a problem

> before writing the solution. ๐Ÿง ๐Ÿ’ป All local, all yours, no API, no cloud.

๐ŸŽฏ What it is

A focused fine-tune of Gemma 4 12B on verifiable Python coding data โ€” every training example's reasoning leads to

code that actually passed its tests. The result reasons in the open (edge cases, complexity, approach) and then

emits a clean, runnable solution. ๐Ÿ’š

---

๐Ÿ“Œ Announcements

๐Ÿš€๐Ÿ”ฅ IT'S HERE โ€” v2 is OUT NOW! v2 has shipped โ€” the GGUF quants are live and ready to run โ†’

grab v2 here. ๐ŸŽ‰

The full safetensors master (build / fine-tune on top) goes up tomorrow. v2 is agentic + coding focused โ€”

the piece v1 was missing.

Here's the result that got me most excited. When I saw v2's tau2-bench telecom result โ€” an agentic tool-use

benchmark where the model has to diagnose โ†’ fix โ†’ verify, exactly like real terminal/debugging work โ€” I literally got

launched out of my chair (โ€ฆokay, kidding ๐Ÿ˜„). The jump in actually solving the problem is wild:

| tau2-bench telecom ยท local, same harness, Q8_0 | score |

|---|---|

| official gemma-4-12B-it (base) | ~15% |

| ๐ŸŸข v2 (this release) | ~55% |

The base model tends to give up early (hands the problem off to a human); v2 keeps going and works it the way a

much bigger model would. Full benchmark details are in the v2 card now. ๐Ÿ”ง

โœ… safetensors master (this v1 model) is UP. Full-precision weights are live โ†’

yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1

โ€” roll your own GGUF / MLX / AWQ quants or fine-tune straight from the master. ๐ŸŽ‰

---

๐Ÿ“ฃ Context length fixed: now 256K (was 131K) โ€” thanks, community! ๐Ÿ’š

A community member spotted that this model was reporting only a 131K context window. That turned out to be

the well-known upstream Gemma 4 metadata bug โ€” Google's initial config.json shipped with

max_position_embeddings: 131072 instead of the real 262144 (256K), and that value got baked into a lot of

downstream finetunes and quants (including this one) before it was fixed upstream.

The weights were always fine โ€” it was purely a metadata field. **All GGUF quants have been re-patched to the

full 256K context** (gemma4.context_length = 262144). Just re-download if you grabbed an earlier copy. ๐Ÿ™

---

๐Ÿ“š Training data (the interesting part ๐Ÿณ)

This is a distillation of two complementary chain-of-thought sources, both over verifiable Python coding tasks

(algorithmic / function-level problems that come with deterministic tests):

  • *๐Ÿฅ‡ Main set โ€” Composer 2.5 real CoT.* Genuine, model-authored reasoning traces. The teacher solved each problem,

its code was run against the task's tests, and only the passing solutions were kept. So the reasoning you're

learning from leads to code that actually works.

  • ๐Ÿฅˆ Aux set โ€” Fable 5 (released today! ๐ŸŽ‰). A clever twist: we took the problems where Composer 2.5 got it wrong

and handed them to Fable 5 to redo โ€” re-deriving a fresh, self-consistent chain-of-thought and a correct

solution, again gated on passing the tests. This recovers the hard cases the main teacher missed. These traces

are synthetic (rationalized CoT), and are tagged separately so the two sources stay distinguishable.

The recipe: real CoT for the bulk of solid coverage, plus synthetic "second-attempt" CoT to patch the failures โ€”

both verified by execution before anything entered training. โœ…

---

๐Ÿ“ฆ Pick your size (GGUF quants)

| Quant | Size | Vibe |

|------|------|------|

| ๐ŸŸข Q2_K | 4.5 GB | tiniest โ€” runs almost anywhere |

| ๐ŸŸก Q3_K_M | 5.7 GB | great for 8 GB VRAM โ€” much better than Q2 |

| ๐Ÿ”ต Q4_K_M | 6.87 GB | the sweet spot ๐Ÿ‘Œ (recommended) |

| ๐ŸŸฃ Q6_K | 9.11 GB | near-lossless |

| โšช Q8_0 | 11.8 GB | basically full quality |

---

๐Ÿงฎ "Will it fit?" โ€” context length cheat-sheet

Rough estimates ๐Ÿค“ (assumes q8_0 KV cache + ~1.5 GB overhead; use q4_0 KV cache for โ‰ˆ2ร— more context!).

Max context is 256K. "โ€”" = won't fit, pick a smaller quant. โœ‚๏ธ

| Your VRAM / unified mem | ๐ŸŸข Q2_K (4.5G) | ๐ŸŸก Q3_K_M (5.7G) | ๐Ÿ”ต Q4_K_M (6.87G) | ๐ŸŸฃ Q6_K (9.11G) | โšช Q8_0 (11.8G) |

|---|---|---|---|---|---|

| 8 GB | ~16K ctx | ~10K | tight (~2โ€“4K) | โ€” | โ€” |

| 12 GB | ~48K | ~38K | ~30K | ~12K | โ€” |

| 16 GB | ~80K | ~72K | ~64K | ~44K | ~22K |

| 24 GB | ~200K | ~160K | ~128K | ~110K | ~88K |

| 32 GB | 256K (max) ๐ŸŽ‰ | 256K | 256K | ~230K | ~190K |

> ๐Ÿ’ก Apple Silicon / integrated GPUs with unified memory count too โ€” same numbers, just slower than a dGPU.

> ๐Ÿ’ก Low on room? Drop a quant or switch KV cache to q4_0 and your context roughly doubles.

---

๐Ÿš€ How to run it (super easy)

Option A โ€” llama.cpp (recommended) ๐Ÿฆ™

  1. Grab a quant above (e.g. โ€ฆ-Q4_K_M.gguf) and llama-server from llama.cpp.

> โš ๏ธ Needs a recent llama.cpp (this is the gemma4_unified architecture โ€” older builds won't load it).

  1. Run a server (Windows .bat shown โ€” tweak --port, --ctx-size to taste):
@echo off
cd /d C:\llama.cpp
llama-server.exe ^
  -m C:\models\gemma4-coding-Q4_K_M.gguf ^
  --ctx-size 16384 ^
  --n-gpu-layers 99 ^
  --no-mmap ^
  -fa on ^
  --cache-type-k q8_0 --cache-type-v q8_0 ^
  --temp 1.0 --top-p 0.95 --top-k 64 ^
  --host 0.0.0.0 --port 18080
pause
  1. Open http://localhost:18080 and chat. ๐ŸŽ‰ (Tip: bump --ctx-size per the table; use q4_0 KV for more.)

Option B โ€” one-click apps ๐Ÿ–ฑ๏ธ

Works in LM Studio, Jan, Ollama, etc. โ€” just import the GGUF, pick your quant, go. ๐Ÿพ

๐Ÿง  Thinking mode

This model thinks in Gemma's native thought channel before answering โ€” exactly how it was trained. Keep

enable_thinking=true (the default chat template handles it). Recommended sampling: temp 1.0, top_p 0.95, top_k 64.

For coding you can also go greedy (temp 0) for more deterministic solutions.

---

โš ๏ธ Good to know

  • Reduced refusals: the training data is task-focused with no safety hedging, so this refuses less than the base

model. It is not safety-aligned โ€” add your own guardrails for production. Use responsibly. ๐Ÿ™

  • Specialized for Python / algorithmic coding. Reasoning quality is strongest in that domain; general-knowledge

facts/numbers should still be double-checked.

  • English-centric.

---

๐Ÿ“š Base & License

  • License: Apache 2.0. Gemma 4 is released by Google under

Apache 2.0 (unlike the older Gemma 1/2/3 terms), so this fine-tune is

Apache 2.0 too โ€” free to use, modify, and redistribute. ๐ŸŽ‰

  • Base model: google/gemma-4-12B-it.
  • Personal/hobby project โ€” shared as-is, no warranty. Have fun, and happy hacking! ๐Ÿพโœจ

Run llmfan46/gemma-4-12B-coder-fable5-composer2.5-v1-uncensored-heretic-GGUF with guIDE

Download guIDE โ€” the AI-native code editor with local LLM inference and 69 built-in tools.

Download guIDE โ†’ ยท Browse 524k+ models ยท Compare models

Source: Hugging Face ยท Compare models