GraySoft
Projects Models About FAQ Contact Download guIDE →

chfm/gemma-4-26b-a4b-it-claude-opus-distill-v2-gguf F16 GGUF - Free GGUF Download is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes. See related models in the same shard below.

Model Intelligence Sheet

chfm/gemma-4-26b-a4b-it-claude-opus-distill-v2-gguf overview

Build Environment & Features: - Fine-tuning Framework: Unsloth - Reasoning Effort: High - This model bridges the gap between Google's exceptional open-weights architecture and Claude 4.6's profound reasoning capabilities, leveraging cutting-edge fine-tuning environments. - v2 fixes some looping or cut off response issues. different training parameters were also used. - This model was able to successfully work inside of Cline, Codex, and Cursor to build funtional web apps and scripts. !Gemma 4 Benchmarks

transformersgguftext-generation-inferenceunslothgemma4reasoningdataset:TeichAI/Claude-Opus-4.6-Reasoning-887xdataset:TeichAI/claude-4.5-opus-high-reasoning-250xdataset:Crownelius/Opus-4.6-Reasoning-2100x-formattedbase_model:TeichAI/gemma-4-26B-A4B-it-Claude-Opus-Distill-v2base_model:quantized:TeichAI/gemma-4-26B-A4B-it-Claude-Opus-Distill-v2license:apache-2.0endpoints_compatibleregion:usconversational
chfm/gemma-4-26b-a4b-it-claude-opus-distill-v2-gguf visual
Downloads
1,516
Likes
0
Pipeline
Library
transformers
Visibility
Public
Access
Open

Repository Files & Downloads

12 files detected
Direct downloads for all repository files
FileTypeQuantizationSizeLink
gemma-4-26B-A4B-it-Claude-Opus-Distill.bf16.gguf GGUF BF16 47.04 GB Download
gemma-4-26B-A4B-it-Claude-Opus-Distill.f16.gguf GGUF F16 47.04 GB Download
gemma-4-26B-A4B-it-Claude-Opus-Distill.iq4_nl.gguf GGUF IQ4_NL 13.58 GB Download
gemma-4-26B-A4B-it-Claude-Opus-Distill.q3_k_m.gguf GGUF Q3_K_M 12.37 GB Download
gemma-4-26B-A4B-it-Claude-Opus-Distill.q3_k_s.gguf GGUF Q3_K_S 11.38 GB Download
gemma-4-26B-A4B-it-Claude-Opus-Distill.q4_k_m.gguf GGUF Q4_K_M 15.64 GB Download
gemma-4-26B-A4B-it-Claude-Opus-Distill.q5_k_m.gguf GGUF Q5_K_M 17.82 GB Download
gemma-4-26B-A4B-it-Claude-Opus-Distill.q6_k.gguf GGUF Q6_K 21.08 GB Download
gemma-4-26B-A4B-it-Claude-Opus-Distill.q8_0.gguf GGUF 25.02 GB Download
mmproj-BF16.gguf GGUF BF16 1.11 GB Download
mmproj-F16.gguf GGUF F16 1.11 GB Download
mmproj-F32.gguf GGUF F32 2.13 GB Download

Model Details Live

Model Slug
chfm/gemma-4-26b-a4b-it-claude-opus-distill-v2-gguf
Author
chfm
Pipeline Task
Library
transformers
Created
2026-04-14
Last Modified
2026-04-14
Gated
No
Private
No
HF SHA
17833113d38ba6e2eb9cffaebeee6c617e5e7b78
License
apache-2.0
Language
Unknown
Base Model
TeichAI/gemma-4-26B-A4B-it-Claude-Opus-Distill-v2

Metadata Inspector

Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "base_model": "TeichAI/gemma-4-26B-A4B-it-Claude-Opus-Distill-v2",
    "tags": [
      "text-generation-inference",
      "transformers",
      "unsloth",
      "gemma4",
      "reasoning"
    ],
    "license": "apache-2.0",
    "datasets": [
      "TeichAI/Claude-Opus-4.6-Reasoning-887x",
      "TeichAI/claude-4.5-opus-high-reasoning-250x",
      "Crownelius/Opus-4.6-Reasoning-2100x-formatted"
    ],
    "frontmatter": {
      "base_model": "TeichAI/gemma-4-26B-A4B-it-Claude-Opus-Distill-v2",
      "tags": [
        "text-generation-inference",
        "transformers",
        "unsloth",
        "gemma4",
        "reasoning"
      ],
      "license": "apache-2.0",
      "datasets": [
        "TeichAI/Claude-Opus-4.6-Reasoning-887x",
        "TeichAI/claude-4.5-opus-high-reasoning-250x",
        "Crownelius/Opus-4.6-Reasoning-2100x-formatted"
      ]
    },
    "hero_image_url": "https://storage.googleapis.com/gweb-uniblog-publish-prod/documents/gemma-4-table_light_Web_with_Arena.jpg",
    "summary": "> **Build Environment & Features:** > - **Fine-tuning Framework**: **Unsloth** > - **Reasoning Effort**: **High** > - This model bridges the gap between Google's exceptional open-weights architecture and Claude 4.6's profound reasoning capabilities, leveraging cutting-edge fine-tuning environments. > - v2 fixes some looping or cut off response issues. different training parameters were also used. > - This model was able to successfully work inside of Cline, Codex, and Cursor to build funtional web apps and scripts. !Gemma 4 Benchmarks",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nbase_model: TeichAI/gemma-4-26B-A4B-it-Claude-Opus-Distill-v2\ntags:\n- text-generation-inference\n- transformers\n- unsloth\n- gemma4\n- reasoning\nlicense: apache-2.0\ndatasets:\n- TeichAI/Claude-Opus-4.6-Reasoning-887x\n- TeichAI/claude-4.5-opus-high-reasoning-250x\n- Crownelius/Opus-4.6-Reasoning-2100x-formatted\n---\n\n# 🌟 Gemma 4 - 26B A4B x Claude Opus 4.6 (v2)\n\n> **Build Environment & Features:**\n> - **Fine-tuning Framework**: **Unsloth**\n> - **Reasoning Effort**: **High**\n> - This model bridges the gap between Google's exceptional open-weights architecture and Claude 4.6's profound reasoning capabilities, leveraging cutting-edge fine-tuning environments.\n> - v2 fixes some looping or cut off response issues. different training parameters were also used.\n> - This model was able to successfully work inside of Cline, Codex, and Cursor to build funtional web apps and scripts.\n\n![Gemma 4 Benchmarks](https://storage.googleapis.com/gweb-uniblog-publish-prod/documents/gemma-4-table_light_Web_with_Arena.jpg)\n\n## 💡 Model Introduction\n**Gemma 4 - 26B A4B x Claude Opus 4.6** is a highly capable model fine-tuned on top of the powerful `unsloth/gemma-4-26B-A4B-it` architecture. The model's core directive is to absorb state-of-the-art reasoning distillation, primarily sourced from Claude-4.6 Opus interactions. \n\nBy utilizing datasets where the reasoning effort was explicitly set to **High**, this model excels in breaking down complex problems and delivering precise, nuanced solutions across a variety of demanding domains.\n\n## 🗺️ Training Pipeline Overview\n\n```text\nBase Model (unsloth/gemma-4-26B-A4B-it)\n │\n ▼\nSupervised Fine-Tuning (SFT) + High-Effort Reasoning Datasets\n │\n ▼\nFinal Model (Gemma 4 - 26B A4B x Claude Opus 4.6)\n````\n\n## 📋 Stage Details & Benchmarks\n\n*Benchmarks coming soon*\n\n**Performance vs Size:**\n\n> **Deep Dive Analysis:** For more comprehensive insights regarding the base capabilities of the Gemma 4 architecture, please refer to [this Analysis Document](https://huggingface.co/TeichAI/gemma-4-31B-it-Claude-Opus-Distill/resolve/main/Gemma%204%20Analysis.pdf).\n\n### 🔹 Supervised Fine-Tuning (Meeting Claude)\n\n  - **Objective:** To inject high-density reasoning logic and establish a strict format for complex problem-solving.\n  - **Methodology:** We utilized **Unsloth** for highly efficient memory and compute optimization during the fine-tuning process. The model was trained extensively on various reasoning trajectories from Claude Opus 4.6 to adopt a structured and efficient thinking pattern.\n\n### 📚 All Datasets Used\n\nThe dataset consists of high-quality, high-effort reasoning distillation data:\n\n| Dataset Name | Description / Purpose |\n|--------------|-----------------------|\n| `TeichAI/Claude-Opus-4.6-Reasoning-887x` | Core Claude 4.6 Opus reasoning trajectories. |\n| `TeichAI/claude-4.5-opus-high-reasoning-250x` | Legacy high-intensity reasoning distillation. |\n| `Crownelius/Opus-4.6-Reasoning-2100x-formatted` | Crownelius's extensively formatted Opus reasoning dataset for structural reinforcement. |\n\n## 🌟 Core Skills & Capabilities\n\nThanks to its robust base model and high-effort reasoning distillation, this model is highly optimized for the following use cases:\n\n1.  **💻 Coding:** Advanced code generation, debugging, and software architecture planning.\n2.  **🔬 Science:** Deep scientific reasoning, hypothesis evaluation, and analytical problem-solving.\n3.  **🔎 Deep Research:** Navigating complex, multi-step research queries and synthesizing vast amounts of information.\n4.  **🧠 General Purpose:** Highly capable instruction-following for everyday tasks requiring high logical coherence.\n\n## Getting Started\n\nYou can use all Gemma 4 models with the latest version of Transformers. To get started, install the necessary dependencies in your environment:\n\n`pip install -U transformers torch accelerate`\n\nOnce you have everything installed, you can proceed to load the model with the code below:\n\n```python\nfrom transformers import AutoProcessor, AutoModelForCausalLM\n\nMODEL_ID = \"google/gemma-4-31B-it\"\n\n# Load model\nprocessor = AutoProcessor.from_pretrained(MODEL_ID)\nmodel = AutoModelForCausalLM.from_pretrained(\n    MODEL_ID,\n    dtype=\"auto\",\n    device_map=\"auto\"\n)\n```\n\nOnce the model is loaded, you can start generating output:\n\n```python\n# Prompt\nmessages = [\n    {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n    {\"role\": \"user\", \"content\": \"Write a short joke about saving RAM.\"},\n]\n\n# Process input\ntext = processor.apply_chat_template(\n    messages, \n    tokenize=False, \n    add_generation_prompt=True, \n    enable_thinking=False\n)\ninputs = processor(text=text, return_tensors=\"pt\").to(model.device)\ninput_len = inputs[\"input_ids\"].shape[-1]\n\n# Generate output\noutputs = model.generate(**inputs, max_new_tokens=1024)\nresponse = processor.decode(outputs[0][input_len:], skip_special_tokens=False)\n\n# Parse output\nprocessor.parse_response(response)\n```\n\nTo enable reasoning, set `enable_thinking=True` and the `parse_response` function will take care of parsing the thinking output.\n\nBelow, you will also find snippets for processing audio (E2B and E4B only), images, and video alongside text:\n\n<details>\n<summary>Code for processing Audio</summary>\n\nInstead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process audio. To use it, make sure to install the following packages:\n\n\n`pip install -U transformers torch librosa accelerate`\n\nYou can then load the model with the code below:\n\n```python\nfrom transformers import AutoProcessor, AutoModelForMultimodalLM\n\nMODEL_ID = \"google/gemma-4-E2B-it\"\n\n# Load model\nprocessor = AutoProcessor.from_pretrained(MODEL_ID)\nmodel = AutoModelForMultimodalLM.from_pretrained(\n    MODEL_ID, \n    dtype=\"auto\", \n    device_map=\"auto\"\n)\n```\n\nOnce the model is loaded, you can start generating output by directly referencing the audio URL in the prompt:\n\n\n```python\n# Prompt - add audio before text\nmessages = [\n    {\n        \"role\": \"user\",\n        \"content\": [\n            {\"type\": \"audio\", \"audio\": \"https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/journal1.wav\"},\n            {\"type\": \"text\", \"text\": \"Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\\n* Only output the transcription, with no newlines.\\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three.\"},\n        ]\n    }\n]\n\n# Process input\ninputs = processor.apply_chat_template(\n    messages,\n    tokenize=True,\n    return_dict=True,\n    return_tensors=\"pt\",\n    add_generation_prompt=True,\n).to(model.device)\ninput_len = inputs[\"input_ids\"].shape[-1]\n\n# Generate output\noutputs = model.generate(**inputs, max_new_tokens=512)\nresponse = processor.decode(outputs[0][input_len:], skip_special_tokens=False)\n\n# Parse output\nprocessor.parse_response(response)\n```\n\n</details>\n\n<details>\n<summary>Code for processing Images</summary>\n\nInstead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process images. To use it, make sure to install the following packages:\n\n\n`pip install -U transformers torch torchvision accelerate`\n\nYou can then load the model with the code below:\n\n```python\nfrom transformers import AutoProcessor, AutoModelForMultimodalLM\n\nMODEL_ID = \"google/gemma-4-31B-it\"\n\n# Load model\nprocessor = AutoProcessor.from_pretrained(MODEL_ID)\nmodel = AutoModelForMultimodalLM.from_pretrained(\n    MODEL_ID, \n    dtype=\"auto\", \n    device_map=\"auto\"\n)\n```\n\nOnce the model is loaded, you can start generating output by directly referencing the image URL in the prompt:\n\n\n```python\n# Prompt - add image before text\nmessages = [\n    {\n        \"role\": \"user\", \"content\": [\n            {\"type\": \"image\", \"url\": \"https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/GoldenGate.png\"},\n            {\"type\": \"text\", \"text\": \"What is shown in this image?\"}\n        ]\n    }\n]\n\n# Process input\ninputs = processor.apply_chat_template(\n    messages,\n    tokenize=True,\n    return_dict=True,\n    return_tensors=\"pt\",\n    add_generation_prompt=True,\n).to(model.device)\ninput_len = inputs[\"input_ids\"].shape[-1]\n\n# Generate output\noutputs = model.generate(**inputs, max_new_tokens=512)\nresponse = processor.decode(outputs[0][input_len:], skip_special_tokens=False)\n\n# Parse output\nprocessor.parse_response(response)\n```\n\n</details>\n\n\n<details>\n<summary>Code for processing Videos</summary>\n\nInstead of using `AutoModelForCausalLM`, you can use `AutoModelForMultimodalLM` to process videos. To use it, make sure to install the following packages:\n\n`pip install -U transformers torch torchvision torchcodec librosa accelerate`\n\nYou can then load the model with the code below:\n\n```python\nfrom transformers import AutoProcessor, AutoModelForMultimodalLM\n\nMODEL_ID = \"google/gemma-4-31B-it\"\n\n# Load model\nprocessor = AutoProcessor.from_pretrained(MODEL_ID)\nmodel = AutoModelForMultimodalLM.from_pretrained(\n    MODEL_ID, \n    dtype=\"auto\", \n    device_map=\"auto\"\n)\n```\n\nOnce the model is loaded, you can start generating output by directly referencing the video URL in the prompt:\n\n\n```python\n# Prompt - add video before text\nmessages = [\n    {\n        'role': 'user',\n        'content': [\n            {\"type\": \"video\", \"video\": \"https://github.com/bebechien/gemma/raw/refs/heads/main/videos/ForBiggerBlazes.mp4\"},\n            {'type': 'text', 'text': 'Describe this video.'}\n        ]\n    }\n]\n\n# Process input\ninputs = processor.apply_chat_template(\n    messages,\n    tokenize=True,\n    return_dict=True,\n    return_tensors=\"pt\",\n    add_generation_prompt=True,\n).to(model.device)\ninput_len = inputs[\"input_ids\"].shape[-1]\n\n# Generate output\noutputs = model.generate(**inputs, max_new_tokens=512)\nresponse = processor.decode(outputs[0][input_len:], skip_special_tokens=False)\n\n# Parse output\nprocessor.parse_response(response)\n```\n\n</details>\n\n## **Best Practices**\n\nFor the best performance, use these configurations and best practices:\n\n### 1. Sampling Parameters\n\nUse the following standardized sampling configuration across all use cases:\n\n* `temperature=1.0`  \n* `top_p=0.95`  \n* `top_k=64`\n\n### 2. Thinking Mode Configuration\n\nCompared to Gemma 3, the models use standard `system`, `assistant`, and `user` roles. To properly manage the thinking process, use the following control tokens:\n\n* **Trigger Thinking:** Thinking is enabled by including the `<|think|>` token at the start of the system prompt. To disable thinking, remove the token.   \n* **Standard Generation:** When thinking is enabled, the model will output its internal reasoning followed by the final answer using this structure:  \n  `<|channel>thought\\n`**[Internal reasoning]**`<channel|>`  \n* **Disabled Thinking Behavior:** For all models except for the E2B and E4B variants, if thinking is disabled, the model will still generate the tags but with an empty thought block:  \n  `<|channel>thought\\n<channel|>`**[Final answer]**\n\n> [!Note]\n> Note that many libraries like Transformers and llama.cpp handle the complexities of the chat template for you.\n\n### 3. Multi-Turn Conversations\n\n* **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final response. Thoughts from previous model turns must *not be added* before the next user turn begins.\n\n### 4. Modality order\n\n* For optimal performance with multimodal inputs, place image and/or audio content **before** the text in your prompt. \n\n### 5. Variable Image Resolution\n\nAside from variable aspect ratios, Gemma 4 supports variable image resolution through a configurable visual token budget, which controls how many tokens are used to represent an image. A higher token budget preserves more visual detail at the cost of additional compute, while a lower budget enables faster inference for tasks that don't require fine-grained understanding.\n\n* The supported token budgets are: **70**, **140**, **280**, **560**, and **1120**.  \n  * Use *lower budgets* for classification, captioning, or video understanding, where faster inference and processing many frames outweigh fine-grained detail.   \n  * Use *higher budgets* for tasks like OCR, document parsing, or reading small text.\n\n### 6. Audio\n\nUse the following prompt structures for audio processing:\n\n* **Audio Speech Recognition (ASR)**\n\n```text\nTranscribe the following speech segment in {LANGUAGE} into {LANGUAGE} text.\n\nFollow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three.\n```\n\n* **Automatic Speech Translation (AST)**\n\n```text\nTranscribe the following speech segment in {SOURCE_LANGUAGE}, then translate it into {TARGET_LANGUAGE}.\nWhen formatting the answer, first output the transcription in {SOURCE_LANGUAGE}, then one newline, then output the string '{TARGET_LANGUAGE}: ', then the translation in {TARGET_LANGUAGE}.\n```\n\n### 7. Audio and Video Length\n\nAll models support image inputs and can process videos as frames whereas the E2B and E4B models also support audio inputs. Audio supports a maximum length of 30 seconds. Video supports a maximum of 60 seconds assuming the images are processed at one frame per second.\n\n## 🙏 Acknowledgements\n\n  - **Google**: For providing an exceptional open weights model. Read more about Gemma 4 on the [Google Innovation Blog](https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/).\n  - **Unsloth**: For assembling ready-to-use, cutting-edge fine-tuning environments that make this work possible.\n  - **Crownelius**: For creating and sharing his awesome Opus reasoning dataset with the community.\n\n\n## 📖 Citation\n\nIf you use this model in your research or projects, please cite:\n\n```bibtex\n@misc{teichai_gemma4_26b_a4b_opus_distilled_v2,\n  title        = {Gemma-4-26B-A4B-it-Claude-Opus-Distill-v2},\n  author       = {TeichAI},\n  year         = {2026},\n  publisher    = {Hugging Face},\n  howpublished = {\\url{https://huggingface.co/TeichAI/gemma-4-26B-A4B-it-Claude-Opus-Distill-v2}}\n}\n```\n",
    "related_quantizations": []
  },
  "tags": [
    "transformers",
    "gguf",
    "text-generation-inference",
    "unsloth",
    "gemma4",
    "reasoning",
    "dataset:TeichAI/Claude-Opus-4.6-Reasoning-887x",
    "dataset:TeichAI/claude-4.5-opus-high-reasoning-250x",
    "dataset:Crownelius/Opus-4.6-Reasoning-2100x-formatted",
    "base_model:TeichAI/gemma-4-26B-A4B-it-Claude-Opus-Distill-v2",
    "base_model:quantized:TeichAI/gemma-4-26B-A4B-it-Claude-Opus-Distill-v2",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 0,
  "downloads": 1516,
  "gated": false,
  "private": false,
  "last_modified": "2026-04-14T09:26:36.000Z",
  "created_at": "2026-04-14T09:26:36.000Z",
  "pipeline_tag": "",
  "library_name": "transformers"
}
Source payload excerpt (from Hugging Face API)
{
  "_id": "69de084cd63573aee7cfc0ad",
  "id": "chfm/gemma-4-26B-A4B-it-Claude-Opus-Distill-v2-GGUF",
  "modelId": "chfm/gemma-4-26B-A4B-it-Claude-Opus-Distill-v2-GGUF",
  "sha": "17833113d38ba6e2eb9cffaebeee6c617e5e7b78",
  "createdAt": "2026-04-14T09:26:36.000Z",
  "lastModified": "2026-04-14T09:26:36.000Z",
  "author": "chfm",
  "downloads": 1516,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "transformers",
  "siblings_count": 14
}