The Q6_K GGUF build of khazarai/qwen3-4b-kimi2.5-reasoning-distilled-gguf is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. Use this page to pick a local model for guIDE or other runtimes. Related models in the same shard are listed below.

Model Intelligence Sheet

khazarai/qwen3-4b-kimi2.5-reasoning-distilled-gguf overview

Comprehensive model page for khazarai/qwen3-4b-kimi2.5-reasoning-distilled-gguf

Tags: gguf, qwen3, llama.cpp, unsloth, reasoning, distillation, sft, text-generation, en, dataset:khazarai/kimi-2.5-high-reasoning-250x, base_model:khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled, base_model:quantized:khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled, license:apache-2.0, endpoints_compatible, region:us, conversational

Downloads

7,912

Likes

5

Pipeline

text-generation

Library

—

Visibility

Public

Access

Open

Repository Files & Downloads

4 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
qwen3-4b-thinking-2507.BF16.gguf	GGUF	BF16	7.50 GB	Download
qwen3-4b-thinking-2507.Q4_K_M.gguf	GGUF	Q4_K_M	2.33 GB	Download
qwen3-4b-thinking-2507.Q6_K.gguf	GGUF	Q6_K	3.08 GB	Download
qwen3-4b-thinking-2507.Q8_0.gguf	GGUF	Q8_0	3.99 GB	Download
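
When choosing among these files, two quick calculations can help: an approximate bits-per-weight figure for each quant, and the largest file that fits a given size budget. The sketch below uses only the sizes from the table. It assumes the BF16 file stores 16 bits per weight and ignores GGUF metadata overhead, so the numbers are rough estimates. The function names are illustrative and not part of any tool mentioned on this page.

```python
# File sizes in GB, copied from the table above.
SIZES_GB = {
    "BF16": 7.50,
    "Q8_0": 3.99,
    "Q6_K": 3.08,
    "Q4_K_M": 2.33,
}


def approx_bits_per_weight(quant, sizes=SIZES_GB):
    """Estimate bits per weight by scaling against the BF16 file.

    Assumption: the BF16 file holds 16 bits per weight and metadata
    overhead is negligible.
    """
    return 16.0 * sizes[quant] / sizes["BF16"]


def largest_quant_within(budget_gb, sizes=SIZES_GB):
    """Return the name of the largest file that fits the budget, or None."""
    fitting = {name: size for name, size in sizes.items() if size <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for name in SIZES_GB:
        print(f"{name}: ~{approx_bits_per_weight(name):.2f} bits/weight")
    print("Largest file under 3.5 GB:", largest_quant_within(3.5))
```

File size is only a lower bound on memory use. A runtime also needs room for the context cache and its own buffers, so leave headroom beyond the figure used as the budget.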

Model Details

Model Slug

khazarai/qwen3-4b-kimi2.5-reasoning-distilled-gguf

Author

khazarai

Pipeline Task

text-generation

Library

—

Created

2026-03-16

Last Modified

2026-04-15

Gated

No

Private

No

HF SHA

4c65129ebe481d33203d7b7fd65b3c2a930ffad3

License

apache-2.0

Language

en

Base Model

khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "tags": [
      "gguf",
      "llama.cpp",
      "unsloth",
      "reasoning",
      "distillation",
      "sft"
    ],
    "license": "apache-2.0",
    "datasets": [
      "khazarai/kimi-2.5-high-reasoning-250x"
    ],
    "language": [
      "en"
    ],
    "base_model": [
      "khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled"
    ],
    "pipeline_tag": "text-generation",
    "frontmatter": {
      "tags": [
        "gguf",
        "llama.cpp",
        "unsloth",
        "reasoning",
        "distillation",
        "sft"
      ],
      "license": "apache-2.0",
      "datasets": [
        "khazarai/kimi-2.5-high-reasoning-250x"
      ],
      "language": [
        "en"
      ],
      "base_model": [
        "khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled"
      ],
      "pipeline_tag": "text-generation"
    },
    "hero_image_url": "benchmark/Kimi_distilled.png",
    "summary": "",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\ntags:\n- gguf\n- llama.cpp\n- unsloth\n- reasoning\n- distillation\n- sft\nlicense: apache-2.0\ndatasets:\n- khazarai/kimi-2.5-high-reasoning-250x\nlanguage:\n- en\nbase_model:\n- khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled\npipeline_tag: text-generation\n---\n\n# Qwen3-4B-Kimi2.5-Reasoning-Distilled : GGUF\n\n## Model: khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled\n\n![alt=\"General Benchmark Comparison Chart\"](benchmark/Kimi_distilled.png)\n\n- **Success Rate**: 76.09%\n\n## Model: Qwen/Qwen3-4B-Thinking-2507\n\n![alt=\"General Benchmark Comparison Chart\"](benchmark/BaseModel.png)\n\n- **Success Rate**: 73.73%\n\n- **Benchmark**: khazarai/Multi-Domain-Reasoning-Benchmark\n- **Total Questions**: 100\n\n`Qwen3-4B-Kimi2.5-Reasoning-Distilled` is a fine-tuned language model optimized for structured, long-form reasoning. It is derived from the Qwen3-4b-Thinking-2507 base model and fine-tuned using a specialized distillation dataset generated by Kimi-2.5-thinking.\n\nThis model is designed to bridge the gap between small, efficient models (0.6B–4B range) and the complex reasoning capabilities typically found in much larger models. It excels at breaking down problems, self-correcting, and providing detailed analytical answers.\n\n**Base Model**:\tQwen3-4b-Thinking-2507\n\n**Training Technique**:\tUnsloth + QLoRa\n \n\n## Available Model files:\n- `qwen3-4b-thinking-2507.BF16.gguf`\n- `qwen3-4b-thinking-2507.Q8_0.gguf`\n- `qwen3-4b-thinking-2507.Q6_K.gguf`\n- `qwen3-4b-thinking-2507.Q4_K_M.gguf`\n\n## Ollama\nAn Ollama Modelfile is included for easy deployment.\n \n\n## Provided Quants\n\n(sorted by size, not necessarily quality. 
IQ-quants are often preferable over similar sized non-IQ quants)\n\n| Type | Size/GB | Notes |\n|:-----|--------:|:------|\n| Q4_K_M | 2.5 | fast, recommended |\n| Q6_K | 3.3 | very good quality |\n| Q8_0 | 4.2 | fast, best quality |\n| f16 | 8.0 | 16 bpw, overkill |\n\nHere is a handy graph by ikawrakow comparing some lower-quality quant\ntypes (lower is better):\n\n![image.png](https://www.nethype.de/huggingface_embed/quantpplgraph.png)\n\nAnd here are Artefact2's thoughts on the matter:\nhttps://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9\n \n\n## Dataset\n\nThe model was fine-tuned on the [khazarai/kimi-2.5-high-reasoning-250x](https://huggingface.co/datasets/khazarai/kimi-2.5-high-reasoning-250x)  \n\nDataset Composition:\n - Total Samples: 250\n - Total Tokens: 1,114,407\n - Teacher Model: Kimi-2.5-Thinking\n\n## Acknowledgements\n\n**Unsloth** for the incredibly fast and memory-efficient training framework.",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "qwen3",
    "llama.cpp",
    "unsloth",
    "reasoning",
    "distillation",
    "sft",
    "text-generation",
    "en",
    "dataset:khazarai/kimi-2.5-high-reasoning-250x",
    "base_model:khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled",
    "base_model:quantized:khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 5,
  "downloads": 7912,
  "gated": false,
  "private": false,
  "last_modified": "2026-04-15T11:26:09.000Z",
  "created_at": "2026-03-16T16:28:00.000Z",
  "pipeline_tag": "text-generation",
  "library_name": ""
}
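
The top-level "tags" array mixes plain labels (such as "gguf") with namespaced entries of the form key:value (such as "license:apache-2.0"). A minimal sketch of separating the two is shown below. It splits on the first colon only, so "base_model:quantized:..." keeps "quantized:..." as its value. The helper name split_tags is hypothetical.

```python
from collections import defaultdict

# Tags copied from the normalized metadata above.
TAGS = [
    "gguf", "qwen3", "llama.cpp", "unsloth", "reasoning", "distillation",
    "sft", "text-generation", "en",
    "dataset:khazarai/kimi-2.5-high-reasoning-250x",
    "base_model:khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled",
    "base_model:quantized:khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled",
    "license:apache-2.0", "endpoints_compatible", "region:us",
    "conversational",
]


def split_tags(tags):
    """Separate plain tags from namespaced 'key:value' tags."""
    plain = []
    namespaced = defaultdict(list)
    for tag in tags:
        # partition() splits on the first colon only.
        key, sep, value = tag.partition(":")
        if sep:
            namespaced[key].append(value)
        else:
            plain.append(tag)
    return plain, dict(namespaced)


if __name__ == "__main__":
    plain, namespaced = split_tags(TAGS)
    print("plain:", plain)
    for key, values in namespaced.items():
        print(f"{key}: {values}")
```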

Source payload excerpt (from Hugging Face API)

{
  "_id": "69b82f90c8973a232d994b63",
  "id": "khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled-GGUF",
  "modelId": "khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled-GGUF",
  "sha": "4c65129ebe481d33203d7b7fd65b3c2a930ffad3",
  "createdAt": "2026-03-16T16:28:00.000Z",
  "lastModified": "2026-04-15T11:26:09.000Z",
  "author": "khazarai",
  "downloads": 7912,
  "likes": 5,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "",
  "siblings_count": 10
}
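
The "id" and "sha" fields in this payload are enough to build a download link pinned to the exact revision indexed here. The sketch below assumes the usual Hugging Face URL layout of repo id, then "resolve", then a revision, then the file name. This page does not state the actual targets of its Download links, so treat the result as an illustration. The helper name resolve_url is hypothetical.

```python
from urllib.parse import quote

HF_BASE = "https://huggingface.co"

# Values copied from the payload excerpt and the file table on this page.
REPO_ID = "khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled-GGUF"
SHA = "4c65129ebe481d33203d7b7fd65b3c2a930ffad3"
FILENAME = "qwen3-4b-thinking-2507.Q6_K.gguf"


def resolve_url(repo_id, filename, revision="main"):
    """Build a file URL, assuming the Hugging Face 'resolve' path layout."""
    return (
        f"{HF_BASE}/{quote(repo_id)}"        # '/' in the repo id is kept as-is
        f"/resolve/{quote(revision, safe='')}"
        f"/{quote(filename)}"
    )


if __name__ == "__main__":
    # Pinning to the commit SHA keeps the link stable if the repo changes.
    print(resolve_url(REPO_ID, FILENAME, revision=SHA))
```

Pinning to the SHA rather than "main" means the link keeps pointing at the files described on this page even if the repository is updated later.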

More models in this shard