khtsly/qwen3.5-9b-claude-4.6-opus-distilled-32k-gguf (Q3_K_L GGUF, free download) is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. Use this page to choose a local model for guIDE or other runtimes. Related models in the same shard are listed below.

Model Intelligence Sheet

khtsly/qwen3.5-9b-claude-4.6-opus-distilled-32k-gguf overview

> Retrain Notice (2026-03-07): This model was retrained from scratch again to address the high final loss observed in the initial QLoRA version. The upgrade to 16-bit LoRA with r=128 and rsLoRA enabled has resulted in a much lower final loss and a more stable, "lossless" transfer of reasoning capabilities.
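
The effect of the rsLoRA switch mentioned in the notice can be illustrated with a few lines of arithmetic. This is a minimal sketch, not the author's training code: it assumes the usual definitions, where standard LoRA multiplies the low-rank update by alpha / r and rank-stabilized LoRA by alpha / sqrt(r), and it takes alpha = 128 from the model card embedded further down this page. The function name `lora_scale` is hypothetical.

```python
import math


def lora_scale(alpha: float, r: int, rank_stabilized: bool = False) -> float:
    """Return the multiplier applied to the low-rank update B @ A.

    Standard LoRA uses alpha / r; rank-stabilized LoRA (rsLoRA) uses
    alpha / sqrt(r), so the update is not shrunk as the rank grows.
    """
    if rank_stabilized:
        return alpha / math.sqrt(r)
    return alpha / r


# Values reported for the retrain: r = 128, alpha = 128 (alpha is from the model card).
r, alpha = 128, 128
print("standard LoRA scale:", lora_scale(alpha, r))
print("rsLoRA scale:", round(lora_scale(alpha, r, rank_stabilized=True), 4))
```

With these settings the standard scale is 1, while the rank-stabilized scale equals sqrt(128), roughly 11.3, which is why the two configurations are not interchangeable even at the same rank and alpha.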

gguf, qwen3_5, unsloth, qwen, qwen3.5, reasoning, chain-of-thought, lora, luau, llama.cpp, vision-language-model, image-text-to-text, en, zh, dataset:nohurry/Opus-4.6-Reasoning-3000x-filtered, base_model:Qwen/Qwen3.5-9B, base_model:adapter:Qwen/Qwen3.5-9B, license:apache-2.0, endpoints_compatible, region:us, conversational

Downloads

3,441

Likes

5

Pipeline

image-text-to-text

Library

—

Visibility

Public

Access

Open

Repository Files & Downloads

16 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
Qwen3.5-9B-Claude-4.6-Opus-Distilled-32k.BF16-mmproj.gguf	GGUF	BF16	879.01 MB	Download
Qwen3.5-9B-Claude-4.6-Opus-Distilled-32k.Q4_0.gguf	GGUF	Q4_0	4.95 GB	Download
Qwen3.5-9B-Claude-4.6-Opus-Distilled-32k.Q4_1.gguf	GGUF	Q4_1	5.41 GB	Download
Qwen3.5-9B-Claude-4.6-Opus-Distilled-32k.Q4_K_M.gguf	GGUF	Q4_K_M	5.24 GB	Download
Qwen3.5-9B-Claude-4.6-Opus-Distilled-32k.Q4_K_S.gguf	GGUF	Q4_K_S	4.97 GB	Download
Qwen3.5-9B.BF16-mmproj.gguf	GGUF	BF16	879.01 MB	Download
Qwen3.5-9B.Q2_K.gguf	GGUF	Q2_K	3.39 GB	Download
Qwen3.5-9B.Q3_K_L.gguf	GGUF	Q3_K_L	4.49 GB	Download
Qwen3.5-9B.Q3_K_M.gguf	GGUF	Q3_K_M	4.30 GB	Download
Qwen3.5-9B.Q3_K_S.gguf	GGUF	Q3_K_S	3.97 GB	Download
Qwen3.5-9B.Q5_0.gguf	GGUF	Q5_0	5.87 GB	Download
Qwen3.5-9B.Q5_1.gguf	GGUF	Q5_1	6.33 GB	Download
Qwen3.5-9B.Q5_K_M.gguf	GGUF	Q5_K_M	6.07 GB	Download
Qwen3.5-9B.Q5_K_S.gguf	GGUF	Q5_K_S	5.87 GB	Download
Qwen3.5-9B.Q6_K.gguf	GGUF	Q6_K	6.85 GB	Download
Qwen3.5-9B.Q8_0.gguf	GGUF	Q8_0	8.87 GB	Download
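
One way to use the table above is to pick the largest quant that fits a size budget and then build its download link. The sketch below is illustrative only: the helper names `pick_quant` and `download_url` are hypothetical, the sizes are copied from a subset of the rows above, and it assumes the files sit at the repository root on the `main` branch so that the standard Hugging Face `resolve` URL pattern applies. File size is only a lower bound on memory use, since the KV cache and the optional mmproj file need additional room.

```python
from typing import Optional

REPO_ID = "khtsly/Qwen3.5-9B-Claude-4.6-Opus-Distilled-32k-GGUF"

# (filename, size in GB) copied from the table above; mmproj files are omitted.
QUANTS = [
    ("Qwen3.5-9B.Q2_K.gguf", 3.39),
    ("Qwen3.5-9B.Q3_K_S.gguf", 3.97),
    ("Qwen3.5-9B.Q3_K_M.gguf", 4.30),
    ("Qwen3.5-9B.Q3_K_L.gguf", 4.49),
    ("Qwen3.5-9B-Claude-4.6-Opus-Distilled-32k.Q4_K_M.gguf", 5.24),
    ("Qwen3.5-9B.Q5_K_M.gguf", 6.07),
    ("Qwen3.5-9B.Q6_K.gguf", 6.85),
    ("Qwen3.5-9B.Q8_0.gguf", 8.87),
]


def pick_quant(budget_gb: float) -> Optional[str]:
    """Return the largest listed file whose size fits the budget, or None."""
    fitting = [(size, name) for name, size in QUANTS if size <= budget_gb]
    return max(fitting)[1] if fitting else None


def download_url(filename: str, revision: str = "main") -> str:
    """Build a direct-download URL (assumes the file is at the repo root)."""
    return f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{filename}"


choice = pick_quant(4.5)
if choice is not None:
    print(choice)
    print(download_url(choice))
```

A budget of 4.5 GB selects the Q3_K_L file (4.49 GB), the quant this page is indexed under; a budget below 3.39 GB returns `None` because nothing in the list fits.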

Model Details Live

Model Slug

khtsly/qwen3.5-9b-claude-4.6-opus-distilled-32k-gguf

Author

khtsly

Pipeline Task

image-text-to-text

Library

—

Created

2026-03-04

Last Modified

2026-03-22

Gated

No

Private

No

HF SHA

70056ff11f097ea15a0b746fe06cbb82685ce747

License

apache-2.0

Language

en, zh

Base Model

Qwen/Qwen3.5-9B

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "language": [
      "en",
      "zh"
    ],
    "license": "apache-2.0",
    "base_model": "Qwen/Qwen3.5-9B",
    "tags": [
      "unsloth",
      "qwen",
      "qwen3.5",
      "reasoning",
      "chain-of-thought",
      "lora",
      "luau",
      "gguf",
      "llama.cpp",
      "vision-language-model"
    ],
    "datasets": [
      "nohurry/Opus-4.6-Reasoning-3000x-filtered"
    ],
    "pipeline_tag": "image-text-to-text",
    "frontmatter": {
      "language": [
        "en",
        "zh"
      ],
      "license": "apache-2.0",
      "base_model": "Qwen/Qwen3.5-9B",
      "tags": [
        "unsloth",
        "qwen",
        "qwen3.5",
        "reasoning",
        "chain-of-thought",
        "lora",
        "luau",
        "gguf",
        "llama.cpp",
        "vision-language-model"
      ],
      "datasets": [
        "nohurry/Opus-4.6-Reasoning-3000x-filtered"
      ],
      "pipeline_tag": "image-text-to-text"
    },
    "hero_image_url": "",
    "summary": "> [!Note] > **Retrain Notice (2026-03-07):** > This model was retrained from scratch again to address the high final loss observed in the initial QLoRA version. The upgrade to **16-bit LoRA** with **r=128** and **rsLoRA** enabled has resulted in a much lower final loss and a more stable, \"lossless\" transfer of reasoning capabilities.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlanguage:\n- en\n- zh\nlicense: apache-2.0\nbase_model: Qwen/Qwen3.5-9B\ntags:\n- unsloth\n- qwen\n- qwen3.5\n- reasoning\n- chain-of-thought\n- lora\n- luau\n- gguf\n- llama.cpp\n- vision-language-model\ndatasets:\n- nohurry/Opus-4.6-Reasoning-3000x-filtered\npipeline_tag: image-text-to-text\n---\n\n# Qwen3.5-9B-Claude-4.6-Opus-Distilled-32k\n\n> [!Note]\n> **Retrain Notice (2026-03-07):** > This model was retrained from scratch again to address the high final loss observed in the initial QLoRA version. The upgrade to **16-bit LoRA** with **r=128** and **rsLoRA** enabled has resulted in a much lower final loss and a more stable, \"lossless\" transfer of reasoning capabilities.\n\n## # Model Introduction\n**Qwen3.5-9B-Claude-4.6-Opus-Distilled-32k** is a highly capable reasoning and coding model fine-tuned on top of the `Qwen3.5-9B` hybrid dense architecture. The model's core directive is to leverage state-of-the-art Chain-of-Thought (CoT) distillation primarily sourced from Claude-4.6 Opus interactions, with a specialized focus on extended output generation and improved Luau programming capability.\n\nThrough Supervised Fine-Tuning (SFT) focusing on structured reasoning logic and a massive 32k output length max, this model excels in breaking down complex user problems, planning step-by-step methodologies within strictly formatted `<think>` tags, and delivering comprehensive, nuanced solutions—even for highly extensive generation tasks.\n\n### # Benchmark\n| Benchmark | Baseline (9B) | Distilled (9B) |\n| :--- | :---: | :---: |\n| GPQA Diamond (0-shot) | **46.46** | 38.38 |\n| ARC-Challenge (25-shot) | 67.57 | **68.43** |\n| HellaSwag (0-shot) | **76.30** | 76.19 |\n| MMLU Overall (0-shot) | 1.07 | **12.59** |\n| *Humanities* | 2.25 | **21.49** |\n| *Social Sciences* | 0.49 | **7.70** |\n| *STEM* | 0.35 | **6.47** |\n| *Other* | 0.58 | **10.17** |\n| *U.S. 
History* | 22.06 | **78.43** |\n| *World History* | 16.88 | **71.31** |\n\n*The benchmark is taken in 8-bit & 0.0 temperature using `lm eval`.*\n*Only `GPQA` & `ARC-Challenge` running in quantized 4-bit.*\n*Higher the score is better.*\n\n## # Training Pipeline Overview\n\n```text\nBase Model (Qwen3.5-9B)\n │\n ▼\nSupervised Fine-Tuning (SFT) + LoRA 16-bit (r=128, α=128, rsLoRA)\n(Response-Only Training masked on \"<|im_start|>assistant\\n\")\n(Max 32k Output Length)\n+\nnohurry/Opus-4.6-Reasoning-3000x-filtered + luau coding samples\n(shuffled)\n │\n ▼\nFinal Model (Qwen3.5-9B-Claude-4.6-Opus-Distilled-32k)\n```\n\n### # Supervised Fine-Tuning (SFT) Details\n- **Objective:** To inject high-density reasoning logic, establish a strict internal thinking format prior to output, and train the model to sustain coherent generation over exceptionally long contexts.\n- **Extended Output Capacity:** Trained specifically to handle up to **32,768 (32k) tokens of maximum output** (recommended), allowing for massive codebases, comprehensive essays, and deeply detailed reasoning traces.\n- **LoRA Configuration:** Fine-tuned efficiently using LoRA (16-bit) with both **Rank (r) set to 128** and **Alpha (α) set to 128**, ensuring strong adaptation and retention of complex Opus-level logic.\n- **Rank Scaling (rsLoRA):** Enabled **Rank-Stabilized LoRA**. This uses a specialized scaling factor (1/√r) which allows for the higher rank of 128 to be utilized effectively without exploding gradients, leading to a significantly lower and more stable final loss.\n- **Method:** Utilized **Unsloth** for highly efficient memory and compute optimization. 
A critical component was the `train_on_responses_only` strategy, masking instructions so the loss is purely calculated over the generation of the `<think>` sequences and the subsequent solutions.\n- **Format Enforcement:** All training samples were systematically normalized so the model strictly abides by the structure `<think> {internal reasoning} </think>\\n {final answer}`.\n\n### # Datasets Used\nThe dataset consists of highly curated, filtered reasoning distillation data, supplemented by specialized coding sets:\n\n| Dataset Name | Description / Purpose |\n|--------------|-----------------------|\n| [nohurry/Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) | Provides comprehensive, high-quality Claude 4.6 Opus reasoning trajectories. |\n| **Custom Luau Coding Set** | 75 meticulously crafted various Luau coding samples generated natively by Opus 4.6, injecting specialized high-quality domain knowledge for Roblox/Luau scripting capability. |\n\n### # Training Compute & Loss Curve\n* **Hardware:** 1x NVIDIA A100 (80GB)\n* **Training Duration:** ~2 Hour (previously was ~4 Hours)\n* **Estimated Total Cost:** $2.50 (previously was $3.50)\n* **Distillation Efficacy:** The loss curve demonstrated a strong, healthy downward trajectory throughout the run, confirming successful knowledge transfer from the Opus teacher model. The model converged steadily from an initial loss of **0.614357** down to a final loss of **0.222413**.\n\n## # Core Skills & Capabilities\n1. **Massive Output Generation:** Capable of sustaining coherent, high-quality output for up to 32k tokens, making it ideal for writing extensive code, documentation, or deep analytical reports in a single shot.\n2. **Modular & Structured Thinking:** Inheriting traits from Opus-level reasoning, the model confidently parses prompts and outlines plans sequentially in its `<think>` block, avoiding exploratory \"trial-and-error\" self-doubt.\n3. 
**Luau Proficiency:** Thanks to the targeted 75-sample dataset, the model exhibits improved syntax adherence and logic formulation for the Luau programming language.\n\n## # Limitations & Intended Use\n- **Hallucination Risk:** While reasoning is strong, the model remains an autoregressive LLM. Extended 32k outputs may experience minor drift or hallucinate external facts if relying on real-world verification without grounding.\n- **Intended Scenario:** Best suited for offline analytical tasks, heavy coding (especially Luau), math, and logic-dependent prompting where the user needs transparent internal logic and extremely long, continuous outputs.\n\n## # Acknowledgements\n\nThis model's development was made possible by the foundational tools and contributions from the broader AI ecosystem:\n\n* **[Unsloth AI](https://unsloth.ai/):** For their state-of-the-art framework, enabling highly efficient, memory-optimized LoRA tuning and seamless 32k context scaling.\n* **Qwen Team:** For engineering the robust and highly capable `Qwen3.5-9B` dense base architecture.\n* **Dataset Contributors:** Special recognition to `nohurry` for the rigorous curation of the Claude 4.6 Opus reasoning trajectories, which serves as the core cognitive engine for this project's SFT phase.\n\n-https://ko-fi.com/khtsly",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "qwen3_5",
    "unsloth",
    "qwen",
    "qwen3.5",
    "reasoning",
    "chain-of-thought",
    "lora",
    "luau",
    "llama.cpp",
    "vision-language-model",
    "image-text-to-text",
    "en",
    "zh",
    "dataset:nohurry/Opus-4.6-Reasoning-3000x-filtered",
    "base_model:Qwen/Qwen3.5-9B",
    "base_model:adapter:Qwen/Qwen3.5-9B",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 5,
  "downloads": 3441,
  "gated": false,
  "private": false,
  "last_modified": "2026-03-22T20:29:20.000Z",
  "created_at": "2026-03-04T18:12:59.000Z",
  "pipeline_tag": "image-text-to-text",
  "library_name": ""
}
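
The model card stored in `readme_markdown` above states that outputs follow the structure `<think> {internal reasoning} </think>` followed by the final answer. A client that wants to show or log the two parts separately could split them as sketched below. This is an assumption-laden example rather than anything shipped with the model: the function name `split_reasoning` is hypothetical, and a response without a complete `<think>` block is simply treated as answer-only.

```python
import re
from typing import Tuple

# Non-greedy match so only the first <think>...</think> block is captured;
# DOTALL lets the reasoning span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split a model response into (reasoning, final_answer)."""
    match = THINK_RE.search(text)
    if match is None:
        # No complete think block: treat everything as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


sample = "<think> List the steps, then summarize. </think>\n The answer is 4."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```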

Source payload excerpt (from Hugging Face API)

{
  "_id": "69a8762be1b0210851dfebba",
  "id": "khtsly/Qwen3.5-9B-Claude-4.6-Opus-Distilled-32k-GGUF",
  "modelId": "khtsly/Qwen3.5-9B-Claude-4.6-Opus-Distilled-32k-GGUF",
  "sha": "70056ff11f097ea15a0b746fe06cbb82685ce747",
  "createdAt": "2026-03-04T18:12:59.000Z",
  "lastModified": "2026-03-22T20:29:20.000Z",
  "author": "khtsly",
  "downloads": 3441,
  "likes": 5,
  "gated": false,
  "private": false,
  "pipeline_tag": "image-text-to-text",
  "library_name": "",
  "siblings_count": 19
}
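
The display fields in the Model Details section appear to be derived from this payload. The sketch below shows one plausible mapping and is a guess at the transformation, not GraySoft's actual pipeline: the function names `parse_ts` and `page_fields` are hypothetical, and only a subset of the payload is reproduced.

```python
import json
from datetime import datetime

# Subset of the payload excerpt above.
payload = json.loads("""
{
  "id": "khtsly/Qwen3.5-9B-Claude-4.6-Opus-Distilled-32k-GGUF",
  "createdAt": "2026-03-04T18:12:59.000Z",
  "lastModified": "2026-03-22T20:29:20.000Z",
  "author": "khtsly",
  "downloads": 3441,
  "likes": 5,
  "gated": false,
  "private": false
}
""")


def parse_ts(ts: str) -> datetime:
    """Parse the timestamp format used in the payload (UTC, millisecond precision)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def page_fields(p: dict) -> dict:
    """Map raw payload keys to the display values shown on this page."""
    return {
        "Model Slug": p["id"].lower(),
        "Author": p["author"],
        "Created": parse_ts(p["createdAt"]).date().isoformat(),
        "Last Modified": parse_ts(p["lastModified"]).date().isoformat(),
        "Downloads": f"{p['downloads']:,}",
        "Likes": str(p["likes"]),
    }


for label, value in page_fields(payload).items():
    print(f"{label}: {value}")
```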

More models in this shard