GraySoft

ademola265/qwen3-4b-thinking-2507-glm-4.7-distilled-gguf (Q3_K_S GGUF, free download) is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. Use this page to pick a local model for guIDE or other runtimes. Related models in the same shard are listed below.

Model Intelligence Sheet

ademola265/qwen3-4b-thinking-2507-glm-4.7-distilled-gguf overview

Qwen3-4B-Thinking-2507-GLM-4.7-Distilled is a fine-tuned model built upon the GRPO-optimized Jackrong/DASD-4B-Thinking-2507-GRPO-v2 (originally based on Qwen/Qwen3-4B-Thinking-2507). This model was developed using a Supervised Fine-Tuning (SFT) strategy heavily distilled from the GLM-4.7 model series (at a default temperature of 1.0), with a central focus on multi-turn conversational alignment and structured Chain-of-Thought (CoT) execution.

🎯 Core Improvement: The primary objective of this fine-tuning was to transform the model's reasoning pattern for everyday and lightweight tasks. Instead of the typical linear, free-associative, and highly self-correcting ("think-as-you-go") stream of consciousness, this model has learned to adopt a highly confident, "Plan-then-Execute" paradigm. It systematically breaks down tasks into logical outlines and executes modular, report-like responses without unnecessary self-doubt or hesitation.

Tags: gguf, qwen3, unsloth, text-generation, reasoning, math, grpo, sft, distillation, conversational, glm-4.7, en, zh, base_model:Jackrong/DASD-4B-Thinking-2507-GRPO-v2, base_model:quantized:Jackrong/DASD-4B-Thinking-2507-GRPO-v2, license:apache-2.0, endpoints_compatible, region:us
Downloads
110
Likes
0
Pipeline
text-generation
Library
not specified
Visibility
Public
Access
Open

Repository Files & Downloads

12 files detected
Direct downloads for all repository files
| File | Type | Quantization | Size | Link |
|------|------|--------------|------|------|
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-BF16.gguf | GGUF | BF16 | 7.50 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-IQ4_XS.gguf | GGUF | IQ4_XS | 2.13 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-Q2_K.gguf | GGUF | Q2_K | 1.55 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-Q3_K_L.gguf | GGUF | Q3_K_L | 2.09 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-Q3_K_M.gguf | GGUF | Q3_K_M | 1.93 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-Q3_K_S.gguf | GGUF | Q3_K_S | 1.76 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-Q4_K_S.gguf | GGUF | Q4_K_S | 2.22 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-Q5_K_M.gguf | GGUF | Q5_K_M | 2.69 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-Q5_K_S.gguf | GGUF | Q5_K_S | 2.63 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-Q6_K.gguf | GGUF | Q6_K | 3.08 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled.Q4_K_M.gguf | GGUF | Q4_K_M | 2.33 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled.Q8_0.gguf | GGUF | Q8_0 (from file name) | 3.99 GB | Download |
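
Since this page is meant to help pick a local build, the following is a minimal sketch of choosing a quantization by size. The sizes are copied from the file listing above (the Q8_0 label is inferred from the file name). The function name `pick_quant`, the memory budget, and the 1 GB headroom for context and runtime overhead are illustrative assumptions, and file size is only a rough proxy for actual memory use.

```python
# File sizes in GB, copied from the repository file listing.
# "Q8_0" is inferred from the file name; the listing shows no quantization for it.
QUANT_SIZES_GB = {
    "BF16": 7.50, "IQ4_XS": 2.13, "Q2_K": 1.55, "Q3_K_L": 2.09,
    "Q3_K_M": 1.93, "Q3_K_S": 1.76, "Q4_K_S": 2.22, "Q5_K_M": 2.69,
    "Q5_K_S": 2.63, "Q6_K": 3.08, "Q4_K_M": 2.33, "Q8_0": 3.99,
}


def pick_quant(budget_gb, headroom_gb=1.0, sizes=QUANT_SIZES_GB):
    """Return the largest quantization whose file fits the budget, or None.

    headroom_gb is a hypothetical allowance for KV cache and runtime
    overhead; adjust it for your runtime and context length.
    """
    limit = budget_gb - headroom_gb
    fitting = {name: size for name, size in sizes.items() if size <= limit}
    if not fitting:
        return None
    # Larger files generally keep more precision, so prefer the biggest fit.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (2.0, 2.8, 4.0, 16.0):
        print(budget, "GB ->", pick_quant(budget))
```

Preferring the largest file that fits is a simple heuristic. It does not account for quality differences between quantization families of similar size.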

Model Details

Model Slug
ademola265/qwen3-4b-thinking-2507-glm-4.7-distilled-gguf
Author
Ademola265
Pipeline Task
text-generation
Library
not specified
Created
2026-02-24
Last Modified
2026-02-24
Gated
No
Private
No
HF SHA
40c048773a235ee7fdf48918643bc4735080b5a3
License
apache-2.0
Language
en, zh
Base Model
Jackrong/DASD-4B-Thinking-2507-GRPO-v2

Metadata Inspector

Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "language": [
      "en",
      "zh"
    ],
    "license": "apache-2.0",
    "base_model": "Jackrong/DASD-4B-Thinking-2507-GRPO-v2",
    "tags": [
      "qwen3",
      "unsloth",
      "text-generation",
      "reasoning",
      "math",
      "grpo",
      "sft",
      "distillation",
      "conversational",
      "glm-4.7"
    ],
    "pipeline_tag": "text-generation",
    "frontmatter": {
      "language": [
        "en",
        "zh"
      ],
      "license": "apache-2.0",
      "base_model": "Jackrong/DASD-4B-Thinking-2507-GRPO-v2",
      "tags": [
        "qwen3",
        "unsloth",
        "text-generation",
        "reasoning",
        "math",
        "grpo",
        "sft",
        "distillation",
        "conversational",
        "glm-4.7"
      ],
      "pipeline_tag": "text-generation"
    },
    "hero_image_url": "",
    "summary": "**Qwen3-4B-Thinking-2507-GLM-4.7-Distilled** is a fine-tuned model built upon the GRPO-optimized Jackrong/DASD-4B-Thinking-2507-GRPO-v2 (originally based on Qwen/Qwen3-4B-Thinking-2507). This model was developed using a Supervised Fine-Tuning (SFT) strategy heavily distilled from the GLM-4.7 model series (at a default temperature of 1.0), with a central focus on multi-turn conversational alignment and **structured Chain-of-Thought (CoT) execution**. 🎯 **Core Improvement:** The primary objective of this fine-tuning was to transform the model's reasoning pattern for everyday and lightweight tasks. Instead of the typical linear, free-associative, and highly self-correcting (\"think-as-you-go\") stream of consciousness, this model has learned to adopt a highly confident, **\"Plan-then-Execute\"** paradigm. It systematically breaks down tasks into logical outlines and executes modular, report-like responses without unnecessary self-doubt or hesitation. ---",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlanguage:\n- en\n- zh\nlicense: apache-2.0\nbase_model: Jackrong/DASD-4B-Thinking-2507-GRPO-v2\ntags:\n- qwen3\n- unsloth\n- text-generation\n- reasoning\n- math\n- grpo\n- sft\n- distillation\n- conversational\n- glm-4.7\npipeline_tag: text-generation\n---\n\n# Qwen3-4B-Thinking-2507-GLM-4.7-Distilled\n\n**Qwen3-4B-Thinking-2507-GLM-4.7-Distilled** is a fine-tuned model built upon the GRPO-optimized `Jackrong/DASD-4B-Thinking-2507-GRPO-v2` (originally based on [`Qwen/Qwen3-4B-Thinking-2507`](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)). This model was developed using a Supervised Fine-Tuning (SFT) strategy heavily distilled from the GLM-4.7 model series (at a default temperature of 1.0), with a central focus on multi-turn conversational alignment and **structured Chain-of-Thought (CoT) execution**.\n\n🎯 **Core Improvement:** \nThe primary objective of this fine-tuning was to transform the model's reasoning pattern for everyday and lightweight tasks. Instead of the typical linear, free-associative, and highly self-correcting (\"think-as-you-go\") stream of consciousness, this model has learned to adopt a highly confident, **\"Plan-then-Execute\"** paradigm. It systematically breaks down tasks into logical outlines and executes modular, report-like responses without unnecessary self-doubt or hesitation.\n\n---\n\n## 🧬 Training Pipeline Overview\n\nThis model is the culmination of two sequential training stages targeting mathematical reasoning and conversational CoT tracking:\n\n```text\nQwen/Qwen3-4B-Thinking-2507\n         │\n         ▼  Stage 0: GRPO (RL on Math & Reasoning)\nDASD-4B-Thinking-2507-GRPO-v2\n         │\n         ▼  Stage 1: SFT with GLM-4.7 Series Distilled Datasets (T=1.0)\nQwen3-4B-Thinking-2507-GLM-4.7-Distilled  ← (this model)\n```\n\n### 🧠 Chain of Thought (CoT) Evolution: Base vs. Distilled\n\nA significant shift in the model's reasoning style is observed after distillation from the GLM-4.7 series data. The model transitions from a **spontaneous thinker** into a **structured planner**:\n\n| 🎯 Feature | 🌀 Base Model (Qwen3-4B-Thinking) | ✨ Distilled Model (GLM-4.7-Distilled) |\n| :--- | :--- | :--- |\n| **Thinking Style** | 🌊 Linear, stream-of-consciousness | 🧱 Modularized, report-like |\n| **Execution** | 🏃 Thinks on the fly, writes as it thinks | 📝 \"Plan-then-Execute\" framework |\n| **Structure** | 🔀 Unstructured, organic self-correction mid-thought | 📑 Highly structured with headings & logical phases |\n| **Confidence** | 🤔 High self-doubt (\"Wait...\", \"Maybe...\", \"Should I...\") | 🚀 Highly confident, rarely hesitates |\n| **Output Tone** | 🗣️ Conversational, exploring multiple paths | 📊 Objective, direct, and systematic |\n\n**🌟 Key Takeaway:**\nThrough the GLM-4.7 dataset distillation, the model successfully learned the **modular thinking paradigm**. Instead of continuously questioning itself, it now *breaks down tasks, creates a clear outline, and systematically executes each step* like writing a formal report.\n\n---\n\n## 📚 Stage Details\n\n### Stage 0 - GRPO Reinforcement Learning: `DASD-4B-Thinking-2507-GRPO-v2`\n\nStarting from the base model `Qwen/Qwen3-4B-Thinking-2507`, Group Relative Policy Optimization (GRPO) was applied. This stage consisted of:\n- **Cold Start:** Fine-tuning on the [`unsloth/OpenMathReasoning-mini`](https://huggingface.co/datasets/unsloth/OpenMathReasoning-mini) dataset.\n- **Reinforcement Learning:** Applying GRPO via the [`open-r1/DAPO-Math-17k-Processed`](https://huggingface.co/datasets/open-r1/DAPO-Math-17k-Processed) dataset.\n\nThis stage significantly improved the model's:\n\n- Correctness on math problem solving\n- Step-by-step logical reasoning\n- Reward signal alignment for verifiable tasks\n\n---\n\n### Stage 1 - SFT GLM-4.7 Distillation (T=1.0): `Qwen3-4B-Thinking-2507-GLM-4.7-Distilled` (this model)\n\nBuilding on the reasoning foundation of `DASD-4B-Thinking-2507-GRPO-v2`, Stage 1 SFT was performed using a mixed dataset heavily utilizing **GLM-4.7** synthetic data generated at a **default temperature of 1.0**, along with multi-turn alignments.\n\nHigher-temperature data introduces greater **lexical diversity, broader mode coverage, and more formatted/structured chain-of-thought traces**, enabling the model to generalize better across diverse conversational reasoning patterns and problem domains. It helps the model handle multi-turn conversations effectively while protecting its internal structure of `<think>...</think>` tracking.\n\n---\n\n## 🗂️ All Datasets Used\n\n| Stage | Dataset | Purpose |\n|-------|---------|---------|\n| GRPO (Cold Start) | [`unsloth/OpenMathReasoning-mini`](https://huggingface.co/datasets/unsloth/OpenMathReasoning-mini) | Initial foundational mathematical reasoning |\n| GRPO (RL) | [`open-r1/DAPO-Math-17k-Processed`](https://huggingface.co/datasets/open-r1/DAPO-Math-17k-Processed) | Math & reasoning RL training via GRPO |\n| SFT Distillation | [`Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b`](https://huggingface.co/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b) (Stage 2) | Diverse reasoning structures |\n| SFT Distillation | [`Jackrong/glm-4.7-multiturn-CoT`](https://huggingface.co/datasets/Jackrong/glm-4.7-multiturn-CoT) | Multi-turn CoT alignment |\n| SFT Distillation | [`Jackrong/glm-4.7-Superior-Reasoning-stage1`](https://huggingface.co/datasets/Jackrong/glm-4.7-Superior-Reasoning-stage1) | Enhanced fundamental reasoning |\n| SFT Distillation | [`TeichAI/glm-4.7-2000x`](https://huggingface.co/datasets/TeichAI/glm-4.7-2000x) | Generalization and lexical diversity |\n| SFT Distillation | [`Jackrong/MultiReason-ChatAlpaca`](https://huggingface.co/datasets/Jackrong/MultiReason-ChatAlpaca) | Conversational multi-turn tracking |\n\n---\n\n## 🏃 Quickstart\n\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nmodel_name = \"Jackrong/Qwen3-4B-Thinking-2507-GLM-4.7-Distilled\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=\"auto\", device_map=\"auto\")\n\nmessages = [\n    {\"role\": \"user\", \"content\": \"Solve: find all real solutions to x^3 - 6x^2 + 11x - 6 = 0.\"}\n]\n\ntext = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)\ninputs = tokenizer([text], return_tensors=\"pt\").to(model.device)\noutputs = model.generate(**inputs, max_new_tokens=4096)\nresponse = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)\nprint(response)\n```\n\n> **Tip:** This model naturally generates `<think>...</think>` reasoning traces before the final answer. You can parse these to inspect the chain-of-thought.\n\n---\n\n## 📋 Model Details\n\n| Attribute | Value |\n|-----------|-------|\n| **Base Model** | `Jackrong/DASD-4B-Thinking-2507-GRPO-v2` |\n| **Architecture** | Qwen3 (4B Dense) |\n| **License** | Apache 2.0 |\n| **Language(s)** | English, Chinese |\n| **Training Framework** | [Unsloth](https://github.com/unslothai/unsloth) + Hugging Face TRL |\n| **RL Algorithm** | GRPO (Group Relative Policy Optimization) |\n| **Fine-tuning Method** | SFT (GLM-4.7 Distillation at T=1.0) |\n| **Developed by** | Jackrong |\n\n---\n\n## ⚠️ Limitations & Intended Use\n\n- This model is intended for **research and educational purposes** related to reasoning and mathematical problem-solving.\n- While mathematical and logical reasoning capabilities have been enhanced, the model may still produce incorrect answers or hallucinations - always verify outputs on critical tasks.\n- The model inherits the capabilities and limitations of the underlying `Qwen3-4B-Thinking-2507` architecture.\n- Not intended for deployment in high-stakes applications without additional safety evaluation.\n\n---\n\n## 📎 Related Models\n\n| Model | Description |\n|-------|-------------|\n| [`Qwen/Qwen3-4B-Thinking-2507`](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) | Base model |\n| [`Jackrong/DASD-4B-Thinking-2507-GRPO-v2`](https://huggingface.co/Jackrong/DASD-4B-Thinking-2507-GRPO-v2) | After GRPO RL training |\n| [`Jackrong/Qwen3-4B-Thinking-2507-GLM-4.7-Distilled`](https://huggingface.co/Jackrong/Qwen3-4B-Thinking-2507-GLM-4.7-Distilled) | **This model** - GLM-4.7 Distilled |\n\n---\n\n## 🙏 Acknowledgements\n\n- [Zhipu AI](https://huggingface.co/THUDM) for the GLM-4.7 model series capability\n- [Alibaba Cloud Apsara Lab](https://huggingface.co/Alibaba-Apsara) for reasoning datasets\n- [Open-R1](https://huggingface.co/open-r1) for the DAPO Math dataset\n- [Unsloth](https://github.com/unslothai/unsloth) for efficient fine-tuning infrastructure\n- [Qwen Team](https://huggingface.co/Qwen) for the excellent base model\n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "qwen3",
    "unsloth",
    "text-generation",
    "reasoning",
    "math",
    "grpo",
    "sft",
    "distillation",
    "conversational",
    "glm-4.7",
    "en",
    "zh",
    "base_model:Jackrong/DASD-4B-Thinking-2507-GRPO-v2",
    "base_model:quantized:Jackrong/DASD-4B-Thinking-2507-GRPO-v2",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us"
  ],
  "likes": 0,
  "downloads": 110,
  "gated": false,
  "private": false,
  "last_modified": "2026-02-24T16:03:04.000Z",
  "created_at": "2026-02-24T16:03:04.000Z",
  "pipeline_tag": "text-generation",
  "library_name": ""
}
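
The README stored in the metadata above notes that the model emits `<think>...</think>` reasoning traces before its final answer and that these can be parsed. A minimal sketch of such parsing follows. The function name `split_reasoning` is hypothetical, and the fallback for a response that contains only the closing tag is an assumption about chat templates that open the `<think>` block inside the prompt.

```python
import re

# Non-greedy match so only the first reasoning block is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response):
    """Split a decoded response into (reasoning, answer) strings."""
    match = THINK_RE.search(response)
    if match:
        return match.group(1).strip(), response[match.end():].strip()
    # Assumed case: the template already emitted "<think>" in the prompt,
    # so the generated text holds only the closing tag.
    if "</think>" in response:
        reasoning, _, answer = response.partition("</think>")
        return reasoning.strip(), answer.strip()
    # No reasoning markers at all: treat everything as the answer.
    return "", response.strip()


if __name__ == "__main__":
    demo = "<think>Outline: factor the cubic.</think>\nx = 1, 2, 3"
    reasoning, answer = split_reasoning(demo)
    print("reasoning:", reasoning)
    print("answer:", answer)
```

This applies to the decoded text from any runtime, whether the output comes from the Transformers quickstart or from a GGUF runtime.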
Source payload excerpt (from Hugging Face API)
{
  "_id": "699dcbb842058fc25115b0d1",
  "id": "Ademola265/Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-GGUF",
  "modelId": "Ademola265/Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-GGUF",
  "sha": "40c048773a235ee7fdf48918643bc4735080b5a3",
  "createdAt": "2026-02-24T16:03:04.000Z",
  "lastModified": "2026-02-24T16:03:04.000Z",
  "author": "Ademola265",
  "downloads": 110,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "",
  "siblings_count": 15
}
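
The payload above carries both the repository `id` and the commit `sha`, which is enough to build a download link pinned to this exact revision. The sketch below assumes the standard Hugging Face `resolve` URL layout. The helper name `resolve_url` is hypothetical, and the file name is taken from the repository file listing on this page.

```python
from urllib.parse import quote


def resolve_url(repo_id, filename, revision="main"):
    """Build a direct-download URL for one file at a given revision.

    Assumes the https://huggingface.co/<repo>/resolve/<rev>/<file> layout.
    """
    return (
        "https://huggingface.co/"
        f"{repo_id}/resolve/{quote(revision, safe='')}/{quote(filename)}"
    )


if __name__ == "__main__":
    url = resolve_url(
        "Ademola265/Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-GGUF",
        "Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-Q3_K_S.gguf",
        revision="40c048773a235ee7fdf48918643bc4735080b5a3",
    )
    print(url)
```

Pinning to the commit SHA rather than `main` keeps the download reproducible if the repository is later updated.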