The Q3_K_L GGUF build of jackrong/qwen3-4b-thinking-2507-glm-4.7-distilled-gguf is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. Use this page to pick a local model for guIDE or other runtimes. See related models in the same shard below.
jackrong/qwen3-4b-thinking-2507-glm-4.7-distilled-gguf overview
Qwen3-4B-Thinking-2507-GLM-4.7-Distilled is a fine-tuned model built upon the GRPO-optimized Jackrong/DASD-4B-Thinking-2507-GRPO-v2 (originally based on Qwen/Qwen3-4B-Thinking-2507). It was trained with Supervised Fine-Tuning (SFT) on data heavily distilled from the GLM-4.7 model series (generated at a default temperature of 1.0), with a central focus on multi-turn conversational alignment and structured Chain-of-Thought (CoT) execution. Core Improvement: The primary objective of this fine-tuning was to transform the model's reasoning pattern for everyday and lightweight tasks. Instead of the typical linear, free-associative, and highly self-correcting ("think-as-you-go") stream of consciousness, this model has learned to adopt a highly confident "Plan-then-Execute" paradigm. It systematically breaks down tasks into logical outlines and executes modular, report-like responses without unnecessary self-doubt or hesitation.
Repository Files & Downloads
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-BF16.gguf | GGUF | BF16 | 7.50 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-IQ4_XS.gguf | GGUF | IQ4_XS | 2.13 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-Q2_K.gguf | GGUF | Q2_K | 1.55 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-Q3_K_L.gguf | GGUF | Q3_K_L | 2.09 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-Q3_K_M.gguf | GGUF | Q3_K_M | 1.93 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-Q3_K_S.gguf | GGUF | Q3_K_S | 1.76 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-Q4_K_S.gguf | GGUF | Q4_K_S | 2.22 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-Q5_K_M.gguf | GGUF | Q5_K_M | 2.69 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-Q5_K_S.gguf | GGUF | Q5_K_S | 2.63 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-Q6_K.gguf | GGUF | Q6_K | 3.08 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled.Q4_K_M.gguf | GGUF | Q4_K_M | 2.33 GB | Download |
| Qwen3-4B-Thinking-2507-GLM-4.7-Distilled.Q8_0.gguf | GGUF | Q8_0 | 3.99 GB | Download |
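As a rough aid for choosing among the files above, the sketch below picks the largest quantization whose file fits a given size budget. The sizes are copied from the table. The function name `pick_quant` and the `headroom` factor are illustrative assumptions, not part of the repository or of guIDE, and real memory use also depends on context length and runtime overhead.

```python
# File sizes in GB, copied from the table above.
# The Q8_0 label is taken from the file name.
QUANT_SIZES_GB = {
    "BF16": 7.50,
    "IQ4_XS": 2.13,
    "Q2_K": 1.55,
    "Q3_K_L": 2.09,
    "Q3_K_M": 1.93,
    "Q3_K_S": 1.76,
    "Q4_K_S": 2.22,
    "Q4_K_M": 2.33,
    "Q5_K_S": 2.63,
    "Q5_K_M": 2.69,
    "Q6_K": 3.08,
    "Q8_0": 3.99,
}


def pick_quant(budget_gb, headroom=1.2):
    """Return the largest quant whose size * headroom fits in budget_gb.

    `headroom` is a hypothetical safety factor for KV cache and runtime
    overhead; tune it for your own setup. Returns None if nothing fits.
    """
    fitting = {
        name: size
        for name, size in QUANT_SIZES_GB.items()
        if size * headroom <= budget_gb
    }
    if not fitting:
        return None
    # Among the files that fit, prefer the largest (least aggressive quant).
    return max(fitting, key=fitting.get)


print(pick_quant(3.0))
print(pick_quant(1.0))
```

File size is only a proxy for quality here; two quants of similar size (for example IQ4_XS and Q3_K_L) can behave differently, so treat the result as a starting point.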
Model Details Live
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"language": [
"en",
"zh"
],
"license": "apache-2.0",
"base_model": "Jackrong/DASD-4B-Thinking-2507-GRPO-v2",
"tags": [
"qwen3",
"unsloth",
"text-generation",
"reasoning",
"math",
"grpo",
"sft",
"distillation",
"conversational",
"glm-4.7"
],
"pipeline_tag": "text-generation",
"frontmatter": {
"language": [
"en",
"zh"
],
"license": "apache-2.0",
"base_model": "Jackrong/DASD-4B-Thinking-2507-GRPO-v2",
"tags": [
"qwen3",
"unsloth",
"text-generation",
"reasoning",
"math",
"grpo",
"sft",
"distillation",
"conversational",
"glm-4.7"
],
"pipeline_tag": "text-generation"
},
"hero_image_url": "",
"summary": "**Qwen3-4B-Thinking-2507-GLM-4.7-Distilled** is a fine-tuned model built upon the GRPO-optimized Jackrong/DASD-4B-Thinking-2507-GRPO-v2 (originally based on Qwen/Qwen3-4B-Thinking-2507). This model was developed using a Supervised Fine-Tuning (SFT) strategy heavily distilled from the GLM-4.7 model series (at a default temperature of 1.0), with a central focus on multi-turn conversational alignment and **structured Chain-of-Thought (CoT) execution**. **Core Improvement:** The primary objective of this fine-tuning was to transform the model's reasoning pattern for everyday and lightweight tasks. Instead of the typical linear, free-associative, and highly self-correcting (\"think-as-you-go\") stream of consciousness, this model has learned to adopt a highly confident, **\"Plan-then-Execute\"** paradigm. It systematically breaks down tasks into logical outlines and executes modular, report-like responses without unnecessary self-doubt or hesitation.",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nlanguage:\n- en\n- zh\nlicense: apache-2.0\nbase_model: Jackrong/DASD-4B-Thinking-2507-GRPO-v2\ntags:\n- qwen3\n- unsloth\n- text-generation\n- reasoning\n- math\n- grpo\n- sft\n- distillation\n- conversational\n- glm-4.7\npipeline_tag: text-generation\n---\n\n# Qwen3-4B-Thinking-2507-GLM-4.7-Distilled\n\n**Qwen3-4B-Thinking-2507-GLM-4.7-Distilled** is a fine-tuned model built upon the GRPO-optimized `Jackrong/DASD-4B-Thinking-2507-GRPO-v2` (originally based on [`Qwen/Qwen3-4B-Thinking-2507`](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)). This model was developed using a Supervised Fine-Tuning (SFT) strategy heavily distilled from the GLM-4.7 model series (at a default temperature of 1.0), with a central focus on multi-turn conversational alignment and **structured Chain-of-Thought (CoT) execution**.\n\n**Core Improvement:** \nThe primary objective of this fine-tuning was to transform the model's reasoning pattern for everyday and lightweight tasks. Instead of the typical linear, free-associative, and highly self-correcting (\"think-as-you-go\") stream of consciousness, this model has learned to adopt a highly confident, **\"Plan-then-Execute\"** paradigm. It systematically breaks down tasks into logical outlines and executes modular, report-like responses without unnecessary self-doubt or hesitation.\n\n---\n\n## Training Pipeline Overview\n\nThis model is the culmination of two sequential training stages targeting mathematical reasoning and conversational CoT tracking:\n\n```text\nQwen/Qwen3-4B-Thinking-2507\n        |\n        v  Stage 0: GRPO (RL on Math & Reasoning)\nDASD-4B-Thinking-2507-GRPO-v2\n        |\n        v  Stage 1: SFT with GLM-4.7 Series Distilled Datasets (T=1.0)\nQwen3-4B-Thinking-2507-GLM-4.7-Distilled  <- (this model)\n```\n\n### Chain of Thought (CoT) Evolution: Base vs. Distilled\n\nA significant shift in the model's reasoning style is observed after distillation from the GLM-4.7 series data. 
The model transitions from a **spontaneous thinker** into a **structured planner**:\n\n| Feature | Base Model (Qwen3-4B-Thinking) | Distilled Model (GLM-4.7-Distilled) |\n| :--- | :--- | :--- |\n| **Thinking Style** | Linear, stream-of-consciousness | Modularized, report-like |\n| **Execution** | Thinks on the fly, writes as it thinks | \"Plan-then-Execute\" framework |\n| **Structure** | Unstructured, organic self-correction mid-thought | Highly structured with headings & logical phases |\n| **Confidence** | High self-doubt (\"Wait...\", \"Maybe...\", \"Should I...\") | Highly confident, rarely hesitates |\n| **Output Tone** | Conversational, exploring multiple paths | Objective, direct, and systematic |\n\n**Key Takeaway:**\nThrough the GLM-4.7 dataset distillation, the model successfully learned the **modular thinking paradigm**. Instead of continuously questioning itself, it now *breaks down tasks, creates a clear outline, and systematically executes each step* like writing a formal report.\n\n---\n\n## Stage Details\n\n### Stage 0 - GRPO Reinforcement Learning: `DASD-4B-Thinking-2507-GRPO-v2`\n\nStarting from the base model `Qwen/Qwen3-4B-Thinking-2507`, Group Relative Policy Optimization (GRPO) was applied. 
This stage consisted of:\n- **Cold Start:** Fine-tuning on the [`unsloth/OpenMathReasoning-mini`](https://huggingface.co/datasets/unsloth/OpenMathReasoning-mini) dataset.\n- **Reinforcement Learning:** Applying GRPO via the [`open-r1/DAPO-Math-17k-Processed`](https://huggingface.co/datasets/open-r1/DAPO-Math-17k-Processed) dataset.\n\nThis stage significantly improved the model's:\n\n- Correctness on math problem solving\n- Step-by-step logical reasoning\n- Reward signal alignment for verifiable tasks\n\n---\n\n### Stage 1 - SFT GLM-4.7 Distillation (T=1.0): `Qwen3-4B-Thinking-2507-GLM-4.7-Distilled` (this model)\n\nBuilding on the reasoning foundation of `DASD-4B-Thinking-2507-GRPO-v2`, Stage 1 SFT was performed using a mixed dataset heavily utilizing **GLM-4.7** synthetic data generated at a **default temperature of 1.0**, along with multi-turn alignments.\n\nHigher-temperature data introduces greater **lexical diversity, broader mode coverage, and more formatted/structured chain-of-thought traces**, enabling the model to generalize better across diverse conversational reasoning patterns and problem domains. 
It helps the model handle multi-turn conversations effectively while protecting its internal structure of `<think>...</think>` tracking.\n\n---\n\n## All Datasets Used\n\n| Stage | Dataset | Purpose |\n|-------|---------|---------|\n| GRPO (Cold Start) | [`unsloth/OpenMathReasoning-mini`](https://huggingface.co/datasets/unsloth/OpenMathReasoning-mini) | Initial foundational mathematical reasoning |\n| GRPO (RL) | [`open-r1/DAPO-Math-17k-Processed`](https://huggingface.co/datasets/open-r1/DAPO-Math-17k-Processed) | Math & reasoning RL training via GRPO |\n| SFT Distillation | [`Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b`](https://huggingface.co/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b) (Stage 2) | Diverse reasoning structures |\n| SFT Distillation | [`Jackrong/glm-4.7-multiturn-CoT`](https://huggingface.co/datasets/Jackrong/glm-4.7-multiturn-CoT) | Multi-turn CoT alignment |\n| SFT Distillation | [`Jackrong/glm-4.7-Superior-Reasoning-stage1`](https://huggingface.co/datasets/Jackrong/glm-4.7-Superior-Reasoning-stage1) | Enhanced fundamental reasoning |\n| SFT Distillation | [`TeichAI/glm-4.7-2000x`](https://huggingface.co/datasets/TeichAI/glm-4.7-2000x) | Generalization and lexical diversity |\n| SFT Distillation | [`Jackrong/MultiReason-ChatAlpaca`](https://huggingface.co/datasets/Jackrong/MultiReason-ChatAlpaca) | Conversational multi-turn tracking |\n\n---\n\n## Quickstart\n\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nmodel_name = \"Jackrong/Qwen3-4B-Thinking-2507-GLM-4.7-Distilled\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=\"auto\", device_map=\"auto\")\n\nmessages = [\n    {\"role\": \"user\", \"content\": \"Solve: find all real solutions to x^3 - 6x^2 + 11x - 6 = 0.\"}\n]\n\ntext = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)\ninputs = tokenizer([text], 
return_tensors=\"pt\").to(model.device)\noutputs = model.generate(**inputs, max_new_tokens=4096)\nresponse = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)\nprint(response)\n```\n\n> **Tip:** This model naturally generates `<think>...</think>` reasoning traces before the final answer. You can parse these to inspect the chain-of-thought.\n\n---\n\n## Model Details\n\n| Attribute | Value |\n|-----------|-------|\n| **Base Model** | `Jackrong/DASD-4B-Thinking-2507-GRPO-v2` |\n| **Architecture** | Qwen3 (4B Dense) |\n| **License** | Apache 2.0 |\n| **Language(s)** | English, Chinese |\n| **Training Framework** | [Unsloth](https://github.com/unslothai/unsloth) + Hugging Face TRL |\n| **RL Algorithm** | GRPO (Group Relative Policy Optimization) |\n| **Fine-tuning Method** | SFT (GLM-4.7 Distillation at T=1.0) |\n| **Developed by** | Jackrong |\n\n---\n\n## Limitations & Intended Use\n\n- This model is intended for **research and educational purposes** related to reasoning and mathematical problem-solving.\n- While mathematical and logical reasoning capabilities have been enhanced, the model may still produce incorrect answers or hallucinations; always verify outputs on critical tasks.\n- The model inherits the capabilities and limitations of the underlying `Qwen3-4B-Thinking-2507` architecture.\n- Not intended for deployment in high-stakes applications without additional safety evaluation.\n\n---\n\n## Related Models\n\n| Model | Description |\n|-------|-------------|\n| [`Qwen/Qwen3-4B-Thinking-2507`](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) | Base model |\n| [`Jackrong/DASD-4B-Thinking-2507-GRPO-v2`](https://huggingface.co/Jackrong/DASD-4B-Thinking-2507-GRPO-v2) | After GRPO RL training |\n| [`Jackrong/Qwen3-4B-Thinking-2507-GLM-4.7-Distilled`](https://huggingface.co/Jackrong/Qwen3-4B-Thinking-2507-GLM-4.7-Distilled) | **This model**: GLM-4.7 Distilled |\n\n---\n\n## Acknowledgements\n\n- [Zhipu 
AI](https://huggingface.co/THUDM) for the GLM-4.7 model series capability\n- [Alibaba Cloud Apsara Lab](https://huggingface.co/Alibaba-Apsara) for reasoning datasets\n- [Open-R1](https://huggingface.co/open-r1) for the DAPO Math dataset\n- [Unsloth](https://github.com/unslothai/unsloth) for efficient fine-tuning infrastructure\n- [Qwen Team](https://huggingface.co/Qwen) for the excellent base model\n",
"related_quantizations": []
},
"tags": [
"gguf",
"qwen3",
"unsloth",
"text-generation",
"reasoning",
"math",
"grpo",
"sft",
"distillation",
"conversational",
"glm-4.7",
"en",
"zh",
"base_model:Jackrong/DASD-4B-Thinking-2507-GRPO-v2",
"base_model:quantized:Jackrong/DASD-4B-Thinking-2507-GRPO-v2",
"license:apache-2.0",
"endpoints_compatible",
"region:us"
],
"likes": 2,
"downloads": 658,
"gated": false,
"private": false,
"last_modified": "2026-02-24T02:58:14.000Z",
"created_at": "2026-02-23T15:05:12.000Z",
"pipeline_tag": "text-generation",
"library_name": ""
}
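The README tip stored in `readme_markdown` above says the model emits `<think>...</think>` reasoning traces before the final answer and that these can be parsed. A minimal sketch of that split follows, assuming the decoded output is a plain string. `split_reasoning` is a hypothetical helper, not part of the model's tooling, and the handling of output that contains only the closing tag is an assumption about chat templates that pre-fill the opening tag.

```python
def split_reasoning(text):
    """Split generated text into (reasoning, answer).

    Handles a full <think>...</think> pair as well as output that contains
    only the closing tag (assumed to happen when the chat template already
    emitted the opening tag). If no closing tag is present, the whole text
    is treated as the answer.
    """
    open_tag = "<think>"
    close_tag = "</think>"

    head, sep, tail = text.partition(close_tag)
    if not sep:
        # No closing tag found: nothing to separate.
        return "", text.strip()

    # Drop everything up to and including the opening tag, if there is one.
    reasoning = head.split(open_tag, 1)[-1]
    return reasoning.strip(), tail.strip()


reasoning, answer = split_reasoning("<think>Plan: factor the cubic.</think>x = 1, 2, 3")
print(reasoning)
print(answer)
```

Only the first `</think>` is treated as the boundary, so a literal closing tag inside the final answer stays in the answer.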
Source payload excerpt (from Hugging Face API)
{
"_id": "699c6ca8dac985183e28ce38",
"id": "Jackrong/Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-GGUF",
"modelId": "Jackrong/Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-GGUF",
"sha": "cf4722c669a1369d1b48e78a6a6c73ba031b057f",
"createdAt": "2026-02-23T15:05:12.000Z",
"lastModified": "2026-02-24T02:58:14.000Z",
"author": "Jackrong",
"downloads": 658,
"likes": 2,
"gated": false,
"private": false,
"pipeline_tag": "text-generation",
"library_name": "",
"siblings_count": 15
}
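The "Download" links in the file table did not survive as URLs. As a sketch, a direct link can be rebuilt from the `id` in the payload above and a file name from the table, using Hugging Face's `resolve` URL pattern. The helper name `gguf_url` is hypothetical, and the sketch assumes the files sit at the repository root on the `main` revision, which this page does not confirm.

```python
from urllib.parse import quote


def gguf_url(repo_id, filename, revision="main"):
    """Build a direct download URL for a file in a Hugging Face model repo.

    Assumes the file lives at the repository root; `revision` may be a
    branch name, tag, or commit hash.
    """
    return (
        "https://huggingface.co/"
        f"{quote(repo_id)}/resolve/{quote(revision, safe='')}/{quote(filename)}"
    )


REPO_ID = "Jackrong/Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-GGUF"
FILENAME = "Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-Q3_K_L.gguf"

print(gguf_url(REPO_ID, FILENAME))
```

Passing the `sha` from the payload as `revision` would pin the link to that exact commit rather than whatever `main` points to later.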