Model Intelligence Sheet

mungert/deepseek-r1-distill-qwen-7b-gguf overview

Comprehensive model page for mungert/deepseek-r1-distill-qwen-7b-gguf

transformersggufarxiv:2501.12948license:mitendpoints_compatibleregion:usimatrixconversational

Downloads

287

Likes

Pipeline

—

Library

transformers

Visibility

Public

Access

Open

Repository Files & Downloads

30 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
DeepSeek-R1-Distill-Qwen-7B-bf16-q4_k.gguf	GGUF	BF16	5.69 GB	Download
DeepSeek-R1-Distill-Qwen-7B-bf16-q6_k.gguf	GGUF	BF16	7.02 GB	Download
DeepSeek-R1-Distill-Qwen-7B-bf16-q8_0.gguf	GGUF	BF16	8.49 GB	Download
DeepSeek-R1-Distill-Qwen-7B-bf16.gguf	GGUF	BF16	14.19 GB	Download
DeepSeek-R1-Distill-Qwen-7B-f16-q4_k.gguf	GGUF	F16	5.69 GB	Download
DeepSeek-R1-Distill-Qwen-7B-f16-q6_k.gguf	GGUF	F16	7.02 GB	Download
DeepSeek-R1-Distill-Qwen-7B-f16-q8_0.gguf	GGUF	F16	8.49 GB	Download
DeepSeek-R1-Distill-Qwen-7B-iq2_m.gguf	GGUF	IQ2_M	2.98 GB	Download
DeepSeek-R1-Distill-Qwen-7B-iq2_s.gguf	GGUF	IQ2_S	2.86 GB	Download
DeepSeek-R1-Distill-Qwen-7B-iq2_xs.gguf	GGUF	IQ2_XS	2.57 GB	Download
DeepSeek-R1-Distill-Qwen-7B-iq2_xxs.gguf	GGUF	IQ2_XXS	2.44 GB	Download
DeepSeek-R1-Distill-Qwen-7B-iq3_m.gguf	GGUF	IQ3_M	3.39 GB	Download
DeepSeek-R1-Distill-Qwen-7B-iq3_s.gguf	GGUF	IQ3_S	3.32 GB	Download
DeepSeek-R1-Distill-Qwen-7B-iq3_xs.gguf	GGUF	IQ3_XS	3.18 GB	Download
DeepSeek-R1-Distill-Qwen-7B-iq3_xxs.gguf	GGUF	IQ3_XXS	3.03 GB	Download
DeepSeek-R1-Distill-Qwen-7B-iq4_nl.gguf	GGUF	IQ4_NL	4.13 GB	Download
DeepSeek-R1-Distill-Qwen-7B-iq4_xs.gguf	GGUF	IQ4_XS	3.93 GB	Download
DeepSeek-R1-Distill-Qwen-7B-q2_k_s.gguf	GGUF	Q2_K_S	2.75 GB	Download
DeepSeek-R1-Distill-Qwen-7B-q3_k_m.gguf	GGUF	Q3_K_M	3.61 GB	Download
DeepSeek-R1-Distill-Qwen-7B-q3_k_s.gguf	GGUF	Q3_K_S	3.32 GB	Download
DeepSeek-R1-Distill-Qwen-7B-q4_0.gguf	GGUF	—	4.00 GB	Download
DeepSeek-R1-Distill-Qwen-7B-q4_1.gguf	GGUF	—	4.44 GB	Download
DeepSeek-R1-Distill-Qwen-7B-q4_k_m.gguf	GGUF	Q4_K_M	4.49 GB	Download
DeepSeek-R1-Distill-Qwen-7B-q4_k_s.gguf	GGUF	Q4_K_S	4.28 GB	Download
DeepSeek-R1-Distill-Qwen-7B-q5_0.gguf	GGUF	—	4.88 GB	Download
DeepSeek-R1-Distill-Qwen-7B-q5_1.gguf	GGUF	—	5.33 GB	Download
DeepSeek-R1-Distill-Qwen-7B-q5_k_m.gguf	GGUF	Q5_K_M	5.14 GB	Download
DeepSeek-R1-Distill-Qwen-7B-q5_k_s.gguf	GGUF	Q5_K_S	5.02 GB	Download
DeepSeek-R1-Distill-Qwen-7B-q6_k_m.gguf	GGUF	Q6_K_M	5.82 GB	Download
DeepSeek-R1-Distill-Qwen-7B-q8_0.gguf	GGUF	—	7.54 GB	Download

Model Details Live

Model Slug

mungert/deepseek-r1-distill-qwen-7b-gguf

Author

Mungert

Pipeline Task

—

Library

transformers

Created

2025-03-19

Last Modified

2025-09-24

Gated

Private

HF SHA

dc4514148684b77c108ee65515345a6d997d327e

License

mit

Language

Unknown

Base Model

Unknown

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "license": "mit",
    "library_name": "transformers",
    "frontmatter": {
      "license": "mit",
      "library_name": "transformers"
    },
    "hero_image_url": "https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true",
    "summary": "",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlicense: mit\nlibrary_name: transformers\n---\n\n# <span style=\"color: #7FFF7F;\">DeepSeek-R1-Distill-Qwen-7B GGUF Models</span>\n\n## <span style=\"color: #7FFF7F;\">Ultra-Low-Bit Quantization with IQ-DynamicGate (1-2 bit)</span>\n\nOur latest quantization method introduces **precision-adaptive quantization** for ultra-low-bit models (1-2 bit), with benchmark-proven improvements on **Llama-3-8B**. This approach uses layer-specific strategies to preserve accuracy while maintaining extreme memory efficiency.\n\n### **Benchmark Context**\nAll tests conducted on **Llama-3-8B-Instruct** using:\n- Standard perplexity evaluation pipeline\n- 2048-token context window\n- Same prompt set across all quantizations\n\n### **Method**\n- **Dynamic Precision Allocation**:  \n  - First/Last 25% of layers → IQ4_XS (selected layers)  \n  - Middle 50% → IQ2_XXS/IQ3_S (increase efficiency)  \n- **Critical Component Protection**:  \n  - Embeddings/output layers use Q5_K  \n  - Reduces error propagation by 38% vs standard 1-2bit  \n\n### **Quantization Performance Comparison (Llama-3-8B)**\n\n| Quantization | Standard PPL | DynamicGate PPL | Δ PPL   | Std Size | DG Size | Δ Size | Std Speed | DG Speed |\n|--------------|--------------|------------------|---------|----------|---------|--------|-----------|----------|\n| IQ2_XXS      | 11.30        | 9.84             | -12.9%  | 2.5G     | 2.6G    | +0.1G  | 234s      | 246s     |\n| IQ2_XS       | 11.72        | 11.63            | -0.8%   | 2.7G     | 2.8G    | +0.1G  | 242s      | 246s     |\n| IQ2_S        | 14.31        | 9.02             | -36.9%  | 2.7G     | 2.9G    | +0.2G  | 238s      | 244s     |\n| IQ1_M        | 27.46        | 15.41            | -43.9%  | 2.2G     | 2.5G    | +0.3G  | 206s      | 212s     |\n| IQ1_S        | 53.07        | 32.00            | -39.7%  | 2.1G     | 2.4G    | +0.3G  | 184s      | 209s     |\n\n**Key**:\n- PPL = Perplexity (lower is better)\n- Δ PPL = Percentage change from standard to DynamicGate\n- Speed = Inference time (CPU avx2, 2048 token context)\n- Size differences reflect mixed quantization overhead\n\n**Key Improvements:**\n- 🔥 **IQ1_M** shows massive 43.9% perplexity reduction (27.46 → 15.41)\n- 🚀 **IQ2_S** cuts perplexity by 36.9% while adding only 0.2GB\n- ⚡ **IQ1_S** maintains 39.7% better accuracy despite 1-bit quantization\n\n**Tradeoffs:**\n- All variants have modest size increases (0.1-0.3GB)\n- Inference speeds remain comparable (<5% difference)\n\n\n### **When to Use These Models**\n📌 **Fitting models into GPU VRAM**\n\n✔ **Memory-constrained deployments**\n\n✔ **Cpu and Edge Devices** where 1-2bit errors can be tolerated \n \n✔ **Research** into ultra-low-bit quantization\n\n\n## **Choosing the Right Model Format**  \n\nSelecting the correct model format depends on your **hardware capabilities** and **memory constraints**.  \n\n### **BF16 (Brain Float 16) – Use if BF16 acceleration is available**  \n- A 16-bit floating-point format designed for **faster computation** while retaining good precision.  \n- Provides **similar dynamic range** as FP32 but with **lower memory usage**.  \n- Recommended if your hardware supports **BF16 acceleration** (check your device's specs).  \n- Ideal for **high-performance inference** with **reduced memory footprint** compared to FP32.  \n\n📌 **Use BF16 if:**  \n✔ Your hardware has native **BF16 support** (e.g., newer GPUs, TPUs).  \n✔ You want **higher precision** while saving memory.  \n✔ You plan to **requantize** the model into another format.  \n\n📌 **Avoid BF16 if:**  \n❌ Your hardware does **not** support BF16 (it may fall back to FP32 and run slower).  \n❌ You need compatibility with older devices that lack BF16 optimization.  \n\n---\n\n### **F16 (Float 16) – More widely supported than BF16**  \n- A 16-bit floating-point **high precision** but with less of range of values than BF16. \n- Works on most devices with **FP16 acceleration support** (including many GPUs and some CPUs).  \n- Slightly lower numerical precision than BF16 but generally sufficient for inference.  \n\n📌 **Use F16 if:**  \n✔ Your hardware supports **FP16** but **not BF16**.  \n✔ You need a **balance between speed, memory usage, and accuracy**.  \n✔ You are running on a **GPU** or another device optimized for FP16 computations.  \n\n📌 **Avoid F16 if:**  \n❌ Your device lacks **native FP16 support** (it may run slower than expected).  \n❌ You have memory limitations.  \n\n---\n\n### **Quantized Models (Q4_K, Q6_K, Q8, etc.) – For CPU & Low-VRAM Inference**  \nQuantization reduces model size and memory usage while maintaining as much accuracy as possible.  \n- **Lower-bit models (Q4_K)** → **Best for minimal memory usage**, may have lower precision.  \n- **Higher-bit models (Q6_K, Q8_0)** → **Better accuracy**, requires more memory.  \n\n📌 **Use Quantized Models if:**  \n✔ You are running inference on a **CPU** and need an optimized model.  \n✔ Your device has **low VRAM** and cannot load full-precision models.  \n✔ You want to reduce **memory footprint** while keeping reasonable accuracy.  \n\n📌 **Avoid Quantized Models if:**  \n❌ You need **maximum accuracy** (full-precision models are better for this).  \n❌ Your hardware has enough VRAM for higher-precision formats (BF16/F16).  \n\n---\n\n### **Very Low-Bit Quantization (IQ3_XS, IQ3_S, IQ3_M, Q4_K, Q4_0)**  \nThese models are optimized for **extreme memory efficiency**, making them ideal for **low-power devices** or **large-scale deployments** where memory is a critical constraint.  \n\n- **IQ3_XS**: Ultra-low-bit quantization (3-bit) with **extreme memory efficiency**.  \n  - **Use case**: Best for **ultra-low-memory devices** where even Q4_K is too large.  \n  - **Trade-off**: Lower accuracy compared to higher-bit quantizations.  \n\n- **IQ3_S**: Small block size for **maximum memory efficiency**.  \n  - **Use case**: Best for **low-memory devices** where **IQ3_XS** is too aggressive.  \n\n- **IQ3_M**: Medium block size for better accuracy than **IQ3_S**.  \n  - **Use case**: Suitable for **low-memory devices** where **IQ3_S** is too limiting.  \n\n- **Q4_K**: 4-bit quantization with **block-wise optimization** for better accuracy.  \n  - **Use case**: Best for **low-memory devices** where **Q6_K** is too large.  \n\n- **Q4_0**: Pure 4-bit quantization, optimized for **ARM devices**.  \n  - **Use case**: Best for **ARM-based devices** or **low-memory environments**.  \n\n---\n\n### **Summary Table: Model Format Selection**  \n\n| Model Format  | Precision  | Memory Usage  | Device Requirements  | Best Use Case  |  \n|--------------|------------|---------------|----------------------|---------------|  \n| **BF16**     | Highest    | High          | BF16-supported GPU/CPUs  | High-speed inference with reduced memory |  \n| **F16**      | High       | High          | FP16-supported devices | GPU inference when BF16 isn't available |  \n| **Q4_K**     | Medium Low | Low           | CPU or Low-VRAM devices | Best for memory-constrained environments |  \n| **Q6_K**     | Medium     | Moderate      | CPU with more memory | Better accuracy while still being quantized |  \n| **Q8_0**     | High       | Moderate      | CPU or GPU with enough VRAM | Best accuracy among quantized models |  \n| **IQ3_XS**   | Very Low   | Very Low      | Ultra-low-memory devices | Extreme memory efficiency and low accuracy |  \n| **Q4_0**     | Low        | Low           | ARM or low-memory devices | llama.cpp can optimize for ARM devices |  \n\n---\n\n## **Included Files & Details**  \n\n### `DeepSeek-R1-Distill-Qwen-7B-bf16.gguf`  \n- Model weights preserved in **BF16**.  \n- Use this if you want to **requantize** the model into a different format.  \n- Best if your device supports **BF16 acceleration**.  \n\n### `DeepSeek-R1-Distill-Qwen-7B-f16.gguf`  \n- Model weights stored in **F16**.  \n- Use if your device supports **FP16**, especially if BF16 is not available.  \n\n### `DeepSeek-R1-Distill-Qwen-7B-bf16-q8_0.gguf`  \n- **Output & embeddings** remain in **BF16**.  \n- All other layers quantized to **Q8_0**.  \n- Use if your device supports **BF16** and you want a quantized version.  \n\n### `DeepSeek-R1-Distill-Qwen-7B-f16-q8_0.gguf`  \n- **Output & embeddings** remain in **F16**.  \n- All other layers quantized to **Q8_0**.    \n\n### `DeepSeek-R1-Distill-Qwen-7B-q4_k.gguf`  \n- **Output & embeddings** quantized to **Q8_0**.  \n- All other layers quantized to **Q4_K**.  \n- Good for **CPU inference** with limited memory.  \n\n### `DeepSeek-R1-Distill-Qwen-7B-q4_k_s.gguf`  \n- Smallest **Q4_K** variant, using less memory at the cost of accuracy.  \n- Best for **very low-memory setups**.  \n\n### `DeepSeek-R1-Distill-Qwen-7B-q6_k.gguf`  \n- **Output & embeddings** quantized to **Q8_0**.  \n- All other layers quantized to **Q6_K** .  \n\n### `DeepSeek-R1-Distill-Qwen-7B-q8_0.gguf`  \n- Fully **Q8** quantized model for better accuracy.  \n- Requires **more memory** but offers higher precision.  \n\n### `DeepSeek-R1-Distill-Qwen-7B-iq3_xs.gguf`  \n- **IQ3_XS** quantization, optimized for **extreme memory efficiency**.  \n- Best for **ultra-low-memory devices**.  \n\n### `DeepSeek-R1-Distill-Qwen-7B-iq3_m.gguf`  \n- **IQ3_M** quantization, offering a **medium block size** for better accuracy.  \n- Suitable for **low-memory devices**.  \n\n### `DeepSeek-R1-Distill-Qwen-7B-q4_0.gguf`  \n- Pure **Q4_0** quantization, optimized for **ARM devices**.  \n- Best for **low-memory environments**.\n- Prefer IQ4_NL for better accuracy.\n\n# <span id=\"testllm\" style=\"color: #7F7FFF;\">🚀 If you find these models useful</span>\n❤ **Please click \"Like\" if you find this useful!**  \nHelp me test my **AI-Powered Network Monitor Assistant** with **quantum-ready security checks**:  \n👉 [Quantum Network Monitor](https://readyforquantum.com)  \n\n💬 **How to test**:  \n1. Click the **chat icon** (bottom right on any page)  \n2. Choose an **AI assistant type**:  \n   - `TurboLLM` (GPT-4-mini)  \n   - `FreeLLM` (Open-source)  \n   - `TestLLM` (Experimental CPU-only)  \n\n### **What I’m Testing**  \nI’m pushing the limits of **small open-source models for AI network monitoring**, specifically:  \n- **Function calling** against live network services  \n- **How small can a model go** while still handling:  \n  - Automated **Nmap scans**  \n  - **Quantum-readiness checks**  \n  - **Metasploit integration**  \n\n🟡 **TestLLM** – Current experimental model (llama.cpp on 6 CPU threads):  \n- ✅ **Zero-configuration setup**  \n- ⏳ 30s load time (slow inference but **no API costs**)  \n- 🔧 **Help wanted!** If you’re into **edge-device AI**, let’s collaborate!  \n\n### **Other Assistants**  \n🟢 **TurboLLM** – Uses **gpt-4-mini** for:  \n- **Real-time network diagnostics**  \n- **Automated penetration testing** (Nmap/Metasploit)  \n- 🔑 Get more tokens by [downloading our Quantum Network Monitor Agent](https://readyforquantum.com/download/?utm_source=huggingface&utm_medium=referral&utm_campaign=huggingface_repo_readme)  \n\n🔵 **HugLLM** – Open-source models (≈8B params):  \n- **2x more tokens** than TurboLLM  \n- **AI-powered log analysis**  \n- 🌐 Runs on Hugging Face Inference API  \n\n### 💡 **Example AI Commands to Test**:  \n1. `\"Give me info on my websites SSL certificate\"`  \n2. `\"Check if my server is using quantum safe encyption for communication\"`  \n3. `\"Run a quick Nmap vulnerability test\"`\n4. '\"Create a cmd processor to .. (what ever you want)\" Note you need to install a Quantum Network Monitor Agent to run the .net code from. This is a very flexible and powerful feature. Use with caution!\n\n### Final word\nI fund the servers to create the models files, run the Quantum Network Monitor Service and Pay for Inference from Novita and OpenAI all from my own pocket. All of the code for creating the models and the work I have done with Quantum Network Monitor is [open source](https://github.com/Mungert69). Feel free to use what you find useful. Please support my work and consider [buying me a coffee](https://www.buymeacoffee.com/mahadeva) .\nThis will help me pay for the services and increase the token limits for everyone.\n\nThank you :)  \n\n\n# DeepSeek-R1\n<!-- markdownlint-disable first-line-h1 -->\n<!-- markdownlint-disable html -->\n<!-- markdownlint-disable no-duplicate-header -->\n\n<div align=\"center\">\n  <img src=\"https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true\" width=\"60%\" alt=\"DeepSeek-V3\" />\n</div>\n<hr>\n<div align=\"center\" style=\"line-height: 1;\">\n  <a href=\"https://www.deepseek.com/\" target=\"_blank\" style=\"margin: 2px;\">\n    <img alt=\"Homepage\" src=\"https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true\" style=\"display: inline-block; vertical-align: middle;\"/>\n  </a>\n  <a href=\"https://chat.deepseek.com/\" target=\"_blank\" style=\"margin: 2px;\">\n    <img alt=\"Chat\" src=\"https://img.shields.io/badge/🤖%20Chat-DeepSeek%20R1-536af5?color=536af5&logoColor=white\" style=\"display: inline-block; vertical-align: middle;\"/>\n  </a>\n  <a href=\"https://huggingface.co/deepseek-ai\" target=\"_blank\" style=\"margin: 2px;\">\n    <img alt=\"Hugging Face\" src=\"https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white\" style=\"display: inline-block; vertical-align: middle;\"/>\n  </a>\n</div>\n\n<div align=\"center\" style=\"line-height: 1;\">\n  <a href=\"https://discord.gg/Tc7c45Zzu5\" target=\"_blank\" style=\"margin: 2px;\">\n    <img alt=\"Discord\" src=\"https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da\" style=\"display: inline-block; vertical-align: middle;\"/>\n  </a>\n  <a href=\"https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true\" target=\"_blank\" style=\"margin: 2px;\">\n    <img alt=\"Wechat\" src=\"https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white\" style=\"display: inline-block; vertical-align: middle;\"/>\n  </a>\n  <a href=\"https://twitter.com/deepseek_ai\" target=\"_blank\" style=\"margin: 2px;\">\n    <img alt=\"Twitter Follow\" src=\"https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white\" style=\"display: inline-block; vertical-align: middle;\"/>\n  </a>\n</div>\n\n<div align=\"center\" style=\"line-height: 1;\">\n  <a href=\"https://github.com/deepseek-ai/DeepSeek-R1/blob/main/LICENSE\" style=\"margin: 2px;\">\n    <img alt=\"License\" src=\"https://img.shields.io/badge/License-MIT-f5de53?&color=f5de53\" style=\"display: inline-block; vertical-align: middle;\"/>\n  </a>\n</div>\n\n\n<p align=\"center\">\n  <a href=\"https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf\"><b>Paper Link</b>👁️</a>\n</p>\n\n\n## 1. Introduction\n\nWe introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. \nDeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.\nWith RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors.\nHowever, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance,\nwe introduce DeepSeek-R1, which incorporates cold-start data before RL.\nDeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. \nTo support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.\n\n**NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the [Usage Recommendation](#usage-recommendations) section.**\n\n<p align=\"center\">\n  <img width=\"80%\" src=\"figures/benchmark.jpg\">\n</p>\n\n## 2. Model Summary\n\n---\n\n**Post-Training: Large-Scale Reinforcement Learning on the Base Model**\n\n-  We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area.\n\n-   We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities.\n    We believe the pipeline will benefit the industry by creating better models. \n\n---\n\n**Distillation: Smaller Models Can Be Powerful Too**\n\n-  We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance compared to the reasoning patterns discovered through RL on small models. The open source DeepSeek-R1, as well as its API, will benefit the research community to distill better smaller models in the future. \n- Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series to the community.\n\n## 3. Model Downloads\n\n### DeepSeek-R1 Models\n\n<div align=\"center\">\n\n| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** |\n| :------------: | :------------: | :------------: | :------------: | :------------: |\n| DeepSeek-R1-Zero | 671B | 37B | 128K   | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Zero)   |\n| DeepSeek-R1   | 671B | 37B |  128K   | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1)   |\n\n</div>\n\nDeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base. \nFor more details regarding the model architecture, please refer to [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repository.\n\n### DeepSeek-R1-Distill Models\n\n<div align=\"center\">\n\n| **Model** | **Base Model** | **Download** |\n| :------------: | :------------: | :------------: |\n| DeepSeek-R1-Distill-Qwen-1.5B  | [Qwen2.5-Math-1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)   |\n| DeepSeek-R1-Distill-Qwen-7B  | [Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)   |\n| DeepSeek-R1-Distill-Llama-8B  | [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)   |\n| DeepSeek-R1-Distill-Qwen-14B   | [Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B)   |\n|DeepSeek-R1-Distill-Qwen-32B  | [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B)   |\n| DeepSeek-R1-Distill-Llama-70B  | [Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B)   |\n\n</div>\n\nDeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1.\nWe slightly change their configs and tokenizers. Please use our setting to run these models.\n\n## 4. Evaluation Results\n\n### DeepSeek-R1-Evaluation\n For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1.\n<div align=\"center\">\n\n\n| Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 |\n|----------|-------------------|----------------------|------------|--------------|----------------|------------|--------------|\n| | Architecture | - | - | MoE | - | - | MoE |\n| | # Activated Params | - | - | 37B | - | - | 37B |\n| | # Total Params | - | - | 671B | - | - | 671B |\n| English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | **91.8** | 90.8 |\n| | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | **92.9** |\n| | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | **84.0** |\n| | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | **92.2** |\n| | IF-Eval (Prompt Strict) | **86.5** | 84.3 | 86.1 | 84.8 | - | 83.3 |\n| | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | **75.7** | 71.5 |\n| | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | **47.0** | 30.1 |\n| | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | **82.5** |\n| | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | **87.6** |\n| | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | **92.3** |\n| Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | **65.9** |\n| | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | **96.6** | 96.3 |\n| | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | **2061** | 2029 |\n| | SWE Verified (Resolved) | **50.8** | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 |\n| | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | **61.7** | 53.3 |\n| Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | **79.8** |\n| | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | **97.3** |\n| | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | **78.8** |\n| Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | **92.8** |\n| | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | **91.8** |\n| | C-SimpleQA (Correct) | 55.4 | 58.7 | **68.0** | 40.3 | - | 63.7 |\n\n</div>\n\n\n### Distilled Model Evaluation\n\n\n<div align=\"center\">\n\n| Model                                    | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating |\n|------------------------------------------|------------------|-------------------|-----------------|----------------------|----------------------|-------------------|\n| GPT-4o-0513                          | 9.3              | 13.4              | 74.6            | 49.9                 | 32.9                 | 759               |\n| Claude-3.5-Sonnet-1022             | 16.0             | 26.7                 | 78.3            | 65.0                 | 38.9                 | 717               |\n| o1-mini                              | 63.6             | 80.0              | 90.0            | 60.0                 | 53.8                 | **1820**          |\n| QwQ-32B-Preview                              | 44.0             | 60.0                 | 90.6            | 54.5               | 41.9                 | 1316              |\n| DeepSeek-R1-Distill-Qwen-1.5B       | 28.9             | 52.7              | 83.9            | 33.8                 | 16.9                 | 954               |\n| DeepSeek-R1-Distill-Qwen-7B          | 55.5             | 83.3              | 92.8            | 49.1                 | 37.6                 | 1189              |\n| DeepSeek-R1-Distill-Qwen-14B         | 69.7             | 80.0              | 93.9            | 59.1                 | 53.1                 | 1481              |\n| DeepSeek-R1-Distill-Qwen-32B        | **72.6**         | 83.3              | 94.3            | 62.1                 | 57.2                 | 1691              |\n| DeepSeek-R1-Distill-Llama-8B         | 50.4             | 80.0              | 89.1            | 49.0                 | 39.6                 | 1205              |\n| DeepSeek-R1-Distill-Llama-70B        | 70.0             | **86.7**          | **94.5**        | **65.2**             | **57.5**             | 1633              |\n\n</div>\n\n\n## 5. Chat Website & API Platform\nYou can chat with DeepSeek-R1 on DeepSeek's official website: [chat.deepseek.com](https://chat.deepseek.com), and switch on the button \"DeepThink\"\n\nWe also provide OpenAI-Compatible API at DeepSeek Platform: [platform.deepseek.com](https://platform.deepseek.com/)\n\n## 6. How to Run Locally\n\n### DeepSeek-R1 Models\n\nPlease visit [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repo for more information about running DeepSeek-R1 locally.\n\n**NOTE: Hugging Face's Transformers has not been directly supported yet.**\n\n### DeepSeek-R1-Distill Models\n\nDeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models.\n\nFor instance, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm):\n\n```shell\nvllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager\n```\n\nYou can also easily start a service using [SGLang](https://github.com/sgl-project/sglang)\n\n```bash\npython3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2\n```\n\n### Usage Recommendations\n\n**We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:**\n\n1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.\n2. **Avoid adding a system prompt; all instructions should be contained within the user prompt.**\n3. For mathematical problems, it is advisable to include a directive in your prompt such as: \"Please reason step by step, and put your final answer within \\boxed{}.\"\n4. When evaluating model performance, it is recommended to conduct multiple tests and average the results.\n\nAdditionally, we have observed that the DeepSeek-R1 series models tend to bypass thinking pattern (i.e., outputting \"\\<think\\>\\n\\n\\</think\\>\") when responding to certain queries, which can adversely affect the model's performance.\n**To ensure that the model engages in thorough reasoning, we recommend enforcing the model to initiate its response with \"\\<think\\>\\n\" at the beginning of every output.**\n\n## 7. License\nThis code repository and the model weights are licensed under the [MIT License](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/LICENSE).\nDeepSeek-R1 series support commercial use, allow for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:\n- DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from [Qwen-2.5 series](https://github.com/QwenLM/Qwen2.5), which are originally licensed under [Apache 2.0 License](https://huggingface.co/Qwen/Qwen2.5-1.5B/blob/main/LICENSE), and now finetuned with 800k samples curated with DeepSeek-R1.\n- DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under [llama3.1 license](https://huggingface.co/meta-llama/Llama-3.1-8B/blob/main/LICENSE).\n- DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under [llama3.3 license](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE).\n\n## 8. Citation\n```\n@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,\n      title={DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning}, \n      author={DeepSeek-AI},\n      year={2025},\n      eprint={2501.12948},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https://arxiv.org/abs/2501.12948}, \n}\n\n```\n\n## 9. Contact\nIf you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.com).\n",
    "related_quantizations": []
  },
  "tags": [
    "transformers",
    "gguf",
    "arxiv:2501.12948",
    "license:mit",
    "endpoints_compatible",
    "region:us",
    "imatrix",
    "conversational"
  ],
  "likes": 4,
  "downloads": 287,
  "gated": false,
  "private": false,
  "last_modified": "2025-09-24T15:41:24.000Z",
  "created_at": "2025-03-19T23:20:08.000Z",
  "pipeline_tag": "",
  "library_name": "transformers"
}

Source payload excerpt (from Hugging Face API)

{
  "_id": "67db5128cc71b0355e4340fb",
  "id": "Mungert/DeepSeek-R1-Distill-Qwen-7B-GGUF",
  "modelId": "Mungert/DeepSeek-R1-Distill-Qwen-7B-GGUF",
  "sha": "dc4514148684b77c108ee65515345a6d997d327e",
  "createdAt": "2025-03-19T23:20:08.000Z",
  "lastModified": "2025-09-24T15:41:24.000Z",
  "author": "Mungert",
  "downloads": 287,
  "likes": 4,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "transformers",
  "siblings_count": 33
}