Model Intelligence Sheet

unsloth/qwen3-coder-30b-a3b-instruct-gguf overview

Comprehensive model page for unsloth/qwen3-coder-30b-a3b-instruct-gguf

transformers, gguf, unsloth, qwen3, qwen, text-generation, arxiv:2505.09388, base_model:Qwen/Qwen3-Coder-30B-A3B-Instruct, base_model:quantized:Qwen/Qwen3-Coder-30B-A3B-Instruct, license:apache-2.0, endpoints_compatible, region:us, imatrix, conversational

Downloads

147,452

Likes

591

Pipeline

text-generation

Library

transformers

Visibility

Public

Access

Open

Repository Files & Downloads

28 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf	GGUF	BF16	46.24 GB	Download
Qwen3-Coder-30B-A3B-Instruct-BF16-00002-of-00002.gguf	GGUF	BF16	10.65 GB	Download
Qwen3-Coder-30B-A3B-Instruct-IQ4_NL.gguf	GGUF	IQ4_NL	16.12 GB	Download
Qwen3-Coder-30B-A3B-Instruct-IQ4_XS.gguf	GGUF	IQ4_XS	15.25 GB	Download
Qwen3-Coder-30B-A3B-Instruct-Q2_K.gguf	GGUF	Q2_K	10.49 GB	Download
Qwen3-Coder-30B-A3B-Instruct-Q2_K_L.gguf	GGUF	Q2_K_L	10.55 GB	Download
Qwen3-Coder-30B-A3B-Instruct-Q3_K_M.gguf	GGUF	Q3_K_M	13.70 GB	Download
Qwen3-Coder-30B-A3B-Instruct-Q3_K_S.gguf	GGUF	Q3_K_S	12.38 GB	Download
Qwen3-Coder-30B-A3B-Instruct-Q4_0.gguf	GGUF	Q4_0	16.19 GB	Download
Qwen3-Coder-30B-A3B-Instruct-Q4_1.gguf	GGUF	Q4_1	17.87 GB	Download
Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf	GGUF	Q4_K_M	17.28 GB	Download
Qwen3-Coder-30B-A3B-Instruct-Q4_K_S.gguf	GGUF	Q4_K_S	16.26 GB	Download
Qwen3-Coder-30B-A3B-Instruct-Q5_K_M.gguf	GGUF	Q5_K_M	20.23 GB	Download
Qwen3-Coder-30B-A3B-Instruct-Q5_K_S.gguf	GGUF	Q5_K_S	19.63 GB	Download
Qwen3-Coder-30B-A3B-Instruct-Q6_K.gguf	GGUF	Q6_K	23.37 GB	Download
Qwen3-Coder-30B-A3B-Instruct-Q8_0.gguf	GGUF	Q8_0	30.25 GB	Download
Qwen3-Coder-30B-A3B-Instruct-UD-IQ1_M.gguf	GGUF	IQ1_M	8.97 GB	Download
Qwen3-Coder-30B-A3B-Instruct-UD-IQ1_S.gguf	GGUF	IQ1_S	8.30 GB	Download
Qwen3-Coder-30B-A3B-Instruct-UD-IQ2_M.gguf	GGUF	IQ2_M	10.09 GB	Download
Qwen3-Coder-30B-A3B-Instruct-UD-IQ2_XXS.gguf	GGUF	IQ2_XXS	9.62 GB	Download
Qwen3-Coder-30B-A3B-Instruct-UD-IQ3_XXS.gguf	GGUF	IQ3_XXS	11.97 GB	Download
Qwen3-Coder-30B-A3B-Instruct-UD-Q2_K_XL.gguf	GGUF	Q2_K_XL	10.98 GB	Download
Qwen3-Coder-30B-A3B-Instruct-UD-Q3_K_XL.gguf	GGUF	Q3_K_XL	12.86 GB	Download
Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf	GGUF	Q4_K_XL	16.45 GB	Download
Qwen3-Coder-30B-A3B-Instruct-UD-Q5_K_XL.gguf	GGUF	Q5_K_XL	20.25 GB	Download
Qwen3-Coder-30B-A3B-Instruct-UD-Q6_K_XL.gguf	GGUF	Q6_K_XL	24.53 GB	Download
Qwen3-Coder-30B-A3B-Instruct-UD-Q8_K_XL.gguf	GGUF	Q8_K_XL	33.52 GB	Download
Qwen3-Coder-30B-A3B-Instruct-UD-TQ1_0.gguf	GGUF	TQ1_0	7.46 GB	Download
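
The sizes in the table suggest a simple way to choose a file: take the largest quantization that fits in available memory after leaving some headroom. The sketch below is a minimal illustration, not a recommendation from the model authors. The helper name `pick_quant`, the 2 GB default headroom, and the subset of files included are all assumptions made for this example.

```python
from typing import Optional

# File sizes in GB, copied from the table (a subset, for brevity).
# The BF16 entry is the sum of its two shards.
QUANT_SIZES_GB = {
    "UD-TQ1_0": 7.46,
    "UD-IQ1_S": 8.30,
    "Q2_K": 10.49,
    "Q3_K_M": 13.70,
    "IQ4_XS": 15.25,
    "UD-Q4_K_XL": 16.45,
    "Q4_K_M": 17.28,
    "Q5_K_M": 20.23,
    "Q6_K": 23.37,
    "Q8_0": 30.25,
    "BF16": 46.24 + 10.65,
}


def pick_quant(memory_gb: float, headroom_gb: float = 2.0) -> Optional[str]:
    """Return the largest listed quantization that fits, or None.

    headroom_gb is a hypothetical allowance for the KV cache and runtime
    buffers; tune it for the actual context length and backend.
    """
    budget = memory_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for memory in (8.0, 12.0, 24.0, 64.0):
        print(memory, "GB ->", pick_quant(memory))
```

Under these assumptions, `pick_quant(24.0)` selects `Q5_K_M` (20.23 GB), the largest file in the subset that stays within the 22 GB budget.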

Model Details Live

Model Slug

unsloth/qwen3-coder-30b-a3b-instruct-gguf

Author

unsloth

Pipeline Task

text-generation

Library

transformers

Created

2025-07-31

Last Modified

2026-01-30

Gated

No

Private

No

HF SHA

b17cb02dd882d5b6ab62fc777ad2995f19668350

License

apache-2.0

Language

Unknown

Base Model

Qwen/Qwen3-Coder-30B-A3B-Instruct

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "tags": [
      "unsloth",
      "qwen3",
      "qwen"
    ],
    "base_model": [
      "Qwen/Qwen3-Coder-30B-A3B-Instruct"
    ],
    "library_name": "transformers",
    "license": "apache-2.0",
    "license_link": "https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct/blob/main/LICENSE",
    "pipeline_tag": "text-generation",
    "frontmatter": {
      "tags": [
        "unsloth",
        "qwen3",
        "qwen"
      ],
      "base_model": [
        "Qwen/Qwen3-Coder-30B-A3B-Instruct"
      ],
      "library_name": "transformers",
      "license": "apache-2.0",
      "license_link": "https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct/blob/main/LICENSE",
      "pipeline_tag": "text-generation"
    },
    "hero_image_url": "https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png",
    "summary": "",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\ntags:\n- unsloth\n- qwen3\n- qwen\nbase_model:\n- Qwen/Qwen3-Coder-30B-A3B-Instruct\nlibrary_name: transformers\nlicense: apache-2.0\nlicense_link: https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct/blob/main/LICENSE\npipeline_tag: text-generation\n---\n<div>\n  <p style=\"margin-bottom: 0; margin-top: 0;\">\n    <strong>See <a href=\"https://huggingface.co/collections/unsloth/qwen3-680edabfb790c8c34a242f95\">our collection</a> for all versions of Qwen3 including GGUF, 4-bit & 16-bit formats.</strong>\n  </p>\n  <p style=\"margin-bottom: 0;\">\n    <em>Learn to run Qwen3-Coder correctly - <a href=\"https://docs.unsloth.ai/basics/qwen3-coder\">Read our Guide</a>.</em>\n  </p>\n<p style=\"margin-top: 0;margin-bottom: 0;\">\n   <em>See <a href=\"https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf\">Unsloth Dynamic 2.0 GGUFs</a> for our quantization benchmarks.</em>\n  </p>\n  <div style=\"display: flex; gap: 5px; align-items: center; \">\n    <a href=\"https://github.com/unslothai/unsloth/\">\n      <img src=\"https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png\" width=\"133\">\n    </a>\n    <a href=\"https://discord.gg/unsloth\">\n      <img src=\"https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png\" width=\"173\">\n    </a>\n    <a href=\"https://docs.unsloth.ai/basics/qwen3-coder\">\n      <img src=\"https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png\" width=\"143\">\n    </a>\n  </div>\n<h1 style=\"margin-top: 0rem;\">✨ Read our Qwen3-Coder Guide <a href=\"https://docs.unsloth.ai/basics/qwen3-coder\">here</a>!</h1>\n</div>\n\n- Fine-tune Qwen3 (14B) for free using our Google [Colab notebook](https://docs.unsloth.ai/get-started/unsloth-notebooks)!\n- Read our Blog about Qwen3 support: [unsloth.ai/blog/qwen3](https://unsloth.ai/blog/qwen3)\n- View the rest of our notebooks in our [docs 
here](https://docs.unsloth.ai/get-started/unsloth-notebooks).\n| Unsloth supports          |    Free Notebooks                                                                                           | Performance | Memory use |\n|-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------|\n| **Qwen3 (14B)**      | [▶️ Start on Colab](https://docs.unsloth.ai/get-started/unsloth-notebooks)               | 3x faster | 70% less |\n| **GRPO with Qwen3 (8B)**      | [▶️ Start on Colab](https://docs.unsloth.ai/get-started/unsloth-notebooks)               | 3x faster | 80% less |\n| **Llama-3.2 (3B)**      | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb)               | 2.4x faster | 58% less |\n| **Llama-3.2 (11B vision)**      | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)               | 2x faster | 60% less |\n| **Qwen2.5 (7B)**      | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(7B)-Alpaca.ipynb)               | 2x faster | 60% less |\n\n# Qwen3-Coder-30B-A3B-Instruct\n<a href=\"https://chat.qwen.ai/\" target=\"_blank\" style=\"margin: 2px;\">\n    <img alt=\"Chat\" src=\"https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5\" style=\"display: inline-block; vertical-align: middle;\"/>\n</a>\n\n## Highlights\n\n**Qwen3-Coder** is available in multiple sizes. Today, we're excited to introduce **Qwen3-Coder-30B-A3B-Instruct**. 
This streamlined model maintains impressive performance and efficiency, featuring the following key enhancements:  \n\n- **Significant Performance** among open models on **Agentic Coding**, **Agentic Browser-Use**, and other foundational coding tasks.\n- **Long-context Capabilities** with native support for **256K** tokens, extendable up to **1M** tokens using YaRN, optimized for repository-scale understanding.\n- **Agentic Coding** support for most platforms, such as **Qwen Code** and **CLINE**, featuring a specially designed function call format.\n\n![image/jpeg](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Coder/qwen3-coder-30a3-main.jpg)\n\n## Model Overview\n\n**Qwen3-Coder-30B-A3B-Instruct** has the following features:\n- Type: Causal Language Models\n- Training Stage: Pretraining & Post-training\n- Number of Parameters: 30.5B in total and 3.3B activated\n- Number of Layers: 48\n- Number of Attention Heads (GQA): 32 for Q and 4 for KV\n- Number of Experts: 128\n- Number of Activated Experts: 8\n- Context Length: **262,144 natively**. \n\n**NOTE: This model supports only non-thinking mode and does not generate ``<think></think>`` blocks in its output. Meanwhile, specifying `enable_thinking=False` is no longer required.**\n\nFor more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3-coder/), [GitHub](https://github.com/QwenLM/Qwen3-Coder), and [Documentation](https://qwen.readthedocs.io/en/latest/).\n\n\n## Quickstart\n\nWe advise you to use the latest version of `transformers`.\n\nWith `transformers<4.51.0`, you will encounter the following error:\n```\nKeyError: 'qwen3_moe'\n```\n\nThe following code snippet illustrates how to use the model to generate content based on given inputs. 
\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nmodel_name = \"Qwen/Qwen3-Coder-30B-A3B-Instruct\"\n\n# load the tokenizer and the model\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForCausalLM.from_pretrained(\n    model_name,\n    torch_dtype=\"auto\",\n    device_map=\"auto\"\n)\n\n# prepare the model input\nprompt = \"Write a quick sort algorithm.\"\nmessages = [\n    {\"role\": \"user\", \"content\": prompt}\n]\ntext = tokenizer.apply_chat_template(\n    messages,\n    tokenize=False,\n    add_generation_prompt=True,\n)\nmodel_inputs = tokenizer([text], return_tensors=\"pt\").to(model.device)\n\n# conduct text completion\ngenerated_ids = model.generate(\n    **model_inputs,\n    max_new_tokens=65536\n)\noutput_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() \n\ncontent = tokenizer.decode(output_ids, skip_special_tokens=True)\n\nprint(\"content:\", content)\n```\n\n**Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as `32,768`.**\n\nFor local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support Qwen3.\n\n## Agentic Coding\n\nQwen3-Coder excels at tool calling. 
\n\nYou can define or use any tool, as in the following example.\n```python\n# Your tool implementation\ndef square_the_number(input_num: float) -> float:\n    return input_num ** 2\n\n# Define Tools\ntools=[\n    {\n        \"type\":\"function\",\n        \"function\":{\n            \"name\": \"square_the_number\",\n            \"description\": \"output the square of the number.\",\n            \"parameters\": {\n                \"type\": \"object\",\n                \"required\": [\"input_num\"],\n                \"properties\": {\n                    'input_num': {\n                        'type': 'number', \n                        'description': 'input_num is a number that will be squared'\n                        }\n                },\n            }\n        }\n    }\n]\n\nfrom openai import OpenAI\n# Define LLM\nclient = OpenAI(\n    # Use a custom endpoint compatible with OpenAI API\n    base_url='http://localhost:8000/v1',  # api_base\n    api_key=\"EMPTY\"\n)\n \nmessages = [{'role': 'user', 'content': 'square the number 1024'}]\n\ncompletion = client.chat.completions.create(\n    messages=messages,\n    model=\"Qwen3-Coder-30B-A3B-Instruct\",\n    max_tokens=65536,\n    tools=tools,\n)\n\nprint(completion.choices[0])\n```\n\n## Best Practices\n\nTo achieve optimal performance, we recommend the following settings:\n\n1. **Sampling Parameters**:\n   - We suggest using `temperature=0.7`, `top_p=0.8`, `top_k=20`, `repetition_penalty=1.05`.\n\n2. **Adequate Output Length**: We recommend using an output length of 65,536 tokens for most queries, which is adequate for instruct models.\n\n\n### Citation\n\nIf you find our work helpful, feel free to cite us.\n\n```\n@misc{qwen3technicalreport,\n      title={Qwen3 Technical Report}, \n      author={Qwen Team},\n      year={2025},\n      eprint={2505.09388},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https://arxiv.org/abs/2505.09388}, \n}\n```\n",
    "related_quantizations": []
  },
  "tags": [
    "transformers",
    "gguf",
    "unsloth",
    "qwen3",
    "qwen",
    "text-generation",
    "arxiv:2505.09388",
    "base_model:Qwen/Qwen3-Coder-30B-A3B-Instruct",
    "base_model:quantized:Qwen/Qwen3-Coder-30B-A3B-Instruct",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us",
    "imatrix",
    "conversational"
  ],
  "likes": 591,
  "downloads": 147452,
  "gated": false,
  "private": false,
  "last_modified": "2026-01-30T06:29:38.000Z",
  "created_at": "2025-07-31T10:27:38.000Z",
  "pipeline_tag": "text-generation",
  "library_name": "transformers"
}
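
The README's out-of-memory note can be made concrete with a rough key/value cache estimate built from the architecture figures in its Model Overview (48 layers, 4 KV heads under GQA). This is a back-of-the-envelope sketch only: the head dimension of 128 and the 2-byte (16-bit) cache entries are assumptions not stated on this page, the function name is hypothetical, and real runtimes add further overhead.

```python
GIB = 1024 ** 3


def kv_cache_bytes(
    tokens: int,
    layers: int = 48,          # from the Model Overview
    kv_heads: int = 4,         # from the Model Overview (GQA: 4 KV heads)
    head_dim: int = 128,       # ASSUMPTION: not stated on this page
    bytes_per_value: int = 2,  # ASSUMPTION: 16-bit cache entries
) -> int:
    """Estimate KV cache size: one key and one value vector per layer,
    per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


if __name__ == "__main__":
    for context in (32_768, 262_144):
        size = kv_cache_bytes(context)
        print(f"{context} tokens: {size / GIB:.1f} GiB")
```

Under these assumptions the cache costs 96 KiB per token, which comes to about 3 GiB at 32,768 tokens and about 24 GiB at the native 262,144-token context. This is consistent with the advice to shorten the context when memory is tight.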

Source payload excerpt (from Hugging Face API)

{
  "_id": "688b451a53e70a07b0669a7c",
  "id": "unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF",
  "modelId": "unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF",
  "sha": "b17cb02dd882d5b6ab62fc777ad2995f19668350",
  "createdAt": "2025-07-31T10:27:38.000Z",
  "lastModified": "2026-01-30T06:29:38.000Z",
  "author": "unsloth",
  "downloads": 147452,
  "likes": 591,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "transformers",
  "siblings_count": 33
}
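
The payload fields can be read with the Python standard library alone. The sketch below inlines part of the excerpt above as a string, parses the two timestamps, and reports how many whole days separate creation from the last modification. The helper names `parse_hf_timestamp` and `repo_age_days` are hypothetical, and the timestamp format string is inferred from the excerpt rather than from any API documentation.

```python
import json
from datetime import datetime, timezone

# A subset of the payload excerpt, inlined for a self-contained example.
PAYLOAD = """
{
  "id": "unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF",
  "createdAt": "2025-07-31T10:27:38.000Z",
  "lastModified": "2026-01-30T06:29:38.000Z",
  "gated": false,
  "private": false
}
"""


def parse_hf_timestamp(value: str) -> datetime:
    # The excerpt uses fractional seconds and a trailing "Z" for UTC.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def repo_age_days(payload: dict) -> int:
    """Whole days between creation and last modification."""
    created = parse_hf_timestamp(payload["createdAt"])
    modified = parse_hf_timestamp(payload["lastModified"])
    return (modified - created).days


info = json.loads(PAYLOAD)

if __name__ == "__main__":
    print(info["id"], "last modified", repo_age_days(info), "days after creation")
```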