What license applies to rafw007/qwen36-a3b-claude-coder-llama.cpp-GGUF?

License: apache-2.0. Verify terms on Hugging Face before commercial use.

Model Intelligence Sheet

rafw007/qwen36-a3b-claude-coder-llama.cpp-GGUF overview

Q: How do I run rafw007/qwen36-a3b-claude-coder-llama.cpp-GGUF locally?

Download a GGUF file from this page and load it in guIDE or llama.cpp. Pipeline task: text-generation.

Qwen3.6 Claude Coder — local MoE coding agent llama.cpp build A custom configuration of Qwen3.6 35B A3B Mixture of Experts, ~3B active parameters , set up to a…

ggufqwen3moecoding-agenttool-callingllama.cppik_llama.cppclaude-codeopencodetext-generationenplbase_model:Qwen/Qwen3.6-35B-A3Bbase_model:quantized:Qwen/Qwen3.6-35B-A3Blicense:apache-2.0endpoints_compatibleregion:usconversational

Runs locally from ~22.29 GB disk (24 GB VRAM class GPUs with llama.cpp / guIDE).

Downloads

Likes

Pipeline

text-generation

Author

rafw007

Repository Files & Downloads

1 GGUF files detected

Direct downloads for local inference

File	Type	Quantization	Size	Link
qwen36-a3b-claude-coder-q4_K_M-llama.cpp.gguf	GGUF	Q4_K_M	22.29 GB	Download

Model Details

Model ID	rafw007/qwen36-a3b-claude-coder-llama.cpp-GGUF
Author	rafw007
Pipeline	text-generation
License	apache-2.0
Base model	Qwen/Qwen3.6-35B-A3B
Last modified	2026-06-07T22:58:21.000Z

Model README

---

license: apache-2.0

language:

base_model: Qwen/Qwen3.6-35B-A3B

pipeline_tag: text-generation

tags:

qwen3
moe
coding-agent
tool-calling
gguf
llama.cpp
ik_llama.cpp
claude-code
opencode

---

Qwen3.6 Claude Coder — local MoE coding agent (llama.cpp build)

A custom configuration of Qwen3.6-35B-A3B (Mixture-of-Experts, ~3B active parameters), set up

to act as an autonomous coding agent: it uses tools instead of guessing, grounds every answer in

the actual tool output (never fabricates results), does not loop on the same tool, and returns

complete, runnable code. No-think mode is wired into the system prompt for fast, direct answers.

Safety guardrails of the base model are intact.

It drives Claude Code, Codex and opencode fully locally — your code never leaves your

machine and cloud token cost drops to zero.

> This is the llama.cpp / ik_llama.cpp build. Same behavior and configuration as

> rafw007/qwen36-a3b-claude-coder on Ollama —

> packaged so it loads on stock llama.cpp. See "Why a separate version" below.

Why a separate version (vs. the Ollama one)

The Ollama model and this one share the same agent config (system prompt + sampling params).

What differs is packaging and the loader they target:

| | Ollama version | This llama.cpp version |

|---|---|---|

| Runtime | Ollama engine + Modelfile (RENDERER/PARSER qwen3.5) | stock llama.cpp / ik_llama.cpp (llama-server) |

| Weights | nvfp4 (~21 GB) | GGUF Q4_K_M (~24 GB) |

| Tool format | Ollama's native Qwen parser | GGUF Jinja chat template + --jinja |

| Agent config | baked into the Modelfile | supplied via launch flags + a system-prompt file (below) |

The actual fix. Qwen3.5/3.6-MoE uses multimodal RoPE (mRoPE) whose native

rope.dimension_sections is 3 ints [t, h, w]. Ollama's loader is lenient and accepts that.

Recent stock llama.cpp (the Qwen3.5 loader from PR #19435) validates that key as a length-4

array and rejects the 3-element one:

key qwen35moe.rope.dimension_sections has wrong array length; expected 4, got 3

This is a known, family-wide converter/loader mismatch — not specific to this quant. **This GGUF has

the section array padded to length 4** ([11, 11, 10] → [11, 11, 10, 0]; the 4th slot is the unused

text section, it does not change inference), so it loads cleanly on current llama.cpp and

ik_llama.cpp. If you hit the error above with any other Qwen3.5/3.6-MoE GGUF, this is the cause.

What it is (and what it is not)

Honest framing: the weights are stock Qwen3.6-35B-A3B. The "Claude Coder" behavior comes entirely

from an agentic system prompt + sampling configuration, plus the llama.cpp-compatibility rope fix

described above. Everything here is measured, not marketing.

Quick start (llama.cpp / ik_llama.cpp)

llama-server \
  -m qwen36-a3b-claude-coder-q4_K_M-llama.cpp.gguf \
  --jinja --reasoning-budget 0 \
  -c 65536 \
  --temp 0.6 --top-k 20 --top-p 0.8 --repeat-penalty 1 --presence-penalty 0 \
  --system-prompt-file qwen36-system.txt \
  --host 0.0.0.0 --port 8080

--reasoning-budget 0 enforces no-think. --jinja enables native tool-calling via the embedded

Qwen chat template. qwen36-system.txt is your agent system-prompt file (same configuration as the

Ollama build — its contents are not published).

Tested

End-to-end under opencode against ik_llama.cpp (llama-server, port-bound, --jinja): the

model emitted real tool_calls, executed a real df -h, grounded its answer on the actual output

and exited cleanly (no tool loop). Loads without the rope error on ik_llama.cpp (mRoPE sections

reported as [11, 11, 10, 0]).

Context

Configured for 64K (Claude Code's recommended minimum). Base Qwen3.6 natively supports 262K,

so context can be raised on stronger hardware. On a CPU-only box lower it (e.g. 16–32K) to fit RAM.

Files

|---|---|---|---|

| qwen36-a3b-claude-coder-q4_K_M-llama.cpp.gguf | Q4_K_M | ~24 GB | mRoPE dimension_sections padded to length-4 for stock llama.cpp / ik_llama.cpp. |

How it was made

Designed, built and tested with the help of Claude Opus — the system prompt, parameter choices

and context configuration come from that work. The llama.cpp packaging (rope-section fix + launch

recipe) was added after a user report that the Ollama-targeted GGUF would not load on stock

llama.cpp.

License

Apache 2.0 (inherited from the base Qwen3.6).

Run rafw007/qwen36-a3b-claude-coder-llama.cpp-GGUF with guIDE

Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.

Download guIDE → · Browse 524k+ models · Compare models

Source: Hugging Face · Compare models