What is Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF?

--- library_name: transformers base_model: - Jackrong/Qwopus3.6-27B-v2 tags: - gguf - llama.cpp - image-text-to-text - vision - multimodal - text-generation-inference - transformers - unsloth - conversational - qwen3_6 - reasoning - chain-of-thought - lora - sft - agent - tool-use - function-calling - coder license: apache-2.0 language: - en - zh - es - ru - ja pipeline_tag: image-text-to-text datasets: - Jackrong/Claude-opus-4.6-TraceInversion-9000x - Jackrong/Claude-opus-4.7-TraceInversion-5000x - lambda/hermes-agent-reasoning-traces --- <div style="background: linear-gradient(135deg, #7c3aed 0%, #4c1d…

What license applies to Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF?

License: apache-2.0. Verify terms on Hugging Face before commercial use.

How do I run Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF locally?

Download a GGUF file from this page and load it in guIDE or llama.cpp. Pipeline task: image-text-to-text.

Model Intelligence Sheet

Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF overview

Q: What license applies to Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF?

License: apache-2.0. Verify terms on Hugging Face before commercial use.

Q: How do I run Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF locally?

Download a GGUF file from this page and load it in guIDE or llama.cpp. Pipeline task: image-text-to-text.

Q: How much VRAM or disk space does Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF need?

Runs locally from ~888.0 MB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).

<div style="font family: apple system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans serif; border: 1px solid cbd5e1; border radius: 16px; box shadow: 0 10px 15…

transformersggufllama.cppimage-text-to-textvisionmultimodaltext-generation-inferenceunslothconversationalqwen3_6reasoningchain-of-thoughtlorasftagenttool-usefunction-callingcoderenzhesrujadataset:Jackrong/Claude-opus-4.6-TraceInversion-9000x

Runs locally from ~888.0 MB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).

Downloads

Likes

Pipeline

image-text-to-text

Author

Jackrong

Repository Files & Downloads

9 GGUF files detected

Direct downloads for local inference

File	Type	Quantization	Size	Link
Qwopus3.6-27B-Coder-MTP-Q3_K_L.gguf	GGUF	Q3_K_L	13.56 GB	Download
Qwopus3.6-27B-Coder-MTP-Q3_K_M.gguf	GGUF	Q3_K_M	12.57 GB	Download
Qwopus3.6-27B-Coder-MTP-Q4_K_M.gguf	GGUF	Q4_K_M	15.66 GB	Download
Qwopus3.6-27B-Coder-MTP-Q4_K_S.gguf	GGUF	Q4_K_S	14.74 GB	Download
Qwopus3.6-27B-Coder-MTP-Q5_K_M.gguf	GGUF	Q5_K_M	18.19 GB	Download
Qwopus3.6-27B-Coder-MTP-Q5_K_S.gguf	GGUF	Q5_K_S	17.67 GB	Download
Qwopus3.6-27B-Coder-MTP-Q6_K.gguf	GGUF	Q6_K	20.89 GB	Download
Qwopus3.6-27B-Coder-MTP-Q8_0.gguf	GGUF	Q8_0	27.05 GB	Download
mmproj-F32.gguf	GGUF	F32	888.0 MB	Download

Model Details

Model ID	Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF
Author	Jackrong
Pipeline	image-text-to-text
License	apache-2.0
Base model	Jackrong/Qwopus3.6-27B-v2
Last modified	2026-06-12T09:43:08.000Z

Model README

---

library_name: transformers

base_model:

Jackrong/Qwopus3.6-27B-v2

tags:

gguf
llama.cpp
image-text-to-text
vision
multimodal
text-generation-inference
transformers
unsloth
conversational
qwen3_6
reasoning
chain-of-thought
lora
sft
agent
tool-use
function-calling
coder

license: apache-2.0

language:

pipeline_tag: image-text-to-text

datasets:

Jackrong/Claude-opus-4.6-TraceInversion-9000x
Jackrong/Claude-opus-4.7-TraceInversion-5000x
lambda/hermes-agent-reasoning-traces

---

<h1 style="margin: 0; font-size: 26px; font-weight: 800; display: flex; align-items: center; gap: 12px; color: white; border: none;">🪐 Qwopus-3.6-27B-Coder</h1>

<span style="background: #10b981; color: white; font-size: 11px; font-weight: 700; padding: 4px 10px; border-radius: 20px; text-transform: uppercase; letter-spacing: 0.5px;">Coder SFT Release</span>

</div>

<p style="margin: 8px 0 0 0; font-size: 14px; color: #ddd6fe; font-weight: 500;">Agentic Coding & Tool-Use Reasoning Model Fine-Tuned on Qwopus3.6-27B-v2</p>

</div>

<span style="background: #f3e8ff; color: #6b21a8; font-size: 11px; font-weight: 700; padding: 4px 10px; border-radius: 20px; border: 1px solid #e9d5ff;">🧬 Trace Inversion & Negentropy</span>

<span style="background: #dbeafe; color: #1e40af; font-size: 11px; font-weight: 700; padding: 4px 10px; border-radius: 20px; border: 1px solid #bfdbfe;">🧠 27B Dense Model</span>

<span style="background: #e0f2fe; color: #0369a1; font-size: 11px; font-weight: 700; padding: 4px 10px; border-radius: 20px; border: 1px solid #bae6fd;">⚡ Agentic Coding</span>

<span style="background: #d1fae5; color: #065f46; font-size: 11px; font-weight: 700; padding: 4px 10px; border-radius: 20px; border: 1px solid #a7f3d0;">🛠️ Tool Calling & Agent</span>

<span style="background: #dcfce7; color: #166534; font-size: 11px; font-weight: 700; padding: 4px 10px; border-radius: 20px; border: 1px solid #bbf7d0;">🏆 SWE-bench Verified: 67.0% (off-thinking)</span>

</div>

<h3 style="margin: 0 0 8px 0; font-size: 15px; color: #6d28d9; font-weight: 700; display: flex; align-items: center; gap: 6px;"><span>💡</span> What is Qwopus-3.6-27B-Coder?</h3>

<p style="margin: 0; font-size: 13px; color: #334155; line-height: 1.6;">🪐 <b>Qwopus-3.6-27B-Coder</b> is a reasoning-enhanced agentic coding model built on top of <b>Qwopus3.6-27B-v2</b>. It inherits the powerful reasoning foundation of the v2 base — which achieved <b>87.43% MMLU-Pro (300ex)</b> and <b>75.25% SWE-bench Verified</b> — and further specializes it for agentic code generation, structured tool calling, debugging, and instruction-following in developer workflows. The model is designed to excel at repository-level coding tasks, multi-turn tool orchestration, and complex logical reasoning under realistic agent environments.</p>

</div>

<span style="font-weight: 700; color: #6b21a8; font-size: 12px; display: block; margin-bottom: 6px; text-transform: uppercase; letter-spacing: 0.5px;">🧩 Agentic Coding</span>

<span style="font-size: 13px; color: #4b5563; line-height: 1.5;">Optimized for repository-level coding, debugging, patch generation, and structured multi-step development workflows.</span>

</div>

<span style="font-weight: 700; color: #6b21a8; font-size: 12px; display: block; margin-bottom: 6px; text-transform: uppercase; letter-spacing: 0.5px;">🛠️ Tool Calling</span>

<span style="font-size: 13px; color: #4b5563; line-height: 1.5;">Learns from real agent trajectories with tool definitions, tool calls, and environment feedback for robust multi-turn execution.</span>

</div>

<span style="font-weight: 700; color: #6b21a8; font-size: 12px; display: block; margin-bottom: 6px; text-transform: uppercase; letter-spacing: 0.5px;">🧬 Trace Inversion</span>

<span style="font-size: 13px; color: #4b5563; line-height: 1.5;">Inherits the full Qwopus training recipe with reconstructed step-by-step reasoning trajectories from Claude Opus.</span>

</div>

<span style="font-weight: 700; color: #6b21a8; font-size: 12px; display: block; margin-bottom: 6px; text-transform: uppercase; letter-spacing: 0.5px;">🚀 27B Scale</span>

<span style="font-size: 13px; color: #4b5563; line-height: 1.5;">Dense 27B parameters with native long-context support, delivering deep reasoning with practical single-GPU deployability.</span>

</div>

> [!WARNING]

> Community Release Notice: Qwopus-3.6-27B-Coder is an experimental community release intended for research, evaluation, and agent workflow exploration. It has not undergone full safety evaluation or broad general-domain benchmarking.

> [!IMPORTANT]

> Benchmark Status: The first completed benchmark is SWE-bench Verified full 500 in thinking-off / no-thinking mode, where the Q5_K_M 27B GGUF run resolved 335/500 = 67.0%. Other benchmark suites remain pending and will be updated as testing completes.

---

💡 1. Base Model, Training Stack & Collaboration

<span>🧠</span> 1.1 Base Model: Qwopus3.6-27B-v2

</div>

<b>Qwopus3.6-27B-v2</b> is a reasoning-enhanced dense language model built on <b>Qwen3.6-27B</b>. Through a multi-stage curriculum learning pipeline and Trace Inversion augmentation, it achieves strong performance across knowledge, coding, and reasoning benchmarks. This coder variant inherits that foundation and extends it with specialized coding and tool-use data.

</p>

<thead>

<th style="padding: 8px 10px; border-bottom: 2px solid #7c3aed; text-align: left; color: #7c3aed; font-weight: bold; width: 30%;">Attribute</th>

<th style="padding: 8px 10px; border-bottom: 2px solid #7c3aed; text-align: left;">Specifications & Details</th>

</tr>

</thead>

<tbody>

<tr>

<td style="padding: 8px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: bold;">🧠 Architecture</td>

<td style="padding: 8px 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">Dense Transformer / 27 Billion Parameters</td>

</tr>

<tr>

<td style="padding: 8px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: bold;">🏢 Base Developer</td>

<td style="padding: 8px 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">Alibaba Cloud (DAMO Academy) — Qwen3.6-27B</td>

</tr>

<tr>

<td style="padding: 8px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: bold;">🎯 Primary Focus</td>

<td style="padding: 8px 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">Agentic coding, tool-use stability, code debugging, structured instruction following, repository-level tasks</td>

</tr>

<tr>

<td style="padding: 8px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: bold;">🧬 Distillation Strategy</td>

<td style="padding: 8px 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">Trace Inversion + high-quality agent trajectories + curriculum SFT</td>

</tr>

<tr>

<td style="padding: 8px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: bold;">📄 Context Window</td>

<td style="padding: 8px 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">Native support up to 32K tokens (fine-tuning target); compatible with longer contexts via RoPE/YaRN scaling</td>

</tr>

</tbody>

</table>

</div>

<span>🧪</span> 1.2 Hardware Cooperation & Joint Collaboration

</div>

This project is built in close collaboration and joint effort with engineer <b>Kyle Hessling</b>, whose hardware infrastructure and training support made stable 27B-scale fine-tuning and evaluation possible.

<span>You can follow him for hardware and model training updates on X / Twitter: <a href="https://x.com/KyleHessling1" target="_blank" style="color: #047857; text-decoration: none; font-weight: 700;">@KyleHessling1</a></span>

</div>

<span>🦥</span> 1.3 Fine-Tuning Framework (Unsloth)

</div>

The model training workflow is accelerated and memory-optimized with <b>Unsloth</b>. Special thanks to the Unsloth team for making efficient large-model fine-tuning accessible.

<span>Documentation and fine-tuning guidance: <a href="https://unsloth.ai/docs" target="_blank" style="color: #7c3aed; text-decoration: none; font-weight: 700;">unsloth.ai/docs</a></span>

</div>

<span>⚡</span> 1.4 MTP Variant: Faster Speculative Decoding

</div>

A <b>Multi-Token Prediction (MTP)</b> variant of this model is also available, featuring auxiliary prediction heads (<code>draft=2</code>) for speculative decoding. Based on the Qwopus3.6-27B-v2-MTP benchmark, the MTP variant achieved <b>~1.66x speedup</b> over standard decoding with preserved accuracy. See the <a href="https://huggingface.co/Jackrong/Qwopus3.6-27B-v2-MTP" target="_blank" style="color: #0284c7; text-decoration: none; font-weight: 700;">Qwopus3.6-27B-v2-MTP</a> model card for detailed MTP performance analysis.

<span>🌟</span><span>The custom MTP heads processing pipeline is open-sourced in <a href="https://github.com/R6410418/Jackrong-llm-finetuning-guide/tree/main/qwen-mtp-gguf" target="_blank" style="color: #0284c7; text-decoration: none; font-weight: 700;">qwen-mtp-gguf</a>. If you find this toolkit helpful, please consider leaving a star on GitHub!</span>

</div>

---

📖 2. Background & Motivation

<span>🎯</span> 2.1 Why a 27B Coder Model?

</div>

The Qwopus coder line has demonstrated strong results at the 4B and 9B scales. The 27B coder variant represents a significant leap in reasoning depth, code generation quality, and tool-use robustness. At 27B parameters, the model has sufficient capacity to internalize complex repository structures, multi-file dependencies, and nuanced tool-calling patterns — while remaining deployable on a single GPU (e.g., RTX 5090). This scale bridges the gap between compact local models and expensive API-based solutions, making it suitable for production agentic coding workflows.

</div>

<span>🧬</span> 2.2 Trace Inversion & Agent Behavior

</div>

Commercial and frontier models often expose only compressed reasoning summaries. Qwopus-style training uses <b>Trace Inversion</b> to reconstruct these compressed "Reasoning Bubbles" into fuller learnable reasoning traces. For coding, this is paired with agent trajectories that include tool definitions, tool calls, and real feedback, teaching the model to reason through interactive work rather than only produce static answers.

<br><br>This model integrates:

<ul>

<li><b>claude-opus-4.6-traceInversion-9000x</b>: 9,000 high-value, fully reconstructed step-by-step reasoning trajectories.</li>

<li><b>claude-opus-4.7-traceInversion-5000x</b>: 5,000 complex multi-turn logic and mathematics samples optimized for negative entropy reconstruction.</li>

<li><b>lambda/hermes-agent-reasoning-traces</b>: ~10,000 high-quality multi-turn tool-calling trajectories from GLM-5.1 and kimi-4.6 models.</li>

</ul>

</div>

<span>📦</span> 2.3 Special Dataset: Trace Inversion & Agent Traces

</div>

<b>Trace Inversion:</b> Uses a specialized logical reconstructor, <a href="https://huggingface.co/Jackrong/Trace-Inverter-4B" target="_blank" style="color: #0369a1; text-decoration: none; font-weight: bold;">Trace-Inverter-4B</a>, to reverse-engineer compressed reasoning bubbles into complete, step-by-step learnable CoT chains. This approach addresses the <b>"Information Entropy Trap"</b> — where direct imitation of compressed summaries leads to reasoning fractures — by ensuring the model learns continuous, rigorous logical derivations.

<b>Agent Traces (lambda/hermes-agent-reasoning-traces):</b> Each sample contains real multi-turn tool execution results (not fabricated outputs), with step-by-step reasoning inside <code><think></code> tags. Coverage includes:

<ul>

<li><b>Terminal & Coding:</b> Script writing, debugging, environment configuration</li>

<li><b>Repository Tasks:</b> Bug fixing, refactoring, code review</li>

<li><b>Browser Automation:</b> Web navigation, scraping, form filling</li>

<li><b>Agent Tools:</b> Memory persistence, task delegation, skill management</li>

</ul>

</div>

---

📊 3. Performance Benchmarks

<h3 style="margin: 0; font-size: 20px; font-weight: 700; display: flex; align-items: center; gap: 8px; color: white; border: none;">📊 Evaluation & Performance Metrics</h3>

<p style="margin: 4px 0 0 0; font-size: 13px; color: #ddd6fe;">First completed result: SWE-bench Verified full 500, evaluated in no-thinking mode for fast local agentic coding.</p>

</div>

<div>

<span style="font-weight: 800; font-size: 14px; color: #5b21b6; display: block; margin-bottom: 4px;">No-Thinking SWE-bench Result</span>

<span style="font-size: 13px; color: #334155; line-height: 1.6;">This benchmark was intentionally run with <b>thinking disabled</b>. The goal is to show the model's practical coding ability when used as a fast local agent, without relying on long visible reasoning traces. On an RTX 5090 with MTP enabled, the model runs at approximately <b>100 tokens/sec</b>, making this result especially relevant for interactive development workflows.</span>

</div>

<span style="font-size: 11px; font-weight: 800; color: #6d28d9; text-transform: uppercase; display: block; margin-bottom: 6px; letter-spacing: 0.5px;">SWE-bench Verified</span>

<span style="font-size: 12px; color: #64748b; font-weight: 600;">335 / 500 resolved</span>

</div>

<span style="font-size: 11px; font-weight: 800; color: #0369a1; text-transform: uppercase; display: block; margin-bottom: 6px; letter-spacing: 0.5px;">Inference Mode</span>

<span style="font-size: 24px; font-weight: 900; color: #0369a1; display: block; line-height: 1;">Thinking Off</span>

<span style="font-size: 12px; color: #64748b; font-weight: 600;">no visible CoT required</span>

</div>

<span style="font-size: 11px; font-weight: 800; color: #047857; text-transform: uppercase; display: block; margin-bottom: 6px; letter-spacing: 0.5px;">Local Throughput</span>

</div>

<span style="font-size: 11px; font-weight: 800; color: #c2410c; text-transform: uppercase; display: block; margin-bottom: 6px; letter-spacing: 0.5px;">Evaluation Build</span>

<span style="font-size: 12px; color: #64748b; font-weight: 600;">27B GGUF quant</span>

</div>

<b>Evaluation setup:</b> SWE-bench Verified <b>full 500</b>, Qwopus-3.6-27B-Coder <b>Q5_K_M</b> GGUF, <b>thinking-off / no-thinking mode</b>. Final score: <b>335/500 = 67.0%</b>.

</div>

<span>💻</span> 3.1 SWE-bench Verified: Full 500 No-Thinking Result

</div>

<p style="margin: 0 0 12px 0;">SWE-bench Verified measures whether a model can solve real GitHub issues by editing repository code and passing the hidden tests. In this run, Qwopus-3.6-27B-Coder solved <b>335 out of 500</b> verified tasks while running in <b>no-thinking mode</b>, prioritizing direct action quality and local speed over long explicit reasoning.</p>

<thead>

<th style="padding: 10px; border-bottom: 2px solid #7c3aed; text-align: left; color: #6d28d9; font-weight: 800;">Metric</th>

<th style="padding: 10px; border-bottom: 2px solid #7c3aed; text-align: left; font-weight: 800;">Result</th>

<th style="padding: 10px; border-bottom: 2px solid #7c3aed; text-align: left; font-weight: 800;">Notes</th>

</tr>

</thead>

<tbody>

<tr>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">Final score</td>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">Full SWE-bench Verified 500-task split</td>

</tr>

<tr>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800; color: #0369a1;">Thinking off</td>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">No long visible chain-of-thought during evaluation</td>

</tr>

<tr>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">Quantization</td>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">Local 27B quantized deployment</td>

</tr>

<tr>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">Throughput</td>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800; color: #047857;">~100 tokens/sec</td>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">Observed on RTX 5090 with MTP enabled</td>

</tr>

</tbody>

</table>

</div>

<span>🧩</span> 3.2 Repository-Level Breakdown

</div>

<p style="margin: 0 0 12px 0;">The result is strongest on practical library-maintenance tasks such as scikit-learn, xarray, requests, and Django, while also showing solid coverage on symbolic mathematics, test infrastructure, documentation tooling, and plotting libraries.</p>

<div style="background: rgba(14, 165, 233, 0.07); padding: 9px 10px; border-bottom: 2px solid #0ea5e9; color: #0369a1; font-weight: 800;">Repository</div>

<div style="background: rgba(14, 165, 233, 0.07); padding: 9px 10px; border-bottom: 2px solid #0ea5e9; font-weight: 800;">Resolved</div>

<div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">scikit-learn</div><div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">27/32</div><div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800; color: #047857;">84%</div>

<div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">pydata/xarray</div><div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">18/22</div><div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800; color: #047857;">82%</div>

<div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">psf/requests</div><div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">6/8</div><div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800; color: #047857;">75%</div>

<div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">django</div><div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">166/231</div><div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800; color: #047857;">72%</div>

<div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">sympy</div><div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">48/75</div><div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800; color: #6d28d9;">64%</div>

<div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">pytest</div><div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">12/19</div><div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800; color: #6d28d9;">63%</div>

<div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">sphinx-doc</div><div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">26/44</div><div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800; color: #6d28d9;">59%</div>

<div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">matplotlib</div><div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">20/34</div><div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800; color: #6d28d9;">59%</div>

<div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">astropy</div><div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">9/22</div><div style="padding: 9px 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800; color: #b45309;">41%</div>

<div style="padding: 9px 10px; font-weight: 700;">pylint</div><div style="padding: 9px 10px;">2/10</div><div style="padding: 9px 10px; font-weight: 800; color: #b45309;">20%</div>

</div>

<span>⚖️</span> 3.3 SWE-bench Verified Reference Comparison

</div>

<b>Important comparison note:</b> the reference scores below are from external model reports and are generally <b>thinking-enabled</b> or harness-specific where noted. Qwopus-3.6-27B-Coder is shown here as a <b>no-thinking</b>, quantized local run, so this table should be read as positioning context rather than a strict same-mode leaderboard.

</div>

<thead>

<th style="padding: 10px; border-bottom: 2px solid #7c3aed; text-align: left; color: #6d28d9; font-weight: 800;">Model</th>

<th style="padding: 10px; border-bottom: 2px solid #7c3aed; text-align: left; font-weight: 800;">Thinking Mode</th>

<th style="padding: 10px; border-bottom: 2px solid #7c3aed; text-align: left; font-weight: 800;">SWE-bench Verified</th>

<th style="padding: 10px; border-bottom: 2px solid #7c3aed; text-align: left; font-weight: 800;">Context</th>

</tr>

</thead>

<tbody>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 900; color: #5b21b6;">Qwopus-3.6-27B-Coder</td>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800; color: #0369a1;">Off / No-thinking</td>

</tr>

<tr><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">OpenAI GPT-5</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">On</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800;">70.1</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">Thinking-on reference</td></tr>

<tr><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">OpenAI GPT-5 mini</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">On</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800;">59.8</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">Thinking-on reference</td></tr>

<tr><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">OpenAI GPT-5 nano</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">On</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800;">34.8</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">Thinking-on reference</td></tr>

<tr><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">GLM-4.7</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">On</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800;">70.6</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">OpenHands reference</td></tr>

<tr><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">GLM-4.5-Air</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">On</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800;">57.6</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">OpenHands reference</td></tr>

<tr><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">Qwen3-Coder-30B-A3B-Instruct (2025-07)</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">Off / No-thinking</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800;">70.3</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">No-thinking reference</td></tr>

<tr><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">Claude 4.0 Opus</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">On</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800;">67.6</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">Thinking-on reference</td></tr>

<tr><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">Claude 4.5 Opus</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">On</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800;">80.9</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">Thinking-on reference</td></tr>

<tr><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">Qwen3.6-27B</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">On</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800;">77.2</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">Thinking-on reference</td></tr>

<tr><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">Qwen3.5-397B-A17B</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">On</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800;">76.2</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">Thinking-on reference</td></tr>

<tr><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">Qwen3.5-27B</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">On</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800;">75.0</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">Thinking-on reference</td></tr>

<tr><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">Qwen3.6-35B-A3B</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">On</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800;">73.4</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">Thinking-on reference</td></tr>

<tr><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 700;">Gemma4-31B</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">On</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: 800;">52.0</td><td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15);">Thinking-on reference</td></tr>

<tr><td style="padding: 10px; font-weight: 700;">Gemma4-26B-A4B</td><td style="padding: 10px;">On</td><td style="padding: 10px; font-weight: 800;">17.4</td><td style="padding: 10px;">Thinking-on reference</td></tr>

</tbody>

</table>

</div>

<span>🎮</span> 3.4 Live Thinking-Disabled Demo: Boat Survival

</div>

<p style="margin: 0 0 12px 0;">Kyle Hessling also tested Qwopus-3.6-27B-Coder in a small interactive game environment with thinking disabled. The demo is a practical smoke test for fast decision-making, instruction adherence, and local responsiveness beyond static benchmark tables.</p>

<a href="https://huggingface.co/spaces/KyleHessling1/Boat-Survival-Thinking-Disabled-Qwopus-3.6-27B-Coder" target="_blank" style="display: block; padding: 12px 14px; border: 1px solid #bae6fd; border-radius: 8px; background: #f0f9ff; color: #0369a1; text-decoration: none; font-weight: 800;">Open the Hugging Face Space</a>

<a href="https://x.com/KyleHessling1/status/2064449362382758354" target="_blank" style="display: block; padding: 12px 14px; border: 1px solid #bbf7d0; border-radius: 8px; background: #f0fdf4; color: #047857; text-decoration: none; font-weight: 800;">View Kyle's reference post</a>

</div>

</a>

</div>

<b>Takeaway:</b> The headline is not that this no-thinking local run beats every thinking-enabled frontier reference. The important result is that a quantized 27B local coder can reach <b>67.0%</b> on the full SWE-bench Verified split while staying fast enough for interactive agent loops. This makes Qwopus-3.6-27B-Coder a practical option for developers who want strong repository-level repair performance without paying the latency cost of long reasoning mode.

</div>

---

🗺️ 4. Training & Data Pipeline Overview

The training process fuses Trace Inversion data augmentation with a Three-Stage Curriculum Learning pipeline. The core engineering focuses on expanding context length gradually while training on reconstructed reasoning traces and real agent trajectories to keep the output format stable.

       [ 🗺️ Trace Inversion: Reconstructing Distillation Workflow ]

  A. Surrogate Model Training (Trace Inverter)
     Open-source Model (GLM-5.1 / DS-V4) ──► Complete Reasoning Chain ──► [ Qwen3-235B Compression ] ──► Reasoning Bubbles
                                              │                                   │
                                              └──────────► [ Training ] ◄─────────┘
                                                   (Base: Qwen3-4B-Instruct)
                                                   (Result: Trace-Inverter-4B)

  B. Inversion Phase: Reconstructing Claude-4.7-Max
     _______________________________________________________
    |                                                       |
    |  Claude-4.7-Max API ──► Compressed Bubbles + Answer   |
    |_______________________________________________________|
                      │
                      ▼
    [ 🧠 Trace-Inverter-4B (Logic Reconstructor) ] ──► Synthetic Deep Reasoning Trace (Learnable CoT)
                      │
                      ▼
    [ 🧩 Data Splicing ] ◄────────── (Original Prompt + Response)
    (Embed reconstructed CoT in <think> tags, splicing with original prompt/response)
                      │
                      ▼
             (Result: claude-opus-4.6/4.7 inverted sets)

  C. Final Coder SFT Curriculum Pipeline
     ___________________________________________
    |                                           |
    |       Base Model (Qwopus3.6-27B-v2)       |
    |___________________________________________|
                      │
                      ▼
    [ 📦 Phase 1: Format Inception ] ──► [ 🛠️ Phase 2: Agent/Coding Expansion ] ──► [ 🚀 Phase 3: Long-Context SFT ]
      ( < 4096 tokens )                     ( 4096 - 8192 tokens )                     ( 8192 - 32K tokens )
      (Stable <think> format)               (Tool traces + coding tasks)               (Long / multi-turn / replay)
                      │                                                                            │
                      └─────────────────────────────┬──────────────────────────────────────────────┘
                                                    ▼
                                   _______________________________________________
                                  |                                               |
                                  |   🌟 Final Model: Qwopus-3.6-27B-Coder        |
                                  |_______________________________________________|

> [!NOTE]

> Due to the complex and diverse format of agent trajectory datasets, rigorous cleaning and format standardization were applied to ensure data quality.

---

📚 5. Three-Stage Curriculum Learning

To steadily scale reasoning quality under long-context inference, Qwopus-3.6-27B-Coder uses a curriculum-style data mixture building on the approach proven in the Qwopus coder line. The model is first stabilized on short, clean reasoning samples, then exposed to complex coding and agent traces, and finally reinforced with longer contexts plus replay data.

<thead>

<th style="padding: 10px; border-bottom: 2px solid #7c3aed; text-align: left; color: #7c3aed; font-size: 14px; width: 25%;">Curriculum Stage</th>

<th style="padding: 10px; border-bottom: 2px solid #7c3aed; text-align: left; font-size: 14px; width: 35%;">Focus & Sample Characteristics</th>

<th style="padding: 10px; border-bottom: 2px solid #7c3aed; text-align: left; font-size: 14px;">Strategy Details</th>

</tr>

</thead>

<tbody>

<tr>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: bold; font-size: 13px; color: #7c3aed;">📦 Stage 1: Format Inception</td>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-size: 13px;">• Limit context within 4,096 tokens<br>• Emphasize stable reasoning templates</td>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-size: 13px;">Focuses on short-to-medium length, cleanly formatted reasoning samples. The primary goal is to establish reliable structured reasoning output, including stable <code><think></code> boundaries, before exposing the model to longer chains.</td>

</tr>

<tr>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: bold; font-size: 13px; color: #7c3aed;">🛠️ Stage 2: Complexity Expansion</td>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-size: 13px;">• Extend length to 4,096 - 8,192 tokens<br>• Introduce higher-difficulty coding and agent samples</td>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-size: 13px;">Gradually increases the ratio of complex reasoning chains, code debugging tasks, and multi-turn tool traces. The model learns to connect reasoning, action selection, and environment feedback.</td>

</tr>

<tr>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-weight: bold; font-size: 13px; color: #7c3aed;">🚀 Stage 3: Long-Context SFT</td>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-size: 13px;">• Progressively scale samples up to 32K tokens<br>• Use short-sample replay</td>

<td style="padding: 10px; border-bottom: 1px solid rgba(128,128,128,0.15); font-size: 13px;">Pushes the model toward long-context and multi-turn reasoning while replaying high-quality short samples to reduce instruction-following drift. The 32K figure describes the fine-tuning sequence/data mixture target, not a hard architectural limit.</td>

</tr>

</tbody>

</table>

---

🎯 6. Recommended Use Cases & Known Limits

<span>✅</span> Good Fits

</div>

Agentic code generation and repository-level debugging, complex tool-call orchestration, structured multi-step reasoning, code review and patch generation, DevOps scripting and automation, and any workflow requiring deep logical reasoning combined with tool execution.

</div>

<span>❌</span> Known Limits

</div>

As a specialized coder model, it has not undergone comprehensive general-domain safety evaluation. Capability decay may occur in non-coding or non-agent tasks. Tool-call behavior depends strongly on prompt format and tool schema consistency. Long-context performance beyond 32K may require RoPE/YaRN scaling.

</div>

> [!CAUTION]

> Deployment note: The model may emit reasoning inside <think> and </think> tags. Front-end applications and agent frameworks should parse or hide these sections where appropriate. For tool calling, ensure the prompt format and system prompt match the training data configuration to activate agent capabilities.

---

⚠️ 7. Training & Deployment Notes

> [!CAUTION]

> Compatibility Notes

> - Tool Calling Format: To activate the model's agent capabilities, ensure the prompt format and system prompt include appropriate tool definitions and match the training data format.

> - Reasoning Output Extraction: The model's thinking process is wrapped in <think> and </think> tags. Front-end applications may need to parse and hide these tags.

> - Long-Context Usage: For contexts beyond 32K, consider enabling RoPE/YaRN scaling (e.g., --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768 in llama.cpp).

---

📋 8. Benchmark Progress

The first completed evaluation is the no-thinking SWE-bench Verified run reported above. Additional local agentic benchmarks remain pending and will be added after testing.

| Benchmark | Status | Result / Reference |

|-----------|--------|-------------------|

| SWE-bench Verified | ✅ Completed | 335/500 = 67.0% (thinking-off, Q5_K_M, RTX 5090 + MTP) |

| BugFind-15 | 📋 Pending | 9B reference: 79 |

| HermesAgent-20 | 📋 Pending | 9B reference: 85 |

| ToolCall-15 | 📋 Pending | 9B reference: 100 |

| InstructFollow-15 | 📋 Pending | 9B reference: 93 |

---

📚 9. Resources & Guides

👉 GitHub Repository: Jackrong-llm-finetuning-guide

Access the repository to dive into the codebase and reproduce our results.

👉 Qwen MTP GGUF Processing Workflow

A custom splitting and merging methodology designed specifically for Qwen series Multi-Token Prediction (MTP) heads.

👉 benchlocal Evaluation Framework

The evaluation framework used to run the local agentic and coding benchmarks.

👉 Qwopus3.6-27B-v2 Model Card

Base model card with full MMLU-Pro, SWE-bench, and throughput benchmarks.

---

🙏 10. Acknowledgements

Special thanks to:

The Qwen team for providing the powerful Qwen3.6-27B base model.
Unsloth for providing the highly efficient fine-tuning framework.
Kyle Hessling for the close collaboration on hardware, training infrastructure, and evaluation support.
Open-source datasets and community contributors, particularly lambda/hermes-agent-reasoning-traces for the high-quality agent trajectory data.

---

📖 11. Citation

@misc{jackrong_qwopus36_27b_coder,
  title        = {Qwopus-3.6-27B-Coder},
  author       = {Jackrong},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Jackrong/Qwopus-3.6-27B-Coder}}
}

Run Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF with guIDE

Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.

Download guIDE → · Browse 524k+ models · Compare models

Source: Hugging Face · Compare models