Model Intelligence Sheet

LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF overview


Tags: gguf, uncensored, qwen3.6, moe, vision, multimodal, genesis, image-text-to-text, conversational, en, zh, multilingual, base_model:HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive, base_model:quantized:HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive, license:apache-2.0, endpoints_compatible, region:us, imatrix

Runs locally with llama.cpp / guIDE. The smallest main-model GGUF in this repository is 16.14 GB; the 857.6 MB file is the mmproj vision projector and cannot be run on its own.

Downloads: 78,344
Likes: 48
Pipeline: image-text-to-text

Repository Files & Downloads

7 GGUF files detected
| File | Type | Quantization | Size |
|------|------|--------------|------|
| Qwen3.6-35B-A3B-Uncensored-Genesis-APEX-Compact.gguf | GGUF | GGUF | 16.14 GB |
| Qwen3.6-35B-A3B-Uncensored-Genesis-APEX.gguf | GGUF | GGUF | 23.87 GB |
| Qwen3.6-35B-A3B-Uncensored-Genesis-MTP-APEX-Compact.gguf | GGUF | GGUF | 16.78 GB |
| Qwen3.6-35B-A3B-Uncensored-Genesis-MTP-APEX.gguf | GGUF | GGUF | 24.63 GB |
| Qwen3.6-35B-A3B-Uncensored-Genesis-MTP-Q8_K_P.gguf | GGUF | Q8_K_P | 41.45 GB |
| Qwen3.6-35B-A3B-Uncensored-Genesis-Q8_K_P.gguf | GGUF | Q8_K_P | 40.61 GB |
| mmproj-Qwen3.6-35B-A3B-Uncensored-Genesis-f16.gguf | GGUF | F16 | 857.6 MB |

Model Details

Model ID: LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF
Author: LuffyTheFox
Pipeline: image-text-to-text
License: apache-2.0
Base model: HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive
Last modified: 2026-06-08T04:31:46.000Z

Model README

---
license: apache-2.0
tags:
  - uncensored
  - qwen3.6
  - moe
  - gguf
  - vision
  - multimodal
  - genesis
language:
  - en
  - zh
  - multilingual
pipeline_tag: image-text-to-text
base_model:
  - HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive
---

> ⚡ https://web.tribute.tg/d/KIH ⚡ If you like this Genesis LLM release you can donate to me via @Tribute bot in Telegram messenger and support future Genesis LLM development.

🌟 Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive -> Genesis-V2

> The key difference from the Wasserstein release and the old Genesis release is data regeneration inside the model, using mathematical statistics over what it has already learned and stored in its tensors. In this version I regenerated even more dead blocks from the data in healthy blocks.

> Join the Discord for updates, roadmaps, projects, or just to chat.

Base model: HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive (0/465 refusals).

Thanks to HauhauCS

Usage

Ready to use. Recommended quant: APEX or MTP-APEX.

On my RTX 3060 12GB, regular chatting gets more tokens per second without MTP.

Tensor drift repair by me. Method: Sig-ScaleSync-Wasserstein

LLMs often have:

  • Saturated weights: the model's activations are stuck, gradients vanish, outputs degrade
  • Scale mismatches: one layer's weights are 10× larger than its peers for no good reason
  • Mean drift: weight distributions shifted positive or negative, breaking symmetry assumptions

My approach fixes all of that without retraining - pure numerical surgery on the raw bytes of the file.

Quantization script available here: https://pastebin.com/hXhcMJn9

Feel free to do your own quants if you want.

Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive: Diagnostic & Repair Summary

| Metric | Value |
|--------|-------|
| Weight tensors analyzed | 500 |
| Healthy (all criteria) | 497 |
| Repaired (C2 – scale misalignment) | 3 |
| Skipped | 233 |

Repair Effectiveness

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| S (saturation error) | 0.0023 | 0.0008 | 63.7% |
| W1 (Wasserstein‑1) | 0.0035 | 0.0008 | 76.2% |

Scale correction factors (α): min = 0.577, mean = 0.602, max = 0.653.

Repaired Tensors

All three are ssm_conv1d.weight layers – recurrent state transition layers responsible for long‑context memory.

| Tensor | α | D (log‑ratio) | W1 before | W1 after |
|--------|---|---------------|-----------|----------|
| blk.36.ssm_conv1d.weight | 0.5765 | 0.553 | 0.0038 | 0.0009 |
| blk.37.ssm_conv1d.weight | 0.5768 | 0.725 | 0.0040 | 0.0009 |
| blk.38.ssm_conv1d.weight | 0.6533 | 0.649 | 0.0026 | 0.0006 |

Interpretation: All three layers were too loud (σ_w exceeded the peer median σ_med by 50–100%). Scale correction restored them to the peer median. W1 dropped by roughly 76–78% for each tensor in the table above, which indicates that the distribution shape was normalized.
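The repair code itself is not published in this README, so the following is only a minimal sketch of what a scale correction toward a peer median could look like. It assumes α = σ_med / σ_w and measures W1 between two equal-sized samples as the mean absolute gap of their sorted values. The synthetic tensors and the names `scale_factor` and `wasserstein1` are hypothetical, not the author's implementation.

```python
import random
import statistics


def scale_factor(sigma_w: float, sigma_med: float) -> float:
    """Assumed correction factor: alpha = sigma_med / sigma_w (not confirmed by the README)."""
    return sigma_med / sigma_w


def wasserstein1(a: list[float], b: list[float]) -> float:
    """W1 between two equal-sized 1-D samples: mean absolute gap of the sorted values."""
    if len(a) != len(b):
        raise ValueError("samples must have the same length")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


rng = random.Random(0)
# Synthetic stand-ins: a healthy peer tensor and one that is about 70% "too loud".
peer = [rng.gauss(0.0, 0.02) for _ in range(4096)]
loud = [rng.gauss(0.0, 0.02 * 1.7) for _ in range(4096)]

alpha = scale_factor(statistics.pstdev(loud), statistics.pstdev(peer))
repaired = [alpha * w for w in loud]  # rescale so the standard deviations match

w1_before = wasserstein1(loud, peer)
w1_after = wasserstein1(repaired, peer)
print(f"alpha={alpha:.3f}  W1 before={w1_before:.4f}  W1 after={w1_after:.4f}")
```

Under this assumption, an α near 0.58 corresponds to a tensor whose standard deviation was about 1.7× the peer median, which falls inside the 50–100% range quoted above.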

---

Verdict: Model is clinically healthy. 497 out of 500 weight tensors passed all four criteria. Three SSM layers repaired successfully. No saturation, no W1 drift, no ReLU asymmetry. Ready for use.

---


Wanna fix your GGUF model?

Contact: luffythefox@mail.ru

My Telegram: @LuffyTheFox

🌟 Recommended Settings (LM Studio)

Set K Cache Quantization Type and V Cache Quantization Type in advanced model loading settings to Q8_0 or F16.

Chat templates: chat_template.jinja and chat_template_thinking.jinja

| Parameter | Value |
|-----------|-------|
| Temperature | 0.7 |
| Top K Sampling | 20 |
| Presence Penalty | 1.5 |
| Repeat Penalty | 1.0 |
| Top P Sampling | 0.8 |
| Min P Sampling | 0 |
| Seed | 42 |

System prompt: System_Prompt.txt

Or use this minimal string as the first line:

> You are Qwen, created by Alibaba Cloud. You are a helpful assistant.

Then add anything you want after.

About

No changes to datasets or capabilities. Fully functional - 100% of what the original authors intended, just without refusals and with the critical architecture bug fixed on output layers.

These are meant to be the best lossless uncensored models out there.

---

Specs

  • 35B total parameters, ~3B active per forward pass (MoE)
  • 256 experts, 8 routed + 1 shared per token
  • Hybrid architecture: Gated DeltaNet linear attention + full softmax attention (3:1 ratio)
  • 40 layers, pattern: 10 × (3 × DeltaNet-MoE + 1 × Attention-MoE)
  • 262K native context (extendable to 1M with YaRN)
  • Natively multimodal (text, image, video)
  • Multi-token prediction (MTP) support
  • 248K vocabulary, 201 languages
  • Base model: HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive
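The layer pattern in the specs can be spelled out with a few lines of Python. This is an illustrative sketch only: it assumes zero-based block indices (as in GGUF tensor names such as `blk.36`) and that each group of four starts with the three DeltaNet layers. The labels `deltanet_moe` and `attention_moe` are made up for the example.

```python
from collections import Counter

NUM_GROUPS = 10
# One group: 3 DeltaNet-MoE layers followed by 1 Attention-MoE layer (assumed order).
GROUP_PATTERN = ["deltanet_moe"] * 3 + ["attention_moe"]

layer_types = GROUP_PATTERN * NUM_GROUPS
counts = Counter(layer_types)


def layer_kind(index: int) -> str:
    """Return the assumed layer type for a zero-based block index."""
    return layer_types[index]


print(len(layer_types), dict(counts))
print([layer_kind(i) for i in (36, 37, 38, 39)])
```

With these assumptions, blocks 36 to 38 come out as DeltaNet layers and block 39 as full attention, which is consistent with the three repaired tensors being `ssm_conv1d.weight` in `blk.36` to `blk.38`.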

---

Recommended Settings (Official Qwen Authors)

Thinking mode (default):

  • General: temperature=1.0, top_p=0.95, top_k=20, min_p=0, presence_penalty=1.5
  • Coding/precise tasks: temperature=0.6, top_p=0.95, top_k=20, min_p=0, presence_penalty=0

Non-thinking mode:

  • General: temperature=0.7, top_p=0.8, top_k=20, min_p=0, presence_penalty=1.5
  • Reasoning tasks: temperature=1.0, top_p=1.0, top_k=40, min_p=0, presence_penalty=2.0
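For scripting, the four presets above can be kept in a small lookup table. The sketch below copies the values from the lists as given. The key names (`"thinking"`, `"coding"`, and so on) and the helper `get_preset` are hypothetical, and the field names a given runtime accepts may differ.

```python
# Sampling presets transcribed from the recommendations above.
SAMPLING_PRESETS = {
    ("thinking", "general"): {
        "temperature": 1.0, "top_p": 0.95, "top_k": 20, "min_p": 0.0, "presence_penalty": 1.5,
    },
    ("thinking", "coding"): {
        "temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0, "presence_penalty": 0.0,
    },
    ("non-thinking", "general"): {
        "temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0, "presence_penalty": 1.5,
    },
    ("non-thinking", "reasoning"): {
        "temperature": 1.0, "top_p": 1.0, "top_k": 40, "min_p": 0.0, "presence_penalty": 2.0,
    },
}


def get_preset(mode: str, task: str) -> dict:
    """Return a copy of the preset for (mode, task); raise ValueError if unknown."""
    try:
        return dict(SAMPLING_PRESETS[(mode, task)])
    except KeyError:
        raise ValueError(f"no preset for mode={mode!r}, task={task!r}") from None


preset = get_preset("non-thinking", "general")
print(preset)
```

The non-thinking general preset matches the LM Studio table earlier in this README (temperature 0.7, top K 20, top P 0.8, presence penalty 1.5).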

Important:

  • Keep at least 128K context to preserve thinking capabilities
  • Use --jinja flag with llama.cpp for proper chat template handling
  • Vision support requires the mmproj file alongside the main GGUF
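As a rough illustration of the three points above, the following sketch assembles a `llama-server` command line with a 128K context, the `--jinja` flag, and the mmproj file. The helper `build_llama_server_cmd` is hypothetical, the file paths are placeholders for wherever the GGUF files were downloaded, and flag availability depends on the llama.cpp build in use.

```python
import shlex
from typing import List, Optional


def build_llama_server_cmd(
    model_path: str,
    mmproj_path: Optional[str] = None,
    ctx_size: int = 131072,  # 128K tokens, the minimum suggested above
) -> List[str]:
    """Build an argv list for llama-server; nothing is executed here."""
    cmd = ["llama-server", "-m", model_path, "--ctx-size", str(ctx_size), "--jinja"]
    if mmproj_path is not None:
        # The vision projector is passed separately from the main GGUF.
        cmd += ["--mmproj", mmproj_path]
    return cmd


cmd = build_llama_server_cmd(
    "Qwen3.6-35B-A3B-Uncensored-Genesis-APEX.gguf",
    mmproj_path="mmproj-Qwen3.6-35B-A3B-Uncensored-Genesis-f16.gguf",
)
print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues.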

---

Compatibility

Works with llama.cpp, LM Studio, koboldcpp, and other GGUF-compatible runtimes.

Source: Hugging Face