LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF overview
⚡ https://web.tribute.tg/d/KIH https://web.tribute.tg/d/KIH ⚡ If you like this Genesis LLM release you can donate https://web.tribute.tg/d/KIH to me via @Tribu…
Runs locally from ~857.6 MB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).
Repository Files & Downloads
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| Qwen3.6-35B-A3B-Uncensored-Genesis-APEX-Compact.gguf | GGUF | GGUF | 16.14 GB | Download |
| Qwen3.6-35B-A3B-Uncensored-Genesis-APEX.gguf | GGUF | GGUF | 23.87 GB | Download |
| Qwen3.6-35B-A3B-Uncensored-Genesis-MTP-APEX-Compact.gguf | GGUF | GGUF | 16.78 GB | Download |
| Qwen3.6-35B-A3B-Uncensored-Genesis-MTP-APEX.gguf | GGUF | GGUF | 24.63 GB | Download |
| Qwen3.6-35B-A3B-Uncensored-Genesis-MTP-Q8_K_P.gguf | GGUF | Q8_K_P | 41.45 GB | Download |
| Qwen3.6-35B-A3B-Uncensored-Genesis-Q8_K_P.gguf | GGUF | Q8_K_P | 40.61 GB | Download |
| mmproj-Qwen3.6-35B-A3B-Uncensored-Genesis-f16.gguf | GGUF | F16 | 857.6 MB | Download |
Model Details
| Model ID | LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF |
|---|---|
| Author | LuffyTheFox |
| Pipeline | image-text-to-text |
| License | apache-2.0 |
| Base model | HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive |
| Last modified | 2026-06-08T04:31:46.000Z |
Model README
---
license: apache-2.0
tags:
- uncensored
- qwen3.6
- moe
- gguf
- vision
- multimodal
- genesis
language:
- en
- zh
- multilingual
pipeline_tag: image-text-to-text
base_model:
- HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive
---
> ⚡ https://web.tribute.tg/d/KIH ⚡ If you like this Genesis LLM release you can donate to me via @Tribute bot in Telegram messenger and support future Genesis LLM development.
🌟 Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive -> Genesis-V2
> Key difference from Wasserstein release and old Genesis release is data regeneration in model via mathematical statistics based on what it's already learned and stored in tensors. I regenerated even more dead blocks from data in healthy blocks in this version.
> Join the Discord for updates, roadmaps, projects, or just to chat.
Base model. HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive- 0/465 refusals.
Thanks to HauhauCS
Usage
Ready to use. Recommended quant: APEX or MTP-APEX
On my RTX 3060 12GB and regular chatting, I have more tokens per second without MTP.
Tensor drift repair by me. Method: Sig-ScaleSync-Wasserstein
LLM models often have:
- Saturated weights: the model's activations are stuck, gradients vanish, outputs degrade
- Scale mismatches: one layer's weights are 10× larger than its peers for no good reason
- Mean drift: weight distributions shifted positive or negative, breaking symmetry assumptions
My approach fixes all of that without retraining - pure numerical surgery on the raw bytes of the file.
Quantization script available here: https://pastebin.com/hXhcMJn9
Feel free to do your own quants if you want.
Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive: Diagnostic & Repair Summary
| Metric | Value |
|--------|-------|
| Weight tensors analyzed | 500 |
| Healthy (all criteria) | 497 |
| Repaired (C2 – scale misalignment) | 3 |
| Skipped | 233 |
Repair Effectiveness
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| S (saturation error) | 0.0023 | 0.0008 | 63.7% |
| W1 (Wasserstein‑1) | 0.0035 | 0.0008 | 76.2% |
Scale correction factors (α): min = 0.577, mean = 0.602, max = 0.653.
Repaired Tensors
All three are ssm_conv1d.weight layers – recurrent state transition layers responsible for long‑context memory.
| Tensor | α | D (log‑ratio) | W1 before | W1 after |
|--------|---|---------------|-----------|----------|
| blk.36.ssm_conv1d.weight | 0.5765 | 0.553 | 0.0038 | 0.0009 |
| blk.37.ssm_conv1d.weight | 0.5768 | 0.725 | 0.0040 | 0.0009 |
| blk.38.ssm_conv1d.weight | 0.6533 | 0.649 | 0.0026 | 0.0006 |
Interpretation: All three layers were too loud (σ_w > σ_med by 50–100%). Scale correction restored them to peer median. W1 dropped by ≈80%, confirming distribution shape normalized.
---
Verdict: Model is clinically healthy. 497 out of 500 weight tensors passed all four criteria. Three SSM layers repaired successfully. No saturation, no W1 drift, no ReLU asymmetry. Ready for use.
---
Links:
---
Wanna fix your GGUF model?
Contact: luffythefox@mail.ru
My Telegram: @LuffyTheFox
🌟 Recommended Settings (LM Studio)
Set K Cache Quantization Type and V Cache Quantization Type in advanced model loading settings to Q8_0 or F16.
Chat template: chat_template.jinja
Chat template: chat_template_thinking.jinja
| Parameter | Value |
|-----------|-------|
| Temperature | 0.7 |
| Top K Sampling | 20 |
| Presence Penalty| 1.5 |
| Repeat Penalty| 1.0 |
| Top P Sampling | 0.8 |
| Min P Sampling | 0 |
| Seed | 42 |
System prompt: System_Prompt.txt
Or use this minimal string as the first line:
> You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
Then add anything you want after.
About
No changes to datasets or capabilities. Fully functional - 100% of what the original authors intended, just without refusals and with the critical architecture bug fixed on output layers.
These are meant to be the best lossless uncensored models out there.
---
Specs
- 35B total parameters, ~3B active per forward pass (MoE)
- 256 experts, 8 routed + 1 shared per token
- Hybrid architecture: Gated DeltaNet linear attention + full softmax attention (3:1 ratio)
- 40 layers, pattern: 10 × (3 × DeltaNet-MoE + 1 × Attention-MoE)
- 262K native context (extendable to 1M with YaRN)
- Natively multimodal (text, image, video)
- Multi-token prediction (MTP) support
- 248K vocabulary, 201 languages
- Base model. HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive
---
Recommended Settings (Official Qwen Authors)
Thinking mode (default):
- General:
temperature=1.0, top_p=0.95, top_k=20, min_p=0, presence_penalty=1.5 - Coding/precise tasks:
temperature=0.6, top_p=0.95, top_k=20, min_p=0, presence_penalty=0
Non-thinking mode:
- General:
temperature=0.7, top_p=0.8, top_k=20, min_p=0, presence_penalty=1.5 - Reasoning tasks:
temperature=1.0, top_p=1.0, top_k=40, min_p=0, presence_penalty=2.0
Important:
- Keep at least 128K context to preserve thinking capabilities
- Use
--jinjaflag with llama.cpp for proper chat template handling - Vision support requires the
mmprojfile alongside the main GGUF
---
Compatibility
Works with llama.cpp, LM Studio, koboldcpp, and other GGUF-compatible runtimes.
Run LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-V2-APEX-MTP-GGUF with guIDE
Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.
Source: Hugging Face · Compare models