cahlen/qwen3.5-35b-a3b-uncensored-hauhaucs-aggressive-gguf (IQ2_XXS GGUF) is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. Use this page to pick a local model for guIDE or other runtimes.
Model Intelligence Sheet
cahlen/qwen3.5-35b-a3b-uncensored-hauhaucs-aggressive-gguf overview
Full llama.cpp quantization ladder for HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive. K-quants from Q8_0 through Q4 use standard llama-quantize without an importance matrix. Low-bit Q3_K_L / Q3_K_M / Q3_K_S, Q2_K, and all IQ* types use WikiText-2 importance-matrix calibration (200 chunks) when this workspace contains imatrix.dat.
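The rule in the overview (which quant types get an importance matrix) can be sketched in Python. This is a minimal illustration, not the repository author's build script: the helper names `needs_imatrix` and `quantize_command` and the file paths passed to them are hypothetical. Only the `llama-quantize` tool, its `--imatrix` option, and the `imatrix.dat` filename come from llama.cpp and the description above.

```python
from typing import List

def needs_imatrix(quant: str) -> bool:
    """True for the types the overview lists as imatrix-calibrated:
    Q3_K_*, Q2_K, and every IQ* type. Everything else is treated as a
    standard quant (assumption for types the overview does not mention)."""
    q = quant.upper()
    return q.startswith("IQ") or q.startswith(("Q3_K", "Q2_K"))

def quantize_command(src: str, dst: str, quant: str,
                     imatrix_available: bool) -> List[str]:
    """Build an argument list for llama-quantize (hypothetical helper).
    The imatrix is only passed when the type calls for it AND the file
    exists, mirroring "when this workspace contains imatrix.dat"."""
    cmd = ["llama-quantize"]
    if needs_imatrix(quant) and imatrix_available:
        cmd += ["--imatrix", "imatrix.dat"]
    cmd += [src, dst, quant]
    return cmd

if __name__ == "__main__":
    # Placeholder paths; substitute your own BF16 source and output names.
    print(" ".join(quantize_command("model-bf16.gguf", "model-IQ2_XXS.gguf",
                                    "IQ2_XXS", imatrix_available=True)))
```

The function returns a list rather than a shell string so it can be handed to `subprocess.run` without quoting issues.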
| Field | Value |
|---|---|
| Downloads | 5,989 |
| Likes | 1 |
| Pipeline | (not set) |
| Library | (not set) |
| Visibility | Public |
| Access | Open |
Repository Files & Downloads
11 files detected
Direct downloads for all repository files
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-IQ1_M.gguf | GGUF | IQ1_M | 7.67 GB | Download |
| Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-IQ2_S.gguf | GGUF | IQ2_S | 9.92 GB | Download |
| Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-IQ2_XXS.gguf | GGUF | IQ2_XXS | 8.85 GB | Download |
| Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-IQ3_S.gguf | GGUF | IQ3_S | 14.20 GB | Download |
| Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-IQ3_XXS.gguf | GGUF | IQ3_XXS | 12.69 GB | Download |
| Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-Q2_K.gguf | GGUF | Q2_K | 12.05 GB | Download |
| Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-Q3_K_L.gguf | GGUF | Q3_K_L | 16.87 GB | Download |
| Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-Q3_K_S.gguf | GGUF | Q3_K_S | 14.14 GB | Download |
| Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_S.gguf | GGUF | Q4_K_S | 18.52 GB | Download |
| Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-Q5_K_S.gguf | GGUF | Q5_K_S | 22.33 GB | Download |
| mmproj-Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-f16.gguf | GGUF | F16 | 857.62 MB | Download |
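To narrow down the table, the following Python sketch picks the largest listed quant whose file fits a given size budget. The sizes are copied from the table above; the function name `largest_quant_within` and the decimal unit conversion for the mmproj file are assumptions. File size is only a lower bound on memory use, since context length and the loader add overhead on top.

```python
from typing import Optional

# File sizes in GB, copied from the table above (mmproj listed separately).
QUANT_SIZES_GB = {
    "IQ1_M": 7.67, "IQ2_XXS": 8.85, "IQ2_S": 9.92, "Q2_K": 12.05,
    "IQ3_XXS": 12.69, "Q3_K_S": 14.14, "IQ3_S": 14.20, "Q3_K_L": 16.87,
    "Q4_K_S": 18.52, "Q5_K_S": 22.33,
}
MMPROJ_GB = 857.62 / 1000  # assumes decimal units (1 GB = 1000 MB)

def largest_quant_within(budget_gb: float,
                         with_vision: bool = False) -> Optional[str]:
    """Return the largest quant whose file (plus the mmproj file when
    with_vision is set) fits in budget_gb, or None if nothing fits."""
    extra = MMPROJ_GB if with_vision else 0.0
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s + extra <= budget_gb}
    return max(fitting, key=fitting.get) if fitting else None

if __name__ == "__main__":
    for budget in (8.0, 10.0, 16.0, 24.0):
        print(budget, largest_quant_within(budget))
```

Larger files are preferred here on the assumption that they lose less quality than smaller ones; leave headroom in the budget for the KV cache.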
Model Details
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"base_model": "HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive",
"tags": [
"gguf",
"quantized",
"llama-cpp",
"qwen",
"qwen3.5",
"moe",
"vision",
"multimodal",
"uncensored"
],
"license": "apache-2.0",
"language": [
"en",
"zh"
],
"frontmatter": {
"base_model": "HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive",
"tags": [
"gguf",
"quantized",
"llama-cpp",
"qwen",
"qwen3.5",
"moe",
"vision",
"multimodal",
"uncensored"
],
"license": "apache-2.0",
"language": [
"en",
"zh"
]
},
"hero_image_url": "",
"summary": "Full **llama.cpp** quantization ladder for HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive. **K-quants** from Q8_0 through Q4 use standard llama-quantize without an importance matrix. Low-bit Q3_K_L / Q3_K_M / Q3_K_S, Q2_K, and all IQ* types use **WikiText-2** importance-matrix calibration (200 chunks) when this workspace contains imatrix.dat.",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nbase_model: HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive\ntags:\n - gguf\n - quantized\n - llama-cpp\n - qwen\n - qwen3.5\n - moe\n - vision\n - multimodal\n - uncensored\nlicense: apache-2.0\nlanguage:\n - en\n - zh\n---\n\n# Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-GGUF\n\nFull **llama.cpp** quantization ladder for [`HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive`](https://huggingface.co/HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive). **K-quants** from Q8_0 through Q4 use standard `llama-quantize` without an importance matrix. Low-bit `Q3_K_L` / `Q3_K_M` / `Q3_K_S`, `Q2_K`, and all `IQ*` types use **WikiText-2** importance-matrix calibration (200 chunks) when this workspace contains `imatrix.dat`.\n\n## About the Source Model\n\nThis repo is a **GGUF quantization ladder** for [`HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive`](https://huggingface.co/HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive): an **Aggressive** uncensored build based on [`Qwen/Qwen3.5-35B-A3B`](https://huggingface.co/Qwen/Qwen3.5-35B-A3B) (MoE, multimodal, long context). Low-bit K-quants (Q3_K*, Q2_K) and IQ-types use an importance matrix when `imatrix.dat` was produced in this run—same spirit as [our compacted Qwen3.5 GGUF ladder](https://huggingface.co/cahlen/qwen3.5-35b-a3b-compacted-GGUF).\n\nFor refusal behavior, recommended sampling settings, and **mmproj** vision tensors, follow the [HauhauCS model card](https://huggingface.co/HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive) and Qwen docs. 
**Note:** LM Studio may show `256×2.6B` in the params column; HauhauCS reports this is a cosmetic metadata quirk.\n\n### Complementary files (read this if a quant is missing here)\n\nThe [HauhauCS weight index](https://huggingface.co/HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive) already hosts **BF16**, **Q8_0** through **Q6_K**, several **Q4/Q5** variants, **IQ4_XS**, **IQ3_M**, **IQ2_M**, **Q3_K_M**, etc. This **cahlen** companion repo (HF names end with **-GGUF**) is **disk-aware**: it adds the *extra* ladder rungs we use on constrained hardware (e.g. **Q5_K_S**, **Q4_K_S**, **Q3_K_L** / **Q3_K_S**, **Q2_K**, **IQ3_S**, **IQ3_XXS**, **IQ2_S**, **IQ2_XXS**, **IQ1_M**) with the same **WikiText-2 / 200-chunk** imatrix workflow as [cahlen/qwen3.5-35b-a3b-compacted-GGUF](https://huggingface.co/cahlen/qwen3.5-35b-a3b-compacted-GGUF). Pull from HauhauCS if you need a size we do not mirror here.\n\n## Available Quantizations\n\n| Filename | Quant | Size | Notes |\n|----------|-------|------|-------|\n| Q5_K_S | Q5_K_S | 23G | K-quant |\n| Q4_K_S | Q4_K_S | 19G | K-quant |\n| Q3_K_L | Q3_K_L | 17G | imatrix |\n| Q3_K_S | Q3_K_S | 15G | imatrix |\n| IQ3_S | IQ3_S | 15G | imatrix |\n| IQ3_XXS | IQ3_XXS | 13G | imatrix |\n| Q2_K | Q2_K | 13G | imatrix |\n| IQ2_S | IQ2_S | 10G | imatrix |\n| IQ2_XXS | IQ2_XXS | 8.9G | imatrix |\n| IQ1_M | IQ1_M | 7.7G | imatrix |\n| mmproj-...-f16.gguf | mmproj (vision) | 858M | Pair with any quant above |\n\nAll filenames are prefixed with `Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-`. The BF16 baseline (65G) was used locally for quantization but is **not** uploaded to save space; grab it from the [HauhauCS source repo](https://huggingface.co/HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive) if needed. \"imatrix\" rows used WikiText-2 importance-matrix calibration (200 chunks).\n\n\n\n## Quality (WikiText-2 Perplexity)\n\nLower is better. 
First row is the unquantized baseline.\n\n| Quant | Size | Perplexity | vs Baseline |\n|-------|------|-----------|-------------|\n| BF16 (baseline) | 65G | 6.4393 | — |\n| Q5_K_S | 23G | 6.4871 | +0.7% |\n| Q4_K_S | 19G | 6.6214 | +2.8% |\n| Q3_K_L | 17G | 6.7204 | +4.4% |\n| IQ3_S | 15G | 6.7631 | +5.0% |\n| Q3_K_S | 15G | 6.9724 | +8.3% |\n| IQ3_XXS | 13G | 7.0490 | +9.5% |\n| Q2_K | 13G | 7.4896 | +16.3% |\n| IQ2_S | 10G | 8.1019 | +25.8% |\n| IQ2_XXS | 8.9G | 9.0738 | +40.9% |\n| IQ1_M | 7.7G | 11.1425 | +73.0% |\n\nMeasured with `llama-perplexity` on the WikiText-2 test set (580 chunks, context 512). BF16 baseline evaluated on CPU; quantized variants on NVIDIA RTX 5090.\n\n## How to Use\n\n### With llama.cpp (text)\n```bash\nllama-cli -m Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_S.gguf --jinja -c 131072 -ngl 99 -p \"Hello\"\n```\n\n### With llama.cpp (vision)\n```bash\nllama-cli -m Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_S.gguf \\\n --mmproj mmproj-Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-f16.gguf \\\n --jinja -c 131072 -ngl 99\n```\n\n### With llama-server\n```bash\nllama-server -m Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_S.gguf --jinja -c 131072 -ngl 99\n```\n\n### With Ollama\n```bash\nollama run hf.co/cahlen/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-GGUF:Q4_K_S\n```\n\n### With LM Studio\nDownload any GGUF from the table and load it.\n\n## Choosing a Quant\n\nRough **disk size / VRAM** guidance (actual usage varies by context length and loader). 
Quants marked ★ are in **this repo**; others are on the [HauhauCS source repo](https://huggingface.co/HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive).\n\n| Your VRAM | Try | Size |\n|-----------|-----|------|\n| 24GB+ | Q8_0 or Q6_K (HauhauCS) | largest |\n| 16GB | ★ Q5_K_S / ★ Q4_K_S | 19–23G |\n| 12GB | ★ Q3_K_L / ★ IQ3_S | 15–17G |\n| 8GB | ★ IQ3_XXS / ★ Q2_K | 13G |\n| 6GB | ★ IQ2_S / ★ IQ2_XXS | 8.9–10G |\n\n## Quantization Details\n\n- **Quantized by**: [cahlen](https://huggingface.co/cahlen)\n- **Importance matrix**: WikiText-2 (`wikitext-2-raw-v1`, 200 chunks), when generated for this run\n- **Tool**: [llama.cpp](https://github.com/ggml-org/llama.cpp) @ `59d840209`\n- **Hardware**: NVIDIA RTX 5090 32GB / Intel Core Ultra 9 285K / 188GB RAM\n",
"related_quantizations": []
},
"tags": [
"gguf",
"quantized",
"llama-cpp",
"qwen",
"qwen3.5",
"moe",
"vision",
"multimodal",
"uncensored",
"en",
"zh",
"base_model:HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive",
"base_model:quantized:HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive",
"license:apache-2.0",
"endpoints_compatible",
"region:us",
"imatrix",
"conversational"
],
"likes": 1,
"downloads": 5989,
"gated": false,
"private": false,
"last_modified": "2026-04-03T18:30:51.000Z",
"created_at": "2026-04-02T22:39:59.000Z",
"pipeline_tag": "",
"library_name": ""
}
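The "vs Baseline" column of the perplexity table embedded in `readme_markdown` can be reproduced from the raw perplexity values. The sketch below assumes the column is the relative increase over the BF16 row, rounded to one decimal place; the function name `pct_vs_baseline` is hypothetical, and the numbers are copied from the README table.

```python
BASELINE_PPL = 6.4393  # BF16 (baseline) row of the README table

# WikiText-2 perplexities copied from the README table.
PERPLEXITY = {
    "Q5_K_S": 6.4871, "Q4_K_S": 6.6214, "Q3_K_L": 6.7204, "IQ3_S": 6.7631,
    "Q3_K_S": 6.9724, "IQ3_XXS": 7.0490, "Q2_K": 7.4896, "IQ2_S": 8.1019,
    "IQ2_XXS": 9.0738, "IQ1_M": 11.1425,
}

def pct_vs_baseline(ppl: float, baseline: float = BASELINE_PPL) -> float:
    """Relative perplexity increase over the baseline, in percent."""
    return round((ppl / baseline - 1.0) * 100.0, 1)

if __name__ == "__main__":
    for quant, ppl in PERPLEXITY.items():
        print(f"{quant:8s} {ppl:8.4f}  +{pct_vs_baseline(ppl)}%")
```

Under that assumption the computed values match the published column, for example +0.7% for Q5_K_S and +40.9% for IQ2_XXS.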
Source payload excerpt (from Hugging Face API)
{
  "_id": "69cef03f9c435d0ffaf5c73f",
  "id": "cahlen/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-GGUF",
  "modelId": "cahlen/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-GGUF",
  "sha": "01b3d25eae945154168f7f0c821571d68d783345",
  "createdAt": "2026-04-02T22:39:59.000Z",
  "lastModified": "2026-04-03T18:30:51.000Z",
  "author": "cahlen",
  "downloads": 5989,
  "likes": 1,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 13
}
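The timestamps in the payload are ISO 8601 strings with a trailing `Z` (UTC). As a small illustration of reading them, the Python sketch below parses `createdAt` and `lastModified` and computes how long after creation the repository was last modified. The function name `parse_hf_timestamp` is hypothetical, and the format string is an assumption based on the two values shown above.

```python
from datetime import datetime, timezone

# Values copied from the payload excerpt above.
payload = {
    "createdAt": "2026-04-02T22:39:59.000Z",
    "lastModified": "2026-04-03T18:30:51.000Z",
}

def parse_hf_timestamp(value: str) -> datetime:
    """Parse a 'YYYY-MM-DDTHH:MM:SS.fffZ' string into an aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)

created = parse_hf_timestamp(payload["createdAt"])
modified = parse_hf_timestamp(payload["lastModified"])
elapsed = modified - created

if __name__ == "__main__":
    print(f"Last modified {elapsed} after creation")
```

`strptime` is used instead of `datetime.fromisoformat` because older Python versions do not accept the `Z` suffix in `fromisoformat`.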