keinniemand/qwen3.5-122b-a10b-abliterix-ik_gguf (IQ5_KS GGUF) is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes; these particular files need an ik_llama.cpp backend, as the compatibility warning in the overview explains. See related models in the same shard below.

Model Intelligence Sheet

keinniemand/qwen3.5-122b-a10b-abliterix-ik_gguf overview

🚨 CRITICAL COMPATIBILITY WARNING 🚨 These are iqk-format quantizations and are EXCLUSIVE to the ik_llama.cpp fork. They will NOT work on mainline llama.cpp, standard LM Studio, standard Text Generation WebUI, or KoboldCPP. You must compile and run them using ikawrakow's llama.cpp fork (or a UI where you have manually swapped the backend to an ik_llama build).

This repository contains custom, mixed-precision ik_llama.cpp GGUF quantizations for wangzhang/Qwen3.5-122B-A10B-abliterix, an abliterated version of Qwen/Qwen3.5-122B-A10B. These quants use different precision levels for different layer types, keeping attention and shared expert layers at high precision while compressing the routed experts (which make up the bulk of the model's size) to various IQK quantization levels.
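To make the mixed-precision idea concrete, here is a minimal sketch that stores three of the recipes as plain data and reports which layer groups differ between two variants. The quant assignments are copied from the README embedded in the Metadata Inspector section; the `RECIPES` dict and the `diff_recipes` helper are hypothetical names used only for illustration, not part of any tool.

```python
# Layer-group -> quant type, copied from the README's recipe tables.
RECIPES = {
    "IQ4_KS": {
        "Token Embeddings & Output": "IQ6_K",
        "Attention / Delta Net": "Q8_0",
        "SSM Alpha & Beta": "Q8_0",
        "Shared Experts": "Q8_0",
        "Routed Experts": "IQ4_KS",
    },
    "IQ5_KS": {
        "Token Embeddings & Output": "Q8_0",
        "Attention / Delta Net": "Q8_0",
        "SSM Alpha & Beta": "F32",
        "Shared Experts": "Q8_0",
        "Routed Experts": "IQ5_KS",
    },
    "IQ5_K": {
        "Token Embeddings & Output": "Q8_0",
        "Attention / Delta Net": "Q8_0",
        "SSM Alpha & Beta": "F32",
        "Shared Experts": "Q8_0",
        "Routed Experts": "IQ5_K",
    },
}


def diff_recipes(a, b):
    """Return {layer_group: (quant_in_a, quant_in_b)} for groups that differ."""
    ra, rb = RECIPES[a], RECIPES[b]
    return {group: (ra[group], rb[group]) for group in ra if ra[group] != rb[group]}


print(diff_recipes("IQ5_KS", "IQ5_K"))
# {'Routed Experts': ('IQ5_KS', 'IQ5_K')}
```

Comparing `IQ4_KS` with `IQ5_KS` the same way shows three differing groups (embeddings and output, SSM alpha/beta, routed experts), which matches the README's description of the 5-bit variants.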

gguf, quantization, iq4_ks, iq4_k, iq4_kss, iq5_k, iq5_ks, iq6_k, iq2_kl, ik_llama.cpp, qwen, qwen3_5_moe, abliterated, text-generation, base_model:wangzhang/Qwen3.5-122B-A10B-abliterix, base_model:quantized:wangzhang/Qwen3.5-122B-A10B-abliterix, endpoints_compatible, region:us, imatrix, conversational

Downloads

2,710

Likes

1

Pipeline

text-generation

Library

—

Visibility

Public

Access

Open

Repository Files & Downloads

7 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
Qwen3.5-122B-A10B-abliterix-IQ2_KL.gguf	GGUF	IQ2_KL	43.33 GB	Download
Qwen3.5-122B-A10B-abliterix-IQ4_K.gguf	GGUF	IQ4_K	66.95 GB	Download
Qwen3.5-122B-A10B-abliterix-IQ4_KS.gguf	GGUF	IQ4_KS	63.48 GB	Download
Qwen3.5-122B-A10B-abliterix-IQ4_KSS.gguf	GGUF	IQ4_KSS	61.23 GB	Download
Qwen3.5-122B-A10B-abliterix-IQ5_K.gguf	GGUF	IQ5_K	80.49 GB	Download
Qwen3.5-122B-A10B-abliterix-IQ5_KS.gguf	GGUF	IQ5_KS	77.35 GB	Download
Qwen3.5-122B-A10B-abliterix-IQ6_K.gguf	GGUF	IQ6_K	95.68 GB	Download
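As a rough aid for choosing among these files, the sketch below picks the largest quant whose file fits a given memory budget. The sizes are copied from the table; the `pick_quant` helper and the default headroom are hypothetical, and file size is only a lower bound on real memory use, since context and KV cache need extra room.

```python
# File sizes in GB, copied from the download table.
QUANT_SIZES_GB = {
    "IQ2_KL": 43.33,
    "IQ4_KSS": 61.23,
    "IQ4_KS": 63.48,
    "IQ4_K": 66.95,
    "IQ5_KS": 77.35,
    "IQ5_K": 80.49,
    "IQ6_K": 95.68,
}


def pick_quant(budget_gb, headroom_gb=8.0):
    """Return the largest quant that fits in (budget - headroom), or None.

    headroom_gb is an assumed allowance for KV cache and runtime buffers,
    not a measured figure; tune it for your context length.
    """
    usable = budget_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= usable}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


print(pick_quant(96))  # 88 GB usable -> IQ5_K
print(pick_quant(48))  # 40 GB usable -> None, even IQ2_KL is too large
```

The README states that these quants were not tested for perplexity, so a larger file is not guaranteed to be better; treat the result as a starting point.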

Model Details

Model Slug

keinniemand/qwen3.5-122b-a10b-abliterix-ik_gguf

Author

KeinNiemand

Pipeline Task

text-generation

Library

—

Created

2026-04-14

Last Modified

2026-04-14

Gated

No

Private

No

HF SHA

ac07c2738cd9d56440df329f46599f58a6450c33

License

Unknown

Language

Unknown

Base Model

wangzhang/Qwen3.5-122B-A10B-abliterix

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "base_model": "wangzhang/Qwen3.5-122B-A10B-abliterix",
    "tags": [
      "gguf",
      "quantization",
      "iq4_ks",
      "iq4_k",
      "iq4_kss",
      "iq5_k",
      "iq5_ks",
      "iq6_k",
      "iq2_kl",
      "ik_llama.cpp",
      "qwen",
      "qwen3_5_moe",
      "abliterated"
    ],
    "pipeline_tag": "text-generation",
    "frontmatter": {},
    "hero_image_url": "",
    "summary": "🚨 **CRITICAL COMPATIBILITY WARNING** 🚨 **These are iqk format quantizations and are EXCLUSIVE to the ik_llama.cpp fork.** They will **NOT** work on mainline llama.cpp, standard LM Studio, standard Text Generation WebUI, or KoboldCPP. You *must* compile and run this using ikawrakow's llama.cpp fork (or a UI where you have manually swapped the backend to an ik_llama build). --- This repository contains custom, mixed-precision ik_llama.cpp GGUF quantizations for wangzhang/Qwen3.5-122B-A10B-abliterix, an abliterated version of Qwen/Qwen3.5-122B-A10B. These quants use different precision levels for different layer types, keeping attention and shared expert layers at high precision while compressing the routed experts (which make up the bulk of the model's size) to various IQK quantization levels.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\r\nbase_model: wangzhang/Qwen3.5-122B-A10B-abliterix\r\ntags:\r\n- gguf\r\n- quantization\r\n- iq4_ks\r\n- iq4_k\r\n- iq4_kss\r\n- iq5_k\r\n- iq5_ks\r\n- iq6_k\r\n- iq2_kl\r\n- ik_llama.cpp\r\n- qwen\r\n- qwen3_5_moe\r\n- abliterated\r\npipeline_tag: text-generation\r\n---\r\n\r\n# Qwen3.5 122B A10B Abliterix - Custom GGUF Quantizations\r\n\r\n🚨 **CRITICAL COMPATIBILITY WARNING** 🚨\r\n**These are `iqk` format quantizations and are EXCLUSIVE to the `ik_llama.cpp` fork.** They will **NOT** work on mainline `llama.cpp`, standard LM Studio, standard Text Generation WebUI, or KoboldCPP. You *must* compile and run this using [ikawrakow's llama.cpp fork](https://github.com/ikawrakow/ik_llama.cpp) (or a UI where you have manually swapped the backend to an `ik_llama` build).\r\n\r\n---\r\n\r\nThis repository contains custom, mixed-precision `ik_llama.cpp` GGUF quantizations for [wangzhang/Qwen3.5-122B-A10B-abliterix](https://huggingface.co/wangzhang/Qwen3.5-122B-A10B-abliterix), an abliterated version of [Qwen/Qwen3.5-122B-A10B](https://huggingface.co/Qwen/Qwen3.5-122B-A10B).\r\n\r\nThese quants use different precision levels for different layer types, keeping attention and shared expert layers at high precision while compressing the routed experts (which make up the bulk of the model's size) to various IQK quantization levels.\r\n\r\n## ⚠️ Disclaimer: The \"Vibes Test\"\r\n**These quantizations have NOT been formally tested for perplexity.** They were compiled as an experiment to see how the model handles shifting bottlenecks. There is no guarantee that they are mathematically optimal or perform flawlessly. They are provided entirely as-is. If they pass the vibes test for you, enjoy!\r\n\r\n## 🙏 Credits & Acknowledgments\r\n- **Base model:** [wangzhang/Qwen3.5-122B-A10B-abliterix](https://huggingface.co/wangzhang/Qwen3.5-122B-A10B-abliterix)\r\n- **imatrix source:** The imatrix was sourced from [mradermacher/Qwen3.5-122B-A10B-abliterix-i1-GGUF](https://huggingface.co/mradermacher/Qwen3.5-122B-A10B-abliterix-i1-GGUF) and converted from GGUF to legacy `.dat` format for ik_llama.cpp compatibility.\r\n- **Quantization recipes:** Heavily based on the blending logic from [ubergarm/Qwen3.5-122B-A10B-GGUF](https://huggingface.co/ubergarm/Qwen3.5-122B-A10B-GGUF).\r\n\r\n---\r\n\r\n## 🛠️ Quantization Recipes\r\n\r\nAll variants share the same structure: high precision on attention/gating layers and shared experts, with the routed expert layers (the bulk of model size) quantized to varying levels.\r\n\r\n### IQ4_KS\r\nBalances upgraded routed experts with compressed embeddings to save VRAM.\r\n| Layer Group | Quant |\r\n|---|---|\r\n| Token Embeddings & Output | `IQ6_K` |\r\n| Attention / Delta Net | `Q8_0` |\r\n| SSM Alpha & Beta | `Q8_0` |\r\n| Shared Experts | `Q8_0` |\r\n| Routed Experts | `IQ4_KS` |\r\n\r\n### IQ4_K\r\nSpends a bit more VRAM for full `Q8_0` precision on the vocabulary, with slightly heavier experts.\r\n| Layer Group | Quant |\r\n|---|---|\r\n| Token Embeddings & Output | `Q8_0` |\r\n| Attention / Delta Net | `Q8_0` |\r\n| SSM Alpha & Beta | `Q8_0` |\r\n| Shared Experts | `Q8_0` |\r\n| Routed Experts | `IQ4_K` |\r\n\r\n### IQ4_KSS\r\nUses split quant levels on routed experts (down vs gate/up) with compressed embeddings.\r\n| Layer Group | Quant |\r\n|---|---|\r\n| Token Embeddings & Output | `IQ6_K` |\r\n| Attention / Delta Net | `Q8_0` |\r\n| SSM Alpha & Beta | `Q8_0` |\r\n| Shared Experts | `Q8_0` |\r\n| Routed Experts (down) | `IQ4_KS` |\r\n| Routed Experts (gate/up) | `IQ4_KSS` |\r\n\r\n### IQ5_KS\r\nSteps up to 5-bit routed experts with full-precision SSM alpha/beta weights.\r\n| Layer Group | Quant |\r\n|---|---|\r\n| Token Embeddings & Output | `Q8_0` |\r\n| Attention / Delta Net | `Q8_0` |\r\n| SSM Alpha & Beta | `F32` |\r\n| Shared Experts | `Q8_0` |\r\n| Routed Experts | `IQ5_KS` |\r\n\r\n### IQ5_K\r\nSame structure as IQ5_KS but using IQ5_K for the routed experts.\r\n| Layer Group | Quant |\r\n|---|---|\r\n| Token Embeddings & Output | `Q8_0` |\r\n| Attention / Delta Net | `Q8_0` |\r\n| SSM Alpha & Beta | `F32` |\r\n| Shared Experts | `Q8_0` |\r\n| Routed Experts | `IQ5_K` |\r\n\r\n### IQ6_K\r\nHighest quality routed expert quantization with full-precision SSM alpha/beta.\r\n| Layer Group | Quant |\r\n|---|---|\r\n| Token Embeddings & Output | `Q8_0` |\r\n| Attention / Delta Net | `Q8_0` |\r\n| SSM Alpha & Beta | `F32` |\r\n| Shared Experts | `Q8_0` |\r\n| Routed Experts | `IQ6_K` |\r\n\r\n### IQ2_KL\r\nMaximum compression variant. Drops attention layers to `IQ6_K` and uses aggressive 2-3 bit routed expert quantization.\r\n| Layer Group | Quant |\r\n|---|---|\r\n| Token Embeddings | `IQ4_K` |\r\n| Output | `IQ6_K` |\r\n| Attention / Delta Net | `IQ6_K` |\r\n| SSM Alpha & Beta | `IQ6_K` |\r\n| Shared Experts | `IQ6_K` |\r\n| Routed Experts (down) | `IQ3_KS` |\r\n| Routed Experts (gate/up) | `IQ2_KL` |\r\n\r\n---\r\n\r\n## 💻 How to Run\r\n\r\n1. Clone and build the `ik_llama.cpp` fork from [ikawrakow/ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp).\r\n2. Use the compiled `llama-server` or `llama-cli` from that specific build.\r\n\r\n**Example `llama-server` launch command:**\r\n```bash\r\n./llama-server -m Qwen3.5-122B-A10B-abliterix-IQ4_KS.gguf -c 8192 -ngl 99 -fa\r\n```",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "quantization",
    "iq4_ks",
    "iq4_k",
    "iq4_kss",
    "iq5_k",
    "iq5_ks",
    "iq6_k",
    "iq2_kl",
    "ik_llama.cpp",
    "qwen",
    "qwen3_5_moe",
    "abliterated",
    "text-generation",
    "base_model:wangzhang/Qwen3.5-122B-A10B-abliterix",
    "base_model:quantized:wangzhang/Qwen3.5-122B-A10B-abliterix",
    "endpoints_compatible",
    "region:us",
    "imatrix",
    "conversational"
  ],
  "likes": 1,
  "downloads": 2710,
  "gated": false,
  "private": false,
  "last_modified": "2026-04-14T10:31:24.000Z",
  "created_at": "2026-04-14T07:01:35.000Z",
  "pipeline_tag": "text-generation",
  "library_name": ""
}
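The README stored in the metadata above ends with an example `llama-server` launch command for the IQ4_KS file. The sketch below assembles the same argument list for any of the files, here the IQ5_KS one. The flags (`-m`, `-c`, `-ngl`, `-fa`) and their values are taken from that example and have not been checked against a particular ik_llama.cpp build; the `server_command` helper is a hypothetical name.

```python
import shlex


def server_command(model_path, ctx=8192, gpu_layers=99, flash_attn=True,
                   binary="./llama-server"):
    """Build the argv list for the README's example launch command.

    The binary must come from an ik_llama.cpp build; per the README,
    mainline llama.cpp will not load these files.
    """
    argv = [binary, "-m", model_path, "-c", str(ctx), "-ngl", str(gpu_layers)]
    if flash_attn:
        argv.append("-fa")
    return argv


cmd = server_command("Qwen3.5-122B-A10B-abliterix-IQ5_KS.gguf")
print(shlex.join(cmd))
# ./llama-server -m Qwen3.5-122B-A10B-abliterix-IQ5_KS.gguf -c 8192 -ngl 99 -fa
```

Passing the list to `subprocess.run(cmd)` would start the server; keeping the arguments as a list avoids shell-quoting problems if the model path contains spaces.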

Source payload excerpt (from Hugging Face API)

{
  "_id": "69dde64fab604b7ade271f5f",
  "id": "KeinNiemand/Qwen3.5-122B-A10B-abliterix-IK_GGUF",
  "modelId": "KeinNiemand/Qwen3.5-122B-A10B-abliterix-IK_GGUF",
  "sha": "ac07c2738cd9d56440df329f46599f58a6450c33",
  "createdAt": "2026-04-14T07:01:35.000Z",
  "lastModified": "2026-04-14T10:31:24.000Z",
  "author": "KeinNiemand",
  "downloads": 2710,
  "likes": 1,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "",
  "siblings_count": 9
}
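The `id` and `sha` fields in this payload are enough to build a direct file URL. The sketch below uses the Hugging Face Hub `resolve` URL pattern; whether this page's own Download links use the same pattern is an assumption, since the link targets are not shown here. The `resolve_url` helper is a hypothetical name.

```python
from urllib.parse import quote

# Values copied from the API payload excerpt.
REPO_ID = "KeinNiemand/Qwen3.5-122B-A10B-abliterix-IK_GGUF"
SHA = "ac07c2738cd9d56440df329f46599f58a6450c33"


def resolve_url(repo_id, filename, revision="main"):
    """Build a Hub download URL; pass a commit SHA as revision to pin the file."""
    return (
        "https://huggingface.co/"
        f"{quote(repo_id)}/resolve/{quote(revision, safe='')}/{quote(filename)}"
    )


print(resolve_url(REPO_ID, "Qwen3.5-122B-A10B-abliterix-IQ5_KS.gguf"))
# https://huggingface.co/KeinNiemand/Qwen3.5-122B-A10B-abliterix-IK_GGUF/resolve/main/Qwen3.5-122B-A10B-abliterix-IQ5_KS.gguf

# The slug shown on this page is the repository id in lower case.
print(REPO_ID.lower())
# keinniemand/qwen3.5-122b-a10b-abliterix-ik_gguf
```

Pinning with `revision=SHA` keeps a download reproducible if the repository is updated later.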

More models in this shard