GraySoft

keinniemand/qwen3.5-122b-a10b-abliterix-ik_gguf is indexed on GraySoft with repository links, GGUF quant files (including IQ5_KS), and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes.

Model Intelligence Sheet

keinniemand/qwen3.5-122b-a10b-abliterix-ik_gguf overview

🚨 CRITICAL COMPATIBILITY WARNING 🚨 These are iqk format quantizations and are EXCLUSIVE to the ik_llama.cpp fork. They will NOT work on mainline llama.cpp, standard LM Studio, standard Text Generation WebUI, or KoboldCPP. You must compile and run this using ikawrakow's llama.cpp fork (or a UI where you have manually swapped the backend to an ik_llama build).

This repository contains custom, mixed-precision ik_llama.cpp GGUF quantizations for wangzhang/Qwen3.5-122B-A10B-abliterix, an abliterated version of Qwen/Qwen3.5-122B-A10B.

These quants use different precision levels for different layer types, keeping attention and shared expert layers at high precision while compressing the routed experts (which make up the bulk of the model's size) to various IQK quantization levels.
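Because these files only load in ik_llama.cpp, it can help to check a file name before handing it to a runtime. The sketch below is a filename heuristic only: the helper name is_iqk_file is hypothetical, the pattern covers just the IQK-style labels that appear in this repository (for example IQ4_KS, IQ4_KSS, IQ2_KL), and it does not read the GGUF header.

```bash
#!/usr/bin/env bash
# is_iqk_file FILE: succeed when the file name ends in an IQK-style quant
# label such as IQ4_K, IQ4_KS, IQ4_KSS, or IQ2_KL followed by ".gguf".
# Assumption of this sketch: the label in the file name is accurate.
is_iqk_file() {
  local name
  name="$(basename "$1")"
  [[ "$name" =~ IQ[0-9]_K(S|SS|L)?\.gguf$ ]]
}

model="Qwen3.5-122B-A10B-abliterix-IQ5_KS.gguf"
if is_iqk_file "$model"; then
  echo "$model: IQK quant, launch it with an ik_llama.cpp build"
fi
```

A mainline label such as Q4_K_M or IQ4_XS does not match the pattern, so the check fails for those files.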

gguf, quantization, iq4_ks, iq4_k, iq4_kss, iq5_k, iq5_ks, iq6_k, iq2_kl, ik_llama.cpp, qwen, qwen3_5_moe, abliterated, text-generation, base_model:wangzhang/Qwen3.5-122B-A10B-abliterix, base_model:quantized:wangzhang/Qwen3.5-122B-A10B-abliterix, endpoints_compatible, region:us, imatrix, conversational
Downloads
2,710
Likes
1
Pipeline
text-generation
Library
Not specified
Visibility
Public
Access
Open

Repository Files & Downloads

7 files detected
Direct downloads for all repository files
| File | Type | Quantization | Size |
|---|---|---|---|
| Qwen3.5-122B-A10B-abliterix-IQ2_KL.gguf | GGUF | IQ2_KL | 43.33 GB |
| Qwen3.5-122B-A10B-abliterix-IQ4_K.gguf | GGUF | IQ4_K | 66.95 GB |
| Qwen3.5-122B-A10B-abliterix-IQ4_KS.gguf | GGUF | IQ4_KS | 63.48 GB |
| Qwen3.5-122B-A10B-abliterix-IQ4_KSS.gguf | GGUF | IQ4_KSS | 61.23 GB |
| Qwen3.5-122B-A10B-abliterix-IQ5_K.gguf | GGUF | IQ5_K | 80.49 GB |
| Qwen3.5-122B-A10B-abliterix-IQ5_KS.gguf | GGUF | IQ5_KS | 77.35 GB |
| Qwen3.5-122B-A10B-abliterix-IQ6_K.gguf | GGUF | IQ6_K | 95.68 GB |
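
As a rough aid for choosing among the files listed above, the sketch below picks the largest quant that fits a given memory budget. The sizes are copied from the listing. The helper names quant_sizes and pick_quant are hypothetical, and the 8 GB default headroom for KV cache and runtime buffers is an assumption of this example rather than a figure from the repository. Actual memory use depends on context length and offload settings.

```bash
#!/usr/bin/env bash
# File sizes in GB, copied from the repository file listing.
quant_sizes() {
  cat <<'EOF'
IQ2_KL 43.33
IQ4_KSS 61.23
IQ4_KS 63.48
IQ4_K 66.95
IQ5_KS 77.35
IQ5_K 80.49
IQ6_K 95.68
EOF
}

# pick_quant BUDGET_GB [HEADROOM_GB]
# Prints the largest quant whose file size plus headroom fits the budget.
# HEADROOM_GB defaults to 8, an assumed allowance for KV cache and buffers.
# Returns non-zero when nothing fits.
pick_quant() {
  local budget="$1" headroom="${2:-8}"
  quant_sizes | awk -v budget="$budget" -v headroom="$headroom" '
    ($2 + headroom) <= (budget + 0) && ($2 + 0) > best { best = $2 + 0; name = $1 }
    END { if (name == "") exit 1; print name }
  '
}

pick_quant 96   # combined RAM + VRAM budget in GB
```

Passing 0 as the second argument compares raw file sizes against the budget, which is useful only as an upper bound.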

Model Details

Model Slug
keinniemand/qwen3.5-122b-a10b-abliterix-ik_gguf
Author
KeinNiemand
Pipeline Task
text-generation
Library
Not specified
Created
2026-04-14
Last Modified
2026-04-14
Gated
No
Private
No
HF SHA
ac07c2738cd9d56440df329f46599f58a6450c33
License
Unknown
Language
Unknown
Base Model
wangzhang/Qwen3.5-122B-A10B-abliterix

Metadata Inspector

Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "base_model": "wangzhang/Qwen3.5-122B-A10B-abliterix",
    "tags": [
      "gguf",
      "quantization",
      "iq4_ks",
      "iq4_k",
      "iq4_kss",
      "iq5_k",
      "iq5_ks",
      "iq6_k",
      "iq2_kl",
      "ik_llama.cpp",
      "qwen",
      "qwen3_5_moe",
      "abliterated"
    ],
    "pipeline_tag": "text-generation",
    "frontmatter": {},
    "hero_image_url": "",
    "summary": "🚨 **CRITICAL COMPATIBILITY WARNING** 🚨 **These are iqk format quantizations and are EXCLUSIVE to the ik_llama.cpp fork.** They will **NOT** work on mainline llama.cpp, standard LM Studio, standard Text Generation WebUI, or KoboldCPP. You *must* compile and run this using ikawrakow's llama.cpp fork (or a UI where you have manually swapped the backend to an ik_llama build). --- This repository contains custom, mixed-precision ik_llama.cpp GGUF quantizations for wangzhang/Qwen3.5-122B-A10B-abliterix, an abliterated version of Qwen/Qwen3.5-122B-A10B. These quants use different precision levels for different layer types, keeping attention and shared expert layers at high precision while compressing the routed experts (which make up the bulk of the model's size) to various IQK quantization levels.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\r\nbase_model: wangzhang/Qwen3.5-122B-A10B-abliterix\r\ntags:\r\n- gguf\r\n- quantization\r\n- iq4_ks\r\n- iq4_k\r\n- iq4_kss\r\n- iq5_k\r\n- iq5_ks\r\n- iq6_k\r\n- iq2_kl\r\n- ik_llama.cpp\r\n- qwen\r\n- qwen3_5_moe\r\n- abliterated\r\npipeline_tag: text-generation\r\n---\r\n\r\n# Qwen3.5 122B A10B Abliterix - Custom GGUF Quantizations\r\n\r\n🚨 **CRITICAL COMPATIBILITY WARNING** 🚨\r\n**These are `iqk` format quantizations and are EXCLUSIVE to the `ik_llama.cpp` fork.** They will **NOT** work on mainline `llama.cpp`, standard LM Studio, standard Text Generation WebUI, or KoboldCPP. You *must* compile and run this using [ikawrakow's llama.cpp fork](https://github.com/ikawrakow/ik_llama.cpp) (or a UI where you have manually swapped the backend to an `ik_llama` build).\r\n\r\n---\r\n\r\nThis repository contains custom, mixed-precision `ik_llama.cpp` GGUF quantizations for [wangzhang/Qwen3.5-122B-A10B-abliterix](https://huggingface.co/wangzhang/Qwen3.5-122B-A10B-abliterix), an abliterated version of [Qwen/Qwen3.5-122B-A10B](https://huggingface.co/Qwen/Qwen3.5-122B-A10B).\r\n\r\nThese quants use different precision levels for different layer types, keeping attention and shared expert layers at high precision while compressing the routed experts (which make up the bulk of the model's size) to various IQK quantization levels.\r\n\r\n## ⚠️ Disclaimer: The \"Vibes Test\"\r\n**These quantizations have NOT been formally tested for perplexity.** They were compiled as an experiment to see how the model handles shifting bottlenecks. There is no guarantee that they are mathematically optimal or perform flawlessly. They are provided entirely as-is. If they pass the vibes test for you, enjoy!\r\n\r\n## 🙏 Credits & Acknowledgments\r\n- **Base model:** [wangzhang/Qwen3.5-122B-A10B-abliterix](https://huggingface.co/wangzhang/Qwen3.5-122B-A10B-abliterix)\r\n- **imatrix source:** The imatrix was sourced from [mradermacher/Qwen3.5-122B-A10B-abliterix-i1-GGUF](https://huggingface.co/mradermacher/Qwen3.5-122B-A10B-abliterix-i1-GGUF) and converted from GGUF to legacy `.dat` format for ik_llama.cpp compatibility.\r\n- **Quantization recipes:** Heavily based on the blending logic from [ubergarm/Qwen3.5-122B-A10B-GGUF](https://huggingface.co/ubergarm/Qwen3.5-122B-A10B-GGUF).\r\n\r\n---\r\n\r\n## 🛠️ Quantization Recipes\r\n\r\nAll variants share the same structure: high precision on attention/gating layers and shared experts, with the routed expert layers (the bulk of model size) quantized to varying levels.\r\n\r\n### IQ4_KS\r\nBalances upgraded routed experts with compressed embeddings to save VRAM.\r\n| Layer Group | Quant |\r\n|---|---|\r\n| Token Embeddings & Output | `IQ6_K` |\r\n| Attention / Delta Net | `Q8_0` |\r\n| SSM Alpha & Beta | `Q8_0` |\r\n| Shared Experts | `Q8_0` |\r\n| Routed Experts | `IQ4_KS` |\r\n\r\n### IQ4_K\r\nSpends a bit more VRAM for full `Q8_0` precision on the vocabulary, with slightly heavier experts.\r\n| Layer Group | Quant |\r\n|---|---|\r\n| Token Embeddings & Output | `Q8_0` |\r\n| Attention / Delta Net | `Q8_0` |\r\n| SSM Alpha & Beta | `Q8_0` |\r\n| Shared Experts | `Q8_0` |\r\n| Routed Experts | `IQ4_K` |\r\n\r\n### IQ4_KSS\r\nUses split quant levels on routed experts (down vs gate/up) with compressed embeddings.\r\n| Layer Group | Quant |\r\n|---|---|\r\n| Token Embeddings & Output | `IQ6_K` |\r\n| Attention / Delta Net | `Q8_0` |\r\n| SSM Alpha & Beta | `Q8_0` |\r\n| Shared Experts | `Q8_0` |\r\n| Routed Experts (down) | `IQ4_KS` |\r\n| Routed Experts (gate/up) | `IQ4_KSS` |\r\n\r\n### IQ5_KS\r\nSteps up to 5-bit routed experts with full-precision SSM alpha/beta weights.\r\n| Layer Group | Quant |\r\n|---|---|\r\n| Token Embeddings & Output | `Q8_0` |\r\n| Attention / Delta Net | `Q8_0` |\r\n| SSM Alpha & Beta | `F32` |\r\n| Shared Experts | `Q8_0` |\r\n| Routed Experts | `IQ5_KS` |\r\n\r\n### IQ5_K\r\nSame structure as IQ5_KS but using IQ5_K for the routed experts.\r\n| Layer Group | Quant |\r\n|---|---|\r\n| Token Embeddings & Output | `Q8_0` |\r\n| Attention / Delta Net | `Q8_0` |\r\n| SSM Alpha & Beta | `F32` |\r\n| Shared Experts | `Q8_0` |\r\n| Routed Experts | `IQ5_K` |\r\n\r\n### IQ6_K\r\nHighest quality routed expert quantization with full-precision SSM alpha/beta.\r\n| Layer Group | Quant |\r\n|---|---|\r\n| Token Embeddings & Output | `Q8_0` |\r\n| Attention / Delta Net | `Q8_0` |\r\n| SSM Alpha & Beta | `F32` |\r\n| Shared Experts | `Q8_0` |\r\n| Routed Experts | `IQ6_K` |\r\n\r\n### IQ2_KL\r\nMaximum compression variant. Drops attention layers to `IQ6_K` and uses aggressive 2-3 bit routed expert quantization.\r\n| Layer Group | Quant |\r\n|---|---|\r\n| Token Embeddings | `IQ4_K` |\r\n| Output | `IQ6_K` |\r\n| Attention / Delta Net | `IQ6_K` |\r\n| SSM Alpha & Beta | `IQ6_K` |\r\n| Shared Experts | `IQ6_K` |\r\n| Routed Experts (down) | `IQ3_KS` |\r\n| Routed Experts (gate/up) | `IQ2_KL` |\r\n\r\n---\r\n\r\n## 💻 How to Run\r\n\r\n1. Clone and build the `ik_llama.cpp` fork from [ikawrakow/ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp).\r\n2. Use the compiled `llama-server` or `llama-cli` from that specific build.\r\n\r\n**Example `llama-server` launch command:**\r\n```bash\r\n./llama-server -m Qwen3.5-122B-A10B-abliterix-IQ4_KS.gguf -c 8192 -ngl 99 -fa\r\n```",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "quantization",
    "iq4_ks",
    "iq4_k",
    "iq4_kss",
    "iq5_k",
    "iq5_ks",
    "iq6_k",
    "iq2_kl",
    "ik_llama.cpp",
    "qwen",
    "qwen3_5_moe",
    "abliterated",
    "text-generation",
    "base_model:wangzhang/Qwen3.5-122B-A10B-abliterix",
    "base_model:quantized:wangzhang/Qwen3.5-122B-A10B-abliterix",
    "endpoints_compatible",
    "region:us",
    "imatrix",
    "conversational"
  ],
  "likes": 1,
  "downloads": 2710,
  "gated": false,
  "private": false,
  "last_modified": "2026-04-14T10:31:24.000Z",
  "created_at": "2026-04-14T07:01:35.000Z",
  "pipeline_tag": "text-generation",
  "library_name": ""
}
Source payload excerpt (from Hugging Face API)
{
  "_id": "69dde64fab604b7ade271f5f",
  "id": "KeinNiemand/Qwen3.5-122B-A10B-abliterix-IK_GGUF",
  "modelId": "KeinNiemand/Qwen3.5-122B-A10B-abliterix-IK_GGUF",
  "sha": "ac07c2738cd9d56440df329f46599f58a6450c33",
  "createdAt": "2026-04-14T07:01:35.000Z",
  "lastModified": "2026-04-14T10:31:24.000Z",
  "author": "KeinNiemand",
  "downloads": 2710,
  "likes": 1,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "",
  "siblings_count": 9
}
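
The README's "How to Run" steps can be sketched as a dry-run helper that prints the download and launch commands for one quant without executing them. The repository id and file-name pattern come from the metadata above, and the llama-server flags mirror the README example. The helper name run_plan and the use of huggingface-cli (from the huggingface_hub Python package) for the download are assumptions of this sketch. The llama-server binary must come from an ik_llama.cpp build.

```bash
#!/usr/bin/env bash
# Repository id as reported by the Hugging Face API payload.
REPO="KeinNiemand/Qwen3.5-122B-A10B-abliterix-IK_GGUF"

# run_plan QUANT [CTX]
# Prints (does not run) a download command and a launch command.
# CTX defaults to 8192, the context size used in the README example.
run_plan() {
  local quant="$1" ctx="${2:-8192}"
  local file="Qwen3.5-122B-A10B-abliterix-${quant}.gguf"
  # Assumed download route: huggingface-cli from the huggingface_hub package.
  printf 'huggingface-cli download %s %s --local-dir .\n' "$REPO" "$file"
  # Flags copied from the README's example llama-server command.
  printf './llama-server -m %s -c %s -ngl 99 -fa\n' "$file" "$ctx"
}

run_plan IQ5_KS
```

Review the printed commands, then run them by hand from the directory that holds the ik_llama.cpp binaries.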