unidaikon/qwen3.5-35b-a3b-q5_k_xxl-gguf overview
### Overview

This repository provides a custom quantization of **Qwen3.5-35B-A3B** to **Q5_K** format, with a hybrid precision approach that keeps `ssm` and `attention` layers in high precision to preserve long-context performance. The resulting model is approximately 23.8 GiB and targets systems with 32 GiB RAM + 8 GiB VRAM.

The vision tower (mmproj) is the same file as the one in the other quantization repos.

### Background

Currently, all GGUF quantizations of `Qwen3.5` compress `ssm` layers to low precision. For example, in `Qwen3.5-35B-A3B-UD-Q5_K_XS`, the tensors `blk.0.ssm_alpha.weight` ([2048, 32]), `blk.0.ssm_beta.weight` ([2048, 32]), and `blk.0.ssm_out.weight` ([4096, 2048]) are all stored as Q5_K.

This may cause issues in long-context scenarios:

* `ssm` layers accumulate their state linearly during generation, so quantization errors compound over time
* `ssm` layers are small (`2048×32` and `4096×2048`), so quantizing them yields minimal performance gain
* For certain tokens requiring minor knowledge updates, quantizing `ssm_beta` may introduce noticeable degradation

This quant keeps `ssm` layers in **BF16** precision.

### Other Modifications

* Higher precision for token embeddings (`token_embd.weight`)
  * Qwen3.5 has a much larger token list, so higher precision can prevent token representation collapse
* Higher precision for attention matrices
  * Full attention is critical in the SSM–full attention fusion architecture
  * As only 25% of layers use full attention, giving them more bits is safe and does not slow down inference
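The claim that the `ssm` tensors are too small for quantization to matter can be checked with quick arithmetic. The sketch below is an illustration, not part of the original card. It counts only the three `ssm` tensors whose shapes are given for `blk.0` and assumes the nominal storage cost of each format (5.5 bits per weight for Q5_K, 16 for BF16). The names `BITS_PER_WEIGHT`, `SSM_TENSORS`, `tensor_bytes`, and `block_bytes` are hypothetical helpers introduced here.

```python
# Nominal storage cost per weight (assumption: standard llama.cpp layouts).
# Q5_K packs 256 weights into a 176-byte super-block, i.e. 5.5 bits per weight.
# BF16 stores each weight in 16 bits.
BITS_PER_WEIGHT = {"Q5_K": 5.5, "BF16": 16.0}

# Shapes of the ssm tensors named for blk.0 in this card.
# Any other ssm tensors in the block are not counted here.
SSM_TENSORS = {
    "ssm_alpha.weight": (2048, 32),
    "ssm_beta.weight": (2048, 32),
    "ssm_out.weight": (4096, 2048),
}


def tensor_bytes(shape, fmt):
    """Return the storage size in bytes of one tensor in the given format."""
    n_weights = 1
    for dim in shape:
        n_weights *= dim
    return n_weights * BITS_PER_WEIGHT[fmt] / 8


def block_bytes(fmt):
    """Return the combined size in bytes of the listed ssm tensors for one block."""
    return sum(tensor_bytes(shape, fmt) for shape in SSM_TENSORS.values())


MIB = 1024 ** 2
q5_bytes = block_bytes("Q5_K")
bf16_bytes = block_bytes("BF16")
extra_mib = (bf16_bytes - q5_bytes) / MIB

print(f"Q5_K: {q5_bytes / MIB:.2f} MiB per block")
print(f"BF16: {bf16_bytes / MIB:.2f} MiB per block")
print(f"Extra cost of BF16: {extra_mib:.2f} MiB per block")
```

Under these assumptions, keeping the three tensors in BF16 costs roughly 10.7 MiB more per block than Q5_K. The total overhead depends on how many blocks contain `ssm` layers, which this card does not state.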
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"base_model": [
"Qwen/Qwen3.5-35B-A3B",
"unsloth/Qwen3.5-35B-A3B-GGUF"
],
"tags": [
"Qwen3.5-35B-A3B",
"GGUF"
],
"frontmatter": {
"base_model": [
"Qwen/Qwen3.5-35B-A3B",
"unsloth/Qwen3.5-35B-A3B-GGUF"
],
"tags": [
"Qwen3.5-35B-A3B",
"GGUF"
]
},
"hero_image_url": "",
"summary": "### Overview This repository provides a custom quantization of **Qwen3.5-35B-A3B** to **Q5_K** format, with a hybrid precision approach that keeps ssm and attention layers in high precision to preserve long-context performance. The resulting model size is approximately 23.8 GiB, optimized for systems with 32 GiB RAM + 8 GiB VRAM. The Vision tower (mmproj) is the same file as the one in any other quantization repos. ### Background Currently, all gguf quantizations of Qwen3.5 compress ssm layers to low precision. For example, in Qwen3.5-35B-A3B-UD-Q5_K_XS: `` blk.0.attn_qkv.weight \t[2,048, 8,192] \tQ5_K ... blk.0.ffn_gate_inp_shexp.weight \t[2,048] \tF32 blk.0.ffn_gate_shexp.weight \t[2,048, 512] \tQ8_0 blk.0.ffn_up_exps.weight \t[2,048, 512, 256] \tQ5_K ... blk.0.ssm_alpha.weight \t[2,048, 32] \tQ5_K blk.0.ssm_beta.weight \t[2,048, 32] \tQ5_K ... blk.0.ssm_out.weight \t[4,096, 2,048] \tQ5_K ` This may cause issues in long-context scenarios: * ssm layers perform linear accumulation during generation, causing quantization errors to compound over time * ssm layers are small (2048×32 and 4096×2048), so quantization provides minimal performance gain * For certain tokens requiring minor knowledge updates, ssm_beta quantization may introduce noticeable degradation This Quant: Keep ssm` layers in **BF16** precision. ### Other Modificatoin * Higher precision to token embeddings (token_embd.weight) * Qwen3.5 has much larger token list and so high precision can prevents token representation collapse * Higher attention matrix precision * Full attention is critical in the SSM–FULL attention fusion architecture * As only 25% layers have full attention, it's safe to have more bits without slow down inference",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nbase_model:\n- Qwen/Qwen3.5-35B-A3B\n- unsloth/Qwen3.5-35B-A3B-GGUF\ntags:\n- Qwen3.5-35B-A3B\n- GGUF\n---\n\n\n### Overview\n\nThis repository provides a custom quantization of **Qwen3.5-35B-A3B** to **Q5_K** format, \nwith a hybrid precision approach that keeps `ssm` and `attention` layers in high precision to preserve long-context performance. \nThe resulting model size is approximately 23.8 GiB, optimized for systems with 32 GiB RAM + 8 GiB VRAM.\n\nThe Vision tower (mmproj) is the same file as the one in any other quantization repos.\n\n### Background\n\nCurrently, all gguf quantizations of `Qwen3.5` compress `ssm` layers to low precision. \nFor example, in `Qwen3.5-35B-A3B-UD-Q5_K_XS`:\n\n```\nblk.0.attn_qkv.weight \t[2,048, 8,192] \tQ5_K\n...\nblk.0.ffn_gate_inp_shexp.weight \t[2,048] \tF32\nblk.0.ffn_gate_shexp.weight \t[2,048, 512] \tQ8_0\nblk.0.ffn_up_exps.weight \t[2,048, 512, 256] \tQ5_K\n...\nblk.0.ssm_alpha.weight \t[2,048, 32] \tQ5_K\nblk.0.ssm_beta.weight \t[2,048, 32] \tQ5_K\n...\nblk.0.ssm_out.weight \t[4,096, 2,048] \tQ5_K\n```\n\nThis may cause issues in long-context scenarios:\n\n* `ssm` layers perform linear accumulation during generation, causing quantization errors to compound over time\n* `ssm` layers are small (`2048×32` and `4096×2048`), so quantization provides minimal performance gain\n* For certain tokens requiring minor knowledge updates, `ssm_beta` quantization may introduce noticeable degradation\n\nThis Quant: Keep `ssm` layers in **BF16** precision.\n\n### Other Modificatoin\n\n* Higher precision to token embeddings (token_embd.weight)\n * Qwen3.5 has much larger token list and so high precision can prevents token representation collapse \n* Higher attention matrix precision\n * Full attention is critical in the SSM–FULL attention fusion architecture\n * As only 25% layers have full attention, it's safe to have more bits without slow down inference",
"related_quantizations": []
},
"tags": [
"gguf",
"Qwen3.5-35B-A3B",
"GGUF",
"base_model:Qwen/Qwen3.5-35B-A3B",
"base_model:quantized:Qwen/Qwen3.5-35B-A3B",
"endpoints_compatible",
"region:us",
"imatrix",
"conversational"
],
"likes": 0,
"downloads": 141,
"gated": false,
"private": false,
"last_modified": "2026-02-26T08:03:19.000Z",
"created_at": "2026-02-26T07:19:25.000Z",
"pipeline_tag": "",
"library_name": ""
}
Source payload excerpt (from Hugging Face API)
{
"_id": "699ff3fd8c00ecb963d03c2d",
"id": "unidaikon/Qwen3.5-35B-A3B-Q5_K_XXL-GGUF",
"modelId": "unidaikon/Qwen3.5-35B-A3B-Q5_K_XXL-GGUF",
"sha": "7e646f72ad7014c0a8f82b95fb20cd718ad94893",
"createdAt": "2026-02-26T07:19:25.000Z",
"lastModified": "2026-02-26T08:03:19.000Z",
"author": "unidaikon",
"downloads": 141,
"likes": 0,
"gated": false,
"private": false,
"pipeline_tag": "",
"library_name": "",
"siblings_count": 4
}