juanml82/huihui-qwen3-next-80b-a3b-thinking-abliterated-gguf is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes.
juanml82/huihui-qwen3-next-80b-a3b-thinking-abliterated-gguf overview
GGUF quants for Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated.

I recreated them after the late December 2025 llama.cpp update that speeds up Qwen 3 Next, so these quants should perform better than the earlier quants for this model. I've uploaded three quants:

IQ3_M: should fit (tightly) in systems with 32 GB of RAM plus an 8-12 GB GPU, with RAM offloading. Possibly the lowest useful quant.

MXFP4_MOE: a tight fit for systems with 32 GB of RAM plus a GPU with 16 GB or more. It can also be loaded fully in system RAM, with cpu_moe, on systems with 64 GB of RAM.

Q6_K: will work well on systems with 64 GB of RAM plus RAM offloading. Quality is supposed to be almost indistinguishable from Q8.

I didn't do a Q8. It could be a tight fit in systems with 64 GB of RAM and a 24 GB VRAM GPU, but I have that system and it freezes when I try to load it.

The Q4_K_M file is older and slower than these three new quants, so I see no reason to use it instead of the MXFP4_MOE.

Enjoy!
Repository Files & Downloads
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated-q4_K_M.gguf | GGUF | Q4_K_M | 45.09 GB | Download |
| qwen3-next-80b-a3b-thinking-IQ3_M.gguf | GGUF | IQ3_M | 32.66 GB | Download |
| qwen3-next-80b-a3b-thinking-mxfp4_moe.gguf | GGUF | MXFP4_MOE | 40.74 GB | Download |
| qwen3-next-80b-a3b-thinking-q6_k.gguf | GGUF | Q6_K | 61.03 GB | Download |
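The author's sizing notes amount to comparing each file size against the combined RAM and VRAM of the machine. The sketch below illustrates that arithmetic for the three newer quants, using the file sizes from the table above. The `pick_quant` helper and the 6 GB `headroom_gb` default are assumptions made for illustration, not values from the author; real memory use also depends on context length, KV cache size, and what else is running on the machine.

```python
# File sizes in GB, copied from the repository file table.
# The older Q4_K_M file is left out because the author advises against it.
QUANTS = {
    "IQ3_M": 32.66,
    "MXFP4_MOE": 40.74,
    "Q6_K": 61.03,
}


def pick_quant(ram_gb, vram_gb, headroom_gb=6.0):
    """Return the largest quant whose file fits in RAM + VRAM minus headroom.

    headroom_gb is a hypothetical allowance for the OS, KV cache, and
    runtime buffers; tune it for your own system. Returns None if no
    quant fits the budget.
    """
    budget = ram_gb + vram_gb - headroom_gb
    fitting = {name: size for name, size in QUANTS.items() if size <= budget}
    if not fitting:
        return None
    # Pick the fitting quant with the largest file size.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for ram, vram in [(32, 8), (32, 16), (64, 24)]:
        print(f"{ram} GB RAM + {vram} GB VRAM -> {pick_quant(ram, vram)}")
```

With the assumed headroom, the three example configurations line up with the author's guidance (IQ3_M, MXFP4_MOE, and Q6_K respectively), but a quant that fits on paper can still fail to load in practice, as the author's Q8 attempt shows.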
Model Details
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"license": "apache-2.0",
"language": [
"en",
"zh"
],
"base_model": [
"huihui-ai/Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated"
],
"frontmatter": {
"license": "apache-2.0",
"language": [
"en",
"zh"
],
"base_model": [
"huihui-ai/Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated"
]
},
"hero_image_url": "",
"summary": "GGUF quants for Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated I've recreated them after the late December 2025 llama.cpp update which speeds up Qwen 3 Next, so these quants should perform better than the early quants for this model. I've uploaded three quants: iQ3_M – should fit (tight) in systems with 32gb of ram plus an 8-12gb gpu with ram offloading. Possibly lowest useful quant. MXFP4_MOE – a tight fit for systems with 32gb of ram plus a 16gb or more gpu. Or to fully load it in system ram, with cpu_moe, in systems with 64gb of ram Q6K – will work well with systems with 64gb of ram plus ram offloading. Quality is supposed to very almost indistinguishable from Q8 I didn't do a Q8. it could be a tight fit in systems with 64gb of ram and a 24gb vram gpu, but I have that system and it's freezing when I try to load it. The q4_m file is older and slower than these new three quants, so I see no reason to use it instad of the mxfp4_moe Enjoy! --- license: apache-2.0 language: base_model: pipeline_tag: text-generation tags: ---",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nlicense: apache-2.0\nlanguage:\n- en\n- zh\nbase_model:\n- huihui-ai/Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated\n---\nGGUF quants for [Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated](https://huggingface.co/huihui-ai/Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated)\n\nI've recreated them after the late December 2025 llama.cpp update which speeds up Qwen 3 Next, so these quants should perform better than the early quants for this model.\nI've uploaded three quants:\n\niQ3_M – should fit (tight) in systems with 32gb of ram plus an 8-12gb gpu with ram offloading. Possibly lowest useful quant.\n\nMXFP4_MOE – a tight fit for systems with 32gb of ram plus a 16gb or more gpu. Or to fully load it in system ram, with cpu_moe, in systems with 64gb of ram\n\nQ6K – will work well with systems with 64gb of ram plus ram offloading. Quality is supposed to very almost indistinguishable from Q8\n\nI didn't do a Q8. it could be a tight fit in systems with 64gb of ram and a 24gb vram gpu, but I have that system and it's freezing when I try to load it.\n\nThe q4_m file is older and slower than these new three quants, so I see no reason to use it instad of the mxfp4_moe\n\nEnjoy!\n\n\n---\nlicense: apache-2.0\nlanguage:\n- en\n- zh\nbase_model:\n- huihui-ai/Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated\npipeline_tag: text-generation\ntags:\n- abliterated\n- uncensored\n---",
"related_quantizations": []
},
"tags": [
"gguf",
"en",
"zh",
"base_model:huihui-ai/Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated",
"base_model:quantized:huihui-ai/Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated",
"license:apache-2.0",
"endpoints_compatible",
"region:us",
"conversational"
],
"likes": 3,
"downloads": 665,
"gated": false,
"private": false,
"last_modified": "2026-01-16T23:16:05.000Z",
"created_at": "2025-12-12T22:46:52.000Z",
"pipeline_tag": "",
"library_name": ""
}
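Several entries in the `tags` list above pack a key and a value into one string separated by a colon (for example `license:apache-2.0`). As a minimal sketch, assuming you only want to group such tags by the part before the first colon, the hypothetical `split_tags` helper below does that with the standard library alone; the `plain` bucket name is an arbitrary choice for tags without a colon.

```python
def split_tags(tags):
    """Group 'key:value' tags by key; tags without a colon go under 'plain'."""
    grouped = {}
    for tag in tags:
        # Split on the first colon only, so 'base_model:quantized:...'
        # keeps 'quantized:...' as its value.
        key, sep, value = tag.partition(":")
        if sep:
            grouped.setdefault(key, []).append(value)
        else:
            grouped.setdefault("plain", []).append(tag)
    return grouped


# Tag list copied from the normalized metadata above.
TAGS = [
    "gguf",
    "en",
    "zh",
    "base_model:huihui-ai/Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated",
    "base_model:quantized:huihui-ai/Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us",
    "conversational",
]

grouped = split_tags(TAGS)

if __name__ == "__main__":
    for key, values in grouped.items():
        print(key, values)
```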
Source payload excerpt (from Hugging Face API)
{
"_id": "693c9b5c04c1c3de3f714e52",
"id": "juanml82/Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated-gguf",
"modelId": "juanml82/Huihui-Qwen3-Next-80B-A3B-Thinking-abliterated-gguf",
"sha": "7bed9b6d5b42ea74b47ad87c3eb2356a0b001416",
"createdAt": "2025-12-12T22:46:52.000Z",
"lastModified": "2026-01-16T23:16:05.000Z",
"author": "juanml82",
"downloads": 665,
"likes": 3,
"gated": false,
"private": false,
"pipeline_tag": "",
"library_name": "",
"siblings_count": 6
}