muxodious/gpt-oss-20b-richarderkhov-heresy-gguf overview
Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment. Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs. Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users. Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning. Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs. MXFP4 quantization: The models were post-trained with MXFP4 quantization of the MoE weights, making gpt-oss-120b run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the gpt-oss-20b model run within 16GB of memory. All evals were performed with the same MXFP4 quantization. --- # Inference examples
Repository Files & Downloads
Model Details Live
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"license": "apache-2.0",
"pipeline_tag": "text-generation",
"library_name": "transformers",
"tags": [
"vllm",
"heretic",
"uncensored",
"decensored",
"abliterated"
],
"base_model": [
"MuXodious/gpt-oss-20b-RichardErkhov-heresy"
],
"frontmatter": {
"license": "apache-2.0",
"pipeline_tag": "text-generation",
"library_name": "transformers",
"tags": [
"vllm",
"heretic",
"uncensored",
"decensored",
"abliterated"
],
"base_model": [
"MuXodious/gpt-oss-20b-RichardErkhov-heresy"
]
},
"hero_image_url": "https://img.shields.io/badge/HERESY_INDEX-RICHARDERKHOV-white?style=flat-square&labelColor=101010",
"summary": "* **Permissive Apache 2.0 license:** Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment. * **Configurable reasoning effort:** Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs. * **Full chain-of-thought:** Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users. * **Fine-tunable:** Fully customize models to your specific use case through parameter fine-tuning. * **Agentic capabilities:** Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs. * **MXFP4 quantization:** The models were post-trained with MXFP4 quantization of the MoE weights, making gpt-oss-120b run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the gpt-oss-20b model run within 16GB of memory. All evals were performed with the same MXFP4 quantization. --- # Inference examples",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nlicense: apache-2.0\npipeline_tag: text-generation\nlibrary_name: transformers\ntags:\n- vllm\n- heretic\n- uncensored\n- decensored\n- abliterated\nbase_model:\n- MuXodious/gpt-oss-20b-RichardErkhov-heresy\n---\nStatic GGUF quants of **gpt-oss-20b-RichardErkhov-heresy**.\n\n\n---\nThis is a **gpt-oss-20b** fine-tune, produced through an extremely scuffed modification on P-E-W's [Heretic](https://github.com/p-e-w/heretic) (v1.1.0) abliteration engine with [Magnitude-Preserving Orthogonal Ablation](https://github.com/p-e-w/heretic/pull/52) enabled.\n\nAs of March 16, 2026, the gpt-oss-20b-RichardErkhov-heresy, still tops the UGI leaderboard for models below 24B, with a willingness score of 10. \"And so, without a sword, David defeated and killed Goliath with a sling and a stone!\"\n\n**Note:** Hereby, I present you the final outcome of a vigorous heretication experiment on one of the toughest to crack LLM models. Cognomened after the Lord of Quantisation and, now, a Primarch of the Heretics, [RichardErkhov](https://huggingface.co/RichardErkhov), the model is poised to be the most decensored ablation of all other versions, while suffering the least damage. It is all yours to scrutinise and enjoy. [Jailbreak Chat Template](https://huggingface.co/MuXodious/gpt-oss-20b-RichardErkhov-heresy/blob/main/jailbreak_chat_template.jinja) (courtesy of [MagicalAlchemist](https://huggingface.co/MagicalAlchemist))\n\n---\n<img src=\"https://img.shields.io/badge/HERESY_INDEX-RICHARDERKHOV-white?style=flat-square&labelColor=101010\" align=\"right\" width=\"250\">\n\n\n**Heretication Results**\n\n| Score Metric | Value | Parameter | Value |\n| :--- | :--- | :--- | :--- |\n| **Refusals** | 6/100 | **direction_index** | 12.12 |\n| **KL Divergence** | 0.0802 | **attn.o_proj.max_weight** | 4.01 |\n| **Initial Refusals** | 98/100 | **attn.o_proj.max_weight_position** | 14.79 |\n||| **attn.o_proj.min_weight** | 1.44 |\n||| **attn.o_proj.min_weight_distance** | 8.37 |\n||| **mlp.down_proj.max_weight** | 5.48 |\n||| **mlp.down_proj.max_weight_position** | 18.97 |\n||| **mlp.down_proj.min_weight** | 2.87 |\n||| **mlp.down_proj.min_weight_distance** | 12.52 |\n\n\n---\n## Degree of Heretication\nThe **Heresy Index** weighs the resulting model's corruption by the process (KL Divergence) and its abolition of doctrine (Refusals) for a final verdict in classification.\n\n| Index Entry | Classification | Analysis |\n| :--- | :--- | :--- |\n|  | **Absolute Heresy** | Less than 10/100 Refusals and 0.10 KL Divergence |\n|  | **Tainted Heresy** | Around 25-11/100 Refusals and/or -0.20-0.11 KL Divergence |\n|  | **Impotent Heresy** | Anything above 25/100 Refusals and 0.21 KL Divergence |\n\n**Note**: This is an arbitrary classification inspired by Warhammer 40K, having no tangible indication towards the model's performance.\n\n---\n<p align=\"center\">\n <img alt=\"gpt-oss-20b\" src=\"https://raw.githubusercontent.com/openai/gpt-oss/main/docs/gpt-oss-20b.svg\">\n</p>\n\n<p align=\"center\">\n <a href=\"https://gpt-oss.com\"><strong>Try gpt-oss</strong></a> ·\n <a href=\"https://cookbook.openai.com/topic/gpt-oss\"><strong>Guides</strong></a> ·\n <a href=\"https://arxiv.org/abs/2508.10925\"><strong>Model card</strong></a> ·\n <a href=\"https://openai.com/index/introducing-gpt-oss/\"><strong>OpenAI blog</strong></a>\n</p>\n\n<br>\n\nWelcome to the gpt-oss series, [OpenAI’s open-weight models](https://openai.com/open-models) designed for powerful reasoning, agentic tasks, and versatile developer use cases.\n\nWe’re releasing two flavors of these open models:\n- `gpt-oss-120b` — for production, general purpose, high reasoning use cases that fit into a single 80GB GPU (like NVIDIA H100 or AMD MI300X) (117B parameters with 5.1B active parameters)\n- `gpt-oss-20b` — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters)\n\nBoth models were trained on our [harmony response format](https://github.com/openai/harmony) and should only be used with the harmony format as it will not work correctly otherwise.\n\n\n> [!NOTE]\n> This model card is dedicated to the smaller `gpt-oss-20b` model. Check out [`gpt-oss-120b`](https://huggingface.co/openai/gpt-oss-120b) for the larger model.\n\n# Highlights\n\n* **Permissive Apache 2.0 license:** Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment. \n* **Configurable reasoning effort:** Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs. \n* **Full chain-of-thought:** Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users. \n* **Fine-tunable:** Fully customize models to your specific use case through parameter fine-tuning.\n* **Agentic capabilities:** Use the models’ native capabilities for function calling, [web browsing](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#browser), [Python code execution](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#python), and Structured Outputs.\n* **MXFP4 quantization:** The models were post-trained with MXFP4 quantization of the MoE weights, making `gpt-oss-120b` run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the `gpt-oss-20b` model run within 16GB of memory. All evals were performed with the same MXFP4 quantization.\n\n---\n\n# Inference examples\n\n## Transformers\n\nYou can use `gpt-oss-120b` and `gpt-oss-20b` with Transformers. If you use the Transformers chat template, it will automatically apply the [harmony response format](https://github.com/openai/harmony). If you use `model.generate` directly, you need to apply the harmony format manually using the chat template or use our [openai-harmony](https://github.com/openai/harmony) package.\n\nTo get started, install the necessary dependencies to setup your environment:\n\n```\npip install -U transformers kernels torch \n```\n\nOnce, setup you can proceed to run the model by running the snippet below:\n\n```py\nfrom transformers import pipeline\nimport torch\n\nmodel_id = \"openai/gpt-oss-20b\"\n\npipe = pipeline(\n \"text-generation\",\n model=model_id,\n torch_dtype=\"auto\",\n device_map=\"auto\",\n)\n\nmessages = [\n {\"role\": \"user\", \"content\": \"Explain quantum mechanics clearly and concisely.\"},\n]\n\noutputs = pipe(\n messages,\n max_new_tokens=256,\n)\nprint(outputs[0][\"generated_text\"][-1])\n```\n\nAlternatively, you can run the model via [`Transformers Serve`](https://huggingface.co/docs/transformers/main/serving) to spin up a OpenAI-compatible webserver:\n\n```\ntransformers serve\ntransformers chat localhost:8000 --model-name-or-path openai/gpt-oss-20b\n```\n\n[Learn more about how to use gpt-oss with Transformers.](https://cookbook.openai.com/articles/gpt-oss/run-transformers)\n\n## vLLM\n\nvLLM recommends using [uv](https://docs.astral.sh/uv/) for Python dependency management. You can use vLLM to spin up an OpenAI-compatible webserver. The following command will automatically download the model and start the server.\n\n```bash\nuv pip install --pre vllm==0.10.1+gptoss \\\n --extra-index-url https://wheels.vllm.ai/gpt-oss/ \\\n --extra-index-url https://download.pytorch.org/whl/nightly/cu128 \\\n --index-strategy unsafe-best-match\n\nvllm serve openai/gpt-oss-20b\n```\n\n[Learn more about how to use gpt-oss with vLLM.](https://cookbook.openai.com/articles/gpt-oss/run-vllm)\n\n## PyTorch / Triton\n\nTo learn about how to use this model with PyTorch and Triton, check out our [reference implementations in the gpt-oss repository](https://github.com/openai/gpt-oss?tab=readme-ov-file#reference-pytorch-implementation).\n\n## Ollama\n\nIf you are trying to run gpt-oss on consumer hardware, you can use Ollama by running the following commands after [installing Ollama](https://ollama.com/download).\n\n```bash\n# gpt-oss-20b\nollama pull gpt-oss:20b\nollama run gpt-oss:20b\n```\n\n[Learn more about how to use gpt-oss with Ollama.](https://cookbook.openai.com/articles/gpt-oss/run-locally-ollama)\n\n#### LM Studio\n\nIf you are using [LM Studio](https://lmstudio.ai/) you can use the following commands to download.\n\n```bash\n# gpt-oss-20b\nlms get openai/gpt-oss-20b\n```\n\nCheck out our [awesome list](https://github.com/openai/gpt-oss/blob/main/awesome-gpt-oss.md) for a broader collection of gpt-oss resources and inference partners.\n\n---\n\n# Download the model\n\nYou can download the model weights from the [Hugging Face Hub](https://huggingface.co/collections/openai/gpt-oss-68911959590a1634ba11c7a4) directly from Hugging Face CLI:\n\n```shell\n# gpt-oss-20b\nhuggingface-cli download openai/gpt-oss-20b --include \"original/*\" --local-dir gpt-oss-20b/\npip install gpt-oss\npython -m gpt_oss.chat model/\n```\n\n# Reasoning levels\n\nYou can adjust the reasoning level that suits your task across three levels:\n\n* **Low:** Fast responses for general dialogue. \n* **Medium:** Balanced speed and detail. \n* **High:** Deep and detailed analysis.\n\nThe reasoning level can be set in the system prompts, e.g., \"Reasoning: high\".\n\n# Tool use\n\nThe gpt-oss models are excellent for:\n* Web browsing (using built-in browsing tools)\n* Function calling with defined schemas\n* Agentic operations like browser tasks\n\n# Fine-tuning\n\nBoth gpt-oss models can be fine-tuned for a variety of specialized use cases.\n\nThis smaller model `gpt-oss-20b` can be fine-tuned on consumer hardware, whereas the larger [`gpt-oss-120b`](https://huggingface.co/openai/gpt-oss-120b) can be fine-tuned on a single H100 node.\n\n# Citation\n\n```bibtex\n@misc{openai2025gptoss120bgptoss20bmodel,\n title={gpt-oss-120b & gpt-oss-20b Model Card}, \n author={OpenAI},\n year={2025},\n eprint={2508.10925},\n archivePrefix={arXiv},\n primaryClass={cs.CL},\n url={https://arxiv.org/abs/2508.10925}, \n}\n```",
"related_quantizations": []
},
"tags": [
"transformers",
"gguf",
"vllm",
"heretic",
"uncensored",
"decensored",
"abliterated",
"text-generation",
"arxiv:2508.10925",
"base_model:MuXodious/gpt-oss-20b-RichardErkhov-heresy",
"base_model:quantized:MuXodious/gpt-oss-20b-RichardErkhov-heresy",
"license:apache-2.0",
"endpoints_compatible",
"region:us",
"conversational"
],
"likes": 3,
"downloads": 823,
"gated": false,
"private": false,
"last_modified": "2026-03-16T09:30:31.000Z",
"created_at": "2026-02-07T15:44:00.000Z",
"pipeline_tag": "text-generation",
"library_name": "transformers"
}
Source payload excerpt (from Hugging Face API)
{
"_id": "69875dc024c75e432032c721",
"id": "MuXodious/gpt-oss-20b-RichardErkhov-heresy-GGUF",
"modelId": "MuXodious/gpt-oss-20b-RichardErkhov-heresy-GGUF",
"sha": "3af4ba09663ef2c3ffcabcbf3bfb78f32435f592",
"createdAt": "2026-02-07T15:44:00.000Z",
"lastModified": "2026-03-16T09:30:31.000Z",
"author": "MuXodious",
"downloads": 823,
"likes": 3,
"gated": false,
"private": false,
"pipeline_tag": "text-generation",
"library_name": "transformers",
"siblings_count": 5
}