GraySoft
Projects Models About FAQ Contact Download guIDE →
Model Intelligence Sheet

muxodious/gpt-oss-20b-richarderkhov-heresy-gguf overview

Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment. Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs. Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users. Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning. Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs. MXFP4 quantization: The models were post-trained with MXFP4 quantization of the MoE weights, making gpt-oss-120b run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the gpt-oss-20b model run within 16GB of memory. All evals were performed with the same MXFP4 quantization. --- # Inference examples

transformersggufvllmhereticuncensoreddecensoredabliteratedtext-generationarxiv:2508.10925base_model:MuXodious/gpt-oss-20b-RichardErkhov-heresybase_model:quantized:MuXodious/gpt-oss-20b-RichardErkhov-heresylicense:apache-2.0endpoints_compatibleregion:usconversational
muxodious/gpt-oss-20b-richarderkhov-heresy-gguf visual
Downloads
823
Likes
3
Pipeline
text-generation
Library
transformers
Visibility
Public
Access
Open

Repository Files & Downloads

3 files detected
Direct downloads for all repository files
FileTypeQuantizationSizeLink
gpt-oss-20B-RichardErkhov-heresy-BF16.gguf GGUF BF16 12.85 GB Download
gpt-oss-20B-RichardErkhov-heresy-MXFP4_MoE.gguf GGUF 11.78 GB Download
gpt-oss-20B-RichardErkhov-heresy-iQ4_XS.gguf GGUF IQ4_XS 11.91 GB Download

Model Details Live

Model Slug
muxodious/gpt-oss-20b-richarderkhov-heresy-gguf
Author
MuXodious
Pipeline Task
text-generation
Library
transformers
Created
2026-02-07
Last Modified
2026-03-16
Gated
No
Private
No
HF SHA
3af4ba09663ef2c3ffcabcbf3bfb78f32435f592
License
apache-2.0
Language
Unknown
Base Model
MuXodious/gpt-oss-20b-RichardErkhov-heresy

Metadata Inspector

Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "license": "apache-2.0",
    "pipeline_tag": "text-generation",
    "library_name": "transformers",
    "tags": [
      "vllm",
      "heretic",
      "uncensored",
      "decensored",
      "abliterated"
    ],
    "base_model": [
      "MuXodious/gpt-oss-20b-RichardErkhov-heresy"
    ],
    "frontmatter": {
      "license": "apache-2.0",
      "pipeline_tag": "text-generation",
      "library_name": "transformers",
      "tags": [
        "vllm",
        "heretic",
        "uncensored",
        "decensored",
        "abliterated"
      ],
      "base_model": [
        "MuXodious/gpt-oss-20b-RichardErkhov-heresy"
      ]
    },
    "hero_image_url": "https://img.shields.io/badge/HERESY_INDEX-RICHARDERKHOV-white?style=flat-square&labelColor=101010",
    "summary": "* **Permissive Apache 2.0 license:** Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment. * **Configurable reasoning effort:** Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs. * **Full chain-of-thought:** Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users. * **Fine-tunable:** Fully customize models to your specific use case through parameter fine-tuning. * **Agentic capabilities:** Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs. * **MXFP4 quantization:** The models were post-trained with MXFP4 quantization of the MoE weights, making gpt-oss-120b run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the gpt-oss-20b model run within 16GB of memory. All evals were performed with the same MXFP4 quantization. --- # Inference examples",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlicense: apache-2.0\npipeline_tag: text-generation\nlibrary_name: transformers\ntags:\n- vllm\n- heretic\n- uncensored\n- decensored\n- abliterated\nbase_model:\n- MuXodious/gpt-oss-20b-RichardErkhov-heresy\n---\nStatic GGUF quants of **gpt-oss-20b-RichardErkhov-heresy**.\n\n\n---\nThis is a **gpt-oss-20b** fine-tune, produced through an extremely scuffed modification on P-E-W's [Heretic](https://github.com/p-e-w/heretic) (v1.1.0) abliteration engine with [Magnitude-Preserving Orthogonal Ablation](https://github.com/p-e-w/heretic/pull/52) enabled.\n\nAs of March 16, 2026, the gpt-oss-20b-RichardErkhov-heresy, still tops the UGI leaderboard for models below 24B, with a willingness score of 10. \"And so, without a sword, David defeated and killed Goliath with a sling and a stone!\"\n\n**Note:** Hereby, I present you the final outcome of a vigorous heretication experiment on one of the toughest to crack LLM models. Cognomened after the Lord of Quantisation and, now, a Primarch of the Heretics, [RichardErkhov](https://huggingface.co/RichardErkhov), the model is poised to be the most decensored ablation of all other versions, while suffering the least damage. It is all yours to scrutinise and enjoy. [Jailbreak Chat Template](https://huggingface.co/MuXodious/gpt-oss-20b-RichardErkhov-heresy/blob/main/jailbreak_chat_template.jinja) (courtesy of [MagicalAlchemist](https://huggingface.co/MagicalAlchemist))\n\n---\n<img src=\"https://img.shields.io/badge/HERESY_INDEX-RICHARDERKHOV-white?style=flat-square&labelColor=101010\" align=\"right\" width=\"250\">\n\n\n**Heretication Results**\n\n| Score Metric | Value | Parameter | Value |\n| :--- | :--- | :--- | :--- |\n| **Refusals** | 6/100 | **direction_index** |  12.12 |\n| **KL Divergence** | 0.0802  | **attn.o_proj.max_weight** | 4.01 |\n| **Initial Refusals** | 98/100 | **attn.o_proj.max_weight_position** | 14.79 |\n||| **attn.o_proj.min_weight** | 1.44 |\n||| **attn.o_proj.min_weight_distance** | 8.37 |\n||| **mlp.down_proj.max_weight** | 5.48 |\n||| **mlp.down_proj.max_weight_position** | 18.97 |\n||| **mlp.down_proj.min_weight** | 2.87 |\n||| **mlp.down_proj.min_weight_distance** | 12.52 |\n\n\n---\n## Degree of Heretication\nThe **Heresy Index** weighs the resulting model's corruption by the process (KL Divergence) and its abolition of doctrine (Refusals) for a final verdict in classification.\n\n| Index Entry | Classification | Analysis |\n| :--- | :--- | :--- |\n| ![Absolute](https://img.shields.io/badge/HERESY_INDEX-ABSOLUTE-white?style=flat-square&labelColor=101010) | **Absolute Heresy** | Less than 10/100 Refusals and 0.10 KL Divergence |\n| ![Tainted](https://img.shields.io/badge/HERESY_INDEX-TAINTED-blueviolet?style=flat-square&labelColor=101010) | **Tainted Heresy** | Around 25-11/100 Refusals and/or -0.20-0.11 KL Divergence |\n| ![Impotent](https://img.shields.io/badge/HERESY_INDEX-IMPOTENT-5c4033?style=flat-square&labelColor=101010) | **Impotent Heresy** | Anything above 25/100 Refusals and 0.21 KL Divergence |\n\n**Note**: This is an arbitrary classification inspired by Warhammer 40K, having no tangible indication towards the model's performance.\n\n---\n<p align=\"center\">\n  <img alt=\"gpt-oss-20b\" src=\"https://raw.githubusercontent.com/openai/gpt-oss/main/docs/gpt-oss-20b.svg\">\n</p>\n\n<p align=\"center\">\n  <a href=\"https://gpt-oss.com\"><strong>Try gpt-oss</strong></a> ·\n  <a href=\"https://cookbook.openai.com/topic/gpt-oss\"><strong>Guides</strong></a> ·\n  <a href=\"https://arxiv.org/abs/2508.10925\"><strong>Model card</strong></a> ·\n  <a href=\"https://openai.com/index/introducing-gpt-oss/\"><strong>OpenAI blog</strong></a>\n</p>\n\n<br>\n\nWelcome to the gpt-oss series, [OpenAI’s open-weight models](https://openai.com/open-models) designed for powerful reasoning, agentic tasks, and versatile developer use cases.\n\nWe’re releasing two flavors of these open models:\n- `gpt-oss-120b` — for production, general purpose, high reasoning use cases that fit into a single 80GB GPU (like NVIDIA H100 or AMD MI300X) (117B parameters with 5.1B active parameters)\n- `gpt-oss-20b` — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters)\n\nBoth models were trained on our [harmony response format](https://github.com/openai/harmony) and should only be used with the harmony format as it will not work correctly otherwise.\n\n\n> [!NOTE]\n> This model card is dedicated to the smaller `gpt-oss-20b` model. Check out [`gpt-oss-120b`](https://huggingface.co/openai/gpt-oss-120b) for the larger model.\n\n# Highlights\n\n* **Permissive Apache 2.0 license:** Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.  \n* **Configurable reasoning effort:** Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.  \n* **Full chain-of-thought:** Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users.  \n* **Fine-tunable:** Fully customize models to your specific use case through parameter fine-tuning.\n* **Agentic capabilities:** Use the models’ native capabilities for function calling, [web browsing](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#browser), [Python code execution](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#python), and Structured Outputs.\n* **MXFP4 quantization:** The models were post-trained with MXFP4 quantization of the MoE weights, making `gpt-oss-120b` run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the `gpt-oss-20b` model run within 16GB of memory. All evals were performed with the same MXFP4 quantization.\n\n---\n\n# Inference examples\n\n## Transformers\n\nYou can use `gpt-oss-120b` and `gpt-oss-20b` with Transformers. If you use the Transformers chat template, it will automatically apply the [harmony response format](https://github.com/openai/harmony). If you use `model.generate` directly, you need to apply the harmony format manually using the chat template or use our [openai-harmony](https://github.com/openai/harmony) package.\n\nTo get started, install the necessary dependencies to setup your environment:\n\n```\npip install -U transformers kernels torch \n```\n\nOnce, setup you can proceed to run the model by running the snippet below:\n\n```py\nfrom transformers import pipeline\nimport torch\n\nmodel_id = \"openai/gpt-oss-20b\"\n\npipe = pipeline(\n    \"text-generation\",\n    model=model_id,\n    torch_dtype=\"auto\",\n    device_map=\"auto\",\n)\n\nmessages = [\n    {\"role\": \"user\", \"content\": \"Explain quantum mechanics clearly and concisely.\"},\n]\n\noutputs = pipe(\n    messages,\n    max_new_tokens=256,\n)\nprint(outputs[0][\"generated_text\"][-1])\n```\n\nAlternatively, you can run the model via [`Transformers Serve`](https://huggingface.co/docs/transformers/main/serving) to spin up a OpenAI-compatible webserver:\n\n```\ntransformers serve\ntransformers chat localhost:8000 --model-name-or-path openai/gpt-oss-20b\n```\n\n[Learn more about how to use gpt-oss with Transformers.](https://cookbook.openai.com/articles/gpt-oss/run-transformers)\n\n## vLLM\n\nvLLM recommends using [uv](https://docs.astral.sh/uv/) for Python dependency management. You can use vLLM to spin up an OpenAI-compatible webserver. The following command will automatically download the model and start the server.\n\n```bash\nuv pip install --pre vllm==0.10.1+gptoss \\\n    --extra-index-url https://wheels.vllm.ai/gpt-oss/ \\\n    --extra-index-url https://download.pytorch.org/whl/nightly/cu128 \\\n    --index-strategy unsafe-best-match\n\nvllm serve openai/gpt-oss-20b\n```\n\n[Learn more about how to use gpt-oss with vLLM.](https://cookbook.openai.com/articles/gpt-oss/run-vllm)\n\n## PyTorch / Triton\n\nTo learn about how to use this model with PyTorch and Triton, check out our [reference implementations in the gpt-oss repository](https://github.com/openai/gpt-oss?tab=readme-ov-file#reference-pytorch-implementation).\n\n## Ollama\n\nIf you are trying to run gpt-oss on consumer hardware, you can use Ollama by running the following commands after [installing Ollama](https://ollama.com/download).\n\n```bash\n# gpt-oss-20b\nollama pull gpt-oss:20b\nollama run gpt-oss:20b\n```\n\n[Learn more about how to use gpt-oss with Ollama.](https://cookbook.openai.com/articles/gpt-oss/run-locally-ollama)\n\n#### LM Studio\n\nIf you are using [LM Studio](https://lmstudio.ai/) you can use the following commands to download.\n\n```bash\n# gpt-oss-20b\nlms get openai/gpt-oss-20b\n```\n\nCheck out our [awesome list](https://github.com/openai/gpt-oss/blob/main/awesome-gpt-oss.md) for a broader collection of gpt-oss resources and inference partners.\n\n---\n\n# Download the model\n\nYou can download the model weights from the [Hugging Face Hub](https://huggingface.co/collections/openai/gpt-oss-68911959590a1634ba11c7a4) directly from Hugging Face CLI:\n\n```shell\n# gpt-oss-20b\nhuggingface-cli download openai/gpt-oss-20b --include \"original/*\" --local-dir gpt-oss-20b/\npip install gpt-oss\npython -m gpt_oss.chat model/\n```\n\n# Reasoning levels\n\nYou can adjust the reasoning level that suits your task across three levels:\n\n* **Low:** Fast responses for general dialogue.  \n* **Medium:** Balanced speed and detail.  \n* **High:** Deep and detailed analysis.\n\nThe reasoning level can be set in the system prompts, e.g., \"Reasoning: high\".\n\n# Tool use\n\nThe gpt-oss models are excellent for:\n* Web browsing (using built-in browsing tools)\n* Function calling with defined schemas\n* Agentic operations like browser tasks\n\n# Fine-tuning\n\nBoth gpt-oss models can be fine-tuned for a variety of specialized use cases.\n\nThis smaller model `gpt-oss-20b` can be fine-tuned on consumer hardware, whereas the larger [`gpt-oss-120b`](https://huggingface.co/openai/gpt-oss-120b) can be fine-tuned on a single H100 node.\n\n# Citation\n\n```bibtex\n@misc{openai2025gptoss120bgptoss20bmodel,\n      title={gpt-oss-120b & gpt-oss-20b Model Card}, \n      author={OpenAI},\n      year={2025},\n      eprint={2508.10925},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https://arxiv.org/abs/2508.10925}, \n}\n```",
    "related_quantizations": []
  },
  "tags": [
    "transformers",
    "gguf",
    "vllm",
    "heretic",
    "uncensored",
    "decensored",
    "abliterated",
    "text-generation",
    "arxiv:2508.10925",
    "base_model:MuXodious/gpt-oss-20b-RichardErkhov-heresy",
    "base_model:quantized:MuXodious/gpt-oss-20b-RichardErkhov-heresy",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 3,
  "downloads": 823,
  "gated": false,
  "private": false,
  "last_modified": "2026-03-16T09:30:31.000Z",
  "created_at": "2026-02-07T15:44:00.000Z",
  "pipeline_tag": "text-generation",
  "library_name": "transformers"
}
Source payload excerpt (from Hugging Face API)
{
  "_id": "69875dc024c75e432032c721",
  "id": "MuXodious/gpt-oss-20b-RichardErkhov-heresy-GGUF",
  "modelId": "MuXodious/gpt-oss-20b-RichardErkhov-heresy-GGUF",
  "sha": "3af4ba09663ef2c3ffcabcbf3bfb78f32435f592",
  "createdAt": "2026-02-07T15:44:00.000Z",
  "lastModified": "2026-03-16T09:30:31.000Z",
  "author": "MuXodious",
  "downloads": 823,
  "likes": 3,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "transformers",
  "siblings_count": 5
}