djlougen/ornstein-27b-saber-rys-gguf (Q4_K_M GGUF) is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes. See related models in the same shard below.
djlougen/ornstein-27b-saber-rys-gguf overview
0% refusal. Zero perplexity degradation. Layer-duplicated reasoning boost.

This model combines two complementary, training-free surgical techniques applied to DJLougen/Ornstein-27B:

1. SABER (Spectral Analysis-Based Entanglement Resolution): removes safety refusal behavior while preserving capability
2. RYS (Repeat Your Self): duplicates reasoning-circuit layers to improve reasoning and emotional intelligence

Neither technique involves retraining. RYS duplicates layers without modifying any weights, while SABER applies targeted direction ablation, which the model card's comparison table lists as orthogonal projections on the weights. The model card's results table reports a perplexity delta of +0.6% for the released configuration, which it treats as no meaningful change.
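The model card defines an RYS config (i, j) as: layers 0 through j-1 run normally, layers i through j-1 are then re-executed, and layers j through N-1 finish the pass. A minimal sketch of that execution order follows. The function names `rys_layer_order` and `is_four_aligned` are hypothetical helpers written for illustration, not part of any RYS tooling; the 64-layer base count and the configs shown are taken from the model card.

```python
def rys_layer_order(n_layers: int, i: int, j: int) -> list[int]:
    """Return the order in which layers execute under RYS config (i, j)."""
    if not 0 <= i < j <= n_layers:
        raise ValueError("expected 0 <= i < j <= n_layers")
    # Layers 0..j-1 run once, layers i..j-1 run a second time,
    # then the remaining layers j..n_layers-1 run as usual.
    return list(range(j)) + list(range(i, j)) + list(range(j, n_layers))


def is_four_aligned(i: int, j: int) -> bool:
    """True when the duplicated block length is a multiple of 4 layers."""
    return (j - i) % 4 == 0


if __name__ == "__main__":
    N_LAYERS = 64  # base layer count quoted in the model card
    configs = {"S": (28, 32), "M": (31, 35), "L": (30, 34), "XL": (26, 34)}
    for name, (i, j) in configs.items():
        order = rys_layer_order(N_LAYERS, i, j)
        extra = len(order) - N_LAYERS
        overhead = 100 * extra / N_LAYERS
        print(name, len(order), f"+{overhead:.2f}%", is_four_aligned(i, j))
```

The alignment check mirrors the model card's note that duplicated blocks must be a multiple of 4 layers for llama.cpp to load the hybrid architecture; it checks block length only, which is how the card states the constraint.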
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"license": "other",
"license_name": "qwen3.5",
"license_link": "https://huggingface.co/Qwen/Qwen3.5-32B/blob/main/LICENSE",
"base_model": "DJLougen/Ornstein-27B",
"tags": [
"refusal-ablation",
"capability-preserving",
"saber",
"rys",
"layer-surgery",
"qwen3.5",
"multimodal",
"27b"
],
"pipeline_tag": "text-generation",
"frontmatter": {
"license": "other",
"license_name": "qwen3.5",
"license_link": "https://huggingface.co/Qwen/Qwen3.5-32B/blob/main/LICENSE",
"base_model": "DJLougen/Ornstein-27B",
"tags": [
"refusal-ablation",
"capability-preserving",
"saber",
"rys",
"layer-surgery",
"qwen3.5",
"multimodal",
"27b"
],
"pipeline_tag": "text-generation"
},
"hero_image_url": "Ornstein27BSABER.jpeg",
"summary": "> **0% refusal. Zero perplexity degradation. Layer-duplicated reasoning boost.** This model combines two complementary, training-free surgical techniques applied to DJLougen/Ornstein-27B: 1. **SABER** (Spectral Analysis-Based Entanglement Resolution) — removes safety refusal behavior while preserving capability 2. **RYS** (Repeat Your Self) — duplicates reasoning-circuit layers to improve reasoning and emotional intelligence Both techniques modify model structure without changing any weights — SABER through targeted direction ablation, RYS through layer duplication. ---",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nlicense: other\nlicense_name: qwen3.5\nlicense_link: https://huggingface.co/Qwen/Qwen3.5-32B/blob/main/LICENSE\nbase_model: DJLougen/Ornstein-27B\ntags:\n- refusal-ablation\n- capability-preserving\n- saber\n- rys\n- layer-surgery\n- qwen3.5\n- multimodal\n- 27b\npipeline_tag: text-generation\n---\n\n<img src=\"Ornstein27BSABER.jpeg\" alt=\"Ornstein-27B SABER\" width=\"100%\"/>\n\n# DJLougen/Ornstein-27B-SABER-RYS\n\n> **0% refusal. Zero perplexity degradation. Layer-duplicated reasoning boost.**\n\nThis model combines two complementary, training-free surgical techniques applied to [DJLougen/Ornstein-27B](https://huggingface.co/DJLougen/Ornstein-27B):\n\n1. **SABER** (Spectral Analysis-Based Entanglement Resolution) — removes safety refusal behavior while preserving capability\n2. **RYS** (Repeat Your Self) — duplicates reasoning-circuit layers to improve reasoning and emotional intelligence\n\nBoth techniques modify model structure without changing any weights — SABER through targeted direction ablation, RYS through layer duplication.\n\n---\n\n## SABER: Refusal Ablation\n\n### Key Results\n\n| Metric | Baseline | SABER-Refined | Delta |\n|--------|----------|---------------|-------|\n| Refusal Rate | 100% | **0%** | -100% |\n| Perplexity | 3.5 | **3.5** | +0.6% |\n| Directions Ablated | — | 125 (across 25 layers) | — |\n\nThe refusal circuit is cleanly separated from capability — removing it produces **zero measurable perplexity degradation**.\n\n### How SABER Works\n\n<img src=\"saber_pipeline.png\" alt=\"SABER Pipeline\" width=\"100%\"/>\n\nSABER identifies and ablates the refusal circuit through a five-stage pipeline:\n\n**Stage 1 — Probing**: Extract activation profiles from both harmful and harmless inputs across all transformer layers.\n\n**Stage 2 — Spectral Analysis**: Decompose activation differences into individual refusal directions, each scored by how strongly they separate harmful from harmless representations.\n\n**Stage 3 — 
Entanglement Quantification**: Measure the overlap between each refusal direction and the model's capability subspace (reasoning, knowledge, code, etc.) to avoid collateral damage.\n\n**Stage 4 — Targeted Ablation**: Remove only the pure-refusal components, with strength proportional to their purity (how little they overlap with capability).\n\n**Stage 5 — Iterative Refinement**: Re-probe after each ablation pass to catch hydra effects (dormant refusal features that activate when primary ones are removed).\n\n**Key differentiator from prior work**: SABER explicitly measures and respects the *entanglement* between refusal and capability representations. Directions that are heavily entangled with capability are either skipped or ablated at reduced strength.\n\n<img src=\"entanglement_scatter.png\" alt=\"Direction Purity vs Separability\" width=\"100%\"/>\n\n### Sweep Results\n\n<img src=\"sweep_comparison.png\" alt=\"SABER Sweep Comparison\" width=\"100%\"/>\n\nConfiguration search over `global_top_k` (number of top directions selected globally) and `alpha_base` (base ablation strength):\n\n| Top-K | Alpha | Refusal | PPL | PPL Delta | Layers | Dirs Ablated |\n|:-----:|:-----:|:-------:|:---:|:---------:|:------:|:------------:|\n| 25 | 0.85 | 5% | 3.5 | +0.4% | 25 | 125 |\n| **25** | **1.00** | **0%** | **3.5** | **+0.6%** | **25** | **125** |\n| 50 | 0.85 | 0% | 3.5 | +0.8% | 36 | 250 |\n| 50 | 1.00 | 0% | 3.5 | +0.7% | 36 | 250 |\n| 75 | 0.85 | 0% | 3.5 | +0.9% | 37 | 375 |\n| 75 | 1.00 | 0% | 3.5 | +0.9% | 37 | 375 |\n\n**Best config: `top_k=25, alpha=1.0`** — achieves 0% refusal with zero meaningful PPL change, using the minimum number of directions.\n\n<img src=\"refusal_comparison.png\" alt=\"Refusal Rate Comparison\" width=\"100%\"/>\n\n### Ablation Convergence (Best Config)\n\n<img src=\"ablation_convergence.png\" alt=\"Ablation Convergence\" width=\"100%\"/>\n\nCapability degradation remains at **0.00%** across all 5 iterations — the refusal directions are 
surgically removed with zero collateral damage.\n\n---\n\n## RYS: Reasoning Layer Duplication\n\n### Method\n\n**RYS (Repeat Your Self)** is a layer-duplication technique discovered by [David Noel Ng](https://dnhkng.github.io/posts/rys/) that duplicates contiguous blocks of middle transformer layers so they execute twice per forward pass. **No weights are modified** — the model simply traverses some layers a second time, giving it \"another pass\" through its core reasoning circuit.\n\nFor a model with N layers, a configuration **(i, j)** produces:\n- Layers 0 through j−1 run normally\n- Then layers i through j−1 are **re-executed** (looped back)\n- Remaining layers j through N−1 run normally\n- Layers i through j−1 execute **twice** per inference pass\n\nThis exploits the **functional neuroanatomy** of transformers:\n- **Early layers (0–5)**: Input encoding — duplication hurts\n- **Middle layers (~10–50)**: Reasoning circuits in format-agnostic space — **duplication helps**\n- **Late layers (~55–64)**: Output decoding — duplication degrades\n\n### Pareto-Optimal Configs for Qwen3.5-27B\n\nBased on the [full sweep of Qwen3.5-27B](https://dnhkng.github.io/posts/rys-ii/) — 4,643 measured configurations, XGBoost surrogate over 430K+ candidates, and final validation on Math120 + EQ140 — the Pareto frontier lies in layers 26–34 of the reasoning circuit.\n\n**Important for GGUF/llama.cpp**: Qwen3.5-27B is a hybrid Mamba/SSM + Attention architecture with a strict 4-layer repeating pattern (3 SSM + 1 ATTN). Layer duplication blocks **must be a multiple of 4 layers** to preserve this pattern, otherwise llama.cpp fails to load the model. 
The original Pareto configs from the blog (which used ExLlamaV3) have been adapted to the nearest valid 4-aligned configs:\n\n| Variant | Config | Duplicated Layers | Extra Layers | Overhead | Nearest Pareto Config |\n|:-------:|:------:|:-----------------:|:------------:|:--------:|:---------------------:|\n| **S** | (28,32) | 28–31 | +4 | +6.25% | ≈ (30,34) |\n| **M** | (31,35) | 31–34 | +4 | +6.25% | ≈ (31,34) |\n| **L** | (30,34) | 30–33 | +4 | +6.25% | ≈ (30,35) |\n| **XL** | (26,34) | 26–33 | +8 | +12.50% | = (26,34) ✓ |\n\n**Critical finding**: the (26,34) XL config is the only original Pareto point that is natively 4-aligned. The S/M/L variants use the nearest valid 4-layer blocks that cover the same reasoning region. The EQ delta barely moves across all sizes (+0.095 to +0.101), so even the smallest valid config delivers most of the benefit.\n\n### Reference: RYS Scores on Qwen3.5-27B\n\nProbe scores from [XpressAI/Qwen3.5-27B-RYS-UD-Q4_K_XL-GGUF](https://huggingface.co/XpressAI/Qwen3.5-27B-RYS-UD-Q4_K_XL-GGUF) (RYS-30-34 config, identical base architecture):\n\n| Probe | Base (64 layers) | RYS 30-33 (68 layers) | RYS 34-37 (68 layers) |\n|:-----:|:---------------:|:---------------------:|:---------------------:|\n| Math | 0.375 | **0.438** | 0.375 |\n| EQ | 11.5 | 29.5 | **39.4** |\n| Reasoning | 0.000 | **0.353** | 0.000 |\n| Logic | 0.00 | **1.00** | 0.00 |\n\n### Reference: BFCLv4 Function Calling (RYS vs Baseline vs Frontier Models)\n\nFrom the XpressAI RYS-30-34 evaluation on [BFCLv4](https://gorilla.cs.berkeley.edu/leaderboard.html):\n\n| Task | RYS-30-34 | Qwen3.5-27B Base | Δ |\n|:----:|:---------:|:----------------:|:-:|\n| parallel | **95.00%** | 93.00% | +2.00% |\n| parallel_multiple | **91.50%** | 76.00% | **+15.50%** |\n| simple_javascript | **72.00%** | 66.00% | +6.00% |\n| live_relevance | **81.25%** | 68.75% | **+12.50%** |\n| multi_turn_base | **74.50%** | 70.50% | +4.00% |\n| multi_turn_long_context | **67.50%** | 59.00% | +8.50% |\n\n7 
of 13 benchmarks improved, with large gains on parallel function calling and live relevance.\n\n---\n\n## Available Variants\n\n| File | RYS Config | Layers | Size |\n|:----:|:----------:|:------:|:----:|\n| `Ornstein-27B-SABER-Q4_K_M.gguf` | — (SABER only) | 64 | 16.5 GB |\n| `Ornstein-27B-SABER-RYS-S-Q4_K_M.gguf` | (28,32) | 68 | ~17.5 GB |\n| `Ornstein-27B-SABER-RYS-M-Q4_K_M.gguf` | (31,35) | 68 | ~17.5 GB |\n| `Ornstein-27B-SABER-RYS-L-Q4_K_M.gguf` | (30,34) | 68 | ~17.5 GB |\n| `Ornstein-27B-SABER-RYS-XL-Q4_K_M.gguf` | (26,34) | 72 | ~18.6 GB |\n\n### Usage\n\n```bash\n# With llama.cpp (recommended: RYS-L for best balance)\n./llama-server -m Ornstein-27B-SABER-RYS-L-Q4_K_M.gguf \\\n --host 0.0.0.0 --port 8080 --n-gpu-layers 99 \\\n --ctx-size 131072 --flash-attn on --jinja \\\n -ctk q4_0 -ctv q4_0\n```\n\n**Recommended**: Start with **RYS-L** (layers 30-34 duplicated) for the best balance of reasoning improvement and overhead. Use **RYS-S** if you're VRAM-constrained.\n\n---\n\n## Complementary Design\n\nSABER and RYS target fundamentally different aspects of the model:\n\n| | SABER | RYS |\n|:-:|:-----:|:---:|\n| **Target** | Refusal circuit | Reasoning circuit |\n| **Mechanism** | Direction ablation | Layer duplication |\n| **Modifies weights** | Yes (orthogonal projections) | No (virtual copies) |\n| **VRAM cost** | Negligible | Extra KV cache + compute |\n| **Effect** | Removes refusals | Improves reasoning/EQ |\n| **Risk** | Capability entanglement | Junction discontinuity |\n\nBoth are applied to the same base architecture (Qwen3.5-27B) and are architecturally compatible — SABER cleans the refusal subspace, RYS amplifies the reasoning subspace.\n\n---\n\n## Capability Evaluation\n\nPerplexity was evaluated on a diverse 100-prompt battery spanning five categories:\n\n- **Arithmetic** (20): multi-step calculation, algebra, word problems\n- **Logic** (20): syllogisms, conditional reasoning, puzzle solving\n- **Code** (20): function implementation, 
debugging, execution tracing\n- **Instruction Following** (20): constrained formatting, multi-step instructions\n- **Factual Recall** (20): geography, history, science, general knowledge\n\nThis diverse evaluation ensures the entanglement analysis captures capability across **all** reasoning modalities, not just a narrow slice.\n\n---\n\n## Intended Use\n\nThis model is released for research purposes. It demonstrates that safety refusal can be surgically removed from a 27B multimodal model without degrading its capabilities, and that reasoning can be further enhanced through layer duplication — a finding with implications for both AI safety research and alignment.\n\n## Warning\n\n⚠️ This model will comply with any request, including harmful ones. It is intended solely for research into alignment, safety, and model behavior.\n\n---\n\n## Acknowledgments\n\nThe **RYS (Repeat Your Self)** layer-duplication method was discovered and developed by **David Noel Ng** ([@dnhkng](https://github.com/dnhkng)). The Pareto-optimal configurations for Qwen3.5-27B, the Math/EQ probes, the XGBoost surrogate pipeline, and the beam search methodology are all from his work. The GGUF surgery tools used to create these models are from [alainnothere/llm-circuit-finder](https://github.com/alainnothere/llm-circuit-finder), an open-source (MIT) implementation of the RYS technique for llama.cpp.\n\n**If you use these models, please cite David Noel Ng's work:**\n\n> Ng, David Noel. \"LLM Neuroanatomy: How I Topped the Leaderboard Without Changing a Single Weight.\" [dnhkng.github.io/posts/rys](https://dnhkng.github.io/posts/rys/)\n>\n> Ng, David Noel. 
\"LLM Neuroanatomy II: Modern LLM Hacking and Hints of a Universal Language.\" [dnhkng.github.io/posts/rys-ii](https://dnhkng.github.io/posts/rys-ii/)\n\nThe **SABER** refusal-ablation method is original to this model.\n\n## References\n\n- [LLM Neuroanatomy — Part I](https://dnhkng.github.io/posts/rys/) — David Noel Ng\n- [LLM Neuroanatomy — Part II](https://dnhkng.github.io/posts/rys-ii/) — David Noel Ng (Qwen3.5-27B sweep)\n- [alainnothere/llm-circuit-finder](https://github.com/alainnothere/llm-circuit-finder) — GGUF surgery tools (MIT)\n- [XpressAI/Qwen3.5-27B-RYS-UD-Q4_K_XL-GGUF](https://huggingface.co/XpressAI/Qwen3.5-27B-RYS-UD-Q4_K_XL-GGUF) — Reference RYS model with BFCLv4 benchmarks",
"related_quantizations": []
},
"tags": [
"gguf",
"refusal-ablation",
"capability-preserving",
"saber",
"rys",
"layer-surgery",
"qwen3.5",
"multimodal",
"27b",
"text-generation",
"base_model:DJLougen/Ornstein-27B",
"base_model:quantized:DJLougen/Ornstein-27B",
"license:other",
"endpoints_compatible",
"region:us",
"conversational"
],
"likes": 9,
"downloads": 469,
"gated": false,
"private": false,
"last_modified": "2026-04-15T20:38:34.000Z",
"created_at": "2026-04-15T20:01:37.000Z",
"pipeline_tag": "text-generation",
"library_name": ""
}
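The README stored in `readme_markdown` above describes SABER's targeted ablation as orthogonal projection with a base strength `alpha_base`. The model card does not publish SABER's code, so the following is only a generic sketch of projection-based direction ablation on a single vector, not SABER's implementation. The function name `ablate_direction` and the way `alpha` scales the removed component are assumptions made for illustration.

```python
import math


def ablate_direction(vec, direction, alpha=1.0):
    """Remove a fraction `alpha` of `vec`'s component along `direction`.

    With alpha=1.0 the result is orthogonal to `direction`;
    smaller values leave part of the component in place.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    # Scalar projection of vec onto the unit direction.
    coeff = sum(v * u for v, u in zip(vec, unit))
    return [v - alpha * coeff * u for v, u in zip(vec, unit)]


if __name__ == "__main__":
    hidden = [3.0, 4.0]
    refusal_dir = [1.0, 0.0]  # hypothetical direction, for illustration only
    print(ablate_direction(hidden, refusal_dir, alpha=1.0))
    print(ablate_direction(hidden, refusal_dir, alpha=0.85))
```

Under this reading, the sweep's `alpha=1.00` rows would correspond to full removal of each selected direction and the `alpha=0.85` rows to partial removal, though how SABER combines `alpha_base` with per-direction purity is not specified in the card.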
Source payload excerpt (from Hugging Face API)
{
"_id": "69dfeea1056b0fb56e356208",
"id": "DJLougen/Ornstein-27B-SABER-RYS-GGUF",
"modelId": "DJLougen/Ornstein-27B-SABER-RYS-GGUF",
"sha": "4c57d17d21b29f1d1062a426a6514c29bc1f854d",
"createdAt": "2026-04-15T20:01:37.000Z",
"lastModified": "2026-04-15T20:38:34.000Z",
"author": "DJLougen",
"downloads": 469,
"likes": 9,
"gated": false,
"private": false,
"pipeline_tag": "text-generation",
"library_name": "",
"siblings_count": 12
}