Model Intelligence Sheet
xpressai/qwen3.5-27b-rys-ud-q4_k_xl-gguf overview
Two modified versions of Qwen3.5-27B produced by RYS layer duplication — no training, no weight changes, just routing hidden states through a specific circuit twice. Based on David Ng's RYS method. ---
Downloads
5,337
Likes
2
Pipeline
—
Library
—
Visibility
Public
Access
Open
Repository Files & Downloads
Model Details Live
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"license": "apache-2.0",
"base_model": "Qwen/Qwen3.5-27B",
"tags": [
"gguf",
"qwen3.5",
"rys",
"layer-surgery",
"reasoning",
"mamba",
"hybrid"
],
"language": [
"en"
],
"frontmatter": {
"license": "apache-2.0",
"base_model": "Qwen/Qwen3.5-27B",
"tags": [
"gguf",
"qwen3.5",
"rys",
"layer-surgery",
"reasoning",
"mamba",
"hybrid"
],
"language": [
"en"
]
},
"hero_image_url": "",
"summary": "Two modified versions of Qwen3.5-27B produced by **RYS layer duplication** — no training, no weight changes, just routing hidden states through a specific circuit twice. Based on David Ng's RYS method. ---",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nlicense: apache-2.0\nbase_model: Qwen/Qwen3.5-27B\ntags:\n - gguf\n - qwen3.5\n - rys\n - layer-surgery\n - reasoning\n - mamba\n - hybrid\nlanguage:\n - en\n---\n\n# Qwen3.5-27B — RYS Layer Surgery (GGUF)\n\nTwo modified versions of [Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B) produced by\n**RYS layer duplication** — no training, no weight changes, just routing hidden states through a specific circuit twice.\n\nBased on [David Ng's RYS method](https://dnhkng.github.io/posts/rys/).\n\n---\n\n## Files\n\n| File | Layers | Size |\n|------|--------|------|\n| `Qwen3.5-27B-UD-Q4_K_XL.gguf` | 64 | 17 GiB |\n| `Qwen3.5-27B-rys_30-33-UD-Q4_K_XL.gguf` | 68 | 21 GiB |\n| `Qwen3.5-27B-rys_34-37_eq-UD-Q4_K_XL.gguf` | 68 | 21 GiB |\n\n---\n\n## Probe scores\n\nScores from an internal sweep benchmark run during circuit search. Sample sizes are small — treat these as directional indicators, not definitive benchmarks.\n\n| Model | Math | EQ | Reasoning | Logic |\n|-------|------|----|-----------|-------|\n| Base (64 layers) | 0.375 | 11.5 | 0.000 | 0.00 |\n| rys_30-33 (68 layers) | 0.438 | 29.5 | **0.353** | **1.00** |\n| rys_34-37 (68 layers) | 0.375 | **39.4** | 0.000 | 0.00 |\n\n- **Math**: Ng's partial-credit scoring on a small GSM8K sample\n- **EQ**: EQ-Bench-style emotional intelligence score (0–100)\n- **Reasoning**: fraction correct across causal, date, logic, navigation, and GSM8K probes\n- **Logic**: fraction correct on logical deduction probes only\n\nrys_30-33 shows the best combined improvement across reasoning categories. rys_34-37 shows the highest EQ score but no reasoning improvement over baseline.\n\n---\n\n# Benchmarks (based on BFCLv4)\n\n## Non-Live Tests\n\n| Task | Qwen3.5-27B-RYS-30-34 (Δ vs Best) | Qwen3.5-27B-FC (Baseline) | Claude Opus 4.5 (FC) | Claude Sonnet 4.5 (FC) | GLM 4.6 (FC) | Grok-4 (FC) | GPT-5.2 (FC) |\n|------|-----------------------------------|--------------------------|---------------------|-----------------------|--------------|-------------|--------------|\n| irrelevance | 86.67% (-1.25%) | 87.50% | 85.83% | **87.92%** | 85.42% | 77.50% | 80.00% |\n| multiple | **96.50%** | 96.50% | 95.50% | 95.50% | 95.00% | 92.50% | 88.00% |\n| parallel | **95.00%** | 93.00% | 93.50% | 94.50% | 91.50% | 88.50% | 89.00% |\n| parallel_multiple | 91.50% (-0.50%) | 76.00% | 88.50% | **92.00%** | 89.50% | 87.00% | 77.50% |\n| simple_java | 62.00% (-3.00%) | **65.00%** | 60.00% | 62.00% | 64.00% | 62.00% | 62.00% |\n| simple_javascript | 72.00% (-2.00%) | 66.00% | **74.00%** | 58.00% | 64.00% | 66.00% | 64.00% |\n| simple_python | 95.25% (-2.50%) | 95.00% | 96.50% | **97.75%** | 94.75% | 92.50% | 92.75% |\n\n## Live Tests\n\n| Task | Qwen3.5-27B-RYS-30-34 (Δ vs Best) | Qwen3.5-27B-FC (Baseline) | Claude Opus 4.5 (FC) | Claude Sonnet 4.5 (FC) | GLM 4.6 (FC) | Grok-4 (FC) | GPT-5.2 (FC) |\n|------|-----------------------------------|--------------------------|---------------------|-----------------------|--------------|-------------|--------------|\n| live_irrelevance | 82.24% (-3.05%) | 80.88% | 83.60% | **85.29%** | 84.50% | 73.30% | 78.85% |\n| live_multiple | 79.68% (-1.14%) | **80.82%** | 78.16% | 78.92% | 78.92% | 73.88% | 70.37% |\n| live_parallel | 81.25% (-6.25%) | **87.50%** | **87.50%** | **87.50%** | 81.25% | 75.00% | 68.75% |\n| live_parallel_multiple | 75.00% (-8.33%) | 79.17% | 75.00% | **83.33%** | 75.00% | 79.17% | 58.33% |\n| live_relevance | 81.25% (-6.25%) | 68.75% | 62.50% | 68.75% | 75.00% | **87.50%** | 75.00% |\n| live_simple | 84.50% (-5.03%) | 87.60% | 86.43% | **89.53%** | **89.53%** | 82.17% | 71.71% |\n\n## Multi-Turn Tests\n\n| Task | Qwen3.5-27B-RYS-30-34 (Δ vs Best) | Qwen3.5-27B-FC (Baseline) | Claude Opus 4.5 (FC) | Claude Sonnet 4.5 (FC) | GLM 4.6 (FC) | Grok-4 (FC) | GPT-5.2 (FC) |\n|------|-----------------------------------|--------------------------|---------------------|-----------------------|--------------|-------------|--------------|\n| multi_turn_base | 74.50% (-6.50%) | 70.50% | **81.00%** | 69.00% | 74.50% | 44.00% | 36.50% |\n| multi_turn_long_context | 67.50% (-3.00%) | 59.00% | **70.50%** | 59.00% | 66.50% | 44.00% | 30.50% |\n\n## Memory Tests (Agentic)\n\n| Task | Qwen3.5-27B-RYS-30-34 (Δ vs Best) | Qwen3.5-27B-FC (Baseline) | Claude Opus 4.5 (FC) | Claude Sonnet 4.5 (FC) | GLM 4.6 (FC) | Grok-4 (FC) | GPT-5.2 (FC) |\n|------|-----------------------------------|--------------------------|---------------------|-----------------------|--------------|-------------|--------------|\n| memory_kv | 45.81% (-25.16%) | N/A | **70.97%** | 54.19% | 43.87% | 57.42% | 33.55% |\n| memory_rec_sum | 70.97% (-12.26%) | N/A | 77.42% | **83.23%** | 67.10% | 51.61% | 60.65% |\n| memory_vector | 63.23% (-9.67%) | N/A | **72.90%** | 57.42% | 56.13% | 58.71% | 43.23% |\n\n## RYS vs Baseline Comparison (All Tests)\n\n| Task | RYS | Baseline | Δ (RYS - Baseline) |\n|------|-----|----------|-------------------|\n| irrelevance | 86.67% | 87.50% | **-0.83%** |\n| multiple | 96.50% | 96.50% | **0.00%** |\n| parallel | 95.00% | 93.00% | **+2.00%** ✅ |\n| parallel_multiple | 91.50% | 76.00% | **+15.50%** ✅ |\n| simple_java | 62.00% | 65.00% | **-3.00%** |\n| simple_javascript | 72.00% | 66.00% | **+6.00%** ✅ |\n| simple_python | 95.25% | 95.00% | **+0.25%** |\n| live_irrelevance | 82.24% | 80.88% | **+1.36%** ✅ |\n| live_multiple | 79.68% | 80.82% | **-1.14%** |\n| live_parallel | 81.25% | 87.50% | **-6.25%** |\n| live_parallel_multiple | 75.00% | 79.17% | **-4.17%** |\n| live_relevance | 81.25% | 68.75% | **+12.50%** ✅ |\n| live_simple | 84.50% | 87.60% | **-3.10%** |\n| multi_turn_base | 74.50% | 70.50% | **+4.00%** ✅ |\n| multi_turn_long_context | 67.50% | 59.00% | **+8.50%** ✅ |\n| memory_kv | 45.81% | N/A | ✅ |\n| memory_rec_sum | 70.97% | N/A | ✅ |\n| memory_vector | 63.23% | N/A | ✅ |\n\n\n---\n\n## What is RYS?\n\nTransformers self-organise during training into functional **circuits** — contiguous blocks of layers that act together. The RYS technique duplicates a specific block in the forward pass using the same weights, with no extra copies on disk beyond the GGUF file overhead:\n\n```\nNormal: 0 → 1 → … → 29 → 30 → 31 → 32 → 33 → 34 → … → 63\nrys_30-33: 0 → 1 → … → 29 → 30 → 31 → 32 → 33 → 30 → 31 → 32 → 33 → 34 → … → 63\n```\n\nThe model processes the same circuit twice, without any weight changes or fine-tuning.\n\n---\n\n## Hybrid Mamba/attention architecture constraint\n\nQwen3.5-27B is a **hybrid SSM/attention model** (`full_attention_interval = 4`): full attention every 4th layer, Mamba SSM everywhere else.\n\nThis creates a hard constraint on layer surgery: **the total layer count must remain divisible by 4**.\n\n- Block size 4 → 64 + 4 = 68 layers (68 ÷ 4 = 17 ✓)\n- Block size 3 → 64 + 3 = 67 layers (67 ÷ 4 = 16.75 ✗ → server crash at load)\n- Block size 8 → 64 + 8 = 72 layers (72 ÷ 4 = 18 ✓)\n\nOnly multiples of 4 work as block sizes for this model family.\n\n---\n\n## How the circuit was found\n\nA two-pass sweep over the 64-layer model using a probe benchmark:\n\n**Pass 1** — 8-layer blocks, stride 4, layers 4–60:\n- Identified hot zones at layers 8–16 (reasoning) and 28–40 (EQ/math)\n\n**Pass 2** — 4-layer blocks, stride 1, within each hot zone:\n- `(30, 34)` achieved the best combined score: reasoning=0.353, EQ=29.5, logic=1.0\n- `(34, 38)` achieved the highest EQ score: EQ=39.4\n\nEach configuration was tested by patching the GGUF layer path, loading with llama-server, and scoring with the probe suite.\n\n---\n\n## Usage\n\n### llama.cpp / llama-server\n\n```bash\nllama-server -m Qwen3.5-27B-rys_30-33.gguf -ngl 99 --port 8080\n```\n\n### Thinking mode\n\nQwen3.5 defaults to thinking mode (`<think>…</think>`). Add `/no_think` to the system prompt for fast, direct answers:\n\n```python\nmessages = [\n {\"role\": \"system\", \"content\": \"/no_think\"},\n {\"role\": \"user\", \"content\": \"Your question here\"}\n]\n```\n\n### VRAM requirements\n\nThe model weights alone are ~21 GiB (Q4_K_XL quantization, 68 layers). A single A100 80GB or H100 runs this comfortably. Consumer GPU setups depend on your llama.cpp version's tensor split support.\n\n---\n\n## Credits\n\n- [David Ng](https://dnhkng.github.io/posts/rys/) for the original RYS method\n- [Unsloth](https://huggingface.co/unsloth) for the base `Q4_K_XL` GGUF quantization\n- [Qwen team](https://huggingface.co/Qwen) for Qwen3.5-27B\n- [llama.cpp](https://github.com/ggml-org/llama.cpp) for local inference\n\n## License\n\nApache 2.0 (inherited from base model)\n",
"related_quantizations": []
},
"tags": [
"gguf",
"qwen3.5",
"rys",
"layer-surgery",
"reasoning",
"mamba",
"hybrid",
"en",
"base_model:Qwen/Qwen3.5-27B",
"base_model:quantized:Qwen/Qwen3.5-27B",
"license:apache-2.0",
"endpoints_compatible",
"region:us",
"imatrix",
"conversational"
],
"likes": 2,
"downloads": 5337,
"gated": false,
"private": false,
"last_modified": "2026-04-16T00:37:24.000Z",
"created_at": "2026-03-26T11:55:20.000Z",
"pipeline_tag": "",
"library_name": ""
}
Source payload excerpt (from Hugging Face API)
{
"_id": "69c51ea85ee2ee57e0359762",
"id": "XpressAI/Qwen3.5-27B-RYS-UD-Q4_K_XL-GGUF",
"modelId": "XpressAI/Qwen3.5-27B-RYS-UD-Q4_K_XL-GGUF",
"sha": "d7e7c78e0a88277f4aa5b77341fa6b29284f12b8",
"createdAt": "2026-03-26T11:55:20.000Z",
"lastModified": "2026-04-16T00:37:24.000Z",
"author": "XpressAI",
"downloads": 5337,
"likes": 2,
"gated": false,
"private": false,
"pipeline_tag": "",
"library_name": "",
"siblings_count": 4
}