GraySoft
Model Intelligence Sheet

openmose/qwen3.5-reap-212b-a17b-gguf overview

Comprehensive model page for openmose/qwen3.5-reap-212b-a17b-gguf

gguf, base_model:OpenMOSE/Qwen3.5-REAP-212B-A17B, base_model:quantized:OpenMOSE/Qwen3.5-REAP-212B-A17B, license:apache-2.0, endpoints_compatible, region:us, imatrix, conversational
Downloads: 916
Likes: 6
Pipeline: (not set)
Library: (not set)
Visibility: Public
Access: Open

Repository Files & Downloads

24 files detected
Direct downloads for all repository files
File Type Quantization Size Link
Qwen3.5-REAP-212B-A17B-IQ2_M-00001-of-00003.gguf GGUF IQ2_M 29.79 GB Download
Qwen3.5-REAP-212B-A17B-IQ2_M-00002-of-00003.gguf GGUF IQ2_M 29.61 GB Download
Qwen3.5-REAP-212B-A17B-IQ2_M-00003-of-00003.gguf GGUF IQ2_M 4.86 GB Download
Qwen3.5-REAP-212B-A17B-IQ3_M-00001-of-00003.gguf GGUF IQ3_M 29.37 GB Download
Qwen3.5-REAP-212B-A17B-IQ3_M-00002-of-00003.gguf GGUF IQ3_M 29.50 GB Download
Qwen3.5-REAP-212B-A17B-IQ3_M-00003-of-00003.gguf GGUF IQ3_M 27.62 GB Download
Qwen3.5-REAP-212B-A17B-IQ3_XS-00001-of-00003.gguf GGUF IQ3_XS 29.68 GB Download
Qwen3.5-REAP-212B-A17B-IQ3_XS-00002-of-00003.gguf GGUF IQ3_XS 29.47 GB Download
Qwen3.5-REAP-212B-A17B-IQ3_XS-00003-of-00003.gguf GGUF IQ3_XS 21.56 GB Download
Qwen3.5-REAP-212B-A17B-IQ4_XS-00001-of-00004.gguf GGUF IQ4_XS 29.56 GB Download
Qwen3.5-REAP-212B-A17B-IQ4_XS-00002-of-00004.gguf GGUF IQ4_XS 29.40 GB Download
Qwen3.5-REAP-212B-A17B-IQ4_XS-00003-of-00004.gguf GGUF IQ4_XS 29.40 GB Download
Qwen3.5-REAP-212B-A17B-IQ4_XS-00004-of-00004.gguf GGUF IQ4_XS 16.69 GB Download
Qwen3.5-REAP-212B-A17B-Q4_K_M-00001-of-00005.gguf GGUF Q4_K_M 29.50 GB Download
Qwen3.5-REAP-212B-A17B-Q4_K_M-00002-of-00005.gguf GGUF Q4_K_M 29.47 GB Download
Qwen3.5-REAP-212B-A17B-Q4_K_M-00003-of-00005.gguf GGUF Q4_K_M 29.47 GB Download
Qwen3.5-REAP-212B-A17B-Q4_K_M-00004-of-00005.gguf GGUF Q4_K_M 29.06 GB Download
Qwen3.5-REAP-212B-A17B-Q4_K_M-00005-of-00005.gguf GGUF Q4_K_M 2.04 GB Download
Qwen3.5-REAP-212B-A17B-Q5_K_M-00001-of-00005.gguf GGUF Q5_K_M 29.51 GB Download
Qwen3.5-REAP-212B-A17B-Q5_K_M-00002-of-00005.gguf GGUF Q5_K_M 29.80 GB Download
Qwen3.5-REAP-212B-A17B-Q5_K_M-00003-of-00005.gguf GGUF Q5_K_M 29.74 GB Download
Qwen3.5-REAP-212B-A17B-Q5_K_M-00004-of-00005.gguf GGUF Q5_K_M 29.71 GB Download
Qwen3.5-REAP-212B-A17B-Q5_K_M-00005-of-00005.gguf GGUF Q5_K_M 21.35 GB Download
Qwen3.5-REAP-218B-A17B-mmproj.gguf GGUF 879.01 MB Download
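
The shard sizes in the listing above can be summed per quantization to get a rough weights-only footprint for each variant. The following is a minimal Python sketch of that arithmetic. The size table is copied from the listing; the helper names `total_size_gb` and `fits_without_offload` are hypothetical, and the 96 GB threshold is an assumption taken from the single-GPU target mentioned in the model card. The totals ignore the mmproj file, KV cache, runtime overhead, and any GB/GiB unit difference.

```python
# Shard sizes in GB, copied from the repository file listing.
SHARD_SIZES_GB = {
    "IQ2_M": [29.79, 29.61, 4.86],
    "IQ3_XS": [29.68, 29.47, 21.56],
    "IQ3_M": [29.37, 29.50, 27.62],
    "IQ4_XS": [29.56, 29.40, 29.40, 16.69],
    "Q4_K_M": [29.50, 29.47, 29.47, 29.06, 2.04],
    "Q5_K_M": [29.51, 29.80, 29.74, 29.71, 21.35],
}

# Assumption: the single 96 GB GPU target named in the model card.
GPU_MEMORY_GB = 96.0


def total_size_gb(quant: str) -> float:
    """Sum the shard sizes of one quantization, rounded to two decimals."""
    return round(sum(SHARD_SIZES_GB[quant]), 2)


def fits_without_offload(quant: str, gpu_gb: float = GPU_MEMORY_GB) -> bool:
    """Weights-only check: True if the summed shards are smaller than gpu_gb."""
    return total_size_gb(quant) < gpu_gb


if __name__ == "__main__":
    for quant in SHARD_SIZES_GB:
        label = "fits" if fits_without_offload(quant) else "needs offload"
        print(f"{quant:7s} {total_size_gb(quant):7.2f} GB  {label}")
```

By this rough count the Q4_K_M shards add up to more than 96 GB, which is in line with the model card's remark that some CPU offload may still be needed at that quantization.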

Model Details

Model Slug: openmose/qwen3.5-reap-212b-a17b-gguf
Author: OpenMOSE
Pipeline Task: (not set)
Library: (not set)
Created: 2026-02-24
Last Modified: 2026-02-26
Gated: No
Private: No
HF SHA: 64a46ead7b16d0261ccba15abfcbabbb2f96692b
License: apache-2.0
Language: Unknown
Base Model: OpenMOSE/Qwen3.5-REAP-212B-A17B

Metadata Inspector

Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "license": "apache-2.0",
    "base_model": [
      "OpenMOSE/Qwen3.5-REAP-212B-A17B"
    ],
    "frontmatter": {
      "license": "apache-2.0",
      "base_model": [
        "OpenMOSE/Qwen3.5-REAP-212B-A17B"
      ]
    },
    "hero_image_url": "",
    "summary": "",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlicense: apache-2.0\nbase_model:\n- OpenMOSE/Qwen3.5-REAP-212B-A17B\n---\n\n## OpenMOSE/Qwen3.5-REAP-212B-A17B-GGUF\n\nVision–Language MoE model created by applying **Router-weighted Expert Activation Pruning (REAP)** to **Qwen3.5-397B-A17B**.\n\n\nNow, 35% Version also available :) more strong, more stable\n\nhttps://huggingface.co/OpenMOSE/Qwen3.5-REAP-262B-A17B-GGUF\n\n\n---\n\n### 1. Model Summary\n\n* **Base model:** Qwen/Qwen3.5-397B-A17B (vision–language MoE LLM)\n* **Variant name:** Qwen3.5-REAP-212B-A17B\n* **Architecture:** Decoder-only Transformer + MoE MLP experts, with vision encoder + VL fusion as in Qwen3.5\n* **Pruning method:** REAP (Router-weighted Expert Activation Pruning) by Cerebras Research, 3072 samples\n  [https://github.com/CerebrasResearch/reap](https://github.com/CerebrasResearch/reap)\n* **Expert sparsity:** ~**48% of MoE experts pruned globally** (512 → 267 experts)\n* **Active parameters:** \"A17B\" indicates roughly ~17B active parameters per token (MoE sparse activation), while total parameters are reduced to about **212B**\n* **Modality:** Text + Vision (VL support **kept intact**)\n* **License:** **Apache 2.0**\n* **Author / Maintainer:** **OpenMOSE**\n* **Year:** 2025\n\nThis is an **unofficial community variant** of Qwen3.5, not affiliated with or endorsed by Alibaba or Cerebras Systems.\n\n---\n\n### 2. 
What Is REAP and What Did We Change?\n\n**REAP (Router-weighted Expert Activation Pruning)** is a pruning method for MoE models that uses:\n\n* Router statistics (routing probabilities)\n* Expert activation patterns on a calibration set\n\nto identify **under-used or redundant experts** and prune them while preserving model quality as much as possible.\n\nFor this model:\n\n* We applied REAP to **Qwen3.5-397B-A17B** across its MoE MLP blocks.\n* **~48% of experts** are pruned (512 → 267), based on router-weighted activation statistics.\n* The **routing mechanism itself is not conceptually changed**; we only changed which experts remain.\n* We extended the original REAP implementation to **support the Qwen3.5 architecture**, including its hybrid linear/full attention layers and vision components, so pruning can be applied without breaking VL functionality.\n\nIn short: **same REAP algorithm, adapted to Qwen3.5, leaving VL functionality available.**\n\n---\n\n### 3. Calibration Data\n\nThe REAP pruning statistics were computed using:\n\n* **Calibration dataset:** [https://huggingface.co/datasets/OpenMOSE/reap-calib-mix](https://huggingface.co/datasets/OpenMOSE/reap-calib-mix)\n* This dataset is **mostly synthetic**, generated by **Qwen3-235B-Instruct** on mixed prompts designed to cover:\n\n  * General instruction-following\n  * Reasoning and long-form text\n\nThe calibration set is **not** used for additional fine-tuning; it is used only to measure **router/expert activations** to decide which experts to prune.\n\n---\n\n### 4. Why 212B-A17B? 
(Motivation & Hardware Footprint)\n\nBy pruning ~48% of experts while keeping VL:\n\n* The model shrinks from ~397B total parameters to about **212B total parameters**.\n* With sparse MoE activation, around **17B parameters are active per token** (\"A17B\").\n* In practice, this makes it feasible to **deploy on a single 96 GB GPU** with a small amount of CPU offload.\n\nQwen3.5-397B-A17B is currently **the closest OSS model to frontier performance**. The goal of this project is to make that model accessible for local deployment by reducing its memory footprint as much as possible without sacrificing the core capabilities that make it special.\n\n---\n\n### 5. Intended Use\n\n**Primary intended uses**\n\n* Research on:\n  * MoE pruning and compression (especially REAP)\n  * Scaling behavior of pruned MoE VL models\n  * Trade-offs between expert sparsity and performance\n* Experimental deployment for:\n  * Vision–language assistants\n  * Multimodal chatbots\n  * Document + image understanding\n\n**Suitable tasks (examples)**\n\n* Multimodal chat (image + text → text)\n* Image captioning / description\n* Visual question answering\n* General instruction-following and long-form text generation\n* Reasoning and chain-of-thought tasks\n\n**Out-of-scope / high-risk uses**\n\nThis model **should not** be used **without additional safeguards** for:\n\n* Medical, legal, or financial advice\n* Safety-critical decision making\n* Political persuasion or targeted disinformation\n* Any scenario where incorrect or biased outputs can cause real-world harm\n\n---\n\n### 6. 
Limitations & Risks\n\nThis model inherits all the limitations of **Qwen3.5-397B-A17B** plus those introduced by pruning:\n\n* **Hallucinations:** The model can generate plausible but incorrect facts.\n* **Bias & toxicity:** Biases from the original training data and synthetic calibration data remain and may be amplified.\n* **Distribution shift from pruning:**\n  * Some long-tail behaviors may degrade due to pruning 48% of experts — a more aggressive cut than previous REAP releases.\n  * Performance may be uneven across tasks, domains, or languages not well covered in the calibration set.\n* **Multimodal edge cases:**\n  * Complex compositional visual reasoning or extremely high-resolution images may not work reliably.\n  * VL behavior is preserved but not re-tuned after pruning.\n\nUsers should perform their **own evaluation** before relying on the model in any sensitive context.\n\n---\n\n### 7. How to Use\n\nPlease check llama.cpp official github :)\n\n**Recommended quantization:** Q4_K_M or similar 4-bit quantization is recommended to fit within a single 96 GB GPU. A small amount of CPU offload may still be needed depending on your configuration.\n\n---\n\n### 8. Model Configuration Highlights\n\nKey parameters after pruning:\n\n* `num_experts`: **267** (down from 512)\n* `num_experts_per_tok`: 10\n* `num_hidden_layers`: 60\n* `hidden_size`: 4096\n* `max_position_embeddings`: 262,144\n* Architecture: hybrid **linear + full attention** (full attention every 4 layers)\n* Vision encoder: depth 27, `hidden_size` 1152, supports image and video tokens\n\n---\n\n### 9. 
Evaluation (Status)\n\n* This release focuses on **making the REAP-pruned model available** for the community.\n* Quantitative benchmarks (e.g., MMLU, reasoning, multimodal benchmarks) are still **work in progress**.\n* Early qualitative checks show:\n  * **VL behavior is preserved** after pruning.\n  * **Latency and memory usage** are significantly improved compared to Qwen3.5-397B-A17B, enabling single-96GB-GPU deployment.\n\nCommunity contributions with detailed benchmarks are very welcome.\n\n---\n\n### 10. Training & Distillation Details (High-Level)\n\n* **Base model:** Qwen/Qwen3.5-397B-A17B\n* **Pruning method:** REAP (Router-weighted Expert Activation Pruning)\n* **Expert count:** 512 → **267** (~48% pruned)\n* **Calibration data:** `OpenMOSE/reap-calib-mix` (mostly generated by Qwen3-235B-Instruct)\n* **Post-processing:**\n  * Router / gating structure retained\n  * Experts pruned according to REAP scoring\n  * No additional large-scale pretraining performed in this release\n\nFuture versions may include **post-pruning fine-tuning** or **distillation** to recover performance lost from the more aggressive pruning ratio.\n\n---\n\n### 11. Community & Contribution\n\n> Let's make frontier-class OSS models accessible together.\n\nYou are encouraged to:\n\n* Run benchmarks and publish results\n* Contribute scripts for:\n  * Further pruning experiments\n  * Quantization (e.g., GGUF, AWQ, GPTQ)\n  * Long-context or domain-specific fine-tuning\n* Report issues or findings about failure modes, biases, or surprising behaviors\n\n---\n\n### 12. License\n\n* **Model & code (this repository):** **Apache License 2.0**\n* The original Qwen3.5-397B-A17B model and any downstream use must also respect their respective licenses and usage terms.\n\n---\n\n### 13. Acknowledgements\n\nThis architecture research and implementation was made possible with computing power and technical support from Recursal AI. 
We sincerely thank them for enabling this work.\n\nhttps://featherless.ai/\n\n\n* **Qwen team** for building the Qwen3.5 family of models.\n* **Cerebras Research** for the REAP method and reference implementation:\n  [https://github.com/CerebrasResearch/reap](https://github.com/CerebrasResearch/reap)\n* **OpenMOSE community** for experimentation, engineering, and calibration data generation.\n\n---",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "base_model:OpenMOSE/Qwen3.5-REAP-212B-A17B",
    "base_model:quantized:OpenMOSE/Qwen3.5-REAP-212B-A17B",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us",
    "imatrix",
    "conversational"
  ],
  "likes": 6,
  "downloads": 916,
  "gated": false,
  "private": false,
  "last_modified": "2026-02-26T09:36:56.000Z",
  "created_at": "2026-02-24T10:13:02.000Z",
  "pipeline_tag": "",
  "library_name": ""
}
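
The model card embedded in the metadata above describes REAP as using router statistics and expert activation patterns on a calibration set to decide which experts to keep. A toy Python sketch of that idea follows. It is not the Cerebras reference implementation or the author's code: the scoring rule (mean of gate weight times expert output norm over the tokens routed to an expert), the record format, and the names `reap_style_scores`, `select_experts`, and `pruned_fraction` are assumptions made for illustration only.

```python
from collections import defaultdict


def reap_style_scores(records):
    """Average gate_weight * output_norm per expert.

    records: iterable of (expert_id, gate_weight, output_norm) tuples, one per
    (token, selected expert) pair seen on a calibration set (assumed format).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for expert_id, gate_weight, output_norm in records:
        totals[expert_id] += gate_weight * output_norm
        counts[expert_id] += 1
    return {expert: totals[expert] / counts[expert] for expert in totals}


def select_experts(scores, num_experts, keep):
    """Return the ids of the `keep` highest-scoring experts, sorted by id.

    Experts that were never routed to get a score of 0.0.
    """
    ranked = sorted(range(num_experts),
                    key=lambda expert: scores.get(expert, 0.0),
                    reverse=True)
    return sorted(ranked[:keep])


def pruned_fraction(before, after):
    """Fraction of experts removed when going from `before` to `after`."""
    return (before - after) / before


# Tiny made-up calibration trace with 4 experts (illustrative values only).
records = [
    (0, 0.6, 2.0), (1, 0.4, 0.5), (0, 0.7, 1.8),
    (2, 0.3, 0.2), (3, 0.5, 1.0), (1, 0.5, 0.4),
]
scores = reap_style_scores(records)
kept = select_experts(scores, num_experts=4, keep=2)

if __name__ == "__main__":
    print("kept experts:", kept)
    print(f"512 -> 267 prunes {pruned_fraction(512, 267):.1%} of experts")
```

The last line reproduces the card's headline figure: going from 512 to 267 experts removes 245 of 512, which is roughly 48%.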
Source payload excerpt (from Hugging Face API)
{
  "_id": "699d79ae2b8317e9175d0daa",
  "id": "OpenMOSE/Qwen3.5-REAP-212B-A17B-GGUF",
  "modelId": "OpenMOSE/Qwen3.5-REAP-212B-A17B-GGUF",
  "sha": "64a46ead7b16d0261ccba15abfcbabbb2f96692b",
  "createdAt": "2026-02-24T10:13:02.000Z",
  "lastModified": "2026-02-26T09:36:56.000Z",
  "author": "OpenMOSE",
  "downloads": 916,
  "likes": 6,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 26
}
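
The `id` and `sha` fields in the payload above, together with the shard naming pattern in the file listing, are enough to fetch a single quantization instead of the whole repository. The sketch below is one way to do that in Python, assuming the `huggingface_hub` package is installed. The helper names `shard_filenames` and `download_quant` are hypothetical, and the shard count has to be read off the file listing by hand.

```python
# Values copied from the Hugging Face API payload above.
REPO_ID = "OpenMOSE/Qwen3.5-REAP-212B-A17B-GGUF"
REVISION = "64a46ead7b16d0261ccba15abfcbabbb2f96692b"

# File name stem used by the quantized shards in the file listing.
MODEL_STEM = "Qwen3.5-REAP-212B-A17B"


def shard_filenames(quant: str, num_shards: int) -> list[str]:
    """Build the shard file names for one quantization, e.g. Q4_K_M with 5 shards."""
    return [
        f"{MODEL_STEM}-{quant}-{index:05d}-of-{num_shards:05d}.gguf"
        for index in range(1, num_shards + 1)
    ]


def download_quant(quant: str, num_shards: int, local_dir: str) -> str:
    """Download only the shards of one quantization, pinned to REVISION.

    Imported lazily so shard_filenames() works without huggingface_hub installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        revision=REVISION,
        allow_patterns=shard_filenames(quant, num_shards),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    for name in shard_filenames("Q4_K_M", 5):
        print(name)
```

Calling `download_quant("Q4_K_M", 5, "./models")` would fetch the five Q4_K_M shards. llama.cpp is generally pointed at the first shard of a split GGUF; the model card defers to the llama.cpp repository for the exact invocation.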