GraySoft
Model Intelligence Sheet

zTrojan/Qwen3.5-122B-A10B-REAP20-APEX-GGUF overview


Tags: llama.cpp, gguf, qwen, qwen3.5, moe, reap, reap20, apex, apex-mini, apex-quant, quantized, text-generation, en, uk, base_model:Qwen/Qwen3.5-122B-A10B, base_model:quantized:Qwen/Qwen3.5-122B-A10B, license:other, endpoints_compatible, region:us, imatrix, conversational

Runs locally with llama.cpp or guIDE. The smallest file is about 34.63 GB on disk, and the page suggests a GPU in the 32 GB+ VRAM class.

Downloads: 1
Likes: 0
Pipeline: text-generation

Repository Files & Downloads

2 GGUF files detected
Direct downloads for local inference
| File | Type | Quantization | Size |
|---|---|---|---|
| Qwen3.5-122B-A10B-REAP20-APEX-I-Compact.gguf | GGUF | GGUF | 43.18 GB |
| Qwen3.5-122B-A10B-REAP20-APEX-Mini.gguf | GGUF | GGUF | 34.63 GB |
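
As a minimal sketch of fetching one of the files above from the command line: this assumes the repository is hosted on the Hugging Face Hub under the Model ID shown on this page and that the files sit on the default `main` branch, neither of which is confirmed by the table itself. The script only prints the `curl` command so that a 34.63 GB transfer does not start by accident.

```bash
#!/bin/sh
# Assumptions: the files live on the Hugging Face Hub under this repo ID
# and on the default "main" branch.
REPO="zTrojan/Qwen3.5-122B-A10B-REAP20-APEX-GGUF"
FILE="Qwen3.5-122B-A10B-REAP20-APEX-Mini.gguf"
URL="https://huggingface.co/${REPO}/resolve/main/${FILE}"

# Print the command instead of running it.
# -L follows redirects, -C - resumes an interrupted transfer.
echo "curl -L -C - -o ${FILE} ${URL}"
```

Copying the printed line into a terminal starts the download. Swapping `FILE` for the I-Compact file name should work the same way under the same assumptions.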

Model Details

Model ID: zTrojan/Qwen3.5-122B-A10B-REAP20-APEX-GGUF
Author: zTrojan
Pipeline: text-generation
License: other
Base model: Qwen/Qwen3.5-122B-A10B
Last modified: 2026-06-12T06:55:21.000Z

Model README

---
license: other
base_model: Qwen/Qwen3.5-122B-A10B
language:
  - en
  - uk
library_name: llama.cpp
pipeline_tag: text-generation
tags:
  - gguf
  - qwen
  - qwen3.5
  - moe
  - reap
  - reap20
  - apex
  - apex-mini
  - apex-quant
  - llama.cpp
  - quantized
---

Qwen3.5-122B-A10B REAP20 APEX GGUF

Physical REAP20-pruned APEX GGUF builds of Qwen/Qwen3.5-122B-A10B.

This repository is focused on APEX quantizations of a physically expert-pruned REAP20 version of Qwen3.5-122B-A10B.

Available quants

| File | Description |
|---|---|
| Qwen3.5-122B-A10B-REAP20-APEX-Mini.gguf | Production APEX Mini quant with imatrix |

Source model

Base model: Qwen/Qwen3.5-122B-A10B

REAP pruning

Method: physical REAP expert pruning
Compression ratio: 20%
Original experts per MoE layer: 256
Retained experts per MoE layer: 205
Layers: 48
Experts per token: 8
Observation: 32 batches
Distance: cosine
Seed: 42
Calibration source: custom calibration-v2
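
The expert counts above can be sanity-checked with a few lines of shell arithmetic. This is only a sketch: it assumes the retained count comes from rounding to the nearest integer and that all 48 layers are MoE layers with identical expert counts, neither of which the README states explicitly.

```bash
#!/bin/sh
# Values taken from the REAP pruning list above.
ORIGINAL_EXPERTS=256
RETAINED_EXPERTS=205
LAYERS=48
RATIO_PERCENT=20

# Assumption: 256 * (1 - 0.20) = 204.8 is rounded to the nearest integer.
# Adding 50 before dividing by 100 performs that rounding in integer math.
expected_retained=$(( (ORIGINAL_EXPERTS * (100 - RATIO_PERCENT) + 50) / 100 ))

# Assumption: every one of the 48 layers is an MoE layer with the same counts.
pruned_per_layer=$(( ORIGINAL_EXPERTS - RETAINED_EXPERTS ))
pruned_total=$(( pruned_per_layer * LAYERS ))
retained_total=$(( RETAINED_EXPERTS * LAYERS ))

echo "expected retained per layer: ${expected_retained}"
echo "pruned per layer: ${pruned_per_layer}"
echo "pruned across all layers: ${pruned_total}"
echo "retained across all layers: ${retained_total}"
```

Under these assumptions the computed per-layer count agrees with the listed 205, which means 51 of 256 experts are removed per layer, slightly under the nominal 20%.
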
APEX Mini quantization

APEX Mini was generated with llama.cpp's llama-quantize, using a tensor-type map and an imatrix.

Tensor config: configs/qwen35_122b_mini.txt
Base fallback type: Q3_K_M
Imatrix: production REAP20 imatrix
Imatrix context: -c 4096
Imatrix chunks: --chunks 128
Imatrix calibration: 64MB shuffled calibration-v2
Imatrix model source: REAP20 BF16 GGUF

The Mini profile uses Q3_K for edge experts and IQ2_S for middle routed experts, so an imatrix is required.
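
The two llama.cpp steps described above might look roughly like the following. This is a hedged sketch, not the author's actual invocation: all file names are hypothetical, and the per-tensor map in configs/qwen35_122b_mini.txt is not reproduced, so on its own this would yield a plain Q3_K_M file, not the Mini profile. The script prints the commands without running them.

```bash
#!/bin/sh
# Hypothetical file names; the README does not give the actual paths.
BF16_MODEL="Qwen3.5-122B-A10B-REAP20-BF16.gguf"
CALIB_TEXT="calibration-v2.txt"
IMATRIX="reap20.imatrix"
OUTPUT="Qwen3.5-122B-A10B-REAP20-Q3_K_M.gguf"

# Step 1: compute the importance matrix from the BF16 GGUF,
# using the context size and chunk count listed in the README.
IMATRIX_CMD="llama-imatrix -m ${BF16_MODEL} -f ${CALIB_TEXT} -o ${IMATRIX} -c 4096 --chunks 128"

# Step 2: quantize with the imatrix and the Q3_K_M fallback type.
# The tensor-type map from configs/qwen35_122b_mini.txt is NOT applied here.
QUANT_CMD="llama-quantize --imatrix ${IMATRIX} ${BF16_MODEL} ${OUTPUT} Q3_K_M"

echo "$IMATRIX_CMD"
echo "$QUANT_CMD"
```

How the tensor config file is fed to llama-quantize is not stated in the README, so that part is left out deliberately.
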

Example: llama.cpp

# The prompt below is in Ukrainian: "Hello. Write one short paragraph in Ukrainian."
llama-cli \
  -m Qwen3.5-122B-A10B-REAP20-APEX-Mini.gguf \
  -p "<|im_start|>user
Привіт. Напиши один короткий параграф українською. /no_think
<|im_end|>
<|im_start|>assistant
" \
  -n 128 \
  -c 4096 \
  -ngl 40 \
  --temp 0.6 \
  --top-p 0.95
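
For a rough idea of what `-ngl 40` implies for memory, the sketch below splits the Mini file size across the 48 layers. It is a back-of-the-envelope estimate only: it assumes weights are spread evenly across layers and ignores embeddings, the output layer, the KV cache, and compute buffers.

```bash
#!/bin/sh
# Values from this page: Mini file size, layer count, and the -ngl value above.
FILE_GB=34.63
TOTAL_LAYERS=48
GPU_LAYERS=40

# Assumption: weights are spread evenly across layers; embeddings, the output
# layer, the KV cache, and compute buffers are ignored.
gpu_gb=$(awk -v s="$FILE_GB" -v n="$TOTAL_LAYERS" -v g="$GPU_LAYERS" \
  'BEGIN { printf "%.2f", s * g / n }')
cpu_gb=$(awk -v s="$FILE_GB" -v n="$TOTAL_LAYERS" -v g="$GPU_LAYERS" \
  'BEGIN { printf "%.2f", s * (n - g) / n }')

echo "approx. weights on GPU: ${gpu_gb} GB"
echo "approx. weights in system RAM: ${cpu_gb} GB"
```

With these numbers the estimate lands near 28.9 GB on the GPU, in line with the 32 GB+ VRAM class mentioned on this page, though real usage will be higher once the KV cache for `-c 4096` is included. Lowering `GPU_LAYERS` shows how much shifts to system RAM on smaller cards.
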
Notes

This is an experimental physical expert-pruned build. It is intended for testing, iteration, and comparison with other REAP/APEX variants.

Future files may include:

APEX I-Compact
APEX I-Balanced
APEX I-Quality
other GGUF quantizations


Source: Hugging Face