GraySoft
Model Intelligence Sheet

Radamanthys11/Qwen3.6-35B-A3B-DFlash-GGUF overview


Tags: gguf, dflash, ik_llama.cpp, text-generation, base_model:z-lab/Qwen3.6-35B-A3B-DFlash, base_model:quantized:z-lab/Qwen3.6-35B-A3B-DFlash, endpoints_compatible, region:us, conversational

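Draft model files start at ~278.2 MB on disk (Q4_K_M). These are DFlash draft models that run alongside a target model, so total memory requirements depend on the target GGUF you pair them with.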

Downloads: 0
Likes: 0
Pipeline: text-generation

Repository Files & Downloads

3 GGUF files detected
Direct downloads for local inference
| File | Type | Quantization | Size |
|---|---|---|---|
| qwen36-35b-a3b-dflash-F16.gguf | GGUF | F16 | 914.6 MB |
| qwen36-35b-a3b-dflash-Q4_K_M.gguf | GGUF | Q4_K_M | 278.2 MB |
| qwen36-35b-a3b-dflash-Q8_0.gguf | GGUF | Q8_0 | 490.8 MB |

Model Details

Model ID: Radamanthys11/Qwen3.6-35B-A3B-DFlash-GGUF
Author: Radamanthys11
Pipeline: text-generation
License: (not listed)
Base model: z-lab/Qwen3.6-35B-A3B-DFlash
Last modified: 2026-06-15T01:41:09.000Z

Model README

---
base_model: z-lab/Qwen3.6-35B-A3B-DFlash
tags:
  - gguf
  - dflash
  - ik_llama.cpp
library_name: gguf
pipeline_tag: text-generation
---

Qwen 3.6 35B A3B DFlash GGUF

GGUF files made for use with ikawrakow/ik_llama.cpp, currently targeting PR #1970. The small quantizations provided here are intended for testing; feel free to create your own quantization.

Derived from the safetensors DFlash draft model z-lab/Qwen3.6-35B-A3B-DFlash.

Compatible target model

  • Qwen3.6-35B-A3B-UD.gguf - Mainly tested with Q4_K_M.

Files

| File | Quant | Size |
|---|---|---|
| qwen36-35b-a3b-dflash-F16.gguf | F16 | 915 MB |
| qwen36-35b-a3b-dflash-Q8_0.gguf | Q8_0 | 491 MB |
| qwen36-35b-a3b-dflash-Q4_K_M.gguf | Q4_K_M | 279 MB |
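
To fetch a single file from the table above, something like the following sketch may help. It assumes the Hugging Face CLI (`huggingface-cli`) is installed; the `./models` directory and the `QUANT`, `LOCAL_DIR`, and `RUN` variables are hypothetical names chosen for illustration, not part of this repo. By default it only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: print (or run) a download command for one quant from this repo.
# Assumption: `huggingface-cli` is installed; ./models is a hypothetical path.
REPO="Radamanthys11/Qwen3.6-35B-A3B-DFlash-GGUF"
QUANT="${QUANT:-Q4_K_M}"            # one of: F16, Q8_0, Q4_K_M
LOCAL_DIR="${LOCAL_DIR:-./models}"
FILE="qwen36-35b-a3b-dflash-${QUANT}.gguf"

# Print the command so it can be inspected before anything is downloaded.
download_cmd() {
  printf 'huggingface-cli download %s %s --local-dir %s\n' \
    "$REPO" "$FILE" "$LOCAL_DIR"
}

if [ "${RUN:-0}" = "1" ]; then
  # Actually download the file (requires network access).
  huggingface-cli download "$REPO" "$FILE" --local-dir "$LOCAL_DIR"
else
  # Dry run: show what would be executed.
  download_cmd
fi
```

Set `QUANT=Q8_0` or `QUANT=F16` to pick another file, and `RUN=1` to perform the download.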

Usage

./build/bin/llama-server \
  --model <target.gguf> \
  --model-draft <draft.gguf> \
  --spec-type dflash:n_max=<N>,cross_ctx=<N> ...
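
As an illustration of the template above, here is a sketch that fills the placeholders with file names from this repo and prints the resulting command. The local paths and the `n_max` / `cross_ctx` values are arbitrary placeholders, not tested recommendations, and the README's trailing `...` is intentionally left unfilled; check PR #1970 for valid values and any further options.

```shell
#!/usr/bin/env bash
# Sketch: assemble the llama-server command from the Usage template.
# Assumptions: files live under ./models (hypothetical path); the numeric
# values below are placeholders only, not tuned or verified settings.
TARGET="${TARGET:-./models/Qwen3.6-35B-A3B-UD.gguf}"
DRAFT="${DRAFT:-./models/qwen36-35b-a3b-dflash-Q4_K_M.gguf}"
N_MAX="${N_MAX:-8}"
CROSS_CTX="${CROSS_CTX:-512}"

# Build the --spec-type argument in the form shown in the README.
spec_type() {
  printf 'dflash:n_max=%s,cross_ctx=%s' "$N_MAX" "$CROSS_CTX"
}

# Print the full command; nothing is executed here.
server_cmd() {
  printf './build/bin/llama-server --model %s --model-draft %s --spec-type %s\n' \
    "$TARGET" "$DRAFT" "$(spec_type)"
}

server_cmd
```

Printing the command first makes it easy to confirm that the draft and target paths point at the intended files before starting the server.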

Notes

  • This repo contains DFlash draft models, not a standalone instruct model.
  • Use it with the matching target family listed above.
  • Q4_K_M and Q8_0 are small test-oriented quants; create your own quant if you need a different tradeoff.
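
For the "create your own quant" note, a minimal sketch follows. It assumes your ik_llama.cpp build provides `./build/bin/llama-quantize` with the usual `input output type` argument order and that it accepts the DFlash draft architecture; neither is confirmed by this README. `Q5_K_M` and the `./models` paths are illustrative choices only. The script prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch: derive a custom quant from the F16 file in this repo.
# Assumptions: llama-quantize exists in the build and supports this model;
# Q5_K_M and ./models are example choices, not recommendations.
SRC="${SRC:-./models/qwen36-35b-a3b-dflash-F16.gguf}"
TYPE="${TYPE:-Q5_K_M}"
DST="${DST:-./models/qwen36-35b-a3b-dflash-${TYPE}.gguf}"

# Print the quantize command: input file, output file, then quant type.
quantize_cmd() {
  printf './build/bin/llama-quantize %s %s %s\n' "$SRC" "$DST" "$TYPE"
}

quantize_cmd
```

Starting from the F16 file avoids re-quantizing an already quantized model, which would compound precision loss.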


Source: Hugging Face