Radamanthys11/Qwen3.6-35B-A3B-DFlash-GGUF overview
Qwen 3.6 35B A3B DFlash GGUF: GGUF made to use in ikawrakow/ik_llama.cpp (https://github.com/ikawrakow/ik_llama.cpp), currently for PR 1970 https://github.com/ik…
Runs locally from ~278.2 MB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).
Model Details
| Model ID | Radamanthys11/Qwen3.6-35B-A3B-DFlash-GGUF |
|---|---|
| Author | Radamanthys11 |
| Pipeline | text-generation |
| License | — |
| Base model | z-lab/Qwen3.6-35B-A3B-DFlash |
| Last modified | 2026-06-15T01:41:09.000Z |
Model README
---
base_model: z-lab/Qwen3.6-35B-A3B-DFlash
tags:
- gguf
- dflash
- ik_llama.cpp
library_name: gguf
pipeline_tag: text-generation
---
Qwen 3.6 35B A3B DFlash GGUF
GGUF files made for use with ikawrakow/ik_llama.cpp, currently targeting PR #1970. The small quantizations provided here are intended for testing; feel free to create your own quantization.
Derived from the safetensors DFlash draft model z-lab/Qwen3.6-35B-A3B-DFlash.
Compatible target model
- Qwen3.6-35B-A3B-UD.gguf
- Mainly tested with Q4_K_M.
Files
| File | Quant | Size |
|---|---|---|
| qwen36-35b-a3b-dflash-F16.gguf | F16 | 915 MB |
| qwen36-35b-a3b-dflash-Q8_0.gguf | Q8_0 | 491 MB |
| qwen36-35b-a3b-dflash-Q4_K_M.gguf | Q4_K_M | 279 MB |
Usage
./build/bin/llama-server \
--model <target.gguf> \
--model-draft <draft.gguf> \
--spec-type dflash:n_max=<N>,cross_ctx=<N> ...
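As a hedged illustration of the template above, the sketch below fills the placeholders and prints the resulting command instead of running it. The target path `models/target.gguf` and the values `8` and `512` for `n_max` and `cross_ctx` are assumptions chosen only for illustration, not recommendations from this model card. The helper name `build_dflash_cmd` is hypothetical. Any further options implied by the trailing `...` in the template are not specified here and are left out.

```shell
#!/usr/bin/env bash
# Dry-run sketch: compose the llama-server command from the Usage template.
# Only the flags shown in the template are used; all values are placeholders.

build_dflash_cmd() {
  # $1 = target GGUF, $2 = draft GGUF, $3 = n_max, $4 = cross_ctx
  # Note: paths containing spaces would need extra quoting before execution.
  printf './build/bin/llama-server --model %s --model-draft %s --spec-type dflash:n_max=%s,cross_ctx=%s\n' \
    "$1" "$2" "$3" "$4"
}

# Hypothetical target path; the draft file name comes from the Files table.
build_dflash_cmd "models/target.gguf" "models/qwen36-35b-a3b-dflash-Q8_0.gguf" 8 512
```

Printing the command first makes it easy to inspect the `--spec-type` string before launching the server.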
Notes
- This repo contains DFlash draft models, not a standalone instruct model.
- Use it with the matching target family listed above.
- Q4_K_M and Q8_0 are small test-oriented quants; create your own quant if you need a different tradeoff.
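For creating your own quant, one possible route is the `llama-quantize` tool that llama.cpp-style builds produce, starting from the F16 file listed above. The sketch below only prints such a command. It assumes the `llama-quantize` binary from the same build accepts this draft model, which the model card does not confirm. The output file name, the `Q5_K_M` type, and the helper name `quantize_cmd` are illustrative choices.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print a llama-quantize invocation (input, output, type).
# Assumption: the build's llama-quantize supports this draft model.

quantize_cmd() {
  # $1 = input GGUF (e.g. the F16 file), $2 = output GGUF, $3 = quant type
  printf './build/bin/llama-quantize %s %s %s\n' "$1" "$2" "$3"
}

# Illustrative output name and quant type; the input name is from the Files table.
quantize_cmd "qwen36-35b-a3b-dflash-F16.gguf" "qwen36-35b-a3b-dflash-Q5_K_M.gguf" "Q5_K_M"
```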
Source: Hugging Face