NobodyWho/Qwen_Qwen3.5-0.8B-GGUF overview
NobodyWho/Qwen Qwen3.5 0.8B GGUF Overview GGUF quantization of Qwen3.5 0.8B , prepared for NobodyWho https://github.com/nobodywho ooo/nobodywho : it works with…
Runs locally from ~197.7 MB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).
Repository Files & Downloads
Model Details
| Model ID | NobodyWho/Qwen_Qwen3.5-0.8B-GGUF |
|---|---|
| Author | NobodyWho |
| Pipeline | image-text-to-text |
| License | apache-2.0 |
| Base model | Qwen/Qwen3.5-0.8B |
| Last modified | 2026-06-16T04:39:39.000Z |
Model README
---
license: apache-2.0
base_model: Qwen/Qwen3.5-0.8B
tags:
- gguf
- nobodywho
- tool-calling
- vision
- qwen
pipeline_tag: image-text-to-text
library_name: gguf
---
NobodyWho/Qwen_Qwen3.5-0.8B-GGUF
Overview
GGUF quantization of Qwen3.5-0.8B, prepared for
NobodyWho: it works with NobodyWho out
of the box, with Qwen's recommended sampling metadata embedded in every quant, and is verified
with NobodyWho's test suite. The smallest Qwen3.5 model — natively multimodal
(text + image), ideal for fast, low-memory on-device chat and tool calling.
Model Capabilities
- Text generation — instruction-following chat
- Tool calling — native function calling with grammar-constrained output (13/14 on NobodyWho's suite)
- Vision — image understanding via the companion
mmproj-BF16.ggufprojection model - Reasoning — thinking mode (on by default)
- Long context — up to 256k tokens
- Multilingual — broad language coverage
Available Quantizations
| File | Approach | Tool-calling tests |
|------|----------|--------------------|
| Qwen_Qwen3.5-0.8B-BF16-vendor-sampling.gguf | Vendor sampling injected | 13/14 |
| Qwen_Qwen3.5-0.8B-Q8_0-vendor-sampling.gguf | Vendor sampling injected | 13/14 |
| Qwen_Qwen3.5-0.8B-Q4_K_M-vendor-sampling.gguf | Vendor sampling injected | 13/14 |
| mmproj-BF16.gguf | Vision projection (use with any of the above) | — |
> Verified with NobodyWho's tool-calling suite across BF16 / Q8_0 / Q4_K_M (13/14 each, June 2026
> — smallest model, the same one complex-schema miss at every precision); vision, reasoning, and
> multilingual verified per-model. The upstream GGUF has no general.sampling.* metadata, so the
> -vendor-sampling files embed Qwen's recommended sampler (see INJECTION.md).
Quick Start
Using the NobodyWho library:
from nobodywho import Chat
chat = Chat("huggingface:NobodyWho/Qwen_Qwen3.5-0.8B-GGUF/Qwen_Qwen3.5-0.8B-Q4_K_M-vendor-sampling.gguf")
response = chat.ask("What is the capital of Denmark?").completed()
print(response) # The capital of Denmark is Copenhagen.
Vision
from nobodywho import Model, Chat, Prompt, Image, Text
model = Model(
"huggingface:NobodyWho/Qwen_Qwen3.5-0.8B-GGUF/Qwen_Qwen3.5-0.8B-Q4_K_M-vendor-sampling.gguf",
projection_model_path="huggingface:NobodyWho/Qwen_Qwen3.5-0.8B-GGUF/mmproj-BF16.gguf",
)
chat = Chat(model=model, system_prompt="You are a helpful assistant.")
response = chat.ask(Prompt([
Text("What is in this image?"),
Image("./photo.png"),
])).completed()
print(response)
llama-cpp-python
from llama_cpp import Llama
llm = Llama.from_pretrained(
repo_id="NobodyWho/Qwen_Qwen3.5-0.8B-GGUF",
filename="Qwen_Qwen3.5-0.8B-Q4_K_M-vendor-sampling.gguf",
)
Model Specifications
- Parameters: 0.8B
- Context length: 262,144 tokens (256K)
- License: Apache 2.0
- Base model: Qwen/Qwen3.5-0.8B
- Architecture: qwen35 (vision-capable)
Licensing / Credits
Licensed under Apache 2.0 (unchanged from upstream). All model credit belongs to the Qwen team,
Alibaba Group. GGUF quantizations provided by unsloth.
Run NobodyWho/Qwen_Qwen3.5-0.8B-GGUF with guIDE
Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.
Source: Hugging Face · Compare models