sh111111111111111/Qwen3-4B-Instruct-2507-BitClass-MX1-GGUF overview
Qwen3 4B Instruct 2507 — BitClass MX 1 Mixed Precision, 3.54 bpw A GGUF quantized version of Qwen3 4B Instruct 2507 https://huggingface.co/Qwen/Qwen3 4B Instru…
Runs locally from ~1.66 GB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).
Repository Files & Downloads
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| Qwen3-4B-Instruct-2507-Q3_K_S-3.54bpw.gguf | GGUF | Q3_K_S | 1.66 GB | Download |
Model Details
| Model ID | sh111111111111111/Qwen3-4B-Instruct-2507-BitClass-MX1-GGUF |
|---|---|
| Author | sh111111111111111 |
| Pipeline | text-generation |
| License | apache-2.0 |
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| Last modified | 2026-06-13T03:28:11.000Z |
Model README
---
license: apache-2.0
base_model: Qwen/Qwen3-4B-Instruct-2507
tags:
- qwen
- qwen3
- gguf
- quantized
- mixed-precision
pipeline_tag: text-generation
library_name: gguf
---
Qwen3-4B-Instruct-2507 — BitClass MX-1 (Mixed-Precision, 3.54 bpw)
A GGUF-quantized version of Qwen3-4B-Instruct-2507 using BitClass, our learned mixed-precision quantization.
This is the MX-1 (compact) variant at 3.54 bits per weight, optimized for size and throughput. For a higher-quality variant, see MX-2 (4.00 bpw).
Model
| File | Bits/Weight | Size | Perplexity ↓ | Throughput (GPU) | Throughput (CPU) |
| --- | --- | --- | --- | --- | --- |
| Qwen3-4B-Instruct-2507-Q3_K_S-3.54bpw.gguf | 3.54 | 1.78 GB | 3.337 | 93.3 tok/s | 11.4 tok/s |
Benchmark Results
All models evaluated using lm-evaluation-harness v0.4.11 (0-shot) on identical hardware. Higher is better for all metrics.
| Model | BPW | Size | ARC-C ↑ | GSM8K ↑ | IFEval ↑ | TruthfulQA ↑ | Avg |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Unsloth Q5_K_M | 5.75 | 2.69 GB | 58.79 | 72.55 | 57.49 | 62.49 | 62.83 |
| Unsloth Q4_K_M | 4.97 | 2.33 GB | 57.76 | 66.87 | 55.27 | 60.75 | 60.16 |
| Ours MX-2 | 4.00 | 2.01 GB | 57.42 | 51.40 | 53.60 | 60.21 | 55.66 |
| Ours MX-1 | 3.54 | 1.78 GB | 53.75 | 53.60 | 50.46 | 60.14 | 54.49 |
| ByteShape KQ 3.34 | 3.34 | 1.69 GB | 55.72 | 45.56 | 51.20 | 58.41 | 52.72 |
| Unsloth Q3_K_S | 3.75 | 1.76 GB | 55.89 | 41.24 | 52.13 | 60.10 | 52.34 |
ARC-C: acc_norm, GSM8K: exact_match (flexible-extract), IFEval: prompt_level_strict_acc, TruthfulQA: acc (mc2). All values ×100.
MX-1 at a glance:
- +2.15 average over Unsloth Q3_K_S at comparable size (1.78 vs 1.76 GB)
- GSM8K 53.60 — massively beats both Q3_K_S (41.24, +12.36) and ByteShape (45.56, +8.04)
- TruthfulQA 60.14 — on par with Q4_K_M (60.75) at smaller size (1.78 vs 2.33 GB)
- 93.3 tok/s GPU throughput — fastest in our tests
Perplexity Comparison
| Model | BPW | Size | PPL ↓ | Source |
| --- | --- | --- | --- | --- |
| Unsloth Q5_K_M | 5.75 | 2.69 GB | 2.907 | unsloth |
| Unsloth Q3_K_S | 3.75 | 1.76 GB | 3.007 | unsloth |
| Unsloth Q4_K_M | 4.97 | 2.33 GB | 2.956 | unsloth |
| ByteShape KQ 3.34 | 3.34 | 1.69 GB | 3.175 | byteshape |
| ByteShape KQ 3.19 | 3.19 | 1.61 GB | 3.192 | byteshape |
| Ours MX-1 | 3.54 | 1.78 GB | 3.337 | This repo |
| ByteShape IQ 3.07 | 3.07 | 1.55 GB | 3.423 | byteshape |
Comparable perplexity to ByteShape's IQ3_S 3.07bpw, but with 2.1x higher CPU throughput (11.4 vs 5.4 tok/s) and dramatically better task scores — especially GSM8K where MX-1 scores 53.60 versus Q3_K_S's 41.24.
Quantization Labels
The filename label Q3_K_S indicates the base quantization type. The actual model uses a mix of quantization types across tensor groups, with an average effective bits per weight of 3.54.
Running with Ollama
ollama run hf.co/sh111111111111111/Qwen3-4B-Instruct-2507-BitClass-MX1-GGUF:Qwen3-4B-Instruct-2507-Q3_K_S-3.54bpw.gguf
Running with llama.cpp
# Chat
llama-cli -m Qwen3-4B-Instruct-2507-Q3_K_S-3.54bpw.gguf -cnv
# Server (OpenAI-compatible API)
llama-server -m Qwen3-4B-Instruct-2507-Q3_K_S-3.54bpw.gguf --port 8080
Evaluation Details
- Perplexity & Throughput: llama.cpp b8514, measured on both NVIDIA GB10 GPU (
-ngl 999) and CPU - Task benchmarks: lm-evaluation-harness v0.4.11, 0-shot, via llama-cpp-python with
logits_all=True - All models benchmarked in the same session on identical hardware for fair comparison
Disclaimer
Independent project. Not affiliated with or endorsed by Qwen, Unsloth, ByteShape, Bartowski, or llama.cpp. Competitor figures are from our own benchmark harness and may differ from those projects' self-reported numbers; competitor file sizes reflect the revision we tested and may since have changed.
License
Apache 2.0, inherited from Qwen3-4B-Instruct-2507.
Acknowledgments
Run sh111111111111111/Qwen3-4B-Instruct-2507-BitClass-MX1-GGUF with guIDE
Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.
Source: Hugging Face · Compare models