GraySoft
Projects Models Compare Cloud benchmarks FAQ Download guIDE →
Model Intelligence Sheet

sh111111111111111/Qwen3-4B-Instruct-2507-BitClass-MX1-GGUF overview

Qwen3 4B Instruct 2507 — BitClass MX 1 Mixed Precision, 3.54 bpw A GGUF quantized version of Qwen3 4B Instruct 2507 https://huggingface.co/Qwen/Qwen3 4B Instru…

ggufqwenqwen3quantizedmixed-precisiontext-generationbase_model:Qwen/Qwen3-4B-Instruct-2507base_model:quantized:Qwen/Qwen3-4B-Instruct-2507license:apache-2.0endpoints_compatibleregion:usimatrixconversational

Runs locally from ~1.66 GB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).

Downloads
11
Likes
0
Pipeline
text-generation

Repository Files & Downloads

1 GGUF files detected
Direct downloads for local inference
FileTypeQuantizationSizeLink
Qwen3-4B-Instruct-2507-Q3_K_S-3.54bpw.ggufGGUFQ3_K_S1.66 GBDownload

Model Details

Model IDsh111111111111111/Qwen3-4B-Instruct-2507-BitClass-MX1-GGUF
Authorsh111111111111111
Pipelinetext-generation
Licenseapache-2.0
Base modelQwen/Qwen3-4B-Instruct-2507
Last modified2026-06-13T03:28:11.000Z

Model README

---

license: apache-2.0

base_model: Qwen/Qwen3-4B-Instruct-2507

tags:

- qwen

- qwen3

- gguf

- quantized

- mixed-precision

pipeline_tag: text-generation

library_name: gguf

---

Qwen3-4B-Instruct-2507 — BitClass MX-1 (Mixed-Precision, 3.54 bpw)

A GGUF-quantized version of Qwen3-4B-Instruct-2507 using BitClass, our learned mixed-precision quantization.

This is the MX-1 (compact) variant at 3.54 bits per weight, optimized for size and throughput. For a higher-quality variant, see MX-2 (4.00 bpw).

Model

| File | Bits/Weight | Size | Perplexity ↓ | Throughput (GPU) | Throughput (CPU) |

| --- | --- | --- | --- | --- | --- |

| Qwen3-4B-Instruct-2507-Q3_K_S-3.54bpw.gguf | 3.54 | 1.78 GB | 3.337 | 93.3 tok/s | 11.4 tok/s |

Benchmark Results

All models evaluated using lm-evaluation-harness v0.4.11 (0-shot) on identical hardware. Higher is better for all metrics.

| Model | BPW | Size | ARC-C ↑ | GSM8K ↑ | IFEval ↑ | TruthfulQA ↑ | Avg |

| --- | --- | --- | --- | --- | --- | --- | --- |

| Unsloth Q5_K_M | 5.75 | 2.69 GB | 58.79 | 72.55 | 57.49 | 62.49 | 62.83 |

| Unsloth Q4_K_M | 4.97 | 2.33 GB | 57.76 | 66.87 | 55.27 | 60.75 | 60.16 |

| Ours MX-2 | 4.00 | 2.01 GB | 57.42 | 51.40 | 53.60 | 60.21 | 55.66 |

| Ours MX-1 | 3.54 | 1.78 GB | 53.75 | 53.60 | 50.46 | 60.14 | 54.49 |

| ByteShape KQ 3.34 | 3.34 | 1.69 GB | 55.72 | 45.56 | 51.20 | 58.41 | 52.72 |

| Unsloth Q3_K_S | 3.75 | 1.76 GB | 55.89 | 41.24 | 52.13 | 60.10 | 52.34 |

ARC-C: acc_norm, GSM8K: exact_match (flexible-extract), IFEval: prompt_level_strict_acc, TruthfulQA: acc (mc2). All values ×100.

MX-1 at a glance:

  • +2.15 average over Unsloth Q3_K_S at comparable size (1.78 vs 1.76 GB)
  • GSM8K 53.60 — massively beats both Q3_K_S (41.24, +12.36) and ByteShape (45.56, +8.04)
  • TruthfulQA 60.14 — on par with Q4_K_M (60.75) at smaller size (1.78 vs 2.33 GB)
  • 93.3 tok/s GPU throughput — fastest in our tests

Perplexity Comparison

!Precision Loss Chart

| Model | BPW | Size | PPL ↓ | Source |

| --- | --- | --- | --- | --- |

| Unsloth Q5_K_M | 5.75 | 2.69 GB | 2.907 | unsloth |

| Unsloth Q3_K_S | 3.75 | 1.76 GB | 3.007 | unsloth |

| Unsloth Q4_K_M | 4.97 | 2.33 GB | 2.956 | unsloth |

| ByteShape KQ 3.34 | 3.34 | 1.69 GB | 3.175 | byteshape |

| ByteShape KQ 3.19 | 3.19 | 1.61 GB | 3.192 | byteshape |

| Ours MX-1 | 3.54 | 1.78 GB | 3.337 | This repo |

| ByteShape IQ 3.07 | 3.07 | 1.55 GB | 3.423 | byteshape |

Comparable perplexity to ByteShape's IQ3_S 3.07bpw, but with 2.1x higher CPU throughput (11.4 vs 5.4 tok/s) and dramatically better task scores — especially GSM8K where MX-1 scores 53.60 versus Q3_K_S's 41.24.

Quantization Labels

The filename label Q3_K_S indicates the base quantization type. The actual model uses a mix of quantization types across tensor groups, with an average effective bits per weight of 3.54.

Running with Ollama

ollama run hf.co/sh111111111111111/Qwen3-4B-Instruct-2507-BitClass-MX1-GGUF:Qwen3-4B-Instruct-2507-Q3_K_S-3.54bpw.gguf

Running with llama.cpp

# Chat
llama-cli -m Qwen3-4B-Instruct-2507-Q3_K_S-3.54bpw.gguf -cnv

# Server (OpenAI-compatible API)
llama-server -m Qwen3-4B-Instruct-2507-Q3_K_S-3.54bpw.gguf --port 8080

Evaluation Details

  • Perplexity & Throughput: llama.cpp b8514, measured on both NVIDIA GB10 GPU (-ngl 999) and CPU
  • Task benchmarks: lm-evaluation-harness v0.4.11, 0-shot, via llama-cpp-python with logits_all=True
  • All models benchmarked in the same session on identical hardware for fair comparison

Disclaimer

Independent project. Not affiliated with or endorsed by Qwen, Unsloth, ByteShape, Bartowski, or llama.cpp. Competitor figures are from our own benchmark harness and may differ from those projects' self-reported numbers; competitor file sizes reflect the revision we tested and may since have changed.

License

Apache 2.0, inherited from Qwen3-4B-Instruct-2507.

Acknowledgments

Run sh111111111111111/Qwen3-4B-Instruct-2507-BitClass-MX1-GGUF with guIDE

Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.

Download guIDE → · Browse 524k+ models · Compare models

Source: Hugging Face · Compare models