What license applies to sh111111111111111/Qwen3-4B-Instruct-2507-BitClass-MX1-GGUF?

License: apache-2.0. Verify terms on Hugging Face before commercial use.

Model Intelligence Sheet

sh111111111111111/Qwen3-4B-Instruct-2507-BitClass-MX1-GGUF overview

Q: How do I run sh111111111111111/Qwen3-4B-Instruct-2507-BitClass-MX1-GGUF locally?

Download a GGUF file from this page and load it in guIDE or llama.cpp. Pipeline task: text-generation.

Qwen3 4B Instruct 2507 — BitClass MX 1 Mixed Precision, 3.54 bpw A GGUF quantized version of Qwen3 4B Instruct 2507 https://huggingface.co/Qwen/Qwen3 4B Instru…

ggufqwenqwen3quantizedmixed-precisiontext-generationbase_model:Qwen/Qwen3-4B-Instruct-2507base_model:quantized:Qwen/Qwen3-4B-Instruct-2507license:apache-2.0endpoints_compatibleregion:usimatrixconversational

Runs locally from ~1.66 GB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).

Downloads

Likes

Pipeline

text-generation

Author

sh111111111111111

Repository Files & Downloads

1 GGUF files detected

Direct downloads for local inference

File	Type	Quantization	Size	Link
Qwen3-4B-Instruct-2507-Q3_K_S-3.54bpw.gguf	GGUF	Q3_K_S	1.66 GB	Download

Model Details

Model ID	sh111111111111111/Qwen3-4B-Instruct-2507-BitClass-MX1-GGUF
Author	sh111111111111111
Pipeline	text-generation
License	apache-2.0
Base model	Qwen/Qwen3-4B-Instruct-2507
Last modified	2026-06-13T03:28:11.000Z

Model README

---

license: apache-2.0

base_model: Qwen/Qwen3-4B-Instruct-2507

tags:

- qwen

- qwen3

- gguf

- quantized

- mixed-precision

pipeline_tag: text-generation

library_name: gguf

---

Qwen3-4B-Instruct-2507 — BitClass MX-1 (Mixed-Precision, 3.54 bpw)

A GGUF-quantized version of Qwen3-4B-Instruct-2507 using BitClass, our learned mixed-precision quantization.

This is the MX-1 (compact) variant at 3.54 bits per weight, optimized for size and throughput. For a higher-quality variant, see MX-2 (4.00 bpw).

Model

| --- | --- | --- | --- | --- | --- |

| Qwen3-4B-Instruct-2507-Q3_K_S-3.54bpw.gguf | 3.54 | 1.78 GB | 3.337 | 93.3 tok/s | 11.4 tok/s |

Benchmark Results

All models evaluated using lm-evaluation-harness v0.4.11 (0-shot) on identical hardware. Higher is better for all metrics.

| --- | --- | --- | --- | --- | --- | --- | --- |

| Unsloth Q5_K_M | 5.75 | 2.69 GB | 58.79 | 72.55 | 57.49 | 62.49 | 62.83 |

| Unsloth Q4_K_M | 4.97 | 2.33 GB | 57.76 | 66.87 | 55.27 | 60.75 | 60.16 |

| Ours MX-2 | 4.00 | 2.01 GB | 57.42 | 51.40 | 53.60 | 60.21 | 55.66 |

| Ours MX-1 | 3.54 | 1.78 GB | 53.75 | 53.60 | 50.46 | 60.14 | 54.49 |

| ByteShape KQ 3.34 | 3.34 | 1.69 GB | 55.72 | 45.56 | 51.20 | 58.41 | 52.72 |

| Unsloth Q3_K_S | 3.75 | 1.76 GB | 55.89 | 41.24 | 52.13 | 60.10 | 52.34 |

ARC-C: acc_norm, GSM8K: exact_match (flexible-extract), IFEval: prompt_level_strict_acc, TruthfulQA: acc (mc2). All values ×100.

MX-1 at a glance:

+2.15 average over Unsloth Q3_K_S at comparable size (1.78 vs 1.76 GB)
GSM8K 53.60 — massively beats both Q3_K_S (41.24, +12.36) and ByteShape (45.56, +8.04)
TruthfulQA 60.14 — on par with Q4_K_M (60.75) at smaller size (1.78 vs 2.33 GB)
93.3 tok/s GPU throughput — fastest in our tests

Perplexity Comparison

!Precision Loss Chart

| --- | --- | --- | --- | --- |

| Unsloth Q5_K_M | 5.75 | 2.69 GB | 2.907 | unsloth |

| Unsloth Q3_K_S | 3.75 | 1.76 GB | 3.007 | unsloth |

| Unsloth Q4_K_M | 4.97 | 2.33 GB | 2.956 | unsloth |

| ByteShape KQ 3.34 | 3.34 | 1.69 GB | 3.175 | byteshape |

| ByteShape KQ 3.19 | 3.19 | 1.61 GB | 3.192 | byteshape |

| Ours MX-1 | 3.54 | 1.78 GB | 3.337 | This repo |

| ByteShape IQ 3.07 | 3.07 | 1.55 GB | 3.423 | byteshape |

Comparable perplexity to ByteShape's IQ3_S 3.07bpw, but with 2.1x higher CPU throughput (11.4 vs 5.4 tok/s) and dramatically better task scores — especially GSM8K where MX-1 scores 53.60 versus Q3_K_S's 41.24.

Quantization Labels

The filename label Q3_K_S indicates the base quantization type. The actual model uses a mix of quantization types across tensor groups, with an average effective bits per weight of 3.54.

Running with Ollama

ollama run hf.co/sh111111111111111/Qwen3-4B-Instruct-2507-BitClass-MX1-GGUF:Qwen3-4B-Instruct-2507-Q3_K_S-3.54bpw.gguf

Running with llama.cpp

# Chat
llama-cli -m Qwen3-4B-Instruct-2507-Q3_K_S-3.54bpw.gguf -cnv

# Server (OpenAI-compatible API)
llama-server -m Qwen3-4B-Instruct-2507-Q3_K_S-3.54bpw.gguf --port 8080

Evaluation Details

Perplexity & Throughput: llama.cpp b8514, measured on both NVIDIA GB10 GPU (-ngl 999) and CPU
Task benchmarks: lm-evaluation-harness v0.4.11, 0-shot, via llama-cpp-python with logits_all=True
All models benchmarked in the same session on identical hardware for fair comparison

Disclaimer

Independent project. Not affiliated with or endorsed by Qwen, Unsloth, ByteShape, Bartowski, or llama.cpp. Competitor figures are from our own benchmark harness and may differ from those projects' self-reported numbers; competitor file sizes reflect the revision we tested and may since have changed.

License

Apache 2.0, inherited from Qwen3-4B-Instruct-2507.

Acknowledgments

Base model by Qwen Team
Importance matrix data from Unsloth
Quantization infrastructure built on llama.cpp

Run sh111111111111111/Qwen3-4B-Instruct-2507-BitClass-MX1-GGUF with guIDE

Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.

Download guIDE → · Browse 524k+ models · Compare models

Source: Hugging Face · Compare models