GraySoft
Model Intelligence Sheet

Dhptl/stable-code-3b-GGUF overview


Tags: transformers, gguf, dataset:bigcode/the-stack-github-issues, dataset:meta-math/MetaMathQA, code, arxiv:2307.09288, arxiv:2309.12284, safetensors, arxiv:1910.02054, arxiv:2310.10631, stablelm, quantized, arxiv:2305.06161, en, model-index, dataset:bigcode/commitpackft, text-generation, arxiv:2204.06745, dataset:bigcode/starcoderdata, region:us, dataset:EleutherAI/proof-pile-2, dataset:tiiuae/falcon-refinedweb, arxiv:2104.09864, causal-lm

Runs locally from ~1.01 GB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).

Downloads: 0
Likes: 0
Pipeline: text-generation
Author: Dhptl

Repository Files & Downloads

10 GGUF files detected
Direct downloads for local inference
| File | Type | Quantization | Size |
|---|---|---|---|
| stable-code-3b-Q2_K.gguf | GGUF | Q2_K | 1.01 GB |
| stable-code-3b-Q3_K_L.gguf | GGUF | Q3_K_L | 1.40 GB |
| stable-code-3b-Q3_K_M.gguf | GGUF | Q3_K_M | 1.30 GB |
| stable-code-3b-Q3_K_S.gguf | GGUF | Q3_K_S | 1.17 GB |
| stable-code-3b-Q4_K_M.gguf | GGUF | Q4_K_M | 1.59 GB |
| stable-code-3b-Q4_K_S.gguf | GGUF | Q4_K_S | 1.51 GB |
| stable-code-3b-Q5_K_M.gguf | GGUF | Q5_K_M | 1.86 GB |
| stable-code-3b-Q5_K_S.gguf | GGUF | Q5_K_S | 1.81 GB |
| stable-code-3b-Q6_K.gguf | GGUF | Q6_K | 2.14 GB |
| stable-code-3b-Q8_0.gguf | GGUF | Q8_0 | 2.77 GB |
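
The files listed above can also be fetched from a script. The sketch below is one possible approach, not something the repository ships: it builds a direct URL using Hugging Face's `resolve/main` convention and, optionally, downloads through `huggingface_hub.hf_hub_download`. The helper names `gguf_filename`, `gguf_url`, and `download_gguf` are illustrative assumptions, and the sketch assumes the files live on the repository's `main` branch.

```python
REPO_ID = "Dhptl/stable-code-3b-GGUF"

# Quantization suffixes taken from the file list above.
QUANTS = (
    "Q2_K", "Q3_K_S", "Q3_K_M", "Q3_K_L", "Q4_K_S",
    "Q4_K_M", "Q5_K_S", "Q5_K_M", "Q6_K", "Q8_0",
)


def gguf_filename(quant: str) -> str:
    """Return the file name for a quantization level (hypothetical helper)."""
    if quant not in QUANTS:
        raise ValueError(f"unknown quantization: {quant!r}")
    return f"stable-code-3b-{quant}.gguf"


def gguf_url(quant: str) -> str:
    """Build the direct download URL, assuming the 'main' branch."""
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{gguf_filename(quant)}"


def download_gguf(quant: str, local_dir: str = ".") -> str:
    """Download one file and return its local path.

    Requires `pip install huggingface_hub`; imported lazily so the URL
    helpers above work without it.
    """
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=gguf_filename(quant),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print(gguf_url("Q4_K_M"))
```

Calling `download_gguf("Q4_K_M")` would place the recommended file in the current directory; the URL helper is useful when a plain `curl` or `wget` is preferred.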

Model Details

Model ID: Dhptl/stable-code-3b-GGUF
Author: Dhptl
Pipeline: text-generation
License: other
Base model: stabilityai/stable-code-3b
Last modified: 2026-06-18T10:45:05.000Z

Model README

---
license: other
base_model: stabilityai/stable-code-3b
pipeline_tag: text-generation
tags:
- dataset:bigcode/the-stack-github-issues
- dataset:meta-math/MetaMathQA
- transformers
- code
- arxiv:2307.09288
- arxiv:2309.12284
- safetensors
- arxiv:1910.02054
- arxiv:2310.10631
- stablelm
- quantized
- arxiv:2305.06161
- en
- model-index
- dataset:bigcode/commitpackft
- gguf
- text-generation
- arxiv:2204.06745
- dataset:bigcode/starcoderdata
- region:us
- dataset:EleutherAI/proof-pile-2
- dataset:tiiuae/falcon-refinedweb
- arxiv:2104.09864
- causal-lm
language:
- en
---

<div align="center">

stable-code-3b - GGUF Quantizations

[Model on HF](https://huggingface.co/Dhptl/stable-code-3b-GGUF)

[Original Model](https://huggingface.co/stabilityai/stable-code-3b)

[quant-kit](https://github.com/DhruvalPtl/quant-kit)

Quantized GGUF versions of stabilityai/stable-code-3b

Works with llama.cpp · Ollama · LM Studio · Open WebUI · Jan

Quantized by Dhptl on June 18, 2026 using quant-kit

</div>

---

⚖️ The Pareto Frontier: Efficiency vs Intelligence

> Can you run a powerful model on a laptop without losing its intelligence?

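These quantizations use llama.cpp's K-quant formats to trade a small amount of quality for a much smaller file. The target is to retain roughly 97-99% of the original model's quality; the figures in the table below are estimates rather than measurements until the benchmarks are run.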

| Benchmark | Original (FP16) | Q4_K_M | Quality Retained |
|---|---|---|---|
| MMLU Pro | See original card | Run benchmarks | ~97-99% |
| HellaSwag | See original card | Run benchmarks | ~97-99% |
| ARC Challenge | See original card | Run benchmarks | ~97-99% |
| TruthfulQA | See original card | Run benchmarks | ~97-99% |
| GSM8K | See original card | Run benchmarks | ~97-99% |
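
Once both columns are filled in, the "Quality Retained" figure is simply the quantized score expressed as a percentage of the FP16 score. The sketch below shows that arithmetic; `quality_retained` is a hypothetical helper, and the scores in the example are placeholders chosen for illustration, not measured results for this model.

```python
def quality_retained(fp16_score: float, quant_score: float) -> float:
    """Return the quantized score as a percentage of the FP16 score."""
    if fp16_score <= 0:
        raise ValueError("fp16_score must be positive")
    return 100.0 * quant_score / fp16_score


if __name__ == "__main__":
    # Placeholder scores for illustration only, NOT benchmark results.
    retained = quality_retained(fp16_score=50.0, quant_score=49.0)
    print(f"{retained:.1f}%")  # prints "98.0%"
```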

---

📦 Available Files

| Filename | Size | RAM Required | Quant | Quality | Best For |
|---|---|---|---|---|---|
| stable-code-3b-Q2_K.gguf | 1.01 GB | ~2.5 GB | Q2_K | ⭐ | Extreme compression, significant quality loss. |
| stable-code-3b-Q3_K_L.gguf | 1.40 GB | ~2.9 GB | Q3_K_L | ⭐⭐⭐ | Slightly better than Q3_K_M, still a compromise. |
| stable-code-3b-Q3_K_M.gguf | 1.30 GB | ~2.8 GB | Q3_K_M | ⭐⭐⭐ | Very small file. Quality drop noticeable. |
| stable-code-3b-Q3_K_S.gguf | 1.17 GB | ~2.7 GB | Q3_K_S | ⭐⭐ | Very high compression, high quality loss. |
| stable-code-3b-Q4_K_M.gguf | 1.59 GB | ~3.1 GB | Q4_K_M ✅ Recommended | ⭐⭐⭐⭐ | Best balance of size and quality. Recommended for most users. |
| stable-code-3b-Q4_K_S.gguf | 1.51 GB | ~3.0 GB | Q4_K_S | ⭐⭐⭐½ | Good speed/size balance, slight quality loss. |
| stable-code-3b-Q5_K_M.gguf | 1.86 GB | ~3.4 GB | Q5_K_M | ⭐⭐⭐⭐½ | Better quality than Q4, slightly larger. Great if you have the RAM. |
| stable-code-3b-Q5_K_S.gguf | 1.81 GB | ~3.3 GB | Q5_K_S | ⭐⭐⭐⭐ | Large but accurate. |
| stable-code-3b-Q6_K.gguf | 2.14 GB | ~3.6 GB | Q6_K | ⭐⭐⭐⭐⭐ | Near-perfect quality, very large. |
| stable-code-3b-Q8_0.gguf | 2.77 GB | ~4.3 GB | Q8_0 | ⭐⭐⭐⭐⭐ | Closest to original quality. Use when RAM is not a concern. |

💡 Which file should I download?

  • Most users: stable-code-3b-Q4_K_M.gguf, the best balance of size and quality (~3.1 GB RAM)
  • High RAM (32GB+): stable-code-3b-Q8_0.gguf, near-original quality (~4.3 GB RAM)
  • Low RAM (8GB): stable-code-3b-Q3_K_M.gguf, which needs only ~2.8 GB and fits in 8GB with room to spare
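
The same choice can be made mechanically from the "RAM Required" column. The sketch below picks the largest quantization that fits a given amount of free memory; `pick_quant` and the 1 GB default headroom are assumptions made for illustration, and the figures are copied from the Available Files table rather than measured.

```python
from typing import Optional

# (quant, file size in GB, approximate RAM required in GB),
# copied from the Available Files table.
FILES = [
    ("Q2_K", 1.01, 2.5),
    ("Q3_K_S", 1.17, 2.7),
    ("Q3_K_M", 1.30, 2.8),
    ("Q3_K_L", 1.40, 2.9),
    ("Q4_K_S", 1.51, 3.0),
    ("Q4_K_M", 1.59, 3.1),
    ("Q5_K_S", 1.81, 3.3),
    ("Q5_K_M", 1.86, 3.4),
    ("Q6_K", 2.14, 3.6),
    ("Q8_0", 2.77, 4.3),
]


def pick_quant(free_ram_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the most demanding quant that fits, or None if none does.

    headroom_gb is an assumed safety margin left for the OS and other apps.
    """
    budget = free_ram_gb - headroom_gb
    fitting = [entry for entry in FILES if entry[2] <= budget]
    if not fitting:
        return None
    # Highest RAM requirement among those that fit = least compressed option.
    return max(fitting, key=lambda entry: entry[2])[0]


if __name__ == "__main__":
    for ram in (3.0, 4.0, 8.0):
        print(ram, "GB free ->", pick_quant(ram))
```

This always prefers the least-compressed file that fits; a user who cares more about speed or disk space may still prefer Q4_K_M even when Q8_0 fits.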

---

⚡ Speed Benchmarks

Run python benchmark.py --model stable-code-3b to generate speed results.
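
The contents of benchmark.py are not shown here, so the following is only a rough sketch of how generation speed could be measured, not a description of that script. It times a generation callable and reports tokens per second; `tokens_per_second` and `make_llama_generate` are hypothetical helpers, and the llama-cpp-python wrapper assumes the package is installed and the GGUF file is present locally.

```python
import time
from typing import Callable


def tokens_per_second(
    generate: Callable[[str, int], int],
    prompt: str,
    max_tokens: int = 128,
    runs: int = 3,
) -> float:
    """Average end-to-end throughput over several runs.

    generate(prompt, max_tokens) must return the number of tokens produced.
    The timing includes prompt processing, not just token generation.
    """
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        total_tokens += generate(prompt, max_tokens)
        total_seconds += time.perf_counter() - start
    return total_tokens / total_seconds if total_seconds > 0 else float("inf")


def make_llama_generate(model_path: str) -> Callable[[str, int], int]:
    """Wrap a llama-cpp-python model as a generate() callable (assumed setup)."""
    from llama_cpp import Llama  # imported lazily; pip install llama-cpp-python

    llm = Llama(model_path=model_path, n_gpu_layers=-1, n_ctx=4096, verbose=False)

    def generate(prompt: str, max_tokens: int) -> int:
        out = llm(prompt, max_tokens=max_tokens)
        return out["usage"]["completion_tokens"]

    return generate
```

A typical call would be `tokens_per_second(make_llama_generate("./stable-code-3b-Q4_K_M.gguf"), "def fibonacci(n):")`. Results vary with hardware, context length, and the number of layers offloaded to the GPU.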

---

🧠 Quality Benchmarks

Run kaggle_bench.ipynb on Kaggle to benchmark this model.

---

🚀 How to Use

Ollama

ollama run dhptl/stable-code-3b

LM Studio / Jan / Open WebUI

Search for Dhptl/stable-code-3b in the model browser.

llama.cpp CLI

# Download the binary from https://github.com/ggerganov/llama.cpp/releases
# stable-code-3b is a base (completion) model, so give it code to continue
./llama-cli \
  -m stable-code-3b-Q4_K_M.gguf \
  -p "def fibonacci(n):" \
  -n 512

Python - llama-cpp-python

from llama_cpp import Llama

# Load the quantized model; lower n_gpu_layers if it does not fit in VRAM
llm = Llama(
    model_path="./stable-code-3b-Q4_K_M.gguf",
    n_gpu_layers=-1,   # -1 = offload all layers to the GPU
    n_ctx=4096,
)

# stable-code-3b is a base code model, so use plain completion rather than chat
response = llm.create_completion(
    prompt="def fibonacci(n):",
    max_tokens=256,
)
print(response["choices"][0]["text"])

---

🔍 About GGUF Quantization

GGUF is the file format used by llama.cpp and the tools built on it for running large language models locally.

Quantization reduces the number of bits per weight:

| Format | Bits/weight | Size vs FP16 | Quality |
|---|---|---|---|
| Q2_K | ~2.6 | 16% | ⭐ |
| Q3_K_M | ~3.3 | 21% | ⭐⭐⭐ |
| Q4_K_M | ~4.5 | 28% | ⭐⭐⭐⭐ ← sweet spot |
| Q5_K_M | ~5.6 | 35% | ⭐⭐⭐⭐½ |
| Q8_0 | ~8.5 | 53% | ⭐⭐⭐⭐⭐ |
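
The "Size vs FP16" column follows directly from the bits-per-weight figures: FP16 stores 16 bits per weight, so the ratio is bits/16. The sketch below reproduces that column and shows how a file size could be estimated from a parameter count. The helper names are illustrative, and the 3-billion parameter count in the example is a round-number assumption, not the model's exact size.

```python
FP16_BITS = 16.0

# Approximate bits per weight, copied from the table above.
BITS_PER_WEIGHT = {
    "Q2_K": 2.6,
    "Q3_K_M": 3.3,
    "Q4_K_M": 4.5,
    "Q5_K_M": 5.6,
    "Q8_0": 8.5,
}


def size_vs_fp16(bits_per_weight: float) -> float:
    """Size of a quantized model as a percentage of its FP16 size."""
    return 100.0 * bits_per_weight / FP16_BITS


def estimated_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough weight-only size in GB (10^9 bytes); ignores file metadata."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: {size_vs_fp16(bits):.0f}% of FP16")
    # Assumed round parameter count, for illustration only.
    print(f"{estimated_size_gb(3e9, 4.5):.2f} GB")
```

Real GGUF files differ somewhat from this estimate because K-quants mix precisions across tensors and the file also carries metadata and the tokenizer.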

---

💬 Community & Feedback

Found an issue? Have a question? Open a Discussion in the Community tab above.

If these quantizations were useful, please consider:

  • ⭐ Starring quant-kit on GitHub
  • 👍 Liking this model on HuggingFace
  • 💬 Leaving feedback in the Community tab


Source: Hugging Face