GraySoft
Model Intelligence Sheet

DhruvalLabs/Kimi-Linear-48B-A3B-Instruct-GGUF overview


Tags: transformers, gguf, text-generation, conversational, region:us, arxiv:2412.06464, custom_code, license:mit, safetensors, quantized, kimi_linear, arxiv:2510.26692, en, base_model:moonshotai/Kimi-Linear-48B-A3B-Instruct, base_model:quantized:moonshotai/Kimi-Linear-48B-A3B-Instruct, endpoints_compatible

Runs locally from ~16.79 GB disk (24 GB VRAM class GPUs with llama.cpp / guIDE).

Downloads: 77
Likes: 2
Pipeline: text-generation

Repository Files & Downloads

10 GGUF files detected
Direct downloads for local inference
| File | Type | Quantization | Size |
|---|---|---|---|
| Kimi-Linear-48B-A3B-Instruct-Q2_K.gguf | GGUF | Q2_K | 16.79 GB |
| Kimi-Linear-48B-A3B-Instruct-Q3_K_L.gguf | GGUF | Q3_K_L | 23.76 GB |
| Kimi-Linear-48B-A3B-Instruct-Q3_K_M.gguf | GGUF | Q3_K_M | 21.87 GB |
| Kimi-Linear-48B-A3B-Instruct-Q3_K_S.gguf | GGUF | Q3_K_S | 19.86 GB |
| Kimi-Linear-48B-A3B-Instruct-Q4_K_M.gguf | GGUF | Q4_K_M | 27.66 GB |
| Kimi-Linear-48B-A3B-Instruct-Q4_K_S.gguf | GGUF | Q4_K_S | 26.04 GB |
| Kimi-Linear-48B-A3B-Instruct-Q5_K_M.gguf | GGUF | Q5_K_M | 32.47 GB |
| Kimi-Linear-48B-A3B-Instruct-Q5_K_S.gguf | GGUF | Q5_K_S | 31.56 GB |
| Kimi-Linear-48B-A3B-Instruct-Q6_K.gguf | GGUF | Q6_K | 37.59 GB |
| Kimi-Linear-48B-A3B-Instruct-Q8_0.gguf | GGUF | Q8_0 | 48.66 GB |
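
The files listed above can also be fetched from a script. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and that the repository ID matches the Model ID shown on this page. `gguf_filename` and `download_quant` are hypothetical helper names used for illustration, not part of any library.

```python
# Quantization levels listed in the file table on this page.
AVAILABLE_QUANTS = (
    "Q2_K", "Q3_K_L", "Q3_K_M", "Q3_K_S", "Q4_K_M",
    "Q4_K_S", "Q5_K_M", "Q5_K_S", "Q6_K", "Q8_0",
)

# Assumption: repository ID taken from the Model ID shown on this page.
REPO_ID = "DhruvalLabs/Kimi-Linear-48B-A3B-Instruct-GGUF"


def gguf_filename(quant: str) -> str:
    """Build the GGUF filename for a quantization level (hypothetical helper)."""
    if quant not in AVAILABLE_QUANTS:
        raise ValueError(f"unknown quantization {quant!r}; choose one of {AVAILABLE_QUANTS}")
    return f"Kimi-Linear-48B-A3B-Instruct-{quant}.gguf"


def download_quant(quant: str, local_dir: str = ".") -> str:
    """Download one GGUF file and return its local path (hypothetical helper)."""
    # Imported lazily so gguf_filename works even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=gguf_filename(quant),
        local_dir=local_dir,
    )
```

Calling `download_quant("Q4_K_M")` would fetch the 27.66 GB file into the current directory, so check free disk space first.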

Model Details

Model ID: DhruvalLabs/Kimi-Linear-48B-A3B-Instruct-GGUF
Author: DhruvalLabs
Pipeline: text-generation
License: mit
Base model: moonshotai/Kimi-Linear-48B-A3B-Instruct
Last modified: 2026-06-25T10:09:20.000Z

Model README

---
license: mit
base_model: moonshotai/Kimi-Linear-48B-A3B-Instruct
pipeline_tag: text-generation
tags:
- text-generation
- conversational
- gguf
- region:us
- arxiv:2412.06464
- transformers
- custom_code
- license:mit
- safetensors
- quantized
- kimi_linear
- arxiv:2510.26692
language:
- en
---

<div align="center">

Kimi-Linear-48B-A3B-Instruct — GGUF Quantizations

[Model on HF](https://huggingface.co/Dhptl/Kimi-Linear-48B-A3B-Instruct-GGUF)

[Original Model](https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct)

[quant-kit](https://github.com/DhruvalPtl/quant-kit)

Quantized GGUF versions of moonshotai/Kimi-Linear-48B-A3B-Instruct

Works with llama.cpp · Ollama · LM Studio · Open WebUI · Jan

Quantized by Dhptl on June 25, 2026 using quant-kit

</div>

---

⚖️ The Pareto Frontier — Efficiency vs Intelligence

> Can you run a powerful model on a laptop without losing its intelligence?

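These quantizations use llama.cpp's K-quant formats (plus Q8_0) to push the efficiency-quality Pareto frontier, aiming to preserve roughly 97-99% of the original model's quality at a fraction of the size. The retention figures in the table below are estimates: benchmark results for these files have not been published yet.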

| Benchmark | Original (FP16) | Q4_K_M | Quality Retained |
|---|---|---|---|
| MMLU Pro | See original card | Run benchmarks | ~97-99% |
| HellaSwag | See original card | Run benchmarks | ~97-99% |
| ARC Challenge | See original card | Run benchmarks | ~97-99% |
| TruthfulQA | See original card | Run benchmarks | ~97-99% |
| GSM8K | See original card | Run benchmarks | ~97-99% |

---

📦 Available Files

| Filename | Size | RAM Required | Quant | Quality | Best For |
|---|---|---|---|---|---|
| Kimi-Linear-48B-A3B-Instruct-Q4_K_M.gguf | 27.66 GB | ~29.2 GB | Q4_K_M (Recommended) | ⭐⭐⭐⭐ | Best balance of size and quality. Recommended for most users. |
| Kimi-Linear-48B-A3B-Instruct-Q5_K_M.gguf | 32.47 GB | ~34.0 GB | Q5_K_M | ⭐⭐⭐⭐½ | Better quality than Q4, slightly larger. Great if you have the RAM. |
| Kimi-Linear-48B-A3B-Instruct-Q5_K_S.gguf | 31.56 GB | ~33.1 GB | Q5_K_S | ⭐⭐⭐⭐ | Large but accurate. |
| Kimi-Linear-48B-A3B-Instruct-Q6_K.gguf | 37.59 GB | ~39.1 GB | Q6_K | ⭐⭐⭐⭐⭐ | Near-perfect quality, very large. |
| Kimi-Linear-48B-A3B-Instruct-Q8_0.gguf | 48.66 GB | ~50.2 GB | Q8_0 | ⭐⭐⭐⭐⭐ | Closest to original quality. Use when RAM is not a concern. |

💡 Which file should I download?

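  • Most users: Kimi-Linear-48B-A3B-Instruct-Q4_K_M.gguf (27.66 GB, ~29.2 GB RAM), the best balance of size and quality
  • High RAM (about 50 GB or more): Kimi-Linear-48B-A3B-Instruct-Q8_0.gguf (48.66 GB, ~50.2 GB RAM), near-original quality
  • Low RAM: Kimi-Linear-48B-A3B-Instruct-Q3_K_M.gguf (21.87 GB) or Kimi-Linear-48B-A3B-Instruct-Q2_K.gguf (16.79 GB, the smallest file). Neither fits in 8 GB of RAM.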
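
The choice can be framed as picking the largest file whose memory requirement fits your machine. The following is a minimal sketch that uses the file sizes listed on this page and assumes a fixed overhead of about 1.5 GB on top of the file size, which is roughly the gap between the Size and RAM Required columns above. Actual memory use grows with context length, and `pick_quant` is a hypothetical helper, not part of any tool.

```python
from typing import Optional

# File sizes in GB, as listed on this page.
FILE_SIZES_GB = {
    "Q2_K": 16.79, "Q3_K_S": 19.86, "Q3_K_M": 21.87, "Q3_K_L": 23.76,
    "Q4_K_S": 26.04, "Q4_K_M": 27.66, "Q5_K_S": 31.56, "Q5_K_M": 32.47,
    "Q6_K": 37.59, "Q8_0": 48.66,
}

# Assumption: ~1.5 GB of runtime overhead beyond the file size.
OVERHEAD_GB = 1.5


def pick_quant(available_ram_gb: float) -> Optional[str]:
    """Return the largest quantization that fits in the given RAM, or None."""
    fitting = {
        quant: size
        for quant, size in FILE_SIZES_GB.items()
        if size + OVERHEAD_GB <= available_ram_gb
    }
    if not fitting:
        return None
    # The largest file that still fits is taken as the best-quality option.
    return max(fitting, key=fitting.get)


for ram in (8, 24, 32, 64):
    print(f"{ram} GB RAM -> {pick_quant(ram)}")
```

Under these assumptions, 8 GB yields no match, 24 GB selects Q3_K_M, 32 GB selects Q4_K_M, and 64 GB selects Q8_0.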

---

⚡ Speed Benchmarks

Run `python benchmark.py --model Kimi-Linear-48B-A3B-Instruct` to generate speed results.

---

🧠 Quality Benchmarks

Run `kaggle_bench.ipynb` on Kaggle to benchmark this model.

---

🚀 How to Use

Ollama

```bash
ollama run dhptl/kimi-linear-48b-a3b-instruct
```

LM Studio / Jan / Open WebUI

Search for Dhptl/Kimi-Linear-48B-A3B-Instruct in the model browser.

llama.cpp CLI

```bash
# Download the binary from https://github.com/ggerganov/llama.cpp/releases
./llama-cli \
  -m Kimi-Linear-48B-A3B-Instruct-Q4_K_M.gguf \
  -p "You are a helpful assistant." \
  --conversation \
  -n 512
```

Python — llama-cpp-python

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./Kimi-Linear-48B-A3B-Instruct-Q4_K_M.gguf",
    n_gpu_layers=-1,   # -1 = offload everything to GPU
    n_ctx=4096,
)

response = llm.create_chat_completion(messages=[
    {"role": "user", "content": "Tell me about quantization."}
])
print(response["choices"][0]["message"]["content"])
```

---

🔍 About GGUF Quantization

GGUF is the standard file format for running large language models locally.

Quantization reduces the number of bits per weight:

| Format | Bits/weight | Size vs FP16 | Quality |
|---|---|---|---|
| Q2_K | ~2.6 | 16% | ⭐ |
| Q3_K_M | ~3.3 | 21% | ⭐⭐⭐ |
| Q4_K_M | ~4.5 | 28% | ⭐⭐⭐⭐ ← sweet spot |
| Q5_K_M | ~5.6 | 35% | ⭐⭐⭐⭐½ |
| Q8_0 | ~8.5 | 53% | ⭐⭐⭐⭐⭐ |
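
The Size vs FP16 column follows from the bits per weight: FP16 stores 16 bits per weight, so the ratio is roughly bits / 16. The following is a small sketch of that arithmetic using the approximate values from the table. `size_vs_fp16` is a hypothetical helper, and real file sizes vary somewhat with the model's tensor mix.

```python
FP16_BITS = 16  # FP16 stores each weight in 16 bits

# Approximate bits per weight, taken from the table above.
BITS_PER_WEIGHT = {
    "Q2_K": 2.6,
    "Q3_K_M": 3.3,
    "Q4_K_M": 4.5,
    "Q5_K_M": 5.6,
    "Q8_0": 8.5,
}


def size_vs_fp16(bits_per_weight: float) -> int:
    """Approximate size relative to FP16, as a whole-number percentage."""
    return round(bits_per_weight / FP16_BITS * 100)


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{size_vs_fp16(bits)}% of FP16 size")
```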

---

💬 Community & Feedback

Found an issue? Have a question? Open a Discussion in the Community tab above.

If these quantizations were useful, please consider:

  • ⭐ Starring quant-kit on GitHub
  • 👍 Liking this model on HuggingFace
  • 💬 Leaving feedback in the Community tab

Source: Hugging Face