di-zhang-fdu/Supra-50M-Instruct-GGUF overview
Runs locally with llama.cpp or guIDE, including on 4 GB VRAM class GPUs. The smallest quantization (Q2_K) takes about 27.4 MB of disk space.
Repository Files & Downloads
| File | Type | Quantization | Size |
|---|---|---|---|
| Q2_K.gguf | GGUF | Q2_K | 27.4 MB |
| Q3_K_S.gguf | GGUF | Q3_K_S | 29.6 MB |
| Q4_0.gguf | GGUF | Q4_0 | 32.9 MB |
| Q4_K_M.gguf | GGUF | Q4_K_M | 35.7 MB |
| Q4_K_S.gguf | GGUF | Q4_K_S | 34.1 MB |
| Q6_K.gguf | GGUF | Q6_K | 43.6 MB |
| Q8_0.gguf | GGUF | Q8_0 | 53.6 MB |
| model.gguf | GGUF | F16 (unquantized) | 99.9 MB |
Model Details
| Model ID | di-zhang-fdu/Supra-50M-Instruct-GGUF |
|---|---|
| Author | di-zhang-fdu |
| Pipeline | text-generation |
| License | apache-2.0 |
| Base model | SupraLabs/Supra-50M-Instruct |
| Last modified | 2026-06-08T02:03:51.000Z |
Model README
---
license: apache-2.0
base_model: SupraLabs/Supra-50M-Instruct
library_name: llama.cpp
tags:
- gguf
- llama.cpp
- quantized
- supra
- text-generation
---
Supra-50M-Instruct-GGUF
GGUF quantizations of SupraLabs/Supra-50M-Instruct.
Files
| File | Quantization | Size |
| --- | --- | ---: |
| model.gguf | F16 base GGUF | 100M |
| Q8_0.gguf | Q8_0 | 54M |
| Q6_K.gguf | Q6_K | 44M |
| Q4_K_M.gguf | Q4_K_M | 36M |
| Q4_K_S.gguf | Q4_K_S | 35M |
| Q4_0.gguf | Q4_0 | 33M |
| Q3_K_S.gguf | Q3_K_S | 30M |
| Q2_K.gguf | Q2_K | 28M |
Checksums are in SHA256SUMS.
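A download can be checked against that list before use. The sketch below assumes SHA256SUMS uses the standard `sha256sum` layout (`<hash>  <filename>`) and sits in the same directory as the GGUF files; `verify_gguf` is a hypothetical helper name, not something shipped with the repository.

```bash
# Hypothetical helper: verify downloaded GGUF files against SHA256SUMS.
# Assumes the list uses the standard "<hash>  <filename>" layout and
# lives in the same directory as the files (default: current directory).
verify_gguf() {
  local dir="${1:-.}"
  # --ignore-missing (GNU coreutils) skips entries for files that were
  # not downloaded, so fetching a single quantization is enough.
  (cd "$dir" && sha256sum --ignore-missing -c SHA256SUMS)
}
```

Calling `verify_gguf ./models` prints one `OK` or `FAILED` line per file it finds and returns a non-zero status if any hash does not match.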
Conversion
model.gguf is the F16 GGUF from the source repository. The quantized files were generated with llama.cpp build b9550 using llama-quantize.
Some K-quant outputs may contain fallback tensor types where tensor dimensions are not divisible by the required K-quant block size. This is normal llama.cpp behavior for this model shape.
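The divisibility rule can be illustrated with a small check. K-quants in llama.cpp group weights into super-blocks of 256 values, so a tensor whose row length is not a multiple of 256 cannot be stored in a K-quant type. The helper name and the example widths below are hypothetical; the actual tensor shapes of this model are not listed here.

```bash
# K-quant super-block size used by llama.cpp.
QK_K=256

# Hypothetical helper: succeeds when a tensor row of the given length
# cannot be split into whole K-quant super-blocks and would therefore
# be stored in a fallback type instead.
needs_fallback() {
  [ $(( $1 % QK_K )) -ne 0 ]
}
```

For instance, `needs_fallback 384` succeeds (384 leaves a remainder of 128), while `needs_fallback 512` does not.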
Usage
This model was instruction-tuned with the Alpaca prompt format. It is not a ChatML-style multi-turn chat model, so use completion mode and include the prompt template. The GGUF files intentionally do not include tokenizer.chat_template, because llama.cpp chat mode would otherwise try to apply the wrong template.
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
What is the capital of Japan?
### Response:
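To reuse the template with other instructions, it can be wrapped in a small shell function. This is a minimal sketch: `alpaca_prompt` is a hypothetical name, and the blank lines between sections follow the standard Alpaca layout, which is assumed to be what the model was trained on.

```bash
# Hypothetical helper: wrap a single instruction in the Alpaca template.
# The blank lines between sections are an assumption based on the
# standard Alpaca layout.
alpaca_prompt() {
  # The instruction is passed as a printf argument, so any '%' it
  # contains is printed literally.
  printf 'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n%s\n\n### Response:\n' "$1"
}
```

Running `alpaca_prompt "What is the capital of Japan?" > prompt.txt` produces a file that can be passed to llama.cpp with `-f prompt.txt`.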
Example with llama.cpp:
cat > prompt.txt <<'EOF'
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
What is the capital of Japan?
### Response:
EOF
llama-completion \
-hf di-zhang-fdu/Supra-50M-Instruct-GGUF:Q4_K_M \
-f prompt.txt \
-n 128 \
--temp 0.7 \
--top-k 50 \
--top-p 0.9 \
--repeat-penalty 1.15 \
-no-cnv
For deterministic checks, use --temp 0 --top-k 1.
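One way to confirm that a setup really is deterministic is to run the same command twice and compare the outputs. The helper below is a hypothetical sketch that works for any command; it only compares standard output and discards standard error.

```bash
# Hypothetical helper: run a command twice and succeed only if both
# runs print exactly the same standard output.
# Returns 2 if either run exits with an error.
same_output_twice() {
  local first second
  first="$("$@" 2>/dev/null)" || return 2
  second="$("$@" 2>/dev/null)" || return 2
  [ "$first" = "$second" ]
}
```

With the settings above, a check could look like `same_output_twice llama-completion -hf di-zhang-fdu/Supra-50M-Instruct-GGUF:Q4_K_M -f prompt.txt -n 128 --temp 0 --top-k 1 -no-cnv`. Identical output is expected on the same machine and build, though results may still differ across hardware or llama.cpp versions.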
Source: Hugging Face