GraySoft
Model Intelligence Sheet

di-zhang-fdu/Supra-50M-Instruct-GGUF overview

Supra-50M-Instruct-GGUF: GGUF quantizations of SupraLabs/Supra-50M-Instruct (https://huggingface.co/SupraLabs/Supra-50M-Instruct).

Tags: llama.cpp, gguf, quantized, supra, text-generation, base_model:SupraLabs/Supra-50M-Instruct, base_model:quantized:SupraLabs/Supra-50M-Instruct, license:apache-2.0, endpoints_compatible, region:us

Runs locally: the smallest file (Q2_K) takes about 27.4 MB on disk, which suits 4 GB VRAM class GPUs running llama.cpp or guIDE.

Downloads: 0
Likes: 0
Pipeline: text-generation

Repository Files & Downloads

8 GGUF files detected
Direct downloads for local inference
| File | Type | Quantization | Size |
| --- | --- | --- | ---: |
| Q2_K.gguf | GGUF | Q2_K | 27.4 MB |
| Q3_K_S.gguf | GGUF | Q3_K_S | 29.6 MB |
| Q4_0.gguf | GGUF | Q4_0 | 32.9 MB |
| Q4_K_M.gguf | GGUF | Q4_K_M | 35.7 MB |
| Q4_K_S.gguf | GGUF | Q4_K_S | 34.1 MB |
| Q6_K.gguf | GGUF | Q6_K | 43.6 MB |
| Q8_0.gguf | GGUF | Q8_0 | 53.6 MB |
| model.gguf | GGUF | F16 | 99.9 MB |
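The download links did not survive on this page, so here is a minimal sketch of fetching one file from the command line. It assumes the files sit on the repository's `main` branch and are reachable through Hugging Face's usual `/resolve/` URL layout. The helper names `gguf_url` and `fetch_gguf` are hypothetical.

```shell
# Hypothetical helper: build the direct-download URL for one file in the repository.
# Assumption: files are on the "main" branch under Hugging Face's /resolve/ layout.
gguf_url() {
  printf 'https://huggingface.co/di-zhang-fdu/Supra-50M-Instruct-GGUF/resolve/main/%s\n' "$1"
}

# Download one file with curl; -L follows redirects, --fail turns HTTP errors
# into a non-zero exit status instead of saving an error page.
fetch_gguf() {
  curl -L --fail -o "$1" "$(gguf_url "$1")"
}

# Usage: fetch_gguf Q4_K_M.gguf
```

Any file name from the table can be passed in; the smallest file, `Q2_K.gguf`, is a quick way to try the model.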

Model Details

Model ID: di-zhang-fdu/Supra-50M-Instruct-GGUF
Author: di-zhang-fdu
Pipeline: text-generation
License: apache-2.0
Base model: SupraLabs/Supra-50M-Instruct
Last modified: 2026-06-08T02:03:51.000Z

Model README

---
license: apache-2.0
base_model: SupraLabs/Supra-50M-Instruct
library_name: llama.cpp
tags:
  - gguf
  - llama.cpp
  - quantized
  - supra
  - text-generation
---

Supra-50M-Instruct-GGUF

GGUF quantizations of SupraLabs/Supra-50M-Instruct.

Files

| File | Quantization | Size |
| --- | --- | ---: |
| model.gguf | F16 base GGUF | 100M |
| Q8_0.gguf | Q8_0 | 54M |
| Q6_K.gguf | Q6_K | 44M |
| Q4_K_M.gguf | Q4_K_M | 36M |
| Q4_K_S.gguf | Q4_K_S | 35M |
| Q4_0.gguf | Q4_0 | 33M |
| Q3_K_S.gguf | Q3_K_S | 30M |
| Q2_K.gguf | Q2_K | 28M |

Checksums are in SHA256SUMS.
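A minimal sketch of checking downloaded files against that list. It assumes `SHA256SUMS` uses the standard `<hash>  <filename>` layout and that GNU coreutils `sha256sum` is available. The function name `verify_gguf` is hypothetical.

```shell
# Hypothetical helper: verify whichever GGUF files are present in a directory
# against the SHA256SUMS file stored alongside them.
verify_gguf() {
  dir="${1:-.}"
  # --ignore-missing skips entries for files that were not downloaded,
  # so a single quantization can be checked on its own.
  (cd "$dir" && sha256sum --ignore-missing -c SHA256SUMS)
}

# Usage: verify_gguf ./Supra-50M-Instruct-GGUF
```

The function returns non-zero if any present file fails its checksum, so it can gate a later step in a script.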

Conversion

model.gguf is the F16 GGUF from the source repository. The quantized files were generated with llama.cpp build b9550 using llama-quantize.

Some K-quant outputs may contain fallback tensor types where tensor dimensions are not divisible by the required K-quant block size. This is normal llama.cpp behavior for this model shape.
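The README names the tool and build but not the exact command line. The sketch below shows one way the quantized files could be regenerated, assuming `llama-quantize` is called in its usual positional form (input file, output file, quantization type) with default options. `QUANTIZE_BIN` and `quantize_all` are hypothetical names, not part of the original workflow.

```shell
# Assumption: llama-quantize is on PATH; override QUANTIZE_BIN to point elsewhere.
QUANTIZE_BIN="${QUANTIZE_BIN:-llama-quantize}"

# Hypothetical helper: produce every quantization listed in the Files table
# from the F16 source file, naming each output after its type.
quantize_all() {
  src="$1"
  for q in Q8_0 Q6_K Q4_K_M Q4_K_S Q4_0 Q3_K_S Q2_K; do
    # Positional form: <input.gguf> <output.gguf> <type>
    "$QUANTIZE_BIN" "$src" "$q.gguf" "$q" || return 1
  done
}

# Usage: quantize_all model.gguf
```

Output from a different llama.cpp build may not match the published files byte for byte, so checksums are only comparable when the same build is used.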

Usage

This model was instruction-tuned with the Alpaca prompt format. It is not a ChatML-style multi-turn chat model, so use completion mode and include the prompt template. The GGUF files intentionally do not include tokenizer.chat_template, because llama.cpp chat mode would otherwise try to apply the wrong template.

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
What is the capital of Japan?

### Response:

Example with llama.cpp:

cat > prompt.txt <<'EOF'
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
What is the capital of Japan?

### Response:
EOF

llama-completion \
  -hf di-zhang-fdu/Supra-50M-Instruct-GGUF:Q4_K_M \
  -f prompt.txt \
  -n 128 \
  --temp 0.7 \
  --top-k 50 \
  --top-p 0.9 \
  --repeat-penalty 1.15 \
  -no-cnv

For deterministic checks, use --temp 0 --top-k 1.
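A minimal sketch that combines the prompt template with the deterministic settings so any instruction can be checked repeatably. The helper names `alpaca_prompt` and `deterministic_check` and the `LLAMA_BIN` variable are hypothetical. The flags are the ones used in the example above, with the sampling options swapped for `--temp 0 --top-k 1`.

```shell
# Assumption: llama-completion is on PATH; override LLAMA_BIN to point elsewhere.
LLAMA_BIN="${LLAMA_BIN:-llama-completion}"

# Hypothetical helper: wrap a single instruction in the Alpaca template.
alpaca_prompt() {
  printf '%s\n\n### Instruction:\n%s\n\n### Response:\n' \
    'Below is an instruction that describes a task. Write a response that appropriately completes the request.' \
    "$1"
}

# Hypothetical helper: run one greedy completion for an instruction.
deterministic_check() {
  prompt_file="$(mktemp)"
  alpaca_prompt "$1" > "$prompt_file"
  "$LLAMA_BIN" \
    -hf di-zhang-fdu/Supra-50M-Instruct-GGUF:Q4_K_M \
    -f "$prompt_file" \
    -n 128 \
    --temp 0 \
    --top-k 1 \
    -no-cnv
  status=$?
  rm -f "$prompt_file"
  return "$status"
}

# Usage: deterministic_check 'What is the capital of Japan?'
```

Running the same instruction twice and comparing the outputs is a simple way to confirm that a setup is reproducible.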


Source: Hugging Face