Model Intelligence Sheet

build-small-hackathon/proofkit-distilled-qwen0.5b-gguf overview

Runs locally from a ~379.4 MB file on 4 GB VRAM class GPUs, using llama.cpp or guIDE.

Downloads: 0 · Likes: 0 · Pipeline: text-generation

Repository Files & Downloads

1 GGUF file detected. Direct downloads for local inference:

| File | Type | Quantization | Size |
|---|---|---|---:|
| proofkit-distilled-qwen0.5b-q4_k_m.gguf | GGUF | Q4_K_M | 379.4 MB |

Model Details

| Field | Value |
|---|---|
| Model ID | build-small-hackathon/proofkit-distilled-qwen0.5b-gguf |
| Author | build-small-hackathon |
| Pipeline | text-generation |
| License | apache-2.0 |
| Base model | visproj/proofkit-distilled-qwen0.5b |
| Last modified | 2026-06-12T23:11:36.000Z |

Model README

---
license: apache-2.0
base_model: visproj/proofkit-distilled-qwen0.5b
library_name: llama.cpp
pipeline_tag: text-generation
language:
  - en
tags:
  - proofkit
  - gguf
  - llama.cpp
  - distilled
  - build-small-hackathon
  - work-sample
---

ProofKit Qwen 0.5B — distilled (GGUF)

This is the llama.cpp / GGUF build of visproj/proofkit-distilled-qwen0.5b, a Qwen 0.5B student distilled from ProofKit's fine-tuned gpt-oss-20b teacher. It is the default model the ProofKit Space serves: it runs on CPU via llama.cpp, so the app works on a free Space with no GPU.

  • Quantization: q4_k_m (~400 MB)
  • Runtime: llama-cpp-python / llama.cpp
  • Chat template: Qwen2 (embedded in the GGUF metadata)

Usage

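from llama_cpp import Llama

# Placeholders: real prompts must follow the shapes in ProofKit's prompt_formats.py.
SYSTEM = "<system prompt in ProofKit's format>"
PROMPT = "<user prompt in ProofKit's format>"

# Fetch the matching GGUF file from the Hub (needs huggingface-hub) and load it.
llm = Llama.from_pretrained(
    repo_id="visproj/proofkit-distilled-qwen0.5b-gguf",
    filename="*q4_k_m.gguf",  # glob pattern matching the quantized file
    n_ctx=4096,  # context window, in tokens
)
resp = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": PROMPT},
    ],
    temperature=0.0,  # greedy decoding for repeatable output
)
print(resp["choices"][0]["message"]["content"])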

Configure it in ProofKit with:

export PROOFKIT_DISTILLED_MODELS='ProofKit Qwen 0.5B Distilled=visproj/proofkit-distilled-qwen0.5b-gguf|*q4_k_m.gguf'
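
The value above appears to pack three fields into one string in the shape `label=repo_id|filename`: a display label, a Hub repo ID, and a filename glob. The sketch below shows one way such an entry could be split. It is not ProofKit's actual parser: `ModelSpec` and `parse_model_entry` are hypothetical names, and because the separator between multiple entries is not documented here, only a single entry is handled.

```python
from typing import NamedTuple


class ModelSpec(NamedTuple):
    """Hypothetical container for one PROOFKIT_DISTILLED_MODELS entry."""
    label: str
    repo_id: str
    filename: str


def parse_model_entry(entry: str) -> ModelSpec:
    """Split 'label=repo_id|filename' into its three parts."""
    # Split on the first '=' so the label may contain dots and spaces.
    label, sep, target = entry.partition("=")
    if not sep:
        raise ValueError(f"missing '=' in entry: {entry!r}")
    # Split the remainder on the first '|' into repo ID and filename glob.
    repo_id, sep, filename = target.partition("|")
    if not sep:
        raise ValueError(f"missing '|' in entry: {entry!r}")
    return ModelSpec(label.strip(), repo_id.strip(), filename.strip())


spec = parse_model_entry(
    "ProofKit Qwen 0.5B Distilled=visproj/proofkit-distilled-qwen0.5b-gguf|*q4_k_m.gguf"
)
print(spec)
```

The resulting `repo_id` and `filename` line up with the arguments passed to `Llama.from_pretrained` in the usage snippet.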

Evaluation (post-fix, 3-judge panel)

Mean scores (0–100) on 15 held-out prompts, graded by three judges: Claude Opus 4.7, GPT-5.5, and a local Qwen-3B. The gpt-oss experts row is a deliberately un-retrained stale control.

| Model | Claude | GPT-5.5 | Qwen-3B | Avg |
|---|---:|---:|---:|---:|
| gpt-5.5 (frontier ceiling) | 94.6 | 95.6 | 90.8 | 93.7 |
| gpt-oss attn (retrained teacher) | 82.0 | 66.8 | 81.4 | 76.7 |
| qwen-0.5b distilled (served) | 79.0 | 68.6 | 82.2 | 76.6 |
| qwen-0.5b direct 7k (served) | 78.6 | 64.4 | 82.0 | 75.0 |
| gpt-oss experts (stale control) | 67.6 | 68.6 | 81.8 | 72.7 |
| qwen-3b base | 62.1 | 67.1 | 80.5 | 69.9 |
| gpt-oss base | 55.4 | 53.8 | 68.2 | 59.1 |
| qwen-0.5b base | 36.5 | 44.5 | 67.9 | 49.7 |

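On the three-judge average, both served retrained 0.5Bs beat the stale control and every untuned base, and the distilled 0.5B roughly ties its own 20B teacher (76.6 vs. 76.7). The per-judge picture is less uniform: under the GPT-5.5 judge the distilled model only ties the stale control (68.6 each), and the direct 7k model (64.4) scores below both the stale control and the qwen-3b base (67.1).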
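
The comparisons in this section can be rechecked from the table itself. The sketch below copies the per-judge scores, recomputes each three-judge average, and lists the judges under which one model scores strictly higher than another. The helper names `average` and `judges_won` are illustrative. Because the inputs are already rounded to one decimal, a recomputed average may differ from the published Avg column by 0.1.

```python
JUDGES = ("Claude", "GPT-5.5", "Qwen-3B")

# Per-judge mean scores copied from the evaluation table, in JUDGES order.
SCORES = {
    "gpt-5.5": (94.6, 95.6, 90.8),
    "gpt-oss attn": (82.0, 66.8, 81.4),
    "qwen-0.5b distilled": (79.0, 68.6, 82.2),
    "qwen-0.5b direct 7k": (78.6, 64.4, 82.0),
    "gpt-oss experts": (67.6, 68.6, 81.8),
    "qwen-3b base": (62.1, 67.1, 80.5),
    "gpt-oss base": (55.4, 53.8, 68.2),
    "qwen-0.5b base": (36.5, 44.5, 67.9),
}


def average(model: str) -> float:
    """Mean of the three judge scores, rounded to one decimal."""
    scores = SCORES[model]
    return round(sum(scores) / len(scores), 1)


def judges_won(model: str, rival: str) -> list[str]:
    """Judges under which `model` scores strictly higher than `rival`."""
    return [
        judge
        for judge, ours, theirs in zip(JUDGES, SCORES[model], SCORES[rival])
        if ours > theirs
    ]


for served in ("qwen-0.5b distilled", "qwen-0.5b direct 7k"):
    print(served, average(served), judges_won(served, "gpt-oss experts"))
```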

About ProofKit

ProofKit is a work-sample generator for job seekers: it turns a target role, background, and skills-to-prove into a realistic, clearly fictional practice work sample (a role-specific challenge, a guided builder, a readiness review, and a recruiter-ready portfolio packet). It was built for the Hugging Face **Build Small Hackathon** (Backyard AI track). Integrity rules are load-bearing: outputs never claim real employment, metrics are labeled hypothetical, and exports carry an ethical disclosure.

The ProofKit model family

| Repo | What it is |
|---|---|
| visproj/proofkit-qwen0.5b-7k | Qwen2.5-0.5B fine-tuned directly on the 7k set (Transformers) |
| visproj/proofkit-gpt-oss-20b-lora | gpt-oss-20b LoRA (the distillation teacher) |
| visproj/proofkit-distilled-qwen0.5b | Qwen2.5-0.5B distilled from the teacher (merged) |
| visproj/proofkit-distilled-qwen0.5b-gguf | GGUF of the distilled student (llama.cpp, served) |
| visproj/proofkit-sft | SFT dataset (synthetic, license-safe) |
| visproj/proofkit-distill-qwen0.5b | Distillation dataset (teacher completions) |

A note on training data (the "static responses" fix)

An earlier version of these models produced repetitive, input-ignoring drafts. The root cause was synthetic-data leakage: the dataset rendered the example *user answers and the target* from the same template slots, so the model learned target = template instead of target = f(input). The fix combined faithfulness anchors (a distinctive token shared by the answer and the target) with **seeded per-example variation** across every task, followed by a full-chain retrain. The current weights reflect that fix.
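
The two ingredients of the fix can be illustrated with a toy generator. This is a hypothetical sketch, not ProofKit's data pipeline: `make_example`, the word lists, and the sentence templates are invented for illustration. Each example gets its own seeded random generator (per-example variation), and a distinctive anchor token is written into both the user answer and the target, so the target cannot be produced without reading the input.

```python
import random

# Hypothetical vocabularies; the real dataset's template slots are not shown here.
PROJECTS = ["billing", "search", "onboarding", "reporting"]
VERBS = ["migrated", "audited", "redesigned", "automated"]


def make_example(index: int, base_seed: int = 0) -> dict:
    """Build one synthetic (answer, target) pair that shares an anchor token."""
    # A separate RNG per example keeps generation reproducible yet varied.
    rng = random.Random(f"{base_seed}:{index}")
    project = rng.choice(PROJECTS)
    verb = rng.choice(VERBS)
    # Faithfulness anchor: a distinctive token that appears in the input
    # and must be carried over into the target.
    anchor = f"{project}-{rng.randrange(1000, 10000)}"
    answer = f"I {verb} the {anchor} workflow."
    target = f"Work sample draft: the candidate {verb} the {anchor} workflow."
    return {"answer": answer, "target": target, "anchor": anchor}


examples = [make_example(i) for i in range(20)]
for ex in examples[:3]:
    print(ex["answer"], "->", ex["target"])
```

A simple dataset check follows from this design: if an anchor is missing from either side of a pair, that pair can be learned as a fixed template rather than as a function of the input.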

Prompt format is a frozen contract

These 0.5B models were trained on the exact prompt shapes from ProofKit's prompt_formats.py. They only behave well when prompted in that format; reworded or free-form prompts push them off-distribution. They are purpose-built components of the ProofKit app, not general chat models.

Source: Hugging Face