GraySoft
Model Intelligence Sheet

Luminia/MiniCPM5-1B-Agent-GGUF overview

MiniCPM5-1B-Agent: a tiny agentic coding agent for CPU, a full fine-tune (large dataset capacity) of openbmb/MiniCPM5-1B https://huggingface.co/openbmb/MiniCPM5 …

Tags: transformers, gguf, agentic, code, tool-use, agent, minicpm, full-fine-tune, on-cpu, text-generation, en, dataset:nvidia/Nemotron-SFT-OpenCode-v1, dataset:nvidia/Nemotron-SFT-SWE-v2, dataset:nvidia/Nemotron-Terminal-Corpus, dataset:nvidia/Nemotron-SFT-Competitive-Programming-v2, dataset:nvidia/OpenCodeReasoning, dataset:nvidia/Nemotron-SFT-Agentic-v2, dataset:lambda/hermes-agent-reasoning-traces, dataset:openbmb/UltraData-SFT-2605, dataset:nvidia/SWE-Zero-openhands-trajectories, dataset:nvidia/SWE-Hero-openhands-trajectories, dataset:ricdomolm/mini-coder-trajs-400k, dataset:TeichAI/Hunter-Alpha-Coding-Agent-SFT, dataset:TeichAI/DeepSeek-v4-Pro-Agent

Runs locally from ~1.07 GB of disk (the Q8_0 file); suitable for 4 GB VRAM-class GPUs with llama.cpp / guIDE.

Downloads: 22
Likes: 0
Pipeline: text-generation
Author: Luminia

Repository Files & Downloads

2 GGUF files detected
Direct downloads for local inference
| File | Type | Quantization | Size |
|---|---|---|---:|
| MiniCPM5-1B-Agent-v4-Q8_0.gguf | GGUF | Q8_0 | 1.07 GB |
| MiniCPM5-1B-Agent-v4-f16.gguf | GGUF | F16 | 2.02 GB |
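
The two file sizes listed above are consistent with llama.cpp's Q8_0 layout, which stores each block of 32 weights as 32 int8 values plus one fp16 scale (34 bytes per 32 weights, or 8.5 bits per weight). A rough check follows. It assumes every tensor in the F16 file takes 2 bytes per weight and is quantized, and it ignores GGUF metadata. Both are simplifications, and the function name is illustrative.

```python
# Rough size check: estimate the Q8_0 file size from the F16 file size.
# Assumptions (not from the model card): all tensors are quantized and
# GGUF metadata overhead is negligible.

F16_BYTES_PER_WEIGHT = 2
Q8_0_BLOCK_WEIGHTS = 32        # weights per Q8_0 block
Q8_0_BLOCK_BYTES = 32 + 2      # 32 int8 values + one fp16 scale


def estimate_q8_0_bytes(f16_file_bytes: float) -> float:
    """Scale an F16 file size to the size expected after Q8_0 quantization."""
    n_weights = f16_file_bytes / F16_BYTES_PER_WEIGHT
    return n_weights * Q8_0_BLOCK_BYTES / Q8_0_BLOCK_WEIGHTS


f16_size = 2.02e9  # bytes, size of the F16 file listed above
q8_size = estimate_q8_0_bytes(f16_size)
print(f"estimated Q8_0 size: {q8_size / 1e9:.2f} GB")  # 1.07 GB
```

The estimate lands on the listed 1.07 GB. Real files can deviate slightly because some small tensors are usually kept at higher precision.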

Model Details

Model ID: Luminia/MiniCPM5-1B-Agent-GGUF
Author: Luminia
Pipeline: text-generation
License: apache-2.0
Base model: openbmb/MiniCPM5-1B
Last modified: 2026-06-10T17:27:30.000Z

Model README

---
license: apache-2.0
base_model: openbmb/MiniCPM5-1B
library_name: transformers
pipeline_tag: text-generation
tags:
- agentic
- code
- tool-use
- agent
- minicpm
- gguf
- full-fine-tune
- on-cpu
datasets:
- nvidia/Nemotron-SFT-OpenCode-v1
- nvidia/Nemotron-SFT-SWE-v2
- nvidia/Nemotron-Terminal-Corpus
- nvidia/Nemotron-SFT-Competitive-Programming-v2
- nvidia/OpenCodeReasoning
- nvidia/Nemotron-SFT-Agentic-v2
- lambda/hermes-agent-reasoning-traces
- openbmb/UltraData-SFT-2605
- nvidia/SWE-Zero-openhands-trajectories
- nvidia/SWE-Hero-openhands-trajectories
- ricdomolm/mini-coder-trajs-400k
- TeichAI/Hunter-Alpha-Coding-Agent-SFT
- TeichAI/DeepSeek-v4-Pro-Agent
- TeichAI/MiniMax-M2.1-Code-SFT
- armand0e/minimax-m3-claude-code-traces
- zake7749/deepseek-v4-pro-agent-tool-calling-trajectory
- armand0e/qwen3.7-max-pi-traces
- armand0e/kimi-k2.6-claude-code-traces
- Emperorizzis/ASTRA-SFT-1k
- nlile/misc-merged-claude-code-traces-v1
- WhitzardAgent/ClaudeCode-OpenHands
- peteromallet/my-dataclaw-data
- peteromallet/my-personal-codex-data
- woctordho/dataclaw
- lelouch0110/claudeset-community
- zhiyaowang/dataclaw-zhiyaowang
language:
- en
- code
---

MiniCPM5-1B-Agent

A tiny agentic coding agent for CPU: a full fine-tune (large dataset capacity) of openbmb/MiniCPM5-1B (RL+OPD checkpoint; 4 iterations, or ~6 days, of training), specialized to reason in <think>, call a small tool set (bash/read/write/edit/glob/grep), and follow a run -> read output -> debug -> patch -> verify loop. The whole loop runs on a free CPU.
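
The six-tool set named above can be pictured as a small dispatch table that an agent loop calls once per tool call. The sketch below is hypothetical: the function names, argument names, and return strings are illustrative and are not the model's actual serving harness. Parsing the model's tool-call output is left out.

```python
# Hypothetical dispatcher for a bash/read/write/edit/glob/grep tool set.
# All names and argument shapes are assumptions made for illustration.
import glob as globlib
import re
import subprocess
from pathlib import Path


def tool_bash(command: str, timeout: int = 30) -> str:
    # Unsandboxed shell execution; a real harness would isolate this.
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=timeout)
    return proc.stdout + proc.stderr


def tool_read(path: str) -> str:
    return Path(path).read_text()


def tool_write(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"wrote {len(content)} chars to {path}"


def tool_edit(path: str, old: str, new: str) -> str:
    text = Path(path).read_text()
    if old not in text:
        return "error: old string not found"
    Path(path).write_text(text.replace(old, new, 1))  # first match only
    return "edited"


def tool_glob(pattern: str) -> str:
    return "\n".join(sorted(globlib.glob(pattern, recursive=True)))


def tool_grep(pattern: str, path: str) -> str:
    rx = re.compile(pattern)
    lines = Path(path).read_text().splitlines()
    return "\n".join(f"{i}:{line}" for i, line in enumerate(lines, 1)
                     if rx.search(line))


TOOLS = {"bash": tool_bash, "read": tool_read, "write": tool_write,
         "edit": tool_edit, "glob": tool_glob, "grep": tool_grep}


def dispatch(name: str, args: dict) -> str:
    """Run one tool call and return its output as text for the next turn."""
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    try:
        return TOOLS[name](**args)
    except Exception as exc:  # return errors as text so the agent can debug
        return f"error: {exc}"
```

Returning errors as plain text, rather than raising, is what lets the debug and patch steps of the loop see what went wrong.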

Reproduce

The training scripts are in code/ (see code/README.md). This is the recipe and code, not a one-command runner: it also needs the 26 source HF datasets (listed below), the abliterated openbmb/MiniCPM5-1B base, a CUDA PyTorch env (torch cu128 + liger-kernel), and llama.cpp for the GGUF step. The final v4 data this produces is already bundled at dataset/. The SFT full fine-tune fits in ~15-18 GB VRAM; the custom DPO loop fits in 32 GB. The pipeline:

```bash
# 1) BUILD DATA -> train_v4.jsonl (45,762 rows). Keeps the proven v2 backbone WHOLE (42,224 rows) + ~3,538
#    CURATED rows: served-vocab gate, drop non-terminating / explore-only / over-long traces, solution-aware
#    MinHash dedup. Converters: code/data/converters/*.py; canonical render + assistant-span mask: code/data/schema.py
python code/data/build_v4.py

# 2) SFT - full fine-tune the abliterated base on the v4 mix (1 epoch; Liger fused CE + mem-efficient SDPA)
python code/train/sft.py --model <abliterated-base> \
  --train_file dataset/train_v4.jsonl --out outputs/sft_v4 \
  --epochs 1 --bsz 1 --accum 24 --lr 1e-5 --max_len 24576 --train_cap 24576

# 3) BUILD DPO PAIRS - ON-POLICY: run the SFT model over the training prompts, capture its OWN behaviour.
#    chosen = a VALID <function> tool call (the model's own correct format, else the gold call);
#    rejected = its real miss (rambles in <think> / answers in prose with no tool call). ~649 pairs.
python code/data/build_prefs_onpolicy_gpu.py --model outputs/sft_v4 \
  --src dataset/train_v4.jsonl --out dataset/dpo_onpolicy_v4.jsonl

# 4) DPO - full fine-tune (custom completion-only loop; fits 32 GB), reference = the SFT-v4 model
python code/train/dpo.py --model outputs/sft_v4 \
  --data dataset/dpo_onpolicy_v4.jsonl --out outputs/dpo_v4 \
  --beta 0.1 --lr 1e-6 --epochs 3 --accum 8

# 5) GGUF for CPU serving (f16 + Q8_0) - using llama.cpp (github.com/ggerganov/llama.cpp)
python llama.cpp/convert_hf_to_gguf.py outputs/dpo_v4 --outfile dpo_v4-f16.gguf --outtype f16
llama-quantize dpo_v4-f16.gguf dpo_v4-Q8_0.gguf Q8_0
```
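
For orientation, the SFT flags in step 2 imply the amount of optimizer work worked out below. This is plain arithmetic on the numbers quoted in the steps. It assumes every row of train_v4.jsonl is used once per epoch and that a final partial accumulation window still triggers an optimizer step, which the actual script may handle differently.

```python
import math

rows = 45_762   # rows in train_v4.jsonl (step 1)
bsz = 1         # --bsz
accum = 24      # --accum
epochs = 1      # --epochs

# With bsz=1 the effective batch comes entirely from gradient accumulation.
sequences_per_step = bsz * accum
optimizer_steps = math.ceil(rows * epochs / sequences_per_step)

print(sequences_per_step)  # 24
print(optimizer_steps)     # 1907
```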

---

<details>

<summary><b>Replicate this training</b></summary>

Non-obvious config behind the numbered Reproduce steps.

Dataset mix

Per-source CONTRIBUTED rows (pre-dedup):

| HF dataset | contributed | role / cluster |
|---|---:|---|
| nvidia/Nemotron-SFT-OpenCode-v1 | 11,995 | backbone, strong Qwen3-Coder teacher |
| nvidia/Nemotron-SFT-SWE-v2 | 6,995 | real-repo SWE patches |
| nvidia/Nemotron-Terminal-Corpus | 5,995 | terminal/bash agent |
| lambda/hermes-agent-reasoning-traces | 4,995 | gold <think> + tool format |
| nvidia/Nemotron-SFT-Competitive-Programming-v2 | 4,995 | reasoning to runnable code |
| ricdomolm/mini-coder-trajs-400k | 4,000 | curated KEEP addition |
| nvidia/OpenCodeReasoning | 3,995 | reasoning to code |
| nlile/misc-merged-claude-code-traces-v1 | 3,954 | census-recovered (real Claude-Code, Anthropic content-blocks) |
| nvidia/SWE-Zero-openhands-trajectories | 3,000 | curated KEEP addition |
| openbmb/UltraData-SFT-2605 | 2,995 | anti-forget anchor |
| TeichAI/DeepSeek-v4-Pro-Agent | 2,284 | pi-harness / Kimi session |
| zake7749/deepseek-v4-pro-agent-tool-calling-trajectory | 1,813 | curated KEEP addition |
| Emperorizzis/ASTRA-SFT-1k | 1,000 | curated KEEP addition |
| TeichAI/MiniMax-M2.1-Code-SFT | 916 | census-recovered (structured tool-use) |
| armand0e/minimax-m3-claude-code-traces | 30 | real MiniMax-M3 Claude-Code agentic traces |
| TeichAI/Hunter-Alpha-Coding-Agent-SFT | 780 | curated KEEP addition |
| woctordho/dataclaw | 465 | real Claude-Code / DataClaw usage |
| peteromallet/my-dataclaw-data | 445 | real Claude-Code / DataClaw usage |
| peteromallet/my-personal-codex-data | 289 | real Claude-Code / DataClaw usage |
| nvidia/SWE-Hero-openhands-trajectories | 264 | curated KEEP addition |
| nvidia/Nemotron-SFT-Agentic-v2 | 259 | agentic tool-use |
| zhiyaowang/dataclaw-zhiyaowang | 158 | real Claude-Code / DataClaw usage |
| WhitzardAgent/ClaudeCode-OpenHands | 118 | real Claude-Code / DataClaw usage |
| lelouch0110/claudeset-community | 69 | real Claude-Code / DataClaw usage |
| armand0e/qwen3.7-max-pi-traces | 24 | pi-harness / Kimi session |
| armand0e/kimi-k2.6-claude-code-traces | 6 | pi-harness / Kimi session |

26 sources, each converted to one canonical schema ({messages, tools} -> MiniCPM ChatML + <think> + XML <function> tool-calls), tool names normalized to the served vocab. The final v4 mix = 45,762 rows = the proven v2 backbone (42,224, kept whole) + ~3,538 curated additions (served-vocab gate + solution-aware dedup; the counts above are pre-dedup CONTRIBUTED). Zero truncation: ~36% of examples are >=12k tokens (~65% of all training tokens). Bundled under dataset/.
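
MinHash dedup, as mentioned above, estimates the Jaccard similarity between two texts from short signatures, so near-duplicates can be dropped without comparing full shingle sets. Below is a minimal pure-Python sketch. The shingle size, number of hash seeds, similarity threshold, and all function names are illustrative assumptions. It does not reproduce the "solution-aware" logic of code/data/build_v4.py, whose details the card does not give.

```python
import hashlib


def shingles(text: str, n: int = 5) -> set:
    """Overlapping n-token windows; short texts become a single shingle."""
    tokens = text.split()
    if len(tokens) < n:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def minhash_signature(text: str, num_perm: int = 64) -> list:
    """For each seeded hash function, keep the minimum hash over all shingles."""
    sh = shingles(text)
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(4, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8,
                                salt=salt).digest(), "little")
            for s in sh))
    return signature


def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of matching signature slots approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def dedup(rows: list, threshold: float = 0.8) -> list:
    """Keep a row only if it is not a near-duplicate of an earlier kept row."""
    kept, kept_sigs = [], []
    for row in rows:
        sig = minhash_signature(row)
        if all(estimated_jaccard(sig, k) < threshold for k in kept_sigs):
            kept.append(row)
            kept_sigs.append(sig)
    return kept
```

This version compares every row against every kept row, which is quadratic. Pipelines at this scale usually add locality-sensitive hashing (banding) so that only likely matches are compared.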

SFT (code/train/sft.py)

Memory tricks (full fine-tune of a 1B model in ~15-18 GB):

  • LigerFusedLinearCrossEntropyLoss is called directly in compute_loss, which saves ~10 GiB because the [B,L,130560] logits are never materialized.
  • Memory-efficient SDPA is forced (math backend off to avoid the O(L^2) OOM at long context; flash and cuDNN backends off); use_gqa_in_sdpa is set to False (repeat_kv path); bsz=1 with attention_mask=None takes the O(L) causal path, so the effective batch comes from gradient accumulation rather than batching.
  • Leak hygiene: empty_cache every 50 steps, garbage_collection_threshold:0.8, pin_memory=False.

Result: full-FT of a 1B at 24,576 ctx fits in ~15-18 GB VRAM.
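
To see why skipping the logits tensor matters, the size of a full [B, L, 130560] logits tensor at the training context length can be worked out directly. The byte widths below are assumptions for illustration: the card does not say which dtype the ~10 GiB figure refers to, and autograd typically holds more than one buffer of this shape.

```python
def logits_bytes(batch: int, seq_len: int, vocab: int, bytes_per_elem: int) -> int:
    """Bytes needed to hold one dense [batch, seq_len, vocab] logits tensor."""
    return batch * seq_len * vocab * bytes_per_elem


GIB = 2 ** 30
seq_len, vocab = 24_576, 130_560  # --max_len and the vocab size quoted above

bf16 = logits_bytes(1, seq_len, vocab, 2)  # assumed 2-byte elements
fp32 = logits_bytes(1, seq_len, vocab, 4)  # assumed 4-byte elements

print(f"bf16 logits: {bf16 / GIB:.2f} GiB")  # 5.98 GiB
print(f"fp32 logits: {fp32 / GIB:.2f} GiB")  # 11.95 GiB
```

Either way, a single dense logits buffer would take a large share of the ~15-18 GB budget, which is the motivation for fusing the final linear layer with the loss.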

DPO (code/train/dpo.py)

On-policy preference data (code/data/build_prefs_onpolicy_gpu.py): run the SFT model over the training prompts and capture its OWN behaviour. chosen = a valid <function> tool call (the model's own correct format, else the gold call); rejected = its real miss (rambles in <think> / answers in prose with no tool call). ~649 pairs. This rewards ACTING over stalling.

Custom DPO loop (TRL DPOTrainer was blocked by a mergekit dependency cascade; TRL KTO needs bsz>1, which OOMs at 13k): frozen bf16 reference, prompt span masked (loss on completion only). Extra memory trick over SFT: lm_head is applied to only the completion span, so the full [L, 130560] logit tensor is never materialized (fits 32 GB).
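
The completion-only DPO objective described above can be written down in a few lines. This is a generic sketch of the standard DPO loss with a prompt mask, not the code in code/train/dpo.py. The per-token log-probabilities and the mask layout are made-up inputs, and only beta = 0.1 comes from the Reproduce step.

```python
import math


def completion_logprob(token_logps: list, completion_mask: list) -> float:
    """Sum per-token log-probs over completion tokens only (prompt masked out)."""
    return sum(lp for lp, m in zip(token_logps, completion_mask) if m)


def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """-log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # softplus(-margin) equals -log(sigmoid(margin)); this form avoids overflow.
    return math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)


# Illustrative numbers: the first two tokens are prompt, the last two completion.
mask = [0, 0, 1, 1]
pi_c = completion_logprob([-0.3, -0.2, -0.5, -0.5], mask)   # policy, chosen
pi_r = completion_logprob([-0.3, -0.2, -2.0, -3.0], mask)   # policy, rejected
ref_c = completion_logprob([-0.3, -0.2, -1.0, -1.0], mask)  # frozen reference
ref_r = completion_logprob([-0.3, -0.2, -2.0, -2.0], mask)

loss = dpo_loss(pi_c, pi_r, ref_c, ref_r)
print(loss < math.log(2))  # True: the policy prefers chosen more than the reference does
```

When policy and reference agree exactly, the margin is zero and the loss is log 2. It falls below that as the policy shifts probability toward the chosen tool call relative to the reference.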

</details>

<details>

<summary><b>Output examples</b></summary>

Try it live on the demo Space - the agent runs the full write -> run -> verify loop on a free CPU and shows the trajectory + produced files inline:

  • "Write a Python script that makes a bar chart of 30, 45, 25 labeled A, B, C, saves chart.png, then run it." -> writes the script, runs it, the PNG renders inline.
  • "Make a little web page with a button that shows a different random quote each click." -> writes the HTML, renders it live in a sandboxed iframe.
  • "How many $40 video games can I buy in a year if I make $2000/mo and pay rent? Look up this year's average US rent, then work it out." -> web_search -> web_fetch -> compute.

</details>

Credits / inspiration (repos & tools)

opencode and claw-code (open coding-agent frameworks), smallcode (small-LLM agent patterns); DataClaw (Claude Code agent traces); TeichAI (distilled agent-trace datasets + their Datagen tool).


Source: Hugging Face