GraySoft
Model Intelligence Sheet

tobil/qmd-query-expansion-1.7b-gguf overview

Train small language models to expand search queries for QMD's hybrid retrieval pipeline.
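The model card's README (stored in `readme_markdown` in the metadata below) says the model answers with lines prefixed `lex:`, `vec:`, or `hyde:`. These are routed to BM25 full-text search, vector similarity search, and HyDE embedding retrieval respectively. It also treats chat-template leakage or any line without a valid prefix as a hard failure. The following is a minimal consumer-side sketch under those rules. `parse_expansion`, `Expansion`, and the backend labels are hypothetical names, not QMD's code, and skipping blank lines is an assumption.

```python
from dataclasses import dataclass, field

# Prefix -> backend bucket, following the README's description of QMD's
# three search backends (bucket names are hypothetical).
BACKENDS = {"lex": "bm25", "vec": "vector", "hyde": "hyde"}
# Chat-template markers whose presence the README treats as a hard failure.
TEMPLATE_MARKERS = ("<|im_start|>", "<|im_end|>")


@dataclass
class Expansion:
    """Expansion lines grouped by the backend they would be routed to."""
    bm25: list[str] = field(default_factory=list)
    vector: list[str] = field(default_factory=list)
    hyde: list[str] = field(default_factory=list)


def parse_expansion(text: str) -> Expansion:
    """Split model output into per-backend queries.

    Raises ValueError on the README's two hard failures: chat-template
    leakage, or a non-empty line without a lex:/vec:/hyde: prefix.
    """
    if any(marker in text for marker in TEMPLATE_MARKERS):
        raise ValueError("chat template leakage")
    result = Expansion()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are tolerated, not failures
        prefix, sep, body = line.partition(":")
        if not sep or prefix not in BACKENDS or not body.strip():
            raise ValueError(f"invalid line: {line!r}")
        getattr(result, BACKENDS[prefix]).append(body.strip())
    return result
```

For the README's `"auth config"` example, the two `lex:` lines would land in `bm25`, the two `vec:` lines in `vector`, and the single `hyde:` passage in `hyde`. Splitting on the first colon only means a HyDE passage may itself contain colons.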

Tags: gguf, query-expansion, search, qwen3, text-generation, en, arxiv:2212.10496, base_model:Qwen/Qwen3-1.7B, base_model:quantized:Qwen/Qwen3-1.7B, license:mit, endpoints_compatible, region:us, conversational
Downloads: 73,494
Likes: 18
Pipeline: text-generation
Library: (not specified)
Visibility: Public
Access: Open

Repository Files & Downloads

4 GGUF files listed (the repository contains 6 files in total, per `siblings_count`).

| File | Type | Quantization | Size |
|------|------|--------------|------|
| qmd-query-expansion-1.7B-f16.gguf | GGUF | F16 | 3.79 GB |
| qmd-query-expansion-1.7B-q4_k_m.gguf | GGUF | Q4_K_M | 1.19 GB |
| qmd-query-expansion-1.7B-q5_k_m.gguf | GGUF | Q5_K_M | 1.37 GB |
| qmd-query-expansion-1.7B-q8_0.gguf | GGUF | Q8_0 | 2.02 GB |
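To reproduce a download exactly, the revision can be pinned to the HF SHA listed under Model Details. The following is a minimal sketch, assuming the `huggingface_hub` package is installed. `hf_hub_download(repo_id=..., filename=..., revision=...)` is that library's standard single-file download call. `read_gguf_version` is a hypothetical helper that sanity-checks the result: GGUF files begin with the 4-byte magic `GGUF` followed by a little-endian uint32 format version.

```python
import struct

REPO_ID = "tobil/qmd-query-expansion-1.7B-gguf"
REVISION = "7816de0b72572c6c860ca1eddf97ba9e7fb8cc65"  # HF SHA from Model Details


def read_gguf_version(path: str) -> int:
    """Return the GGUF format version, or raise ValueError if not a GGUF file."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError(f"{path} is not a GGUF file")
    (version,) = struct.unpack("<I", header[4:8])  # little-endian uint32
    return version


if __name__ == "__main__":
    # Network access required; the Q4_K_M file is about 1.19 GB.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id=REPO_ID,
        filename="qmd-query-expansion-1.7B-q4_k_m.gguf",
        revision=REVISION,
    )
    print(local_path, "GGUF version", read_gguf_version(local_path))
```

Pinning `revision` to a commit SHA rather than the default branch keeps the downloaded bytes stable even if the repository is updated later.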

Model Details

Model slug: tobil/qmd-query-expansion-1.7b-gguf
Author: tobil
Pipeline task: text-generation
Library: (not specified)
Created: 2026-01-25
Last modified: 2026-01-29
Gated: No
Private: No
HF SHA: 7816de0b72572c6c860ca1eddf97ba9e7fb8cc65
License: MIT
Language: en
Base model: Qwen/Qwen3-1.7B
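The README specifies a single prompt for all tools: the Qwen3 chat template with the `/no_think` directive, which suppresses Qwen3's thinking mode so the model emits `lex:/vec:/hyde:` lines directly. The following is a minimal sketch of running a downloaded GGUF with llama-cpp-python, assuming that package is installed. `build_prompt` and `expand` are hypothetical helpers. `n_ctx=2048`, greedy decoding, and stopping on `<|im_end|>` are illustrative choices, not settings taken from the card.

```python
def build_prompt(query: str) -> str:
    """Raw Qwen3 chat prompt with /no_think, as given in the README."""
    return (
        "<|im_start|>user\n"
        f"/no_think Expand this search query: {query}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def expand(model_path: str, query: str, max_tokens: int = 256) -> str:
    # Imported lazily so build_prompt works without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)
    out = llm(
        build_prompt(query),
        max_tokens=max_tokens,
        temperature=0.0,      # assumption: greedy decoding for repeatable output
        stop=["<|im_end|>"],  # end of the assistant turn in the Qwen3 template
    )
    return out["choices"][0]["text"].strip()


if __name__ == "__main__":
    import sys

    # Usage: python expand.py ./qmd-query-expansion-1.7B-q4_k_m.gguf "auth config"
    print(expand(sys.argv[1], sys.argv[2]))
```

The prompt is passed as raw text rather than through a chat-completion API so the exact template from the README is used, independent of whatever chat template the GGUF metadata carries.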

Metadata Inspector

Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "license": "mit",
    "language": [
      "en"
    ],
    "base_model": "Qwen/Qwen3-1.7B",
    "tags": [
      "query-expansion",
      "search",
      "gguf",
      "qwen3"
    ],
    "pipeline_tag": "text-generation",
    "frontmatter": {
      "license": "mit",
      "language": [
        "en"
      ],
      "base_model": "Qwen/Qwen3-1.7B",
      "tags": [
        "query-expansion",
        "search",
        "gguf",
        "qwen3"
      ],
      "pipeline_tag": "text-generation"
    },
    "hero_image_url": "",
    "summary": "Train small language models to expand search queries for QMD's hybrid retrieval pipeline.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlicense: mit\nlanguage:\n  - en\nbase_model: Qwen/Qwen3-1.7B\ntags:\n  - query-expansion\n  - search\n  - gguf\n  - qwen3\npipeline_tag: text-generation\n---\n\n# QMD Query Expansion Fine-Tuning\n\nTrain small language models to expand search queries for [QMD](https://github.com/tobi/qmd)'s hybrid retrieval pipeline.\n\n## What This Does\n\nGiven a raw search query like `\"auth config\"`, the trained model produces structured expansions:\n\n```\nlex: authentication configuration\nlex: auth settings setup\nvec: how to configure authentication settings\nvec: authentication configuration options\nhyde: Authentication can be configured by setting the AUTH_SECRET environment variable.\n```\n\nThese feed into QMD's three search backends:\n- **`lex:`** lines go to BM25 full-text search (short, keyword-focused)\n- **`vec:`** lines go to vector similarity search (natural language phrases)\n- **`hyde:`** is a hypothetical document passage for embedding-based retrieval ([HyDE](https://arxiv.org/abs/2212.10496) technique)\n\n## Quick Start\n\n### Cloud training via HuggingFace Jobs (no GPU needed)\n\n```bash\n# 1. SFT: teach the model the output format (~45 min on A10G, ~$1.50)\nhf jobs uv run --flavor a10g-large --secrets HF_TOKEN --timeout 2h jobs/sft.py\n\n# 2. GRPO: RL refinement on top of SFT (~20 min on A10G, ~$0.50)\nhf jobs uv run --flavor a10g-large --secrets HF_TOKEN --timeout 4h jobs/grpo.py\n\n# 3. Evaluate against test queries (needs local GPU or use eval job)\nuv run eval.py --model tobil/qmd-query-expansion-1.7B-grpo \\\n               --sft-model tobil/qmd-query-expansion-1.7B-sft\n\n# 4. 
Convert to GGUF for local deployment (Ollama, llama.cpp)\nuv run convert_gguf.py --size 1.7B\n```\n\n### Local training (if you have a GPU)\n\n```bash\nuv run train.py sft  --config configs/sft.yaml\nuv run train.py grpo --config configs/grpo.yaml\n```\n\n### Monitoring HF Jobs\n\n```bash\nhf jobs ps                           # list running jobs\nhf jobs inspect <job-id>             # check status\nhf jobs logs <job-id>                # stream logs\nhf jobs cancel <job-id>              # cancel a job\n```\n\n## Prompt Format\n\nAll tools use the same prompt — **Qwen3 chat template with `/no_think`**:\n\n```\n<|im_start|>user\n/no_think Expand this search query: {query}<|im_end|>\n<|im_start|>assistant\n```\n\nThe `/no_think` directive suppresses Qwen3's chain-of-thought mode, producing\ndirect `lex:/vec:/hyde:` output without `<think>` blocks.\n\n## File Structure\n\n```\nfinetune/\n├── reward.py          # Scoring/reward function (single source of truth)\n├── train.py           # Unified SFT + GRPO training (two subcommands)\n├── eval.py            # Generate expansions and score them\n├── convert_gguf.py    # GGUF conversion for Ollama/llama.cpp\n├── jobs/\n│   ├── sft.py         # Self-contained SFT for HuggingFace Jobs\n│   ├── grpo.py        # Self-contained GRPO for HuggingFace Jobs\n│   ├── eval.py        # Self-contained eval for HuggingFace Jobs\n│   ├── eval_common.py # Shared eval utilities\n│   └── quantize.py    # GGUF quantization for HuggingFace Jobs\n├── configs/\n│   ├── sft.yaml       # SFT hyperparameters for Qwen3-1.7B\n│   └── grpo.yaml      # GRPO hyperparameters for Qwen3-1.7B\n├── evals/\n│   └── queries.txt    # 31 test queries across 8 categories\n├── data/\n│   └── qmd_expansion_v2.jsonl  # Source training data (1,000 high-quality examples)\n├── dataset/\n│   ├── generate_data.py         # Generate data via Claude API\n│   ├── generate_data_offline.py # Generate from existing HF dataset\n│   ├── prepare_data.py          # Format for Qwen3 
chat template\n│   └── clean_data.py            # Detect technical term misinterpretations\n├── SCORING.md         # Detailed scoring rubric reference\n└── README.md          # This file\n```\n\n## Training Pipeline\n\n### Stage 1: SFT (Supervised Fine-Tuning)\n\nTeaches the model the `lex:/vec:/hyde:` output format from labeled examples.\n\n| Parameter | Value |\n|-----------|-------|\n| Base model | `Qwen/Qwen3-1.7B` |\n| Method | LoRA (rank 16, alpha 32) |\n| Target modules | All projection layers (q/k/v/o/gate/up/down) |\n| Dataset | ~2,290 examples (train split) |\n| Effective batch size | 16 (4 × 4 gradient accumulation) |\n| Epochs | 5 |\n| Learning rate | 2e-4 (cosine schedule) |\n\n```bash\nuv run train.py sft --config configs/sft.yaml\nuv run train.py sft --config configs/sft.yaml --dry-run  # preview config\n```\n\n### Stage 2: GRPO (Group Relative Policy Optimization)\n\nReinforcement learning on top of the merged SFT weights. The model generates\nmultiple expansions per query, they are scored by the reward function, and the\nmodel is updated to prefer higher-scoring outputs.\n\n| Parameter | Value |\n|-----------|-------|\n| Base | Merged SFT checkpoint |\n| Method | LoRA (rank 4, alpha 8) — smaller for RL stability |\n| Target modules | q_proj, v_proj only |\n| Reward | `reward.py` (rule-based, 5 dimensions) |\n| KL beta | 0.04 — prevents drift from SFT checkpoint |\n| Generations per prompt | 4 |\n| Max steps | 200 |\n| Learning rate | 5e-7 |\n\n**Important:** `beta > 0` is critical. 
With `beta=0` the model experiences\ncatastrophic drift and scores drop to 0%.\n\n```bash\nuv run train.py grpo --config configs/grpo.yaml\nuv run train.py grpo --config configs/grpo.yaml --dry-run  # test reward function\n```\n\n## Evaluation\n\n`eval.py` generates expansions from a model and scores them against test queries:\n\n```bash\n# Evaluate an SFT model\nuv run eval.py --model tobil/qmd-query-expansion-1.7B-sft\n\n# Evaluate a GRPO model (needs SFT adapter merged first)\nuv run eval.py --model tobil/qmd-query-expansion-1.7B-grpo \\\n               --sft-model tobil/qmd-query-expansion-1.7B-sft\n\n# Verbose output with deduction details\nuv run eval.py --model tobil/qmd-query-expansion-1.7B-sft -v\n\n# Save detailed scores to JSON\nuv run eval.py --model tobil/qmd-query-expansion-1.7B-sft -o scores.json\n\n# Score an existing JSONL file (backwards compat with old run.py output)\nuv run eval.py --score-only evals/results_old.jsonl\n```\n\n## Reward Function\n\n`reward.py` is the single source of truth for scoring. 
It is used both as the\nGRPO reward signal during training and for evaluation.\n\nFive scoring dimensions (max 120 without hyde, 140 with):\n\n| Dimension | Points | What It Measures |\n|-----------|--------|------------------|\n| **Format** | 0-30 | Has lex/vec lines, no invalid lines |\n| **Diversity** | 0-30 | Multiple expansion types, diverse content, no query echoes |\n| **HyDE** | 0-20 | Present, 50-200 chars, single line, not repetitive |\n| **Quality** | 0-20 | Lex shorter than vec, natural language, preserves key terms |\n| **Entity** | -45 to +20 | Named entities preserved in lex and vec lines |\n| **Think bonus** | 0-20 | Reward for NOT using `<think>` mode |\n\n**Hard failures** (instant 0.0):\n- Chat template leakage (`<|im_start|>`, `<|im_end|>`, etc.)\n- Any line without a valid `lex:`, `vec:`, or `hyde:` prefix\n\n```bash\n# Self-test the reward function\nuv run reward.py\n```\n\n## GGUF Conversion\n\nMerges base + SFT + GRPO adapters into a single model and produces\nquantized GGUF files for deployment:\n\n```bash\n# Use preset for 1.7B\nuv run convert_gguf.py --size 1.7B\n\n# Use preset for 4B\nuv run convert_gguf.py --size 4B\n\n# Custom models\nuv run convert_gguf.py --base Qwen/Qwen3-1.7B \\\n                       --sft tobil/qmd-query-expansion-1.7B-sft \\\n                       --grpo tobil/qmd-query-expansion-1.7B-grpo \\\n                       --output tobil/qmd-query-expansion-1.7B-gguf\n```\n\n### Using with Ollama\n\n```bash\nhuggingface-cli download tobil/qmd-query-expansion-1.7B-gguf \\\n    qmd-query-expansion-1.7B-q4_k_m.gguf --local-dir .\n\necho 'FROM ./qmd-query-expansion-1.7B-q4_k_m.gguf' > Modelfile\nollama create qmd-expand -f Modelfile\nollama run qmd-expand\n```\n\n## Data Pipeline\n\nThe training data (1,000 examples in `data/qmd_expansion_v2.jsonl`) was generated\nfrom two sources and cleaned for quality. 
To regenerate:\n\n```bash\n# Generate from existing HuggingFace dataset (bulk, no API needed)\nuv run dataset/generate_data_offline.py\n\n# Generate via Claude API (higher quality, needs ANTHROPIC_API_KEY)\nuv run dataset/generate_data.py --count 100\n\n# Detect and fix technical term misinterpretations\nuv run dataset/clean_data.py\n\n# Format for Qwen3 chat template, add short-query augmentation, split train/val\nuv run dataset/prepare_data.py\n```\n\n## Architecture Notes\n\nThe two-stage training approach (SFT → GRPO) is standard for structured-output models:\n\n1. **SFT** establishes format compliance and basic query understanding. It uses\n   a large LoRA (rank 16, all projection layers) because it needs to learn a\n   new output format from scratch.\n\n2. **GRPO** refines quality within the learned format. It uses a small LoRA\n   (rank 4, q/v only) and KL regularization to make incremental improvements\n   without losing what SFT taught.\n\nThe reward function is entirely rule-based (no LLM judge) which makes it fast,\ndeterministic, and suitable as an RL signal. See `SCORING.md` for the full rubric.\n\n## Training Results (Qwen3-1.7B, v2)\n\n### SFT\n\n| Metric | Value |\n|--------|-------|\n| Final train loss | 0.472 |\n| Final eval loss | 0.304 |\n| Token accuracy (train) | 97.4% |\n| Token accuracy (eval) | 93.8% |\n| Epochs | 5 |\n| Hardware | A10G (24 GB VRAM) |\n\n### GRPO\n\n| Metric | Value |\n|--------|-------|\n| Mean reward | 0.757 |\n| Final loss | 0.0005 |\n| KL divergence | 0.00048 |\n| Mean completion length | ~58 tokens |\n| Training time | ~19 min (200 steps) |\n| Hardware | A10G (24 GB VRAM) |\n\n### Evaluation Scores\n\n| Model | Average Score | Excellent (30) |\n|-------|--------------|-----------------|\n| SFT | 92.0% | 30/30 |\n| GRPO | 91.7% | 30/30 |\n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "query-expansion",
    "search",
    "qwen3",
    "text-generation",
    "en",
    "arxiv:2212.10496",
    "base_model:Qwen/Qwen3-1.7B",
    "base_model:quantized:Qwen/Qwen3-1.7B",
    "license:mit",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 18,
  "downloads": 73494,
  "gated": false,
  "private": false,
  "last_modified": "2026-01-29T07:31:10.000Z",
  "created_at": "2026-01-25T16:17:58.000Z",
  "pipeline_tag": "text-generation",
  "library_name": ""
}
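The README in `readme_markdown` describes the GRPO stage as generating several expansions per query (4 generations per prompt), scoring them with the rule-based reward, and updating the model toward higher-scoring outputs. The "group relative" part of standard GRPO is that each completion's reward is compared only against its siblings for the same prompt. The following is a minimal sketch of that advantage computation. Whether QMD's training script uses exactly this normalization (population vs. sample standard deviation, the epsilon value) is an assumption, and the KL penalty weighted by beta (0.04 in the README) is omitted here.

```python
from statistics import fmean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Group-relative advantages for the completions of ONE prompt.

    Each reward is centered on the group mean and scaled by the group
    standard deviation (population std here; implementations differ), so
    completions that beat their siblings get positive advantages.
    """
    mu = fmean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion scores the same,
    # in which case all advantages are 0 and the group gives no signal.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For four completions scored `[0.9, 0.7, 0.8, 0.6]`, the advantages come out at roughly `[1.34, -0.45, 0.45, -1.34]`. Because advantages are relative within each group, the absolute reward scale matters less than how consistently the reward ranks sibling completions.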
Source payload excerpt (from Hugging Face API)
{
  "_id": "697642369ed904761a0a158b",
  "id": "tobil/qmd-query-expansion-1.7B-gguf",
  "modelId": "tobil/qmd-query-expansion-1.7B-gguf",
  "sha": "7816de0b72572c6c860ca1eddf97ba9e7fb8cc65",
  "createdAt": "2026-01-25T16:17:58.000Z",
  "lastModified": "2026-01-29T07:31:10.000Z",
  "author": "tobil",
  "downloads": 73494,
  "likes": 18,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "",
  "siblings_count": 6
}