GraySoft

leeminwaan/gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-gguf is indexed on GraySoft with repository links, GGUF quant files (Q2_K through Q8_0), and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes. See related models in the same shard below.

Model Intelligence Sheet

leeminwaan/gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-gguf overview

This repository contains multiple quantized versions of the gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts model in GGUF format. It is intended for efficient inference on consumer hardware, making large model deployment more accessible.

Tags: gguf, text-generation, license:apache-2.0, endpoints_compatible, region:us, conversational
Downloads: 150
Likes: 2
Pipeline: text-generation
Library: (not specified)
Visibility: Public
Access: Open

Repository Files & Downloads

14 files detected
Direct downloads for all repository files
| File | Type | Quantization | Size |
|------|------|--------------|------|
| AmanPriyanshu_gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-q2_k.gguf | GGUF | Q2_K | 2.45 GB |
| AmanPriyanshu_gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-q3_k_l.gguf | GGUF | Q3_K_L | 2.66 GB |
| AmanPriyanshu_gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-q3_k_m.gguf | GGUF | Q3_K_M | 2.58 GB |
| AmanPriyanshu_gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-q3_k_s.gguf | GGUF | Q3_K_S | 2.44 GB |
| AmanPriyanshu_gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-q4_0.gguf | GGUF | Q4_0 | 2.48 GB |
| AmanPriyanshu_gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-q4_1.gguf | GGUF | Q4_1 | 2.69 GB |
| AmanPriyanshu_gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-q4_k_m.gguf | GGUF | Q4_K_M | 3.01 GB |
| AmanPriyanshu_gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-q4_k_s.gguf | GGUF | Q4_K_S | 2.87 GB |
| AmanPriyanshu_gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-q5_0.gguf | GGUF | Q5_0 | 2.90 GB |
| AmanPriyanshu_gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-q5_1.gguf | GGUF | Q5_1 | 3.11 GB |
| AmanPriyanshu_gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-q5_k_m.gguf | GGUF | Q5_K_M | 3.21 GB |
| AmanPriyanshu_gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-q5_k_s.gguf | GGUF | Q5_K_S | 3.09 GB |
| AmanPriyanshu_gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-q6_k.gguf | GGUF | Q6_K | 4.09 GB |
| AmanPriyanshu_gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-q8_0.gguf | GGUF | Q8_0 | 4.16 GB |
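
The file names listed here carry an `AmanPriyanshu_` prefix, whereas the README snippet stored in the metadata further down uses an unprefixed name, which does not appear among the 14 listed files. The following is a minimal sketch, not the author's method, of picking a quantization by file-size budget and building the matching file name. The helper names (`gguf_filename`, `largest_quant_within`, `download`) are hypothetical, the sizes are copied from the listing, and treating file size as a stand-in for quality is only a rough assumption.

```python
from typing import Optional

REPO_ID = "leeminwaan/gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-GGUF"
FILE_PREFIX = "AmanPriyanshu_gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts"

# File sizes in GB, copied from the repository file listing on this page.
SIZES_GB = {
    "Q2_K": 2.45, "Q3_K_S": 2.44, "Q3_K_M": 2.58, "Q3_K_L": 2.66,
    "Q4_0": 2.48, "Q4_1": 2.69, "Q4_K_S": 2.87, "Q4_K_M": 3.01,
    "Q5_0": 2.90, "Q5_1": 3.11, "Q5_K_S": 3.09, "Q5_K_M": 3.21,
    "Q6_K": 4.09, "Q8_0": 4.16,
}


def gguf_filename(quant: str) -> str:
    """Build the file name for a quantization label such as 'Q4_K_M'."""
    return f"{FILE_PREFIX}-{quant.lower()}.gguf"


def largest_quant_within(budget_gb: float) -> Optional[str]:
    """Return the largest listed file that fits the budget, or None."""
    fitting = {q: size for q, size in SIZES_GB.items() if size <= budget_gb}
    return max(fitting, key=fitting.get) if fitting else None


def download(quant: str) -> str:
    """Download one quant file; needs the huggingface_hub package and network access."""
    from huggingface_hub import hf_hub_download  # imported lazily on purpose
    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant))


if __name__ == "__main__":
    choice = largest_quant_within(3.0)
    if choice is None:
        print("No listed file fits the budget.")
    else:
        print(choice, gguf_filename(choice))
        # local_path = download(choice)  # uncomment to fetch the file
```

The budget covers only the file on disk; the runtime needs additional memory for context and buffers, so leave headroom when choosing.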

Model Details

Model Slug: leeminwaan/gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-gguf
Author: leeminwaan
Pipeline Task: text-generation
Library: (not specified)
Created: 2025-09-01
Last Modified: 2025-09-01
Gated: No
Private: No
HF SHA: 16687acc51dd81660ee2d2ae96ddf15424f9dc0b
License: apache-2.0
Language: Unknown
Base Model: gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts

Metadata Inspector

Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "license": "apache-2.0",
    "base_model": "gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts",
    "pipeline_tag": "text-generation",
    "frontmatter": {
      "license": "apache-2.0",
      "base_model": "gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts",
      "pipeline_tag": "text-generation"
    },
    "hero_image_url": "",
    "summary": "This repository contains multiple quantized versions of the gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts model in GGUF format. It is intended for efficient inference on consumer hardware, making large model deployment more accessible.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlicense: apache-2.0\nbase_model: gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts\npipeline_tag: text-generation\n---\n\n# Model Card for gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-GGUF\n\nThis repository contains multiple quantized versions of the gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts model in GGUF format.  \nIt is intended for efficient inference on consumer hardware, making large model deployment more accessible.  \n\n## Model Details\n\n### Model Description\n\n- **Developed by:** leeminwaan  \n- **Funded by [optional]:** Independent project  \n- **Shared by [optional]:** leeminwaan  \n- **Model type:** Decoder-only transformer language model  \n- **Language(s) (NLP):** English (primary), multilingual capabilities not benchmarked  \n- **License:** Apache-2.0  \n\n### Model Sources\n\n- **Repository:** [Hugging Face Repo](https://huggingface.co/leeminwaan/gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-GGUF)  \n- **Paper [optional]:** Not available  \n- **Demo [optional]:** To be released  \n\n## How to Get Started with the Model\n\n```python\nfrom huggingface_hub import hf_hub_download\n\nmodel_path = hf_hub_download(\"leeminwaan/gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-GGUF\", \"gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-q4_k_m.gguf\")\nprint(\"Downloaded:\", model_path)\n````\n\nQuantized versions available:\n\n* Q2\\_K, Q3\\_K\\_S, Q3\\_K\\_M, Q3\\_K\\_L\n* Q4\\_0, Q4\\_1, Q4\\_K\\_S, Q4\\_K\\_M\n* Q5\\_0, Q5\\_1, Q5\\_K\\_S, Q5\\_K\\_M\n* Q6\\_K, Q8\\_0\n\n## Training Details\n\n### Training Data\n\n* Based on gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts pretraining corpus (public large-scale web text, open datasets).\n* No additional fine-tuning was performed for this release.\n\n### Training Procedure\n\n* Original gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts → quantized to GGUF formats.\n\n### Quantization Results\n\n| Quantization | Size 
(vs. FP16) | Speed     | Quality    | Recommended For                      |\n|--------------|-----------------|-----------|------------|--------------------------------------|\n| Q2_K         | Smallest        | Fastest   | Low        | Prototyping, minimal RAM/CPU         |\n| Q3_K_S       | Very Small      | Very Fast | Low-Med    | Lightweight devices, testing         |\n| Q3_K_M       | Small           | Fast      | Med        | Lightweight, slightly better quality |\n| Q3_K_L       | Small-Med       | Fast      | Med        | Faster inference, fair quality       |\n| Q4_0         | Medium          | Fast      | Good       | General use, chats, low RAM          |\n| Q4_1         | Medium          | Fast      | Good+      | Recommended, slightly better quality |\n| Q4_K_S       | Medium          | Fast      | Good+      | Recommended, balanced                |\n| Q4_K_M       | Medium          | Fast      | Good++     | Recommended, best Q4 option          |\n| Q5_0         | Larger          | Moderate  | Very Good  | Chatbots, longer responses           |\n| Q5_1         | Larger          | Moderate  | Very Good+ | More demanding tasks                 |\n| Q5_K_S       | Larger          | Moderate  | Very Good+ | Advanced users, better accuracy      |\n| Q5_K_M       | Larger          | Moderate  | Excellent  | Demanding tasks, high quality        |\n| Q6_K         | Large           | Slower    | Near FP16  | Power users, best quantized quality  |\n| Q8_0         | Largest         | Slowest   | FP16-like  | Maximum quality, high RAM/CPU        |\n\n> **Note:**  \n> - Lower quantization = smaller model, faster inference, but lower output quality.  \n> - Q4_K_M is ideal for most users; Q6_K/Q8_0 offer the highest quality, best for advanced use.  
\n> - All quantizations are suitable for consumer hardware—select based on your quality/speed needs.\n\n\n## Technical Specifications\n\n#### Software\n\n* llama.cpp for quantization\n* Python 3.10, huggingface\\_hub\n\n## Citation\n\n**BibTeX:**\n\n```bibtex\n@miscgpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-GGUF,\n  title=gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-GGUF Quantized Models},\n  author={leeminwaan},\n  year={2025},\n  howpublished={\\url{https://huggingface.co/leeminwaan/gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-GGUF}}\n}\n```\n\n**APA:**\n\n```\nleeminwaan. (2025). gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-GGUF Quantized Models [Computer software]. Hugging Face. https://huggingface.co/leeminwaan/gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-GGUF\n```\n\n## Glossary\n\n* **Quantization:** Reducing precision of weights to lower memory usage.\n* **GGUF:** Optimized format for llama.cpp inference.\n\n## More Information\n\n* This project is experimental.\n* Expect further updates and quantization benchmarks.\n\n## Model Card Authors\n\n* leeminwaan\n\n## Model Card Contact\n\n* Hugging Face: [leeminwaan](https://huggingface.co/leeminwaan)\n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "text-generation",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 2,
  "downloads": 150,
  "gated": false,
  "private": false,
  "last_modified": "2025-09-01T12:09:37.000Z",
  "created_at": "2025-09-01T12:02:57.000Z",
  "pipeline_tag": "text-generation",
  "library_name": ""
}
Source payload excerpt (from Hugging Face API)
{
  "_id": "68b58b713eab91ae003ab36b",
  "id": "leeminwaan/gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-GGUF",
  "modelId": "leeminwaan/gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-GGUF",
  "sha": "16687acc51dd81660ee2d2ae96ddf15424f9dc0b",
  "createdAt": "2025-09-01T12:02:57.000Z",
  "lastModified": "2025-09-01T12:09:37.000Z",
  "author": "leeminwaan",
  "downloads": 150,
  "likes": 2,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "",
  "siblings_count": 16
}
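
As a rough illustration of how the payload fields can be read, the sketch below parses the excerpt above and computes the time between `createdAt` and `lastModified`. The function names are hypothetical. The optional `fetch_live` helper assumes the `huggingface_hub` package is installed and uses its `HfApi().model_info` call; live values such as downloads and likes may differ from the snapshot shown on this page.

```python
import json
from datetime import datetime

# Payload excerpt copied verbatim from this page.
PAYLOAD = json.loads("""
{
  "_id": "68b58b713eab91ae003ab36b",
  "id": "leeminwaan/gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-GGUF",
  "modelId": "leeminwaan/gpt-oss-4.2b-specialized-all-pruned-moe-only-4-experts-GGUF",
  "sha": "16687acc51dd81660ee2d2ae96ddf15424f9dc0b",
  "createdAt": "2025-09-01T12:02:57.000Z",
  "lastModified": "2025-09-01T12:09:37.000Z",
  "author": "leeminwaan",
  "downloads": 150,
  "likes": 2,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "",
  "siblings_count": 16
}
""")


def parse_timestamp(value: str) -> datetime:
    """Parse timestamps shaped like '2025-09-01T12:02:57.000Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


def edit_window_seconds(payload: dict) -> float:
    """Seconds elapsed between repository creation and last modification."""
    created = parse_timestamp(payload["createdAt"])
    modified = parse_timestamp(payload["lastModified"])
    return (modified - created).total_seconds()


def fetch_live(repo_id: str):
    """Fetch current metadata; needs huggingface_hub and network access."""
    from huggingface_hub import HfApi  # imported lazily on purpose
    return HfApi().model_info(repo_id)


if __name__ == "__main__":
    print("Edit window (s):", edit_window_seconds(PAYLOAD))
    # info = fetch_live(PAYLOAD["id"])  # uncomment to compare with live data
    # print(info.sha == PAYLOAD["sha"])
```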