dervig/m51lab-minimax-m2.7-reap-139b-a10b-gguf - Free GGUF Download is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes. See related models in the same shard below.
Model Intelligence Sheet
dervig/m51lab-minimax-m2.7-reap-139b-a10b-gguf overview
GGUF quantizations of dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B, the first publicly available REAP-40% pruned variant of MiniMax-M2.7. ---
Downloads
1,793
Likes
2
Pipeline
text-generation
Library
gguf
Visibility
Public
Access
Open
Repository Files & Downloads
6 files detected
Direct downloads for all repository files
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| MiniMax-M2.7-REAP-139B-IQ4_NL-MoE.gguf | GGUF | IQ4_NL | 74.57 GB | Download |
| MiniMax-M2.7-REAP-139B-IQ4_XS.gguf | GGUF | IQ4_XS | 69.15 GB | Download |
| MiniMax-M2.7-REAP-139B-Q3_K_M.gguf | GGUF | Q3_K_M | 62.01 GB | Download |
| MiniMax-M2.7-REAP-139B-Q4_K_M.gguf | GGUF | Q4_K_M | 78.40 GB | Download |
| MiniMax-M2.7-REAP-139B-Q6_K.gguf | GGUF | Q6_K | 106.40 GB | Download |
| MiniMax-M2.7-REAP-139B-Q8_0.gguf | GGUF | — | 137.78 GB | Download |
Model Details Live
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"license": "other",
"license_name": "modified-mit",
"license_link": "https://huggingface.co/MiniMaxAI/MiniMax-M2.7/blob/main/LICENSE",
"base_model": "dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B",
"base_model_relation": "quantized",
"tags": [
"minimax",
"moe",
"reap",
"gguf",
"quantized",
"llama-cpp"
],
"library_name": "gguf",
"pipeline_tag": "text-generation",
"frontmatter": {
"license": "other",
"license_name": "modified-mit",
"license_link": "https://huggingface.co/MiniMaxAI/MiniMax-M2.7/blob/main/LICENSE",
"base_model": "dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B",
"base_model_relation": "quantized",
"tags": [
"minimax",
"moe",
"reap",
"gguf",
"quantized",
"llama-cpp"
],
"library_name": "gguf",
"pipeline_tag": "text-generation"
},
"hero_image_url": "",
"summary": "GGUF quantizations of dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B, the first publicly available REAP-40% pruned variant of MiniMax-M2.7. ---",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nlicense: other\nlicense_name: modified-mit\nlicense_link: https://huggingface.co/MiniMaxAI/MiniMax-M2.7/blob/main/LICENSE\nbase_model: dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B\nbase_model_relation: quantized\ntags:\n - minimax\n - moe\n - reap\n - gguf\n - quantized\n - llama-cpp\nlibrary_name: gguf\npipeline_tag: text-generation\n---\n\n# m51Lab-MiniMax-M2.7-REAP-139B-A10B-GGUF\n\nGGUF quantizations of [`dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B`](https://huggingface.co/dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B), the first publicly available REAP-40% pruned variant of MiniMax-M2.7.\n\n---\n\n## Available quantizations\n\nSizes are approximate; the model card will refresh as each quant is uploaded to this repo.\n\n| Variant | Approx. size | Target hardware | Notes |\n|---------|--------------|-----------------|-------|\n| `Q4_K_M` | **~84 GB** | **96 GB Apple Silicon (Mac Studio M4 Max)** | **Recommended sweet spot.** Smoke-test verified 5/5. |\n| `IQ4_XS` | ~74 GB | 96 GB Apple Silicon with extra headroom | Smaller than Q4_K_M, marginally lower quality. |\n| `Q3_K_M` | ~66 GB | 64 GB Mac / 2×RTX 3090 | Budget option; expect some reasoning loss. |\n| `Q6_K` | ~114 GB | 128 GB Mac Ultra | High-quality. |\n| `Q8_0` | ~148 GB | 192+ GB systems | Near-lossless. |\n| `IQ4_NL-MoE` | ~80 GB | 96 GB Mac / 2×RTX 3090 | MoE-aware: `attn=Q8_0`, `experts=IQ4_NL`, `embed/output=Q6_K`. Mirrors ubergarm's mainline-compatible recipe. |\n\n## Which should you pick?\n\n- **96 GB Apple Silicon (Mac Studio M4 Max)**: **Q4_K_M** — ~84 GB leaves ~12 GB for KV cache at ~16K context.\n- **64 GB Mac**: Q3_K_M is the only variant that fits. Expect some reasoning-quality degradation.\n- **128 GB Mac Ultra / 2× A6000**: Q6_K for near-baseline quality.\n- **192+ GB system (dual H100 / RTX 6000 Ada)**: Q8_0 for minimal quality loss.\n- **Alternative to Q4_K_M on 96 GB**: `IQ4_NL-MoE` keeps attention at Q8_0 and quantizes only expert FFN tensors. Similar size, often better code/reasoning.\n\n## Evaluation\n\n**HumanEval pass@1 on Q4_K_M (on completed): 83.3 %** (90 / 108)\n\nFor problems where the model completed its `<think>` reasoning within a 32 K-token generation budget, the Q4_K_M quant solved 90 of 108 correctly.\n\n**Strict pass@1 (all 164 problems, cap-outs counted as fails): 54.9 %**\n\n56 of 164 problems exhausted the 32 K reasoning budget mid-`<think>` and are counted as fails under strict academic scoring. Allocate **≥64 K tokens to approach the 83 % ceiling**.\n\n**Methodology**: 2 × H100 80 GB, llama.cpp `/v1/chat/completions`, native `<think>` enabled, `temperature=0.2`, `top_p=0.95`, `max_tokens=32000`.\n\n*Prior methodology note*: an earlier evaluation using raw `/v1/completions` with chat-prose stripping (non-canonical for reasoning models) reported 65.2 %. The numbers above use the canonical chat-completion path.\n\n**Smoke test** (5 diverse pre-publish prompts): **5 / 5 PASS** — trivial arithmetic, Python Fibonacci, Norwegian response, MoE semantic explanation, JSON tool-call echo.\n\n## Memory & context sizing for consumer hardware\n\n### 96 GB Apple Silicon (primary target)\n\n| Variant | File size | ctx 8K | ctx 32K | ctx 60K | ctx 131K |\n|---|---|---|---|---|---|\n| **Q4_K_M** | 84 GB | ✓ | ✓ w/ KV `q8_0` | ✓ w/ KV `q4_0` | requires KV `q4_0` |\n| **IQ4_XS** | 74 GB | ✓ | ✓ | ✓ | ✓ w/ KV `q8_0` |\n| **Q3_K_M** | 66 GB | ✓ | ✓ | ✓ | ✓ |\n| **IQ4_NL-MoE** | 80 GB | ✓ | ✓ w/ KV `q8_0` | ✓ w/ KV `q4_0` | requires KV `q4_0` |\n| Q6_K / Q8_0 | 114 / 148 GB | too large for 96 GB system | — | — | — |\n\nThe native FP16 KV cache costs **~0.25 GB per 1K tokens** for this architecture (62 layers × 1024 KV dim × 2 bytes). That is non-trivial at long context: Q4_K_M at ctx=60K needs ~15 GB of KV cache alone.\n\n### KV cache quantization — essential for long context on 96 GB\n\nllama.cpp supports quantizing the KV cache with near-zero quality loss:\n\n```bash\n./llama-server -m MiniMax-M2.7-REAP-139B-A10B-Q4_K_M.gguf -c 65536 -ngl 99 --cache-type-k q8_0 --cache-type-v q8_0\n```\n\n| KV type | Size @ ctx=60K | Quality impact |\n|---|---|---|\n| FP16 (default) | 15 GB | baseline |\n| **`q8_0`** | 7.5 GB | essentially lossless (recommended) |\n| `q4_0` / `q4_1` | 3.8 GB | very small degradation, worth it for extreme context |\n\n### Other systems\n\n- **64 GB Mac / 2× RTX 3090**: Q3_K_M with `q8_0` KV fits at ctx=32K.\n- **128 GB Mac Ultra**: Q6_K comfortably at ctx=32K, tight at longer context.\n- **Dual H100 (160 GB) / 192 GB+ systems**: Q8_0 near-lossless, full context.\n\n## Known minor imperfection\n\nDuring integrity audit, one layer (`layer 0`) had expert keep-indices that differed from the REAP-retained set in ~86 of 154 positions. The bias-value mismatch is bounded by the layer-0 bias natural variance (`max |Δ|=0.75` on values `∈ [8.06, 8.88]`), so router behavior is essentially unchanged — confirmed by the 5/5 smoke test above. All other 61 layers are bit-perfect. Details in the [safetensors model card](https://huggingface.co/dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B#known-minor-imperfection).\n\n## Citation\n\nSee the [safetensors repo](https://huggingface.co/dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B#citation) for full citation details. Core references:\n- Lasby et al., **REAP the Experts** (arXiv:2510.13999)\n- MiniMax-M2.7 base model (MiniMaxAI)\n\n## License\n\nInherits the [Modified MIT License](https://huggingface.co/MiniMaxAI/MiniMax-M2.7/blob/main/LICENSE) from MiniMaxAI/MiniMax-M2.7.\n\n---\n\n_Published by [m51Lab](https://m51.ai) — open-source LLM contributions from the M51 AI OS group._\n",
"related_quantizations": []
},
"tags": [
"gguf",
"minimax",
"moe",
"reap",
"quantized",
"llama-cpp",
"text-generation",
"arxiv:2510.13999",
"base_model:dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B",
"base_model:quantized:dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B",
"license:other",
"endpoints_compatible",
"region:us",
"imatrix",
"conversational"
],
"likes": 2,
"downloads": 1793,
"gated": false,
"private": false,
"last_modified": "2026-04-16T10:20:38.000Z",
"created_at": "2026-04-15T14:11:14.000Z",
"pipeline_tag": "text-generation",
"library_name": "gguf"
}
Source payload excerpt (from Hugging Face API)
{
"_id": "69df9c82ed1407ee32c58638",
"id": "dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B-GGUF",
"modelId": "dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B-GGUF",
"sha": "3ad1531fe4dce8a6824511e5d820d3a10c8923ea",
"createdAt": "2026-04-15T14:11:14.000Z",
"lastModified": "2026-04-16T10:20:38.000Z",
"author": "dervig",
"downloads": 1793,
"likes": 2,
"gated": false,
"private": false,
"pipeline_tag": "text-generation",
"library_name": "gguf",
"siblings_count": 8
}