dervig/m51lab-minimax-m2.7-reap-139b-a10b-gguf is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. Use this page to pick a quantization for guIDE or another local runtime. Related models in the same shard are listed below.

Model Intelligence Sheet

dervig/m51lab-minimax-m2.7-reap-139b-a10b-gguf overview

GGUF quantizations of dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B, the first publicly available REAP-40% pruned variant of MiniMax-M2.7.

Tags: gguf, minimax, moe, reap, quantized, llama-cpp, text-generation, arxiv:2510.13999, base_model:dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B, base_model:quantized:dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B, license:other, endpoints_compatible, region:us, imatrix, conversational

Downloads

1,793

Likes

2

Pipeline

text-generation

Library

gguf

Visibility

Public

Access

Open

Repository Files & Downloads

6 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
MiniMax-M2.7-REAP-139B-IQ4_NL-MoE.gguf	GGUF	IQ4_NL	74.57 GB	Download
MiniMax-M2.7-REAP-139B-IQ4_XS.gguf	GGUF	IQ4_XS	69.15 GB	Download
MiniMax-M2.7-REAP-139B-Q3_K_M.gguf	GGUF	Q3_K_M	62.01 GB	Download
MiniMax-M2.7-REAP-139B-Q4_K_M.gguf	GGUF	Q4_K_M	78.40 GB	Download
MiniMax-M2.7-REAP-139B-Q6_K.gguf	GGUF	Q6_K	106.40 GB	Download
MiniMax-M2.7-REAP-139B-Q8_0.gguf	GGUF	Q8_0	137.78 GB	Download
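
As a rough aid for choosing among the files above, the following Python sketch picks the largest quantization that fits a given memory budget. The sizes are copied from the table; the function name `largest_fitting_quant` and the default 12 GB headroom for KV cache and runtime overhead are illustrative assumptions, not part of the repository or of any runtime. Treating "largest file that fits" as "best quality that fits" is also a simplification.

```python
# File sizes in GB, copied from the repository file table above.
QUANT_SIZES_GB = {
    "IQ4_NL-MoE": 74.57,
    "IQ4_XS": 69.15,
    "Q3_K_M": 62.01,
    "Q4_K_M": 78.40,
    "Q6_K": 106.40,
    "Q8_0": 137.78,
}


def largest_fitting_quant(memory_gb, headroom_gb=12.0):
    """Return the name of the largest quant file that fits, or None.

    headroom_gb is a hypothetical reserve for KV cache and runtime
    overhead; tune it for your context length and system.
    """
    budget = memory_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget}
    if not fitting:
        return None
    # Larger file is used here as a crude proxy for higher quality.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for memory in (64, 96, 128, 192):
        print(memory, "GB ->", largest_fitting_quant(memory))
```

With the default headroom, a 64 GB budget returns `None`; the 62.01 GB Q3_K_M file only fits there if the headroom is reduced to about 1 GB, which leaves very little room for context.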

Model Details Live

Model Slug

dervig/m51lab-minimax-m2.7-reap-139b-a10b-gguf

Author

dervig

Pipeline Task

text-generation

Library

gguf

Created

2026-04-15

Last Modified

2026-04-16

Gated

No

Private

No

HF SHA

3ad1531fe4dce8a6824511e5d820d3a10c8923ea

License

other (modified-mit)

Language

Unknown

Base Model

dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "license": "other",
    "license_name": "modified-mit",
    "license_link": "https://huggingface.co/MiniMaxAI/MiniMax-M2.7/blob/main/LICENSE",
    "base_model": "dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B",
    "base_model_relation": "quantized",
    "tags": [
      "minimax",
      "moe",
      "reap",
      "gguf",
      "quantized",
      "llama-cpp"
    ],
    "library_name": "gguf",
    "pipeline_tag": "text-generation",
    "frontmatter": {
      "license": "other",
      "license_name": "modified-mit",
      "license_link": "https://huggingface.co/MiniMaxAI/MiniMax-M2.7/blob/main/LICENSE",
      "base_model": "dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B",
      "base_model_relation": "quantized",
      "tags": [
        "minimax",
        "moe",
        "reap",
        "gguf",
        "quantized",
        "llama-cpp"
      ],
      "library_name": "gguf",
      "pipeline_tag": "text-generation"
    },
    "hero_image_url": "",
    "summary": "GGUF quantizations of dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B, the first publicly available REAP-40% pruned variant of MiniMax-M2.7. ---",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlicense: other\nlicense_name: modified-mit\nlicense_link: https://huggingface.co/MiniMaxAI/MiniMax-M2.7/blob/main/LICENSE\nbase_model: dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B\nbase_model_relation: quantized\ntags:\n  - minimax\n  - moe\n  - reap\n  - gguf\n  - quantized\n  - llama-cpp\nlibrary_name: gguf\npipeline_tag: text-generation\n---\n\n# m51Lab-MiniMax-M2.7-REAP-139B-A10B-GGUF\n\nGGUF quantizations of [`dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B`](https://huggingface.co/dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B), the first publicly available REAP-40% pruned variant of MiniMax-M2.7.\n\n---\n\n## Available quantizations\n\nSizes are approximate; the model card will refresh as each quant is uploaded to this repo.\n\n| Variant | Approx. size | Target hardware | Notes |\n|---------|--------------|-----------------|-------|\n| `Q4_K_M` | **~84 GB** | **96 GB Apple Silicon (Mac Studio M4 Max)** | **Recommended sweet spot.** Smoke-test verified 5/5. |\n| `IQ4_XS` | ~74 GB | 96 GB Apple Silicon with extra headroom | Smaller than Q4_K_M, marginally lower quality. |\n| `Q3_K_M` | ~66 GB | 64 GB Mac / 2×RTX 3090 | Budget option; expect some reasoning loss. |\n| `Q6_K` | ~114 GB | 128 GB Mac Ultra | High-quality. |\n| `Q8_0` | ~148 GB | 192+ GB systems | Near-lossless. |\n| `IQ4_NL-MoE` | ~80 GB  | 96 GB Mac / 2×RTX 3090 | MoE-aware: `attn=Q8_0`, `experts=IQ4_NL`, `embed/output=Q6_K`. Mirrors ubergarm's mainline-compatible recipe. |\n\n## Which should you pick?\n\n- **96 GB Apple Silicon (Mac Studio M4 Max)**: **Q4_K_M** — ~84 GB leaves ~12 GB for KV cache at ~16K context.\n- **64 GB Mac**: Q3_K_M is the only variant that fits. Expect some reasoning-quality degradation.\n- **128 GB Mac Ultra / 2× A6000**: Q6_K for near-baseline quality.\n- **192+ GB system (dual H100 / RTX 6000 Ada)**: Q8_0 for minimal quality loss.\n- **Alternative to Q4_K_M on 96 GB**: `IQ4_NL-MoE` keeps attention at Q8_0 and quantizes only expert FFN tensors. 
Similar size, often better code/reasoning.\n\n## Evaluation\n\n**HumanEval pass@1 on Q4_K_M (on completed): 83.3 %** (90 / 108)\n\nFor problems where the model completed its `<think>` reasoning within a 32 K-token generation budget, the Q4_K_M quant solved 90 of 108 correctly.\n\n**Strict pass@1 (all 164 problems, cap-outs counted as fails): 54.9 %**\n\n56 of 164 problems exhausted the 32 K reasoning budget mid-`<think>` and are counted as fails under strict academic scoring. Allocate **≥64 K tokens to approach the 83 % ceiling**.\n\n**Methodology**: 2 × H100 80 GB, llama.cpp `/v1/chat/completions`, native `<think>` enabled, `temperature=0.2`, `top_p=0.95`, `max_tokens=32000`.\n\n*Prior methodology note*: an earlier evaluation using raw `/v1/completions` with chat-prose stripping (non-canonical for reasoning models) reported 65.2 %. The numbers above use the canonical chat-completion path.\n\n**Smoke test** (5 diverse pre-publish prompts): **5 / 5 PASS** — trivial arithmetic, Python Fibonacci, Norwegian response, MoE semantic explanation, JSON tool-call echo.\n\n## Memory & context sizing for consumer hardware\n\n### 96 GB Apple Silicon (primary target)\n\n| Variant | File size | ctx 8K | ctx 32K | ctx 60K | ctx 131K |\n|---|---|---|---|---|---|\n| **Q4_K_M** | 84 GB | ✓ | ✓ w/ KV `q8_0` | ✓ w/ KV `q4_0` | requires KV `q4_0` |\n| **IQ4_XS** | 74 GB | ✓ | ✓ | ✓ | ✓ w/ KV `q8_0` |\n| **Q3_K_M** | 66 GB | ✓ | ✓ | ✓ | ✓ |\n| **IQ4_NL-MoE** | 80 GB | ✓ | ✓ w/ KV `q8_0` | ✓ w/ KV `q4_0` | requires KV `q4_0` |\n| Q6_K / Q8_0 | 114 / 148 GB | too large for 96 GB system | — | — | — |\n\nThe native FP16 KV cache costs **~0.25 GB per 1K tokens** for this architecture (62 layers × 1024 KV dim × 2 bytes). 
That is non-trivial at long context: Q4_K_M at ctx=60K needs ~15 GB of KV cache alone.\n\n### KV cache quantization — essential for long context on 96 GB\n\nllama.cpp supports quantizing the KV cache with near-zero quality loss:\n\n```bash\n./llama-server -m MiniMax-M2.7-REAP-139B-A10B-Q4_K_M.gguf   -c 65536 -ngl 99   --cache-type-k q8_0 --cache-type-v q8_0\n```\n\n| KV type | Size @ ctx=60K | Quality impact |\n|---|---|---|\n| FP16 (default) | 15 GB | baseline |\n| **`q8_0`** | 7.5 GB | essentially lossless (recommended) |\n| `q4_0` / `q4_1` | 3.8 GB | very small degradation, worth it for extreme context |\n\n### Other systems\n\n- **64 GB Mac / 2× RTX 3090**: Q3_K_M with `q8_0` KV fits at ctx=32K.\n- **128 GB Mac Ultra**: Q6_K comfortably at ctx=32K, tight at longer context.\n- **Dual H100 (160 GB) / 192 GB+ systems**: Q8_0 near-lossless, full context.\n\n## Known minor imperfection\n\nDuring integrity audit, one layer (`layer 0`) had expert keep-indices that differed from the REAP-retained set in ~86 of 154 positions. The bias-value mismatch is bounded by the layer-0 bias natural variance (`max |Δ|=0.75` on values `∈ [8.06, 8.88]`), so router behavior is essentially unchanged — confirmed by the 5/5 smoke test above. All other 61 layers are bit-perfect. Details in the [safetensors model card](https://huggingface.co/dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B#known-minor-imperfection).\n\n## Citation\n\nSee the [safetensors repo](https://huggingface.co/dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B#citation) for full citation details. Core references:\n- Lasby et al., **REAP the Experts** (arXiv:2510.13999)\n- MiniMax-M2.7 base model (MiniMaxAI)\n\n## License\n\nInherits the [Modified MIT License](https://huggingface.co/MiniMaxAI/MiniMax-M2.7/blob/main/LICENSE) from MiniMaxAI/MiniMax-M2.7.\n\n---\n\n_Published by [m51Lab](https://m51.ai) — open-source LLM contributions from the M51 AI OS group._\n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "minimax",
    "moe",
    "reap",
    "quantized",
    "llama-cpp",
    "text-generation",
    "arxiv:2510.13999",
    "base_model:dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B",
    "base_model:quantized:dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B",
    "license:other",
    "endpoints_compatible",
    "region:us",
    "imatrix",
    "conversational"
  ],
  "likes": 2,
  "downloads": 1793,
  "gated": false,
  "private": false,
  "last_modified": "2026-04-16T10:20:38.000Z",
  "created_at": "2026-04-15T14:11:14.000Z",
  "pipeline_tag": "text-generation",
  "library_name": "gguf"
}
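
The README stored in the metadata above quotes a KV cache cost of about 0.25 GB per 1K tokens (62 layers, 1024 KV dim, 2 bytes). The sketch below reproduces that arithmetic in Python. It assumes the 1024 KV dim applies to keys and values separately, which is what makes the figure come out near 0.25 GB per 1K tokens, and it uses the README's rounded byte widths for the quantized cache types. The function name `kv_cache_gb` is illustrative.

```python
# Architecture figures quoted in the README: 62 layers, 1024 KV dim.
N_LAYERS = 62
KV_DIM = 1024

# Approximate bytes per cached element. The quantized values follow the
# README's halving/quartering and ignore per-block scale overhead, so
# real quantized caches are slightly larger.
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 1.0, "q4_0": 0.5}


def kv_cache_gb(n_tokens, cache_type="f16"):
    """Estimate KV cache size in GB (10^9 bytes) for n_tokens of context."""
    # Factor 2 covers keys and values (assumption: 1024 dims each).
    bytes_per_token = N_LAYERS * KV_DIM * 2 * BYTES_PER_ELEMENT[cache_type]
    return n_tokens * bytes_per_token / 1e9


if __name__ == "__main__":
    for cache_type in ("f16", "q8_0", "q4_0"):
        print(cache_type, round(kv_cache_gb(60_000, cache_type), 1), "GB at 60K tokens")
```

At 60K tokens this gives roughly 15.2, 7.6 and 3.8 GB, in line with the 15 / 7.5 / 3.8 GB figures in the README's KV table.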

Source payload excerpt (from Hugging Face API)

{
  "_id": "69df9c82ed1407ee32c58638",
  "id": "dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B-GGUF",
  "modelId": "dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B-GGUF",
  "sha": "3ad1531fe4dce8a6824511e5d820d3a10c8923ea",
  "createdAt": "2026-04-15T14:11:14.000Z",
  "lastModified": "2026-04-16T10:20:38.000Z",
  "author": "dervig",
  "downloads": 1793,
  "likes": 2,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "gguf",
  "siblings_count": 8
}
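
For readers who want to map the payload fields above onto the "Model Details" values shown earlier, here is a minimal Python sketch. How GraySoft actually derives those fields is not documented on this page; the `summarize` function and its output keys are hypothetical and only illustrate one plausible mapping (lowercased slug, date-only timestamps).

```python
import json
from datetime import datetime, timezone

# Payload excerpt copied from the page.
PAYLOAD = """
{
  "_id": "69df9c82ed1407ee32c58638",
  "id": "dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B-GGUF",
  "modelId": "dervig/m51Lab-MiniMax-M2.7-REAP-139B-A10B-GGUF",
  "sha": "3ad1531fe4dce8a6824511e5d820d3a10c8923ea",
  "createdAt": "2026-04-15T14:11:14.000Z",
  "lastModified": "2026-04-16T10:20:38.000Z",
  "author": "dervig",
  "downloads": 1793,
  "likes": 2,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "gguf",
  "siblings_count": 8
}
"""


def to_date(timestamp):
    """Convert an ISO timestamp with a trailing Z to a YYYY-MM-DD string."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc).date().isoformat()


def summarize(payload):
    """Build a flat summary dict (hypothetical field names)."""
    return {
        "slug": payload["id"].lower(),
        "author": payload["author"],
        "created": to_date(payload["createdAt"]),
        "last_modified": to_date(payload["lastModified"]),
        "gated": payload["gated"],
        "private": payload["private"],
        "sha": payload["sha"],
    }


if __name__ == "__main__":
    print(json.dumps(summarize(json.loads(PAYLOAD)), indent=2))
```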

More models in this shard