The Q3_K_M GGUF quantization of lefromage/qwen3-next-80b-a3b-thinking-gguf is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata, all available as free downloads. This page helps you pick a local model for guIDE or other runtimes. Related models in the same shard are listed below.

Model Intelligence Sheet

lefromage/qwen3-next-80b-a3b-thinking-gguf overview

From the model card, the commands to build llama.cpp with CUDA support are:

    time cmake -B build -DGGML_CUDA=ON
    time cmake --build build --config Release --parallel $(nproc --all)

You may need to add /usr/local/cuda/bin to your PATH so that nvcc (the NVIDIA CUDA compiler) can be found. Building from source took about 7 minutes. For more detail on the CUDA build, see: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#cuda

Tags: gguf, GGUF, text-generation, base_model:Qwen/Qwen3-Next-80B-A3B-Thinking, base_model:quantized:Qwen/Qwen3-Next-80B-A3B-Thinking, license:apache-2.0, endpoints_compatible, region:us, conversational

Downloads

100

Likes

9

Pipeline

text-generation

Library

—

Visibility

Public

Access

Open

Repository Files & Downloads

9 GGUF files detected

Direct downloads for all GGUF files in the repository

File	Type	Quantization	Size	Link
Qwen__Qwen3-Next-80B-A3B-Thinking-MXFP4_MOE.gguf	GGUF	MXFP4_MOE	40.74 GB	Download
Qwen__Qwen3-Next-80B-A3B-Thinking-Q2_K.gguf	GGUF	Q2_K	27.13 GB	Download
Qwen__Qwen3-Next-80B-A3B-Thinking-Q3_K_M.gguf	GGUF	Q3_K_M	35.57 GB	Download
Qwen__Qwen3-Next-80B-A3B-Thinking-Q4_0.gguf	GGUF	Q4_0	41.98 GB	Download
Qwen__Qwen3-Next-80B-A3B-Thinking-Q4_K_M.gguf	GGUF	Q4_K_M	45.09 GB	Download
Qwen__Qwen3-Next-80B-A3B-Thinking-Q4_K_S.gguf	GGUF	Q4_K_S	42.36 GB	Download
Qwen__Qwen3-Next-80B-A3B-Thinking-Q5_K_M.gguf	GGUF	Q5_K_M	52.82 GB	Download
Qwen__Qwen3-Next-80B-A3B-Thinking-Q6_K.gguf	GGUF	Q6_K	61.03 GB	Download
Qwen__Qwen3-Next-80B-A3B-Thinking-Q8_0.gguf	GGUF	Q8_0	78.99 GB	Download
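The file sizes above can drive a quick fit check when choosing a quant for a single GPU. Below is a minimal Python sketch, assuming the model file must fit in GPU memory with some headroom left for the KV cache and compute buffers. The `headroom_gb` parameter and its 4 GB default are assumptions, not measured values; 4 GB happens to reproduce the model card's report that Q2_K, Q3_K_M, Q4_K_S, Q4_0 and MXFP4_MOE ran on a 48 GB NVIDIA L40S while Q4_K_M (45.09 GB) was too big. The function names are hypothetical.

```python
from typing import List, Optional

# File sizes in GB, copied from the repository file table above.
QUANT_SIZES_GB = {
    "Q2_K": 27.13,
    "Q3_K_M": 35.57,
    "MXFP4_MOE": 40.74,
    "Q4_0": 41.98,
    "Q4_K_S": 42.36,
    "Q4_K_M": 45.09,
    "Q5_K_M": 52.82,
    "Q6_K": 61.03,
    "Q8_0": 78.99,
}


def quants_that_fit(vram_gb: float, headroom_gb: float = 4.0) -> List[str]:
    """Quant names whose file fits in vram_gb minus headroom_gb, smallest file first.

    headroom_gb is a rough allowance for the KV cache and compute buffers;
    the 4 GB default is an assumption, not a measured value.
    """
    budget = vram_gb - headroom_gb
    fitting = [q for q, size in QUANT_SIZES_GB.items() if size <= budget]
    return sorted(fitting, key=QUANT_SIZES_GB.__getitem__)


def largest_fitting_quant(vram_gb: float, headroom_gb: float = 4.0) -> Optional[str]:
    """Largest quant that passes the fit check, or None if nothing fits."""
    fitting = quants_that_fit(vram_gb, headroom_gb)
    return fitting[-1] if fitting else None


def filename_for(quant: str) -> str:
    """File name following the repository's naming pattern."""
    return f"Qwen__Qwen3-Next-80B-A3B-Thinking-{quant}.gguf"


if __name__ == "__main__":
    best = largest_fitting_quant(48)  # e.g. a 48 GB card such as the L40S
    print(quants_that_fit(48))
    print(best, filename_for(best) if best else "")
```

Real memory use also depends on context length and on how many layers you offload with `-ngl`; the model card notes that Q4_K_M did run on the L40S with `-ngl 45`, only noticeably slower.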

Model Details

Model Slug

lefromage/qwen3-next-80b-a3b-thinking-gguf

Author

lefromage

Pipeline Task

text-generation

Library

—

Created

2025-10-27

Last Modified

2025-10-28

Gated

No

Private

No

HF SHA

dbca3ad9b66fabcb8767029834fe0b09b1da1a91

License

apache-2.0

Language

Unknown

Base Model

Qwen/Qwen3-Next-80B-A3B-Thinking

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "base_model": [
      "Qwen/Qwen3-Next-80B-A3B-Thinking"
    ],
    "license": "apache-2.0",
    "pipeline_tag": "text-generation",
    "tags": [
      "GGUF"
    ],
    "frontmatter": {
      "base_model": [
        "Qwen/Qwen3-Next-80B-A3B-Thinking"
      ],
      "license": "apache-2.0",
      "pipeline_tag": "text-generation",
      "tags": [
        "GGUF"
      ]
    },
    "hero_image_url": "",
    "summary": "time cmake -B build -DGGML_CUDA=ON time cmake --build build --config Release --parallel $(nproc --all) ``` You may need to add /usr/local/cuda/bin to your PATH to find nvcc (Nvidia CUDA compiler) Building from source took about 7 minutes. For more detail on CUDA build see: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#cuda",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nbase_model:\n- Qwen/Qwen3-Next-80B-A3B-Thinking\nlicense: apache-2.0\npipeline_tag: text-generation\ntags:\n- GGUF\n---\n\nto be used with llama.cpp PR 16095\n\n## Update: \nI have tested some of these smaller models on NVIDIA with default CUDA compile\nwith the excellent release from @cturan on NVIDIA L40S GPU.\n\nSince L40S GPU is 48GB VRAM, I was able to run Q2_K, Q3_K_M, Q4_K_S, Q4_0 and Q4_MXFP4_MOE:\n\nbut Q4_K_M was too big.\nAlthough it works if using -ngl 45 \nbut it slowed down quite a bit.\n\nThere may be a better way but did not have time to test.\n\nWas able to get a good speed of 53 tokens per second in the generation \nand 800 tokens per second in the prompt reading.\n\n```bash\nwget https://github.com/cturan/llama.cpp/archive/refs/tags/test.tar.gz\ntar xf test.tar.gz\ncd llama.cpp-test\n\n# export PATH=/usr/local/cuda/bin:$PATH\n\ntime cmake -B build -DGGML_CUDA=ON\ntime cmake --build build --config Release --parallel $(nproc --all)\n```\n\nYou may need to add /usr/local/cuda/bin to your PATH\nto find nvcc (Nvidia CUDA compiler)\n\nBuilding from source took about 7 minutes.\n\nFor more detail on CUDA build see: \nhttps://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#cuda\n\n\n## Quantized Models:\n\nThese quantized models were generated using the excellent pull request from @pwilkin\n[#16095](https://github.com/ggml-org/llama.cpp/pull/16095) \non 2025-10-19 with commit `2fdbf16eb`.\n\nNOTE: currently they only work with the llama.cpp 16095 pull request which is still in development. 
\nSpeed and quality should improve over time.\n\n### How to build and run for MacOS\n\n```bash\nPR=16095\ngit clone https://github.com/ggml-org/llama.cpp llama.cpp-PR-$PR\ncd llama.cpp-PR-$PR\n\ngit fetch origin pull/$PR/head:pr-$PR\ngit checkout pr-$PR\n\ntime cmake -B build\ntime cmake --build build --config Release --parallel $(nproc --all)\n```\n\n### Run examples\n\nRun with Hugging Face model:\n\n```bash\nbuild/bin/llama-cli -hf lefromage/Qwen3-Next-80B-A3B-Thinking-GGUF --prompt 'What is the capital of France?' --no-mmap -st\n```\nby default will download lefromage/Qwen3-Next-80B-A3B-Thinking-GGUF:Q4_K_M\n\nTo download:\n```bash\nwget https://huggingface.co/lefromage/Qwen3-Next-80B-A3B-Thinking-GGUF/resolve/main/Qwen__Qwen3-Next-80B-A3B-Thinking-Q4_0.gguf\n```\nor\n```bash\npip install hf_transfer 'huggingface_hub[cli]'\nhf download lefromage/Qwen3-Next-80B-A3B-Thinking-GGUF Qwen__Qwen3-Next-80B-A3B-Thinking-Q4_0.gguf\n```\n\nRun with local model file:\n\n```bash\nbuild/bin/llama-cli -m Qwen__Qwen3-Next-80B-A3B-Thinking-Q4_0.gguf --prompt 'Write a paragraph about quantum computing' --no-mmap -st\n```\n\n\n```bash\nbuild/bin/llama-cli -ngl 100 -m Qwen__Qwen3-Next-80B-A3B-Thinking-Q2_K.gguf --no-mmap --prompt 'what is the capital of france' -st\n```\n\n### Example prompt and output\n\n**User prompt:**\nwhat is the capital of france\n\n**Assistant output:**\n\n```\n<think>\nOkay, the user asked, \"what is the capital of France?\" Hmm, that's a pretty basic geography question. Let me think. First, I know the capital of France is Paris. But wait, maybe I should double-check to be absolutely sure. Yeah, definitely Paris. It's one of those common facts that's easy to remember. \n\nWait, but why would someone ask this? Maybe they're a student doing homework, or maybe a traveler planning a trip. Or perhaps they're just curious. Either way, the answer is straightforward. No need for complex analysis here. 
\n\nI recall that Paris has been the capital for a long time, since the Middle Ages. It's not like some countries where the capital changed. France has always had Paris as its capital. Yeah, that's right. \n\nShould I mention anything else? Like maybe that it's the largest city in France, or that it's known for landmarks like the Eiffel Tower? But the question is just about the capital, so maybe keep it simple. Just confirm Paris. \n\nWait, is there any chance of confusion? Like, sometimes people think of Marseille or Lyon, but no, those are other major cities but not the capital. Definitely Paris. \n\nI think the best answer is to state clearly that the capital is Paris. Maybe add a bit about it being a major global city. But keep it short and accurate. \n\nYeah, I think that's it. No need to overcomplicate. Just give the answer directly. So, \"The capital of France is Paris.\" Done.\n</think>\n\nThe capital of France is **Paris**.  \n\nParis has been the capital since the 6th century and is France's largest city, as well as a major global center for culture, commerce, and tourism. It is renowned for landmarks like the Eiffel Tower, the Louvre, and Notre-Dame Cathedral. 
[end of text]\n\n\nllama_perf_sampler_print:    sampling time =      33.98 ms /   403 runs   (    0.08 ms per token, 11858.87 tokens per second)\nllama_perf_context_print:        load time =   10380.46 ms\nllama_perf_context_print: prompt eval time =    5709.11 ms /    14 tokens (  407.79 ms per token,     2.45 tokens per second)\nllama_perf_context_print:        eval time =   85045.12 ms /   388 runs   (  219.19 ms per token,     4.56 tokens per second)\nllama_perf_context_print:       total time =   90917.58 ms /   402 tokens\nllama_perf_context_print:    graphs reused =          0\nllama_memory_breakdown_print: | memory breakdown [MiB]   | total    free     self   model   context   compute    unaccounted |\nllama_memory_breakdown_print: |   - Metal (Apple M4 Max) | 98304 = 69920 + (28151 = 27675 +     171 +     304) +         232 |\nllama_memory_breakdown_print: |   - Host                 |                    167 =    97 +       0 +      70                |\nggml_metal_free: deallocating\n\nreal\t1m41.530s\n```\n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "GGUF",
    "text-generation",
    "base_model:Qwen/Qwen3-Next-80B-A3B-Thinking",
    "base_model:quantized:Qwen/Qwen3-Next-80B-A3B-Thinking",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 9,
  "downloads": 100,
  "gated": false,
  "private": false,
  "last_modified": "2025-10-28T02:03:39.000Z",
  "created_at": "2025-10-27T18:32:56.000Z",
  "pipeline_tag": "text-generation",
  "library_name": ""
}
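The readme stored above ends with a llama.cpp performance log from an Apple M4 Max run, where each "tokens per second" figure is the token (or run) count divided by the elapsed time. A small Python sketch, assuming the `llama_perf_context_print` line layout shown in that log (other llama.cpp versions may format these lines differently), recomputes those rates so runs on different hardware can be compared the same way:

```python
import re
from typing import Dict

# Matches llama.cpp perf lines such as
# "llama_perf_context_print:        eval time =   85045.12 ms /   388 runs ..."
PERF_LINE = re.compile(
    r"llama_perf_context_print:\s+(prompt eval|eval) time =\s+"
    r"([\d.]+) ms /\s+(\d+) (?:tokens|runs)"
)


def throughput(log: str) -> Dict[str, float]:
    """Return tokens per second for the 'prompt eval' and 'eval' phases found in log."""
    rates: Dict[str, float] = {}
    for label, ms, count in PERF_LINE.findall(log):
        # count items processed over the elapsed time converted to seconds
        rates[label] = int(count) / (float(ms) / 1000.0)
    return rates


# Two lines copied from the Apple M4 Max log in the model card.
SAMPLE = (
    "llama_perf_context_print: prompt eval time =    5709.11 ms /    14 tokens "
    "(  407.79 ms per token,     2.45 tokens per second)\n"
    "llama_perf_context_print:        eval time =   85045.12 ms /   388 runs   "
    "(  219.19 ms per token,     4.56 tokens per second)\n"
)

if __name__ == "__main__":
    for phase, tps in throughput(SAMPLE).items():
        print(f"{phase}: {tps:.2f} tokens/s")
```

On the sample lines this reproduces the log's own figures, about 2.45 tokens per second for prompt processing and 4.56 for generation, well below the 53 tokens per second the card reports for the L40S build.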

Source payload excerpt (from Hugging Face API)

{
  "_id": "68ffbad89562e020342b5f0c",
  "id": "lefromage/Qwen3-Next-80B-A3B-Thinking-GGUF",
  "modelId": "lefromage/Qwen3-Next-80B-A3B-Thinking-GGUF",
  "sha": "dbca3ad9b66fabcb8767029834fe0b09b1da1a91",
  "createdAt": "2025-10-27T18:32:56.000Z",
  "lastModified": "2025-10-28T02:03:39.000Z",
  "author": "lefromage",
  "downloads": 100,
  "likes": 9,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "",
  "siblings_count": 11
}
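The model card's `wget` example downloads files through the Hugging Face `/resolve/main/` URL pattern. The sketch below builds such URLs from the `id` field above; passing the `sha` value as the revision should pin the download to this exact commit, which may be useful while llama.cpp support for this architecture is still changing. The helper name `resolve_url` is hypothetical, and the URL layout is taken from the card's example.

```python
from urllib.parse import quote

# "id" and "sha" copied from the API payload above.
REPO_ID = "lefromage/Qwen3-Next-80B-A3B-Thinking-GGUF"
PINNED_SHA = "dbca3ad9b66fabcb8767029834fe0b09b1da1a91"


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL of the form .../<repo_id>/resolve/<revision>/<filename>."""
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


if __name__ == "__main__":
    name = "Qwen__Qwen3-Next-80B-A3B-Thinking-Q3_K_M.gguf"
    print(resolve_url(REPO_ID, name))              # follows the main branch
    print(resolve_url(REPO_ID, name, PINNED_SHA))  # pinned to this commit
```

The resulting URL can be passed to `wget` exactly as in the model card.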

More models in this shard