GraySoft

lefromage/qwen3-next-80b-a3b-thinking-gguf is indexed on GraySoft with repository links, GGUF quant files (Q3_K_M among them), and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes. See related models in the same shard below.

Model Intelligence Sheet

lefromage/qwen3-next-80b-a3b-thinking-gguf overview

Build llama.cpp with CUDA enabled:

    time cmake -B build -DGGML_CUDA=ON
    time cmake --build build --config Release --parallel $(nproc --all)

You may need to add /usr/local/cuda/bin to your PATH so that nvcc (the NVIDIA CUDA compiler) can be found. Building from source took about 7 minutes. For more detail on CUDA builds, see: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#cuda
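Before starting a multi-minute build, it can help to check whether nvcc is reachable at all. The following is a minimal sketch, not part of the model card: it assumes the CUDA toolkit lives in /usr/local/cuda/bin (the location named in the note above), and the helper names `find_nvcc` and `path_with_cuda` are hypothetical.

```python
import os
import shutil
from typing import Optional

# Assumed CUDA location, taken from the PATH note above; adjust if yours differs.
CUDA_BIN = "/usr/local/cuda/bin"


def find_nvcc(cuda_bin: str = CUDA_BIN) -> Optional[str]:
    """Return the path to nvcc, or None if it cannot be found.

    Searches the current PATH first, then falls back to cuda_bin.
    """
    found = shutil.which("nvcc")
    if found:
        return found
    # shutil.which accepts an explicit search path through its `path` argument.
    return shutil.which("nvcc", path=cuda_bin)


def path_with_cuda(current_path: str, cuda_bin: str = CUDA_BIN) -> str:
    """Prepend cuda_bin to a PATH string unless it is already listed."""
    parts = [p for p in current_path.split(os.pathsep) if p]
    if cuda_bin in parts:
        return current_path
    return os.pathsep.join([cuda_bin, *parts])


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc is None:
        print("nvcc not found; install the CUDA toolkit or adjust CUDA_BIN")
    else:
        print("nvcc found at", nvcc)
        print("suggested PATH:", path_with_cuda(os.environ.get("PATH", "")))
```

The script only reports a suggested PATH value; exporting it in the shell that runs cmake is still up to you.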

Tags: gguf, GGUF, text-generation, base_model:Qwen/Qwen3-Next-80B-A3B-Thinking, base_model:quantized:Qwen/Qwen3-Next-80B-A3B-Thinking, license:apache-2.0, endpoints_compatible, region:us, conversational
Downloads: 100
Likes: 9
Pipeline: text-generation
Library: (none listed)
Visibility: Public
Access: Open

Repository Files & Downloads

9 files detected
Direct downloads for all repository files
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| Qwen__Qwen3-Next-80B-A3B-Thinking-MXFP4_MOE.gguf | GGUF | MXFP4_MOE | 40.74 GB | Download |
| Qwen__Qwen3-Next-80B-A3B-Thinking-Q2_K.gguf | GGUF | Q2_K | 27.13 GB | Download |
| Qwen__Qwen3-Next-80B-A3B-Thinking-Q3_K_M.gguf | GGUF | Q3_K_M | 35.57 GB | Download |
| Qwen__Qwen3-Next-80B-A3B-Thinking-Q4_0.gguf | GGUF | Q4_0 | 41.98 GB | Download |
| Qwen__Qwen3-Next-80B-A3B-Thinking-Q4_K_M.gguf | GGUF | Q4_K_M | 45.09 GB | Download |
| Qwen__Qwen3-Next-80B-A3B-Thinking-Q4_K_S.gguf | GGUF | Q4_K_S | 42.36 GB | Download |
| Qwen__Qwen3-Next-80B-A3B-Thinking-Q5_K_M.gguf | GGUF | Q5_K_M | 52.82 GB | Download |
| Qwen__Qwen3-Next-80B-A3B-Thinking-Q6_K.gguf | GGUF | Q6_K | 61.03 GB | Download |
| Qwen__Qwen3-Next-80B-A3B-Thinking-Q8_0.gguf | GGUF | Q8_0 | 78.99 GB | Download |
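The file sizes above can be used for a rough first pass at which quantizations might fit fully on a given GPU. The sketch below is an illustration only: the function name `quants_that_fit` and the 4 GB headroom default are assumptions, not measured values. The headroom was picked so that a 48 GB budget reproduces what the model card reports for an L40S (Q2_K, Q3_K_M, Q4_K_S, Q4_0 and MXFP4_MOE ran, Q4_K_M was too big). Real memory use also depends on context size and runtime buffers.

```python
from typing import List

# File sizes in GB, copied from the repository file table above.
QUANT_SIZES_GB = {
    "MXFP4_MOE": 40.74,
    "Q2_K": 27.13,
    "Q3_K_M": 35.57,
    "Q4_0": 41.98,
    "Q4_K_M": 45.09,
    "Q4_K_S": 42.36,
    "Q5_K_M": 52.82,
    "Q6_K": 61.03,
    "Q8_0": 78.99,
}


def quants_that_fit(vram_gb: float, headroom_gb: float = 4.0) -> List[str]:
    """Return quant names whose file size plus headroom fits in vram_gb.

    Results are ordered largest file first. headroom_gb is an assumed
    allowance for context and compute buffers, not a measured figure.
    """
    fitting = [
        (size, name)
        for name, size in QUANT_SIZES_GB.items()
        if size + headroom_gb <= vram_gb
    ]
    return [name for _size, name in sorted(fitting, reverse=True)]


if __name__ == "__main__":
    print(quants_that_fit(48))
```

A file that does not pass this check may still run with partial offload (the model card mentions `-ngl 45` for Q4_K_M), at the cost of speed.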

Model Details

Model Slug: lefromage/qwen3-next-80b-a3b-thinking-gguf
Author: lefromage
Pipeline Task: text-generation
Library: (none listed)
Created: 2025-10-27
Last Modified: 2025-10-28
Gated: No
Private: No
HF SHA: dbca3ad9b66fabcb8767029834fe0b09b1da1a91
License: apache-2.0
Language: Unknown
Base Model: Qwen/Qwen3-Next-80B-A3B-Thinking

Metadata Inspector

Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "base_model": [
      "Qwen/Qwen3-Next-80B-A3B-Thinking"
    ],
    "license": "apache-2.0",
    "pipeline_tag": "text-generation",
    "tags": [
      "GGUF"
    ],
    "frontmatter": {
      "base_model": [
        "Qwen/Qwen3-Next-80B-A3B-Thinking"
      ],
      "license": "apache-2.0",
      "pipeline_tag": "text-generation",
      "tags": [
        "GGUF"
      ]
    },
    "hero_image_url": "",
    "summary": "time cmake -B build -DGGML_CUDA=ON time cmake --build build --config Release --parallel $(nproc --all) ``` You may need to add /usr/local/cuda/bin to your PATH to find nvcc (Nvidia CUDA compiler) Building from source took about 7 minutes. For more detail on CUDA build see: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#cuda",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nbase_model:\n- Qwen/Qwen3-Next-80B-A3B-Thinking\nlicense: apache-2.0\npipeline_tag: text-generation\ntags:\n- GGUF\n---\n\nto be used with llama.cpp PR 16095\n\n## Update: \nI have tested some of these smaller models on NVIDIA with default CUDA compile\nwith the excellent release from @cturan on NVIDIA L40S GPU.\n\nSince L40S GPU is 48GB VRAM, I was able to run Q2_K, Q3_K_M, Q4_K_S, Q4_0 and Q4_MXFP4_MOE:\n\nbut Q4_K_M was too big.\nAlthough it works if using -ngl 45 \nbut it slowed down quite a bit.\n\nThere may be a better way but did not have time to test.\n\nWas able to get a good speed of 53 tokens per second in the generation \nand 800 tokens per second in the prompt reading.\n\n```bash\nwget https://github.com/cturan/llama.cpp/archive/refs/tags/test.tar.gz\ntar xf test.tar.gz\ncd llama.cpp-test\n\n# export PATH=/usr/local/cuda/bin:$PATH\n\ntime cmake -B build -DGGML_CUDA=ON\ntime cmake --build build --config Release --parallel $(nproc --all)\n```\n\nYou may need to add /usr/local/cuda/bin to your PATH\nto find nvcc (Nvidia CUDA compiler)\n\nBuilding from source took about 7 minutes.\n\nFor more detail on CUDA build see: \nhttps://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#cuda\n\n\n## Quantized Models:\n\nThese quantized models were generated using the excellent pull request from @pwilkin\n[#16095](https://github.com/ggml-org/llama.cpp/pull/16095) \non 2025-10-19 with commit `2fdbf16eb`.\n\nNOTE: currently they only work with the llama.cpp 16095 pull request which is still in development. 
\nSpeed and quality should improve over time.\n\n### How to build and run for MacOS\n\n```bash\nPR=16095\ngit clone https://github.com/ggml-org/llama.cpp llama.cpp-PR-$PR\ncd llama.cpp-PR-$PR\n\ngit fetch origin pull/$PR/head:pr-$PR\ngit checkout pr-$PR\n\ntime cmake -B build\ntime cmake --build build --config Release --parallel $(nproc --all)\n```\n\n### Run examples\n\nRun with Hugging Face model:\n\n```bash\nbuild/bin/llama-cli -hf lefromage/Qwen3-Next-80B-A3B-Thinking-GGUF --prompt 'What is the capital of France?' --no-mmap -st\n```\nby default will download lefromage/Qwen3-Next-80B-A3B-Thinking-GGUF:Q4_K_M\n\nTo download:\n```bash\nwget https://huggingface.co/lefromage/Qwen3-Next-80B-A3B-Thinking-GGUF/resolve/main/Qwen__Qwen3-Next-80B-A3B-Thinking-Q4_0.gguf\n```\nor\n```bash\npip install hf_transfer 'huggingface_hub[cli]'\nhf download lefromage/Qwen3-Next-80B-A3B-Thinking-GGUF Qwen__Qwen3-Next-80B-A3B-Thinking-Q4_0.gguf\n```\n\nRun with local model file:\n\n```bash\nbuild/bin/llama-cli -m Qwen__Qwen3-Next-80B-A3B-Thinking-Q4_0.gguf --prompt 'Write a paragraph about quantum computing' --no-mmap -st\n```\n\n\n```bash\nbuild/bin/llama-cli -ngl 100 -m Qwen__Qwen3-Next-80B-A3B-Thinking-Q2_K.gguf --no-mmap --prompt 'what is the capital of france' -st\n```\n\n### Example prompt and output\n\n**User prompt:**\nwhat is the capital of france\n\n**Assistant output:**\n\n```\n<think>\nOkay, the user asked, \"what is the capital of France?\" Hmm, that's a pretty basic geography question. Let me think. First, I know the capital of France is Paris. But wait, maybe I should double-check to be absolutely sure. Yeah, definitely Paris. It's one of those common facts that's easy to remember. \n\nWait, but why would someone ask this? Maybe they're a student doing homework, or maybe a traveler planning a trip. Or perhaps they're just curious. Either way, the answer is straightforward. No need for complex analysis here. 
\n\nI recall that Paris has been the capital for a long time, since the Middle Ages. It's not like some countries where the capital changed. France has always had Paris as its capital. Yeah, that's right. \n\nShould I mention anything else? Like maybe that it's the largest city in France, or that it's known for landmarks like the Eiffel Tower? But the question is just about the capital, so maybe keep it simple. Just confirm Paris. \n\nWait, is there any chance of confusion? Like, sometimes people think of Marseille or Lyon, but no, those are other major cities but not the capital. Definitely Paris. \n\nI think the best answer is to state clearly that the capital is Paris. Maybe add a bit about it being a major global city. But keep it short and accurate. \n\nYeah, I think that's it. No need to overcomplicate. Just give the answer directly. So, \"The capital of France is Paris.\" Done.\n</think>\n\nThe capital of France is **Paris**.  \n\nParis has been the capital since the 6th century and is France's largest city, as well as a major global center for culture, commerce, and tourism. It is renowned for landmarks like the Eiffel Tower, the Louvre, and Notre-Dame Cathedral. 
[end of text]\n\n\nllama_perf_sampler_print:    sampling time =      33.98 ms /   403 runs   (    0.08 ms per token, 11858.87 tokens per second)\nllama_perf_context_print:        load time =   10380.46 ms\nllama_perf_context_print: prompt eval time =    5709.11 ms /    14 tokens (  407.79 ms per token,     2.45 tokens per second)\nllama_perf_context_print:        eval time =   85045.12 ms /   388 runs   (  219.19 ms per token,     4.56 tokens per second)\nllama_perf_context_print:       total time =   90917.58 ms /   402 tokens\nllama_perf_context_print:    graphs reused =          0\nllama_memory_breakdown_print: | memory breakdown [MiB]   | total    free     self   model   context   compute    unaccounted |\nllama_memory_breakdown_print: |   - Metal (Apple M4 Max) | 98304 = 69920 + (28151 = 27675 +     171 +     304) +         232 |\nllama_memory_breakdown_print: |   - Host                 |                    167 =    97 +       0 +      70                |\nggml_metal_free: deallocating\n\nreal\t1m41.530s\n```\n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "GGUF",
    "text-generation",
    "base_model:Qwen/Qwen3-Next-80B-A3B-Thinking",
    "base_model:quantized:Qwen/Qwen3-Next-80B-A3B-Thinking",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 9,
  "downloads": 100,
  "gated": false,
  "private": false,
  "last_modified": "2025-10-28T02:03:39.000Z",
  "created_at": "2025-10-27T18:32:56.000Z",
  "pipeline_tag": "text-generation",
  "library_name": ""
}
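The `readme_markdown` field above embeds a llama_perf log from an Apple M4 Max run. Its throughput figures can be recomputed from the raw time and token counts. The sketch below copies two timing lines from that log; the regular expression and the function name `tokens_per_second` are illustrative assumptions about the line layout, not a llama.cpp API.

```python
import re

# Two timing lines copied from the llama_perf output in readme_markdown above.
PERF_LOG = """\
llama_perf_context_print: prompt eval time =    5709.11 ms /    14 tokens (  407.79 ms per token,     2.45 tokens per second)
llama_perf_context_print:        eval time =   85045.12 ms /   388 runs   (  219.19 ms per token,     4.56 tokens per second)
"""

# Matches "<label> time = <ms> ms / <count> tokens" (or "runs").
LINE_RE = re.compile(
    r":\s*(?P<label>[a-z ]+?) time =\s*(?P<ms>[\d.]+) ms /\s*(?P<count>\d+) (?:tokens|runs)"
)


def tokens_per_second(log: str) -> dict:
    """Map each timing label to count / elapsed seconds."""
    rates = {}
    for match in LINE_RE.finditer(log):
        seconds = float(match.group("ms")) / 1000.0
        rates[match.group("label").strip()] = int(match.group("count")) / seconds
    return rates


if __name__ == "__main__":
    for label, rate in tokens_per_second(PERF_LOG).items():
        print(f"{label}: {rate:.2f} tokens per second")
```

The recomputed generation rate agrees with the 4.56 tokens per second printed in the log, which is far below the 53 tokens per second the model card reports on an NVIDIA L40S.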
Source payload excerpt (from Hugging Face API)
{
  "_id": "68ffbad89562e020342b5f0c",
  "id": "lefromage/Qwen3-Next-80B-A3B-Thinking-GGUF",
  "modelId": "lefromage/Qwen3-Next-80B-A3B-Thinking-GGUF",
  "sha": "dbca3ad9b66fabcb8767029834fe0b09b1da1a91",
  "createdAt": "2025-10-27T18:32:56.000Z",
  "lastModified": "2025-10-28T02:03:39.000Z",
  "author": "lefromage",
  "downloads": 100,
  "likes": 9,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "",
  "siblings_count": 11
}
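The `id` field in this payload, combined with a file name from the repository, gives a direct download URL. The sketch below follows the `resolve/main` pattern used by the wget example in the model card; the helper names `gguf_filename` and `resolve_url` are hypothetical, and passing a revision other than `main` (for example the commit SHA) is assumed to work the same way but is not shown in the source.

```python
from urllib.parse import quote

# Repository id, copied from the "id" field of the payload above.
REPO_ID = "lefromage/Qwen3-Next-80B-A3B-Thinking-GGUF"


def gguf_filename(quant: str) -> str:
    """Build a file name using the naming scheme seen in this repository."""
    return f"Qwen__Qwen3-Next-80B-A3B-Thinking-{quant}.gguf"


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a download URL of the form used in the model card's wget example."""
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


if __name__ == "__main__":
    print(resolve_url(REPO_ID, gguf_filename("Q4_0")))
```

For Q4_0 this yields the same URL that the model card passes to wget.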