GraySoft
Model Intelligence Sheet

noctrex/qwen3.5-35b-a3b-claude-4.6-opus-reasoning-distilled-mxfp4_moe-gguf overview

These are quantizations of the model Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled. The mmproj files are the same as the ones from unsloth. Read the unsloth guide to set up the model's recommended settings: Qwen3.5 - How to Run Locally Guide. The mainline standard is to use MXFP4 for the MoE tensors and Q8 for the rest, so I created 2 new variants in which the other tensors are either BF16 or F16 instead of Q8. The order of preference is BF16, then F16. On some architectures BF16 will be slower, but it's the highest quality: essentially, it's the original tensors from the model copied over unquantized.
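The preference order described above (BF16 first, F16 as the fallback where BF16 is slow) can be sketched as a small selection helper. This is only an illustration: the file names come from the repository listing, while `pick_variant` and its `bf16_is_fast` flag are hypothetical names, and deciding whether BF16 is fast on a given machine is left to the reader.

```python
# File names are taken from the repository listing; everything else is illustrative.
VARIANTS = {
    "BF16": "Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-MXFP4_MOE_BF16.gguf",
    "F16": "Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-MXFP4_MOE_F16.gguf",
}

# Order of preference stated in the model card: BF16, then F16.
PREFERENCE = ["BF16", "F16"]


def pick_variant(bf16_is_fast: bool) -> str:
    """Return the preferred GGUF file name.

    bf16_is_fast is a hypothetical flag: set it to False on architectures
    where BF16 is noticeably slower, so the F16 variant is chosen instead.
    """
    for name in PREFERENCE:
        if name == "BF16" and not bf16_is_fast:
            continue  # skip BF16 where it would run slowly
        return VARIANTS[name]
    raise ValueError("no suitable variant found")


if __name__ == "__main__":
    print(pick_variant(bf16_is_fast=True))
    print(pick_variant(bf16_is_fast=False))
```

The helper only encodes the stated ordering; it does not measure performance or inspect hardware.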

Tags: gguf, image-text-to-text, base_model:Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled, base_model:quantized:Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled, endpoints_compatible, region:us, conversational
Downloads
10,175
Likes
7
Pipeline
image-text-to-text
Library
Visibility
Public
Access
Open

Repository Files & Downloads

5 files detected
Direct downloads for all repository files
File | Type | Quantization | Size | Link
Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-MXFP4_MOE.gguf | GGUF | - | 18.88 GB | Download
Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-MXFP4_MOE_BF16.gguf | GGUF | BF16 | 20.55 GB | Download
Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-MXFP4_MOE_F16.gguf | GGUF | F16 | 20.55 GB | Download
mmproj-BF16.gguf | GGUF | BF16 | 861.00 MB | Download
mmproj-F32.gguf | GGUF | F32 | 1.66 GB | Download
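For planning a download, the listed sizes can be combined with a little arithmetic. The sketch below uses the sizes from the table; the short keys, the helper names, and the conversion of 861.00 MB to 0.861 GB (assuming 1 GB = 1000 MB) are assumptions made for illustration.

```python
# Sizes in GB as listed in the file table.
MODEL_GB = {
    "MXFP4_MOE": 18.88,
    "MXFP4_MOE_BF16": 20.55,
    "MXFP4_MOE_F16": 20.55,
}

# mmproj sizes; 861.00 MB is taken as 0.861 GB (assumes 1 GB = 1000 MB).
MMPROJ_GB = {"BF16": 0.861, "F32": 1.66}


def overhead_gb(model: str, baseline: str = "MXFP4_MOE") -> float:
    """Extra download size of a variant compared with the baseline file."""
    return round(MODEL_GB[model] - MODEL_GB[baseline], 2)


def total_download_gb(model: str, mmproj: str) -> float:
    """Combined size of one model file plus one mmproj file."""
    return round(MODEL_GB[model] + MMPROJ_GB[mmproj], 2)


if __name__ == "__main__":
    print(overhead_gb("MXFP4_MOE_BF16"))
    print(total_download_gb("MXFP4_MOE_BF16", "F32"))
```

These figures describe download size only. Memory use at run time also depends on context length and runtime settings, which the listing does not cover.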

Model Details Live

Model Slug
noctrex/qwen3.5-35b-a3b-claude-4.6-opus-reasoning-distilled-mxfp4_moe-gguf
Author
noctrex
Pipeline Task
image-text-to-text
Library
Created
2026-03-14
Last Modified
2026-03-17
Gated
No
Private
No
HF SHA
3c14d9fe668ede879817ecfa83ccbe2d146dd8a3
License
Unknown
Language
Unknown
Base Model
Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled

Metadata Inspector

Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "pipeline_tag": "image-text-to-text",
    "base_model": [
      "Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled"
    ],
    "frontmatter": {
      "pipeline_tag": "image-text-to-text",
      "base_model": [
        "Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled"
      ]
    },
    "hero_image_url": "",
    "summary": "These are quantizations of the model Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled The mmproj files are the same from unsloth. Read the guide from unsloth in order to set up the model's recommended settings: Qwen3.5 - How to Run Locally Guide The mainline standard is to use MXFP4 for the MoE tensors, and Q8 for the rest. So I created 2 new variants, where the other tensors are either BF16 or FP16 instead of Q8. The order of preference is BF16, then F16. On some architectures BF16 will be slower, but its the highest quality, essentialy its the original tensors from the model copied over unquantized.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\npipeline_tag: image-text-to-text\nbase_model:\n- Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled\n---\nThese are quantizations of the model [Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled](https://huggingface.co/Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled)\n\n- Download the latest [llama.cpp](https://github.com/ggml-org/llama.cpp) to use these quantizations.  \n- For the `mmproj` file, the F32 version is recommended for best results.  \nThe mmproj files are the same from unsloth.\n\nRead the guide from unsloth in order to set up the model's recommended settings:  \n[Qwen3.5 - How to Run Locally Guide](https://unsloth.ai/docs/models/qwen3.5)\n\nThe mainline standard is to use MXFP4 for the MoE tensors, and Q8 for the rest.  \nSo I created 2 new variants, where the other tensors are either BF16 or FP16 instead of Q8.  \nThe order of preference is BF16, then F16.  \nOn some architectures BF16 will be slower, but its the highest quality, essentialy its the original tensors from the model copied over unquantized.\n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "image-text-to-text",
    "base_model:Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled",
    "base_model:quantized:Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 7,
  "downloads": 10175,
  "gated": false,
  "private": false,
  "last_modified": "2026-03-17T10:08:19.000Z",
  "created_at": "2026-03-14T18:30:32.000Z",
  "pipeline_tag": "image-text-to-text",
  "library_name": ""
}
Source payload excerpt (from Hugging Face API)
{
  "_id": "69b5a9482b0587383a1dd79a",
  "id": "noctrex/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-MXFP4_MOE-GGUF",
  "modelId": "noctrex/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-MXFP4_MOE-GGUF",
  "sha": "3c14d9fe668ede879817ecfa83ccbe2d146dd8a3",
  "createdAt": "2026-03-14T18:30:32.000Z",
  "lastModified": "2026-03-17T10:08:19.000Z",
  "author": "noctrex",
  "downloads": 10175,
  "likes": 7,
  "gated": false,
  "private": false,
  "pipeline_tag": "image-text-to-text",
  "library_name": "",
  "siblings_count": 7
}
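The two JSON excerpts above carry the same facts under different key names. As a rough consistency check, the sketch below compares them and parses the timestamps. The values are copied from the excerpts; `KEY_MAP`, `parse_ts`, and `mismatches` are hypothetical helpers, and treating the model slug as the lowercased API `id` is an assumption based on how the two strings appear on this page.

```python
from datetime import datetime, timezone

# Values copied from the Hugging Face API payload excerpt.
payload = {
    "id": "noctrex/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-MXFP4_MOE-GGUF",
    "createdAt": "2026-03-14T18:30:32.000Z",
    "lastModified": "2026-03-17T10:08:19.000Z",
    "downloads": 10175,
    "likes": 7,
}

# Values copied from the normalized metadata excerpt.
normalized = {
    "created_at": "2026-03-14T18:30:32.000Z",
    "last_modified": "2026-03-17T10:08:19.000Z",
    "downloads": 10175,
    "likes": 7,
}

slug = "noctrex/qwen3.5-35b-a3b-claude-4.6-opus-reasoning-distilled-mxfp4_moe-gguf"

# Hypothetical mapping from API keys to normalized keys.
KEY_MAP = {
    "createdAt": "created_at",
    "lastModified": "last_modified",
    "downloads": "downloads",
    "likes": "likes",
}


def parse_ts(value: str) -> datetime:
    """Parse a timestamp such as 2026-03-14T18:30:32.000Z as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def mismatches(api: dict, norm: dict) -> list:
    """Return the API keys whose values differ from the normalized record."""
    return [k for k, n in KEY_MAP.items() if api[k] != norm[n]]


slug_matches = payload["id"].lower() == slug
age = parse_ts(payload["lastModified"]) - parse_ts(payload["createdAt"])

if __name__ == "__main__":
    print(slug_matches, mismatches(payload, normalized), age)
```

An explicit `strptime` format is used so the trailing `Z` is handled the same way on older Python versions, where `datetime.fromisoformat` does not accept it.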