Model Intelligence Sheet

noctrex/qwen3.5-35b-a3b-claude-4.6-opus-reasoning-distilled-mxfp4_moe-gguf overview

These are quantizations of the model Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled. Download the latest llama.cpp to use them. For the mmproj file, the F32 version is recommended for best results; the mmproj files are the same as the ones from unsloth. Read the unsloth guide to set up the model's recommended settings: Qwen3.5 - How to Run Locally Guide (https://unsloth.ai/docs/models/qwen3.5). The mainline standard is to use MXFP4 for the MoE tensors and Q8 for the rest, so I created two new variants in which the other tensors are BF16 or F16 instead of Q8. The order of preference is BF16, then F16. On some architectures BF16 will be slower, but it is the highest quality: essentially, it is the original tensors from the model copied over unquantized.
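As a rough illustration of the preference order described above, the sketch below picks one of the three model files and assembles a llama.cpp command line. The file names are the ones in this repository, and `llama-server` with its `-m` and `--mmproj` options belongs to llama.cpp. The helpers `choose_model_file` and `build_command`, and the `bf16_is_fast` and `high_precision` flags, are hypothetical names introduced here for illustration. The F32 mmproj default follows the README's recommendation. This is a sketch under those assumptions, not the author's own setup.

```python
# Common prefix of the three model files in this repository.
PREFIX = "Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-MXFP4_MOE"


def choose_model_file(bf16_is_fast: bool, high_precision: bool = True) -> str:
    """Pick a GGUF file following the stated preference: BF16, then F16.

    `bf16_is_fast` is a hypothetical flag the caller sets for their hardware.
    With `high_precision=False` the mainline MXFP4 + Q8 file is returned.
    """
    if not high_precision:
        return f"{PREFIX}.gguf"
    if bf16_is_fast:
        return f"{PREFIX}_BF16.gguf"
    return f"{PREFIX}_F16.gguf"


def build_command(model_file: str, mmproj_file: str = "mmproj-F32.gguf") -> list[str]:
    """Assemble (but do not run) a llama-server invocation with a vision projector."""
    return ["llama-server", "-m", model_file, "--mmproj", mmproj_file]


if __name__ == "__main__":
    # Example: hardware where BF16 is slow, so fall back to the F16 variant.
    print(" ".join(build_command(choose_model_file(bf16_is_fast=False))))
```

The command is returned as a list so it can be passed to `subprocess.run` unchanged; sampling settings from the unsloth guide would be appended by the caller.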

Tags: gguf, image-text-to-text, base_model:Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled, base_model:quantized:Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled, endpoints_compatible, region:us, conversational

noctrex/qwen3.5-35b-a3b-claude-4.6-opus-reasoning-distilled-mxfp4_moe-gguf visual

Downloads

10,175

Likes

7

Pipeline

image-text-to-text

Library

—

Visibility

Public

Access

Open

Repository Files & Downloads

5 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-MXFP4_MOE.gguf	GGUF	MXFP4 (MoE) + Q8	18.88 GB	Download
Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-MXFP4_MOE_BF16.gguf	GGUF	MXFP4 (MoE) + BF16	20.55 GB	Download
Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-MXFP4_MOE_F16.gguf	GGUF	MXFP4 (MoE) + F16	20.55 GB	Download
mmproj-BF16.gguf	GGUF	BF16	861.00 MB	Download
mmproj-F32.gguf	GGUF	F32	1.66 GB	Download
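
To make the table easier to act on, here is a small sketch that builds a direct download URL for a listed file and adds up the download size of a model file plus a projector. It assumes Hugging Face's usual `resolve/<revision>/<filename>` URL layout and the `main` revision, and it treats the listed sizes as decimal units (861.00 MB taken as 0.861 GB). `download_url` and `total_size_gb` are hypothetical helper names.

```python
from urllib.parse import quote

REPO_ID = "noctrex/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-MXFP4_MOE-GGUF"
PREFIX = "Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-MXFP4_MOE"

# Sizes in GB, copied from the file table (MB converted assuming 1 GB = 1000 MB).
FILE_SIZES_GB = {
    f"{PREFIX}.gguf": 18.88,
    f"{PREFIX}_BF16.gguf": 20.55,
    f"{PREFIX}_F16.gguf": 20.55,
    "mmproj-BF16.gguf": 0.861,
    "mmproj-F32.gguf": 1.66,
}


def download_url(filename: str, revision: str = "main") -> str:
    """Build a direct file URL; assumes the standard Hub 'resolve' layout."""
    return f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{quote(filename)}"


def total_size_gb(*filenames: str) -> float:
    """Sum the listed sizes for the chosen files, rounded to two decimals."""
    return round(sum(FILE_SIZES_GB[name] for name in filenames), 2)


if __name__ == "__main__":
    chosen = (f"{PREFIX}_BF16.gguf", "mmproj-F32.gguf")
    for name in chosen:
        print(download_url(name))
    print("approximate download size in GB:", total_size_gb(*chosen))
```

The size total only covers the download; memory needed at run time also depends on context length and other settings not listed here.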

Model Details

Model Slug

noctrex/qwen3.5-35b-a3b-claude-4.6-opus-reasoning-distilled-mxfp4_moe-gguf

Author

noctrex

Pipeline Task

image-text-to-text

Library

—

Created

2026-03-14

Last Modified

2026-03-17

Gated

No

Private

No

HF SHA

3c14d9fe668ede879817ecfa83ccbe2d146dd8a3

License

Unknown

Language

Unknown

Base Model

Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "pipeline_tag": "image-text-to-text",
    "base_model": [
      "Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled"
    ],
    "frontmatter": {
      "pipeline_tag": "image-text-to-text",
      "base_model": [
        "Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled"
      ]
    },
    "hero_image_url": "",
    "summary": "These are quantizations of the model Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled The mmproj files are the same from unsloth. Read the guide from unsloth in order to set up the model's recommended settings: Qwen3.5 - How to Run Locally Guide The mainline standard is to use MXFP4 for the MoE tensors, and Q8 for the rest. So I created 2 new variants, where the other tensors are either BF16 or FP16 instead of Q8. The order of preference is BF16, then F16. On some architectures BF16 will be slower, but its the highest quality, essentialy its the original tensors from the model copied over unquantized.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\npipeline_tag: image-text-to-text\nbase_model:\n- Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled\n---\nThese are quantizations of the model [Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled](https://huggingface.co/Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled)\n\n- Download the latest [llama.cpp](https://github.com/ggml-org/llama.cpp) to use these quantizations.  \n- For the `mmproj` file, the F32 version is recommended for best results.  \nThe mmproj files are the same from unsloth.\n\nRead the guide from unsloth in order to set up the model's recommended settings:  \n[Qwen3.5 - How to Run Locally Guide](https://unsloth.ai/docs/models/qwen3.5)\n\nThe mainline standard is to use MXFP4 for the MoE tensors, and Q8 for the rest.  \nSo I created 2 new variants, where the other tensors are either BF16 or FP16 instead of Q8.  \nThe order of preference is BF16, then F16.  \nOn some architectures BF16 will be slower, but its the highest quality, essentialy its the original tensors from the model copied over unquantized.\n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "image-text-to-text",
    "base_model:Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled",
    "base_model:quantized:Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 7,
  "downloads": 10175,
  "gated": false,
  "private": false,
  "last_modified": "2026-03-17T10:08:19.000Z",
  "created_at": "2026-03-14T18:30:32.000Z",
  "pipeline_tag": "image-text-to-text",
  "library_name": ""
}
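
The `tags` list above mixes plain labels with colon-separated `key:value` entries. A minimal sketch of separating the two and reading the base-model relation out of the tags follows. The function names `split_tags` and `base_model_relations` are hypothetical, and the relation label "base" for a tag without a qualifier is an assumption made here, not something the metadata states.

```python
def split_tags(tags):
    """Separate plain tags from 'key:value' tags (split on the first colon)."""
    plain, namespaced = [], {}
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep:
            namespaced.setdefault(key, []).append(value)
        else:
            plain.append(tag)
    return plain, namespaced


def base_model_relations(tags):
    """Map relation -> model id for 'base_model' tags.

    'base_model:<id>' is stored under the assumed label 'base';
    'base_model:<relation>:<id>' is stored under '<relation>'.
    """
    _, namespaced = split_tags(tags)
    relations = {}
    for value in namespaced.get("base_model", []):
        relation, sep, model_id = value.partition(":")
        if sep:
            relations[relation] = model_id
        else:
            relations["base"] = value
    return relations


if __name__ == "__main__":
    base = "Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled"
    tags = [
        "gguf",
        "image-text-to-text",
        f"base_model:{base}",
        f"base_model:quantized:{base}",
        "endpoints_compatible",
        "region:us",
        "conversational",
    ]
    print(split_tags(tags)[0])
    print(base_model_relations(tags))
```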

Source payload excerpt (from Hugging Face API)

{
  "_id": "69b5a9482b0587383a1dd79a",
  "id": "noctrex/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-MXFP4_MOE-GGUF",
  "modelId": "noctrex/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-MXFP4_MOE-GGUF",
  "sha": "3c14d9fe668ede879817ecfa83ccbe2d146dd8a3",
  "createdAt": "2026-03-14T18:30:32.000Z",
  "lastModified": "2026-03-17T10:08:19.000Z",
  "author": "noctrex",
  "downloads": 10175,
  "likes": 7,
  "gated": false,
  "private": false,
  "pipeline_tag": "image-text-to-text",
  "library_name": "",
  "siblings_count": 7
}
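
Comparing this payload with the normalized metadata above suggests a simple mapping: `createdAt` and `lastModified` become `created_at` and `last_modified`, several fields pass through unchanged, and the model slug is the lowercased `id`. The sketch below shows one way such a normalization could look. It is inferred from the two JSON blocks only and is not the sheet's actual pipeline; `normalize` and `parse_timestamp` are hypothetical names.

```python
from datetime import datetime, timezone

# Assumed key renames, inferred by comparing the two JSON blocks.
KEY_MAP = {"createdAt": "created_at", "lastModified": "last_modified"}
# Fields that appear under the same name in both blocks.
PASS_THROUGH = ("likes", "downloads", "gated", "private", "pipeline_tag", "library_name")


def normalize(payload: dict) -> dict:
    """Return a normalized copy of an API payload (illustrative mapping only)."""
    out = {key: payload[key] for key in PASS_THROUGH if key in payload}
    for src, dst in KEY_MAP.items():
        if src in payload:
            out[dst] = payload[src]
    if "id" in payload:
        out["slug"] = payload["id"].lower()
    return out


def parse_timestamp(value: str) -> datetime:
    """Parse timestamps of the form '2026-03-14T18:30:32.000Z' as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    payload = {
        "id": "noctrex/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-MXFP4_MOE-GGUF",
        "createdAt": "2026-03-14T18:30:32.000Z",
        "lastModified": "2026-03-17T10:08:19.000Z",
        "downloads": 10175,
        "likes": 7,
        "gated": False,
        "private": False,
        "pipeline_tag": "image-text-to-text",
        "library_name": "",
    }
    meta = normalize(payload)
    age = parse_timestamp(meta["last_modified"]) - parse_timestamp(meta["created_at"])
    print(meta["slug"])
    print("time between creation and last modification:", age)
```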