djlougen/hermes-qwen3.5-35b-a3b-gguf (F16 GGUF and 18 other quantizations, free to download) is indexed on GraySoft with repository links, GGUF quantization files, and Hugging Face metadata. Use this page to pick a local model for guIDE or other runtimes. Related models in the same shard are listed below.

Model Intelligence Sheet

djlougen/hermes-qwen3.5-35b-a3b-gguf overview

GGUF quantizations of a Qwen3.5-35B-A3B model fine-tuned on NousResearch/hermes-function-calling-v1 for structured function calling and tool use.

Tags: hermes, gguf, qwen3.5, moe, lora, function-calling, unsloth, text-generation, dataset:NousResearch/hermes-function-calling-v1, base_model:Qwen/Qwen3.5-35B-A3B, base_model:adapter:Qwen/Qwen3.5-35B-A3B, doi:10.57967/hf/8262, license:apache-2.0, endpoints_compatible, region:us, conversational

Downloads

20,143

Likes

4

Pipeline

text-generation

Library

hermes

Visibility

Public

Access

Open

Repository Files & Downloads

19 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
hermes-qwen3.5-35b-a3b-IQ1_M.gguf	GGUF	IQ1_M	7.67 GB	Download
hermes-qwen3.5-35b-a3b-IQ1_S.gguf	GGUF	IQ1_S	6.97 GB	Download
hermes-qwen3.5-35b-a3b-IQ2_M.gguf	GGUF	IQ2_M	10.86 GB	Download
hermes-qwen3.5-35b-a3b-IQ2_S.gguf	GGUF	IQ2_S	9.92 GB	Download
hermes-qwen3.5-35b-a3b-IQ2_XXS.gguf	GGUF	IQ2_XXS	8.85 GB	Download
hermes-qwen3.5-35b-a3b-IQ3_M.gguf	GGUF	IQ3_M	14.38 GB	Download
hermes-qwen3.5-35b-a3b-IQ3_S.gguf	GGUF	IQ3_S	14.20 GB	Download
hermes-qwen3.5-35b-a3b-IQ3_XXS.gguf	GGUF	IQ3_XXS	3.29 GB	Download
hermes-qwen3.5-35b-a3b-IQ4_NL.gguf	GGUF	IQ4_NL	18.42 GB	Download
hermes-qwen3.5-35b-a3b-IQ4_XS.gguf	GGUF	IQ4_XS	17.44 GB	Download
hermes-qwen3.5-35b-a3b-Q3_K_M.gguf	GGUF	Q3_K_M	15.61 GB	Download
hermes-qwen3.5-35b-a3b-Q3_K_S.gguf	GGUF	Q3_K_S	14.14 GB	Download
hermes-qwen3.5-35b-a3b-Q4_K_M.gguf	GGUF	Q4_K_M	19.71 GB	Download
hermes-qwen3.5-35b-a3b-Q4_K_S.gguf	GGUF	Q4_K_S	18.52 GB	Download
hermes-qwen3.5-35b-a3b-Q5_K_M.gguf	GGUF	Q5_K_M	23.03 GB	Download
hermes-qwen3.5-35b-a3b-Q5_K_S.gguf	GGUF	Q5_K_S	22.33 GB	Download
hermes-qwen3.5-35b-a3b-Q6_K.gguf	GGUF	Q6_K	26.56 GB	Download
hermes-qwen3.5-35b-a3b-Q8_0.gguf	GGUF	Q8_0	34.37 GB	Download
hermes-qwen3.5-35b-a3b-f16.gguf	GGUF	F16	64.61 GB	Download
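
As a rough aid for choosing among the files above, the sketch below picks the largest quantization whose listed file size fits a memory budget. It is a heuristic only: the function name `pick_quant` and the `headroom_gb` allowance for context and runtime overhead are assumptions, not part of GraySoft or the repository, and file size is just a proxy for real memory use. IQ3_XXS is left out because its listed size (3.29 GB) disagrees with the 13.6 GB given in the repository README.

```python
# Sizes copied from the file table on this page (as listed, in GB).
# IQ3_XXS is omitted: its listed size conflicts with the repository README.
QUANT_SIZES_GB = {
    "IQ1_S": 6.97, "IQ1_M": 7.67, "IQ2_XXS": 8.85, "IQ2_S": 9.92,
    "IQ2_M": 10.86, "Q3_K_S": 14.14, "IQ3_S": 14.20, "IQ3_M": 14.38,
    "Q3_K_M": 15.61, "IQ4_XS": 17.44, "IQ4_NL": 18.42, "Q4_K_S": 18.52,
    "Q4_K_M": 19.71, "Q5_K_S": 22.33, "Q5_K_M": 23.03, "Q6_K": 26.56,
    "Q8_0": 34.37, "F16": 64.61,
}


def pick_quant(budget_gb, headroom_gb=2.0):
    """Return the largest listed quant that fits in budget_gb, or None.

    headroom_gb is a hypothetical allowance for KV cache and runtime
    overhead; tune it for your own context length and runtime.
    """
    usable = budget_gb - headroom_gb
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= usable}
    if not fitting:
        return None
    # Larger files generally keep more precision, so prefer the biggest fit.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (8, 16, 24, 48):
        print(budget, "GB ->", pick_quant(budget))
```

With the default 2 GB headroom, a 24 GB budget selects Q4_K_M, which matches the README's description of that file as a popular size and quality trade-off.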

Model Details Live

Model Slug

djlougen/hermes-qwen3.5-35b-a3b-gguf

Author

DJLougen

Pipeline Task

text-generation

Library

hermes

Created

2026-03-31

Last Modified

2026-04-03

Gated

No

Private

No

HF SHA

50034abc9358d4ab990736411e73d5c85323bb93

License

apache-2.0

Language

Unknown

Base Model

Qwen/Qwen3.5-35B-A3B

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "base_model": "Qwen/Qwen3.5-35B-A3B",
    "tags": [
      "qwen3.5",
      "moe",
      "gguf",
      "lora",
      "function-calling",
      "hermes",
      "unsloth"
    ],
    "license": "apache-2.0",
    "datasets": [
      "NousResearch/hermes-function-calling-v1"
    ],
    "pipeline_tag": "text-generation",
    "model-index": [
      {
        "name": "hermes-qwen3.5-35b-a3b-GGUF",
        "results": []
      }
    ],
    "frontmatter": {
      "base_model": "Qwen/Qwen3.5-35B-A3B",
      "tags": [
        "qwen3.5",
        "moe",
        "gguf",
        "lora",
        "function-calling",
        "hermes",
        "unsloth"
      ],
      "license": "apache-2.0",
      "datasets": [
        "NousResearch/hermes-function-calling-v1"
      ],
      "pipeline_tag": "text-generation"
    },
    "hero_image_url": "",
    "summary": "GGUF quantizations of a Qwen3.5-35B-A3B model fine-tuned on NousResearch/hermes-function-calling-v1 for structured function calling and tool use.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nbase_model: Qwen/Qwen3.5-35B-A3B\ntags:\n- qwen3.5\n- moe\n- gguf\n- lora\n- function-calling\n- hermes\n- unsloth\nlicense: apache-2.0\ndatasets:\n- NousResearch/hermes-function-calling-v1\npipeline_tag: text-generation\nmodel-index:\n- name: hermes-qwen3.5-35b-a3b-GGUF\n  results: []\n---\n\nNote: These models are optimized for use within an agentic harness (e.g. Hermes Agent) and may behave unexpectedly in raw inference without a system prompt. Capability benchmarks are strong but conversational behavior outside of a structured harness is not reliable. I am currently working on v2 to address this and reduce harness dependency.\n<div align=\"center\">\n<h3>Support This Work</h3>\n<p>\nI'm a PhD student who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. It's a hobby that got out of hand.\n<br><br>\nIf my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.\n</p>\n<p><a href=\"https://ko-fi.com/djlougen\">&#9749; ko-fi.com/djlougen</a></p>\n</div>\n\n---\n\n# Hermes Qwen3.5 35B-A3B GGUF\n\nGGUF quantizations of a Qwen3.5-35B-A3B model fine-tuned on [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1) for structured function calling and tool use.\n\n## Base Model\n\n- **Architecture:** Qwen3.5 MoE (Mixture of Experts) — 35B total parameters, ~3B active per token\n- **Base:** [Qwen/Qwen3.5-35B-A3B](https://huggingface.co/Qwen/Qwen3.5-35B-A3B)\n- **Context Length:** 262,144 tokens\n- **Experts:** 256 total, 8 active per token\n\n## Fine-Tuning Details\n\n- **Method:** LoRA via [Unsloth](https://github.com/unslothai/unsloth) + TRL SFTTrainer\n- **LoRA Rank (r):** 32\n- **LoRA Alpha:** 32\n- **LoRA Dropout:** 0\n- **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`\n- **Training Precision:** bf16\n- **Optimizer:** AdamW 8-bit\n- **Learning Rate:** 2e-4 with cosine scheduler\n- **Warmup Steps:** 10\n- **Epochs:** 3\n- **Batch Size:** 2 per device, 8 gradient accumulation steps (effective batch size 16)\n- **Max Sequence Length:** 4,096 tokens\n- **Weight Decay:** 0.01\n- **PEFT Version:** 0.18.1\n\n### Training Dataset\n\n[NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1) — a function-calling dataset following the Hermes Function-calling Standard. Includes:\n\n- Cleaned Glaive Function Calling samples\n- Advanced JSON structured output (agentic, multi-turn)\n- Single-turn JSON structured output samples\n\nConversations were formatted using ChatML (`<|im_start|>` / `<|im_end|>`) with role mapping: `system`, `human` -> `user`, `gpt` -> `assistant`, `tool`.\n\n## Quantization\n\nAll quantizations were produced using [llama.cpp](https://github.com/ggerganov/llama.cpp) with an **importance matrix** (imatrix) computed from WikiText-2 calibration data for improved quality at lower bit depths.\n\n### Available Quants\n\n| Filename | Quant | Type | Size |\n|----------|-------|------|------|\n| hermes-qwen3.5-35b-a3b-f16.gguf | F16 | Full precision | 64.6 GB |\n| hermes-qwen3.5-35b-a3b-Q8_0.gguf | Q8_0 | Standard | 36.9 GB |\n| hermes-qwen3.5-35b-a3b-Q6_K.gguf | Q6_K | K-quant | 28.5 GB |\n| hermes-qwen3.5-35b-a3b-Q5_K_M.gguf | Q5_K_M | K-quant | 24.7 GB |\n| hermes-qwen3.5-35b-a3b-Q5_K_S.gguf | Q5_K_S | K-quant | 24.0 GB |\n| hermes-qwen3.5-35b-a3b-Q4_K_M.gguf | Q4_K_M | K-quant | 21.2 GB |\n| hermes-qwen3.5-35b-a3b-Q4_K_S.gguf | Q4_K_S | K-quant | 19.9 GB |\n| hermes-qwen3.5-35b-a3b-IQ4_NL.gguf | IQ4_NL | imatrix | 19.8 GB |\n| hermes-qwen3.5-35b-a3b-IQ4_XS.gguf | IQ4_XS | imatrix | 18.7 GB |\n| hermes-qwen3.5-35b-a3b-Q3_K_M.gguf | Q3_K_M | K-quant | 16.8 GB |\n| hermes-qwen3.5-35b-a3b-IQ3_M.gguf | IQ3_M | imatrix | 15.4 GB |\n| hermes-qwen3.5-35b-a3b-IQ3_S.gguf | IQ3_S | imatrix | 15.3 GB |\n| hermes-qwen3.5-35b-a3b-Q3_K_S.gguf | Q3_K_S | K-quant | 15.2 GB |\n| hermes-qwen3.5-35b-a3b-IQ3_XXS.gguf | IQ3_XXS | imatrix | 13.6 GB |\n| hermes-qwen3.5-35b-a3b-IQ2_M.gguf | IQ2_M | imatrix | 11.7 GB |\n| hermes-qwen3.5-35b-a3b-IQ2_S.gguf | IQ2_S | imatrix | 10.7 GB |\n| hermes-qwen3.5-35b-a3b-IQ2_XXS.gguf | IQ2_XXS | imatrix | 9.5 GB |\n| hermes-qwen3.5-35b-a3b-IQ1_M.gguf | IQ1_M | imatrix | 8.2 GB |\n| hermes-qwen3.5-35b-a3b-IQ1_S.gguf | IQ1_S | imatrix | 7.5 GB |\n\nAll quantizations verified: 733 tensors, GGUF v3.\n\n### Choosing a Quant\n\n- **Q8_0** (36.9 GB): Closest to full precision. Use if you have the VRAM/RAM.\n- **Q6_K / Q5_K_M** (28.5 / 24.7 GB): Good balance of quality and size for most use cases.\n- **Q4_K_M** (21.2 GB): Popular sweet spot — significant size reduction with minimal quality loss.\n- **IQ4_NL / IQ4_XS** (19.8 / 18.7 GB): Importance-matrix 4-bit — can outperform standard Q4 quants at similar size.\n- **IQ3_M / IQ3_S** (15.4 / 15.3 GB): Importance-matrix 3-bit — good quality for the size with imatrix calibration.\n- **IQ2_M and below** (11.7 GB and smaller): Extreme compression with imatrix. Quality degrades progressively.\n- **IQ1_M / IQ1_S** (8.2 / 7.5 GB): Maximum compression. Expect significant quality loss.\n- **IQ3_M and below**: For constrained environments. Quality degrades progressively.\n- **IQ2 / IQ1**: Extreme compression. Expect notable quality degradation.\n\n## Usage\n\n### llama.cpp\n\n```bash\nllama-cli -m hermes-qwen3.5-35b-a3b-Q4_K_M.gguf -p \"You are a helpful assistant.\" -cnv\n```\n\n### LM Studio / Ollama / KoboldCpp\n\nDownload any GGUF file and load it directly.\n\n## Credits\n\n- **Base Model:** [Qwen Team](https://huggingface.co/Qwen)\n- **Training Dataset:** [NousResearch](https://huggingface.co/NousResearch)\n- **Fine-Tuning Framework:** [Unsloth](https://github.com/unslothai/unsloth)\n- **Quantization Tooling:** [llama.cpp](https://github.com/ggerganov/llama.cpp)\n",
    "related_quantizations": []
  },
  "tags": [
    "hermes",
    "gguf",
    "qwen3.5",
    "moe",
    "lora",
    "function-calling",
    "unsloth",
    "text-generation",
    "dataset:NousResearch/hermes-function-calling-v1",
    "base_model:Qwen/Qwen3.5-35B-A3B",
    "base_model:adapter:Qwen/Qwen3.5-35B-A3B",
    "doi:10.57967/hf/8262",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 4,
  "downloads": 20143,
  "gated": false,
  "private": false,
  "last_modified": "2026-04-03T21:11:41.000Z",
  "created_at": "2026-03-31T18:40:44.000Z",
  "pipeline_tag": "text-generation",
  "library_name": "hermes"
}
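
The README stored above says training conversations were formatted as ChatML with the role mapping `system`, `human` -> `user`, `gpt` -> `assistant`, `tool`. A minimal sketch of that formatting follows. The `from` / `value` keys are an assumption about the dataset's ShareGPT-style layout, `to_chatml` is a hypothetical helper name, and the sample turns are invented for illustration; this is not the author's actual training code.

```python
# Role mapping as described in the README; keys are dataset roles,
# values are the ChatML roles used during fine-tuning.
ROLE_MAP = {
    "system": "system",
    "human": "user",
    "gpt": "assistant",
    "tool": "tool",
}


def to_chatml(conversation):
    """Render a list of {"from": ..., "value": ...} turns as ChatML text.

    The "from"/"value" keys are assumed (ShareGPT-style); adjust them if
    your data uses "role"/"content" instead.
    """
    parts = []
    for turn in conversation:
        role = ROLE_MAP[turn["from"]]  # raises KeyError on unknown roles
        parts.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>\n")
    return "".join(parts)


if __name__ == "__main__":
    sample = [
        {"from": "system", "value": "You are a function-calling assistant."},
        {"from": "human", "value": "What is the weather in Paris?"},
    ]
    print(to_chatml(sample))
```

Because the README notes that the model may behave unexpectedly without a system prompt, including a `system` turn as in the sample is likely the safer default.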

Source payload excerpt (from Hugging Face API)

{
  "_id": "69cc152cd4ac07cad3ac11ec",
  "id": "DJLougen/hermes-qwen3.5-35b-a3b-GGUF",
  "modelId": "DJLougen/hermes-qwen3.5-35b-a3b-GGUF",
  "sha": "50034abc9358d4ab990736411e73d5c85323bb93",
  "createdAt": "2026-03-31T18:40:44.000Z",
  "lastModified": "2026-04-03T21:11:41.000Z",
  "author": "DJLougen",
  "downloads": 20143,
  "likes": 4,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "hermes",
  "siblings_count": 22
}
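
The `id` and `sha` fields in the payload are enough to build a direct file URL. The sketch below assumes Hugging Face's usual `resolve` URL layout; `gguf_url` is a hypothetical helper name, and `main` is assumed to be the repository's default branch. Passing the `sha` as the revision pins the download to the exact commit shown on this page.

```python
from urllib.parse import quote

# Values copied from the payload excerpt above.
REPO_ID = "DJLougen/hermes-qwen3.5-35b-a3b-GGUF"
SHA = "50034abc9358d4ab990736411e73d5c85323bb93"


def gguf_url(repo_id, filename, revision="main"):
    """Build a direct download URL for one file in a Hugging Face repo.

    Assumes the standard https://huggingface.co/<repo>/resolve/<rev>/<file>
    layout; revision may be a branch name or a commit SHA.
    """
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


if __name__ == "__main__":
    name = "hermes-qwen3.5-35b-a3b-Q4_K_M.gguf"
    print(gguf_url(REPO_ID, name))            # follows the default branch
    print(gguf_url(REPO_ID, name, SHA))       # pinned to the listed commit
```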

More models in this shard