djlougen/hermes-qwen3.5-35b-a3b-gguf (F16 GGUF, free download) is indexed on GraySoft with repository links, GGUF quantization files, and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes.
Model Intelligence Sheet
djlougen/hermes-qwen3.5-35b-a3b-gguf overview
GGUF quantizations of a Qwen3.5-35B-A3B model fine-tuned on NousResearch/hermes-function-calling-v1 for structured function calling and tool use.
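For function calling, the harness has to pull tool calls out of the model's text. Hermes-style function calling usually wraps each call in `<tool_call>` tags around a JSON object with `name` and `arguments`; this page does not show raw model output, so treat that tag convention as an assumption. A minimal sketch of such a parser, where `extract_tool_calls` is a hypothetical helper:

```python
import json
import re

# Assumption: each call is emitted as <tool_call>{"name": ..., "arguments": {...}}</tool_call>,
# following the Hermes function-calling convention (not shown on this page).
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Return every well-formed tool call found in a model response."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip a malformed call instead of failing the whole turn
        if isinstance(payload, dict) and "name" in payload:
            calls.append({"name": payload["name"],
                          "arguments": payload.get("arguments", {})})
    return calls


if __name__ == "__main__":
    reply = (
        "Let me check that.\n"
        "<tool_call>\n"
        '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
        "</tool_call>"
    )
    print(extract_tool_calls(reply))
    # -> [{'name': 'get_weather', 'arguments': {'city': 'Paris'}}]
```

Skipping malformed JSON rather than raising lets the harness re-prompt the model, which matters because the model card notes that behavior outside a structured harness is not reliable.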
| Field | Value |
|---|---|
| Downloads | 20,143 |
| Likes | 4 |
| Pipeline | text-generation |
| Library | hermes |
| Visibility | Public |
| Access | Open |
Repository Files & Downloads
19 GGUF files detected
Direct downloads for each GGUF file in the repository
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| hermes-qwen3.5-35b-a3b-IQ1_M.gguf | GGUF | IQ1_M | 7.67 GB | Download |
| hermes-qwen3.5-35b-a3b-IQ1_S.gguf | GGUF | IQ1_S | 6.97 GB | Download |
| hermes-qwen3.5-35b-a3b-IQ2_M.gguf | GGUF | IQ2_M | 10.86 GB | Download |
| hermes-qwen3.5-35b-a3b-IQ2_S.gguf | GGUF | IQ2_S | 9.92 GB | Download |
| hermes-qwen3.5-35b-a3b-IQ2_XXS.gguf | GGUF | IQ2_XXS | 8.85 GB | Download |
| hermes-qwen3.5-35b-a3b-IQ3_M.gguf | GGUF | IQ3_M | 14.38 GB | Download |
| hermes-qwen3.5-35b-a3b-IQ3_S.gguf | GGUF | IQ3_S | 14.20 GB | Download |
| hermes-qwen3.5-35b-a3b-IQ3_XXS.gguf | GGUF | IQ3_XXS | 3.29 GB | Download |
| hermes-qwen3.5-35b-a3b-IQ4_NL.gguf | GGUF | IQ4_NL | 18.42 GB | Download |
| hermes-qwen3.5-35b-a3b-IQ4_XS.gguf | GGUF | IQ4_XS | 17.44 GB | Download |
| hermes-qwen3.5-35b-a3b-Q3_K_M.gguf | GGUF | Q3_K_M | 15.61 GB | Download |
| hermes-qwen3.5-35b-a3b-Q3_K_S.gguf | GGUF | Q3_K_S | 14.14 GB | Download |
| hermes-qwen3.5-35b-a3b-Q4_K_M.gguf | GGUF | Q4_K_M | 19.71 GB | Download |
| hermes-qwen3.5-35b-a3b-Q4_K_S.gguf | GGUF | Q4_K_S | 18.52 GB | Download |
| hermes-qwen3.5-35b-a3b-Q5_K_M.gguf | GGUF | Q5_K_M | 23.03 GB | Download |
| hermes-qwen3.5-35b-a3b-Q5_K_S.gguf | GGUF | Q5_K_S | 22.33 GB | Download |
| hermes-qwen3.5-35b-a3b-Q6_K.gguf | GGUF | Q6_K | 26.56 GB | Download |
| hermes-qwen3.5-35b-a3b-Q8_0.gguf | GGUF | Q8_0 | 34.37 GB | Download |
| hermes-qwen3.5-35b-a3b-f16.gguf | GGUF | F16 | 64.61 GB | Download |
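The Size column can be turned into a rough fit check. The sketch below copies the listed sizes and picks the largest file that fits a memory budget. It leaves out IQ3_XXS because its listed 3.29 GB is smaller than even IQ1_S and looks like a data error. The 2 GB reserve for KV cache and runtime buffers is an assumed placeholder, not a measured value; long contexts need more. Because this is a 35B-total MoE model, the sketch treats the whole file (not just the ~3B active parameters) as needing to fit in RAM plus VRAM. `pick_quant` and `bits_per_weight` are hypothetical helpers.

```python
from typing import Optional

# Sizes in GB as listed in the repository table. IQ3_XXS is omitted because
# its listed size (3.29 GB) is below IQ1_S and appears to be a data error.
QUANT_SIZES_GB = {
    "F16": 64.61, "Q8_0": 34.37, "Q6_K": 26.56, "Q5_K_M": 23.03,
    "Q5_K_S": 22.33, "Q4_K_M": 19.71, "Q4_K_S": 18.52, "IQ4_NL": 18.42,
    "IQ4_XS": 17.44, "Q3_K_M": 15.61, "IQ3_M": 14.38, "IQ3_S": 14.20,
    "Q3_K_S": 14.14, "IQ2_M": 10.86, "IQ2_S": 9.92, "IQ2_XXS": 8.85,
    "IQ1_M": 7.67, "IQ1_S": 6.97,
}


def pick_quant(memory_gb: float, overhead_gb: float = 2.0) -> Optional[str]:
    """Return the largest quant whose file fits in memory_gb minus overhead_gb.

    overhead_gb is an assumed reserve for KV cache and runtime buffers.
    """
    usable = memory_gb - overhead_gb
    fitting = {q: size for q, size in QUANT_SIZES_GB.items() if size <= usable}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def bits_per_weight(quant: str) -> float:
    """Approximate bits/weight, scaling against F16 (assumed 16 bits/weight)."""
    return 16.0 * QUANT_SIZES_GB[quant] / QUANT_SIZES_GB["F16"]


if __name__ == "__main__":
    for budget in (8, 16, 24, 48):
        choice = pick_quant(budget)
        if choice is None:
            print(f"{budget} GB: nothing fits")
        else:
            print(f"{budget} GB: {choice} (~{bits_per_weight(choice):.1f} bits/weight)")
```

For example, with 24 GB of memory `pick_quant(24)` returns `Q4_K_M`, which works out to roughly 4.9 bits per weight by this estimate.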
Model Details
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"base_model": "Qwen/Qwen3.5-35B-A3B",
"tags": [
"qwen3.5",
"moe",
"gguf",
"lora",
"function-calling",
"hermes",
"unsloth"
],
"license": "apache-2.0",
"datasets": [
"NousResearch/hermes-function-calling-v1"
],
"pipeline_tag": "text-generation",
"model-index": [
{
"name": "hermes-qwen3.5-35b-a3b-GGUF",
"results": []
}
],
"frontmatter": {
"base_model": "Qwen/Qwen3.5-35B-A3B",
"tags": [
"qwen3.5",
"moe",
"gguf",
"lora",
"function-calling",
"hermes",
"unsloth"
],
"license": "apache-2.0",
"datasets": [
"NousResearch/hermes-function-calling-v1"
],
"pipeline_tag": "text-generation",
"model-index": [
{
"name": "hermes-qwen3.5-35b-a3b-GGUF",
"results": []
}
]
},
"hero_image_url": "",
"summary": "GGUF quantizations of a Qwen3.5-35B-A3B model fine-tuned on NousResearch/hermes-function-calling-v1 for structured function calling and tool use.",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nbase_model: Qwen/Qwen3.5-35B-A3B\ntags:\n- qwen3.5\n- moe\n- gguf\n- lora\n- function-calling\n- hermes\n- unsloth\nlicense: apache-2.0\ndatasets:\n- NousResearch/hermes-function-calling-v1\npipeline_tag: text-generation\nmodel-index:\n- name: hermes-qwen3.5-35b-a3b-GGUF\n results: []\n---\n\nNote: These models are optimized for use within an agentic harness (e.g. Hermes Agent) and may behave unexpectedly in raw inference without a system prompt. Capability benchmarks are strong but conversational behavior outside of a structured harness is not reliable. I am currently working on v2 to address this and reduce harness dependency.\n<div align=\"center\">\n<h3>Support This Work</h3>\n<p>\nI'm a PhD student who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. It's a hobby that got out of hand.\n<br><br>\nIf my uploads have been useful to you, consider buying a PhD student a coffee. 
It goes a long way toward keeping these experiments running.\n</p>\n<p><a href=\"https://ko-fi.com/djlougen\">☕ ko-fi.com/djlougen</a></p>\n</div>\n\n---\n\n# Hermes Qwen3.5 35B-A3B GGUF\n\nGGUF quantizations of a Qwen3.5-35B-A3B model fine-tuned on [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1) for structured function calling and tool use.\n\n## Base Model\n\n- **Architecture:** Qwen3.5 MoE (Mixture of Experts) — 35B total parameters, ~3B active per token\n- **Base:** [Qwen/Qwen3.5-35B-A3B](https://huggingface.co/Qwen/Qwen3.5-35B-A3B)\n- **Context Length:** 262,144 tokens\n- **Experts:** 256 total, 8 active per token\n\n## Fine-Tuning Details\n\n- **Method:** LoRA via [Unsloth](https://github.com/unslothai/unsloth) + TRL SFTTrainer\n- **LoRA Rank (r):** 32\n- **LoRA Alpha:** 32\n- **LoRA Dropout:** 0\n- **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`\n- **Training Precision:** bf16\n- **Optimizer:** AdamW 8-bit\n- **Learning Rate:** 2e-4 with cosine scheduler\n- **Warmup Steps:** 10\n- **Epochs:** 3\n- **Batch Size:** 2 per device, 8 gradient accumulation steps (effective batch size 16)\n- **Max Sequence Length:** 4,096 tokens\n- **Weight Decay:** 0.01\n- **PEFT Version:** 0.18.1\n\n### Training Dataset\n\n[NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1) — a function-calling dataset following the Hermes Function-calling Standard. 
Includes:\n\n- Cleaned Glaive Function Calling samples\n- Advanced JSON structured output (agentic, multi-turn)\n- Single-turn JSON structured output samples\n\nConversations were formatted using ChatML (`<|im_start|>` / `<|im_end|>`) with role mapping: `system`, `human` -> `user`, `gpt` -> `assistant`, `tool`.\n\n## Quantization\n\nAll quantizations were produced using [llama.cpp](https://github.com/ggerganov/llama.cpp) with an **importance matrix** (imatrix) computed from WikiText-2 calibration data for improved quality at lower bit depths.\n\n### Available Quants\n\n| Filename | Quant | Type | Size |\n|----------|-------|------|------|\n| hermes-qwen3.5-35b-a3b-f16.gguf | F16 | Full precision | 64.6 GB |\n| hermes-qwen3.5-35b-a3b-Q8_0.gguf | Q8_0 | Standard | 36.9 GB |\n| hermes-qwen3.5-35b-a3b-Q6_K.gguf | Q6_K | K-quant | 28.5 GB |\n| hermes-qwen3.5-35b-a3b-Q5_K_M.gguf | Q5_K_M | K-quant | 24.7 GB |\n| hermes-qwen3.5-35b-a3b-Q5_K_S.gguf | Q5_K_S | K-quant | 24.0 GB |\n| hermes-qwen3.5-35b-a3b-Q4_K_M.gguf | Q4_K_M | K-quant | 21.2 GB |\n| hermes-qwen3.5-35b-a3b-Q4_K_S.gguf | Q4_K_S | K-quant | 19.9 GB |\n| hermes-qwen3.5-35b-a3b-IQ4_NL.gguf | IQ4_NL | imatrix | 19.8 GB |\n| hermes-qwen3.5-35b-a3b-IQ4_XS.gguf | IQ4_XS | imatrix | 18.7 GB |\n| hermes-qwen3.5-35b-a3b-Q3_K_M.gguf | Q3_K_M | K-quant | 16.8 GB |\n| hermes-qwen3.5-35b-a3b-IQ3_M.gguf | IQ3_M | imatrix | 15.4 GB |\n| hermes-qwen3.5-35b-a3b-IQ3_S.gguf | IQ3_S | imatrix | 15.3 GB |\n| hermes-qwen3.5-35b-a3b-Q3_K_S.gguf | Q3_K_S | K-quant | 15.2 GB |\n| hermes-qwen3.5-35b-a3b-IQ3_XXS.gguf | IQ3_XXS | imatrix | 13.6 GB |\n| hermes-qwen3.5-35b-a3b-IQ2_M.gguf | IQ2_M | imatrix | 11.7 GB |\n| hermes-qwen3.5-35b-a3b-IQ2_S.gguf | IQ2_S | imatrix | 10.7 GB |\n| hermes-qwen3.5-35b-a3b-IQ2_XXS.gguf | IQ2_XXS | imatrix | 9.5 GB |\n| hermes-qwen3.5-35b-a3b-IQ1_M.gguf | IQ1_M | imatrix | 8.2 GB |\n| hermes-qwen3.5-35b-a3b-IQ1_S.gguf | IQ1_S | imatrix | 7.5 GB |\n\nAll quantizations verified: 733 tensors, GGUF 
v3.\n\n### Choosing a Quant\n\n- **Q8_0** (36.9 GB): Closest to full precision. Use if you have the VRAM/RAM.\n- **Q6_K / Q5_K_M** (28.5 / 24.7 GB): Good balance of quality and size for most use cases.\n- **Q4_K_M** (21.2 GB): Popular sweet spot — significant size reduction with minimal quality loss.\n- **IQ4_NL / IQ4_XS** (19.8 / 18.7 GB): Importance-matrix 4-bit — can outperform standard Q4 quants at similar size.\n- **IQ3_M / IQ3_S** (15.4 / 15.3 GB): Importance-matrix 3-bit — good quality for the size with imatrix calibration.\n- **IQ2_M and below** (11.7 GB and smaller): Extreme compression with imatrix. Quality degrades progressively.\n- **IQ1_M / IQ1_S** (8.2 / 7.5 GB): Maximum compression. Expect significant quality loss.\n- **IQ3_M and below**: For constrained environments. Quality degrades progressively.\n- **IQ2 / IQ1**: Extreme compression. Expect notable quality degradation.\n\n## Usage\n\n### llama.cpp\n\n```bash\nllama-cli -m hermes-qwen3.5-35b-a3b-Q4_K_M.gguf -p \"You are a helpful assistant.\" -cnv\n```\n\n### LM Studio / Ollama / KoboldCpp\n\nDownload any GGUF file and load it directly.\n\n## Credits\n\n- **Base Model:** [Qwen Team](https://huggingface.co/Qwen)\n- **Training Dataset:** [NousResearch](https://huggingface.co/NousResearch)\n- **Fine-Tuning Framework:** [Unsloth](https://github.com/unslothai/unsloth)\n- **Quantization Tooling:** [llama.cpp](https://github.com/ggerganov/llama.cpp)\n",
"related_quantizations": []
},
"tags": [
"hermes",
"gguf",
"qwen3.5",
"moe",
"lora",
"function-calling",
"unsloth",
"text-generation",
"dataset:NousResearch/hermes-function-calling-v1",
"base_model:Qwen/Qwen3.5-35B-A3B",
"base_model:adapter:Qwen/Qwen3.5-35B-A3B",
"doi:10.57967/hf/8262",
"license:apache-2.0",
"endpoints_compatible",
"region:us",
"conversational"
],
"likes": 4,
"downloads": 20143,
"gated": false,
"private": false,
"last_modified": "2026-04-03T21:11:41.000Z",
"created_at": "2026-03-31T18:40:44.000Z",
"pipeline_tag": "text-generation",
"library_name": "hermes"
}
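The README embedded above says training conversations were rendered as ChatML (`<|im_start|>` / `<|im_end|>`) with the role mapping `system`, `human` -> `user`, `gpt` -> `assistant`, `tool`, and it warns that the model may behave unexpectedly without a system prompt. A minimal sketch of that rendering, assuming ShareGPT-style turns with `from`/`value` keys (which the `human`/`gpt` role names suggest) and the common one-block-per-turn ChatML layout; `to_chatml` is a hypothetical helper, and if the GGUF carries its own chat template, prefer that.

```python
# Role mapping as described in the README.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant", "tool": "tool"}


def to_chatml(conversation: list[dict], add_generation_prompt: bool = True) -> str:
    """Render ShareGPT-style turns ({"from": ..., "value": ...}) as ChatML text."""
    parts = []
    for turn in conversation:
        role = ROLE_MAP[turn["from"]]  # KeyError on unknown roles, by design
        parts.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    turns = [
        {"from": "system", "value": "You are a function-calling assistant."},
        {"from": "human", "value": "What's the weather in Paris?"},
    ]
    print(to_chatml(turns))
```

Always starting with a `system` turn follows the README's advice that the model is tuned for use inside a harness rather than raw inference.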
Source payload excerpt (from Hugging Face API)
{
"_id": "69cc152cd4ac07cad3ac11ec",
"id": "DJLougen/hermes-qwen3.5-35b-a3b-GGUF",
"modelId": "DJLougen/hermes-qwen3.5-35b-a3b-GGUF",
"sha": "50034abc9358d4ab990736411e73d5c85323bb93",
"createdAt": "2026-03-31T18:40:44.000Z",
"lastModified": "2026-04-03T21:11:41.000Z",
"author": "DJLougen",
"downloads": 20143,
"likes": 4,
"gated": false,
"private": false,
"pipeline_tag": "text-generation",
"library_name": "hermes",
"siblings_count": 22
}
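The `id` and `sha` fields in this payload are enough to build direct download links. The sketch assumes Hugging Face's `https://huggingface.co/{repo_id}/resolve/{revision}/{filename}` URL pattern; pinning `revision` to the `sha` keeps a download reproducible if the repository changes later. `resolve_url` is a hypothetical helper.

```python
from urllib.parse import quote

# Values copied from the Hugging Face API payload above.
REPO_ID = "DJLougen/hermes-qwen3.5-35b-a3b-GGUF"
REVISION_SHA = "50034abc9358d4ab990736411e73d5c85323bb93"


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct file URL, assuming the /{repo}/resolve/{rev}/{file} pattern."""
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


if __name__ == "__main__":
    # Pin to the commit sha so the same bytes are fetched every time.
    print(resolve_url(REPO_ID, "hermes-qwen3.5-35b-a3b-Q4_K_M.gguf", REVISION_SHA))
```

If the `huggingface_hub` library is installed, `hf_hub_download(repo_id=..., filename=..., revision=...)` fetches the same file with local caching.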