joseph717171/hermes-3-llama-3.1-8b-oq8_0-f32.ef32.iq4_k-q8_0-gguf is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. Use this page to pick a local model for guIDE or other runtimes. See related models in the same shard below.
joseph717171/hermes-3-llama-3.1-8b-oq8_0-f32.ef32.iq4_k-q8_0-gguf Overview
Custom GGUF quants of NousResearch/Hermes-3-Llama-3.1-8B, where the output tensors are quantized to Q8_0 while the embeddings are kept at F32.
Update: This repo now contains OF32.EF32 GGUF IQuants for even more accuracy.
Update: This repo now contains updated O.E.IQuants, quantized with a new F32 imatrix using llama.cpp version 4658 (855cd073). This version of llama.cpp added support for non-contiguous RMS norms. According to the author's testing, this has enhanced model coherence and further increased model creativity.
Repository Files & Downloads
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| Hermes-3-Llama-3.1-8B-OF32.EF32.IQ4_K.gguf | GGUF | OF32.EF32.IQ4_K | 7.82 GB | Download |
| Hermes-3-Llama-3.1-8B-OF32.EF32.IQ6_K.gguf | GGUF | OF32.EF32.IQ6_K | 9.25 GB | Download |
| Hermes-3-Llama-3.1-8B-OF32.EF32.IQ8_0.gguf | GGUF | OF32.EF32.IQ8_0 | 10.83 GB | Download |
| Hermes-3-Llama-3.1-8B-OQ8_0.EF32.IQ4_K.gguf | GGUF | OQ8_0.EF32.IQ4_K | 6.38 GB | Download |
| Hermes-3-Llama-3.1-8B-OQ8_0.EF32.IQ6_K.gguf | GGUF | OQ8_0.EF32.IQ6_K | 7.82 GB | Download |
| Hermes-3-Llama-3.1-8B-OQ8_0.EF32.IQ8_0.gguf | GGUF | OQ8_0.EF32.IQ8_0 | 9.39 GB | Download |
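To make the choice between the files above concrete, here is a minimal Python sketch that splits each filename into its quantization parts and picks the largest file under a size budget. It rests on assumptions not confirmed by the source: that `O`, `E`, and `I` in the filenames stand for the output tensors, the embeddings, and the remaining (inner) tensors respectively, and that files live on the repository's `main` branch. The helper names (`parse_quant_file`, `pick_largest_fitting`, `download_url`) are hypothetical, and the budget covers file size only, not runtime memory overhead.

```python
import re
from dataclasses import dataclass
from typing import Optional

REPO_ID = "Joseph717171/Hermes-3-Llama-3.1-8B-OQ8_0-F32.EF32.IQ4_K-Q8_0-GGUF"

# (filename, size in GB) copied from the table above.
FILES = [
    ("Hermes-3-Llama-3.1-8B-OF32.EF32.IQ4_K.gguf", 7.82),
    ("Hermes-3-Llama-3.1-8B-OF32.EF32.IQ6_K.gguf", 9.25),
    ("Hermes-3-Llama-3.1-8B-OF32.EF32.IQ8_0.gguf", 10.83),
    ("Hermes-3-Llama-3.1-8B-OQ8_0.EF32.IQ4_K.gguf", 6.38),
    ("Hermes-3-Llama-3.1-8B-OQ8_0.EF32.IQ6_K.gguf", 7.82),
    ("Hermes-3-Llama-3.1-8B-OQ8_0.EF32.IQ8_0.gguf", 9.39),
]

# Assumed naming scheme: -O<output>.E<embeddings>.I<inner>.gguf
SCHEME = re.compile(
    r"-O(?P<output>[A-Z0-9_]+)\.E(?P<embeddings>[A-Z0-9_]+)\.I(?P<inner>[A-Z0-9_]+)\.gguf$"
)


@dataclass(frozen=True)
class QuantFile:
    filename: str
    size_gb: float
    output: str       # assumed: type of the output tensors
    embeddings: str   # assumed: type of the embeddings
    inner: str        # assumed: type of the remaining tensors


def parse_quant_file(filename: str, size_gb: float) -> QuantFile:
    """Split a filename into its three quantization labels."""
    match = SCHEME.search(filename)
    if match is None:
        raise ValueError(f"unrecognized filename scheme: {filename}")
    return QuantFile(filename, size_gb, **match.groupdict())


def pick_largest_fitting(files, budget_gb: float) -> Optional[QuantFile]:
    """Return the largest file whose size does not exceed the budget, or None."""
    fitting = [f for f in files if f.size_gb <= budget_gb]
    return max(fitting, key=lambda f: f.size_gb) if fitting else None


def download_url(repo_id: str, filename: str) -> str:
    """Build a Hugging Face download URL, assuming the 'main' branch."""
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"


if __name__ == "__main__":
    files = [parse_quant_file(name, size) for name, size in FILES]
    choice = pick_largest_fitting(files, budget_gb=7.0)
    if choice is not None:
        print(choice.filename, choice.output, choice.embeddings, choice.inner)
        print(download_url(REPO_ID, choice.filename))
```

Two files in the table share the 7.82 GB size, so a budget that admits both returns whichever appears first in the list; adjust the tie-breaking if you prefer one scheme over the other.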
Model Details
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"frontmatter": {},
"hero_image_url": "",
"summary": "Custom GGUF quants of NousResearch/Hermes-3-Llama-3.1-8B, where the Output Tensors are quantized to Q8_0 while the Embeddings are kept at F32. Enjoy! Update: This repo now contains OF32.EF32 GGUF IQuants for even more accuracy. Enjoy! UPDATE: This repo now contains updated O.E.IQuants, which were quantized, using a new F32-imatrix, using llama.cpp version: 4658 (855cd073). This particular version of llama.cpp added support for Non-Contiguous RMS Norms. This has enhanced model coherence and further increased model creativity (from testing).",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "Custom GGUF quants of [NousResearch/Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B), where the Output Tensors are quantized to Q8_0 while the Embeddings are kept at F32. Enjoy!\n\nUpdate: This repo now contains OF32.EF32 GGUF IQuants for even more accuracy. Enjoy!\n\nUPDATE: This repo now contains updated O.E.IQuants, which were quantized, using a new F32-imatrix, using llama.cpp version: 4658 (855cd073). This particular version of llama.cpp added support for Non-Contiguous RMS Norms. This has enhanced model coherence and further increased model creativity (from testing).",
"related_quantizations": []
},
"tags": [
"gguf",
"endpoints_compatible",
"region:us",
"imatrix",
"conversational"
],
"likes": 2,
"downloads": 525,
"gated": false,
"private": false,
"last_modified": "2025-02-07T21:09:29.000Z",
"created_at": "2024-09-10T03:20:27.000Z",
"pipeline_tag": "",
"library_name": ""
}
Source payload excerpt (from Hugging Face API)
{
"_id": "66dfbafbcfbb7620ad431166",
"id": "Joseph717171/Hermes-3-Llama-3.1-8B-OQ8_0-F32.EF32.IQ4_K-Q8_0-GGUF",
"modelId": "Joseph717171/Hermes-3-Llama-3.1-8B-OQ8_0-F32.EF32.IQ4_K-Q8_0-GGUF",
"sha": "ee9a704cf0939ed1c4145489ee6063bf3671ebf2",
"createdAt": "2024-09-10T03:20:27.000Z",
"lastModified": "2025-02-07T21:09:29.000Z",
"author": "Joseph717171",
"downloads": 525,
"likes": 2,
"gated": false,
"private": false,
"pipeline_tag": "",
"library_name": "",
"siblings_count": 9
}
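As an illustration of how the payload excerpt above might be consumed, the following Python sketch parses the timestamps and access flags from a local copy of the JSON. The `summarize` and `parse_hf_timestamp` helpers are hypothetical names introduced here, and the timestamp format string is inferred from the two values shown in the excerpt rather than from any API guarantee.

```python
import json
from datetime import datetime, timezone

# Subset of the payload excerpt above, embedded so the example runs offline.
PAYLOAD = """
{
  "id": "Joseph717171/Hermes-3-Llama-3.1-8B-OQ8_0-F32.EF32.IQ4_K-Q8_0-GGUF",
  "createdAt": "2024-09-10T03:20:27.000Z",
  "lastModified": "2025-02-07T21:09:29.000Z",
  "downloads": 525,
  "likes": 2,
  "gated": false,
  "private": false,
  "siblings_count": 9
}
"""


def parse_hf_timestamp(value: str) -> datetime:
    """Parse a timestamp shaped like the ones in the excerpt (assumed UTC, 'Z' suffix)."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def summarize(payload: dict) -> dict:
    """Derive a few convenience fields from the raw payload."""
    created = parse_hf_timestamp(payload["createdAt"])
    modified = parse_hf_timestamp(payload["lastModified"])
    return {
        "repo_id": payload["id"],
        # Whole days between repository creation and the last update.
        "days_between_create_and_update": (modified - created).days,
        # Downloadable without approval only if neither flag is set.
        "is_public": not payload["gated"] and not payload["private"],
        "file_count": payload["siblings_count"],
    }


if __name__ == "__main__":
    print(summarize(json.loads(PAYLOAD)))
```

Treating the timestamps as UTC keeps the date arithmetic independent of the machine's local timezone.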