Model Intelligence Sheet
neopolita/llama-3.1-storm-8b-gguf overview
GGUF quants for akjindal53244/Llama-3.1-Storm-8B using llama.cpp.
Terms of Use: Please check the original model (akjindal53244/Llama-3.1-Storm-8B).
Downloads: 243
Likes: 0
Pipeline: not set
Library: not set
Visibility: Public
Access: Open
Repository Files & Downloads
7 files detected
Direct downloads for all repository files
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| ggml-model-f16.gguf | GGUF | F16 | 14.97 GB | Download |
| llama-3.1-storm-8b_q2_k.gguf | GGUF | Q2_K | 2.96 GB | Download |
| llama-3.1-storm-8b_q3_k_m.gguf | GGUF | Q3_K_M | 3.74 GB | Download |
| llama-3.1-storm-8b_q4_k_m.gguf | GGUF | Q4_K_M | 4.58 GB | Download |
| llama-3.1-storm-8b_q5_k_m.gguf | GGUF | Q5_K_M | 5.34 GB | Download |
| llama-3.1-storm-8b_q6_k.gguf | GGUF | Q6_K | 6.14 GB | Download |
| llama-3.1-storm-8b_q8_0.gguf | GGUF | Q8_0 | 7.95 GB | Download |
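
As a rough aid for choosing among the files listed above, the following Python sketch picks the largest file that fits a given size budget and builds a download URL for it. The helper names (`largest_file_within`, `resolve_url`) and the budget values are hypothetical, and the `resolve/main` URL pattern is an assumption based on the hero image URL that appears in the metadata further down this sheet. File size is only a lower bound on memory use, since the runtime also needs room for the context.

```python
from typing import Optional

REPO_ID = "neopolita/llama-3.1-storm-8b-gguf"

# File sizes in GB, copied from the file table above.
FILES_GB = {
    "ggml-model-f16.gguf": 14.97,
    "llama-3.1-storm-8b_q2_k.gguf": 2.96,
    "llama-3.1-storm-8b_q3_k_m.gguf": 3.74,
    "llama-3.1-storm-8b_q4_k_m.gguf": 4.58,
    "llama-3.1-storm-8b_q5_k_m.gguf": 5.34,
    "llama-3.1-storm-8b_q6_k.gguf": 6.14,
    "llama-3.1-storm-8b_q8_0.gguf": 7.95,
}


def largest_file_within(budget_gb: float) -> Optional[str]:
    """Return the largest listed file whose size fits the budget, or None."""
    candidates = {name: size for name, size in FILES_GB.items() if size <= budget_gb}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def resolve_url(filename: str, revision: str = "main") -> str:
    """Build a download URL (assumed pattern: <repo>/resolve/<revision>/<file>)."""
    return f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{filename}"


if __name__ == "__main__":
    choice = largest_file_within(6.0)  # hypothetical 6 GB budget
    if choice is not None:
        print(choice, resolve_url(choice))
```

With a 6 GB budget this selects the Q5_K_M file; a budget below 2.96 GB matches nothing and returns `None`.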
Model Details
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "frontmatter": {},
    "hero_image_url": "https://huggingface.co/neopolita/common/resolve/main/profile.png",
    "summary": "# GGUF quants for **akjindal53244/Llama-3.1-Storm-8B** using llama.cpp **Terms of Use**: Please check the **original model**",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\n{}\n---\n# GGUF quants for [**akjindal53244/Llama-3.1-Storm-8B**](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) using [llama.cpp](https://github.com/ggerganov/llama.cpp)\n\n**Terms of Use**: Please check the [**original model**](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B)\n\n<picture>\n<img alt=\"cthulhu\" src=\"https://huggingface.co/neopolita/common/resolve/main/profile.png\">\n</picture>\n\n## Quants\n\n* `q2_k`: Uses Q4_K for the attention.vw and feed_forward.w2 tensors, Q2_K for the other tensors.\n* `q3_k_s`: Uses Q3_K for all tensors\n* `q3_k_m`: Uses Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K\n* `q3_k_l`: Uses Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K\n* `q4_0`: Original quant method, 4-bit.\n* `q4_1`: Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models.\n* `q4_k_s`: Uses Q4_K for all tensors\n* `q4_k_m`: Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K\n* `q5_0`: Higher accuracy, higher resource usage and slower inference.\n* `q5_1`: Even higher accuracy, resource usage and slower inference.\n* `q5_k_s`: Uses Q5_K for all tensors\n* `q5_k_m`: Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K\n* `q6_k`: Uses Q8_K for all tensors\n* `q8_0`: Almost indistinguishable from float16. High resource use and slow. Not recommended for most users.",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 0,
  "downloads": 243,
  "gated": false,
  "private": false,
  "last_modified": "2024-08-21T22:35:07.000Z",
  "created_at": "2024-08-21T21:54:54.000Z",
  "pipeline_tag": "",
  "library_name": ""
}
Source payload excerpt (from Hugging Face API)
{
  "_id": "66c6622ee2fdf1d811bb07c3",
  "id": "neopolita/llama-3.1-storm-8b-gguf",
  "modelId": "neopolita/llama-3.1-storm-8b-gguf",
  "sha": "c6883ab731b67e360128326a13d543db0e86f008",
  "createdAt": "2024-08-21T21:54:54.000Z",
  "lastModified": "2024-08-21T22:35:07.000Z",
  "author": "neopolita",
  "downloads": 243,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 9
}
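
The normalized metadata and the API payload excerpt carry several of the same fields under different key styles (`createdAt` in the payload, `created_at` in the normalized block). The Python sketch below shows one way such a mapping could work, assuming a plain camelCase-to-snake_case rename over a fixed field list. `normalize_payload`, `to_snake_case` and `KEPT_FIELDS` are hypothetical names; the normalization code actually used for this sheet is not shown in the source.

```python
import re
from typing import Any, Dict

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")

# Payload fields that also appear at the top level of the normalized metadata.
# This list is an assumption read off the two JSON blocks above.
KEPT_FIELDS = (
    "likes",
    "downloads",
    "gated",
    "private",
    "lastModified",
    "createdAt",
    "pipeline_tag",
    "library_name",
)


def to_snake_case(key: str) -> str:
    """Convert a camelCase key to snake_case; snake_case keys pass through."""
    return _CAMEL_BOUNDARY.sub("_", key).lower()


def normalize_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the kept fields from an API payload, renaming keys to snake_case."""
    return {to_snake_case(k): payload[k] for k in KEPT_FIELDS if k in payload}


if __name__ == "__main__":
    sample = {
        "createdAt": "2024-08-21T21:54:54.000Z",
        "lastModified": "2024-08-21T22:35:07.000Z",
        "downloads": 243,
        "likes": 0,
    }
    print(normalize_payload(sample))
```

Fields outside the list, such as `sha` or `siblings_count`, are dropped by this sketch, which matches their absence from the normalized block.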