Model Intelligence Sheet

neopolita/llama-3.1-storm-8b-gguf overview

GGUF quants for akjindal53244/Llama-3.1-Storm-8B, made with llama.cpp. Terms of Use: please check the original model.

Tags: gguf, endpoints_compatible, region:us, conversational

Downloads

243

Likes

0

Pipeline

—

Library

—

Visibility

Public

Access

Open

Repository Files & Downloads

7 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
ggml-model-f16.gguf	GGUF	F16	14.97 GB	Download
llama-3.1-storm-8b_q2_k.gguf	GGUF	Q2_K	2.96 GB	Download
llama-3.1-storm-8b_q3_k_m.gguf	GGUF	Q3_K_M	3.74 GB	Download
llama-3.1-storm-8b_q4_k_m.gguf	GGUF	Q4_K_M	4.58 GB	Download
llama-3.1-storm-8b_q5_k_m.gguf	GGUF	Q5_K_M	5.34 GB	Download
llama-3.1-storm-8b_q6_k.gguf	GGUF	Q6_K	6.14 GB	Download
llama-3.1-storm-8b_q8_0.gguf	GGUF	Q8_0	7.95 GB	Download
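
The table above can be turned into direct links and a rough size comparison. The following is a minimal Python sketch, not the sheet's own tooling: the helper names `download_url` and `relative_size` are hypothetical, and the URL layout (`/resolve/main/<filename>`) is an assumption based on the pattern seen in the hero image URL elsewhere in this sheet, with the files assumed to live on the `main` revision.

```python
from urllib.parse import quote

REPO_ID = "neopolita/llama-3.1-storm-8b-gguf"

# (filename, size in GB) copied from the file table above.
FILES = [
    ("ggml-model-f16.gguf", 14.97),
    ("llama-3.1-storm-8b_q2_k.gguf", 2.96),
    ("llama-3.1-storm-8b_q3_k_m.gguf", 3.74),
    ("llama-3.1-storm-8b_q4_k_m.gguf", 4.58),
    ("llama-3.1-storm-8b_q5_k_m.gguf", 5.34),
    ("llama-3.1-storm-8b_q6_k.gguf", 6.14),
    ("llama-3.1-storm-8b_q8_0.gguf", 7.95),
]


def download_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL (assumes the /resolve/<revision>/ layout)."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(filename)}"


def relative_size(size_gb: float, baseline_gb: float) -> float:
    """Return a file's size as a fraction of the baseline (here: the F16 file)."""
    return size_gb / baseline_gb


if __name__ == "__main__":
    f16_size = FILES[0][1]  # the F16 file is the first row of the table
    for name, size in FILES:
        share = relative_size(size, f16_size)
        print(f"{name}: {share:.0%} of F16 -> {download_url(REPO_ID, name)}")
```

By this measure the Q4_K_M file is a little under a third of the F16 file's size, which is the trade-off the quantization levels in the table represent.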

Model Details

Model Slug

neopolita/llama-3.1-storm-8b-gguf

Author

neopolita

Pipeline Task

—

Library

—

Created

2024-08-21

Last Modified

2024-08-21

Gated

No

Private

No

HF SHA

c6883ab731b67e360128326a13d543db0e86f008

License

Unknown

Language

Unknown

Base Model

akjindal53244/Llama-3.1-Storm-8B (per the README; not declared in the repository metadata)

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "frontmatter": {},
    "hero_image_url": "https://huggingface.co/neopolita/common/resolve/main/profile.png",
    "summary": "# GGUF quants for **akjindal53244/Llama-3.1-Storm-8B** using llama.cpp **Terms of Use**: Please check the **original model**",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\n{}\n---\n# GGUF quants for [**akjindal53244/Llama-3.1-Storm-8B**](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) using [llama.cpp](https://github.com/ggerganov/llama.cpp)\n\n**Terms of Use**: Please check the [**original model**](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B)\n\n<picture>\n<img alt=\"cthulhu\" src=\"https://huggingface.co/neopolita/common/resolve/main/profile.png\">\n</picture>\n\n## Quants\n\n* `q2_k`: Uses Q4_K for the attention.vw and feed_forward.w2 tensors, Q2_K for the other tensors.\n* `q3_k_s`: Uses Q3_K for all tensors\n* `q3_k_m`: Uses Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K\n* `q3_k_l`: Uses Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K\n* `q4_0`: Original quant method, 4-bit.\n* `q4_1`: Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models.\n* `q4_k_s`: Uses Q4_K for all tensors\n* `q4_k_m`: Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K\n* `q5_0`: Higher accuracy, higher resource usage and slower inference.\n* `q5_1`: Even higher accuracy, resource usage and slower inference.\n* `q5_k_s`:  Uses Q5_K for all tensors\n* `q5_k_m`: Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K\n* `q6_k`: Uses Q8_K for all tensors\n* `q8_0`: Almost indistinguishable from float16. High resource use and slow. Not recommended for most users.",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 0,
  "downloads": 243,
  "gated": false,
  "private": false,
  "last_modified": "2024-08-21T22:35:07.000Z",
  "created_at": "2024-08-21T21:54:54.000Z",
  "pipeline_tag": "",
  "library_name": ""
}

Source payload excerpt (from Hugging Face API)

{
  "_id": "66c6622ee2fdf1d811bb07c3",
  "id": "neopolita/llama-3.1-storm-8b-gguf",
  "modelId": "neopolita/llama-3.1-storm-8b-gguf",
  "sha": "c6883ab731b67e360128326a13d543db0e86f008",
  "createdAt": "2024-08-21T21:54:54.000Z",
  "lastModified": "2024-08-21T22:35:07.000Z",
  "author": "neopolita",
  "downloads": 243,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 9
}
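
The payload fields above can be checked against the rest of the sheet with a few lines of Python. This is a sketch under stated assumptions: `PAYLOAD` is a hand-copied subset of the excerpt, `parse_hub_timestamp` is a hypothetical helper, and the count of 7 comes from the "7 files detected" line. The sheet does not say what the two remaining siblings are.

```python
import json
from datetime import datetime

# Subset of the source payload excerpt, copied by hand from this sheet.
PAYLOAD = json.loads("""
{
  "id": "neopolita/llama-3.1-storm-8b-gguf",
  "createdAt": "2024-08-21T21:54:54.000Z",
  "lastModified": "2024-08-21T22:35:07.000Z",
  "siblings_count": 9
}
""")

LISTED_FILES = 7  # from "7 files detected" in the file table


def parse_hub_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with milliseconds and a trailing 'Z' (UTC)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


created = parse_hub_timestamp(PAYLOAD["createdAt"])
modified = parse_hub_timestamp(PAYLOAD["lastModified"])

# Time between repository creation and the last recorded change.
elapsed = modified - created

# Siblings reported by the API that do not appear in the file table.
unlisted = PAYLOAD["siblings_count"] - LISTED_FILES

if __name__ == "__main__":
    print(f"elapsed between create and last modify: {elapsed}")
    print(f"siblings not shown in the file table: {unlisted}")
```

Both timestamps fall on 2024-08-21, which matches the Created and Last Modified dates shown in the details section.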