GraySoft

akjindal53244/llama-3.1-storm-8b-gguf (Q4_K_M GGUF, free download) is indexed on GraySoft with repository links, GGUF quantization files, and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes. Related models from the same shard are listed below.

Model Intelligence Sheet

akjindal53244/llama-3.1-storm-8b-gguf overview

This is the GGUF quantized version of Llama-3.1-Storm-8B, for use with llama.cpp. The BF16 model is available at akjindal53244/Llama-3.1-Storm-8B.

Tags: gguf, llama, llama-3.1, conversational, instruction following, reasoning, function calling, text-generation, en, de, fr, it, pt, hi, es, th, arxiv:2406.06623, arxiv:2311.07911, arxiv:2311.12022, arxiv:2406.01574, arxiv:1803.05457, arxiv:2310.16049, arxiv:2210.09261, arxiv:2109.07958, base_model:akjindal53244/Llama-3.1-Storm-8B, base_model:quantized:akjindal53244/Llama-3.1-Storm-8B, license:llama3.1, endpoints_compatible, region:us
Downloads: 298
Likes: 41
Pipeline: text-generation
Library: not specified
Visibility: Public
Access: Open

Repository Files & Downloads

4 GGUF files detected
File | Type | Quantization | Size
Llama-3.1-Storm-8B.Q4_K_M.gguf | GGUF | Q4_K_M | 4.58 GB
Llama-3.1-Storm-8B.Q5_K_M.gguf | GGUF | Q5_K_M | 5.34 GB
Llama-3.1-Storm-8B.Q6_K.gguf | GGUF | Q6_K | 6.14 GB
Llama-3.1-Storm-8B.Q8_0.gguf | GGUF | Q8_0 | 7.95 GB
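To help choose among the files above, the sketch below picks the largest listed quantization whose file size, plus an assumed overhead, fits a memory budget. The sizes are copied from the table. The helper names `pick_quant` and `gguf_filename` and the 1.5 GB default overhead are illustrative assumptions, not values from the repository. The overhead stands in for the KV cache and runtime buffers, which in practice grow with context length.

```python
from typing import Optional

# File sizes (GB) copied from the repository table above.
QUANT_SIZES_GB = {
    "Q4_K_M": 4.58,
    "Q5_K_M": 5.34,
    "Q6_K": 6.14,
    "Q8_0": 7.95,
}


def gguf_filename(quant: str) -> str:
    """Return the repository filename for a listed quantization level."""
    if quant not in QUANT_SIZES_GB:
        raise ValueError(f"unknown quantization: {quant}")
    return f"Llama-3.1-Storm-8B.{quant}.gguf"


def pick_quant(memory_budget_gb: float, overhead_gb: float = 1.5) -> Optional[str]:
    """Pick the largest listed quant whose file size plus overhead fits the budget.

    overhead_gb is a hypothetical allowance for the KV cache and runtime buffers;
    real usage grows with n_ctx and varies by runtime. Returns None if nothing fits.
    """
    fitting = [
        (size, quant)
        for quant, size in QUANT_SIZES_GB.items()
        if size + overhead_gb <= memory_budget_gb
    ]
    if not fitting:
        return None
    # Tuples compare by size first, so max() yields the largest file that fits.
    return max(fitting)[1]


if __name__ == "__main__":
    for budget_gb in (6, 8, 12):
        quant = pick_quant(budget_gb)
        print(budget_gb, quant, gguf_filename(quant) if quant else "no listed file fits")
```

You can pass the returned filename as the `filename` argument of `huggingface_hub.hf_hub_download`, the same call the README example uses for the Q8_0 file.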

Benchmarks

Model Strength | Relevant Benchmarks
🎯 Improved Instruction Following | IFEval Strict (+3.93%)
🌐 Enhanced Knowledge Driven Question Answering | GPQA (+7.21%), MMLU-Pro (+0.55%), AGIEval (+3.77%)
🧠 Better Reasoning | ARC-C (+3.92%), MuSR (+2.77%), BBH (+1.67%), AGIEval (+3.77%)
🤖 Superior Agentic Capabilities | BFCL: Overall Acc (+7.92%), BFCL: AST Summary (+12.32%)
🚫 Reduced Hallucinations | TruthfulQA (+9%)
All improvements are absolute gains over Meta-Llama-3.1-8B-Instruct.
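The README states that these figures are absolute gains over Meta-Llama-3.1-8B-Instruct. That makes them differences in percentage points, not relative improvements. The sketch below shows the distinction. The baseline score of 50.0 is a hypothetical value chosen only for illustration, because the baseline scores are not listed on this page.

```python
def storm_score(baseline_score: float, absolute_gain_pts: float) -> float:
    """Score implied by adding an absolute gain (percentage points) to a baseline."""
    return baseline_score + absolute_gain_pts


def relative_gain_pct(baseline_score: float, absolute_gain_pts: float) -> float:
    """Relative improvement (%) that the same absolute gain represents."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return 100.0 * absolute_gain_pts / baseline_score


if __name__ == "__main__":
    hypothetical_baseline = 50.0  # illustrative only, not a reported score
    ifeval_gain = 3.93  # IFEval Strict gain from the table
    print(storm_score(hypothetical_baseline, ifeval_gain))
    print(relative_gain_pct(hypothetical_baseline, ifeval_gain))
```

On a benchmark with a low baseline, the same absolute gain corresponds to a larger relative improvement.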

Model Details

Model Slug: akjindal53244/llama-3.1-storm-8b-gguf
Author: akjindal53244
Pipeline Task: text-generation
Library: not specified
Created: 2024-08-16
Last Modified: 2024-08-21
Gated: No
Private: No
HF SHA: 44f9bb97d7833db0a17e9596a22f1b301f3ad8a3
License: llama3.1
Language: en, de, fr, it, pt, hi, es, th
Base Model: akjindal53244/Llama-3.1-Storm-8B

Metadata Inspector

Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "language": [
      "en",
      "de",
      "fr",
      "it",
      "pt",
      "hi",
      "es",
      "th"
    ],
    "pipeline_tag": "text-generation",
    "tags": [
      "llama-3.1",
      "conversational",
      "instruction following",
      "reasoning",
      "function calling"
    ],
    "license": "llama3.1",
    "base_model": "akjindal53244/Llama-3.1-Storm-8B",
    "frontmatter": {
      "language": [
        "en",
        "de",
        "fr",
        "it",
        "pt",
        "hi",
        "es",
        "th"
      ],
      "pipeline_tag": "text-generation",
      "tags": [
        "llama-3.1",
        "conversational",
        "instruction following",
        "reasoning",
        "function calling"
      ],
      "license": "llama3.1",
      "base_model": "akjindal53244/Llama-3.1-Storm-8B"
    },
    "hero_image_url": "https://cdn-uploads.huggingface.co/production/uploads/64c75c1237333ccfef30a602/tmOlbERGKP7JSODa6T06J.jpeg",
    "summary": "**This is the GGUF quantized version of Llama-3.1-Storm-8B, for use with llama.cpp. BF16 Model here**",
    "quick_links": [],
    "benchmark_table_html": "<table>\n  <tr>\n   <td><strong>Model Strength</strong>\n   </td>\n   <td><strong>Relevant Benchmarks</strong>\n   </td>\n  </tr>\n  <tr>\n   <td>🎯 Improved Instruction Following\n   </td>\n   <td>IFEval Strict (+3.93%)\n   </td>\n  </tr>\n  <tr>\n   <td>🌐 Enhanced Knowledge Driven Question Answering\n   </td>\n   <td>GPQA (+7.21%), MMLU-Pro (+0.55%), AGIEval (+3.77%)\n   </td>\n  </tr>\n  <tr>\n   <td>🧠 Better Reasoning\n   </td>\n   <td>ARC-C (+3.92%), MuSR (+2.77%), BBH (+1.67%), AGIEval (+3.77%)\n   </td>\n  </tr>\n  <tr>\n   <td>🤖 Superior Agentic Capabilities\n   </td>\n   <td>BFCL: Overall Acc (+7.92%), BFCL: AST Summary (+12.32%)\n   </td>\n  </tr>\n  <tr>\n   <td>🚫 Reduced Hallucinations\n   </td>\n   <td>TruthfulQA (+9%)\n   </td>\n  </tr>\n</table>",
    "readme_markdown": "---\nlanguage:\n- en\n- de\n- fr\n- it\n- pt\n- hi\n- es\n- th\npipeline_tag: text-generation\ntags:\n- llama-3.1\n- conversational\n- instruction following\n- reasoning\n- function calling\nlicense: llama3.1\nbase_model: akjindal53244/Llama-3.1-Storm-8B\n---\n\n![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64c75c1237333ccfef30a602/tmOlbERGKP7JSODa6T06J.jpeg)\n\nAuthors: [Ashvini Kumar Jindal](https://www.linkedin.com/in/ashvini-jindal-26653262/), [Pawan Kumar Rajpoot](https://www.linkedin.com/in/pawanrajpoot/), [Ankur Parikh](https://www.linkedin.com/in/ankurnlpexpert/), [Akshita Sukhlecha](https://www.linkedin.com/in/akshita-sukhlecha/)\n\n**🤗 Hugging Face Announcement Blog**: https://huggingface.co/blog/akjindal53244/llama31-storm8b\n\n**🚀Ollama:** `ollama run ajindal/llama3.1-storm:8b`\n\n<br>\n\n# Llama-3.1-Storm-8B-GGUF\n**This is the GGUF quantized version of [Llama-3.1-Storm-8B](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B), for use with [llama.cpp](https://github.com/ggerganov/llama.cpp). BF16 Model [here](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B)**\n\n## TL;DR\n![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c75c1237333ccfef30a602/mDtDeiHwnBupw1k_n99Lf.png)\n\nWe present the [**Llama-3.1-Storm-8B**](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) model that outperforms Meta AI's [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) and [Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B) models significantly across diverse benchmarks as shown in the performance comparison plot in the next section. Our approach consists of three key steps:\n1. **Self-Curation**: We applied two self-curation methods to select approximately 1 million high-quality examples from a pool of ~2.8 million open-source examples. 
**Our curation criteria focused on educational value and difficulty level, using the same SLM for annotation instead of larger models (e.g. 70B, 405B).**\n2. **Targeted fine-tuning**: We performed [Spectrum](https://arxiv.org/abs/2406.06623)-based targeted fine-tuning over the Llama-3.1-8B-Instruct model. The Spectrum method accelerates training by selectively targeting layer modules based on their signal-to-noise ratio (SNR), and freezing the remaining modules. In our work, 50% of layers are frozen.\n3. **Model Merging**: We merged our fine-tuned model with the [Llama-Spark](https://huggingface.co/arcee-ai/Llama-Spark) model using [SLERP](https://huggingface.co/blog/mlabonne/merge-models#1-slerp) method. The merging method produces a blended model with characteristics smoothly interpolated from both parent models, ensuring the resultant model captures the essence of both its parents. [Llama-3.1-Storm-8B](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) improves Llama-3.1-8B-Instruct across 10 diverse benchmarks. 
These benchmarks cover areas such as instruction-following, knowledge-driven QA, reasoning, truthful answer generation, and function calling.\n\n## 🏆 Introducing Llama-3.1-Storm-8B\n[**Llama-3.1-Storm-8B**](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) builds upon the foundation of Llama-3.1-8B-Instruct, aiming to enhance both conversational and function calling capabilities within the 8B parameter model class.\n\nAs shown in the left subplot of the above figure, [**Llama-3.1-Storm-8B**](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) model improves Meta-Llama-3.1-8B-Instruct across various benchmarks - Instruction-following ([IFEval](https://arxiv.org/abs/2311.07911)), Knowledge-driven QA benchmarks ([GPQA](https://arxiv.org/abs/2311.12022), [MMLU-Pro](https://arxiv.org/pdf/2406.01574)), Reasoning ([ARC-C](https://arxiv.org/abs/1803.05457), [MuSR](https://arxiv.org/abs/2310.16049), [BBH](https://arxiv.org/pdf/2210.09261)), Reduced Hallucinations ([TruthfulQA](https://arxiv.org/abs/2109.07958)), and Function-Calling ([BFCL](https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard)). This improvement is particularly significant for AI developers and enthusiasts who work with limited computational resources.\n\nWe also benchmarked our model with the recently published model [Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B) built on top of the Llama-3.1-8B-Instruct model. As shown in the right subplot of the above figure, **Llama-3.1-Storm-8B outperforms Hermes-3-Llama-3.1-8B on 7 out of 9 benchmarks**, with Hermes-3-Llama-3.1-8B surpassing Llama-3.1-Storm-8B on the MuSR benchmark and both models showing comparable performance on the BBH benchmark.\n\n\n## Llama-3.1-Storm-8B Model Strengths\nLlama-3.1-Storm-8B is a powerful generalist model useful for diverse applications. 
We invite the AI community to explore [Llama-3.1-Storm-8B](https://huggingface.co/collections/akjindal53244/storm-66ba6c96b7e24ecb592787a9) and look forward to seeing how it will be utilized in various projects and applications.\n\n<table>\n  <tr>\n   <td><strong>Model Strength</strong>\n   </td>\n   <td><strong>Relevant Benchmarks</strong>\n   </td>\n  </tr>\n  <tr>\n   <td>🎯 Improved Instruction Following\n   </td>\n   <td>IFEval Strict (+3.93%)\n   </td>\n  </tr>\n  <tr>\n   <td>🌐 Enhanced Knowledge Driven Question Answering\n   </td>\n   <td>GPQA (+7.21%), MMLU-Pro (+0.55%), AGIEval (+3.77%)\n   </td>\n  </tr>\n  <tr>\n   <td>🧠 Better Reasoning\n   </td>\n   <td>ARC-C (+3.92%), MuSR (+2.77%), BBH (+1.67%), AGIEval (+3.77%)\n   </td>\n  </tr>\n  <tr>\n   <td>🤖 Superior Agentic Capabilities\n   </td>\n   <td>BFCL: Overall Acc (+7.92%), BFCL: AST Summary (+12.32%)\n   </td>\n  </tr>\n  <tr>\n   <td>🚫 Reduced Hallucinations\n   </td>\n   <td>TruthfulQA (+9%)\n   </td>\n  </tr>\n</table>\n\n**Note**: All improvements are absolute gains over Meta-Llama-3.1-8B-Instruct.\n\n\n## Llama-3.1-Storm-8B Models\n1. `BF16`: [Llama-3.1-Storm-8B](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B)\n2. ⚑ `FP8`: [Llama-3.1-Storm-8B-FP8-Dynamic](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B-FP8-Dynamic)\n3. ⚑ `GGUF`: [Llama-3.1-Storm-8B-GGUF](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B-GGUF)\n4. 🚀 Ollama: `ollama run ajindal/llama3.1-storm:8b`\n\n## 💻 How to Use GGUF Model\n\n```bash\npip install llama-cpp-python huggingface_hub\n```\n\n```python\nfrom huggingface_hub import hf_hub_download\nfrom llama_cpp import Llama\n\n## Download the GGUF model\nmodel_name = \"akjindal53244/Llama-3.1-Storm-8B-GGUF\"\nmodel_file = \"Llama-3.1-Storm-8B.Q8_0.gguf\" # this is the specific model file we'll use in this example. 
It's an 8-bit quant, but other levels of quantization are available in the model repo if preferred\nmodel_path = hf_hub_download(model_name, filename=model_file)\n\n## Instantiate model from downloaded file\nllm = Llama(\n    model_path=model_path,\n    n_ctx=16000,    # Context length to use\n    n_threads=32,   # Number of CPU threads to use\n    n_gpu_layers=0  # Number of model layers to offload to GPU\n)\n\ngeneration_kwargs = {\n    \"max_tokens\":200,\n    \"stop\":[\"<|eot_id|>\"],\n    \"echo\":False, # Echo the prompt in the output\n    \"top_k\":1 # Set this value > 1 for sampling decoding\n}\n\nprompt = \"What is 2+2?\"\nres = llm(prompt, **generation_kwargs)\nprint(res[\"choices\"][0][\"text\"])\n```\n\n### Function Calling Example with [Ollama](https://ollama.com/)\n```python\nimport ollama\ntools = [{\n      'type': 'function',\n      'function': {\n        'name': 'get_current_weather',\n        'description': 'Get the current weather for a city',\n        'parameters': {\n          'type': 'object',\n          'properties': {\n            'city': {\n              'type': 'string',\n              'description': 'The name of the city',\n            },\n          },\n          'required': ['city'],\n        },\n      },\n    },\n    {\n      'type': 'function',\n      'function': {\n        'name': 'get_places_to_visit',\n        'description': 'Get places to visit in a city',\n        'parameters': {\n          'type': 'object',\n          'properties': {\n            'city': {\n              'type': 'string',\n              'description': 'The name of the city',\n            },\n          },\n          'required': ['city'],\n        },\n      },\n    },\n  ]\nresponse = ollama.chat(\n    model='ajindal/llama3.1-storm:8b',\n    messages=[\n        {'role': 'system', 'content': 'Do not answer any vulgar questions.'},\n        {'role': 'user', 'content': 'What is the weather in Toronto and San Francisco?'}\n        ],\n
    tools=tools\n)\nprint(response['message'])  # Expected Response: {'role': 'assistant', 'content': \"<tool_call>{'tool_name': 'get_current_weather', 'tool_arguments': {'city': 'Toronto'}}</tool_call>\"}\n```\n\n\n## Alignment Note\nWhile **Llama-3.1-Storm-8B** did not undergo an explicit model alignment process, it may still retain some alignment properties inherited from the Meta-Llama-3.1-8B-Instruct model.\n\n## Cite Our Work\n```bibtex\n@misc {ashvini_kumar_jindal_2024,\n    author       = { {Ashvini Kumar Jindal, Pawan Kumar Rajpoot, Ankur Parikh, Akshita Sukhlecha} },\n    title        = { Llama-3.1-Storm-8B },\n    year         = 2024,\n    url          = { https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B },\n    doi          = { 10.57967/hf/2902 },\n    publisher    = { Hugging Face }\n}\n```\n\n## Support Our Work\nWith 3 team members spread across 3 different time zones, we have won the [NeurIPS LLM Efficiency Challenge 2023](https://llm-efficiency-challenge.github.io/) and 4 other competitions in the Finance and Arabic LLM spaces. We have also published a [SOTA mathematical reasoning model](https://huggingface.co/akjindal53244/Arithmo-Mistral-7B).\n\n**Llama-3.1-Storm-8B** is our most valuable contribution so far towards the open-source community. We are committed to developing efficient generalist LLMs. **We're seeking both computational resources and innovative collaborators to drive this initiative forward.**",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "llama",
    "llama-3.1",
    "conversational",
    "instruction following",
    "reasoning",
    "function calling",
    "text-generation",
    "en",
    "de",
    "fr",
    "it",
    "pt",
    "hi",
    "es",
    "th",
    "arxiv:2406.06623",
    "arxiv:2311.07911",
    "arxiv:2311.12022",
    "arxiv:2406.01574",
    "arxiv:1803.05457",
    "arxiv:2310.16049",
    "arxiv:2210.09261",
    "arxiv:2109.07958",
    "base_model:akjindal53244/Llama-3.1-Storm-8B",
    "base_model:quantized:akjindal53244/Llama-3.1-Storm-8B",
    "license:llama3.1",
    "endpoints_compatible",
    "region:us"
  ],
  "likes": 41,
  "downloads": 298,
  "gated": false,
  "private": false,
  "last_modified": "2024-08-21T02:31:32.000Z",
  "created_at": "2024-08-16T03:12:29.000Z",
  "pipeline_tag": "text-generation",
  "library_name": ""
}
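The README example stored in `readme_markdown` above passes the bare string "What is 2+2?" to `llm(...)`. It stops on `<|eot_id|>`, which is a token from the Llama 3.1 chat format. The sketch below shows how to wrap chat messages in that format, assuming this GGUF keeps its base model's Llama 3.1 chat template. The function `build_llama31_prompt` is hypothetical and simplified. It omits extras such as the knowledge-cutoff date lines that Meta's official Llama 3.1 template adds to the system header. It also leaves `<|begin_of_text|>` off by default, on the assumption that the runtime prepends the BOS token itself.

```python
from typing import Dict, List

BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def build_llama31_prompt(messages: List[Dict[str, str]], add_bos: bool = False) -> str:
    """Format chat messages with Llama 3.1 header/turn tokens (simplified sketch).

    Each message is a dict with "role" (system, user, or assistant) and "content".
    The prompt ends with an open assistant header so the model writes the reply
    and closes it with <|eot_id|>, the stop string used in the README example.
    """
    parts = [BOS] if add_bos else []
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}{EOT}"
        )
    # Open the assistant turn so generation continues from here.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = build_llama31_prompt([
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2?"},
    ])
    print(prompt)
```

Alternatively, llama-cpp-python's `Llama.create_chat_completion(messages=...)` can apply a chat template for you, so you do not have to build the string by hand.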
Source payload excerpt (from Hugging Face API)
{
  "_id": "66bec39d8e95eabff21d36a7",
  "id": "akjindal53244/Llama-3.1-Storm-8B-GGUF",
  "modelId": "akjindal53244/Llama-3.1-Storm-8B-GGUF",
  "sha": "44f9bb97d7833db0a17e9596a22f1b301f3ad8a3",
  "createdAt": "2024-08-16T03:12:29.000Z",
  "lastModified": "2024-08-21T02:31:32.000Z",
  "author": "akjindal53244",
  "downloads": 298,
  "likes": 41,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "",
  "siblings_count": 7
}
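In the Ollama function-calling example from the README metadata above, the expected reply wraps the call in `<tool_call>...</tool_call>` tags. Inside the tags is a Python-style dict with single quotes. The sketch below extracts such calls, assuming replies follow that shape. `parse_tool_calls` is a hypothetical helper. Depending on the Ollama version and the model template, tool calls may instead arrive in structured form in the message's `tool_calls` field.

```python
import ast
import json
import re
from typing import Any, Dict, List

# Non-greedy match so several <tool_call> blocks in one reply are found separately.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_calls(content: str) -> List[Dict[str, Any]]:
    """Extract tool-call dicts from <tool_call>...</tool_call> blocks in a reply."""
    calls = []
    for raw in TOOL_CALL_RE.findall(content):
        raw = raw.strip()
        try:
            call = json.loads(raw)  # strict JSON (double quotes)
        except json.JSONDecodeError:
            # Python-literal form, as in the README's expected reply.
            # literal_eval only accepts literals, so it does not execute code.
            call = ast.literal_eval(raw)
        if not isinstance(call, dict):
            raise ValueError(f"tool call is not a mapping: {raw!r}")
        calls.append(call)
    return calls


if __name__ == "__main__":
    reply = "<tool_call>{'tool_name': 'get_current_weather', 'tool_arguments': {'city': 'Toronto'}}</tool_call>"
    for call in parse_tool_calls(reply):
        print(call["tool_name"], call["tool_arguments"])
```

Each parsed dict can then be dispatched by its `tool_name` to a local function that takes `tool_arguments` as keyword arguments.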