GraySoft

GraySoft indexes inferenceillusionist/llama-3-70b-instruct-storywriter-imat-gguf (IQ3_XS GGUF, free download) with repository links, GGUF quant files, and Hugging Face metadata. Use this page to pick a local model for guIDE or other runtimes. See related models in the same shard below.

Model Intelligence Sheet

inferenceillusionist/llama-3-70b-instruct-storywriter-imat-gguf overview

Special request. Quantized from fp32 with love. If you can't fit IQ quants in your VRAM, try using the K quants in this repo instead.

* The .imatrix file in this repo was created using the Q8_0 quantization of Llama-3-70B-Instruct-Storywriter-iMat-GGUF.
* Calculated in 88 chunks with n_ctx=512 using groups_merged.txt

For a brief rundown of iMatrix quant performance, please see this PR: https://github.com/ggerganov/llama.cpp/pull/5747

All quants are verified working prior to uploading to the repo for your safety and convenience.

Tip: Pick a file size under your GPU's VRAM while still leaving some room for context for best speed. You may need to leave more headroom if you are also running image generation or TTS.

The BFloat16 model card can be found at https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter
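The tip above amounts to a simple selection rule, sketched here in Python. Sizes are copied from the repository file table on this page. The helper name `pick_quant` and the 4 GB default reserve for context are assumptions for illustration, not values from the model card, and file size is only a rough proxy for actual memory use at load time.

```python
# Single-file quant sizes in GB, copied from the repository file table on this page.
# The split quants (Q6_K, Q8_0) are left out; each totals more than 50 GB.
QUANT_SIZES_GB = {
    "IQ1_M": 15.60,
    "IQ2_XXS": 17.79,
    "IQ2_XS": 19.69,
    "IQ2_S": 20.71,
    "IQ2_M": 22.46,
    "Q2_K": 24.56,
    "IQ3_XXS": 25.58,
    "IQ3_XS": 27.29,
    "IQ3_S": 28.79,
    "IQ3_M": 29.74,
    "Q3_K_M": 31.91,
    "IQ4_XS": 35.30,
    "Q4_K_S": 37.58,
    "Q4_K_M": 39.60,
    "Q5_K_S": 45.32,
    "Q5_K_M": 46.52,
}


def pick_quant(vram_gb, reserve_gb=4.0, sizes=QUANT_SIZES_GB):
    """Return the largest quant that fits in vram_gb minus reserve_gb, or None.

    reserve_gb is a hypothetical allowance for context and other GPU
    workloads (image generation, TTS); tune it for your own setup.
    """
    budget = vram_gb - reserve_gb
    fitting = {name: size for name, size in sizes.items() if size <= budget}
    if not fitting:
        return None
    # Largest file that still fits the budget.
    return max(fitting, key=fitting.get)
```

Under these assumptions, `pick_quant(24)` returns `IQ2_XS` and `pick_quant(48)` returns `Q4_K_M`. Raising `reserve_gb` models the extra padding the tip recommends when other workloads share the GPU.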

Tags: gguf, merge, llama3, iMat, endpoints_compatible, region:us, imatrix, conversational
Downloads: 241
Likes: 0
Pipeline: not specified
Library: not specified
Visibility: Public
Access: Open

Repository Files & Downloads

20 files detected
Direct downloads for all repository files
| File | Type | Quantization | Size |
| --- | --- | --- | --- |
| Llama-3-70B-Instruct-Storywriter-iMat-IQ1_M.gguf | GGUF | IQ1_M | 15.60 GB |
| Llama-3-70B-Instruct-Storywriter-iMat-IQ2_M.gguf | GGUF | IQ2_M | 22.46 GB |
| Llama-3-70B-Instruct-Storywriter-iMat-IQ2_S.gguf | GGUF | IQ2_S | 20.71 GB |
| Llama-3-70B-Instruct-Storywriter-iMat-IQ2_XS.gguf | GGUF | IQ2_XS | 19.69 GB |
| Llama-3-70B-Instruct-Storywriter-iMat-IQ2_XXS.gguf | GGUF | IQ2_XXS | 17.79 GB |
| Llama-3-70B-Instruct-Storywriter-iMat-IQ3_M.gguf | GGUF | IQ3_M | 29.74 GB |
| Llama-3-70B-Instruct-Storywriter-iMat-IQ3_S.gguf | GGUF | IQ3_S | 28.79 GB |
| Llama-3-70B-Instruct-Storywriter-iMat-IQ3_XS.gguf | GGUF | IQ3_XS | 27.29 GB |
| Llama-3-70B-Instruct-Storywriter-iMat-IQ3_XXS.gguf | GGUF | IQ3_XXS | 25.58 GB |
| Llama-3-70B-Instruct-Storywriter-iMat-IQ4_XS.gguf | GGUF | IQ4_XS | 35.30 GB |
| Llama-3-70B-Instruct-Storywriter-iMat-Q2_K.gguf | GGUF | Q2_K | 24.56 GB |
| Llama-3-70B-Instruct-Storywriter-iMat-Q3_K_M.gguf | GGUF | Q3_K_M | 31.91 GB |
| Llama-3-70B-Instruct-Storywriter-iMat-Q4_K_M.gguf | GGUF | Q4_K_M | 39.60 GB |
| Llama-3-70B-Instruct-Storywriter-iMat-Q4_K_S.gguf | GGUF | Q4_K_S | 37.58 GB |
| Llama-3-70B-Instruct-Storywriter-iMat-Q5_K_M.gguf | GGUF | Q5_K_M | 46.52 GB |
| Llama-3-70B-Instruct-Storywriter-iMat-Q5_K_S.gguf | GGUF | Q5_K_S | 45.32 GB |
| Llama-3-70B-Instruct-Storywriter-iMat-Q6_K-00001-of-00002.gguf | GGUF | Q6_K | 43.95 GB |
| Llama-3-70B-Instruct-Storywriter-iMat-Q6_K-00002-of-00002.gguf | GGUF | Q6_K | 9.96 GB |
| Llama-3-70B-Instruct-Storywriter-iMat-Q8_0-00001-of-00002.gguf | GGUF | Q8_0 | 43.85 GB |
| Llama-3-70B-Instruct-Storywriter-iMat-Q8_0-00002-of-00002.gguf | GGUF | Q8_0 | 25.98 GB |
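
The Q6_K and Q8_0 quants are each split into two parts, so the size to budget for is the sum of both files, and both parts are needed to load the model. A minimal Python sketch of that bookkeeping follows. It groups files by the `-0000N-of-0000M` suffix seen in the file names above, checks that no part is missing, and totals the sizes. The helper name `total_shard_sizes` is hypothetical.

```python
import re
from collections import defaultdict

# (file name, size in GB) for the split quants, copied from the table above.
SHARDS = [
    ("Llama-3-70B-Instruct-Storywriter-iMat-Q6_K-00001-of-00002.gguf", 43.95),
    ("Llama-3-70B-Instruct-Storywriter-iMat-Q6_K-00002-of-00002.gguf", 9.96),
    ("Llama-3-70B-Instruct-Storywriter-iMat-Q8_0-00001-of-00002.gguf", 43.85),
    ("Llama-3-70B-Instruct-Storywriter-iMat-Q8_0-00002-of-00002.gguf", 25.98),
]

# Matches the "-00001-of-00002.gguf" suffix used by the split files.
SHARD_RE = re.compile(r"^(?P<stem>.+)-(?P<index>\d{5})-of-(?P<total>\d{5})\.gguf$")


def total_shard_sizes(files):
    """Sum shard sizes per file stem and verify that every part is present."""
    sizes = defaultdict(float)
    seen = defaultdict(set)
    expected = {}
    for name, size_gb in files:
        match = SHARD_RE.match(name)
        if match is None:
            continue  # not a split file; ignore it here
        stem = match["stem"]
        sizes[stem] += size_gb
        seen[stem].add(int(match["index"]))
        expected[stem] = int(match["total"])
    for stem, total in expected.items():
        missing = set(range(1, total + 1)) - seen[stem]
        if missing:
            raise ValueError(f"{stem}: missing parts {sorted(missing)}")
    return {stem: round(size, 2) for stem, size in sizes.items()}
```

For the four files listed, this gives about 53.91 GB for Q6_K and 69.83 GB for Q8_0.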

Model Details

Model Slug: inferenceillusionist/llama-3-70b-instruct-storywriter-imat-gguf
Author: InferenceIllusionist
Pipeline Task: not specified
Library: not specified
Created: 2024-05-01
Last Modified: 2024-05-11
Gated: No
Private: No
HF SHA: 1b17f5064cd2d36ec09a5f6f1bf88ff75e4640cc
License: Unknown
Language: Unknown
Base Model: Unknown

Metadata Inspector

Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "tags": [
      "merge",
      "gguf",
      "llama3",
      "iMat"
    ],
    "frontmatter": {
      "tags": [
        "merge",
        "gguf",
        "llama3",
        "iMat"
      ]
    },
    "hero_image_url": "https://i.imgur.com/P68dXux.png",
    "summary": "Special request. Quantized from fp32 with love. If you can't fit IQ quants in your VRAM, try using the K quants in this repo instead. * The .imatrix file in this repo was created using the Q8_0 quantization of Llama-3-70B-Instruct-Storywriter-iMat-GGUF. * Calculated in 88 chunks with n_ctx=512 using groups_merged.txt For a brief rundown of iMatrix quant performance please see this PR All quants are verified working prior to uploading to repo for your safety and convenience.  Tip: Pick a file size under your GPU's VRAM while still allowing some room for context for best speed. You may need to pad this further depending on if you are running image gen or TTS as well. BFloat16 model card can be found here",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\ntags:\n- merge\n- gguf\n- llama3\n- iMat\n---\n<img src=\"https://i.imgur.com/P68dXux.png\" width=\"400\"/>\n\n# Llama-3-70B-Instruct-Storywriter-iMat-GGUF\n\n\n<b>Special request.</b> Quantized from fp32 with love. If you can't fit IQ quants in your VRAM, try using the K quants in this repo instead.\n* The [.imatrix](https://huggingface.co/InferenceIllusionist/Llama-3-70B-Instruct-Storywriter-iMat-GGUF/resolve/main/Llama-3-70B-Instruct-Storywriter.imatrix?download=true) file in this repo was created using the Q8_0 quantization of Llama-3-70B-Instruct-Storywriter-iMat-GGUF.\n* Calculated in 88 chunks with n_ctx=512 using groups_merged.txt\n\nFor a brief rundown of iMatrix quant performance please see this [PR](https://github.com/ggerganov/llama.cpp/pull/5747)\n\n<i>All quants are verified working prior to uploading to repo for your safety and convenience. </i>\n\n\n<b>Tip:</b> Pick a file size under your GPU's VRAM while still allowing some room for context for best speed. You may need to pad this further depending on if you are running image gen or TTS as well.\n\nBFloat16 model card can be found [here](https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter)",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "merge",
    "llama3",
    "iMat",
    "endpoints_compatible",
    "region:us",
    "imatrix",
    "conversational"
  ],
  "likes": 0,
  "downloads": 241,
  "gated": false,
  "private": false,
  "last_modified": "2024-05-11T19:25:28.000Z",
  "created_at": "2024-05-01T08:42:19.000Z",
  "pipeline_tag": "",
  "library_name": ""
}
Source payload excerpt (from Hugging Face API)
{
  "_id": "6632006b1fd3ec937aff5a80",
  "id": "InferenceIllusionist/Llama-3-70B-Instruct-Storywriter-iMat-GGUF",
  "modelId": "InferenceIllusionist/Llama-3-70B-Instruct-Storywriter-iMat-GGUF",
  "sha": "1b17f5064cd2d36ec09a5f6f1bf88ff75e4640cc",
  "createdAt": "2024-05-01T08:42:19.000Z",
  "lastModified": "2024-05-11T19:25:28.000Z",
  "author": "InferenceIllusionist",
  "downloads": 241,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 23
}
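
The `id` and `sha` fields in the payload are enough to build a download URL pinned to the exact revision indexed here. The README links its .imatrix file through a `resolve/main/` URL; the Python sketch below follows the same pattern but substitutes the commit SHA for `main`, on the assumption that the Hub accepts a commit hash wherever it accepts a branch name. The helper name `resolve_url` is hypothetical.

```python
from urllib.parse import quote

# Values copied from the "id" and "sha" fields of the payload above.
REPO_ID = "InferenceIllusionist/Llama-3-70B-Instruct-Storywriter-iMat-GGUF"
COMMIT_SHA = "1b17f5064cd2d36ec09a5f6f1bf88ff75e4640cc"


def resolve_url(filename, repo_id=REPO_ID, revision=COMMIT_SHA):
    """Build a Hugging Face 'resolve' URL for one file at a fixed revision."""
    # quote() percent-encodes unusual characters in the file name;
    # the names in this repository pass through unchanged.
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(filename)}"


if __name__ == "__main__":
    print(resolve_url("Llama-3-70B-Instruct-Storywriter-iMat-IQ3_XS.gguf"))
```

Pinning to the SHA means a later push to the repository would not silently change the file being fetched. Pass `revision="main"` to track the latest upload instead.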