inferenceillusionist/llama-3-70b-instruct-storywriter-imat-gguf (IQ3_XS GGUF, free download) is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes. Related models in the same shard are listed below.

Model Intelligence Sheet

inferenceillusionist/llama-3-70b-instruct-storywriter-imat-gguf overview

Special request. Quantized from fp32 with love. If you can't fit IQ quants in your VRAM, try using the K quants in this repo instead.

* The .imatrix file in this repo was created using the Q8_0 quantization of Llama-3-70B-Instruct-Storywriter-iMat-GGUF.
* Calculated in 88 chunks with n_ctx=512 using groups_merged.txt

For a brief rundown of iMatrix quant performance please see this PR: https://github.com/ggerganov/llama.cpp/pull/5747

All quants are verified working prior to uploading to repo for your safety and convenience.

Tip: Pick a file size under your GPU's VRAM while still allowing some room for context for best speed. You may need to pad this further depending on if you are running image gen or TTS as well.

BFloat16 model card can be found here: https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter

Tags: gguf, merge, llama3, iMat, endpoints_compatible, region:us, imatrix, conversational

Downloads

241

Likes

0

Pipeline

—

Library

—

Visibility

Public

Access

Open

Repository Files & Downloads

20 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
Llama-3-70B-Instruct-Storywriter-iMat-IQ1_M.gguf	GGUF	IQ1_M	15.60 GB	Download
Llama-3-70B-Instruct-Storywriter-iMat-IQ2_M.gguf	GGUF	IQ2_M	22.46 GB	Download
Llama-3-70B-Instruct-Storywriter-iMat-IQ2_S.gguf	GGUF	IQ2_S	20.71 GB	Download
Llama-3-70B-Instruct-Storywriter-iMat-IQ2_XS.gguf	GGUF	IQ2_XS	19.69 GB	Download
Llama-3-70B-Instruct-Storywriter-iMat-IQ2_XXS.gguf	GGUF	IQ2_XXS	17.79 GB	Download
Llama-3-70B-Instruct-Storywriter-iMat-IQ3_M.gguf	GGUF	IQ3_M	29.74 GB	Download
Llama-3-70B-Instruct-Storywriter-iMat-IQ3_S.gguf	GGUF	IQ3_S	28.79 GB	Download
Llama-3-70B-Instruct-Storywriter-iMat-IQ3_XS.gguf	GGUF	IQ3_XS	27.29 GB	Download
Llama-3-70B-Instruct-Storywriter-iMat-IQ3_XXS.gguf	GGUF	IQ3_XXS	25.58 GB	Download
Llama-3-70B-Instruct-Storywriter-iMat-IQ4_XS.gguf	GGUF	IQ4_XS	35.30 GB	Download
Llama-3-70B-Instruct-Storywriter-iMat-Q2_K.gguf	GGUF	Q2_K	24.56 GB	Download
Llama-3-70B-Instruct-Storywriter-iMat-Q3_K_M.gguf	GGUF	Q3_K_M	31.91 GB	Download
Llama-3-70B-Instruct-Storywriter-iMat-Q4_K_M.gguf	GGUF	Q4_K_M	39.60 GB	Download
Llama-3-70B-Instruct-Storywriter-iMat-Q4_K_S.gguf	GGUF	Q4_K_S	37.58 GB	Download
Llama-3-70B-Instruct-Storywriter-iMat-Q5_K_M.gguf	GGUF	Q5_K_M	46.52 GB	Download
Llama-3-70B-Instruct-Storywriter-iMat-Q5_K_S.gguf	GGUF	Q5_K_S	45.32 GB	Download
Llama-3-70B-Instruct-Storywriter-iMat-Q6_K-00001-of-00002.gguf	GGUF	Q6_K	43.95 GB	Download
Llama-3-70B-Instruct-Storywriter-iMat-Q6_K-00002-of-00002.gguf	GGUF	Q6_K	9.96 GB	Download
Llama-3-70B-Instruct-Storywriter-iMat-Q8_0-00001-of-00002.gguf	GGUF	Q8_0	43.85 GB	Download
Llama-3-70B-Instruct-Storywriter-iMat-Q8_0-00002-of-00002.gguf	GGUF	Q8_0	25.98 GB	Download
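
The model card's tip (pick a file under your GPU's VRAM while leaving room for context) can be sketched against the table above. This is a minimal illustration, not part of the repository: the helper names, the 4 GB headroom, and the "largest file that fits" heuristic are all assumptions, and only a few table rows are copied in. Split files such as the Q6_K pair are summed, since both parts have to be loaded together.

```python
import re
from collections import defaultdict

# (file name, size in GB) copied from a few rows of the table above.
FILES = [
    ("Llama-3-70B-Instruct-Storywriter-iMat-IQ2_M.gguf", 22.46),
    ("Llama-3-70B-Instruct-Storywriter-iMat-IQ3_XS.gguf", 27.29),
    ("Llama-3-70B-Instruct-Storywriter-iMat-IQ4_XS.gguf", 35.30),
    ("Llama-3-70B-Instruct-Storywriter-iMat-Q4_K_M.gguf", 39.60),
    ("Llama-3-70B-Instruct-Storywriter-iMat-Q6_K-00001-of-00002.gguf", 43.95),
    ("Llama-3-70B-Instruct-Storywriter-iMat-Q6_K-00002-of-00002.gguf", 9.96),
]

PREFIX = "Llama-3-70B-Instruct-Storywriter-iMat-"
SPLIT_SUFFIX = re.compile(r"-\d{5}-of-\d{5}$")  # e.g. "-00001-of-00002"


def quant_name(file_name):
    """Extract the quant label (e.g. 'Q6_K') from a file name. Needs Python 3.9+."""
    stem = file_name.removesuffix(".gguf")
    stem = SPLIT_SUFFIX.sub("", stem)
    return stem.removeprefix(PREFIX)


def total_sizes(files):
    """Sum sizes per quant so that split files count as one download."""
    totals = defaultdict(float)
    for name, size_gb in files:
        totals[quant_name(name)] += size_gb
    return dict(totals)


def pick_quant(files, vram_gb, headroom_gb):
    """Return the largest quant that fits in vram_gb minus headroom_gb, or None.

    headroom_gb is a hypothetical allowance for context and other workloads;
    the right value depends on your runtime and settings.
    """
    budget = vram_gb - headroom_gb
    fitting = {q: s for q, s in total_sizes(files).items() if s <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)
```

With these rows, `pick_quant(FILES, vram_gb=32, headroom_gb=4)` returns `"IQ3_XS"`, and a 16 GB budget returns `None`. Treating the largest fitting file as the best choice is only a rough proxy for quality, and the table does not say whether its sizes are decimal GB or GiB, so leave some margin.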

Model Details Live

Model Slug

inferenceillusionist/llama-3-70b-instruct-storywriter-imat-gguf

Author

InferenceIllusionist

Pipeline Task

—

Library

—

Created

2024-05-01

Last Modified

2024-05-11

Gated

No

Private

No

HF SHA

1b17f5064cd2d36ec09a5f6f1bf88ff75e4640cc

License

Unknown

Language

Unknown

Base Model

Unknown

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "tags": [
      "merge",
      "gguf",
      "llama3",
      "iMat"
    ],
    "frontmatter": {
      "tags": [
        "merge",
        "gguf",
        "llama3",
        "iMat"
      ]
    },
    "hero_image_url": "https://i.imgur.com/P68dXux.png",
    "summary": "Special request. Quantized from fp32 with love. If you can't fit IQ quants in your VRAM, try using the K quants in this repo instead. * The .imatrix file in this repo was created using the Q8_0 quantization of Llama-3-70B-Instruct-Storywriter-iMat-GGUF. * Calculated in 88 chunks with n_ctx=512 using groups_merged.txt For a brief rundown of iMatrix quant performance please see this PR All quants are verified working prior to uploading to repo for your safety and convenience.  Tip: Pick a file size under your GPU's VRAM while still allowing some room for context for best speed. You may need to pad this further depending on if you are running image gen or TTS as well. BFloat16 model card can be found here",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\ntags:\n- merge\n- gguf\n- llama3\n- iMat\n---\n<img src=\"https://i.imgur.com/P68dXux.png\" width=\"400\"/>\n\n# Llama-3-70B-Instruct-Storywriter-iMat-GGUF\n\n\n<b>Special request.</b> Quantized from fp32 with love. If you can't fit IQ quants in your VRAM, try using the K quants in this repo instead.\n* The [.imatrix](https://huggingface.co/InferenceIllusionist/Llama-3-70B-Instruct-Storywriter-iMat-GGUF/resolve/main/Llama-3-70B-Instruct-Storywriter.imatrix?download=true) file in this repo was created using the Q8_0 quantization of Llama-3-70B-Instruct-Storywriter-iMat-GGUF.\n* Calculated in 88 chunks with n_ctx=512 using groups_merged.txt\n\nFor a brief rundown of iMatrix quant performance please see this [PR](https://github.com/ggerganov/llama.cpp/pull/5747)\n\n<i>All quants are verified working prior to uploading to repo for your safety and convenience. </i>\n\n\n<b>Tip:</b> Pick a file size under your GPU's VRAM while still allowing some room for context for best speed. You may need to pad this further depending on if you are running image gen or TTS as well.\n\nBFloat16 model card can be found [here](https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter)",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "merge",
    "llama3",
    "iMat",
    "endpoints_compatible",
    "region:us",
    "imatrix",
    "conversational"
  ],
  "likes": 0,
  "downloads": 241,
  "gated": false,
  "private": false,
  "last_modified": "2024-05-11T19:25:28.000Z",
  "created_at": "2024-05-01T08:42:19.000Z",
  "pipeline_tag": "",
  "library_name": ""
}

Source payload excerpt (from Hugging Face API)

{
  "_id": "6632006b1fd3ec937aff5a80",
  "id": "InferenceIllusionist/Llama-3-70B-Instruct-Storywriter-iMat-GGUF",
  "modelId": "InferenceIllusionist/Llama-3-70B-Instruct-Storywriter-iMat-GGUF",
  "sha": "1b17f5064cd2d36ec09a5f6f1bf88ff75e4640cc",
  "createdAt": "2024-05-01T08:42:19.000Z",
  "lastModified": "2024-05-11T19:25:28.000Z",
  "author": "InferenceIllusionist",
  "downloads": 241,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 23
}
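
The `id` in the payload above, combined with a file name from the downloads table, is enough to build a direct link. The sketch below assumes the `resolve/main/<file>` URL form that the README's own .imatrix link uses. The function names are hypothetical, and passing the commit `sha` as the revision to pin a download is an assumption rather than something this page states.

```python
from urllib.parse import quote

# "id" from the Hugging Face API payload above.
REPO_ID = "InferenceIllusionist/Llama-3-70B-Instruct-Storywriter-iMat-GGUF"


def resolve_url(repo_id, file_name, revision="main"):
    """Build a direct-download URL of the form .../resolve/<revision>/<file>."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(file_name)}"


def split_parts(stem, total):
    """List the part names of a split GGUF, matching the table's naming scheme."""
    return [f"{stem}-{i:05d}-of-{total:05d}.gguf" for i in range(1, total + 1)]


# Example: URLs for both parts of the split Q8_0 quant.
q8_urls = [
    resolve_url(REPO_ID, part)
    for part in split_parts("Llama-3-70B-Instruct-Storywriter-iMat-Q8_0", 2)
]
```

For split quants, every part has to be downloaded into the same directory before a runtime can load the model.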

More models in this shard