lewdiculous/llama-3.1-8b-stheno-v3.4-gguf-iq-imatrix IQ3_XXS GGUF - Free GGUF Download is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes. See related models in the same shard below.

Model Intelligence Sheet

lewdiculous/llama-3.1-8b-stheno-v3.4-gguf-iq-imatrix overview

Quants for Sao10K/Llama-3.1-8B-Stheno-v3.4. Q40 ARM/Mobile quants here: Llama-3.1-8B-Stheno-v3.4-GGUF-ARM-Imatrix-Supplementary. I recommend checking their page for feedback and support. Quantization process: Imatrix data was generated from the FP16-GGUF and conversions directly from the BF16-GGUF. This hopefully avoids losses during conversion. To run this model, please use the latest version of KoboldCpp. If you noticed any issues let me know in the discussions. Presets: Some compatible SillyTavern presets can be found here (Virt's Roleplay Presets - v1.9). Check discussions such as this one and this one for other presets and samplers recommendations. Lower temperatures are recommended by the authors, so make sure to experiment. General usage with KoboldCpp: For 8GB VRAM GPUs, I recommend the Q4K_M-imat (4.89 BPW) quant for up to 12288 context sizes without the use of --quantkv. Using --quantkv 1 (≈Q8) or even --quantkv 2 (≈Q4) can get you to 32K context sizes with the caveat of not being compatible with Context Shifting, only relevant if you can manage to fill up that much context. Read more about it in the release here. !image/png Click here for the original model card information. !img --- Thanks to Backyard.ai for the compute to train this. :) --- Llama-3.1-8B-Stheno-v3.4 This model has went through a multi-stage finetuning process. Prompting Format: Changes since previous Stheno Datasets: Personal Opinions: Below are some graphs and all for you to observe. --- Turn Distribution # 1 Turn is considered as 1 combined Human/GPT pair in a ShareGPT format. 4 Turns means 1 System Row + 8 Human/GPT rows in total. !Turn Token Count Histogram # Based on the Llama 3 Tokenizer !Turn --- Have a good one. Source Image: https://www.pixiv.net/en/artworks/91689070

transformersggufroleplayllama3sillytavernenbase_model:Sao10K/Llama-3.1-8B-Stheno-v3.4base_model:quantized:Sao10K/Llama-3.1-8B-Stheno-v3.4license:cc-by-nc-4.0region:usconversational

lewdiculous/llama-3.1-8b-stheno-v3.4-gguf-iq-imatrix visual

Downloads

3,045

Likes

Pipeline

—

Library

transformers

Visibility

Public

Access

Open

Repository Files & Downloads

11 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
Llama-3.1-8B-Stheno-v3.4-BF16.gguf	GGUF	BF16	14.97 GB	Download
Llama-3.1-8B-Stheno-v3.4-F16.gguf	GGUF	F16	14.97 GB	Download
Llama-3.1-8B-Stheno-v3.4-IQ3_M-imat.gguf	GGUF	IQ3_M	3.52 GB	Download
Llama-3.1-8B-Stheno-v3.4-IQ3_XXS-imat.gguf	GGUF	IQ3_XXS	3.05 GB	Download
Llama-3.1-8B-Stheno-v3.4-IQ4_XS-imat.gguf	GGUF	IQ4_XS	4.14 GB	Download
Llama-3.1-8B-Stheno-v3.4-Q4_K_M-imat.gguf	GGUF	Q4_K_M	4.58 GB	Download
Llama-3.1-8B-Stheno-v3.4-Q4_K_S-imat.gguf	GGUF	Q4_K_S	4.37 GB	Download
Llama-3.1-8B-Stheno-v3.4-Q5_K_M-imat.gguf	GGUF	Q5_K_M	5.34 GB	Download
Llama-3.1-8B-Stheno-v3.4-Q5_K_S-imat.gguf	GGUF	Q5_K_S	5.21 GB	Download
Llama-3.1-8B-Stheno-v3.4-Q6_K-imat.gguf	GGUF	Q6_K	6.14 GB	Download
Llama-3.1-8B-Stheno-v3.4-Q8_0-imat.gguf	GGUF	—	7.95 GB	Download

Model Details Live

Model Slug

lewdiculous/llama-3.1-8b-stheno-v3.4-gguf-iq-imatrix

Author

Lewdiculous

Pipeline Task

—

Library

transformers

Created

2024-08-21

Last Modified

2026-01-25

Gated

Private

HF SHA

ae530799fce8bcbaf68e25f4cbf572192ca428ca

License

cc-by-nc-4.0

Language

Base Model

Sao10K/Llama-3.1-8B-Stheno-v3.4

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "base_model": "Sao10K/Llama-3.1-8B-Stheno-v3.4",
    "quantized_by": "Lewdiculous",
    "library_name": "transformers",
    "license": "cc-by-nc-4.0",
    "inference": false,
    "language": [
      "en"
    ],
    "tags": [
      "roleplay",
      "llama3",
      "sillytavern"
    ],
    "frontmatter": {
      "base_model": "Sao10K/Llama-3.1-8B-Stheno-v3.4",
      "quantized_by": "Lewdiculous",
      "library_name": "transformers",
      "license": "cc-by-nc-4.0",
      "inference": "false",
      "language": [
        "en"
      ],
      "tags": [
        "roleplay",
        "llama3",
        "sillytavern"
      ]
    },
    "hero_image_url": "https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/GV63jjNPXvSG-BSOGuP5h.png",
    "summary": "Quants for **Sao10K/Llama-3.1-8B-Stheno-v3.4**. > [!NOTE] > **Q4_0 ARM/Mobile** quants here: Llama-3.1-8B-Stheno-v3.4-GGUF-ARM-Imatrix-Supplementary. I recommend checking their page for feedback and support. > [!IMPORTANT] > **Quantization process:**  > Imatrix data was generated from the FP16-GGUF and conversions directly from the BF16-GGUF.  > This hopefully avoids losses during conversion.  > To run this model, please use the **latest version of KoboldCpp**.  > If you noticed any issues let me know in the discussions. > [!NOTE] > **Presets:**  > Some compatible SillyTavern presets can be found **here (Virt's Roleplay Presets - v1.9)**.  > Check **discussions such as this one** and **this one** for other presets and samplers recommendations.  > Lower temperatures are recommended by the authors, so make sure to experiment.  > > **General usage with KoboldCpp:**  > For **8GB VRAM** GPUs, I recommend the **Q4_K_M-imat** (4.89 BPW) quant for up to 12288 context sizes without the use of --quantkv.  > Using --quantkv 1 (≈Q8) or even --quantkv 2 (≈Q4) can get you to 32K context sizes with the caveat of not being compatible with Context Shifting, only relevant if you can manage to fill up that much context.  > **Read more about it in the release here**. !image/png  Click here for the original model card information. !img --- Thanks to Backyard.ai for the compute to train this. :) --- Llama-3.1-8B-Stheno-v3.4 This model has went through a multi-stage finetuning process. `` ` Prompting Format: ` ` Changes since previous Stheno Datasets: ` ` Personal Opinions: ` ` Below are some graphs and all for you to observe. --- Turn Distribution # 1 Turn is considered as 1 combined Human/GPT pair in a ShareGPT format. 4 Turns means 1 System Row + 8 Human/GPT rows in total. !Turn Token Count Histogram # Based on the Llama 3 Tokenizer !Turn --- Have a good one. `` Source Image: https://www.pixiv.net/en/artworks/91689070",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nbase_model: Sao10K/Llama-3.1-8B-Stheno-v3.4\nquantized_by: Lewdiculous\nlibrary_name: transformers\nlicense: cc-by-nc-4.0\ninference: false\nlanguage:\n- en\ntags:\n- roleplay\n- llama3\n- sillytavern\n---\nQuants for [**Sao10K/Llama-3.1-8B-Stheno-v3.4**](https://huggingface.co/Sao10K/Llama-3.1-8B-Stheno-v3.4).\n\n> [!NOTE]\n> **Q4_0 ARM/Mobile** quants here: [Llama-3.1-8B-Stheno-v3.4-GGUF-ARM-Imatrix-Supplementary](https://huggingface.co/Aetherarchio/Llama-3.1-8B-Stheno-v3.4-GGUF-ARM-Imatrix-Supplementary/).\n\nI recommend checking their page for feedback and support.\n\n> [!IMPORTANT]\n> **Quantization process:** <br>\n> Imatrix data was generated from the FP16-GGUF and conversions directly from the BF16-GGUF. <br>\n> This hopefully avoids losses during conversion. <br>\n> To run this model, please use the [**latest version of KoboldCpp**](https://github.com/LostRuins/koboldcpp/releases/latest). <br>\n> If you noticed any issues let me know in the discussions.\n\n> [!NOTE]\n> **Presets:** <br>\n> Some compatible SillyTavern presets can be found [**here (Virt's Roleplay Presets - v1.9)**](https://huggingface.co/Virt-io/SillyTavern-Presets). <br>\n> Check [**discussions such as this one**](https://huggingface.co/Virt-io/SillyTavern-Presets/discussions/5#664d6fb87c563d4d95151baa) and [**this one**](https://www.reddit.com/r/SillyTavernAI/comments/1dff2tl/my_personal_llama3_stheno_presets/) for other presets and samplers recommendations. <br>\n> Lower temperatures are recommended by the authors, so make sure to experiment. <br>\n>\n> **General usage with KoboldCpp:** <br>\n> For **8GB VRAM** GPUs, I recommend the **Q4_K_M-imat** (4.89 BPW) quant for up to 12288 context sizes without the use of `--quantkv`. <br>\n> Using `--quantkv 1` (≈Q8) or even `--quantkv 2` (≈Q4) can get you to 32K context sizes with the caveat of not being compatible with Context Shifting, only relevant if you can manage to fill up that much context. <br>\n> [**Read more about it in the release here**](https://github.com/LostRuins/koboldcpp/releases/tag/v1.67).\n\n\n![image/png](https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/GV63jjNPXvSG-BSOGuP5h.png)\n\n<details>\n<summary>Click here for the original model card information.</summary>\n\n![img](https://huggingface.co/Sao10K/Llama-3.1-8B-Stheno-v3.4/resolve/main/meneno.jpg)\n\n---\n\nThanks to Backyard.ai for the compute to train this. :)\n\n---\n\nLlama-3.1-8B-Stheno-v3.4\n\nThis model has went through a multi-stage finetuning process.\n```\n- 1st, over a multi-turn Conversational-Instruct\n- 2nd, over a Creative Writing / Roleplay along with some Creative-based Instruct Datasets.\n- - Dataset consists of a mixture of Human and Claude Data.\n```\n\nPrompting Format:\n```\n- Use the L3 Instruct Formatting - Euryale 2.1 Preset Works Well\n- Temperature + min_p as per usual, I recommend 1.4 Temp + 0.2 min_p.\n- Has a different vibe to previous versions. Tinker around.\n```\n\nChanges since previous Stheno Datasets:\n```\n- Included Multi-turn Conversation-based Instruct Datasets to boost multi-turn coherency. # This is a seperate set, not the ones made by Kalomaze and Nopm, that are used in Magnum. They're completely different data.\n- Replaced Single-Turn Instruct with Better Prompts and Answers by Claude 3.5 Sonnet and Claude 3 Opus.\n- Removed c2 Samples -> Underway of re-filtering and masking to use with custom prefills. TBD\n- Included 55% more Roleplaying Examples based of [Gryphe's](https://huggingface.co/datasets/Gryphe/Sonnet3.5-Charcard-Roleplay) Charcard RP Sets. Further filtered and cleaned on.\n- Included 40% More Creative Writing Examples.\n- Included Datasets Targeting System Prompt Adherence.\n- Included Datasets targeting Reasoning / Spatial Awareness.\n- Filtered for the usual errors, slop and stuff at the end. Some may have slipped through, but I removed nearly all of it.\n```\n\nPersonal Opinions:\n```\n- Llama3.1 was more disappointing, in the Instruct Tune? It felt overbaked, atleast. Likely due to the DPO being done after their SFT Stage.\n- Tuning on L3.1 base did not give good results, unlike when I tested with Nemo base. unfortunate.\n- Still though, I think I did an okay job. It does feel a bit more distinctive.\n- It took a lot of tinkering, like a LOT to wrangle this.\n```\n\nBelow are some graphs and all for you to observe.\n\n---\n\n`Turn Distribution # 1 Turn is considered as 1 combined Human/GPT pair in a ShareGPT format. 4 Turns means 1 System Row + 8 Human/GPT rows in total.`\n\n![Turn](https://huggingface.co/Sao10K/Llama-3.1-8B-Stheno-v3.4/resolve/main/turns_distribution_bar_graph.png)\n\n`Token Count Histogram # Based on the Llama 3 Tokenizer`\n\n![Turn](https://huggingface.co/Sao10K/Llama-3.1-8B-Stheno-v3.4/resolve/main/token_count_histogram.png)\n\n---\n\nHave a good one.\n\n```\nSource Image: https://www.pixiv.net/en/artworks/91689070\n\n</details>",
    "related_quantizations": []
  },
  "tags": [
    "transformers",
    "gguf",
    "roleplay",
    "llama3",
    "sillytavern",
    "en",
    "base_model:Sao10K/Llama-3.1-8B-Stheno-v3.4",
    "base_model:quantized:Sao10K/Llama-3.1-8B-Stheno-v3.4",
    "license:cc-by-nc-4.0",
    "region:us",
    "conversational"
  ],
  "likes": 23,
  "downloads": 3045,
  "gated": false,
  "private": false,
  "last_modified": "2026-01-25T18:33:00.000Z",
  "created_at": "2024-08-21T02:48:00.000Z",
  "pipeline_tag": "",
  "library_name": "transformers"
}

Source payload excerpt (from Hugging Face API)

{
  "_id": "66c55560a22fa65e17079ceb",
  "id": "Lewdiculous/Llama-3.1-8B-Stheno-v3.4-GGUF-IQ-Imatrix",
  "modelId": "Lewdiculous/Llama-3.1-8B-Stheno-v3.4-GGUF-IQ-Imatrix",
  "sha": "ae530799fce8bcbaf68e25f4cbf572192ca428ca",
  "createdAt": "2024-08-21T02:48:00.000Z",
  "lastModified": "2026-01-25T18:33:00.000Z",
  "author": "Lewdiculous",
  "downloads": 3045,
  "likes": 23,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "transformers",
  "siblings_count": 15
}

lewdiculous/llama-3.1-8b-stheno-v3.4-gguf-iq-imatrix overview

Repository Files & Downloads

Model Details Live

Metadata Inspector

More models in this shard