lewdiculous/l3-8b-stheno-v3.3-32k-gguf-iq-imatrix (BF16 GGUF) is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. Use this page to pick a local model for guIDE or other runtimes. See related models in the same shard below.

Model Intelligence Sheet

lewdiculous/l3-8b-stheno-v3.3-32k-gguf-iq-imatrix overview

My GGUF-IQ-Imatrix quants for Sao10K/L3-8B-Stheno-v3.3-32K. Sao10K with Stheno yet again, now bigger and better than ever! I recommend checking his page for feedback and support.

Quantization process: Imatrix data was generated from the FP16-GGUF, and conversions were made directly from the BF16-GGUF. This is a bit more disk and compute intensive but hopefully avoids any losses during conversion. To run this model, please use the latest version of KoboldCpp. If you notice any issues, let me know in the discussions.

General usage: For 8GB VRAM GPUs, I recommend the Q4_K_M-imat (4.89 BPW) quant for context sizes up to 12288.

Presets: Some compatible SillyTavern presets can be found in Virt's Roleplay Presets. Check the discussions there for other recommendations and samplers.

Recommended read: "Which GGUF is right for me? (Opinionated)" by Artefact2, which includes a general chart of relative quant performance.

Personal support: I apologize for disrupting your experience. Eventually I may be able to use a dedicated server for this, but for now hopefully these quants are helpful. If you want to and are able to, you can spare some change on Ko-fi.

Author support: You can support the author at their own page.
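As a rough check on the 8GB VRAM recommendation, the sketch below adds the Q4_K_M file size to an FP16 KV cache at 12288 context. The architecture numbers (32 layers, 8 KV heads, head dimension 128) are the usual Llama 3 8B values and are assumed here, not stated on this page. Reading the listed 4.58 GB as GiB is also an assumption, and runtime compute buffers are ignored.

```python
# Rough VRAM estimate: model file plus FP16 KV cache.
# Assumed (not from this page): Llama 3 8B shape.
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_ELEM = 2  # FP16 cache entries

GIB = 1024 ** 3


def kv_cache_bytes(n_ctx: int) -> int:
    """Bytes needed for keys and values across all layers at n_ctx tokens."""
    # Factor 2 covers the separate key and value tensors.
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM
    return per_token * n_ctx


def estimate_gib(model_file_gib: float, n_ctx: int) -> float:
    """Model file size plus KV cache, in GiB (no runtime overhead)."""
    return model_file_gib + kv_cache_bytes(n_ctx) / GIB


if __name__ == "__main__":
    print(f"KV cache: {kv_cache_bytes(12288) / GIB:.2f} GiB")
    print(f"Total:    {estimate_gib(4.58, 12288):.2f} GiB")
```

Under these assumptions the cache comes to 1.5 GiB and the total to about 6.1 GiB, which leaves some headroom on an 8GB card for buffers the sketch does not model.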

transformers, gguf, roleplay, llama3, sillytavern, en, base_model:Sao10K/L3-8B-Stheno-v3.3-32K, base_model:quantized:Sao10K/L3-8B-Stheno-v3.3-32K, license:cc-by-nc-4.0, region:us, conversational

Downloads

984

Likes

39

Pipeline

—

Library

transformers

Visibility

Public

Access

Open

Repository Files & Downloads

12 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
L3-8B-Stheno-v3.3-32K-BF16.gguf	GGUF	BF16	14.97 GB	Download
L3-8B-Stheno-v3.3-32K-F16.gguf	GGUF	F16	14.97 GB	Download
L3-8B-Stheno-v3.3-32K-IQ3_M-imat.gguf	GGUF	IQ3_M	3.52 GB	Download
L3-8B-Stheno-v3.3-32K-IQ3_S-imat.gguf	GGUF	IQ3_S	3.43 GB	Download
L3-8B-Stheno-v3.3-32K-IQ3_XXS-imat.gguf	GGUF	IQ3_XXS	3.05 GB	Download
L3-8B-Stheno-v3.3-32K-IQ4_XS-imat.gguf	GGUF	IQ4_XS	4.14 GB	Download
L3-8B-Stheno-v3.3-32K-Q4_K_M-imat.gguf	GGUF	Q4_K_M	4.58 GB	Download
L3-8B-Stheno-v3.3-32K-Q4_K_S-imat.gguf	GGUF	Q4_K_S	4.37 GB	Download
L3-8B-Stheno-v3.3-32K-Q5_K_M-imat.gguf	GGUF	Q5_K_M	5.34 GB	Download
L3-8B-Stheno-v3.3-32K-Q5_K_S-imat.gguf	GGUF	Q5_K_S	5.21 GB	Download
L3-8B-Stheno-v3.3-32K-Q6_K-imat.gguf	GGUF	Q6_K	6.14 GB	Download
L3-8B-Stheno-v3.3-32K-Q8_0-imat.gguf	GGUF	Q8_0	7.95 GB	Download
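One way to use the table is to pick the largest file that still leaves room for context and runtime buffers. The sketch below does that with the sizes copied from the table. The function name and the 3 GB default reserve are hypothetical choices for illustration, not a rule from the quant author.

```python
from typing import Optional

# File sizes in GB, copied from the table above.
QUANTS = {
    "BF16": 14.97,
    "F16": 14.97,
    "IQ3_M": 3.52,
    "IQ3_S": 3.43,
    "IQ3_XXS": 3.05,
    "IQ4_XS": 4.14,
    "Q4_K_M": 4.58,
    "Q4_K_S": 4.37,
    "Q5_K_M": 5.34,
    "Q5_K_S": 5.21,
    "Q6_K": 6.14,
    "Q8_0": 7.95,
}


def pick_quant(budget_gb: float, reserve_gb: float = 3.0) -> Optional[str]:
    """Return the largest quant whose file fits in budget_gb minus reserve_gb.

    reserve_gb is an assumed margin for KV cache and compute buffers.
    Returns None when nothing fits.
    """
    limit = budget_gb - reserve_gb
    fitting = {name: size for name, size in QUANTS.items() if size <= limit}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    print(pick_quant(8.0))
```

With the assumed 3 GB reserve, an 8 GB budget selects Q4_K_M, which lines up with the card's recommendation for 8GB VRAM GPUs. A shorter context would justify a smaller reserve and a larger quant.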

Model Details Live

Model Slug

lewdiculous/l3-8b-stheno-v3.3-32k-gguf-iq-imatrix

Author

Lewdiculous

Pipeline Task

—

Library

transformers

Created

2024-06-23

Last Modified

2024-09-03

Gated

No

Private

No

HF SHA

7771e693e0f68fc0c1e7e0df44a8d4bade6b2d99

License

cc-by-nc-4.0

Language

en

Base Model

Sao10K/L3-8B-Stheno-v3.3-32K

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "base_model": "Sao10K/L3-8B-Stheno-v3.3-32K",
    "quantized_by": "Lewdiculous",
    "library_name": "transformers",
    "license": "cc-by-nc-4.0",
    "inference": false,
    "language": [
      "en"
    ],
    "tags": [
      "roleplay",
      "llama3",
      "sillytavern"
    ],
    "frontmatter": {
      "base_model": "Sao10K/L3-8B-Stheno-v3.3-32K",
      "quantized_by": "Lewdiculous",
      "library_name": "transformers",
      "license": "cc-by-nc-4.0",
      "inference": "false",
      "language": [
        "en"
      ],
      "tags": [
        "roleplay",
        "llama3",
        "sillytavern"
      ]
    },
    "hero_image_url": "https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/fScWdHIPix5IzNJ8yswCB.webp",
    "summary": "My GGUF-IQ-Imatrix quants for **Sao10K/L3-8B-Stheno-v3.3-32K**. **Sao10K** with Stheno **yet** again, now bigger and better than ever!  I recommend checking his page for feedback and support. > [!IMPORTANT] > **Quantization process:**  > Imatrix data was generated from the FP16-GGUF and conversions directly from the BF16-GGUF.  > This is a bit more disk and compute intensive but hopefully avoids any losses during conversion.  > To run this model, please use the **latest version of KoboldCpp**.  > If you noticed any issues let me know in the discussions. > [!NOTE] > **General usage:**  > For **8GB VRAM** GPUs, I recommend the **Q4_K_M-imat** (4.89 BPW) quant for up to 12288 context sizes.  > > **Presets:**  > Some compatible SillyTavern presets can be found **here (Virt's Roleplay Presets)**.  > Check **discussions such as this one** for other recommendations and samplers.  ⇲ Click here to expand/hide information – General chart with relative quant parformances. > [!NOTE] > **Recommended read:**  > > **\"Which GGUF is right for me? (Opinionated)\" by Artefact2** > > *Click the image to view full size.* > !\"Which GGUF is right for me? (Opinionated)\" by Artefact2 - Firs Graph  > [!TIP] > **Personal-support:**  > I apologize for disrupting your experience.  > Eventually I may be able to use a dedicated server for this, but for now hopefully these quants are helpful.  > If you **want** and you are **able to**...  > You can **spare some change over here (Ko-fi)**.  > > **Author-support:**  > You can support the author **at their own page**. !image/png  Original model card information.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nbase_model: Sao10K/L3-8B-Stheno-v3.3-32K\nquantized_by: Lewdiculous\nlibrary_name: transformers\nlicense: cc-by-nc-4.0\ninference: false\nlanguage:\n- en\ntags:\n- roleplay\n- llama3\n- sillytavern\n---\n\n# #roleplay #sillytavern #llama3\n\nMy GGUF-IQ-Imatrix quants for [**Sao10K/L3-8B-Stheno-v3.3-32K**](https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K).\n\n**Sao10K** with Stheno **yet** again, now bigger and better than ever! <br>\nI recommend checking his page for feedback and support.\n\n> [!IMPORTANT]\n> **Quantization process:** <br>\n> Imatrix data was generated from the FP16-GGUF and conversions directly from the BF16-GGUF. <br>\n> This is a bit more disk and compute intensive but hopefully avoids any losses during conversion. <br>\n> To run this model, please use the [**latest version of KoboldCpp**](https://github.com/LostRuins/koboldcpp/releases/latest). <br>\n> If you noticed any issues let me know in the discussions.\n\n> [!NOTE]\n> **General usage:** <br>\n> For **8GB VRAM** GPUs, I recommend the **Q4_K_M-imat** (4.89 BPW) quant for up to 12288 context sizes. <br>\n>\n> **Presets:** <br>\n> Some compatible SillyTavern presets can be found [**here (Virt's Roleplay Presets)**](https://huggingface.co/Virt-io/SillyTavern-Presets). <br>\n> Check [**discussions such as this one**](https://huggingface.co/Virt-io/SillyTavern-Presets/discussions/5#664d6fb87c563d4d95151baa) for other recommendations and samplers.\n\n<details>\n<summary>⇲ Click here to expand/hide information – General chart with relative quant parformances.</summary> \n\n> [!NOTE]\n> **Recommended read:** <br>\n> \n> [**\"Which GGUF is right for me? (Opinionated)\" by Artefact2**](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9)\n> \n> *Click the image to view full size.*\n> ![\"Which GGUF is right for me? (Opinionated)\" by Artefact2 - Firs Graph](https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/fScWdHIPix5IzNJ8yswCB.webp)\n\n</details>\n\n> [!TIP]\n> **Personal-support:** <br>\n> I apologize for disrupting your experience. <br>\n> Eventually I may be able to use a dedicated server for this, but for now hopefully these quants are helpful. <br>\n> If you **want** and you are **able to**... <br>\n> You can [**spare some change over here (Ko-fi)**](https://ko-fi.com/Lewdiculous). <br>\n>\n> **Author-support:** <br>\n> You can support the author [**at their own page**](https://ko-fi.com/sao10k).\n\n![image/png](https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/1wb5-yFyvxWQSWBMlB36x.png)\n\n<details>\n<summary>Original model card information.</summary>\n\n## **Original card:**\n\n\nTrained with compute from [Backyard.ai](https://backyard.ai/) | Thanks to them and @dynafire for helping me out.\n\n---\n\nTraining Details:\n<br>Trained at 8K Context -> Expanded to 32K Context with PoSE training.\n\nDataset Modifications:\n<br>\\- Further Cleaned up Roleplaying Samples -> Quality Check\n<br>\\- Removed Low Quality Samples from Manual Check -> Increased Baseline Quality Floor\n<br>\\- More Creative Writing Samples -> 2x Samples\n<br>\\- Remade and Refined Detailed Instruct Data\n\nNotes:\n<br>\\- Training run is much less aggressive than previous Stheno versions.\n<br>\\- This model works when tested in bf16 with the same configs as within the file.\n<br>\\- I do not know the effects quantisation has on it.\n<br>\\- Roleplays pretty well. Feels nice in my opinion.\n<br>\\- It has some issues on long context understanding and reasoning. Much better vs rope scaling normally though, so that is a plus.\n<br>\\- Reminder, this isn't a native 32K model. It has it's issues, but it's coherent and working well.\n\nSanity Check // Needle in a Haystack Results:\n<br>\\- This is not as complex as RULER or NIAN, but it's a basic evaluator. Some improper train examples had Haystack scores ranging from Red to Orange for most of the extended contexts.\n![Results](https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K/resolve/main/haystack.png)\n\nWandb Run:\n![Wandb](https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K/resolve/main/wandb.png)\n\n---\n\nRelevant Axolotl Configurations:\n<br>-> Taken from [winglian/Llama-3-8b-64k-PoSE](https://huggingface.co/winglian/Llama-3-8b-64k-PoSE)\n<br>\\- I tried to find my own configs, hours of tinkering but the one he used worked best, so I stuck to it.\n<br>\\- 2M Rope Theta had the best loss results during training compared to other values.\n<br>\\- Leaving it at 500K rope wasn't that much worse, but 4M and 8M Theta made the grad_norm values worsen even if loss drops fast.\n<br>\\- Mixing in Pretraining Data was a PITA. Made it a lot worse with formatting.\n<br>\\- Pretraining / Noise made it worse at Haystack too? It wasn't all Green, Mainly Oranges.\n<br>\\- Improper / Bad Rope Theta shows in Grad_Norm exploding to thousands. It'll drop to low values alright, but it's a scary fast drop even with gradient clipping.\n\n```\nsequence_len: 8192\nuse_pose: true\npose_max_context_len: 32768\n\noverrides_of_model_config:\n  rope_theta: 2000000.0\n  max_position_embeddings: 32768\n\n  # peft_use_dora: true\nadapter: lora\npeft_use_rslora: true\nlora_model_dir:\nlora_r: 256\nlora_alpha: 256\nlora_dropout: 0.1\nlora_target_linear: true\nlora_target_modules:\n  - gate_proj\n  - down_proj\n  - up_proj\n  - q_proj\n  - v_proj\n  - k_proj\n  - o_proj\n\nwarmup_steps: 80\ngradient_accumulation_steps: 6\nmicro_batch_size: 1\nnum_epochs: 2\noptimizer: adamw_bnb_8bit\nlr_scheduler: cosine_with_min_lr\nlearning_rate: 0.00004\nlr_scheduler_kwargs:\n    min_lr: 0.000004\n```\n\n</details>",
    "related_quantizations": []
  },
  "tags": [
    "transformers",
    "gguf",
    "roleplay",
    "llama3",
    "sillytavern",
    "en",
    "base_model:Sao10K/L3-8B-Stheno-v3.3-32K",
    "base_model:quantized:Sao10K/L3-8B-Stheno-v3.3-32K",
    "license:cc-by-nc-4.0",
    "region:us",
    "conversational"
  ],
  "likes": 39,
  "downloads": 984,
  "gated": false,
  "private": false,
  "last_modified": "2024-09-03T04:53:49.000Z",
  "created_at": "2024-06-23T20:13:10.000Z",
  "pipeline_tag": "",
  "library_name": "transformers"
}
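The dump above stores inference as a boolean in card_data but as the string "false" in frontmatter. A minimal sketch of coercing such values when reading the record follows. The helper name is hypothetical and is not part of any GraySoft or Hugging Face API, and the inline record is a trimmed stand-in for the full JSON.

```python
import json


def coerce_bool(value):
    """Map "true"/"false" strings to booleans; return other values unchanged."""
    if isinstance(value, str) and value.strip().lower() in ("true", "false"):
        return value.strip().lower() == "true"
    return value


# Trimmed stand-in for the metadata record shown above.
record = json.loads(
    '{"inference": false, "frontmatter": {"inference": "false", "license": "cc-by-nc-4.0"}}'
)

# Normalize every frontmatter value so it compares cleanly with card_data.
normalized = {key: coerce_bool(val) for key, val in record["frontmatter"].items()}

if __name__ == "__main__":
    print(normalized["inference"] == record["inference"])
```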

Source payload excerpt (from Hugging Face API)

{
  "_id": "667881d6a845e4470f89918c",
  "id": "Lewdiculous/L3-8B-Stheno-v3.3-32K-GGUF-IQ-Imatrix",
  "modelId": "Lewdiculous/L3-8B-Stheno-v3.3-32K-GGUF-IQ-Imatrix",
  "sha": "7771e693e0f68fc0c1e7e0df44a8d4bade6b2d99",
  "createdAt": "2024-06-23T20:13:10.000Z",
  "lastModified": "2024-09-03T04:53:49.000Z",
  "author": "Lewdiculous",
  "downloads": 984,
  "likes": 39,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "transformers",
  "siblings_count": 16
}
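The repository id in this payload and a filename from the file table are enough to build a direct download link. The sketch below assumes the usual Hugging Face resolve URL shape (the same /resolve/main/ form appears in the image links of the original card) and assumes the files sit at the repository root; the function name is hypothetical.

```python
from urllib.parse import quote

# Repository id as reported in the payload above.
REPO_ID = "Lewdiculous/L3-8B-Stheno-v3.3-32K-GGUF-IQ-Imatrix"


def download_url(filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for one file in the repository.

    Assumes the file lives at the repository root of the given revision.
    """
    return (
        f"https://huggingface.co/{REPO_ID}/resolve/"
        f"{quote(revision)}/{quote(filename)}"
    )


if __name__ == "__main__":
    print(download_url("L3-8B-Stheno-v3.3-32K-Q4_K_M-imat.gguf"))
```

Passing the sha from the payload as revision should pin the link to that exact commit instead of whatever main points to later.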

More models in this shard