lewdiculous/l3-8b-stheno-v3.3-32k-gguf-iq-imatrix is indexed on GraySoft with repository links, GGUF quant files (BF16 and quantized), and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes.
lewdiculous/l3-8b-stheno-v3.3-32k-gguf-iq-imatrix overview
My GGUF-IQ-Imatrix quants for Sao10K/L3-8B-Stheno-v3.3-32K. Sao10K with Stheno yet again, now bigger and better than ever! I recommend checking his page for feedback and support.

Quantization process: Imatrix data was generated from the FP16-GGUF, and conversions were made directly from the BF16-GGUF. This is a bit more disk and compute intensive but hopefully avoids any losses during conversion. To run this model, please use the latest version of KoboldCpp. If you notice any issues, let me know in the discussions.

General usage: For 8GB VRAM GPUs, I recommend the Q4_K_M-imat (4.89 BPW) quant for context sizes up to 12288.

Presets: Some compatible SillyTavern presets can be found in Virt's Roleplay Presets (Virt-io/SillyTavern-Presets on Hugging Face). Check the discussions in that repository for other recommendations and samplers.

Recommended read: "Which GGUF is right for me? (Opinionated)" by Artefact2, which includes a general chart of relative quant performance.

Personal-support: I apologize for disrupting your experience. Eventually I may be able to use a dedicated server for this, but for now hopefully these quants are helpful. If you want to and are able to, you can spare some change on Ko-fi.

Author-support: You can support the author at their own page.
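The 8GB VRAM guidance can be sanity-checked with a rough estimate. The sketch below adds the Q4_K_M file size from the file table to an FP16 KV cache for 12288 tokens. The layer count, KV-head count, and head dimension are assumptions taken from the published Llama 3 8B configuration, not from this page, and runtime overhead such as compute buffers is not counted.

```python
# Rough VRAM estimate: model weights plus an FP16 KV cache.
# Architecture values are ASSUMED from the Llama 3 8B config, not from this page.
N_LAYERS = 32        # assumed number of transformer layers
N_KV_HEADS = 8       # assumed grouped-query KV heads
HEAD_DIM = 128       # assumed per-head dimension
BYTES_PER_VALUE = 2  # FP16 cache entries


def kv_cache_bytes(context_tokens: int) -> int:
    """Bytes needed to cache keys and values for `context_tokens` tokens."""
    # Factor of 2 covers one key and one value vector per layer and KV head.
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * context_tokens


def estimate_total_gb(model_file_gb: float, context_tokens: int) -> float:
    """Model file size plus KV cache in decimal gigabytes (overhead excluded)."""
    return model_file_gb + kv_cache_bytes(context_tokens) / 1e9


if __name__ == "__main__":
    # 4.58 GB is the Q4_K_M-imat size listed in the file table.
    total = estimate_total_gb(4.58, 12288)
    print(f"Estimated footprint: {total:.2f} GB")
```

Under these assumptions the estimate comes to roughly 6.2 GB, which leaves some headroom on an 8GB card. Actual usage depends on the runtime, cache precision, and how many layers are offloaded.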
Repository Files & Downloads
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| L3-8B-Stheno-v3.3-32K-BF16.gguf | GGUF | BF16 | 14.97 GB | Download |
| L3-8B-Stheno-v3.3-32K-F16.gguf | GGUF | F16 | 14.97 GB | Download |
| L3-8B-Stheno-v3.3-32K-IQ3_M-imat.gguf | GGUF | IQ3_M | 3.52 GB | Download |
| L3-8B-Stheno-v3.3-32K-IQ3_S-imat.gguf | GGUF | IQ3_S | 3.43 GB | Download |
| L3-8B-Stheno-v3.3-32K-IQ3_XXS-imat.gguf | GGUF | IQ3_XXS | 3.05 GB | Download |
| L3-8B-Stheno-v3.3-32K-IQ4_XS-imat.gguf | GGUF | IQ4_XS | 4.14 GB | Download |
| L3-8B-Stheno-v3.3-32K-Q4_K_M-imat.gguf | GGUF | Q4_K_M | 4.58 GB | Download |
| L3-8B-Stheno-v3.3-32K-Q4_K_S-imat.gguf | GGUF | Q4_K_S | 4.37 GB | Download |
| L3-8B-Stheno-v3.3-32K-Q5_K_M-imat.gguf | GGUF | Q5_K_M | 5.34 GB | Download |
| L3-8B-Stheno-v3.3-32K-Q5_K_S-imat.gguf | GGUF | Q5_K_S | 5.21 GB | Download |
| L3-8B-Stheno-v3.3-32K-Q6_K-imat.gguf | GGUF | Q6_K | 6.14 GB | Download |
| L3-8B-Stheno-v3.3-32K-Q8_0-imat.gguf | GGUF | Q8_0 | 7.95 GB | Download |
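As a minimal sketch of how the table might be used, the snippet below picks the largest imatrix quant that fits a given size budget and builds the matching file name. The sizes are copied from the table, the repository id is taken from the Hugging Face payload on this page, and the helper names and the budget value are hypothetical. The optional `download` helper assumes the `huggingface_hub` package is installed and uses its `hf_hub_download` function.

```python
from typing import Optional

# Repository id as it appears in the Hugging Face payload on this page.
REPO_ID = "Lewdiculous/L3-8B-Stheno-v3.3-32K-GGUF-IQ-Imatrix"

# Imatrix quant sizes in GB, copied from the file table (BF16/F16 omitted).
QUANT_SIZES_GB = {
    "IQ3_XXS": 3.05, "IQ3_S": 3.43, "IQ3_M": 3.52, "IQ4_XS": 4.14,
    "Q4_K_S": 4.37, "Q4_K_M": 4.58, "Q5_K_S": 5.21, "Q5_K_M": 5.34,
    "Q6_K": 6.14, "Q8_0": 7.95,
}


def pick_quant(budget_gb: float) -> Optional[str]:
    """Return the largest listed quant whose file fits in `budget_gb`, or None."""
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= budget_gb}
    return max(fitting, key=fitting.get) if fitting else None


def filename_for(quant: str) -> str:
    """Build the file name following the pattern used in the table."""
    return f"L3-8B-Stheno-v3.3-32K-{quant}-imat.gguf"


def download(quant: str) -> str:
    """Download one quant file and return its local path (needs huggingface_hub)."""
    from huggingface_hub import hf_hub_download  # imported lazily; optional dependency
    return hf_hub_download(repo_id=REPO_ID, filename=filename_for(quant))


if __name__ == "__main__":
    choice = pick_quant(5.0)  # hypothetical 5 GB budget for the weights
    print(choice, filename_for(choice))
```

File size is only a proxy for memory use; context length and runtime overhead add to it, so a budget somewhat below the available VRAM is a safer starting point.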
Model Details Live
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"base_model": "Sao10K/L3-8B-Stheno-v3.3-32K",
"quantized_by": "Lewdiculous",
"library_name": "transformers",
"license": "cc-by-nc-4.0",
"inference": false,
"language": [
"en"
],
"tags": [
"roleplay",
"llama3",
"sillytavern"
],
"frontmatter": {
"base_model": "Sao10K/L3-8B-Stheno-v3.3-32K",
"quantized_by": "Lewdiculous",
"library_name": "transformers",
"license": "cc-by-nc-4.0",
"inference": "false",
"language": [
"en"
],
"tags": [
"roleplay",
"llama3",
"sillytavern"
]
},
"hero_image_url": "https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/fScWdHIPix5IzNJ8yswCB.webp",
"summary": "My GGUF-IQ-Imatrix quants for **Sao10K/L3-8B-Stheno-v3.3-32K**. **Sao10K** with Stheno **yet** again, now bigger and better than ever! I recommend checking his page for feedback and support. > [!IMPORTANT] > **Quantization process:** > Imatrix data was generated from the FP16-GGUF and conversions directly from the BF16-GGUF. > This is a bit more disk and compute intensive but hopefully avoids any losses during conversion. > To run this model, please use the **latest version of KoboldCpp**. > If you noticed any issues let me know in the discussions. > [!NOTE] > **General usage:** > For **8GB VRAM** GPUs, I recommend the **Q4_K_M-imat** (4.89 BPW) quant for up to 12288 context sizes. > > **Presets:** > Some compatible SillyTavern presets can be found **here (Virt's Roleplay Presets)**. > Check **discussions such as this one** for other recommendations and samplers. ⇲ Click here to expand/hide information – General chart with relative quant parformances. > [!NOTE] > **Recommended read:** > > **\"Which GGUF is right for me? (Opinionated)\" by Artefact2** > > *Click the image to view full size.* > !\"Which GGUF is right for me? (Opinionated)\" by Artefact2 - Firs Graph > [!TIP] > **Personal-support:** > I apologize for disrupting your experience. > Eventually I may be able to use a dedicated server for this, but for now hopefully these quants are helpful. > If you **want** and you are **able to**... > You can **spare some change over here (Ko-fi)**. > > **Author-support:** > You can support the author **at their own page**. !image/png Original model card information.",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nbase_model: Sao10K/L3-8B-Stheno-v3.3-32K\nquantized_by: Lewdiculous\nlibrary_name: transformers\nlicense: cc-by-nc-4.0\ninference: false\nlanguage:\n- en\ntags:\n- roleplay\n- llama3\n- sillytavern\n---\n\n# #roleplay #sillytavern #llama3\n\nMy GGUF-IQ-Imatrix quants for [**Sao10K/L3-8B-Stheno-v3.3-32K**](https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K).\n\n**Sao10K** with Stheno **yet** again, now bigger and better than ever! <br>\nI recommend checking his page for feedback and support.\n\n> [!IMPORTANT]\n> **Quantization process:** <br>\n> Imatrix data was generated from the FP16-GGUF and conversions directly from the BF16-GGUF. <br>\n> This is a bit more disk and compute intensive but hopefully avoids any losses during conversion. <br>\n> To run this model, please use the [**latest version of KoboldCpp**](https://github.com/LostRuins/koboldcpp/releases/latest). <br>\n> If you noticed any issues let me know in the discussions.\n\n> [!NOTE]\n> **General usage:** <br>\n> For **8GB VRAM** GPUs, I recommend the **Q4_K_M-imat** (4.89 BPW) quant for up to 12288 context sizes. <br>\n>\n> **Presets:** <br>\n> Some compatible SillyTavern presets can be found [**here (Virt's Roleplay Presets)**](https://huggingface.co/Virt-io/SillyTavern-Presets). <br>\n> Check [**discussions such as this one**](https://huggingface.co/Virt-io/SillyTavern-Presets/discussions/5#664d6fb87c563d4d95151baa) for other recommendations and samplers.\n\n<details>\n<summary>⇲ Click here to expand/hide information – General chart with relative quant parformances.</summary> \n\n> [!NOTE]\n> **Recommended read:** <br>\n> \n> [**\"Which GGUF is right for me? (Opinionated)\" by Artefact2**](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9)\n> \n> *Click the image to view full size.*\n> \n\n</details>\n\n> [!TIP]\n> **Personal-support:** <br>\n> I apologize for disrupting your experience. 
<br>\n> Eventually I may be able to use a dedicated server for this, but for now hopefully these quants are helpful. <br>\n> If you **want** and you are **able to**... <br>\n> You can [**spare some change over here (Ko-fi)**](https://ko-fi.com/Lewdiculous). <br>\n>\n> **Author-support:** <br>\n> You can support the author [**at their own page**](https://ko-fi.com/sao10k).\n\n\n\n<details>\n<summary>Original model card information.</summary>\n\n## **Original card:**\n\n\nTrained with compute from [Backyard.ai](https://backyard.ai/) | Thanks to them and @dynafire for helping me out.\n\n---\n\nTraining Details:\n<br>Trained at 8K Context -> Expanded to 32K Context with PoSE training.\n\nDataset Modifications:\n<br>\\- Further Cleaned up Roleplaying Samples -> Quality Check\n<br>\\- Removed Low Quality Samples from Manual Check -> Increased Baseline Quality Floor\n<br>\\- More Creative Writing Samples -> 2x Samples\n<br>\\- Remade and Refined Detailed Instruct Data\n\nNotes:\n<br>\\- Training run is much less aggressive than previous Stheno versions.\n<br>\\- This model works when tested in bf16 with the same configs as within the file.\n<br>\\- I do not know the effects quantisation has on it.\n<br>\\- Roleplays pretty well. Feels nice in my opinion.\n<br>\\- It has some issues on long context understanding and reasoning. Much better vs rope scaling normally though, so that is a plus.\n<br>\\- Reminder, this isn't a native 32K model. It has it's issues, but it's coherent and working well.\n\nSanity Check // Needle in a Haystack Results:\n<br>\\- This is not as complex as RULER or NIAN, but it's a basic evaluator. 
Some improper train examples had Haystack scores ranging from Red to Orange for most of the extended contexts.\n\n\nWandb Run:\n\n\n---\n\nRelevant Axolotl Configurations:\n<br>-> Taken from [winglian/Llama-3-8b-64k-PoSE](https://huggingface.co/winglian/Llama-3-8b-64k-PoSE)\n<br>\\- I tried to find my own configs, hours of tinkering but the one he used worked best, so I stuck to it.\n<br>\\- 2M Rope Theta had the best loss results during training compared to other values.\n<br>\\- Leaving it at 500K rope wasn't that much worse, but 4M and 8M Theta made the grad_norm values worsen even if loss drops fast.\n<br>\\- Mixing in Pretraining Data was a PITA. Made it a lot worse with formatting.\n<br>\\- Pretraining / Noise made it worse at Haystack too? It wasn't all Green, Mainly Oranges.\n<br>\\- Improper / Bad Rope Theta shows in Grad_Norm exploding to thousands. It'll drop to low values alright, but it's a scary fast drop even with gradient clipping.\n\n```\nsequence_len: 8192\nuse_pose: true\npose_max_context_len: 32768\n\noverrides_of_model_config:\n rope_theta: 2000000.0\n max_position_embeddings: 32768\n\n # peft_use_dora: true\nadapter: lora\npeft_use_rslora: true\nlora_model_dir:\nlora_r: 256\nlora_alpha: 256\nlora_dropout: 0.1\nlora_target_linear: true\nlora_target_modules:\n - gate_proj\n - down_proj\n - up_proj\n - q_proj\n - v_proj\n - k_proj\n - o_proj\n\nwarmup_steps: 80\ngradient_accumulation_steps: 6\nmicro_batch_size: 1\nnum_epochs: 2\noptimizer: adamw_bnb_8bit\nlr_scheduler: cosine_with_min_lr\nlearning_rate: 0.00004\nlr_scheduler_kwargs:\n min_lr: 0.000004\n```\n\n</details>",
"related_quantizations": []
},
"tags": [
"transformers",
"gguf",
"roleplay",
"llama3",
"sillytavern",
"en",
"base_model:Sao10K/L3-8B-Stheno-v3.3-32K",
"base_model:quantized:Sao10K/L3-8B-Stheno-v3.3-32K",
"license:cc-by-nc-4.0",
"region:us",
"conversational"
],
"likes": 39,
"downloads": 984,
"gated": false,
"private": false,
"last_modified": "2024-09-03T04:53:49.000Z",
"created_at": "2024-06-23T20:13:10.000Z",
"pipeline_tag": "",
"library_name": "transformers"
}
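The Axolotl settings quoted in `readme_markdown` enable rank-stabilized LoRA (`peft_use_rslora: true`) with `lora_r: 256` and `lora_alpha: 256`. As a rough illustration, the sketch below compares the adapter scaling factor under standard LoRA (alpha / r) and rank-stabilized LoRA (alpha / sqrt(r)), and computes the sequences per optimizer step on one device from `micro_batch_size` and `gradient_accumulation_steps`. The function names are hypothetical, and the card does not state how many GPUs were used, so the batch figure is per device only.

```python
import math


def lora_scaling(alpha: float, r: int, use_rslora: bool) -> float:
    """Adapter scaling factor: alpha / sqrt(r) for rsLoRA, alpha / r otherwise."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r


def sequences_per_step(micro_batch_size: int, grad_accum_steps: int) -> int:
    """Sequences contributing to one optimizer step on a single device."""
    return micro_batch_size * grad_accum_steps


if __name__ == "__main__":
    # Values taken from the quoted config: lora_r 256, lora_alpha 256.
    print("standard LoRA scaling:", lora_scaling(256, 256, use_rslora=False))
    print("rsLoRA scaling:", lora_scaling(256, 256, use_rslora=True))
    # micro_batch_size 1, gradient_accumulation_steps 6.
    print("sequences per step per device:", sequences_per_step(1, 6))
```

With rank and alpha both at 256, the rank-stabilized variant scales the adapter update by 16 rather than 1, which is the practical effect of that flag at this rank.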
Source payload excerpt (from Hugging Face API)
{
"_id": "667881d6a845e4470f89918c",
"id": "Lewdiculous/L3-8B-Stheno-v3.3-32K-GGUF-IQ-Imatrix",
"modelId": "Lewdiculous/L3-8B-Stheno-v3.3-32K-GGUF-IQ-Imatrix",
"sha": "7771e693e0f68fc0c1e7e0df44a8d4bade6b2d99",
"createdAt": "2024-06-23T20:13:10.000Z",
"lastModified": "2024-09-03T04:53:49.000Z",
"author": "Lewdiculous",
"downloads": 984,
"likes": 39,
"gated": false,
"private": false,
"pipeline_tag": "",
"library_name": "transformers",
"siblings_count": 16
}
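The payload excerpt uses ISO 8601 timestamps with a trailing `Z`. The sketch below shows one way to parse them with the Python standard library and compute how long the repository was under active revision. The embedded JSON is a subset of the excerpt, and the helper names are hypothetical.

```python
import json
from datetime import datetime

# Subset of the payload excerpt shown on this page.
PAYLOAD = json.loads("""
{
  "id": "Lewdiculous/L3-8B-Stheno-v3.3-32K-GGUF-IQ-Imatrix",
  "createdAt": "2024-06-23T20:13:10.000Z",
  "lastModified": "2024-09-03T04:53:49.000Z",
  "downloads": 984,
  "likes": 39
}
""")


def parse_hf_timestamp(value: str) -> datetime:
    """Parse a timestamp such as '2024-06-23T20:13:10.000Z' as timezone-aware UTC."""
    # Older Python versions do not accept a bare 'Z', so swap it for an explicit offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def days_between_create_and_update(payload: dict) -> int:
    """Whole days between repository creation and the last modification."""
    created = parse_hf_timestamp(payload["createdAt"])
    modified = parse_hf_timestamp(payload["lastModified"])
    return (modified - created).days


if __name__ == "__main__":
    print(PAYLOAD["id"], days_between_create_and_update(PAYLOAD), "days")
```

The same parsing applies to the `created_at` and `last_modified` fields in the normalized metadata, which carry identical values under different key names.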