Model Intelligence Sheet
RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf overview
Model page for RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf, GGUF quantizations of kz919/sliding_llama3_8b_no_finetune
Downloads: 107
Likes: 0
Pipeline: not set
Library: not set
Visibility: Public
Access: Open
Repository Files & Downloads
22 GGUF files detected (the Hub API reports 24 repository files in total)
Direct downloads for each GGUF file
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| sliding_llama3_8b_no_finetune.IQ3_M.gguf | GGUF | IQ3_M | 3.52 GB | Download |
| sliding_llama3_8b_no_finetune.IQ3_S.gguf | GGUF | IQ3_S | 3.43 GB | Download |
| sliding_llama3_8b_no_finetune.IQ3_XS.gguf | GGUF | IQ3_XS | 3.28 GB | Download |
| sliding_llama3_8b_no_finetune.IQ4_NL.gguf | GGUF | IQ4_NL | 4.38 GB | Download |
| sliding_llama3_8b_no_finetune.IQ4_XS.gguf | GGUF | IQ4_XS | 4.18 GB | Download |
| sliding_llama3_8b_no_finetune.Q2_K.gguf | GGUF | Q2_K | 2.96 GB | Download |
| sliding_llama3_8b_no_finetune.Q3_K.gguf | GGUF | Q3_K | 3.74 GB | Download |
| sliding_llama3_8b_no_finetune.Q3_K_L.gguf | GGUF | Q3_K_L | 4.03 GB | Download |
| sliding_llama3_8b_no_finetune.Q3_K_M.gguf | GGUF | Q3_K_M | 3.74 GB | Download |
| sliding_llama3_8b_no_finetune.Q3_K_S.gguf | GGUF | Q3_K_S | 3.41 GB | Download |
| sliding_llama3_8b_no_finetune.Q4_0.gguf | GGUF | Q4_0 | 4.34 GB | Download |
| sliding_llama3_8b_no_finetune.Q4_1.gguf | GGUF | Q4_1 | 4.78 GB | Download |
| sliding_llama3_8b_no_finetune.Q4_K.gguf | GGUF | Q4_K | 4.58 GB | Download |
| sliding_llama3_8b_no_finetune.Q4_K_M.gguf | GGUF | Q4_K_M | 4.58 GB | Download |
| sliding_llama3_8b_no_finetune.Q4_K_S.gguf | GGUF | Q4_K_S | 4.37 GB | Download |
| sliding_llama3_8b_no_finetune.Q5_0.gguf | GGUF | Q5_0 | 5.21 GB | Download |
| sliding_llama3_8b_no_finetune.Q5_1.gguf | GGUF | Q5_1 | 5.65 GB | Download |
| sliding_llama3_8b_no_finetune.Q5_K.gguf | GGUF | Q5_K | 5.34 GB | Download |
| sliding_llama3_8b_no_finetune.Q5_K_M.gguf | GGUF | Q5_K_M | 5.34 GB | Download |
| sliding_llama3_8b_no_finetune.Q5_K_S.gguf | GGUF | Q5_K_S | 5.21 GB | Download |
| sliding_llama3_8b_no_finetune.Q6_K.gguf | GGUF | Q6_K | 6.14 GB | Download |
| sliding_llama3_8b_no_finetune.Q8_0.gguf | GGUF | Q8_0 | 7.95 GB | Download |
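The file table can also serve as a lookup when choosing a quantization. The Python sketch below copies the sizes from the table and picks the largest file that fits a given memory budget. `pick_quant` and `download_url` are hypothetical helpers, and the 1 GB of headroom reserved for the KV cache and runtime buffers is an assumption, not a measured figure. The URL uses Hugging Face's `resolve/main` direct-download path. Q3_K, Q4_K and Q5_K are left out because their sizes match Q3_K_M, Q4_K_M and Q5_K_M (llama.cpp treats these names as aliases for the `_M` variants).

```python
from typing import Optional

# Repository and file-name prefix as listed on this page.
REPO_ID = "RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf"
PREFIX = "sliding_llama3_8b_no_finetune"

# File sizes in GB, copied from the table above. Q3_K, Q4_K and Q5_K are omitted
# because they have the same sizes as their _M counterparts.
QUANT_SIZES_GB = {
    "Q2_K": 2.96, "IQ3_XS": 3.28, "Q3_K_S": 3.41, "IQ3_S": 3.43, "IQ3_M": 3.52,
    "Q3_K_M": 3.74, "Q3_K_L": 4.03, "IQ4_XS": 4.18, "Q4_0": 4.34, "Q4_K_S": 4.37,
    "IQ4_NL": 4.38, "Q4_K_M": 4.58, "Q4_1": 4.78, "Q5_0": 5.21, "Q5_K_S": 5.21,
    "Q5_K_M": 5.34, "Q5_1": 5.65, "Q6_K": 6.14, "Q8_0": 7.95,
}


def pick_quant(budget_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest quantization whose file fits in budget_gb minus headroom_gb.

    headroom_gb is an assumed allowance for the KV cache and runtime buffers.
    Returns None if even the smallest file does not fit.
    """
    usable = budget_gb - headroom_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= usable]
    if not fitting:
        return None
    # max() compares size first; ties (e.g. Q5_0 vs Q5_K_S) fall back to the name.
    return max(fitting)[1]


def download_url(quant: str) -> str:
    """Build the direct-download URL for one GGUF file in this repository."""
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{PREFIX}.{quant}.gguf"


print(pick_quant(8.0))  # -> Q6_K
print(download_url("Q4_K_M"))
```

To fetch a file instead of only building its URL, `huggingface_hub.hf_hub_download(repo_id=REPO_ID, filename=...)` downloads it into the local cache and returns the local path.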
Model Details
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"frontmatter": {},
"hero_image_url": "",
"summary": "",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "Quantization made by Richard Erkhov.\n\n[Github](https://github.com/RichardErkhov)\n\n[Discord](https://discord.gg/pvy7H8DZMG)\n\n[Request more models](https://github.com/RichardErkhov/quant_request)\n\n\nsliding_llama3_8b_no_finetune - GGUF\n- Model creator: https://huggingface.co/kz919/\n- Original model: https://huggingface.co/kz919/sliding_llama3_8b_no_finetune/\n\n\n| Name | Quant method | Size |\n| ---- | ---- | ---- |\n| [sliding_llama3_8b_no_finetune.Q2_K.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q2_K.gguf) | Q2_K | 2.96GB |\n| [sliding_llama3_8b_no_finetune.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.IQ3_XS.gguf) | IQ3_XS | 3.28GB |\n| [sliding_llama3_8b_no_finetune.IQ3_S.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.IQ3_S.gguf) | IQ3_S | 3.43GB |\n| [sliding_llama3_8b_no_finetune.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q3_K_S.gguf) | Q3_K_S | 3.41GB |\n| [sliding_llama3_8b_no_finetune.IQ3_M.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.IQ3_M.gguf) | IQ3_M | 3.52GB |\n| [sliding_llama3_8b_no_finetune.Q3_K.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q3_K.gguf) | Q3_K | 3.74GB |\n| [sliding_llama3_8b_no_finetune.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q3_K_M.gguf) | Q3_K_M | 3.74GB |\n| 
[sliding_llama3_8b_no_finetune.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q3_K_L.gguf) | Q3_K_L | 4.03GB |\n| [sliding_llama3_8b_no_finetune.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.IQ4_XS.gguf) | IQ4_XS | 4.18GB |\n| [sliding_llama3_8b_no_finetune.Q4_0.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q4_0.gguf) | Q4_0 | 4.34GB |\n| [sliding_llama3_8b_no_finetune.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.IQ4_NL.gguf) | IQ4_NL | 4.38GB |\n| [sliding_llama3_8b_no_finetune.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q4_K_S.gguf) | Q4_K_S | 4.37GB |\n| [sliding_llama3_8b_no_finetune.Q4_K.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q4_K.gguf) | Q4_K | 4.58GB |\n| [sliding_llama3_8b_no_finetune.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q4_K_M.gguf) | Q4_K_M | 4.58GB |\n| [sliding_llama3_8b_no_finetune.Q4_1.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q4_1.gguf) | Q4_1 | 4.78GB |\n| [sliding_llama3_8b_no_finetune.Q5_0.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q5_0.gguf) | Q5_0 | 5.21GB |\n| [sliding_llama3_8b_no_finetune.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q5_K_S.gguf) | Q5_K_S | 5.21GB |\n| 
[sliding_llama3_8b_no_finetune.Q5_K.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q5_K.gguf) | Q5_K | 5.34GB |\n| [sliding_llama3_8b_no_finetune.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q5_K_M.gguf) | Q5_K_M | 5.34GB |\n| [sliding_llama3_8b_no_finetune.Q5_1.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q5_1.gguf) | Q5_1 | 5.65GB |\n| [sliding_llama3_8b_no_finetune.Q6_K.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q6_K.gguf) | Q6_K | 6.14GB |\n| [sliding_llama3_8b_no_finetune.Q8_0.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q8_0.gguf) | Q8_0 | 7.95GB |\n\n\n\n\nOriginal model description:\n---\nlicense: apache-2.0\npipeline_tag: text-generation\n---\n# Sliding Llama Model Card\n\n## Model Description\n\n**Model Name:** Sliding Llama\n\n**Base Model:** Llama 3\n\n**Description:** Sliding Llama is a variant of the Llama 3 model that introduces the ability to configure different layers with a sliding window approach. 
This configuration allows users to customize the attention and memory mechanisms across different layers.\n\n## Features\n\n- **Sliding Window Configuration:** Users can specify the size of sliding windows for different layers using the `sliding_windows` argument.\n- **Flexibility:** This model is highly adaptable, providing fine-tuned control over how information flows through the network.\n- **Enhanced Performance:** By adjusting sliding window sizes, users can potentially improve model performance on tasks requiring specific contextual understandings.\n\n## Usage\n\n### Installation\n\nTo use Sliding Llama for inference, you need to have a customized Hugging Face Transformers library installed. If you don't have it installed yet, you can do so with the following command:\n\n```bash\npip install git+https://github.com/kyleliang919/transformers\n```\n\nThis is important because we need a custom hybrid cache implementation for cached inference, since some of the model layers have different length of context (window).\n\nFor training, you can use the default transformers as it's.\n### Loading and using the Model\nThe `sliding_windows` argument is a list where each element specifies the window size for the corresponding layer. 
\nYou can load the Sliding Llama model using the following code snippet:\nFor instance, in the example below there is one full attention in every four layers and have a total interpolated context of 32K (originally llama3 8b has 8K context length)\n```python\nfrom transformers import AutoConfig, AutoTokenizer\nfrom modeling_sliding_llama import LlamaForCausalLM\n# Load the tokenizer and model\nconfig = AutoConfig.from_pretrained(\"kz919/sliding_llama3_8b_no_finetune\", trust_remote_code=True)\nconfig.sliding_windows = [512, 512, 512, 0, 512, 512, 512, 0, 512, 512, 512, 0, 512, 512, 512, 0, 512, 512, 512, 0, 512, 512, 512, 0, 512, 512, 512, 0, 512, 512, 512, 0]\nconfig.rope_scaling = {\n \"factor\": 4.0,\n \"high_freq_factor\": 4.0,\n \"low_freq_factor\": 1.0,\n \"original_max_position_embeddings\": 8192,\n \"rope_type\": \"llama3\"\n }\ntokenizer = AutoTokenizer.from_pretrained(\"kz919/sliding_llama3_8b_no_finetune\")\nmodel = LlamaForCausalLM.from_pretrained(\"kz919/sliding_llama3_8b_no_finetune\",\n config = config,\n device_map=\"auto\",\n trust_remote_code=True)\nprompt = \"Your prompt here\"\ninputs = tokenizer(prompt, return_tensors = \"pt\")\noutputs = model.generate(**inputs, use_cache = True)\nprint(tokenizer.decode(outputs[0]))\n```\n\nNotice in this repo, the weights are not finetuned (as indicated in the name), the weights are exactly identical as Llama3, you should be able to swap the weights or add a lora on top to accustom it to longer context.\nTo use Lora adapters, you can use the following command after you load the model as above\n```\nfrom peft import PeftModel\nmodel = PeftModel.from_pretrained(model, \"path_to_your_adepter\")\nmodel = model.merge_and_unload()\n```\nThen you can do inference, generation calls as usual.\n\n## Limitations and Future Work\n\n- **Computational Overhead:** Configuring large sliding windows for multiple layers might increase computational requirements.\n- **Optimal Configuration:** Finding the optimal sliding window 
sizes for specific tasks may require experimentation and tuning.\n\n## Acknowledgments\n\nWe thank the developers and researchers behind Llama 3 and the Hugging Face community for their contributions and support.\n\n## Citation\n\nIf you use this model in your research, please cite:\n\n```\n@inproceedings{slidingllama2024,\n title={Sliding Llama},\n author={Kaizhao Liang},\n year={2024}\n}\n```\n\n## License\n\nThe Sliding Llama model is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).\n\n---\n\nFor more details and updates, visit the [Sliding Llama GitHub repository]().\n\n",
"related_quantizations": []
},
"tags": [
"gguf",
"endpoints_compatible",
"region:us"
],
"likes": 0,
"downloads": 107,
"gated": false,
"private": false,
"last_modified": "2024-08-10T08:59:23.000Z",
"created_at": "2024-08-10T07:12:02.000Z",
"pipeline_tag": "",
"library_name": ""
}
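The card embedded in `readme_markdown` describes `sliding_windows` as one window size per layer. In its example, `0` evidently marks the full-attention layers (one in every four of Llama 3 8B's 32 layers), and a `llama3` RoPE scaling `factor` of 4.0 over 8,192 original positions gives the quoted 32K context. The Python sketch below rebuilds that list, checks the arithmetic, and shows one plausible reading of a window as a boolean attention mask. The helper names are hypothetical. The mask convention (a window of `w` covers the current token plus the `w - 1` tokens before it) is an assumption; the custom Transformers fork may count the boundary differently.

```python
from typing import List

# Values taken from the example in the embedded model card.
NUM_LAYERS = 32                  # length of the card's sliding_windows list
WINDOW = 512                     # window size used for the sliding layers
FULL_ATTENTION_EVERY = 4         # "one full attention in every four layers"
ORIGINAL_MAX_POSITIONS = 8192    # original_max_position_embeddings
ROPE_FACTOR = 4.0                # rope_scaling["factor"]


def make_sliding_windows(num_layers: int, window: int, full_every: int) -> List[int]:
    """Per-layer window sizes; 0 marks a full-attention layer, as in the card's example."""
    return [0 if (i + 1) % full_every == 0 else window for i in range(num_layers)]


def scaled_context(original_positions: int, factor: float) -> int:
    """Context length implied by multiplying the original positions by the RoPE factor."""
    return int(original_positions * factor)


def attention_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Assumed convention: window == 0 means ordinary causal attention; otherwise
    query i sees key j only if j <= i and i - j < window.
    """
    return [
        [j <= i and (window == 0 or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


windows = make_sliding_windows(NUM_LAYERS, WINDOW, FULL_ATTENTION_EVERY)
print(windows[:8])        # -> [512, 512, 512, 0, 512, 512, 512, 0]
print(windows.count(0))   # -> 8 full-attention layers
print(scaled_context(ORIGINAL_MAX_POSITIONS, ROPE_FACTOR))  # -> 32768

# Prints a diagonal band two tokens wide:
# x....  xx...  .xx..  ..xx.  ...xx  (one row per line)
for row in attention_mask(5, 2):
    print("".join("x" if allowed else "." for allowed in row))
```

Under this reading, the 512-token layers keep a bounded per-layer cache while every fourth layer still sees the whole sequence, which is why the card says cached inference needs a hybrid cache.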
Source payload excerpt (from Hugging Face API)
{
"_id": "66b712c20346e40231314086",
"id": "RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf",
"modelId": "RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf",
"sha": "1ab4b842e7c5a0c05d682293fb1b8b88d78a57a4",
"createdAt": "2024-08-10T07:12:02.000Z",
"lastModified": "2024-08-10T08:59:23.000Z",
"author": "RichardErkhov",
"downloads": 107,
"likes": 0,
"gated": false,
"private": false,
"pipeline_tag": "",
"library_name": "",
"siblings_count": 24
}
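The excerpt above comes from the Hugging Face Hub model API. The Python sketch below pulls the same record with only the standard library, assuming the public `https://huggingface.co/api/models/<repo_id>` endpoint. That endpoint lists files under `siblings` (each entry carrying an `rfilename`) rather than returning a `siblings_count` field, so the count in the excerpt is presumably derived from that list. `model_api_url`, `fetch_model_info` and `summarize` are hypothetical helpers.

```python
import json
from typing import Any, Dict
from urllib.request import urlopen

API_ROOT = "https://huggingface.co/api/models"
REPO_ID = "RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf"


def model_api_url(repo_id: str) -> str:
    """URL of the public model-info endpoint for repo_id."""
    return f"{API_ROOT}/{repo_id}"


def fetch_model_info(repo_id: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Download and decode the model-info JSON (requires network access)."""
    with urlopen(model_api_url(repo_id), timeout=timeout) as resp:
        return json.load(resp)


def summarize(info: Dict[str, Any]) -> Dict[str, Any]:
    """Keep a few fields shown in the excerpt and count files from the siblings list."""
    siblings = info.get("siblings", [])
    names = [s.get("rfilename", "") for s in siblings]
    return {
        "id": info.get("id"),
        "sha": info.get("sha"),
        "downloads": info.get("downloads"),
        "likes": info.get("likes"),
        "lastModified": info.get("lastModified"),
        "siblings_count": len(names),
        "gguf_files": sum(name.endswith(".gguf") for name in names),
    }
```

Calling `summarize(fetch_model_info(REPO_ID))` needs network access, and the live download and like counts may differ from the snapshot above.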