GraySoft
Model Intelligence Sheet

richarderkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf overview

Comprehensive model page for richarderkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf

Tags: gguf, endpoints_compatible, region:us
Downloads: 107
Likes: 0
Pipeline: not set
Library: not set
Visibility: Public
Access: Open

Repository Files & Downloads

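22 GGUF files detected (the Hugging Face API reports 24 repository files in total; the remaining two are not listed)
Per-file downloads for every GGUF quantization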
| File | Type | Quantization | Size |
| ---- | ---- | ---- | ---- |
| sliding_llama3_8b_no_finetune.IQ3_M.gguf | GGUF | IQ3_M | 3.52 GB |
| sliding_llama3_8b_no_finetune.IQ3_S.gguf | GGUF | IQ3_S | 3.43 GB |
| sliding_llama3_8b_no_finetune.IQ3_XS.gguf | GGUF | IQ3_XS | 3.28 GB |
| sliding_llama3_8b_no_finetune.IQ4_NL.gguf | GGUF | IQ4_NL | 4.38 GB |
| sliding_llama3_8b_no_finetune.IQ4_XS.gguf | GGUF | IQ4_XS | 4.18 GB |
| sliding_llama3_8b_no_finetune.Q2_K.gguf | GGUF | Q2_K | 2.96 GB |
| sliding_llama3_8b_no_finetune.Q3_K.gguf | GGUF | Q3_K | 3.74 GB |
| sliding_llama3_8b_no_finetune.Q3_K_L.gguf | GGUF | Q3_K_L | 4.03 GB |
| sliding_llama3_8b_no_finetune.Q3_K_M.gguf | GGUF | Q3_K_M | 3.74 GB |
| sliding_llama3_8b_no_finetune.Q3_K_S.gguf | GGUF | Q3_K_S | 3.41 GB |
| sliding_llama3_8b_no_finetune.Q4_0.gguf | GGUF | Q4_0 | 4.34 GB |
| sliding_llama3_8b_no_finetune.Q4_1.gguf | GGUF | Q4_1 | 4.78 GB |
| sliding_llama3_8b_no_finetune.Q4_K.gguf | GGUF | Q4_K | 4.58 GB |
| sliding_llama3_8b_no_finetune.Q4_K_M.gguf | GGUF | Q4_K_M | 4.58 GB |
| sliding_llama3_8b_no_finetune.Q4_K_S.gguf | GGUF | Q4_K_S | 4.37 GB |
| sliding_llama3_8b_no_finetune.Q5_0.gguf | GGUF | Q5_0 | 5.21 GB |
| sliding_llama3_8b_no_finetune.Q5_1.gguf | GGUF | Q5_1 | 5.65 GB |
| sliding_llama3_8b_no_finetune.Q5_K.gguf | GGUF | Q5_K | 5.34 GB |
| sliding_llama3_8b_no_finetune.Q5_K_M.gguf | GGUF | Q5_K_M | 5.34 GB |
| sliding_llama3_8b_no_finetune.Q5_K_S.gguf | GGUF | Q5_K_S | 5.21 GB |
| sliding_llama3_8b_no_finetune.Q6_K.gguf | GGUF | Q6_K | 6.14 GB |
| sliding_llama3_8b_no_finetune.Q8_0.gguf | GGUF | Q8_0 | 7.95 GB |
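Choosing among 22 quantizations mostly comes down to how large a file your machine can hold. The sketch below picks the largest listed file that fits a memory budget, using the sizes from the table. It is a rough heuristic, not a recommendation from the quantizer: `largest_fitting_quant` and `gguf_filename` are hypothetical helpers, and `headroom_gb` is an assumed allowance for context (KV cache) and runtime overhead, not a measured value for this model.

```python
from typing import Dict, Optional

# File sizes in GB exactly as listed in the table (weights only; KV cache not included).
QUANT_SIZES_GB: Dict[str, float] = {
    "IQ3_M": 3.52, "IQ3_S": 3.43, "IQ3_XS": 3.28, "IQ4_NL": 4.38, "IQ4_XS": 4.18,
    "Q2_K": 2.96, "Q3_K": 3.74, "Q3_K_L": 4.03, "Q3_K_M": 3.74, "Q3_K_S": 3.41,
    "Q4_0": 4.34, "Q4_1": 4.78, "Q4_K": 4.58, "Q4_K_M": 4.58, "Q4_K_S": 4.37,
    "Q5_0": 5.21, "Q5_1": 5.65, "Q5_K": 5.34, "Q5_K_M": 5.34, "Q5_K_S": 5.21,
    "Q6_K": 6.14, "Q8_0": 7.95,
}


def gguf_filename(quant: str) -> str:
    """Repository file name for a quantization label, following the table's naming."""
    return f"sliding_llama3_8b_no_finetune.{quant}.gguf"


def largest_fitting_quant(budget_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest listed quantization whose size plus headroom fits the budget.

    headroom_gb is an assumed placeholder for KV cache and runtime overhead.
    Files of equal size are tie-broken by name; None means nothing fits.
    """
    fitting = [(size, quant) for quant, size in QUANT_SIZES_GB.items()
               if size + headroom_gb <= budget_gb]
    if not fitting:
        return None
    return max(fitting)[1]


if __name__ == "__main__":
    choice = largest_fitting_quant(budget_gb=6.0)
    print(choice, gguf_filename(choice) if choice else "no file fits")
```

With a 6 GB budget and the default 1 GB headroom this selects `Q4_1`. The resulting file name can then be passed to `hf_hub_download(repo_id="RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf", filename=...)` from the `huggingface_hub` package to fetch just that file.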

Model Details

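Model Slug: richarderkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf
Author: RichardErkhov
Pipeline Task: not set (the original model card's frontmatter declares text-generation)
Library: not set
Created: 2024-08-10
Last Modified: 2024-08-10
Gated: No
Private: No
HF SHA: 1ab4b842e7c5a0c05d682293fb1b8b88d78a57a4
License: Unknown (not declared in this repository's metadata; the original model description in the README states apache-2.0)
Language: Unknown
Base Model: Unknown (not declared in this repository's metadata; the README names kz919/sliding_llama3_8b_no_finetune, a Llama 3 variant, as the original model)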

Metadata Inspector

Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "frontmatter": {},
    "hero_image_url": "",
    "summary": "",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "Quantization made by Richard Erkhov.\n\n[Github](https://github.com/RichardErkhov)\n\n[Discord](https://discord.gg/pvy7H8DZMG)\n\n[Request more models](https://github.com/RichardErkhov/quant_request)\n\n\nsliding_llama3_8b_no_finetune - GGUF\n- Model creator: https://huggingface.co/kz919/\n- Original model: https://huggingface.co/kz919/sliding_llama3_8b_no_finetune/\n\n\n| Name | Quant method | Size |\n| ---- | ---- | ---- |\n| [sliding_llama3_8b_no_finetune.Q2_K.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q2_K.gguf) | Q2_K | 2.96GB |\n| [sliding_llama3_8b_no_finetune.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.IQ3_XS.gguf) | IQ3_XS | 3.28GB |\n| [sliding_llama3_8b_no_finetune.IQ3_S.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.IQ3_S.gguf) | IQ3_S | 3.43GB |\n| [sliding_llama3_8b_no_finetune.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q3_K_S.gguf) | Q3_K_S | 3.41GB |\n| [sliding_llama3_8b_no_finetune.IQ3_M.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.IQ3_M.gguf) | IQ3_M | 3.52GB |\n| [sliding_llama3_8b_no_finetune.Q3_K.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q3_K.gguf) | Q3_K | 3.74GB |\n| [sliding_llama3_8b_no_finetune.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q3_K_M.gguf) | Q3_K_M | 3.74GB |\n| [sliding_llama3_8b_no_finetune.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q3_K_L.gguf) | Q3_K_L | 4.03GB |\n| [sliding_llama3_8b_no_finetune.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.IQ4_XS.gguf) | IQ4_XS | 4.18GB |\n| [sliding_llama3_8b_no_finetune.Q4_0.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q4_0.gguf) | Q4_0 | 4.34GB |\n| [sliding_llama3_8b_no_finetune.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.IQ4_NL.gguf) | IQ4_NL | 4.38GB |\n| [sliding_llama3_8b_no_finetune.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q4_K_S.gguf) | Q4_K_S | 4.37GB |\n| [sliding_llama3_8b_no_finetune.Q4_K.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q4_K.gguf) | Q4_K | 4.58GB |\n| [sliding_llama3_8b_no_finetune.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q4_K_M.gguf) | Q4_K_M | 4.58GB |\n| [sliding_llama3_8b_no_finetune.Q4_1.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q4_1.gguf) | Q4_1 | 4.78GB |\n| [sliding_llama3_8b_no_finetune.Q5_0.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q5_0.gguf) | Q5_0 | 5.21GB |\n| [sliding_llama3_8b_no_finetune.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q5_K_S.gguf) | Q5_K_S | 5.21GB |\n| [sliding_llama3_8b_no_finetune.Q5_K.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q5_K.gguf) | Q5_K | 5.34GB |\n| [sliding_llama3_8b_no_finetune.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q5_K_M.gguf) | Q5_K_M | 5.34GB |\n| [sliding_llama3_8b_no_finetune.Q5_1.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q5_1.gguf) | Q5_1 | 5.65GB |\n| [sliding_llama3_8b_no_finetune.Q6_K.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q6_K.gguf) | Q6_K | 6.14GB |\n| [sliding_llama3_8b_no_finetune.Q8_0.gguf](https://huggingface.co/RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf/blob/main/sliding_llama3_8b_no_finetune.Q8_0.gguf) | Q8_0 | 7.95GB |\n\n\n\n\nOriginal model description:\n---\nlicense: apache-2.0\npipeline_tag: text-generation\n---\n# Sliding Llama Model Card\n\n## Model Description\n\n**Model Name:** Sliding Llama\n\n**Base Model:** Llama 3\n\n**Description:** Sliding Llama is a variant of the Llama 3 model that introduces the ability to configure different layers with a sliding window approach. This configuration allows users to customize the attention and memory mechanisms across different layers.\n\n## Features\n\n- **Sliding Window Configuration:** Users can specify the size of sliding windows for different layers using the `sliding_windows` argument.\n- **Flexibility:** This model is highly adaptable, providing fine-tuned control over how information flows through the network.\n- **Enhanced Performance:** By adjusting sliding window sizes, users can potentially improve model performance on tasks requiring specific contextual understandings.\n\n## Usage\n\n### Installation\n\nTo use Sliding Llama for inference, you need to have a customized Hugging Face Transformers library installed. If you don't have it installed yet, you can do so with the following command:\n\n```bash\npip install git+https://github.com/kyleliang919/transformers\n```\n\nThis is important because we need a custom hybrid cache implementation for cached inference, since some of the model layers have different length of context (window).\n\nFor training, you can use the default transformers as it's.\n### Loading and using the Model\nThe `sliding_windows` argument is a list where each element specifies the window size for the corresponding layer. \nYou can load the Sliding Llama model using the following code snippet:\nFor instance, in the example below there is one full attention in every four layers and have a total interpolated context of 32K (originally llama3 8b has 8K context length)\n```python\nfrom transformers import AutoConfig, AutoTokenizer\nfrom modeling_sliding_llama import LlamaForCausalLM\n# Load the tokenizer and model\nconfig = AutoConfig.from_pretrained(\"kz919/sliding_llama3_8b_no_finetune\", trust_remote_code=True)\nconfig.sliding_windows = [512, 512, 512, 0, 512, 512, 512, 0, 512, 512, 512, 0, 512, 512, 512, 0, 512, 512, 512, 0, 512, 512, 512, 0, 512, 512, 512, 0, 512, 512, 512, 0]\nconfig.rope_scaling = {\n    \"factor\": 4.0,\n    \"high_freq_factor\": 4.0,\n    \"low_freq_factor\": 1.0,\n    \"original_max_position_embeddings\": 8192,\n    \"rope_type\": \"llama3\"\n  }\ntokenizer = AutoTokenizer.from_pretrained(\"kz919/sliding_llama3_8b_no_finetune\")\nmodel = LlamaForCausalLM.from_pretrained(\"kz919/sliding_llama3_8b_no_finetune\",\n                                                config = config,\n                                                device_map=\"auto\",\n                                                trust_remote_code=True)\nprompt = \"Your prompt here\"\ninputs = tokenizer(prompt, return_tensors = \"pt\")\noutputs = model.generate(**inputs, use_cache = True)\nprint(tokenizer.decode(outputs[0]))\n```\n\nNotice in this repo, the weights are not finetuned (as indicated in the name), the weights are exactly identical as Llama3, you should be able to swap the weights or add a lora on top to accustom it to longer context.\nTo use Lora adapters, you can use the following command after you load the model as above\n```\nfrom peft import PeftModel\nmodel = PeftModel.from_pretrained(model, \"path_to_your_adepter\")\nmodel = model.merge_and_unload()\n```\nThen you can do inference, generation calls as usual.\n\n## Limitations and Future Work\n\n- **Computational Overhead:** Configuring large sliding windows for multiple layers might increase computational requirements.\n- **Optimal Configuration:** Finding the optimal sliding window sizes for specific tasks may require experimentation and tuning.\n\n## Acknowledgments\n\nWe thank the developers and researchers behind Llama 3 and the Hugging Face community for their contributions and support.\n\n## Citation\n\nIf you use this model in your research, please cite:\n\n```\n@inproceedings{slidingllama2024,\n  title={Sliding Llama},\n  author={Kaizhao Liang},\n  year={2024}\n}\n```\n\n## License\n\nThe Sliding Llama model is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).\n\n---\n\nFor more details and updates, visit the [Sliding Llama GitHub repository]().\n\n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "endpoints_compatible",
    "region:us"
  ],
  "likes": 0,
  "downloads": 107,
  "gated": false,
  "private": false,
  "last_modified": "2024-08-10T08:59:23.000Z",
  "created_at": "2024-08-10T07:12:02.000Z",
  "pipeline_tag": "",
  "library_name": ""
}
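The README stored above sets `config.sliding_windows` to a 32-entry list in which every fourth layer uses full attention, and pairs it with a `llama3` `rope_scaling` block (factor 4.0 over 8192 original positions) for a 32K interpolated context. The sketch below generates that list instead of writing it out by hand and checks the context arithmetic. The helper names are hypothetical; reading a window of 0 as "full attention" follows the README's "one full attention in every four layers". It only builds configuration values: applying them still requires the custom Transformers fork and the `modeling_sliding_llama` module the README names, and the source does not say whether GGUF runtimes honor `sliding_windows` at all.

```python
from typing import List


def build_sliding_windows(num_layers: int = 32, window: int = 512,
                          full_every: int = 4) -> List[int]:
    """Per-layer window sizes; 0 marks a full-attention layer, as in the README example.

    The defaults reproduce the README list: 32 layers, every fourth one full attention.
    """
    return [0 if (layer + 1) % full_every == 0 else window for layer in range(num_layers)]


def interpolated_context(original_max_positions: int = 8192, factor: float = 4.0) -> int:
    """Context length implied by the README's rope_scaling block: original length x factor."""
    return int(original_max_positions * factor)


if __name__ == "__main__":
    windows = build_sliding_windows()
    full_layers = [i for i, w in enumerate(windows) if w == 0]
    print(len(windows), full_layers, interpolated_context())
```

This prints `32 [3, 7, 11, 15, 19, 23, 27, 31] 32768`, i.e. eight full-attention layers (zero-indexed) and 8192 × 4 = 32768 tokens, the "32K" the README mentions.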
Source payload excerpt (from Hugging Face API)
{
  "_id": "66b712c20346e40231314086",
  "id": "RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf",
  "modelId": "RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf",
  "sha": "1ab4b842e7c5a0c05d682293fb1b8b88d78a57a4",
  "createdAt": "2024-08-10T07:12:02.000Z",
  "lastModified": "2024-08-10T08:59:23.000Z",
  "author": "RichardErkhov",
  "downloads": 107,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 24
}
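The normalized metadata and the raw API excerpt carry the same values under different keys (for example `createdAt` becomes `created_at`). The sketch below makes that correspondence explicit. `normalize_payload` and `FIELD_MAP` are hypothetical, not GraySoft's actual code; the mapping is inferred only by comparing the two JSON blocks on this page, and fields such as `tags` and `card_data`, which come from elsewhere, are not covered.

```python
from typing import Any, Dict

# The "Source payload excerpt" above, copied as a Python dict (the _id/modelId fields omitted).
SOURCE_PAYLOAD: Dict[str, Any] = {
    "id": "RichardErkhov/kz919_-_sliding_llama3_8b_no_finetune-gguf",
    "sha": "1ab4b842e7c5a0c05d682293fb1b8b88d78a57a4",
    "createdAt": "2024-08-10T07:12:02.000Z",
    "lastModified": "2024-08-10T08:59:23.000Z",
    "author": "RichardErkhov",
    "downloads": 107,
    "likes": 0,
    "gated": False,
    "private": False,
    "pipeline_tag": "",
    "library_name": "",
    "siblings_count": 24,
}

# API key -> normalized key, inferred by comparing the two JSON blocks on this page.
FIELD_MAP: Dict[str, str] = {
    "likes": "likes",
    "downloads": "downloads",
    "gated": "gated",
    "private": "private",
    "lastModified": "last_modified",
    "createdAt": "created_at",
    "pipeline_tag": "pipeline_tag",
    "library_name": "library_name",
}


def normalize_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical normalizer: keep and rename only the fields found at the top level of metadata_json."""
    return {normalized: payload[api_key]
            for api_key, normalized in FIELD_MAP.items() if api_key in payload}


if __name__ == "__main__":
    print(normalize_payload(SOURCE_PAYLOAD))
```

Fields absent from `FIELD_MAP`, such as `sha` and `siblings_count`, are dropped, which matches their absence from the top level of the normalized block.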