duyntnet/LongWriter-llama3.1-8b-imatrix-GGUF is indexed on GraySoft with repository links, GGUF quant files (IQ3_XS among them), and Hugging Face metadata. Use this page to pick a local model for guIDE or another runtime. Related models in the same shard are listed below.
duyntnet/LongWriter-llama3.1-8b-imatrix-GGUF overview
This repository contains quantizations of THUDM/LongWriter-llama3.1-8b. LongWriter-llama3.1-8b is trained from Meta-Llama-3.1-8B and can generate 10,000+ words in a single response.
Environment: transformers>=4.43.0
Please adhere to the prompt template (the system prompt is optional): `<<SYS>>\n{system prompt}\n<</SYS>>\n\n[INST]{query1}[/INST]{response1}[INST]{query2}[/INST]{response2}...`
The original README includes a simple transformers demo for deploying the model. It also notes that the model can be deployed with vllm, which allows 10,000+ words to be generated within a minute.
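The prompt template and the transformers demo from the original README can be sketched roughly as follows. The helper names `build_prompt` and `generate` and the `Turn` alias are hypothetical, introduced only for illustration. The generation settings are copied from the README, with one adjustment: the README moves the inputs with `.to(device)` without defining `device`, so this sketch assumes `model.device` was intended. Note that this path loads the original THUDM weights, not the GGUF files listed on this page.

```python
from typing import List, Optional, Tuple

# (query, response); response is None for the turn the model should answer.
Turn = Tuple[str, Optional[str]]


def build_prompt(turns: List[Turn], system_prompt: Optional[str] = None) -> str:
    """Render a conversation in the LongWriter prompt template (hypothetical helper)."""
    parts = []
    if system_prompt:
        # The system block is optional and comes first.
        parts.append(f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n")
    for query, response in turns:
        parts.append(f"[INST]{query}[/INST]")
        if response is not None:
            parts.append(response)
    return "".join(parts)


def generate(prompt: str, model_id: str = "THUDM/LongWriter-llama3.1-8b",
             max_new_tokens: int = 32768) -> str:
    """Generate with transformers, using the settings from the README demo."""
    # Heavy imports are deferred so build_prompt works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto"
    ).eval()
    # Assumption: the README's undefined `device` is replaced by model.device.
    inputs = tokenizer(prompt, truncation=False, return_tensors="pt").to(model.device)
    context_length = inputs.input_ids.shape[-1]
    output = model.generate(
        **inputs, max_new_tokens=max_new_tokens, num_beams=1, do_sample=True, temperature=0.5
    )[0]
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output[context_length:], skip_special_tokens=True)


prompt = build_prompt([("Write a 10000-word China travel guide", None)])
print(prompt)  # [INST]Write a 10000-word China travel guide[/INST]
```

Calling `generate(prompt)` downloads the full-precision model and needs a GPU with enough memory for bfloat16 weights, so it is left out of the runnable part of the sketch.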
Repository Files & Downloads
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| LongWriter-llama3.1-8b-IQ1_M.gguf | GGUF | IQ1_M | 2.01 GB | Download |
| LongWriter-llama3.1-8b-IQ1_S.gguf | GGUF | IQ1_S | 1.88 GB | Download |
| LongWriter-llama3.1-8b-IQ2_M.gguf | GGUF | IQ2_M | 2.75 GB | Download |
| LongWriter-llama3.1-8b-IQ2_S.gguf | GGUF | IQ2_S | 2.57 GB | Download |
| LongWriter-llama3.1-8b-IQ2_XS.gguf | GGUF | IQ2_XS | 2.43 GB | Download |
| LongWriter-llama3.1-8b-IQ2_XXS.gguf | GGUF | IQ2_XXS | 2.23 GB | Download |
| LongWriter-llama3.1-8b-IQ3_M.gguf | GGUF | IQ3_M | 3.52 GB | Download |
| LongWriter-llama3.1-8b-IQ3_S.gguf | GGUF | IQ3_S | 3.43 GB | Download |
| LongWriter-llama3.1-8b-IQ3_XS.gguf | GGUF | IQ3_XS | 3.28 GB | Download |
| LongWriter-llama3.1-8b-IQ3_XXS.gguf | GGUF | IQ3_XXS | 3.05 GB | Download |
| LongWriter-llama3.1-8b-IQ4_NL.gguf | GGUF | IQ4_NL | 4.36 GB | Download |
| LongWriter-llama3.1-8b-IQ4_XS.gguf | GGUF | IQ4_XS | 4.14 GB | Download |
| LongWriter-llama3.1-8b-Q2_K.gguf | GGUF | Q2_K | 2.96 GB | Download |
| LongWriter-llama3.1-8b-Q2_K_S.gguf | GGUF | Q2_K_S | 2.78 GB | Download |
| LongWriter-llama3.1-8b-Q3_K_L.gguf | GGUF | Q3_K_L | 4.03 GB | Download |
| LongWriter-llama3.1-8b-Q3_K_M.gguf | GGUF | Q3_K_M | 3.74 GB | Download |
| LongWriter-llama3.1-8b-Q3_K_S.gguf | GGUF | Q3_K_S | 3.41 GB | Download |
| LongWriter-llama3.1-8b-Q4_0.gguf | GGUF | Q4_0 | 4.35 GB | Download |
| LongWriter-llama3.1-8b-Q4_1.gguf | GGUF | Q4_1 | 4.78 GB | Download |
| LongWriter-llama3.1-8b-Q4_K_M.gguf | GGUF | Q4_K_M | 4.58 GB | Download |
| LongWriter-llama3.1-8b-Q4_K_S.gguf | GGUF | Q4_K_S | 4.37 GB | Download |
| LongWriter-llama3.1-8b-Q5_0.gguf | GGUF | Q5_0 | 5.23 GB | Download |
| LongWriter-llama3.1-8b-Q5_1.gguf | GGUF | Q5_1 | 5.65 GB | Download |
| LongWriter-llama3.1-8b-Q5_K_M.gguf | GGUF | Q5_K_M | 5.34 GB | Download |
| LongWriter-llama3.1-8b-Q5_K_S.gguf | GGUF | Q5_K_S | 5.21 GB | Download |
| LongWriter-llama3.1-8b-Q6_K.gguf | GGUF | Q6_K | 6.14 GB | Download |
| LongWriter-llama3.1-8b-Q8_0.gguf | GGUF | Q8_0 | 7.95 GB | Download |
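One way to choose among these files is to take the largest quantization that fits a memory budget. The sketch below is a rough illustration rather than a sizing rule: `pick_quant`, `download`, and the 1 GB `headroom_gb` default are assumptions made here, and real memory use also depends on context length and runtime. The sizes are copied from a subset of the table above, and the download helper relies on `hf_hub_download` from the `huggingface_hub` package.

```python
from typing import Dict, Optional

# File sizes in GB, copied from a subset of the table above.
QUANT_SIZES_GB: Dict[str, float] = {
    "IQ2_M": 2.75,
    "IQ3_XS": 3.28,
    "Q3_K_M": 3.74,
    "IQ4_XS": 4.14,
    "Q4_K_M": 4.58,
    "Q5_K_M": 5.34,
    "Q6_K": 6.14,
    "Q8_0": 7.95,
}


def pick_quant(budget_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest listed quant whose file size plus headroom fits the budget.

    headroom_gb is an assumed allowance for the KV cache and runtime overhead.
    """
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s + headroom_gb <= budget_gb}
    return max(fitting, key=fitting.get) if fitting else None


def download(quant: str, repo_id: str = "duyntnet/LongWriter-llama3.1-8b-imatrix-GGUF") -> str:
    """Fetch one GGUF file from the Hub and return its local path."""
    # Deferred import so pick_quant works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=f"LongWriter-llama3.1-8b-{quant}.gguf")


print(pick_quant(6.0))  # Q4_K_M
```

With a 6 GB budget and the assumed 1 GB of headroom, Q4_K_M (4.58 GB) is the largest file that fits; `download("Q4_K_M")` would then fetch it into the local Hugging Face cache.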
Model Details
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"license": "other",
"language": [
"en"
],
"pipeline_tag": "text-generation",
"inference": false,
"tags": [
"transformers",
"gguf",
"imatrix",
"LongWriter-llama3.1-8b"
],
"frontmatter": {
"license": "other",
"language": [
"en"
],
"pipeline_tag": "text-generation",
"inference": "false",
"tags": [
"transformers",
"gguf",
"imatrix",
"LongWriter-llama3.1-8b"
]
},
"hero_image_url": "",
"summary": "LongWriter-llama3.1-8b is trained based on Meta-Llama-3.1-8B, and is capable of generating 10,000+ words at once. Environment: transformers>=4.43.0 Please ahere to the prompt template (system prompt is optional): >\\n{system prompt}\\n>\\n\\n[INST]{query1}[/INST]{response1}[INST]{query2}[/INST]{response2}... A simple demo for deployment of the model: ``python from transformers import AutoTokenizer, AutoModelForCausalLM import torch tokenizer = AutoTokenizer.from_pretrained(\"THUDM/LongWriter-llama3.1-8b\", trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained(\"THUDM/LongWriter-llama3.1-8b\", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map=\"auto\") model = model.eval() query = \"Write a 10000-word China travel guide\" prompt = f\"[INST]{query}[/INST]\" input = tokenizer(prompt, truncation=False, return_tensors=\"pt\").to(device) context_length = input.input_ids.shape[-1] output = model.generate( **input, max_new_tokens=32768, num_beams=1, do_sample=True, temperature=0.5, )[0] response = tokenizer.decode(output[context_length:], skip_special_tokens=True) print(response) ` You can also deploy the model with vllm, which allows 10,000+ words generation within a minute. Here is an example code: `python model = LLM( model= \"THUDM/LongWriter-llama3.1-8b\", dtype=\"auto\", trust_remote_code=True, tensor_parallel_size=1, max_model_len=32768, gpu_memory_utilization=0.5, ) tokenizer = model.get_tokenizer() generation_params = SamplingParams( temperature=0.5, top_p=0.8, top_k=50, max_tokens=32768, repetition_penalty=1, ) query = \"Write a 10000-word China travel guide\" prompt = f\"[INST]{query}[/INST]\" input_ids = tokenizer(prompt, truncation=False, return_tensors=\"pt\").input_ids[0].tolist() outputs = model.generate( sampling_params=generation_params, prompt_token_ids=[input_ids], ) output = outputs[0] print(output.outputs[0].text) ``",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nlicense: other\nlanguage:\n- en\npipeline_tag: text-generation\ninference: false\ntags:\n- transformers\n- gguf\n- imatrix\n- LongWriter-llama3.1-8b\n---\nQuantizations of https://huggingface.co/THUDM/LongWriter-llama3.1-8b\n\n\n### Inference Clients/UIs\n* [llama.cpp](https://github.com/ggerganov/llama.cpp)\n* [KoboldCPP](https://github.com/LostRuins/koboldcpp)\n* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)\n* [ollama](https://github.com/ollama/ollama)\n\n\n---\n\n# From original readme\n\nLongWriter-llama3.1-8b is trained based on [Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B), and is capable of generating 10,000+ words at once.\n\nEnvironment: `transformers>=4.43.0`\n\nPlease ahere to the prompt template (system prompt is optional): `<<SYS>>\\n{system prompt}\\n<</SYS>>\\n\\n[INST]{query1}[/INST]{response1}[INST]{query2}[/INST]{response2}...`\n\nA simple demo for deployment of the model:\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\ntokenizer = AutoTokenizer.from_pretrained(\"THUDM/LongWriter-llama3.1-8b\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"THUDM/LongWriter-llama3.1-8b\", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map=\"auto\")\nmodel = model.eval()\nquery = \"Write a 10000-word China travel guide\"\nprompt = f\"[INST]{query}[/INST]\"\ninput = tokenizer(prompt, truncation=False, return_tensors=\"pt\").to(device)\ncontext_length = input.input_ids.shape[-1]\noutput = model.generate(\n **input,\n max_new_tokens=32768,\n num_beams=1,\n do_sample=True,\n temperature=0.5,\n)[0]\nresponse = tokenizer.decode(output[context_length:], skip_special_tokens=True)\nprint(response)\n```\nYou can also deploy the model with [vllm](https://github.com/vllm-project/vllm), which allows 10,000+ words generation within a minute. 
Here is an example code:\n```python\nmodel = LLM(\n model= \"THUDM/LongWriter-llama3.1-8b\",\n dtype=\"auto\",\n trust_remote_code=True,\n tensor_parallel_size=1,\n max_model_len=32768,\n gpu_memory_utilization=0.5,\n)\ntokenizer = model.get_tokenizer()\ngeneration_params = SamplingParams(\n temperature=0.5,\n top_p=0.8,\n top_k=50,\n max_tokens=32768,\n repetition_penalty=1,\n)\nquery = \"Write a 10000-word China travel guide\"\nprompt = f\"[INST]{query}[/INST]\"\ninput_ids = tokenizer(prompt, truncation=False, return_tensors=\"pt\").input_ids[0].tolist()\noutputs = model.generate(\n sampling_params=generation_params,\n prompt_token_ids=[input_ids],\n)\noutput = outputs[0]\nprint(output.outputs[0].text)\n```",
"related_quantizations": []
},
"tags": [
"transformers",
"gguf",
"imatrix",
"LongWriter-llama3.1-8b",
"text-generation",
"en",
"license:other",
"region:us"
],
"likes": 1,
"downloads": 139,
"gated": false,
"private": false,
"last_modified": "2024-10-05T18:48:19.000Z",
"created_at": "2024-10-05T15:56:44.000Z",
"pipeline_tag": "text-generation",
"library_name": "transformers"
}
Source payload excerpt (from Hugging Face API)
{
"_id": "670161bc2fa99176350335a0",
"id": "duyntnet/LongWriter-llama3.1-8b-imatrix-GGUF",
"modelId": "duyntnet/LongWriter-llama3.1-8b-imatrix-GGUF",
"sha": "d877d3921384e0eb28591169355ddeb41a78a5e3",
"createdAt": "2024-10-05T15:56:44.000Z",
"lastModified": "2024-10-05T18:48:19.000Z",
"author": "duyntnet",
"downloads": 139,
"likes": 1,
"gated": false,
"private": false,
"pipeline_tag": "text-generation",
"library_name": "transformers",
"siblings_count": 29
}