GraySoft

The IQ3_XS GGUF quantization of duyntnet/longwriter-llama3.1-8b-imatrix-gguf is a free download and is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes. See related models in the same shard below.

Model Intelligence Sheet

duyntnet/longwriter-llama3.1-8b-imatrix-gguf overview

LongWriter-llama3.1-8b is trained on top of Meta-Llama-3.1-8B and can generate 10,000+ words in a single response. Environment: transformers>=4.43.0. Please adhere to the prompt template (the system prompt is optional): <<SYS>>\n{system prompt}\n<</SYS>>\n\n[INST]{query1}[/INST]{response1}[INST]{query2}[/INST]{response2}... The original model card provides a simple transformers demo for deploying the model and notes that it can also be served with vLLM, which it says allows 10,000+ words to be generated within a minute. Both examples are reproduced in the README under Metadata Inspector below.
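Runtimes that accept a raw text prompt need the template above assembled by hand. The sketch below does that under two assumptions: the literal \n sequences in the template stand for newlines, and turns are passed as (query, response) pairs with response set to None for the final turn the model should answer. The helper name build_longwriter_prompt is hypothetical, and whether a runtime also prepends a BOS token is left to that runtime.

```python
from typing import List, Optional, Tuple


def build_longwriter_prompt(
    turns: List[Tuple[str, Optional[str]]],
    system_prompt: Optional[str] = None,
) -> str:
    """Assemble a prompt in the LongWriter template (hypothetical helper).

    Each turn is a (query, response) pair; use None as the response of the
    last turn so the model generates it.
    """
    parts = []
    if system_prompt:
        # Optional system block: <<SYS>>\n{system prompt}\n<</SYS>>\n\n
        parts.append(f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n")
    for query, response in turns:
        parts.append(f"[INST]{query}[/INST]")
        if response is not None:
            # Earlier responses follow their [/INST] directly, with no separator.
            parts.append(response)
    return "".join(parts)


prompt = build_longwriter_prompt([("Write a 10000-word China travel guide", None)])
print(prompt)
```

With a single open turn and no system prompt, this reduces to the f"[INST]{query}[/INST]" string used in the README's own demos.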

Tags: transformers, gguf, imatrix, LongWriter-llama3.1-8b, text-generation, en, license:other, region:us
Downloads: 139
Likes: 1
Pipeline: text-generation
Library: transformers
Visibility: Public
Access: Open

Repository Files & Downloads

27 GGUF files detected (the Hugging Face API reports 29 files in the repository).
File                                 Type  Quantization  Size
LongWriter-llama3.1-8b-IQ1_M.gguf    GGUF  IQ1_M         2.01 GB
LongWriter-llama3.1-8b-IQ1_S.gguf    GGUF  IQ1_S         1.88 GB
LongWriter-llama3.1-8b-IQ2_M.gguf    GGUF  IQ2_M         2.75 GB
LongWriter-llama3.1-8b-IQ2_S.gguf    GGUF  IQ2_S         2.57 GB
LongWriter-llama3.1-8b-IQ2_XS.gguf   GGUF  IQ2_XS        2.43 GB
LongWriter-llama3.1-8b-IQ2_XXS.gguf  GGUF  IQ2_XXS       2.23 GB
LongWriter-llama3.1-8b-IQ3_M.gguf    GGUF  IQ3_M         3.52 GB
LongWriter-llama3.1-8b-IQ3_S.gguf    GGUF  IQ3_S         3.43 GB
LongWriter-llama3.1-8b-IQ3_XS.gguf   GGUF  IQ3_XS        3.28 GB
LongWriter-llama3.1-8b-IQ3_XXS.gguf  GGUF  IQ3_XXS       3.05 GB
LongWriter-llama3.1-8b-IQ4_NL.gguf   GGUF  IQ4_NL        4.36 GB
LongWriter-llama3.1-8b-IQ4_XS.gguf   GGUF  IQ4_XS        4.14 GB
LongWriter-llama3.1-8b-Q2_K.gguf     GGUF  Q2_K          2.96 GB
LongWriter-llama3.1-8b-Q2_K_S.gguf   GGUF  Q2_K_S        2.78 GB
LongWriter-llama3.1-8b-Q3_K_L.gguf   GGUF  Q3_K_L        4.03 GB
LongWriter-llama3.1-8b-Q3_K_M.gguf   GGUF  Q3_K_M        3.74 GB
LongWriter-llama3.1-8b-Q3_K_S.gguf   GGUF  Q3_K_S        3.41 GB
LongWriter-llama3.1-8b-Q4_0.gguf     GGUF  Q4_0          4.35 GB
LongWriter-llama3.1-8b-Q4_1.gguf     GGUF  Q4_1          4.78 GB
LongWriter-llama3.1-8b-Q4_K_M.gguf   GGUF  Q4_K_M        4.58 GB
LongWriter-llama3.1-8b-Q4_K_S.gguf   GGUF  Q4_K_S        4.37 GB
LongWriter-llama3.1-8b-Q5_0.gguf     GGUF  Q5_0          5.23 GB
LongWriter-llama3.1-8b-Q5_1.gguf     GGUF  Q5_1          5.65 GB
LongWriter-llama3.1-8b-Q5_K_M.gguf   GGUF  Q5_K_M        5.34 GB
LongWriter-llama3.1-8b-Q5_K_S.gguf   GGUF  Q5_K_S        5.21 GB
LongWriter-llama3.1-8b-Q6_K.gguf     GGUF  Q6_K          6.14 GB
LongWriter-llama3.1-8b-Q8_0.gguf     GGUF  Q8_0          7.95 GB
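To narrow the 27 files down to one, a simple heuristic is to take the largest file that still fits your memory budget. The sketch below applies that rule to the sizes listed above (treated as decimal GB, as shown). Assumptions, stated plainly: file size is only a lower bound on memory use, since the runtime also allocates a KV cache that grows with context length (and LongWriter's long outputs need a long context); the default 1.0 GB headroom is an arbitrary placeholder; and "largest" is only a rough proxy for quality across the IQ and K-quant families. The names pick_quant and gguf_filename are hypothetical.

```python
from typing import Optional

# File sizes in GB, copied from the file table above.
QUANT_SIZES_GB = {
    "IQ1_S": 1.88, "IQ1_M": 2.01, "IQ2_XXS": 2.23, "IQ2_XS": 2.43,
    "IQ2_S": 2.57, "IQ2_M": 2.75, "Q2_K_S": 2.78, "Q2_K": 2.96,
    "IQ3_XXS": 3.05, "IQ3_XS": 3.28, "Q3_K_S": 3.41, "IQ3_S": 3.43,
    "IQ3_M": 3.52, "Q3_K_M": 3.74, "Q3_K_L": 4.03, "IQ4_XS": 4.14,
    "Q4_0": 4.35, "IQ4_NL": 4.36, "Q4_K_S": 4.37, "Q4_K_M": 4.58,
    "Q4_1": 4.78, "Q5_K_S": 5.21, "Q5_0": 5.23, "Q5_K_M": 5.34,
    "Q5_1": 5.65, "Q6_K": 6.14, "Q8_0": 7.95,
}


def pick_quant(budget_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest quant whose file size plus headroom fits the budget.

    headroom_gb is an ASSUMED allowance for KV cache and runtime overhead;
    tune it for your context length and runtime.
    """
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s + headroom_gb <= budget_gb}
    if not fitting:
        return None  # nothing in this repository fits the budget
    return max(fitting, key=fitting.get)


def gguf_filename(quant: str) -> str:
    """Map a quant label back to its file name in this repository."""
    return f"LongWriter-llama3.1-8b-{quant}.gguf"


choice = pick_quant(budget_gb=6.0)
print(choice, gguf_filename(choice) if choice else None)
```

If the result sits right at the edge of your budget, stepping down one size (for example from a Q4 file to IQ4_XS) leaves more room for the long contexts this model is meant to fill.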

Model Details

Model Slug: duyntnet/longwriter-llama3.1-8b-imatrix-gguf
Author: duyntnet
Pipeline Task: text-generation
Library: transformers
Created: 2024-10-05
Last Modified: 2024-10-05
Gated: No
Private: No
HF SHA: d877d3921384e0eb28591169355ddeb41a78a5e3
License: other
Language: en
Base Model: not set in the Hugging Face metadata (the README describes this repository as quantizations of THUDM/LongWriter-llama3.1-8b)

Metadata Inspector

Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "license": "other",
    "language": [
      "en"
    ],
    "pipeline_tag": "text-generation",
    "inference": false,
    "tags": [
      "transformers",
      "gguf",
      "imatrix",
      "LongWriter-llama3.1-8b"
    ],
    "frontmatter": {
      "license": "other",
      "language": [
        "en"
      ],
      "pipeline_tag": "text-generation",
      "inference": "false",
      "tags": [
        "transformers",
        "gguf",
        "imatrix",
        "LongWriter-llama3.1-8b"
      ]
    },
    "hero_image_url": "",
    "summary": "LongWriter-llama3.1-8b is trained based on Meta-Llama-3.1-8B, and is capable of generating 10,000+ words at once. Environment: transformers>=4.43.0 Please adhere to the prompt template (system prompt is optional): <<SYS>>\\n{system prompt}\\n<</SYS>>\\n\\n[INST]{query1}[/INST]{response1}[INST]{query2}[/INST]{response2}... A simple demo for deployment of the model: ``python from transformers import AutoTokenizer, AutoModelForCausalLM import torch tokenizer = AutoTokenizer.from_pretrained(\"THUDM/LongWriter-llama3.1-8b\", trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained(\"THUDM/LongWriter-llama3.1-8b\", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map=\"auto\") model = model.eval() query = \"Write a 10000-word China travel guide\" prompt = f\"[INST]{query}[/INST]\" input = tokenizer(prompt, truncation=False, return_tensors=\"pt\").to(model.device) context_length = input.input_ids.shape[-1] output = model.generate( **input, max_new_tokens=32768, num_beams=1, do_sample=True, temperature=0.5, )[0] response = tokenizer.decode(output[context_length:], skip_special_tokens=True) print(response) ` You can also deploy the model with vllm, which allows 10,000+ words to be generated within a minute. Here is some example code: `python from vllm import LLM, SamplingParams model = LLM( model=\"THUDM/LongWriter-llama3.1-8b\", dtype=\"auto\", trust_remote_code=True, tensor_parallel_size=1, max_model_len=32768, gpu_memory_utilization=0.5, ) tokenizer = model.get_tokenizer() generation_params = SamplingParams( temperature=0.5, top_p=0.8, top_k=50, max_tokens=32768, repetition_penalty=1, ) query = \"Write a 10000-word China travel guide\" prompt = f\"[INST]{query}[/INST]\" input_ids = tokenizer(prompt, truncation=False, return_tensors=\"pt\").input_ids[0].tolist() outputs = model.generate( sampling_params=generation_params, prompt_token_ids=[input_ids], ) output = outputs[0] print(output.outputs[0].text) ``",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlicense: other\nlanguage:\n- en\npipeline_tag: text-generation\ninference: false\ntags:\n- transformers\n- gguf\n- imatrix\n- LongWriter-llama3.1-8b\n---\nQuantizations of https://huggingface.co/THUDM/LongWriter-llama3.1-8b\n\n\n### Inference Clients/UIs\n* [llama.cpp](https://github.com/ggerganov/llama.cpp)\n* [KoboldCPP](https://github.com/LostRuins/koboldcpp)\n* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)\n* [ollama](https://github.com/ollama/ollama)\n\n\n---\n\n# From original readme\n\nLongWriter-llama3.1-8b is trained based on [Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B), and is capable of generating 10,000+ words at once.\n\nEnvironment: `transformers>=4.43.0`\n\nPlease adhere to the prompt template (system prompt is optional): `<<SYS>>\\n{system prompt}\\n<</SYS>>\\n\\n[INST]{query1}[/INST]{response1}[INST]{query2}[/INST]{response2}...`\n\nA simple demo for deployment of the model:\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\n# Load the original (non-GGUF) model in bfloat16 with automatic device placement\ntokenizer = AutoTokenizer.from_pretrained(\"THUDM/LongWriter-llama3.1-8b\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"THUDM/LongWriter-llama3.1-8b\", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map=\"auto\")\nmodel = model.eval()\nquery = \"Write a 10000-word China travel guide\"\nprompt = f\"[INST]{query}[/INST]\"\n# Put the encoded prompt on the same device as the model\ninput = tokenizer(prompt, truncation=False, return_tensors=\"pt\").to(model.device)\ncontext_length = input.input_ids.shape[-1]\noutput = model.generate(\n    **input,\n    max_new_tokens=32768,\n    num_beams=1,\n    do_sample=True,\n    temperature=0.5,\n)[0]\n# Decode only the newly generated tokens, skipping the prompt\nresponse = tokenizer.decode(output[context_length:], skip_special_tokens=True)\nprint(response)\n```\nYou can also deploy the model with [vllm](https://github.com/vllm-project/vllm), which allows 10,000+ words to be generated within a minute. Here is some example code:\n```python\nfrom vllm import LLM, SamplingParams\nmodel = LLM(\n    model=\"THUDM/LongWriter-llama3.1-8b\",\n    dtype=\"auto\",\n    trust_remote_code=True,\n    tensor_parallel_size=1,\n    max_model_len=32768,\n    gpu_memory_utilization=0.5,\n)\ntokenizer = model.get_tokenizer()\ngeneration_params = SamplingParams(\n    temperature=0.5,\n    top_p=0.8,\n    top_k=50,\n    max_tokens=32768,\n    repetition_penalty=1,\n)\nquery = \"Write a 10000-word China travel guide\"\nprompt = f\"[INST]{query}[/INST]\"\n# Tokenize the prompt and pass the token IDs directly to vLLM\ninput_ids = tokenizer(prompt, truncation=False, return_tensors=\"pt\").input_ids[0].tolist()\noutputs = model.generate(\n    sampling_params=generation_params,\n    prompt_token_ids=[input_ids],\n)\noutput = outputs[0]\nprint(output.outputs[0].text)\n```",
    "related_quantizations": []
  },
  "tags": [
    "transformers",
    "gguf",
    "imatrix",
    "LongWriter-llama3.1-8b",
    "text-generation",
    "en",
    "license:other",
    "region:us"
  ],
  "likes": 1,
  "downloads": 139,
  "gated": false,
  "private": false,
  "last_modified": "2024-10-05T18:48:19.000Z",
  "created_at": "2024-10-05T15:56:44.000Z",
  "pipeline_tag": "text-generation",
  "library_name": "transformers"
}
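The README's vLLM example sets both max_model_len=32768 and max_tokens=32768. Because the prompt and the generated text share one context window, the usable output length is at most the window minus the prompt length; the same applies to llama.cpp's context size (-c/--ctx-size) when running the GGUF files. The sketch below does that arithmetic and checks whether a 10,000-word target plausibly fits. Assumptions: the 1.3 tokens-per-word ratio is a generic rule of thumb for English, not a measured property of the Llama 3.1 tokenizer, and the 20-token prompt length is a hypothetical placeholder. The function names are illustrative.

```python
import math


def estimate_tokens(words: int, tokens_per_word: float = 1.3) -> int:
    """Rough token estimate for English text.

    ASSUMPTION: 1.3 tokens per word is a rule of thumb; count with the real
    tokenizer when the margin matters.
    """
    return math.ceil(words * tokens_per_word)


def max_output_tokens(context_len: int, prompt_tokens: int, requested: int) -> int:
    """Clamp a requested generation length so prompt + output fit in the context."""
    if prompt_tokens >= context_len:
        raise ValueError("the prompt alone fills the context window")
    return min(requested, context_len - prompt_tokens)


# Context and max_tokens values from the README's vLLM example;
# the 20-token prompt length is a hypothetical placeholder.
budget = max_output_tokens(context_len=32768, prompt_tokens=20, requested=32768)
needed = estimate_tokens(10_000)
print(budget, needed, needed <= budget)
```

Under these assumptions a 10,000-word response needs well under half of a 32,768-token window, which leaves room for longer prompts or multi-turn history.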
Source payload excerpt (from Hugging Face API)
{
  "_id": "670161bc2fa99176350335a0",
  "id": "duyntnet/LongWriter-llama3.1-8b-imatrix-GGUF",
  "modelId": "duyntnet/LongWriter-llama3.1-8b-imatrix-GGUF",
  "sha": "d877d3921384e0eb28591169355ddeb41a78a5e3",
  "createdAt": "2024-10-05T15:56:44.000Z",
  "lastModified": "2024-10-05T18:48:19.000Z",
  "author": "duyntnet",
  "downloads": 139,
  "likes": 1,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "transformers",
  "siblings_count": 29
}
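The payload's sha field identifies the exact repository revision this page indexes, so a single quant file can be fetched reproducibly by pinning the download URL to that commit instead of the moving main branch. A minimal sketch, assuming Hugging Face's https://huggingface.co/{repo_id}/resolve/{revision}/{filename} URL pattern; the helper name resolve_url is hypothetical, and the repo id and SHA are copied from the payload above.

```python
from urllib.parse import quote

REPO_ID = "duyntnet/LongWriter-llama3.1-8b-imatrix-GGUF"   # "id" from the payload
REVISION = "d877d3921384e0eb28591169355ddeb41a78a5e3"      # "sha" from the payload


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for one file in a Hugging Face model repo."""
    # quote() percent-encodes characters that are unsafe in a URL path.
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(filename)}"


url = resolve_url(REPO_ID, "LongWriter-llama3.1-8b-IQ3_XS.gguf", REVISION)
print(url)
```

If the huggingface_hub package is installed, hf_hub_download(repo_id=..., filename=..., revision=...) performs the same pinned download and also caches the file locally.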