duyntnet/longwriter-llama3.1-8b-imatrix-gguf (IQ3_XS GGUF) is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. Use this page to pick a local model for guIDE or other runtimes. Related models in the same shard are listed below.

Model Intelligence Sheet

duyntnet/longwriter-llama3.1-8b-imatrix-gguf overview

This repository contains imatrix GGUF quantizations of THUDM/LongWriter-llama3.1-8b. LongWriter-llama3.1-8b is trained from Meta-Llama-3.1-8B and can generate 10,000+ words in a single response. Environment: transformers>=4.43.0. Please adhere to the prompt template (the system prompt is optional): <<SYS>>\n{system prompt}\n<</SYS>>\n\n[INST]{query1}[/INST]{response1}[INST]{query2}[/INST]{response2}... The upstream README provides a simple transformers demo for deploying the model. The model can also be deployed with vllm, which allows generating 10,000+ words within a minute.
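The prompt template and the transformers demo from the repository README can be sketched as follows. This is a minimal sketch, not the author's exact code: `build_prompt` and `generate_long_text` are hypothetical helper names, the `\n` sequences in the template are read as newline characters, and the README's undefined `device` variable is replaced with `model.device` as an assumption. Note that it loads the original THUDM checkpoint with transformers, not one of the GGUF files listed on this page.

```python
from typing import Iterable, Optional, Tuple


def build_prompt(
    query: str,
    history: Iterable[Tuple[str, str]] = (),
    system_prompt: Optional[str] = None,
) -> str:
    """Assemble a LongWriter prompt following the README template."""
    parts = []
    if system_prompt:
        # Optional system block, followed by a blank line.
        parts.append(f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n")
    for past_query, past_response in history:
        # Earlier turns are concatenated with no separator between them.
        parts.append(f"[INST]{past_query}[/INST]{past_response}")
    parts.append(f"[INST]{query}[/INST]")
    return "".join(parts)


def generate_long_text(query: str, max_new_tokens: int = 32768) -> str:
    """Run the README's transformers demo (needs transformers>=4.43.0 and a GPU)."""
    # Imported lazily so build_prompt works without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "THUDM/LongWriter-llama3.1-8b"
    tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        repo, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto"
    ).eval()

    inputs = tokenizer(build_prompt(query), truncation=False, return_tensors="pt")
    # Assumption: the README's undefined `device` is replaced by model.device.
    inputs = inputs.to(model.device)
    context_length = inputs.input_ids.shape[-1]
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        num_beams=1,
        do_sample=True,
        temperature=0.5,
    )[0]
    # Drop the prompt tokens and decode only the newly generated text.
    return tokenizer.decode(output[context_length:], skip_special_tokens=True)
```

For a single-turn request, `build_prompt("Write a 10000-word China travel guide")` produces the same `[INST]...[/INST]` string that the README demo builds by hand.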

transformers, gguf, imatrix, LongWriter-llama3.1-8b, text-generation, en, license:other, region:us

Downloads

139

Likes

1

Pipeline

text-generation

Library

transformers

Visibility

Public

Access

Open

Repository Files & Downloads

27 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
LongWriter-llama3.1-8b-IQ1_M.gguf	GGUF	IQ1_M	2.01 GB	Download
LongWriter-llama3.1-8b-IQ1_S.gguf	GGUF	IQ1_S	1.88 GB	Download
LongWriter-llama3.1-8b-IQ2_M.gguf	GGUF	IQ2_M	2.75 GB	Download
LongWriter-llama3.1-8b-IQ2_S.gguf	GGUF	IQ2_S	2.57 GB	Download
LongWriter-llama3.1-8b-IQ2_XS.gguf	GGUF	IQ2_XS	2.43 GB	Download
LongWriter-llama3.1-8b-IQ2_XXS.gguf	GGUF	IQ2_XXS	2.23 GB	Download
LongWriter-llama3.1-8b-IQ3_M.gguf	GGUF	IQ3_M	3.52 GB	Download
LongWriter-llama3.1-8b-IQ3_S.gguf	GGUF	IQ3_S	3.43 GB	Download
LongWriter-llama3.1-8b-IQ3_XS.gguf	GGUF	IQ3_XS	3.28 GB	Download
LongWriter-llama3.1-8b-IQ3_XXS.gguf	GGUF	IQ3_XXS	3.05 GB	Download
LongWriter-llama3.1-8b-IQ4_NL.gguf	GGUF	IQ4_NL	4.36 GB	Download
LongWriter-llama3.1-8b-IQ4_XS.gguf	GGUF	IQ4_XS	4.14 GB	Download
LongWriter-llama3.1-8b-Q2_K.gguf	GGUF	Q2_K	2.96 GB	Download
LongWriter-llama3.1-8b-Q2_K_S.gguf	GGUF	Q2_K_S	2.78 GB	Download
LongWriter-llama3.1-8b-Q3_K_L.gguf	GGUF	Q3_K_L	4.03 GB	Download
LongWriter-llama3.1-8b-Q3_K_M.gguf	GGUF	Q3_K_M	3.74 GB	Download
LongWriter-llama3.1-8b-Q3_K_S.gguf	GGUF	Q3_K_S	3.41 GB	Download
LongWriter-llama3.1-8b-Q4_0.gguf	GGUF	Q4_0	4.35 GB	Download
LongWriter-llama3.1-8b-Q4_1.gguf	GGUF	Q4_1	4.78 GB	Download
LongWriter-llama3.1-8b-Q4_K_M.gguf	GGUF	Q4_K_M	4.58 GB	Download
LongWriter-llama3.1-8b-Q4_K_S.gguf	GGUF	Q4_K_S	4.37 GB	Download
LongWriter-llama3.1-8b-Q5_0.gguf	GGUF	Q5_0	5.23 GB	Download
LongWriter-llama3.1-8b-Q5_1.gguf	GGUF	Q5_1	5.65 GB	Download
LongWriter-llama3.1-8b-Q5_K_M.gguf	GGUF	Q5_K_M	5.34 GB	Download
LongWriter-llama3.1-8b-Q5_K_S.gguf	GGUF	Q5_K_S	5.21 GB	Download
LongWriter-llama3.1-8b-Q6_K.gguf	GGUF	Q6_K	6.14 GB	Download
LongWriter-llama3.1-8b-Q8_0.gguf	GGUF	Q8_0	7.95 GB	Download
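One way to use the table above is to pick the largest file that fits a memory budget. The sketch below is illustrative only: `pick_quant` and the `headroom_gb` parameter are hypothetical, the sizes are copied from the table, and file size is used as a rough proxy for output quality, which is an assumption rather than a guarantee (similar-sized quantization types can behave differently).

```python
from typing import Optional

# File sizes in GB, copied from the repository file table.
QUANT_SIZES_GB = {
    "IQ1_S": 1.88, "IQ1_M": 2.01, "IQ2_XXS": 2.23, "IQ2_XS": 2.43,
    "IQ2_S": 2.57, "IQ2_M": 2.75, "Q2_K_S": 2.78, "Q2_K": 2.96,
    "IQ3_XXS": 3.05, "IQ3_XS": 3.28, "Q3_K_S": 3.41, "IQ3_S": 3.43,
    "IQ3_M": 3.52, "Q3_K_M": 3.74, "Q3_K_L": 4.03, "IQ4_XS": 4.14,
    "Q4_0": 4.35, "IQ4_NL": 4.36, "Q4_K_S": 4.37, "Q4_K_M": 4.58,
    "Q4_1": 4.78, "Q5_K_S": 5.21, "Q5_0": 5.23, "Q5_K_M": 5.34,
    "Q5_1": 5.65, "Q6_K": 6.14, "Q8_0": 7.95,
}


def pick_quant(budget_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest quantization whose file fits in the budget.

    headroom_gb is a hypothetical allowance for the KV cache and runtime
    overhead; tune it for your context length and runtime.
    """
    fitting = {
        name: size
        for name, size in QUANT_SIZES_GB.items()
        if size + headroom_gb <= budget_gb
    }
    if not fitting:
        return None
    # Largest file that still fits, used here as a rough quality proxy.
    return max(fitting, key=fitting.get)
```

With the default 1 GB of headroom, a budget of 8 GB selects `Q6_K`, and a budget of 4.5 GB selects `IQ3_S`. Long generations need more headroom, because the KV cache grows with context length.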

Model Details

Model Slug

duyntnet/longwriter-llama3.1-8b-imatrix-gguf

Author

duyntnet

Pipeline Task

text-generation

Library

transformers

Created

2024-10-05

Last Modified

2024-10-05

Gated

No

Private

No

HF SHA

d877d3921384e0eb28591169355ddeb41a78a5e3

License

other

Language

en

Base Model

THUDM/LongWriter-llama3.1-8b (per the README; not declared in the card metadata)

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "license": "other",
    "language": [
      "en"
    ],
    "pipeline_tag": "text-generation",
    "inference": false,
    "tags": [
      "transformers",
      "gguf",
      "imatrix",
      "LongWriter-llama3.1-8b"
    ],
    "frontmatter": {
      "license": "other",
      "language": [
        "en"
      ],
      "pipeline_tag": "text-generation",
      "inference": "false",
      "tags": [
        "transformers",
        "gguf",
        "imatrix",
        "LongWriter-llama3.1-8b"
      ]
    },
    "hero_image_url": "",
    "summary": "LongWriter-llama3.1-8b is trained based on Meta-Llama-3.1-8B, and is capable of generating 10,000+ words at once. Environment: transformers>=4.43.0 Please ahere to the prompt template (system prompt is optional): >\\n{system prompt}\\n>\\n\\n[INST]{query1}[/INST]{response1}[INST]{query2}[/INST]{response2}... A simple demo for deployment of the model: ``python from transformers import AutoTokenizer, AutoModelForCausalLM import torch tokenizer = AutoTokenizer.from_pretrained(\"THUDM/LongWriter-llama3.1-8b\", trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained(\"THUDM/LongWriter-llama3.1-8b\", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map=\"auto\") model = model.eval() query = \"Write a 10000-word China travel guide\" prompt = f\"[INST]{query}[/INST]\" input = tokenizer(prompt, truncation=False, return_tensors=\"pt\").to(device) context_length = input.input_ids.shape[-1] output = model.generate( **input, max_new_tokens=32768, num_beams=1, do_sample=True, temperature=0.5, )[0] response = tokenizer.decode(output[context_length:], skip_special_tokens=True) print(response) ` You can also deploy the model with vllm, which allows 10,000+ words generation within a minute. Here is an example code: `python model = LLM( model= \"THUDM/LongWriter-llama3.1-8b\", dtype=\"auto\", trust_remote_code=True, tensor_parallel_size=1, max_model_len=32768, gpu_memory_utilization=0.5, ) tokenizer = model.get_tokenizer() generation_params = SamplingParams( temperature=0.5, top_p=0.8, top_k=50, max_tokens=32768, repetition_penalty=1, ) query = \"Write a 10000-word China travel guide\" prompt = f\"[INST]{query}[/INST]\" input_ids = tokenizer(prompt, truncation=False, return_tensors=\"pt\").input_ids[0].tolist() outputs = model.generate( sampling_params=generation_params, prompt_token_ids=[input_ids], ) output = outputs[0] print(output.outputs[0].text) ``",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlicense: other\nlanguage:\n- en\npipeline_tag: text-generation\ninference: false\ntags:\n- transformers\n- gguf\n- imatrix\n- LongWriter-llama3.1-8b\n---\nQuantizations of https://huggingface.co/THUDM/LongWriter-llama3.1-8b\n\n\n### Inference Clients/UIs\n* [llama.cpp](https://github.com/ggerganov/llama.cpp)\n* [KoboldCPP](https://github.com/LostRuins/koboldcpp)\n* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)\n* [ollama](https://github.com/ollama/ollama)\n\n\n---\n\n# From original readme\n\nLongWriter-llama3.1-8b is trained based on [Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B), and is capable of generating 10,000+ words at once.\n\nEnvironment: `transformers>=4.43.0`\n\nPlease ahere to the prompt template (system prompt is optional): `<<SYS>>\\n{system prompt}\\n<</SYS>>\\n\\n[INST]{query1}[/INST]{response1}[INST]{query2}[/INST]{response2}...`\n\nA simple demo for deployment of the model:\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\ntokenizer = AutoTokenizer.from_pretrained(\"THUDM/LongWriter-llama3.1-8b\", trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\"THUDM/LongWriter-llama3.1-8b\", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map=\"auto\")\nmodel = model.eval()\nquery = \"Write a 10000-word China travel guide\"\nprompt = f\"[INST]{query}[/INST]\"\ninput = tokenizer(prompt, truncation=False, return_tensors=\"pt\").to(device)\ncontext_length = input.input_ids.shape[-1]\noutput = model.generate(\n    **input,\n    max_new_tokens=32768,\n    num_beams=1,\n    do_sample=True,\n    temperature=0.5,\n)[0]\nresponse = tokenizer.decode(output[context_length:], skip_special_tokens=True)\nprint(response)\n```\nYou can also deploy the model with [vllm](https://github.com/vllm-project/vllm), which allows 10,000+ words generation within a minute. 
Here is an example code:\n```python\nmodel = LLM(\n    model= \"THUDM/LongWriter-llama3.1-8b\",\n    dtype=\"auto\",\n    trust_remote_code=True,\n    tensor_parallel_size=1,\n    max_model_len=32768,\n    gpu_memory_utilization=0.5,\n)\ntokenizer = model.get_tokenizer()\ngeneration_params = SamplingParams(\n    temperature=0.5,\n    top_p=0.8,\n    top_k=50,\n    max_tokens=32768,\n    repetition_penalty=1,\n)\nquery = \"Write a 10000-word China travel guide\"\nprompt = f\"[INST]{query}[/INST]\"\ninput_ids = tokenizer(prompt, truncation=False, return_tensors=\"pt\").input_ids[0].tolist()\noutputs = model.generate(\n    sampling_params=generation_params,\n    prompt_token_ids=[input_ids],\n)\noutput = outputs[0]\nprint(output.outputs[0].text)\n```",
    "related_quantizations": []
  },
  "tags": [
    "transformers",
    "gguf",
    "imatrix",
    "LongWriter-llama3.1-8b",
    "text-generation",
    "en",
    "license:other",
    "region:us"
  ],
  "likes": 1,
  "downloads": 139,
  "gated": false,
  "private": false,
  "last_modified": "2024-10-05T18:48:19.000Z",
  "created_at": "2024-10-05T15:56:44.000Z",
  "pipeline_tag": "text-generation",
  "library_name": "transformers"
}
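In the metadata above, `inference` appears as the boolean `false` under `card_data` but as the string `"false"` under `frontmatter`. The sketch below shows one way such a normalization step could work. It is an assumption about the process, not GraySoft's actual code, and `coerce_bool` and `normalize_card_data` are hypothetical names.

```python
from typing import Any, Dict


def coerce_bool(value: Any) -> bool:
    """Convert a YAML-style boolean string to a Python bool."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in ("true", "yes", "1"):
            return True
        if lowered in ("false", "no", "0"):
            return False
    raise ValueError(f"cannot interpret {value!r} as a boolean")


def normalize_card_data(frontmatter: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the frontmatter with `inference` as a real boolean."""
    normalized = dict(frontmatter)
    if "inference" in normalized:
        normalized["inference"] = coerce_bool(normalized["inference"])
    return normalized


# Frontmatter values copied from the metadata above.
frontmatter = {
    "license": "other",
    "language": ["en"],
    "pipeline_tag": "text-generation",
    "inference": "false",
    "tags": ["transformers", "gguf", "imatrix", "LongWriter-llama3.1-8b"],
}
card_data = normalize_card_data(frontmatter)
```

After this step `card_data["inference"]` is the boolean `False`, while the other fields pass through unchanged.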

Source payload excerpt (from Hugging Face API)

{
  "_id": "670161bc2fa99176350335a0",
  "id": "duyntnet/LongWriter-llama3.1-8b-imatrix-GGUF",
  "modelId": "duyntnet/LongWriter-llama3.1-8b-imatrix-GGUF",
  "sha": "d877d3921384e0eb28591169355ddeb41a78a5e3",
  "createdAt": "2024-10-05T15:56:44.000Z",
  "lastModified": "2024-10-05T18:48:19.000Z",
  "author": "duyntnet",
  "downloads": 139,
  "likes": 1,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "transformers",
  "siblings_count": 29
}
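The `id` and `sha` fields in the payload are enough to address a specific file at a pinned revision. The following is a minimal sketch under stated assumptions: `quant_filename`, `resolve_url`, and `download_quant` are hypothetical helpers, the URL follows the Hugging Face `resolve/<revision>/<filename>` pattern, and the download path relies on `hf_hub_download` from the `huggingface_hub` package, which must be installed separately.

```python
REPO_ID = "duyntnet/LongWriter-llama3.1-8b-imatrix-GGUF"  # "id" from the payload
REVISION = "d877d3921384e0eb28591169355ddeb41a78a5e3"     # "sha" from the payload


def quant_filename(quant: str) -> str:
    """Build a file name using the naming scheme seen in the file table."""
    return f"LongWriter-llama3.1-8b-{quant}.gguf"


def resolve_url(filename: str, repo_id: str = REPO_ID, revision: str = REVISION) -> str:
    """Direct download URL for one file, pinned to a specific commit."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download_quant(quant: str) -> str:
    """Download one quantization into the local Hugging Face cache.

    Returns the local file path. Requires network access.
    """
    # Imported lazily so the URL helpers work without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=quant_filename(quant),
        revision=REVISION,
    )
```

For example, `download_quant("IQ3_XS")` would fetch the 3.28 GB file listed in the table, and the returned path can be passed to a GGUF runtime such as llama.cpp. Pinning the revision keeps the download reproducible if the repository changes later.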

More models in this shard