dahara1/qwen2.5-3b-instruct-gguf-japanese-imatrix-128k is indexed on GraySoft with repository links, GGUF quantization files, and Hugging Face metadata. Use this page to pick a quantization for guIDE or another local runtime.

Model Intelligence Sheet

dahara1/qwen2.5-3b-instruct-gguf-japanese-imatrix-128k overview

Comprehensive model page for dahara1/qwen2.5-3b-instruct-gguf-japanese-imatrix-128k

Tags: gguf, ja, endpoints_compatible, region:us, imatrix, conversational

Downloads

417

Likes

8

Pipeline

—

Library

—

Visibility

Public

Access

Open

Repository Files & Downloads

22 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
Qwen2.5-3B-Instruct-IQ3_M.gguf	GGUF	IQ3_M	1.39 GB	Download
Qwen2.5-3B-Instruct-IQ3_XS.gguf	GGUF	IQ3_XS	1.30 GB	Download
Qwen2.5-3B-Instruct-IQ3_XXS.gguf	GGUF	IQ3_XXS	1.19 GB	Download
Qwen2.5-3B-Instruct-IQ4_XS.gguf	GGUF	IQ4_XS	1.62 GB	Download
Qwen2.5-3B-Instruct-Q3_K-f16.gguf	GGUF	Q3_K	1.82 GB	Download
Qwen2.5-3B-Instruct-Q3_K_L.gguf	GGUF	Q3_K_L	1.55 GB	Download
Qwen2.5-3B-Instruct-Q3_K_M.gguf	GGUF	Q3_K_M	1.48 GB	Download
Qwen2.5-3B-Instruct-Q3_K_S.gguf	GGUF	Q3_K_S	1.35 GB	Download
Qwen2.5-3B-Instruct-Q4_K-f16.gguf	GGUF	Q4_K	2.14 GB	Download
Qwen2.5-3B-Instruct-Q4_K_L.gguf	GGUF	Q4_K_L	1.87 GB	Download
Qwen2.5-3B-Instruct-Q4_K_M.gguf	GGUF	Q4_K_M	1.80 GB	Download
Qwen2.5-3B-Instruct-Q4_K_S.gguf	GGUF	Q4_K_S	1.71 GB	Download
Qwen2.5-3B-Instruct-Q5_K-f16.gguf	GGUF	Q5_K	2.41 GB	Download
Qwen2.5-3B-Instruct-Q5_K_L.gguf	GGUF	Q5_K_L	2.14 GB	Download
Qwen2.5-3B-Instruct-Q5_K_M.gguf	GGUF	Q5_K_M	2.07 GB	Download
Qwen2.5-3B-Instruct-Q5_K_S.gguf	GGUF	Q5_K_S	2.02 GB	Download
Qwen2.5-3B-Instruct-Q6_K-f16.gguf	GGUF	Q6_K	2.71 GB	Download
Qwen2.5-3B-Instruct-Q6_K.gguf	GGUF	Q6_K	2.36 GB	Download
Qwen2.5-3B-Instruct-Q6_K_L.gguf	GGUF	Q6_K_L	2.43 GB	Download
Qwen2.5-3B-Instruct-Q8_0-f16.gguf	GGUF	Q8_0	3.33 GB	Download
Qwen2.5-3B-Instruct-Q8_0.gguf	GGUF	Q8_0	3.33 GB	Download
Qwen2.5-3B-Instruct-Q8_0_L.gguf	GGUF	Q8_0_L	3.06 GB	Download
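
As a rough aid for choosing among the files above, the sketch below picks the largest quantization whose file fits a given size budget. The sizes are copied from the table (a subset only). `QUANT_SIZES_GB` and `pick_quant` are illustrative names, not part of any tool, and file size is only a proxy for memory use, since a long context window adds its own overhead at run time.

```python
from typing import Optional

# File sizes in GB, copied from the table above (subset for brevity).
QUANT_SIZES_GB = {
    "IQ3_XXS": 1.19,
    "IQ4_XS": 1.62,
    "Q4_K_M": 1.80,
    "Q5_K_M": 2.07,
    "Q6_K": 2.36,
    "Q8_0": 3.33,
}


def pick_quant(budget_gb: float) -> Optional[str]:
    """Return the largest listed quantization whose file fits in budget_gb."""
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget_gb}
    if not fitting:
        return None  # nothing in the subset is small enough
    return max(fitting, key=fitting.get)


print(pick_quant(2.1))
```

With a 2.1 GB budget this selects Q5_K_M; a budget below 1.19 GB returns `None` for this subset.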

Model Details

Model Slug

dahara1/qwen2.5-3b-instruct-gguf-japanese-imatrix-128k

Author

dahara1

Pipeline Task

—

Library

—

Created

2024-11-15

Last Modified

2024-11-17

Gated

No

Private

No

HF SHA

8786c5d17874fbc318699c1a119f23326769ec92

License

Unknown

Language

ja

Base Model

Qwen/Qwen2.5-3B-Instruct

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "language": [
      "ja"
    ],
    "frontmatter": {
      "language": [
        "ja"
      ]
    },
    "hero_image_url": "128k_full_instruct_first.png",
    "summary": "",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlanguage:\n- ja\n---\n\n## About this model\nThis is a gguf version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct), quantized using an [importance matrix (iMatrix) that contains a lot of Japanese](https://huggingface.co/dahara1/imatrix-jpn-test) and set up to summarize very long texts (over 32K tokens). The hope is that it retains more of its Japanese capability.\n\nIt has been confirmed that at least Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K/Qwen2.5-3B-Instruct-Q8_0-f16.gguf can correctly summarize extremely long texts exceeding 32K tokens.\n\nThe 128K context extension follows the points raised by [unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF](https://huggingface.co/unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF). Thank you.\n\n\n## For ollama users\nIf you use ollama, check the [FAQ](https://github.com/ollama/ollama/blob/main/docs/faq.md) and set the context window size parameter as shown below.\n\n```\n/set parameter num_ctx 40960\n```\nor via the API\n```\ncurl http://..../api/generate -d '{\n  \"model\": \".....\",\n  \"prompt\": \"......\",\n  \"options\": {\n    \"num_ctx\": 40960\n  }\n}'\n```\n\nIf you are using another tool, consult its manual and extend the context window size there as well.  \nNote that increasing the context window size more than necessary will slow down the model's execution speed.  \nIn theory, this model can be set to the maximum value of 128K (131072), but this may affect execution speed and quality.\n\n\n## Sample llama.cpp script\n\nBelow is a sample that retrieves a Wikipedia article of about 50,000 Japanese characters (34.8K tokens) and summarizes its contents.\n\nllama.cpp server command sample.\n```\n./llama.cpp/build/bin/Release/llama-server.exe -m ./Qwen2.5-3B-Instruct-Q8_0-f16.gguf -c 40960\n```\n\n\nllama.cpp client script sample.\n```\nimport json\nimport requests\nfrom bs4 import BeautifulSoup\nfrom transformers import AutoTokenizer\n\ntokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-3B-Instruct\")\n\nurl = \"https://ja.wikipedia.org/wiki/%E7%94%B7%E3%81%AE%E5%A8%98\"\n\ndef get_wikipedia_text(url):\n    # Fetch the page and join the text of all <p> elements.\n    response = requests.get(url)\n    if response.status_code == 200:\n        soup = BeautifulSoup(response.text, 'html.parser')\n        paragraphs = soup.find_all('p')\n        text = \"\\n\".join([p.get_text() for p in paragraphs])\n        return text\n    else:\n        raise Exception(f\"Failed to fetch the article. Status code: {response.status_code}\")\n\nif __name__ == \"__main__\":\n\n    html_text = get_wikipedia_text(url)\n    #html_text = html_text[:40000]\n\n    # Prompt text: \"Instruction: summarize in Japanese in 3 lines\"\n    instruct = \"### 指示\\n\\n日本語で３行で要約してください\"\n\n    # instruct first version\n    messages  = [\n        {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n        {\"role\": \"user\", \"content\": instruct + \"\\n\\n\" + html_text},\n    ]\n\n    # instruct last version (overrides the list above; comment out the one you do not need)\n    messages  = [\n        {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n        {\"role\": \"user\", \"content\": html_text  + \"\\n\\n\" + instruct},\n    ]\n\n    prompt = tokenizer.apply_chat_template(\n            messages,\n            add_generation_prompt=True,\n            tokenize=False\n    )\n    print(prompt)\n\n    payload = {\n            \"prompt\": prompt,\n            \"n_predict\": 512\n    }\n\n    url = \"http://localhost:8080/completion\"\n    headers = {\n        \"Content-Type\": \"application/json\"\n    }\n\n    response = requests.post(url, headers=headers, data=json.dumps(payload))\n    if response.status_code != 200:\n        # Stop here: an error response has no usable 'content' field.\n        raise SystemExit(f\"Error: {response.text}\")\n\n    response_data = response.json()\n\n    response_content = response_data.get('content', '').strip()\n    print(response_content)\n```\n\n### Output sample\n\n#### This 128K model\n128K instruct first version  \n![128K instruct first version](128k_full_instruct_first.png)\n\n128K instruct last version  \n![128K instruct last version](128k_full_instruct_last.png)\n\n#### Standard 32K model\n32K instruct first version  \n![32K instruct first version](32k_full_instruct_first.png)\n\n32K instruct last version  \n![32K instruct last version](32k_full_instruct_last.png)\n\n\nNotice that in the 32K instruct first version the summary instruction falls outside the context window and is ignored.\n\nIn the 32K instruct last version the beginning of the article also falls outside the context window, which weakens the term-explanation perspective of the summary.\n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "ja",
    "endpoints_compatible",
    "region:us",
    "imatrix",
    "conversational"
  ],
  "likes": 8,
  "downloads": 417,
  "gated": false,
  "private": false,
  "last_modified": "2024-11-17T11:43:29.000Z",
  "created_at": "2024-11-15T02:50:24.000Z",
  "pipeline_tag": "",
  "library_name": ""
}
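
The README stored in the metadata above sets `num_ctx` to 40960 for an article of roughly 34.8K tokens with `n_predict` at 512, and gives 131072 as the theoretical maximum. The sketch below shows one way to size the context window and build the request body for ollama's `/api/generate` endpoint. The rounding step of 8192 is an assumption (the README does not say how 40960 was chosen), `required_num_ctx` and `build_ollama_payload` are illustrative names, and the model name is a hypothetical placeholder because the README elides it.

```python
import json

MAX_CTX = 131072  # 128K, the theoretical maximum given in the README


def required_num_ctx(prompt_tokens: int, max_new_tokens: int, step: int = 8192) -> int:
    """Round prompt + generation tokens up to a multiple of `step` (assumed), capped at MAX_CTX."""
    needed = prompt_tokens + max_new_tokens
    if needed > MAX_CTX:
        raise ValueError(f"{needed} tokens exceed the {MAX_CTX}-token maximum")
    rounded = -(-needed // step) * step  # ceiling division, then scale back up
    return min(rounded, MAX_CTX)


def build_ollama_payload(model: str, prompt: str, num_ctx: int) -> dict:
    """Build a request body in the shape shown in the README's curl example."""
    return {"model": model, "prompt": prompt, "options": {"num_ctx": num_ctx}}


num_ctx = required_num_ctx(prompt_tokens=34_800, max_new_tokens=512)
# "my-qwen2.5-3b-128k" is a hypothetical local model name.
payload = build_ollama_payload("my-qwen2.5-3b-128k", "......", num_ctx)
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

Keeping the value close to what the prompt needs follows the README's caution that an oversized context window slows execution.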

Source payload excerpt (from Hugging Face API)

{
  "_id": "6736b6f07506985f27b5626c",
  "id": "dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K",
  "modelId": "dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K",
  "sha": "8786c5d17874fbc318699c1a119f23326769ec92",
  "createdAt": "2024-11-15T02:50:24.000Z",
  "lastModified": "2024-11-17T11:43:29.000Z",
  "author": "dahara1",
  "downloads": 417,
  "likes": 8,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 31
}
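
The `id` and `sha` fields in the payload above are enough to build a direct file URL pinned to this exact revision. The sketch below assumes the usual Hugging Face `resolve` URL layout; `gguf_download_url` is an illustrative helper name, not an API of any library.

```python
from urllib.parse import quote


def gguf_download_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a Hugging Face 'resolve' URL for one file at a given revision."""
    return (
        "https://huggingface.co/"
        f"{quote(repo_id)}/resolve/{quote(revision, safe='')}/{quote(filename)}"
    )


# Values copied from the payload excerpt and the file table on this page.
REPO_ID = "dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K"
SHA = "8786c5d17874fbc318699c1a119f23326769ec92"

print(gguf_download_url(REPO_ID, "Qwen2.5-3B-Instruct-Q5_K_M.gguf", revision=SHA))
```

Pinning to the commit SHA rather than `main` keeps the download stable if the repository is updated later.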
