GraySoft
Projects Models About FAQ Contact Download guIDE →

dahara1/qwen2.5-3b-instruct-gguf-japanese-imatrix-128k Q5_K_M GGUF - Free GGUF Download is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes. See related models in the same shard below.

Model Intelligence Sheet

dahara1/qwen2.5-3b-instruct-gguf-japanese-imatrix-128k overview

Comprehensive model page for dahara1/qwen2.5-3b-instruct-gguf-japanese-imatrix-128k

ggufjaendpoints_compatibleregion:usimatrixconversational
dahara1/qwen2.5-3b-instruct-gguf-japanese-imatrix-128k visual
Downloads
417
Likes
8
Pipeline
Library
Visibility
Public
Access
Open

Repository Files & Downloads

22 files detected
Direct downloads for all repository files
FileTypeQuantizationSizeLink
Qwen2.5-3B-Instruct-IQ3_M.gguf GGUF IQ3_M 1.39 GB Download
Qwen2.5-3B-Instruct-IQ3_XS.gguf GGUF IQ3_XS 1.30 GB Download
Qwen2.5-3B-Instruct-IQ3_XXS.gguf GGUF IQ3_XXS 1.19 GB Download
Qwen2.5-3B-Instruct-IQ4_XS.gguf GGUF IQ4_XS 1.62 GB Download
Qwen2.5-3B-Instruct-Q3_K-f16.gguf GGUF Q3_K 1.82 GB Download
Qwen2.5-3B-Instruct-Q3_K_L.gguf GGUF Q3_K_L 1.55 GB Download
Qwen2.5-3B-Instruct-Q3_K_M.gguf GGUF Q3_K_M 1.48 GB Download
Qwen2.5-3B-Instruct-Q3_K_S.gguf GGUF Q3_K_S 1.35 GB Download
Qwen2.5-3B-Instruct-Q4_K-f16.gguf GGUF Q4_K 2.14 GB Download
Qwen2.5-3B-Instruct-Q4_K_L.gguf GGUF Q4_K_L 1.87 GB Download
Qwen2.5-3B-Instruct-Q4_K_M.gguf GGUF Q4_K_M 1.80 GB Download
Qwen2.5-3B-Instruct-Q4_K_S.gguf GGUF Q4_K_S 1.71 GB Download
Qwen2.5-3B-Instruct-Q5_K-f16.gguf GGUF Q5_K 2.41 GB Download
Qwen2.5-3B-Instruct-Q5_K_L.gguf GGUF Q5_K_L 2.14 GB Download
Qwen2.5-3B-Instruct-Q5_K_M.gguf GGUF Q5_K_M 2.07 GB Download
Qwen2.5-3B-Instruct-Q5_K_S.gguf GGUF Q5_K_S 2.02 GB Download
Qwen2.5-3B-Instruct-Q6_K-f16.gguf GGUF Q6_K 2.71 GB Download
Qwen2.5-3B-Instruct-Q6_K.gguf GGUF Q6_K 2.36 GB Download
Qwen2.5-3B-Instruct-Q6_K_L.gguf GGUF Q6_K_L 2.43 GB Download
Qwen2.5-3B-Instruct-Q8_0-f16.gguf GGUF F16 3.33 GB Download
Qwen2.5-3B-Instruct-Q8_0.gguf GGUF 3.33 GB Download
Qwen2.5-3B-Instruct-Q8_0_L.gguf GGUF 3.06 GB Download

Model Details Live

Model Slug
dahara1/qwen2.5-3b-instruct-gguf-japanese-imatrix-128k
Author
dahara1
Pipeline Task
Library
Created
2024-11-15
Last Modified
2024-11-17
Gated
No
Private
No
HF SHA
8786c5d17874fbc318699c1a119f23326769ec92
License
Unknown
Language
ja
Base Model
Unknown

Metadata Inspector

Normalized metadata (stored in metadata_json)
{
  "metadata": {},
  "card_data": {
    "language": [
      "ja"
    ],
    "frontmatter": {
      "language": [
        "ja"
      ]
    },
    "hero_image_url": "128k_full_instruct_first.png",
    "summary": "",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlanguage:\n- ja\n---\n\n## 本モデルについて about this model.\n[Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)を[日本語が多く含まれる重要度行列(iMatrix)](https://huggingface.co/dahara1/imatrix-jpn-test)を使って量子化し、超長文(32K以上)要約を可能にしたgguf版です。日本語対応能力が多めに保持されている事を期待しています。  \nThis is a gguf version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) that has been quantized using [importance matrix (iMatrix) that contains a lot of Japanese](https://huggingface.co/dahara1/imatrix-jpn-test) to enable summarization of long texts (over 32K). We hope that it retains a large amount of Japanese support.\n\n少なくともQwen2.5-3B-Instruct-gguf-japanese-imatrix-128K/Qwen2.5-3B-Instruct-Q8_0-f16.ggufが32Kトークンを超える超長文を正しく要約できる事を確認済です。  \nIt has been confirmed that at least Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K/Qwen2.5-3B-Instruct-Q8_0-f16.gguf can correctly summarize extremely long texts exceeding 32K tokens.  \n\n128Kコンテキスト延長については[unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF](https://huggingface.co/unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF)の指摘を参考にしています。ありがとう。  \nRegarding the 128K context extension, I have taken note of the suggestion made by [unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF](https://huggingface.co/unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF). Thank you.  \n\n\n## For ollama users\nollama ユーザーは[FAQ](https://github.com/ollama/ollama/blob/main/docs/faq.md)を参考にしてcontext window sizeパラメーターを修正してください。  \nIf you use ollama, check [FAQ](https://github.com/ollama/ollama/blob/main/docs/faq.md) and set context window size parameter like below.\n\n```\n/set parameter num_ctx 40960\n```\nor API \n```\ncurl http://..../api/generate -d '{\n  \"model\": \".....\",\n  \"prompt\": \"......\",\n  \"options\": {\n    \"num_ctx\": 40960\n  }\n}'\n```\n\nあなたが他のツールを使っている場合、同様にあなたの使っているツールのマニュアルを調べて、コンテキストウインドウサイズを延長する事を忘れないでください  \nただし、コンテキストサイズを必要以上に大きくするとモデルの実行速度が低下するので注意してください  \n本モデルは理論上、最大値128K(131072)に設定できますが、実行速度と品質に影響が出る事が考えられます  \n\nIf you are using other tools, be sure to extend the context window size as well, by consulting the manual of your tool.  \nBut please note that increasing the context window size more than necessary will slow down the model's execution speed.  \nIn theory, this model can be set to the maximum value of 128K(131072), but this may affect execution speed and quality.  \n\n\n## Sample llama.cpp script\n\n以下は、Wikipediaの約50,000文字(34.8Kトークン)の記事を取得して内容を要約するサンプルです  \nBelow is a sample that retrieves a Wikipedia article of about 50,000 Japanese characters(34.8K tokens) and summarizes its contents.  \n\n\nllama.cpp server command sample.\n```\n./llama.cpp/build/bin/Release/llama-server.exe -m ./Qwen2.5-3B-Instruct-Q8_0-f16.gguf -c 40960\n```\n\n\nllama.cpp client script sample.\n```\nimport transformers\nimport requests\nimport json\nfrom transformers import AutoTokenizer\ntokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-3B-Instruct\")\n\nurl = \"https://ja.wikipedia.org/wiki/%E7%94%B7%E3%81%AE%E5%A8%98\"\n\ndef get_wikipedia_text(url):\n    response = requests.get(url)\n    if response.status_code == 200:\n        from bs4 import BeautifulSoup\n        soup = BeautifulSoup(response.text, 'html.parser')\n        paragraphs = soup.find_all('p')\n        text = \"\\n\".join([p.get_text() for p in paragraphs])\n        return text\n    else:\n        raise Exception(f\"Failed to fetch the article. Status code: {response.status_code}\")\n        \nif __name__ == \"__main__\":\n\n    html_text = get_wikipedia_text(url)\n    #html_text = html_text[:40000]\n\n    instruct = \"### 指示\\n\\n日本語で3行で要約してください\"\n\n    # instruct first version\n    messages  = [\n        {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n        {\"role\": \"user\", \"content\": instruct + \"\\n\\n\" + html_text},\n    ]\n\n    # instruct last version\n    messages  = [\n        {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n        {\"role\": \"user\", \"content\": html_text  + \"\\n\\n\" + instruct},\n    ]\n    \n    prompt = tokenizer.apply_chat_template(\n            messages,\n            add_generation_prompt=True,\n            tokenize=False\n    )\n    print(prompt)\n\n    payload = {\n            \"prompt\": prompt,\n            \"n_predict\": 512\n    }\n\n    url = \"http://localhost:8080/completion\"\n    headers = {\n        \"Content-Type\": \"application/json\"\n    }\n\n    response = requests.post(url, headers=headers, data=json.dumps(payload))\n    if response.status_code != 200:\n        print(f\"Error: {response.text}\")\n\n    response_data = response.json()\n\n    response_content = response_data.get('content', '').strip()\n    print(response_content)\n```\n\n### 出力結果(output sample)\n\n#### This 128K model\n128K instruct first version  \n![128K instruct first version](128k_full_instruct_first.png)\n\n128K instruct last version  \n![128K instruct last version](128k_full_instruct_last.png)\n\n#### Standard 32K model\n32K instruct first version  \n![32K instruct first version](32k_full_instruct_first.png)\n\n32K instruct last version  \n![32K instruct last version](32k_full_instruct_last.png)\n\n\n32K instruct first versionでは要約指示がコンテキスト外になっており、指示が無視されている事に注目してください。  \nNotice that in the 32K instruct first version the summary instruction is out of context and the instruction is ignored.  \n\n32K instruct last versionも記事冒頭部分がコンテキスト外になっているため、用語解説の視点が弱まっています。\nThe 32K instruct last version also has the beginning of the article out of context, weakening the perspective of the terminology explanation.  \n\n\n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "ja",
    "endpoints_compatible",
    "region:us",
    "imatrix",
    "conversational"
  ],
  "likes": 8,
  "downloads": 417,
  "gated": false,
  "private": false,
  "last_modified": "2024-11-17T11:43:29.000Z",
  "created_at": "2024-11-15T02:50:24.000Z",
  "pipeline_tag": "",
  "library_name": ""
}
Source payload excerpt (from Hugging Face API)
{
  "_id": "6736b6f07506985f27b5626c",
  "id": "dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K",
  "modelId": "dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K",
  "sha": "8786c5d17874fbc318699c1a119f23326769ec92",
  "createdAt": "2024-11-15T02:50:24.000Z",
  "lastModified": "2024-11-17T11:43:29.000Z",
  "author": "dahara1",
  "downloads": 417,
  "likes": 8,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 31
}