dahara1/qwen2.5-3b-instruct-gguf-japanese-imatrix-128k (Q5_K_M GGUF) is indexed on GraySoft with repository links, GGUF quantization files, and Hugging Face metadata. Use this page to choose a quantization for guIDE or another local runtime. See related models in the same shard below.
Model Intelligence Sheet
dahara1/qwen2.5-3b-instruct-gguf-japanese-imatrix-128k overview
Comprehensive model page for dahara1/qwen2.5-3b-instruct-gguf-japanese-imatrix-128k
| Field | Value |
|---|---|
| Downloads | 417 |
| Likes | 8 |
| Pipeline | not set |
| Library | not set |
| Visibility | Public |
| Access | Open |
Repository Files & Downloads
22 files detected
Direct downloads for all repository files
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| Qwen2.5-3B-Instruct-IQ3_M.gguf | GGUF | IQ3_M | 1.39 GB | Download |
| Qwen2.5-3B-Instruct-IQ3_XS.gguf | GGUF | IQ3_XS | 1.30 GB | Download |
| Qwen2.5-3B-Instruct-IQ3_XXS.gguf | GGUF | IQ3_XXS | 1.19 GB | Download |
| Qwen2.5-3B-Instruct-IQ4_XS.gguf | GGUF | IQ4_XS | 1.62 GB | Download |
| Qwen2.5-3B-Instruct-Q3_K-f16.gguf | GGUF | Q3_K | 1.82 GB | Download |
| Qwen2.5-3B-Instruct-Q3_K_L.gguf | GGUF | Q3_K_L | 1.55 GB | Download |
| Qwen2.5-3B-Instruct-Q3_K_M.gguf | GGUF | Q3_K_M | 1.48 GB | Download |
| Qwen2.5-3B-Instruct-Q3_K_S.gguf | GGUF | Q3_K_S | 1.35 GB | Download |
| Qwen2.5-3B-Instruct-Q4_K-f16.gguf | GGUF | Q4_K | 2.14 GB | Download |
| Qwen2.5-3B-Instruct-Q4_K_L.gguf | GGUF | Q4_K_L | 1.87 GB | Download |
| Qwen2.5-3B-Instruct-Q4_K_M.gguf | GGUF | Q4_K_M | 1.80 GB | Download |
| Qwen2.5-3B-Instruct-Q4_K_S.gguf | GGUF | Q4_K_S | 1.71 GB | Download |
| Qwen2.5-3B-Instruct-Q5_K-f16.gguf | GGUF | Q5_K | 2.41 GB | Download |
| Qwen2.5-3B-Instruct-Q5_K_L.gguf | GGUF | Q5_K_L | 2.14 GB | Download |
| Qwen2.5-3B-Instruct-Q5_K_M.gguf | GGUF | Q5_K_M | 2.07 GB | Download |
| Qwen2.5-3B-Instruct-Q5_K_S.gguf | GGUF | Q5_K_S | 2.02 GB | Download |
| Qwen2.5-3B-Instruct-Q6_K-f16.gguf | GGUF | Q6_K | 2.71 GB | Download |
| Qwen2.5-3B-Instruct-Q6_K.gguf | GGUF | Q6_K | 2.36 GB | Download |
| Qwen2.5-3B-Instruct-Q6_K_L.gguf | GGUF | Q6_K_L | 2.43 GB | Download |
| Qwen2.5-3B-Instruct-Q8_0-f16.gguf | GGUF | Q8_0 | 3.33 GB | Download |
| Qwen2.5-3B-Instruct-Q8_0.gguf | GGUF | Q8_0 | 3.33 GB | Download |
| Qwen2.5-3B-Instruct-Q8_0_L.gguf | GGUF | Q8_0_L | 3.06 GB | Download |
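One way to narrow the list is to pick the largest file that fits a size budget. The sketch below is only an illustration: it copies a handful of sizes from the table above, and the function name `pick_largest_under` and the budget values are hypothetical. File size is a lower bound on memory use, since the context (KV cache) needs additional memory, particularly at long context lengths.

```python
# Sizes in GB, copied from a few rows of the file table above.
QUANT_SIZES_GB = {
    "Qwen2.5-3B-Instruct-IQ3_XXS.gguf": 1.19,
    "Qwen2.5-3B-Instruct-Q4_K_M.gguf": 1.80,
    "Qwen2.5-3B-Instruct-Q5_K_M.gguf": 2.07,
    "Qwen2.5-3B-Instruct-Q6_K.gguf": 2.36,
    "Qwen2.5-3B-Instruct-Q8_0.gguf": 3.33,
}


def pick_largest_under(budget_gb, sizes=QUANT_SIZES_GB):
    """Return the largest listed file whose size fits the budget, or None.

    Hypothetical helper: it compares file sizes only and does not
    account for the extra memory the runtime needs for the context.
    """
    candidates = [(size, name) for name, size in sizes.items() if size <= budget_gb]
    if not candidates:
        return None
    # max() on (size, name) tuples selects the entry with the largest size.
    return max(candidates)[1]


if __name__ == "__main__":
    print(pick_largest_under(2.1))
    print(pick_largest_under(1.0))
```

Larger files generally preserve more of the original model's quality, so taking the largest file that fits is a reasonable default when no other constraint applies.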
Model Details
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"language": [
"ja"
],
"frontmatter": {
"language": [
"ja"
]
},
"hero_image_url": "128k_full_instruct_first.png",
"summary": "",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nlanguage:\n- ja\n---\n\n## 本モデルについて about this model.\n[Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)を[日本語が多く含まれる重要度行列(iMatrix)](https://huggingface.co/dahara1/imatrix-jpn-test)を使って量子化し、超長文(32K以上)要約を可能にしたgguf版です。日本語対応能力が多めに保持されている事を期待しています。 \nThis is a gguf version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) that has been quantized using [importance matrix (iMatrix) that contains a lot of Japanese](https://huggingface.co/dahara1/imatrix-jpn-test) to enable summarization of long texts (over 32K). We hope that it retains a large amount of Japanese support.\n\n少なくともQwen2.5-3B-Instruct-gguf-japanese-imatrix-128K/Qwen2.5-3B-Instruct-Q8_0-f16.ggufが32Kトークンを超える超長文を正しく要約できる事を確認済です。 \nIt has been confirmed that at least Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K/Qwen2.5-3B-Instruct-Q8_0-f16.gguf can correctly summarize extremely long texts exceeding 32K tokens. \n\n128Kコンテキスト延長については[unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF](https://huggingface.co/unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF)の指摘を参考にしています。ありがとう。 \nRegarding the 128K context extension, I have taken note of the suggestion made by [unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF](https://huggingface.co/unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF). Thank you. 
\n\n\n## For ollama users\nollama ユーザーは[FAQ](https://github.com/ollama/ollama/blob/main/docs/faq.md)を参考にしてcontext window sizeパラメーターを修正してください。 \nIf you use ollama, check [FAQ](https://github.com/ollama/ollama/blob/main/docs/faq.md) and set context window size parameter like below.\n\n```\n/set parameter num_ctx 40960\n```\nor API \n```\ncurl http://..../api/generate -d '{\n \"model\": \".....\",\n \"prompt\": \"......\",\n \"options\": {\n \"num_ctx\": 40960\n }\n}'\n```\n\nあなたが他のツールを使っている場合、同様にあなたの使っているツールのマニュアルを調べて、コンテキストウインドウサイズを延長する事を忘れないでください \nただし、コンテキストサイズを必要以上に大きくするとモデルの実行速度が低下するので注意してください \n本モデルは理論上、最大値128K(131072)に設定できますが、実行速度と品質に影響が出る事が考えられます \n\nIf you are using other tools, be sure to extend the context window size as well, by consulting the manual of your tool. \nBut please note that increasing the context window size more than necessary will slow down the model's execution speed. \nIn theory, this model can be set to the maximum value of 128K(131072), but this may affect execution speed and quality. \n\n\n## Sample llama.cpp script\n\n以下は、Wikipediaの約50,000文字(34.8Kトークン)の記事を取得して内容を要約するサンプルです \nBelow is a sample that retrieves a Wikipedia article of about 50,000 Japanese characters(34.8K tokens) and summarizes its contents. 
\n\n\nllama.cpp server command sample.\n```\n./llama.cpp/build/bin/Release/llama-server.exe -m ./Qwen2.5-3B-Instruct-Q8_0-f16.gguf -c 40960\n```\n\n\nllama.cpp client script sample.\n```\nimport transformers\nimport requests\nimport json\nfrom transformers import AutoTokenizer\ntokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-3B-Instruct\")\n\nurl = \"https://ja.wikipedia.org/wiki/%E7%94%B7%E3%81%AE%E5%A8%98\"\n\ndef get_wikipedia_text(url):\n response = requests.get(url)\n if response.status_code == 200:\n from bs4 import BeautifulSoup\n soup = BeautifulSoup(response.text, 'html.parser')\n paragraphs = soup.find_all('p')\n text = \"\\n\".join([p.get_text() for p in paragraphs])\n return text\n else:\n raise Exception(f\"Failed to fetch the article. Status code: {response.status_code}\")\n \nif __name__ == \"__main__\":\n\n html_text = get_wikipedia_text(url)\n #html_text = html_text[:40000]\n\n instruct = \"### 指示\\n\\n日本語で3行で要約してください\"\n\n # instruct first version\n messages = [\n {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n {\"role\": \"user\", \"content\": instruct + \"\\n\\n\" + html_text},\n ]\n\n # instruct last version\n messages = [\n {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n {\"role\": \"user\", \"content\": html_text + \"\\n\\n\" + instruct},\n ]\n \n prompt = tokenizer.apply_chat_template(\n messages,\n add_generation_prompt=True,\n tokenize=False\n )\n print(prompt)\n\n payload = {\n \"prompt\": prompt,\n \"n_predict\": 512\n }\n\n url = \"http://localhost:8080/completion\"\n headers = {\n \"Content-Type\": \"application/json\"\n }\n\n response = requests.post(url, headers=headers, data=json.dumps(payload))\n if response.status_code != 200:\n print(f\"Error: {response.text}\")\n\n response_data = response.json()\n\n response_content = response_data.get('content', '').strip()\n print(response_content)\n```\n\n### 出力結果(output sample)\n\n#### This 128K model\n128K instruct first version 
\n\n\n128K instruct last version \n\n\n#### Standard 32K model\n32K instruct first version \n\n\n32K instruct last version \n\n\n\n32K instruct first versionでは要約指示がコンテキスト外になっており、指示が無視されている事に注目してください。 \nNotice that in the 32K instruct first version the summary instruction is out of context and the instruction is ignored. \n\n32K instruct last versionも記事冒頭部分がコンテキスト外になっているため、用語解説の視点が弱まっています。\nThe 32K instruct last version also has the beginning of the article out of context, weakening the perspective of the terminology explanation. \n\n\n",
"related_quantizations": []
},
"tags": [
"gguf",
"ja",
"endpoints_compatible",
"region:us",
"imatrix",
"conversational"
],
"likes": 8,
"downloads": 417,
"gated": false,
"private": false,
"last_modified": "2024-11-17T11:43:29.000Z",
"created_at": "2024-11-15T02:50:24.000Z",
"pipeline_tag": "",
"library_name": ""
}
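The README embedded in the metadata above asks Ollama users to raise the context window with `num_ctx` (40960 in its example) and notes that the theoretical maximum is 128K (131072). A minimal sketch of building that request body in Python follows. The helper name `build_generate_payload` and the range check are assumptions added for illustration, and the model name is left as a required argument because the README elides it.

```python
import json

MAX_CTX = 131072  # 128K, the theoretical maximum stated in the README


def build_generate_payload(model, prompt, num_ctx=40960):
    """Build the JSON body for an Ollama /api/generate request.

    `model` is whatever name the model is registered under locally;
    the README does not state it, so the caller must supply it.
    """
    if not 0 < num_ctx <= MAX_CTX:
        raise ValueError(f"num_ctx must be in 1..{MAX_CTX}, got {num_ctx}")
    # Same shape as the curl example in the README: model, prompt, options.num_ctx
    return {"model": model, "prompt": prompt, "options": {"num_ctx": num_ctx}}


if __name__ == "__main__":
    # "my-local-model" is a placeholder, not a real model name.
    body = build_generate_payload("my-local-model", "Summarize this text in three lines.")
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

The resulting dictionary can be sent with any HTTP client. As the README cautions, a context window larger than the input requires slows execution, so the value is best kept close to the actual prompt length plus the expected output.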
Source payload excerpt (from Hugging Face API)
{
"_id": "6736b6f07506985f27b5626c",
"id": "dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K",
"modelId": "dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K",
"sha": "8786c5d17874fbc318699c1a119f23326769ec92",
"createdAt": "2024-11-15T02:50:24.000Z",
"lastModified": "2024-11-17T11:43:29.000Z",
"author": "dahara1",
"downloads": 417,
"likes": 8,
"gated": false,
"private": false,
"pipeline_tag": "",
"library_name": "",
"siblings_count": 31
}