duyntnet/faro-yi-9b-dpo-imatrix-gguf IQ3_M GGUF - Free GGUF Download is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes. See related models in the same shard below.
duyntnet/faro-yi-9b-dpo-imatrix-gguf overview
This is the DPO version of wenbopan/Faro-Yi-9B. Compared to Faro-Yi-9B and Yi-9B-200K, the DPO model excels at many tasks, surpassing the original Yi-9B-200K by a large margin. On the Open LLM Leaderboard, it ranks #2 among all 9B models, #1 among all Yi-9B variants. | Metric | MMLU | GSM8K | hellaswag | truthfulqa | ai2_arc | winogrande | CMMLU | | ----------------------- | --------- | --------- | ------------- | -------------- | ----------- | -------------- | --------- | | Yi-9B-200K | 65.73 | 50.49 | 56.72 | 33.80 | 69.25 | 71.67 | 71.97 | | Faro-Yi-9B | 68.80 | 63.08 | 57.28 | 40.86 | 72.58 | 71.11 | 73.28 | | Faro-Yi-9B-DPO | 69.98 | 66.11 | 59.04 | 48.01 | 75.68 | 73.40 | 75.23 | Faro-Yi-9B-DPO's responses are also favored by GPT-4 Judge in MT-Bench !image/png
Repository Files & Downloads
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| Faro-Yi-9B-DPO-IQ1_M.gguf | GGUF | IQ1_M | 2.03 GB | Download |
| Faro-Yi-9B-DPO-IQ1_S.gguf | GGUF | IQ1_S | 1.88 GB | Download |
| Faro-Yi-9B-DPO-IQ2_M.gguf | GGUF | IQ2_M | 2.89 GB | Download |
| Faro-Yi-9B-DPO-IQ2_S.gguf | GGUF | IQ2_S | 2.68 GB | Download |
| Faro-Yi-9B-DPO-IQ2_XS.gguf | GGUF | IQ2_XS | 2.52 GB | Download |
| Faro-Yi-9B-DPO-IQ2_XXS.gguf | GGUF | IQ2_XXS | 2.29 GB | Download |
| Faro-Yi-9B-DPO-IQ3_M.gguf | GGUF | IQ3_M | 3.78 GB | Download |
| Faro-Yi-9B-DPO-IQ3_S.gguf | GGUF | IQ3_S | 3.64 GB | Download |
| Faro-Yi-9B-DPO-IQ3_XS.gguf | GGUF | IQ3_XS | 3.46 GB | Download |
| Faro-Yi-9B-DPO-IQ3_XXS.gguf | GGUF | IQ3_XXS | 3.24 GB | Download |
| Faro-Yi-9B-DPO-IQ4_NL.gguf | GGUF | IQ4_NL | 4.70 GB | Download |
| Faro-Yi-9B-DPO-IQ4_XS.gguf | GGUF | IQ4_XS | 4.46 GB | Download |
| Faro-Yi-9B-DPO-Q2_K.gguf | GGUF | Q2_K | 3.12 GB | Download |
| Faro-Yi-9B-DPO-Q2_K_S.gguf | GGUF | Q2_K_S | 2.90 GB | Download |
| Faro-Yi-9B-DPO-Q3_K_L.gguf | GGUF | Q3_K_L | 4.37 GB | Download |
| Faro-Yi-9B-DPO-Q3_K_M.gguf | GGUF | Q3_K_M | 4.03 GB | Download |
| Faro-Yi-9B-DPO-Q3_K_S.gguf | GGUF | Q3_K_S | 3.63 GB | Download |
| Faro-Yi-9B-DPO-Q4_0.gguf | GGUF | — | 4.71 GB | Download |
| Faro-Yi-9B-DPO-Q4_1.gguf | GGUF | — | 5.19 GB | Download |
| Faro-Yi-9B-DPO-Q4_K_M.gguf | GGUF | Q4_K_M | 4.96 GB | Download |
| Faro-Yi-9B-DPO-Q4_K_S.gguf | GGUF | Q4_K_S | 4.72 GB | Download |
| Faro-Yi-9B-DPO-Q5_0.gguf | GGUF | — | 5.70 GB | Download |
| Faro-Yi-9B-DPO-Q5_1.gguf | GGUF | — | 6.19 GB | Download |
| Faro-Yi-9B-DPO-Q5_K_M.gguf | GGUF | Q5_K_M | 5.83 GB | Download |
| Faro-Yi-9B-DPO-Q5_K_S.gguf | GGUF | Q5_K_S | 5.69 GB | Download |
| Faro-Yi-9B-DPO-Q6_K.gguf | GGUF | Q6_K | 6.75 GB | Download |
| Faro-Yi-9B-DPO-Q8_0.gguf | GGUF | — | 8.74 GB | Download |
Model Details Live
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"license": "other",
"language": [
"en"
],
"pipeline_tag": "text-generation",
"inference": false,
"tags": [
"transformers",
"gguf",
"imatrix",
"Faro-Yi-9B-DPO"
],
"frontmatter": {
"license": "other",
"language": [
"en"
],
"pipeline_tag": "text-generation",
"inference": "false",
"tags": [
"transformers",
"gguf",
"imatrix",
"Faro-Yi-9B-DPO"
]
},
"hero_image_url": "https://cdn-uploads.huggingface.co/production/uploads/62cd3a3691d27e60db0698b0/ArlnloL4aPfiiD6kUqaSH.png",
"summary": "This is the DPO version of wenbopan/Faro-Yi-9B. Compared to Faro-Yi-9B and Yi-9B-200K, the DPO model excels at many tasks, surpassing the original Yi-9B-200K by a large margin. On the Open LLM Leaderboard, it ranks **#2** among all 9B models, **#1** among all Yi-9B variants. | **Metric** | **MMLU** | **GSM8K** | **hellaswag** | **truthfulqa** | **ai2_arc** | **winogrande** | **CMMLU** | | ----------------------- | --------- | --------- | ------------- | -------------- | ----------- | -------------- | --------- | | **Yi-9B-200K** | 65.73 | 50.49 | 56.72 | 33.80 | 69.25 | 71.67 | 71.97 | | **Faro-Yi-9B** | 68.80 | 63.08 | 57.28 | 40.86 | 72.58 | 71.11 | 73.28 | | **Faro-Yi-9B-DPO** | **69.98** | **66.11** | **59.04** | **48.01** | **75.68** | **73.40** | **75.23** | Faro-Yi-9B-DPO's responses are also favored by GPT-4 Judge in MT-Bench !image/png",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nlicense: other\nlanguage:\n- en\npipeline_tag: text-generation\ninference: false\ntags:\n- transformers\n- gguf\n- imatrix\n- Faro-Yi-9B-DPO\n---\nQuantizations of https://huggingface.co/wenbopan/Faro-Yi-9B-DPO\n\n### Inference Clients/UIs\n* [llama.cpp](https://github.com/ggerganov/llama.cpp)\n* [KoboldCPP](https://github.com/LostRuins/koboldcpp)\n* [ollama](https://github.com/ollama/ollama)\n* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)\n* [jan](https://github.com/janhq/jan)\n* [GPT4All](https://github.com/nomic-ai/gpt4all)\n---\n\n# From original readme\n\nThis is the DPO version of [wenbopan/Faro-Yi-9B](https://huggingface.co/wenbopan/Faro-Yi-9B). Compared to Faro-Yi-9B and [Yi-9B-200K](https://huggingface.co/01-ai/Yi-9B-200K), the DPO model excels at many tasks, surpassing the original Yi-9B-200K by a large margin. On the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), it ranks **#2** among all 9B models, **#1** among all Yi-9B variants.\n\n| **Metric** | **MMLU** | **GSM8K** | **hellaswag** | **truthfulqa** | **ai2_arc** | **winogrande** | **CMMLU** |\n| ----------------------- | --------- | --------- | ------------- | -------------- | ----------- | -------------- | --------- |\n| **Yi-9B-200K** | 65.73 | 50.49 | 56.72 | 33.80 | 69.25 | 71.67 | 71.97 |\n| **Faro-Yi-9B** | 68.80 | 63.08 | 57.28 | 40.86 | 72.58 | 71.11 | 73.28 |\n| **Faro-Yi-9B-DPO** | **69.98** | **66.11** | **59.04** | **48.01** | **75.68** | **73.40** | **75.23** |\n\nFaro-Yi-9B-DPO's responses are also favored by GPT-4 Judge in MT-Bench\n\n\n\n## How to Use\n\nFaro-Yi-9B-DPO uses the chatml template and performs well in both short and long contexts. For longer inputs under **24GB of VRAM**, I recommend to use vLLM to have a max prompt of 32K. Setting `kv_cache_dtype=\"fp8_e5m2\"` allows for 48K input length. 4bit-AWQ quantization on top of that can boost input length to 160K, albeit with some performance impact. Adjust `max_model_len` arg in vLLM or `config.json` to avoid OOM.\n\n\n```python\nimport io\nimport requests\nfrom PyPDF2 import PdfReader\nfrom vllm import LLM, SamplingParams\n\nllm = LLM(model=\"wenbopan/Faro-Yi-9B-DPO\", kv_cache_dtype=\"fp8_e5m2\", max_model_len=100000)\n\npdf_data = io.BytesIO(requests.get(\"https://arxiv.org/pdf/2303.08774.pdf\").content)\ndocument = \"\".join(page.extract_text() for page in PdfReader(pdf_data).pages) # 100 pages\n\nquestion = f\"{document}\\n\\nAccording to the paper, what is the parameter count of GPT-4?\"\nmessages = [ {\"role\": \"user\", \"content\": question} ] # 83K tokens\nprompt = llm.get_tokenizer().apply_chat_template(messages, add_generation_prompt=True, tokenize=False)\noutput = llm.generate(prompt, SamplingParams(temperature=0.8, max_tokens=500))\nprint(output[0].outputs[0].text)\n# Yi-9B-200K: 175B. GPT-4 has 175B \\nparameters. How many models were combined to create GPT-4? Answer: 6. ...\n# Faro-Yi-9B: GPT-4 does not have a publicly disclosed parameter count due to the competitive landscape and safety implications of large-scale models like GPT-4. ...\n```\n\n\n<details> <summary>Or With Transformers</summary>\n\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nmodel = AutoModelForCausalLM.from_pretrained('wenbopan/Faro-Yi-9B-DPO', device_map=\"cuda\")\ntokenizer = AutoTokenizer.from_pretrained('wenbopan/Faro-Yi-9B-DPO')\nmessages = [\n {\"role\": \"system\", \"content\": \"You are a helpful assistant. Always answer with a short response.\"},\n {\"role\": \"user\", \"content\": \"Tell me what is Pythagorean theorem like you are a pirate.\"}\n]\n\ninput_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors=\"pt\").to(model.device)\ngenerated_ids = model.generate(input_ids, max_new_tokens=512, temperature=0.5)\nresponse = tokenizer.decode(generated_ids[0], skip_special_tokens=True) # Aye, matey! The Pythagorean theorem is a nautical rule that helps us find the length of the third side of a triangle. ...\n```\n\n</details>\n",
"related_quantizations": []
},
"tags": [
"transformers",
"gguf",
"imatrix",
"Faro-Yi-9B-DPO",
"text-generation",
"en",
"arxiv:2303.08774",
"license:other",
"region:us",
"conversational"
],
"likes": 0,
"downloads": 245,
"gated": false,
"private": false,
"last_modified": "2025-02-19T03:21:07.000Z",
"created_at": "2025-02-18T23:23:46.000Z",
"pipeline_tag": "text-generation",
"library_name": "transformers"
}
Source payload excerpt (from Hugging Face API)
{
"_id": "67b51682d33342e9bca31bcc",
"id": "duyntnet/Faro-Yi-9B-DPO-imatrix-GGUF",
"modelId": "duyntnet/Faro-Yi-9B-DPO-imatrix-GGUF",
"sha": "e53f8a354976729ce45c00f8bce67fa1c1de6f97",
"createdAt": "2025-02-18T23:23:46.000Z",
"lastModified": "2025-02-19T03:21:07.000Z",
"author": "duyntnet",
"downloads": 245,
"likes": 0,
"gated": false,
"private": false,
"pipeline_tag": "text-generation",
"library_name": "transformers",
"siblings_count": 29
}