RichardErkhov/ibm_-_PowerMoE-3b-gguf overview
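Usage example from the original ibm/PowerMoE-3b model card (the original card notes that it requires installing HF transformers from source):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # or "cpu"
model_path = "ibm/PowerMoE-3b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
# change input text as desired
prompt = "Write a code to find the maximum value in a list of numbers."
# tokenize the text
input_tokens = tokenizer(prompt, return_tensors="pt")
# transfer tokenized inputs to the device
for i in input_tokens:
    input_tokens[i] = input_tokens[i].to(device)
# generate output tokens (unpack the dict so input_ids/attention_mask are passed as keyword arguments)
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# loop over the batch to print, in this example the batch size is 1
for i in output:
    print(i)
```

Additional thanks to @nicoboss for giving me access to his private supercomputer, enabling me to provide many more quants, at much higher speed, than I would otherwise be able to.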
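The snippet above loads the original ibm/PowerMoE-3b weights through transformers, not the GGUF files listed in this repository. Below is a minimal sketch for running one of the GGUF quants instead, assuming the optional `huggingface_hub` and `llama-cpp-python` packages are installed and that the installed llama.cpp build supports this model's architecture (an assumption, not verified here). `REPO_ID`, `gguf_filename`, and `generate_with_gguf` are hypothetical names introduced for illustration, and `n_ctx=2048` is an arbitrary context size.

```python
# Hypothetical helpers. hf_hub_download and llama_cpp.Llama are real APIs, but
# llama.cpp support for this model's architecture is assumed, not verified.
REPO_ID = "RichardErkhov/ibm_-_PowerMoE-3b-gguf"


def gguf_filename(quant: str) -> str:
    """Map a quant label such as 'Q4_K_M' to the file name used in this repo."""
    return f"PowerMoE-3b.{quant}.gguf"


def generate_with_gguf(prompt: str, quant: str = "Q4_K_M", max_tokens: int = 100) -> str:
    """Download one GGUF quant (cached by huggingface_hub) and complete a prompt."""
    # Imported lazily so this module loads even without the optional dependencies.
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    local_path = hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant))
    # n_ctx=2048 is an assumed context window, chosen arbitrarily for this sketch.
    llm = Llama(model_path=local_path, n_ctx=2048, verbose=False)
    result = llm(prompt, max_tokens=max_tokens)  # OpenAI-style completion dict
    return result["choices"][0]["text"]
```

Usage: `print(generate_with_gguf("Write a code to find the maximum value in a list of numbers."))`. The heavy imports live inside the function so that defining the helpers costs nothing until a file is actually downloaded and loaded.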
Repository Files & Downloads
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| PowerMoE-3b.IQ3_M.gguf | GGUF | IQ3_M | 1.41 GB | Download |
| PowerMoE-3b.IQ3_S.gguf | GGUF | IQ3_S | 1.39 GB | Download |
| PowerMoE-3b.IQ3_XS.gguf | GGUF | IQ3_XS | 1.32 GB | Download |
| PowerMoE-3b.IQ4_NL.gguf | GGUF | IQ4_NL | 1.81 GB | Download |
| PowerMoE-3b.IQ4_XS.gguf | GGUF | IQ4_XS | 1.72 GB | Download |
| PowerMoE-3b.Q2_K.gguf | GGUF | Q2_K | 1.18 GB | Download |
| PowerMoE-3b.Q3_K.gguf | GGUF | Q3_K | 1.53 GB | Download |
| PowerMoE-3b.Q3_K_L.gguf | GGUF | Q3_K_L | 1.65 GB | Download |
| PowerMoE-3b.Q3_K_M.gguf | GGUF | Q3_K_M | 1.53 GB | Download |
| PowerMoE-3b.Q3_K_S.gguf | GGUF | Q3_K_S | 1.39 GB | Download |
| PowerMoE-3b.Q4_0.gguf | GGUF | Q4_0 | 1.79 GB | Download |
| PowerMoE-3b.Q4_1.gguf | GGUF | Q4_1 | 1.99 GB | Download |
| PowerMoE-3b.Q4_K.gguf | GGUF | Q4_K | 1.92 GB | Download |
| PowerMoE-3b.Q4_K_M.gguf | GGUF | Q4_K_M | 1.92 GB | Download |
| PowerMoE-3b.Q4_K_S.gguf | GGUF | Q4_K_S | 1.81 GB | Download |
| PowerMoE-3b.Q5_0.gguf | GGUF | Q5_0 | 2.18 GB | Download |
| PowerMoE-3b.Q5_1.gguf | GGUF | Q5_1 | 2.37 GB | Download |
| PowerMoE-3b.Q5_K.gguf | GGUF | Q5_K | 2.24 GB | Download |
| PowerMoE-3b.Q5_K_M.gguf | GGUF | Q5_K_M | 2.24 GB | Download |
| PowerMoE-3b.Q5_K_S.gguf | GGUF | Q5_K_S | 2.18 GB | Download |
| PowerMoE-3b.Q6_K.gguf | GGUF | Q6_K | 2.59 GB | Download |
| PowerMoE-3b.Q8_0.gguf | GGUF | Q8_0 | 3.35 GB | Download |
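A small sketch for choosing among the files above under a memory budget. It copies the sizes from the table, keeps only quants whose file size plus an assumed runtime overhead fits the budget, and returns the largest one. `pick_quant`, `download_url`, and the default `overhead_gb=0.5` (a rough allowance for KV cache and buffers) are hypothetical and not measured. The URL uses `resolve/main`, which serves the raw file, rather than the `blob/main` page links in the upstream README.

```python
# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "IQ3_M": 1.41, "IQ3_S": 1.39, "IQ3_XS": 1.32, "IQ4_NL": 1.81, "IQ4_XS": 1.72,
    "Q2_K": 1.18, "Q3_K": 1.53, "Q3_K_L": 1.65, "Q3_K_M": 1.53, "Q3_K_S": 1.39,
    "Q4_0": 1.79, "Q4_1": 1.99, "Q4_K": 1.92, "Q4_K_M": 1.92, "Q4_K_S": 1.81,
    "Q5_0": 2.18, "Q5_1": 2.37, "Q5_K": 2.24, "Q5_K_M": 2.24, "Q5_K_S": 2.18,
    "Q6_K": 2.59, "Q8_0": 3.35,
}
REPO_ID = "RichardErkhov/ibm_-_PowerMoE-3b-gguf"


def pick_quant(budget_gb: float, overhead_gb: float = 0.5):
    """Return the largest quant whose file plus the assumed overhead fits, else None."""
    fitting = [
        (size, name)
        for name, size in QUANT_SIZES_GB.items()
        if size + overhead_gb <= budget_gb
    ]
    if not fitting:
        return None
    # Rank by file size only; ties are broken by name.
    _, name = max(fitting)
    return name


def download_url(quant: str) -> str:
    """Direct-download URL for one quant file (resolve/ serves the raw bytes)."""
    return f"https://huggingface.co/{REPO_ID}/resolve/main/PowerMoE-3b.{quant}.gguf"
```

For example, `pick_quant(3.0)` returns `"Q5_1"` and `pick_quant(1.5)` returns `None`. Ranking purely by file size is a simplification: a larger file is not guaranteed to be the better quality/size trade-off, so treat the result as a starting point.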
Model Details
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"frontmatter": {},
"hero_image_url": "",
"summary": "model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device) model.eval() # change input text as desired prompt = \"Write a code to find the maximum value in a list of numbers.\" # tokenize the text input_tokens = tokenizer(prompt, return_tensors=\"pt\") # transfer tokenized inputs to the device for i in input_tokens: input_tokens[i] = input_tokens[i].to(device) # generate output tokens output = model.generate(**input_tokens, max_new_tokens=100) # decode output tokens into text output = tokenizer.batch_decode(output) # loop over the batch to print, in this example the batch size is 1 for i in output: print(i) ``` Additional thanks to @nicoboss for giving me access to his private supercomputer, enabling me to provide many more quants, at much higher speed, than I would otherwise be able to.",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "Quantization made by Richard Erkhov.\n\n[Github](https://github.com/RichardErkhov)\n\n[Discord](https://discord.gg/pvy7H8DZMG)\n\n[Request more models](https://github.com/RichardErkhov/quant_request)\n\n\nPowerMoE-3b - GGUF\n- Model creator: https://huggingface.co/ibm/\n- Original model: https://huggingface.co/ibm/PowerMoE-3b/\n\n\n| Name | Quant method | Size |\n| ---- | ---- | ---- |\n| [PowerMoE-3b.Q2_K.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q2_K.gguf) | Q2_K | 1.18GB |\n| [PowerMoE-3b.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.IQ3_XS.gguf) | IQ3_XS | 1.32GB |\n| [PowerMoE-3b.IQ3_S.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.IQ3_S.gguf) | IQ3_S | 1.39GB |\n| [PowerMoE-3b.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q3_K_S.gguf) | Q3_K_S | 1.39GB |\n| [PowerMoE-3b.IQ3_M.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.IQ3_M.gguf) | IQ3_M | 1.41GB |\n| [PowerMoE-3b.Q3_K.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q3_K.gguf) | Q3_K | 1.53GB |\n| [PowerMoE-3b.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q3_K_M.gguf) | Q3_K_M | 1.53GB |\n| [PowerMoE-3b.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q3_K_L.gguf) | Q3_K_L | 1.65GB |\n| [PowerMoE-3b.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.IQ4_XS.gguf) | IQ4_XS | 1.72GB |\n| [PowerMoE-3b.Q4_0.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q4_0.gguf) | Q4_0 | 1.79GB |\n| [PowerMoE-3b.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.IQ4_NL.gguf) | IQ4_NL | 1.81GB |\n| 
[PowerMoE-3b.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q4_K_S.gguf) | Q4_K_S | 1.81GB |\n| [PowerMoE-3b.Q4_K.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q4_K.gguf) | Q4_K | 1.92GB |\n| [PowerMoE-3b.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q4_K_M.gguf) | Q4_K_M | 1.92GB |\n| [PowerMoE-3b.Q4_1.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q4_1.gguf) | Q4_1 | 1.99GB |\n| [PowerMoE-3b.Q5_0.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q5_0.gguf) | Q5_0 | 2.18GB |\n| [PowerMoE-3b.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q5_K_S.gguf) | Q5_K_S | 2.18GB |\n| [PowerMoE-3b.Q5_K.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q5_K.gguf) | Q5_K | 2.24GB |\n| [PowerMoE-3b.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q5_K_M.gguf) | Q5_K_M | 2.24GB |\n| [PowerMoE-3b.Q5_1.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q5_1.gguf) | Q5_1 | 2.37GB |\n| [PowerMoE-3b.Q6_K.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q6_K.gguf) | Q6_K | 2.59GB |\n| [PowerMoE-3b.Q8_0.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q8_0.gguf) | Q8_0 | 3.35GB |\n\n\n\n\nOriginal model description:\n---\npipeline_tag: text-generation\ninference: false\nlicense: apache-2.0\nlibrary_name: transformers\nmodel-index:\n- name: ibm/PowerMoE-3b\n results:\n - task:\n type: text-generation\n dataset:\n type: lm-eval-harness\n name: ARC\n metrics:\n - name: accuracy-norm\n type: accuracy-norm\n value: 58.1\n verified: false\n - task:\n type: text-generation\n dataset:\n type: lm-eval-harness\n name: BoolQ\n metrics:\n - 
name: accuracy\n type: accuracy\n value: 65.0\n verified: false\n - task:\n type: text-generation\n dataset:\n type: lm-eval-harness\n name: Hellaswag\n metrics:\n - name: accuracy-norm\n type: accuracy-norm\n value: 71.5\n verified: false\n - task:\n type: text-generation\n dataset:\n type: lm-eval-harness\n name: OpenBookQA\n metrics:\n - name: accuracy-norm\n type: accuracy-norm\n value: 41.0\n verified: false\n - task:\n type: text-generation\n dataset:\n type: lm-eval-harness\n name: PIQA\n metrics:\n - name: accuracy-norm\n type: accuracy-norm\n value: 79.1\n verified: false\n - task:\n type: text-generation\n dataset:\n type: lm-eval-harness\n name: Winogrande\n metrics:\n - name: accuracy-norm\n type: accuracy-norm\n value: 65.0\n verified: false\n - task:\n type: text-generation\n dataset:\n type: lm-eval-harness\n name: MMLU (5 shot)\n metrics:\n - name: accuracy\n type: accuracy\n value: 42.8\n verified: false\n - task:\n type: text-generation\n dataset:\n type: lm-eval-harness\n name: GSM8k (5 shot)\n metrics:\n - name: accuracy\n type: accuracy\n value: 25.9\n verified: false\n - task:\n type: text-generation\n dataset:\n type: lm-eval-harness\n name: math (4 shot)\n metrics:\n - name: accuracy\n type: accuracy\n value: 14.8\n verified: false\n - task:\n type: text-generation\n dataset:\n type: bigcode-eval\n name: humaneval\n metrics:\n - name: pass@1\n type: pass@1\n value: 20.1\n verified: false\n - task:\n type: text-generation\n dataset:\n type: bigcode-eval\n name: MBPP\n metrics:\n - name: pass@1\n type: pass@1\n value: 32.4\n verified: false\n---\n\n## Model Summary\nPowerMoE-3B is a 3B sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token. It is trained on a mix of open-source and proprietary datasets. 
PowerMoE-3B has shown promising results compared to other dense models with 2x activate parameters across various benchmarks, including natural language multi-choices, code generation, and math reasoning.\nPaper: https://arxiv.org/abs/2408.13359\n\n## Usage\nNote: Requires installing HF transformers from source.\n\n### Generation\nThis is a simple example of how to use **PowerMoE-3b** model.\n\n```python\nimport torch\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\ndevice = \"cuda\" # or \"cpu\"\nmodel_path = \"ibm/PowerMoE-3b\"\ntokenizer = AutoTokenizer.from_pretrained(model_path)\n# drop device_map if running on CPU\nmodel = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)\nmodel.eval()\n# change input text as desired\nprompt = \"Write a code to find the maximum value in a list of numbers.\"\n# tokenize the text\ninput_tokens = tokenizer(prompt, return_tensors=\"pt\")\n# transfer tokenized inputs to the device\nfor i in input_tokens:\n input_tokens[i] = input_tokens[i].to(device)\n# generate output tokens\noutput = model.generate(**input_tokens, max_new_tokens=100)\n# decode output tokens into text\noutput = tokenizer.batch_decode(output)\n# loop over the batch to print, in this example the batch size is 1\nfor i in output:\n print(i)\n```\n\n\nAdditional thanks to @nicoboss for giving me access to his private supercomputer, enabling me to provide many more quants, at much higher speed, than I would otherwise be able to.",
"related_quantizations": []
},
"tags": [
"gguf",
"arxiv:2408.13359",
"endpoints_compatible",
"region:us"
],
"likes": 0,
"downloads": 95,
"gated": false,
"private": false,
"last_modified": "2024-10-21T04:23:17.000Z",
"created_at": "2024-10-21T03:49:41.000Z",
"pipeline_tag": "",
"library_name": ""
}
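The model summary quoted in the `readme_markdown` field describes PowerMoE-3B as a 3B sparse Mixture-of-Experts model that activates 800M parameters per token, compared against dense models with twice as many active parameters. The arithmetic below works through those figures and, as a rough cross-check, estimates how many weights the Q8_0 file holds. It assumes the table sizes are decimal gigabytes and that Q8_0 stores 32 int8 weights plus one fp16 scale per block (34 bytes per 32 weights); metadata and any tensors kept at other precisions make this estimate approximate.

```python
# Figures quoted from the model summary (rounded values from the card).
TOTAL_PARAMS = 3.0e9   # "a 3B sparse Mixture-of-Experts (sMoE) language model"
ACTIVE_PARAMS = 0.8e9  # "sparsely activates 800M parameters for each token"

# Fraction of the weights that participate in each token's forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Size of the dense models the card compares against (2x the active parameters).
dense_baseline_params = 2 * ACTIVE_PARAMS

# Q8_0 block layout: one fp16 scale (2 bytes) + 32 int8 weights = 34 bytes per 32 weights.
Q8_0_BYTES_PER_WEIGHT = 34 / 32
Q8_0_FILE_BYTES = 3.35e9  # size from the table, assuming decimal gigabytes
approx_weights_in_file = Q8_0_FILE_BYTES / Q8_0_BYTES_PER_WEIGHT

print(f"active fraction: {active_fraction:.1%}")
print(f"dense baseline: {dense_baseline_params / 1e9:.1f}B params")
print(f"weights implied by the Q8_0 file: ~{approx_weights_in_file / 1e9:.2f}B")
```

Roughly 27% of the weights are used per token, yet the Q8_0 file implies about 3.15B stored weights. This is the practical consequence of the sparse design: every expert must be resident in memory, so the GGUF file sizes track the total parameter count even though per-token compute tracks the 800M active parameters.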
Source payload excerpt (from Hugging Face API)
{
"_id": "6715cf557ebc9ce65c1153e8",
"id": "RichardErkhov/ibm_-_PowerMoE-3b-gguf",
"modelId": "RichardErkhov/ibm_-_PowerMoE-3b-gguf",
"sha": "51ec1d031a4245a2fc087aea92702384604dd228",
"createdAt": "2024-10-21T03:49:41.000Z",
"lastModified": "2024-10-21T04:23:17.000Z",
"author": "RichardErkhov",
"downloads": 95,
"likes": 0,
"gated": false,
"private": false,
"pipeline_tag": "",
"library_name": "",
"siblings_count": 24
}