Model Intelligence Sheet

richarderkhov/ibm_-_powermoe-3b-gguf overview

GGUF quantizations of ibm/PowerMoE-3b, made by Richard Erkhov. The original model card gives this usage example for the unquantized model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # or "cpu"
model_path = "ibm/PowerMoE-3b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
# change input text as desired
prompt = "Write a code to find the maximum value in a list of numbers."
# tokenize the text
input_tokens = tokenizer(prompt, return_tensors="pt")
# transfer tokenized inputs to the device
for i in input_tokens:
    input_tokens[i] = input_tokens[i].to(device)
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# loop over the batch to print, in this example the batch size is 1
for i in output:
    print(i)

Additional thanks to @nicoboss for giving me access to his private supercomputer, enabling me to provide many more quants, at much higher speed, than I would otherwise be able to.

gguf, arxiv:2408.13359, endpoints_compatible, region:us

Downloads

95

Likes

0

Pipeline

—

Library

—

Visibility

Public

Access

Open

Repository Files & Downloads

22 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
PowerMoE-3b.IQ3_M.gguf	GGUF	IQ3_M	1.41 GB	Download
PowerMoE-3b.IQ3_S.gguf	GGUF	IQ3_S	1.39 GB	Download
PowerMoE-3b.IQ3_XS.gguf	GGUF	IQ3_XS	1.32 GB	Download
PowerMoE-3b.IQ4_NL.gguf	GGUF	IQ4_NL	1.81 GB	Download
PowerMoE-3b.IQ4_XS.gguf	GGUF	IQ4_XS	1.72 GB	Download
PowerMoE-3b.Q2_K.gguf	GGUF	Q2_K	1.18 GB	Download
PowerMoE-3b.Q3_K.gguf	GGUF	Q3_K	1.53 GB	Download
PowerMoE-3b.Q3_K_L.gguf	GGUF	Q3_K_L	1.65 GB	Download
PowerMoE-3b.Q3_K_M.gguf	GGUF	Q3_K_M	1.53 GB	Download
PowerMoE-3b.Q3_K_S.gguf	GGUF	Q3_K_S	1.39 GB	Download
PowerMoE-3b.Q4_0.gguf	GGUF	Q4_0	1.79 GB	Download
PowerMoE-3b.Q4_1.gguf	GGUF	Q4_1	1.99 GB	Download
PowerMoE-3b.Q4_K.gguf	GGUF	Q4_K	1.92 GB	Download
PowerMoE-3b.Q4_K_M.gguf	GGUF	Q4_K_M	1.92 GB	Download
PowerMoE-3b.Q4_K_S.gguf	GGUF	Q4_K_S	1.81 GB	Download
PowerMoE-3b.Q5_0.gguf	GGUF	Q5_0	2.18 GB	Download
PowerMoE-3b.Q5_1.gguf	GGUF	Q5_1	2.37 GB	Download
PowerMoE-3b.Q5_K.gguf	GGUF	Q5_K	2.24 GB	Download
PowerMoE-3b.Q5_K_M.gguf	GGUF	Q5_K_M	2.24 GB	Download
PowerMoE-3b.Q5_K_S.gguf	GGUF	Q5_K_S	2.18 GB	Download
PowerMoE-3b.Q6_K.gguf	GGUF	Q6_K	2.59 GB	Download
PowerMoE-3b.Q8_0.gguf	GGUF	Q8_0	3.35 GB	Download
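
The table lends itself to a simple selection rule: take the largest quantization whose file fits a given size budget. The sketch below is one hypothetical way to express that. The `pick_quant` and `download_url` helpers and the budget values are illustrative and not part of the repository; the sizes are copied from a subset of the table; and the `resolve/main` URL pattern is an assumption based on how Hugging Face usually serves raw files (the README itself links to `blob/main` pages). File size is only a rough proxy for runtime memory use.

```python
# Sizes in GB, copied from a subset of the file table above.
QUANT_SIZES_GB = {
    "Q2_K": 1.18,
    "IQ3_XS": 1.32,
    "Q3_K_M": 1.53,
    "IQ4_XS": 1.72,
    "Q4_K_M": 1.92,
    "Q5_K_M": 2.24,
    "Q6_K": 2.59,
    "Q8_0": 3.35,
}

REPO_ID = "RichardErkhov/ibm_-_PowerMoE-3b-gguf"


def pick_quant(budget_gb, sizes=QUANT_SIZES_GB):
    """Return the largest quant whose file size fits the budget, or None."""
    fitting = {name: size for name, size in sizes.items() if size <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def download_url(quant, repo_id=REPO_ID):
    """Build a direct-download URL (assumed resolve/main pattern)."""
    return f"https://huggingface.co/{repo_id}/resolve/main/PowerMoE-3b.{quant}.gguf"


if __name__ == "__main__":
    choice = pick_quant(2.0)  # hypothetical 2 GB budget
    print(choice, download_url(choice))
```

With a 2 GB budget this selects Q4_K_M (1.92 GB); a budget below 1.18 GB yields `None` because no listed file is that small.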

Model Details

Model Slug

richarderkhov/ibm_-_powermoe-3b-gguf

Author

RichardErkhov

Pipeline Task

—

Library

—

Created

2024-10-21

Last Modified

2024-10-21

Gated

No

Private

No

HF SHA

51ec1d031a4245a2fc087aea92702384604dd228

License

Unknown (the quoted original model card declares apache-2.0)

Language

Unknown

Base Model

ibm/PowerMoE-3b (per the README's "Original model" link)

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "frontmatter": {},
    "hero_image_url": "",
    "summary": "model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device) model.eval() # change input text as desired prompt = \"Write a code to find the maximum value in a list of numbers.\" # tokenize the text input_tokens = tokenizer(prompt, return_tensors=\"pt\") # transfer tokenized inputs to the device for i in input_tokens: input_tokens[i] = input_tokens[i].to(device) # generate output tokens output = model.generate(**input_tokens, max_new_tokens=100) # decode output tokens into text output = tokenizer.batch_decode(output) # loop over the batch to print, in this example the batch size is 1 for i in output: print(i) ``` Additional thanks to @nicoboss for giving me access to his private supercomputer, enabling me to provide many more quants, at much higher speed, than I would otherwise be able to.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "Quantization made by Richard Erkhov.\n\n[Github](https://github.com/RichardErkhov)\n\n[Discord](https://discord.gg/pvy7H8DZMG)\n\n[Request more models](https://github.com/RichardErkhov/quant_request)\n\n\nPowerMoE-3b - GGUF\n- Model creator: https://huggingface.co/ibm/\n- Original model: https://huggingface.co/ibm/PowerMoE-3b/\n\n\n| Name | Quant method | Size |\n| ---- | ---- | ---- |\n| [PowerMoE-3b.Q2_K.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q2_K.gguf) | Q2_K | 1.18GB |\n| [PowerMoE-3b.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.IQ3_XS.gguf) | IQ3_XS | 1.32GB |\n| [PowerMoE-3b.IQ3_S.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.IQ3_S.gguf) | IQ3_S | 1.39GB |\n| [PowerMoE-3b.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q3_K_S.gguf) | Q3_K_S | 1.39GB |\n| [PowerMoE-3b.IQ3_M.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.IQ3_M.gguf) | IQ3_M | 1.41GB |\n| [PowerMoE-3b.Q3_K.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q3_K.gguf) | Q3_K | 1.53GB |\n| [PowerMoE-3b.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q3_K_M.gguf) | Q3_K_M | 1.53GB |\n| [PowerMoE-3b.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q3_K_L.gguf) | Q3_K_L | 1.65GB |\n| [PowerMoE-3b.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.IQ4_XS.gguf) | IQ4_XS | 1.72GB |\n| [PowerMoE-3b.Q4_0.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q4_0.gguf) | Q4_0 | 1.79GB |\n| [PowerMoE-3b.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.IQ4_NL.gguf) | IQ4_NL | 1.81GB |\n| 
[PowerMoE-3b.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q4_K_S.gguf) | Q4_K_S | 1.81GB |\n| [PowerMoE-3b.Q4_K.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q4_K.gguf) | Q4_K | 1.92GB |\n| [PowerMoE-3b.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q4_K_M.gguf) | Q4_K_M | 1.92GB |\n| [PowerMoE-3b.Q4_1.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q4_1.gguf) | Q4_1 | 1.99GB |\n| [PowerMoE-3b.Q5_0.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q5_0.gguf) | Q5_0 | 2.18GB |\n| [PowerMoE-3b.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q5_K_S.gguf) | Q5_K_S | 2.18GB |\n| [PowerMoE-3b.Q5_K.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q5_K.gguf) | Q5_K | 2.24GB |\n| [PowerMoE-3b.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q5_K_M.gguf) | Q5_K_M | 2.24GB |\n| [PowerMoE-3b.Q5_1.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q5_1.gguf) | Q5_1 | 2.37GB |\n| [PowerMoE-3b.Q6_K.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q6_K.gguf) | Q6_K | 2.59GB |\n| [PowerMoE-3b.Q8_0.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q8_0.gguf) | Q8_0 | 3.35GB |\n\n\n\n\nOriginal model description:\n---\npipeline_tag: text-generation\ninference: false\nlicense: apache-2.0\nlibrary_name: transformers\nmodel-index:\n- name: ibm/PowerMoE-3b\n  results:\n  - task:\n      type: text-generation\n    dataset:\n      type: lm-eval-harness\n      name: ARC\n    metrics:\n    - name: accuracy-norm\n      type: accuracy-norm\n      value: 58.1\n      verified: false\n  - task:\n      type: text-generation\n    dataset:\n     
 type: lm-eval-harness\n      name: BoolQ\n    metrics:\n    - name: accuracy\n      type: accuracy\n      value: 65.0\n      verified: false\n  - task:\n      type: text-generation\n    dataset:\n      type: lm-eval-harness\n      name: Hellaswag\n    metrics:\n    - name: accuracy-norm\n      type: accuracy-norm\n      value: 71.5\n      verified: false\n  - task:\n      type: text-generation\n    dataset:\n      type: lm-eval-harness\n      name: OpenBookQA\n    metrics:\n    - name: accuracy-norm\n      type: accuracy-norm\n      value: 41.0\n      verified: false\n  - task:\n      type: text-generation\n    dataset:\n      type: lm-eval-harness\n      name: PIQA\n    metrics:\n    - name: accuracy-norm\n      type: accuracy-norm\n      value: 79.1\n      verified: false\n  - task:\n      type: text-generation\n    dataset:\n      type: lm-eval-harness\n      name: Winogrande\n    metrics:\n    - name: accuracy-norm\n      type: accuracy-norm\n      value: 65.0\n      verified: false\n  - task:\n      type: text-generation\n    dataset:\n      type: lm-eval-harness\n      name: MMLU (5 shot)\n    metrics:\n    - name: accuracy\n      type: accuracy\n      value: 42.8\n      verified: false\n  - task:\n      type: text-generation\n    dataset:\n      type: lm-eval-harness\n      name: GSM8k (5 shot)\n    metrics:\n    - name: accuracy\n      type: accuracy\n      value: 25.9\n      verified: false\n  - task:\n      type: text-generation\n    dataset:\n      type: lm-eval-harness\n      name: math (4 shot)\n    metrics:\n    - name: accuracy\n      type: accuracy\n      value: 14.8\n      verified: false\n  - task:\n      type: text-generation\n    dataset:\n      type: bigcode-eval\n      name: humaneval\n    metrics:\n    - name: pass@1\n      type: pass@1\n      value: 20.1\n      verified: false\n  - task:\n      type: text-generation\n    dataset:\n      type: bigcode-eval\n      name: MBPP\n    metrics:\n    - name: pass@1\n      type: pass@1\n      value: 
32.4\n      verified: false\n---\n\n## Model Summary\nPowerMoE-3B is a 3B sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token. It is trained on a mix of open-source and proprietary datasets. PowerMoE-3B has shown promising results compared to other dense models with 2x activate parameters across various benchmarks, including natural language multi-choices, code generation, and math reasoning.\nPaper: https://arxiv.org/abs/2408.13359\n\n## Usage\nNote: Requires installing HF transformers from source.\n\n### Generation\nThis is a simple example of how to use **PowerMoE-3b** model.\n\n```python\nimport torch\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\ndevice = \"cuda\" # or \"cpu\"\nmodel_path = \"ibm/PowerMoE-3b\"\ntokenizer = AutoTokenizer.from_pretrained(model_path)\n# drop device_map if running on CPU\nmodel = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)\nmodel.eval()\n# change input text as desired\nprompt = \"Write a code to find the maximum value in a list of numbers.\"\n# tokenize the text\ninput_tokens = tokenizer(prompt, return_tensors=\"pt\")\n# transfer tokenized inputs to the device\nfor i in input_tokens:\n    input_tokens[i] = input_tokens[i].to(device)\n# generate output tokens\noutput = model.generate(**input_tokens, max_new_tokens=100)\n# decode output tokens into text\noutput = tokenizer.batch_decode(output)\n# loop over the batch to print, in this example the batch size is 1\nfor i in output:\n    print(i)\n```\n\n\nAdditional thanks to @nicoboss for giving me access to his private supercomputer, enabling me to provide many more quants, at much higher speed, than I would otherwise be able to.",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "arxiv:2408.13359",
    "endpoints_compatible",
    "region:us"
  ],
  "likes": 0,
  "downloads": 95,
  "gated": false,
  "private": false,
  "last_modified": "2024-10-21T04:23:17.000Z",
  "created_at": "2024-10-21T03:49:41.000Z",
  "pipeline_tag": "",
  "library_name": ""
}
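
The normalized record stores timestamps as ISO 8601 strings with a trailing `Z`, while the Model Details panel shows only the date. A minimal sketch of parsing them, assuming the record has already been loaded into a dict; the `record` literal copies only the two fields it needs, and `parse_hf_timestamp` is a hypothetical helper name:

```python
from datetime import datetime, timezone

# Only the two timestamp fields from the normalized metadata above.
record = {
    "created_at": "2024-10-21T03:49:41.000Z",
    "last_modified": "2024-10-21T04:23:17.000Z",
}


def parse_hf_timestamp(value):
    """Parse a 'YYYY-MM-DDTHH:MM:SS.fffZ' string into an aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


created = parse_hf_timestamp(record["created_at"])
modified = parse_hf_timestamp(record["last_modified"])
elapsed = modified - created  # time between creation and last change

print(created.date().isoformat())  # date as shown in the details panel
print(elapsed)
```

For this record the two timestamps fall on the same day, 33 minutes and 36 seconds apart, which is why Created and Last Modified show the same date.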

Source payload excerpt (from Hugging Face API)

{
  "_id": "6715cf557ebc9ce65c1153e8",
  "id": "RichardErkhov/ibm_-_PowerMoE-3b-gguf",
  "modelId": "RichardErkhov/ibm_-_PowerMoE-3b-gguf",
  "sha": "51ec1d031a4245a2fc087aea92702384604dd228",
  "createdAt": "2024-10-21T03:49:41.000Z",
  "lastModified": "2024-10-21T04:23:17.000Z",
  "author": "RichardErkhov",
  "downloads": 95,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "",
  "siblings_count": 24
}
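
The source payload uses camelCase for its timestamps (`createdAt`, `lastModified`) where the normalized metadata uses snake_case (`created_at`, `last_modified`). How the sheet performs that mapping is not documented here; the sketch below is one plausible way to derive the scalar fields, with `KEY_MAP` and `normalize_payload` as hypothetical names inferred by comparing the two JSON blocks.

```python
import json

# Hypothetical key mapping, inferred by comparing the two JSON blocks.
KEY_MAP = {
    "createdAt": "created_at",
    "lastModified": "last_modified",
    "pipeline_tag": "pipeline_tag",
    "library_name": "library_name",
    "downloads": "downloads",
    "likes": "likes",
    "gated": "gated",
    "private": "private",
}


def normalize_payload(payload):
    """Copy the mapped scalar fields from an API payload into snake_case keys."""
    return {new: payload[old] for old, new in KEY_MAP.items() if old in payload}


# Subset of the source payload excerpt above.
payload = json.loads("""
{
  "id": "RichardErkhov/ibm_-_PowerMoE-3b-gguf",
  "createdAt": "2024-10-21T03:49:41.000Z",
  "lastModified": "2024-10-21T04:23:17.000Z",
  "downloads": 95,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": ""
}
""")

normalized = normalize_payload(payload)
print(json.dumps(normalized, indent=2))
```

Keys absent from `KEY_MAP`, such as `id`, are dropped by this sketch; a fuller version would also carry over tags and card data.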