duyntnet/mistral-nemo-minitron-8b-instruct-imatrix-gguf (Q4_K_S GGUF, free download) is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. Use this page to pick a local model for guIDE or other runtimes. Related models in the same shard are listed below.

Model Intelligence Sheet

duyntnet/mistral-nemo-minitron-8b-instruct-imatrix-gguf overview

This repository provides imatrix GGUF quantizations of nvidia/Mistral-NeMo-Minitron-8B-Instruct. Mistral-NeMo-Minitron-8B-Instruct is a model for generating responses for various text-generation tasks, including roleplaying, retrieval-augmented generation, and function calling. It is a fine-tuned version of nvidia/Mistral-NeMo-Minitron-8B-Base, which was pruned and distilled from Mistral-NeMo 12B using NVIDIA's LLM compression technique (arXiv:2407.14679). The model was trained using multi-stage SFT and preference-based alignment with NeMo Aligner; for details on the alignment technique, see the Nemotron-4 340B Technical Report (arXiv:2406.11704). The model supports a context length of 8,192 tokens and can be tried on build.nvidia.com.

Model Developer: NVIDIA

Model Dates: Mistral-NeMo-Minitron-8B-Instruct was trained between August 2024 and September 2024.

Tags: transformers, gguf, imatrix, Mistral-NeMo-Minitron-8B-Instruct, text-generation, en, arxiv:2407.14679, arxiv:2406.11704, license:other, region:us, conversational

Downloads

936

Likes

0

Pipeline

text-generation

Library

transformers

Visibility

Public

Access

Open

Repository Files & Downloads

27 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
Mistral-NeMo-Minitron-8B-Instruct-IQ1_M.gguf	GGUF	IQ1_M	2.11 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-IQ1_S.gguf	GGUF	IQ1_S	1.98 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-IQ2_M.gguf	GGUF	IQ2_M	2.89 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-IQ2_S.gguf	GGUF	IQ2_S	2.70 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-IQ2_XS.gguf	GGUF	IQ2_XS	2.54 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-IQ2_XXS.gguf	GGUF	IQ2_XXS	2.34 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-IQ3_M.gguf	GGUF	IQ3_M	3.70 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-IQ3_S.gguf	GGUF	IQ3_S	3.59 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-IQ3_XS.gguf	GGUF	IQ3_XS	3.43 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-IQ3_XXS.gguf	GGUF	IQ3_XXS	3.19 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-IQ4_NL.gguf	GGUF	IQ4_NL	4.56 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-IQ4_XS.gguf	GGUF	IQ4_XS	4.34 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-Q2_K.gguf	GGUF	Q2_K	3.10 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-Q2_K_S.gguf	GGUF	Q2_K_S	2.91 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-Q3_K_L.gguf	GGUF	Q3_K_L	4.23 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-Q3_K_M.gguf	GGUF	Q3_K_M	3.92 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-Q3_K_S.gguf	GGUF	Q3_K_S	3.57 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-Q4_0.gguf	GGUF	Q4_0	4.56 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-Q4_1.gguf	GGUF	Q4_1	5.00 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-Q4_K_M.gguf	GGUF	Q4_K_M	4.79 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-Q4_K_S.gguf	GGUF	Q4_K_S	4.57 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-Q5_0.gguf	GGUF	Q5_0	5.48 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-Q5_1.gguf	GGUF	Q5_1	5.92 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-Q5_K_M.gguf	GGUF	Q5_K_M	5.59 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-Q5_K_S.gguf	GGUF	Q5_K_S	5.46 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-Q6_K.gguf	GGUF	Q6_K	6.44 GB	Download
Mistral-NeMo-Minitron-8B-Instruct-Q8_0.gguf	GGUF	Q8_0	8.33 GB	Download
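
Choosing among the files above usually comes down to how much memory is available. The following is a minimal sketch, not part of the repository, that picks the largest listed quantization fitting a memory budget and builds a direct download URL. The sizes are copied from the table; the helper names `pick_quant` and `download_url`, the `headroom_gb` allowance for context and runtime overhead, and the assumption that files live on the `main` branch are illustrative choices, and on-disk size is only a rough proxy for the memory a runtime actually needs.

```python
from typing import Optional

# Repository ID as reported in the Hugging Face API payload on this page.
REPO_ID = "duyntnet/Mistral-NeMo-Minitron-8B-Instruct-imatrix-GGUF"

# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "IQ1_S": 1.98, "IQ1_M": 2.11, "IQ2_XXS": 2.34, "IQ2_XS": 2.54,
    "IQ2_S": 2.70, "IQ2_M": 2.89, "Q2_K_S": 2.91, "Q2_K": 3.10,
    "IQ3_XXS": 3.19, "IQ3_XS": 3.43, "Q3_K_S": 3.57, "IQ3_S": 3.59,
    "IQ3_M": 3.70, "Q3_K_M": 3.92, "Q3_K_L": 4.23, "IQ4_XS": 4.34,
    "IQ4_NL": 4.56, "Q4_0": 4.56, "Q4_K_S": 4.57, "Q4_K_M": 4.79,
    "Q4_1": 5.00, "Q5_K_S": 5.46, "Q5_0": 5.48, "Q5_K_M": 5.59,
    "Q5_1": 5.92, "Q6_K": 6.44, "Q8_0": 8.33,
}


def pick_quant(budget_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest quantization whose file fits the budget.

    headroom_gb is an assumed allowance for context and runtime overhead;
    tune it for your runtime and context length.
    """
    limit = budget_gb - headroom_gb
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= limit}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def download_url(quant: str) -> str:
    """Build a direct download URL, assuming the file is on the main branch."""
    filename = f"Mistral-NeMo-Minitron-8B-Instruct-{quant}.gguf"
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{filename}"


if __name__ == "__main__":
    choice = pick_quant(budget_gb=8.0, headroom_gb=1.5)
    print(choice, download_url(choice) if choice else "nothing fits")
```

Picking by file size alone ignores quality differences between quantization families of similar size, so treat the result as a starting point rather than a recommendation.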

Model Details Live

Model Slug

duyntnet/mistral-nemo-minitron-8b-instruct-imatrix-gguf

Author

duyntnet

Pipeline Task

text-generation

Library

transformers

Created

2024-12-11

Last Modified

2024-12-12

Gated

No

Private

No

HF SHA

d0939eb560b021c912a2ae5ed996032bbfce3fa1

License

other

Language

en

Base Model

nvidia/Mistral-NeMo-Minitron-8B-Instruct (per the repository README; not set in the card metadata)

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "license": "other",
    "language": [
      "en"
    ],
    "pipeline_tag": "text-generation",
    "inference": false,
    "tags": [
      "transformers",
      "gguf",
      "imatrix",
      "Mistral-NeMo-Minitron-8B-Instruct"
    ],
    "frontmatter": {
      "license": "other",
      "language": [
        "en"
      ],
      "pipeline_tag": "text-generation",
      "inference": "false",
      "tags": [
        "transformers",
        "gguf",
        "imatrix",
        "Mistral-NeMo-Minitron-8B-Instruct"
      ]
    },
    "hero_image_url": "",
    "summary": "Mistral-NeMo-Minitron-8B-Instruct is a model for generating responses for various text-generation tasks including roleplaying, retrieval augmented generation, and function calling. It is a fine-tuned version of nvidia/Mistral-NeMo-Minitron-8B-Base, which was pruned and distilled from Mistral-NeMo 12B using our LLM compression technique. The model was trained using a multi-stage SFT and preference-based alignment technique with NeMo Aligner. For details on the alignment technique, please refer to the Nemotron-4 340B Technical Report. The model supports a context length of 8,192 tokens. Try this model on build.nvidia.com. **Model Developer:** NVIDIA **Model Dates:** Mistral-NeMo-Minitron-8B-Instruct was trained between August 2024 and September 2024.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlicense: other\nlanguage:\n- en\npipeline_tag: text-generation\ninference: false\ntags:\n- transformers\n- gguf\n- imatrix\n- Mistral-NeMo-Minitron-8B-Instruct\n---\nQuantizations of https://huggingface.co/nvidia/Mistral-NeMo-Minitron-8B-Instruct\n\n### Inference Clients/UIs\n* [llama.cpp](https://github.com/ggerganov/llama.cpp)\n* [KoboldCPP](https://github.com/LostRuins/koboldcpp)\n* [ollama](https://github.com/ollama/ollama)\n* [jan](https://github.com/janhq/jan)\n* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)\n* [GPT4All](https://github.com/nomic-ai/gpt4all)\n---\n\n# From original readme\n\nMistral-NeMo-Minitron-8B-Instruct is a model for generating responses for various text-generation tasks including roleplaying, retrieval augmented generation, and function calling. It is a fine-tuned version of [nvidia/Mistral-NeMo-Minitron-8B-Base](https://huggingface.co/nvidia/Mistral-NeMo-Minitron-8B-Base), which was pruned and distilled from [Mistral-NeMo 12B](https://huggingface.co/nvidia/Mistral-NeMo-12B-Base) using [our LLM compression technique](https://arxiv.org/abs/2407.14679). The model was trained using a multi-stage SFT and preference-based alignment technique with [NeMo Aligner](https://github.com/NVIDIA/NeMo-Aligner). For details on the alignment technique, please refer to the [Nemotron-4 340B Technical Report](https://arxiv.org/abs/2406.11704). 
The model supports a context length of 8,192 tokens.\n\nTry this model on [build.nvidia.com](https://build.nvidia.com/nvidia/mistral-nemo-minitron-8b-8k-instruct).\n\n\n**Model Developer:** NVIDIA \n\n**Model Dates:** Mistral-NeMo-Minitron-8B-Instruct was trained between August 2024 and September 2024.\n\n## License\n\n[NVIDIA Open Model License](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf)\n\n## Model Architecture\n\nMistral-NeMo-Minitron-8B-Instruct uses a model embedding size of 4096, 32 attention heads, MLP intermediate dimension of 11520, with 40 layers in total. Additionally, it uses Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE).\n\n**Architecture Type:** Transformer Decoder (Auto-regressive Language Model) \n\n**Network Architecture:** Mistral-NeMo \n\n\n## Prompt Format:\n\nWe recommend using the following prompt template, which was used to fine-tune the model. The model may not perform optimally without it.\n\n```\n<extra_id_0>System\n{system prompt}\n\n<extra_id_1>User\n{prompt}\n<extra_id_1>Assistant\\n\n```\n\n- Note that a newline character `\\n` should be added at the end of the prompt.\n- We recommend using `<extra_id_1>` as a stop token.\n\n\n## Usage\n\n```\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\n\n# Load the tokenizer and model\ntokenizer  = AutoTokenizer.from_pretrained(\"nvidia/Mistral-NeMo-Minitron-8B-Instruct\")\nmodel = AutoModelForCausalLM.from_pretrained(\"nvidia/Mistral-NeMo-Minitron-8B-Instruct\")\n\n# Use the prompt template\nmessages = [\n    {\n        \"role\": \"system\",\n        \"content\": \"You are a friendly chatbot who always responds in the style of a pirate\",\n    },\n    {\"role\": \"user\", \"content\": \"How many helicopters can a human eat in one sitting?\"},\n ]\ntokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors=\"pt\")\n\noutputs = 
model.generate(tokenized_chat, stop_strings=[\"<extra_id_1>\"], tokenizer=tokenizer)\nprint(tokenizer.decode(outputs[0]))\n```\n\nYou can also use `pipeline` but you need to create a tokenizer object and assign it to the pipeline manually.\n\n```\nfrom transformers import AutoTokenizer\nfrom transformers import pipeline\n\ntokenizer  = AutoTokenizer.from_pretrained(\"nvidia/Mistral-NeMo-Minitron-8B-Instruct\")\n\nmessages = [\n    {\"role\": \"user\", \"content\": \"Who are you?\"},\n]\npipe = pipeline(\"text-generation\", model=\"nvidia/Mistral-NeMo-Minitron-8B-Instruct\")\npipe(messages, max_new_tokens=64, stop_strings=[\"<extra_id_1>\"], tokenizer=tokenizer)\n```",
    "related_quantizations": []
  },
  "tags": [
    "transformers",
    "gguf",
    "imatrix",
    "Mistral-NeMo-Minitron-8B-Instruct",
    "text-generation",
    "en",
    "arxiv:2407.14679",
    "arxiv:2406.11704",
    "license:other",
    "region:us",
    "conversational"
  ],
  "likes": 0,
  "downloads": 936,
  "gated": false,
  "private": false,
  "last_modified": "2024-12-12T22:23:04.000Z",
  "created_at": "2024-12-11T00:40:54.000Z",
  "pipeline_tag": "text-generation",
  "library_name": "transformers"
}
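
The README stored in `readme_markdown` above recommends a specific prompt template, a trailing newline after the `Assistant` turn, and `<extra_id_1>` as a stop token. For GGUF runtimes that take a raw prompt string, the template can be assembled by hand. The following is a minimal sketch under that assumption; the helper names `build_prompt` and `trim_at_stop` are hypothetical and not part of any library, and the stop-token trimming is only needed if your runtime does not handle stop strings itself.

```python
# Stop token recommended by the model README.
STOP_TOKEN = "<extra_id_1>"


def build_prompt(system_prompt: str, user_prompt: str) -> str:
    """Assemble a single-turn prompt following the README template.

    The template ends with a newline after 'Assistant', as the README notes.
    """
    return (
        f"<extra_id_0>System\n{system_prompt}\n\n"
        f"<extra_id_1>User\n{user_prompt}\n"
        f"<extra_id_1>Assistant\n"
    )


def trim_at_stop(generated: str, stop: str = STOP_TOKEN) -> str:
    """Cut generated text at the first stop token, if one appears."""
    return generated.split(stop, 1)[0]


if __name__ == "__main__":
    prompt = build_prompt(
        "You are a friendly chatbot who always responds in the style of a pirate",
        "How many helicopters can a human eat in one sitting?",
    )
    print(prompt)
```

Runtimes that read a chat template from the GGUF metadata may apply this format automatically, in which case passing structured messages is preferable to building the string manually.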

Source payload excerpt (from Hugging Face API)

{
  "_id": "6758df9609c9800fab5f824d",
  "id": "duyntnet/Mistral-NeMo-Minitron-8B-Instruct-imatrix-GGUF",
  "modelId": "duyntnet/Mistral-NeMo-Minitron-8B-Instruct-imatrix-GGUF",
  "sha": "d0939eb560b021c912a2ae5ed996032bbfce3fa1",
  "createdAt": "2024-12-11T00:40:54.000Z",
  "lastModified": "2024-12-12T22:23:04.000Z",
  "author": "duyntnet",
  "downloads": 936,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "transformers",
  "siblings_count": 29
}

More models in this shard