Model Intelligence Sheet

unsloth/ministral-3-14b-reasoning-2512-gguf overview

The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. It is a powerful and efficient language model with vision capabilities. This is the reasoning post-trained version, trained for reasoning tasks, which makes it well suited to math, coding, and other STEM use cases. The Ministral 3 family is designed for edge deployment and runs on a wide range of hardware. Ministral 3 14B can even be deployed locally: it fits in 32GB of VRAM in BF16 and in less than 24GB of RAM/VRAM when quantized.
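As a rough check on these memory figures, weight storage is approximately parameter count × bits per weight ÷ 8 bytes. The sketch below is an illustration under stated assumptions, not the publisher's sizing method. It uses the 13.5B language-model parameter count from the embedded model card, ignores the 0.4B vision encoder, KV cache, and runtime buffers, and assumes the standard GGUF block layouts (Q8_0 ≈ 8.5 and Q4_0 ≈ 4.5 bits per weight, including per-block scales).

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB: num_params * bits_per_weight / 8 bytes, over 2**30."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


LM_PARAMS = 13.5e9  # language-model parameters quoted in the model card; vision encoder excluded

bf16 = weight_memory_gib(LM_PARAMS, 16)  # BF16: 16 bits per weight
q8 = weight_memory_gib(LM_PARAMS, 8.5)  # Q8_0: 32 int8 weights + one 16-bit scale per block
q4 = weight_memory_gib(LM_PARAMS, 4.5)  # Q4_0: 32 4-bit weights + one 16-bit scale per block

print(f"BF16 ~{bf16:.1f} GiB, Q8_0 ~{q8:.1f} GiB, Q4_0 ~{q4:.1f} GiB")
# -> BF16 ~25.1 GiB, Q8_0 ~13.4 GiB, Q4_0 ~7.1 GiB
```

The BF16 and Q8_0 estimates land close to the 25.17 GB and 13.37 GB files listed below, which suggests those sizes are in GiB. The Q4_0 file (7.27 GB) is somewhat larger than its estimate, plausibly because some tensors are kept at higher precision.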

Tags: vllm, gguf, mistral-common, mistral, unsloth, en, fr, es, de, it, pt, nl, zh, ja, ko, ar, base_model:mistralai/Ministral-3-14B-Reasoning-2512, base_model:quantized:mistralai/Ministral-3-14B-Reasoning-2512, license:apache-2.0, region:us, conversational

Downloads

20,972

Likes

44

Pipeline

—

Library

vllm

Visibility

Public

Access

Open

Repository Files & Downloads

29 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
Ministral-3-14B-Reasoning-2512-BF16.gguf	GGUF	BF16	25.17 GB	Download
Ministral-3-14B-Reasoning-2512-IQ4_NL.gguf	GGUF	IQ4_NL	7.27 GB	Download
Ministral-3-14B-Reasoning-2512-IQ4_XS.gguf	GGUF	IQ4_XS	6.92 GB	Download
Ministral-3-14B-Reasoning-2512-Q2_K.gguf	GGUF	Q2_K	4.89 GB	Download
Ministral-3-14B-Reasoning-2512-Q2_K_L.gguf	GGUF	Q2_K_L	5.03 GB	Download
Ministral-3-14B-Reasoning-2512-Q3_K_M.gguf	GGUF	Q3_K_M	6.22 GB	Download
Ministral-3-14B-Reasoning-2512-Q3_K_S.gguf	GGUF	Q3_K_S	5.66 GB	Download
Ministral-3-14B-Reasoning-2512-Q4_0.gguf	GGUF	Q4_0	7.27 GB	Download
Ministral-3-14B-Reasoning-2512-Q4_1.gguf	GGUF	Q4_1	7.99 GB	Download
Ministral-3-14B-Reasoning-2512-Q4_K_M.gguf	GGUF	Q4_K_M	7.67 GB	Download
Ministral-3-14B-Reasoning-2512-Q4_K_S.gguf	GGUF	Q4_K_S	7.30 GB	Download
Ministral-3-14B-Reasoning-2512-Q5_K_M.gguf	GGUF	Q5_K_M	8.96 GB	Download
Ministral-3-14B-Reasoning-2512-Q5_K_S.gguf	GGUF	Q5_K_S	8.74 GB	Download
Ministral-3-14B-Reasoning-2512-Q6_K.gguf	GGUF	Q6_K	10.33 GB	Download
Ministral-3-14B-Reasoning-2512-Q8_0.gguf	GGUF	Q8_0	13.37 GB	Download
Ministral-3-14B-Reasoning-2512-UD-IQ1_M.gguf	GGUF	IQ1_M	3.42 GB	Download
Ministral-3-14B-Reasoning-2512-UD-IQ1_S.gguf	GGUF	IQ1_S	3.21 GB	Download
Ministral-3-14B-Reasoning-2512-UD-IQ2_M.gguf	GGUF	IQ2_M	4.57 GB	Download
Ministral-3-14B-Reasoning-2512-UD-IQ2_XXS.gguf	GGUF	IQ2_XXS	3.78 GB	Download
Ministral-3-14B-Reasoning-2512-UD-IQ3_XXS.gguf	GGUF	IQ3_XXS	5.12 GB	Download
Ministral-3-14B-Reasoning-2512-UD-Q2_K_XL.gguf	GGUF	Q2_K_XL	5.15 GB	Download
Ministral-3-14B-Reasoning-2512-UD-Q3_K_XL.gguf	GGUF	Q3_K_XL	6.46 GB	Download
Ministral-3-14B-Reasoning-2512-UD-Q4_K_XL.gguf	GGUF	Q4_K_XL	7.79 GB	Download
Ministral-3-14B-Reasoning-2512-UD-Q5_K_XL.gguf	GGUF	Q5_K_XL	8.98 GB	Download
Ministral-3-14B-Reasoning-2512-UD-Q6_K_XL.gguf	GGUF	Q6_K_XL	11.29 GB	Download
Ministral-3-14B-Reasoning-2512-UD-Q8_K_XL.gguf	GGUF	Q8_K_XL	15.94 GB	Download
mmproj-BF16.gguf	GGUF	BF16	838.53 MB	Download
mmproj-F16.gguf	GGUF	F16	837.38 MB	Download
mmproj-F32.gguf	GGUF	F32	1.64 GB	Download
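One way to use the table above is to pick the largest quant that fits a memory budget. This is a minimal sketch, not an Unsloth tool: the sizes are copied from a subset of the table, `headroom_gb` is a hypothetical allowance for KV cache and runtime buffers (the real amount depends on context length and the runtime), and the `mmproj` size conversion assumes the table's MB and GB share a base of 1024.

```python
from typing import Optional

# File sizes in GB, copied from the repository table above (a subset of the listed quants).
QUANT_SIZES_GB = {
    "Q2_K": 4.89,
    "Q3_K_M": 6.22,
    "IQ4_XS": 6.92,
    "Q4_K_M": 7.67,
    "Q5_K_M": 8.96,
    "Q6_K": 10.33,
    "Q8_0": 13.37,
    "BF16": 25.17,
}

# mmproj-F16.gguf is listed as 837.38 MB; converted assuming 1 GB = 1024 MB here.
MMPROJ_F16_GB = 837.38 / 1024


def pick_quant(budget_gb: float, headroom_gb: float = 2.0, with_vision: bool = False) -> Optional[str]:
    """Return the largest listed quant whose file size, plus headroom, fits in budget_gb."""
    # headroom_gb is a hypothetical allowance, not a measured requirement.
    available = budget_gb - headroom_gb
    if with_vision:
        available -= MMPROJ_F16_GB  # the projector file must be loaded too for image input
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= available]
    if not fitting:
        return None
    return max(fitting)[1]  # tuples compare by size first


print(pick_quant(24))  # -> Q8_0
print(pick_quant(11, with_vision=True))  # -> Q4_K_M
```

Treat the result as a starting point: long contexts need far more than 2 GB of headroom.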

Model Details

Model Slug

unsloth/ministral-3-14b-reasoning-2512-gguf

Author

unsloth

Pipeline Task

—

Library

vllm

Created

2025-12-02

Last Modified

2025-12-04

Gated

No

Private

No

HF SHA

2d687ca0523d4d87469835433b53642dfae83152
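To make downloads reproducible, a file can be fetched at this exact commit instead of the moving `main` branch. The sketch below builds a Hugging Face `resolve` URL from the canonical repository ID in the API payload and the SHA above; the helper name `resolve_url` is hypothetical.

```python
from urllib.parse import quote

REPO_ID = "unsloth/Ministral-3-14B-Reasoning-2512-GGUF"  # canonical ID from the API payload
REVISION = "2d687ca0523d4d87469835433b53642dfae83152"  # HF SHA of the snapshot described here


def resolve_url(filename: str, repo_id: str = REPO_ID, revision: str = REVISION) -> str:
    """Build a Hugging Face download URL pinned to a specific revision (hypothetical helper)."""
    # quote() leaves "/" intact, so files in subfolders still resolve.
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(filename)}"


print(resolve_url("Ministral-3-14B-Reasoning-2512-Q4_K_M.gguf"))
```

With `huggingface_hub` installed, `hf_hub_download(repo_id=REPO_ID, filename=..., revision=REVISION)` fetches the same pinned file into the local cache and returns its path.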

License

apache-2.0

Language

en, fr, es, de, it, pt, nl, zh, ja, ko, ar

Base Model

mistralai/Ministral-3-14B-Reasoning-2512

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "library_name": "vllm",
    "language": [
      "en",
      "fr",
      "es",
      "de",
      "it",
      "pt",
      "nl",
      "zh",
      "ja",
      "ko",
      "ar"
    ],
    "license": "apache-2.0",
    "inference": false,
    "base_model": [
      "mistralai/Ministral-3-14B-Reasoning-2512"
    ],
    "extra_gated_description": "If you want to learn more about how we process your personal data, please read our <a href=\"https://mistral.ai/terms/\">Privacy Policy</a>.",
    "tags": [
      "mistral-common",
      "mistral",
      "unsloth"
    ],
    "frontmatter": {
      "library_name": "vllm",
      "language": [
        "en",
        "fr",
        "es",
        "de",
        "it",
        "pt",
        "nl",
        "zh",
        "ja",
        "ko",
        "ar"
      ],
      "license": "apache-2.0",
      "inference": "false",
      "base_model": [
        "mistralai/Ministral-3-14B-Reasoning-2512"
      ],
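      "extra_gated_description": "If you want to learn more about how we process your personal data, please read our <a href=\"https://mistral.ai/terms/\">Privacy Policy</a>.",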
      "tags": [
        "mistral-common",
        "mistral",
        "unsloth"
      ]
    },
    "hero_image_url": "https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png",
    "summary": "The largest model in the Ministral 3 family, **Ministral 3 14B** offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language model with vision capabilities. This model is the reasoning post-trained version, trained for reasoning tasks, making it ideal for math, coding and stem related use cases. The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 14B can even be deployed locally, capable of fitting in 32GB of VRAM in BF16, and less than 24GB of RAM/VRAM when quantized.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlibrary_name: vllm\nlanguage:\n- en\n- fr\n- es\n- de\n- it\n- pt\n- nl\n- zh\n- ja\n- ko\n- ar\nlicense: apache-2.0\ninference: false\nbase_model:\n- mistralai/Ministral-3-14B-Reasoning-2512\nextra_gated_description: >-\n  If you want to learn more about how we process your personal data, please read\n  our <a href=\"https://mistral.ai/terms/\">Privacy Policy</a>.\ntags:\n- mistral-common\n- mistral\n- unsloth\n---\n<div>\n  <p style=\"margin-bottom: 0; margin-top: 0;\">\n    <strong>See our <a href=\"https://huggingface.co/collections/unsloth/ministral-3\">Ministral 3 collection</a> for all versions including GGUF, 4-bit & FP8 formats.</strong>\n  </p>\n  <p style=\"margin-bottom: 0;\">\n    <em>Learn to run Ministral correctly - <a href=\"https://docs.unsloth.ai/new/ministral-3\">Read our Guide</a>.</em>\n  </p>\n<p style=\"margin-top: 0;margin-bottom: 0;\">\n   <em>See <a href=\"https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf\">Unsloth Dynamic 2.0 GGUFs</a> for our quantization benchmarks.</em>\n  </p>\n  <div style=\"display: flex; gap: 5px; align-items: center; \">\n    <a href=\"https://github.com/unslothai/unsloth/\">\n      <img src=\"https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png\" width=\"133\">\n    </a>\n    <a href=\"https://discord.gg/unsloth\">\n      <img src=\"https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png\" width=\"173\">\n    </a>\n    <a href=\"https://docs.unsloth.ai/new/ministral-3\">\n      <img src=\"https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png\" width=\"143\">\n    </a>\n  </div>\n<h1 style=\"margin-top: 0rem;\">✨ Read our Ministral 3 Guide <a href=\"https://docs.unsloth.ai/new/ministral-3\">here</a>!</h1>\n</div>\n\n- Fine-tune Ministral 3 for free using our [Google Colab notebook](https://docs.unsloth.ai/new/ministral-3#fine-tuning)\n- Or train Ministral 3 with reinforcement 
learning (GSPO) with our [free notebook](https://docs.unsloth.ai/new/ministral-3#reinforcement-learning-grpo).\n- View the rest of our notebooks in our [docs here](https://docs.unsloth.ai/get-started/unsloth-notebooks).\n---\n\n# Ministral 3 14B Reasoning 2512\n\nThe largest model in the Ministral 3 family, **Ministral 3 14B** offers frontier capabilities and performance comparable to its larger [Mistral Small 3.2 24B](https://huggingface.co/mistralai/Mistral-Small-3.2-Instruct-2506) counterpart. A powerful and efficient language model with vision capabilities.\n\nThis model is the reasoning post-trained version, trained for reasoning tasks, making it ideal for math, coding and stem related use cases.\n\nThe Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 14B can even be deployed locally, capable of fitting in 32GB of VRAM in BF16, and less than 24GB of RAM/VRAM when quantized.\n\n## Key Features\nMinistral 3 14B consists of two main architectural components:\n- **13.5B Language Model**\n- **0.4B Vision Encoder**\n\nThe Ministral 3 14B Reasoning model offers the following capabilities:\n- **Vision**: Enables the model to analyze images and provide insights based on visual content, in addition to text.\n- **Multilingual**: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.\n- **System Prompt**: Maintains strong adherence and support for system prompts.\n- **Agentic**: Offers best-in-class agentic capabilities with native function calling and JSON outputting.\n- **Reasoning**: Excels at complex, multi-step reasoning and dynamic problem-solving.\n- **Edge-Optimized**: Delivers best-in-class performance at a small scale, deployable anywhere.\n- **Apache 2.0 License**: Open-source license allowing usage and modification for both commercial and non-commercial purposes.\n- **Large Context Window**: Supports a 256k context 
window.\n\n### Use Cases\nPrivate AI deployments where advanced capabilities meet practical hardware constraints:\n- Private/custom chat and AI assistant deployments in constrained environments\n- Advanced local agentic use cases\n- Fine-tuning and specialization\n- And more...\n  \nBringing advanced AI capabilities to most environments.\n\n## Ministral 3 Family\n\n| Model Name                     | Type               | Precision | Link                                                                                     |\n|--------------------------------|--------------------|-----------|------------------------------------------------------------------------------------------|\n| Ministral 3 3B Base 2512       | Base pre-trained   | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512)                |\n| Ministral 3 3B Instruct 2512   | Instruct post-trained | FP8   | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512)            |\n| Ministral 3 3B Reasoning 2512  | Reasoning capable  | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512)           |\n| Ministral 3 8B Base 2512       | Base pre-trained   | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512)                |\n| Ministral 3 8B Instruct 2512   | Instruct post-trained | FP8    | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512)            |\n| Ministral 3 8B Reasoning 2512  | Reasoning capable  | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512)           |\n| Ministral 3 14B Base 2512      | Base pre-trained**   | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Base-2512)               |\n| Ministral 3 14B Instruct 2512  | Instruct post-trained | FP8    | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512)           |\n| **Ministral 3 14B Reasoning 
2512** | **Reasoning capable**  | **BF16**      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512)          |\n\nOther formats available [here](https://huggingface.co/collections/mistralai/ministral-3-more).\n\n## Benchmark Results\n\nWe compare Ministral 3 to similar sized models.\n\n### Reasoning\n\n| Model                     | AIME25      | AIME24      | GPQA Diamond | LiveCodeBench |\n|---------------------------|-------------|-------------|--------------|---------------|\n| **Ministral 3 14B**       | <u>0.850</u>| <u>0.898</u>| <u>0.712</u> | <u>0.646</u>  |\n| Qwen3-14B (Thinking)      | 0.737       | 0.837       | 0.663        | 0.593         |\n|                           |             |             |              |               |\n| **Ministral 3 8B**        | 0.787       | <u>0.860</u>| 0.668        | <u>0.616</u>  |\n| Qwen3-VL-8B-Thinking      | <u>0.798</u>| <u>0.860</u>| <u>0.671</u> | 0.580         |\n|                           |             |             |              |               |\n| **Ministral 3 3B**        | <u>0.721</u>| <u>0.775</u>| 0.534        | <u>0.548</u>  |\n| Qwen3-VL-4B-Thinking      | 0.697       | 0.729       | <u>0.601</u> | 0.513         |\n\n### Instruct\n\n| Model                     | Arena Hard  | WildBench  | MATH Maj@1  | MM MTBench       |\n|---------------------------|-------------|------------|-------------|------------------|\n| **Ministral 3 14B**       | <u>0.551</u>| <u>68.5</u>| <u>0.904</u>| <u>8.49</u>      |\n| Qwen3 14B (Non-Thinking)  | 0.427       | 65.1       | 0.870       | NOT MULTIMODAL   |\n| Gemma3-12B-Instruct       | 0.436       | 63.2       | 0.854       | 6.70             |\n|                           |             |            |             |                  |\n| **Ministral 3 8B**        | 0.509       | <u>66.8</u>| 0.876       | <u>8.08</u>      |\n| Qwen3-VL-8B-Instruct      | <u>0.528</u>| 66.3       | <u>0.946</u>| 8.00             |\n|                   
        |             |            |             |                  |\n| **Ministral 3 3B**        | 0.305       | <u>56.8</u>| 0.830       | 7.83             |\n| Qwen3-VL-4B-Instruct      | <u>0.438</u>| <u>56.8</u>| <u>0.900</u>| <u>8.01</u>      |\n| Qwen3-VL-2B-Instruct      | 0.163       | 42.2       | 0.786       | 6.36             |\n| Gemma3-4B-Instruct        | 0.318       | 49.1       | 0.759       | 5.23             |\n\n### Base\n\n| Model               | Multilingual MMLU | MATH CoT 2-Shot | AGIEval 5-shot | MMLU Redux 5-shot | MMLU 5-shot | TriviaQA 5-shot |\n|---------------------|-------------------|-----------------|----------------|-------------------|-------------|-----------------|\n| **Ministral 3 14B** | 0.742             | <u>0.676</u>    | 0.648          | 0.820             | 0.794       | 0.749           |\n| Qwen3 14B Base      | <u>0.754</u>      | 0.620           | <u>0.661</u>   | <u>0.837</u>      | <u>0.804</u>| 0.703           |\n| Gemma 3 12B Base    | 0.690             | 0.487           | 0.587          | 0.766             | 0.745       | <u>0.788</u>    |\n|                     |                   |                 |                |                   |             |                 |\n| **Ministral 3 8B**  | <u>0.706</u>      | <u>0.626</u>    | 0.591          | 0.793             | <u>0.761</u>| <u>0.681</u>    |\n| Qwen 3 8B Base      | 0.700             | 0.576           | <u>0.596</u>   | <u>0.794</u>      | 0.760       | 0.639           |\n|                     |                   |                 |                |                   |             |                 |\n| **Ministral 3 3B**  | 0.652             | <u>0.601</u>    | 0.511          | 0.735             | 0.707       | 0.592           |\n| Qwen 3 4B Base      | <u>0.677</u>      | 0.405           | <u>0.570</u>   | <u>0.759</u>      | <u>0.713</u>| 0.530           |\n| Gemma 3 4B Base     | 0.516             | 0.294           | 0.430          | 0.626             | 
0.589       | <u>0.640</u>    |\n\n## Usage\n\nThe model can be used with the following frameworks:\n- [`vllm`](https://github.com/vllm-project/vllm): See [here](#vllm)\n- [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)\n  \n### vLLM\n\nWe recommend using this model with [vLLM](https://github.com/vllm-project/vllm).\n\n#### Installation\n\nMake sure to install [`vLLM >= 0.12.0`](https://github.com/vllm-project/vllm/releases/tag/v0.12.0):\n\n```\npip install vllm --upgrade\n```\n\nDoing so should automatically install [`mistral_common >= 1.8.6`](https://github.com/mistralai/mistral-common/releases/tag/v1.8.6).\n\nTo check:\n```\npython -c \"import mistral_common; print(mistral_common.__version__)\"\n```\n\nYou can also use a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or pull one from [Docker Hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).\n\n#### Serve\n\nTo fully exploit `Ministral-3-14B-Reasoning-2512`, we recommend using 2xH200 GPUs for deployment due to its large context. However, if you don't need a large context, you can fall back to a single GPU.\n\nA simple launch command is:\n\n```bash\n\nvllm serve mistralai/Ministral-3-14B-Reasoning-2512-FP8 \\\n  --tensor-parallel-size 2 \\\n  --enable-auto-tool-choice --tool-call-parser mistral \\\n  --reasoning-parser mistral\n```\n\nKey parameter notes:\n\n* enable-auto-tool-choice: Required when enabling tool usage.\n* tool-call-parser mistral: Required when enabling tool usage.\n* reasoning-parser mistral: Required when enabling reasoning.\n\nAdditional flags:\n\n* You can set `--max-model-len` to preserve memory. 
By default it is set to `262144`, which is quite large and not necessary for most scenarios.\n* You can set `--max-num-batched-tokens` to balance throughput and latency: higher values give higher throughput but also higher latency.\n\n#### Usage of the model\n\nHere we assume that the model `mistralai/Ministral-3-14B-Reasoning-2512` is served and reachable at `localhost` on port `8000`, the vLLM default.\n\n<details>\n  <summary>Vision Reasoning</summary>\n\nLet's see if the Ministral 3 model knows when to pick a fight !\n\n```python\nfrom typing import Any\n\nfrom openai import OpenAI\nfrom huggingface_hub import hf_hub_download\n\n# Modify OpenAI's API key and API base to use vLLM's API server.\nopenai_api_key = \"EMPTY\"\nopenai_api_base = \"http://localhost:8000/v1\"\n\nTEMP = 0.7\nTOP_P = 0.95\nMAX_TOK = 262144\nclient = OpenAI(\n    api_key=openai_api_key,\n    base_url=openai_api_base,\n)\n\nmodels = client.models.list()\nmodel = models.data[0].id\n\n\ndef load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:\n    file_path = hf_hub_download(repo_id=repo_id, filename=filename)\n    with open(file_path, \"r\") as file:\n        system_prompt = file.read()\n\n    index_begin_think = system_prompt.find(\"[THINK]\")\n    index_end_think = system_prompt.find(\"[/THINK]\")\n\n    return {\n        \"role\": \"system\",\n        \"content\": [\n            {\"type\": \"text\", \"text\": system_prompt[:index_begin_think]},\n            {\n                \"type\": \"thinking\",\n                \"thinking\": system_prompt[\n                    index_begin_think + len(\"[THINK]\") : index_end_think\n                ],\n                \"closed\": True,\n            },\n            {\n                \"type\": \"text\",\n                \"text\": system_prompt[index_end_think + len(\"[/THINK]\") :],\n            },\n        ],\n    }\n\n\nSYSTEM_PROMPT = load_system_prompt(model, \"SYSTEM_PROMPT.txt\")\n\nimage_url = 
\"https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438\"\n\nmessages = [\n    SYSTEM_PROMPT,\n    {\n        \"role\": \"user\",\n        \"content\": [\n            {\n                \"type\": \"text\",\n                \"text\": \"What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.\",\n            },\n            {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n        ],\n    },\n]\n\n\nstream = client.chat.completions.create(\n    model=model,\n    messages=messages,\n    stream=True,\n    temperature=TEMP,\n    top_p=TOP_P,\n    max_tokens=MAX_TOK,\n)\n\nprint(\"client: Start streaming chat completions...:\\n\")\nprinted_reasoning_content = False\nanswer = []\n\nfor chunk in stream:\n    reasoning_content = None\n    content = None\n    # Check the content is reasoning_content or content\n    if hasattr(chunk.choices[0].delta, \"reasoning_content\"):\n        reasoning_content = chunk.choices[0].delta.reasoning_content\n    if hasattr(chunk.choices[0].delta, \"content\"):\n        content = chunk.choices[0].delta.content\n\n    if reasoning_content is not None:\n        if not printed_reasoning_content:\n            printed_reasoning_content = True\n            print(\"Start reasoning:\\n\", end=\"\", flush=True)\n        print(reasoning_content, end=\"\", flush=True)\n    elif content is not None:\n        # Extract and print the content\n        if not reasoning_content and printed_reasoning_content:\n            answer.extend(content)\n        print(content, end=\"\", flush=True)\n\nif answer:\n    print(\"\\n\\n=============\\nAnswer\\n=============\\n\")\n    print(\"\".join(answer))\nelse:\n    print(\"\\n\\n=============\\nNo Answer\\n=============\\n\")\n    print(\n        \"No answer was generated by the model, probably because the maximum number of tokens was reached.\"\n    )\n```\n\nNow we'll 
make it compute some maths !\n\n```python\nfrom typing import Any\n\nfrom openai import OpenAI\nfrom huggingface_hub import hf_hub_download\n\n# Modify OpenAI's API key and API base to use vLLM's API server.\nopenai_api_key = \"EMPTY\"\nopenai_api_base = \"http://localhost:8000/v1\"\n\nTEMP = 0.7\nTOP_P = 0.95\nMAX_TOK = 262144\nclient = OpenAI(\n    api_key=openai_api_key,\n    base_url=openai_api_base,\n)\n\nmodels = client.models.list()\nmodel = models.data[0].id\n\n\ndef load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:\n    file_path = hf_hub_download(repo_id=repo_id, filename=filename)\n    with open(file_path, \"r\") as file:\n        system_prompt = file.read()\n\n    index_begin_think = system_prompt.find(\"[THINK]\")\n    index_end_think = system_prompt.find(\"[/THINK]\")\n\n    return {\n        \"role\": \"system\",\n        \"content\": [\n            {\"type\": \"text\", \"text\": system_prompt[:index_begin_think]},\n            {\n                \"type\": \"thinking\",\n                \"thinking\": system_prompt[\n                    index_begin_think + len(\"[THINK]\") : index_end_think\n                ],\n                \"closed\": True,\n            },\n            {\n                \"type\": \"text\",\n                \"text\": system_prompt[index_end_think + len(\"[/THINK]\") :],\n            },\n        ],\n    }\n\n\nSYSTEM_PROMPT = load_system_prompt(model, \"SYSTEM_PROMPT.txt\")\n\nimage_url = \"https://i.ytimg.com/vi/5Y3xLHeyKZU/hqdefault.jpg\"\n\nmessages = [\n    SYSTEM_PROMPT,\n    {\n        \"role\": \"user\",\n        \"content\": [\n            {\n                \"type\": \"text\",\n                \"text\": \"Solve the equations. If they contain only numbers, use your calculator, else only think. 
Answer in the language of the image.\",\n            },\n            {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n        ],\n    },\n]\n\nstream = client.chat.completions.create(\n    model=model,\n    messages=messages,\n    stream=True,\n    temperature=TEMP,\n    top_p=TOP_P,\n    max_tokens=MAX_TOK,\n)\n\nprint(\"client: Start streaming chat completions...:\\n\")\nprinted_reasoning_content = False\nanswer = []\n\nfor chunk in stream:\n    reasoning_content = None\n    content = None\n    # Check the content is reasoning_content or content\n    if hasattr(chunk.choices[0].delta, \"reasoning_content\"):\n        reasoning_content = chunk.choices[0].delta.reasoning_content\n    if hasattr(chunk.choices[0].delta, \"content\"):\n        content = chunk.choices[0].delta.content\n\n    if reasoning_content is not None:\n        if not printed_reasoning_content:\n            printed_reasoning_content = True\n            print(\"Start reasoning:\\n\", end=\"\", flush=True)\n        print(reasoning_content, end=\"\", flush=True)\n    if content is not None:\n        # Extract and print the content\n        if not reasoning_content and printed_reasoning_content:\n            answer.extend(content)\n        print(content, end=\"\", flush=True)\n\nif answer:\n    print(\"\\n\\n=============\\nAnswer\\n=============\\n\")\n    print(\"\".join(answer))\nelse:\n    print(\"\\n\\n=============\\nNo Answer\\n=============\\n\")\n    print(\n        \"No answer was generated by the model, probably because the maximum number of tokens was reached.\"\n    )\n```\n\n</details>\n\n<details>\n  <summary>Text-Only Request</summary>\n\nLet's do more maths and leave it up to the model to figure out how to achieve a result.\n\n```python\nfrom typing import Any\nfrom openai import OpenAI\nfrom huggingface_hub import hf_hub_download\n\n# Modify OpenAI's API key and API base to use vLLM's API server.\nopenai_api_key = \"EMPTY\"\nopenai_api_base = 
\"http://localhost:8000/v1\"\n\nTEMP = 0.7\nTOP_P = 0.95\nMAX_TOK = 262144\nclient = OpenAI(\n    api_key=openai_api_key,\n    base_url=openai_api_base,\n)\n\nmodels = client.models.list()\nmodel = models.data[0].id\n\n\ndef load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:\n    file_path = hf_hub_download(repo_id=repo_id, filename=filename)\n    with open(file_path, \"r\") as file:\n        system_prompt = file.read()\n\n    index_begin_think = system_prompt.find(\"[THINK]\")\n    index_end_think = system_prompt.find(\"[/THINK]\")\n\n    return {\n        \"role\": \"system\",\n        \"content\": [\n            {\"type\": \"text\", \"text\": system_prompt[:index_begin_think]},\n            {\n                \"type\": \"thinking\",\n                \"thinking\": system_prompt[\n                    index_begin_think + len(\"[THINK]\") : index_end_think\n                ],\n                \"closed\": True,\n            },\n            {\n                \"type\": \"text\",\n                \"text\": system_prompt[index_end_think + len(\"[/THINK]\") :],\n            },\n        ],\n    }\n\n\nSYSTEM_PROMPT = load_system_prompt(model, \"SYSTEM_PROMPT.txt\")\n\nquery = \"Use each number in 2,5,6,3 exactly once, along with any combination of +, -, ×, ÷ (and parentheses for grouping), to make the number 24.\"\n\nmessages = [\n    SYSTEM_PROMPT,\n    {\"role\": \"user\", \"content\": query}\n]\nstream = client.chat.completions.create(\n  model=model,\n  messages=messages,\n  stream=True,\n  temperature=TEMP,\n  top_p=TOP_P,\n  max_tokens=MAX_TOK,\n)\n\nprint(\"client: Start streaming chat completions...:\\n\")\nprinted_reasoning_content = False\nanswer = []\n\nfor chunk in stream:\n    reasoning_content = None\n    content = None\n    # Check the content is reasoning_content or content\n    if hasattr(chunk.choices[0].delta, \"reasoning_content\"):\n        reasoning_content = chunk.choices[0].delta.reasoning_content\n    if 
hasattr(chunk.choices[0].delta, \"content\"):\n        content = chunk.choices[0].delta.content\n\n    if reasoning_content is not None:\n        if not printed_reasoning_content:\n            printed_reasoning_content = True\n            print(\"Start reasoning:\\n\", end=\"\", flush=True)\n        print(reasoning_content, end=\"\", flush=True)\n    if content is not None:\n        # Extract and print the content\n        if not reasoning_content and printed_reasoning_content:\n            answer.extend(content)\n        print(content, end=\"\", flush=True)\n\nif answer:\n    print(\"\\n\\n=============\\nAnswer\\n=============\\n\")\n    print(\"\".join(answer))\nelse:\n    print(\"\\n\\n=============\\nNo Answer\\n=============\\n\")\n    print(\"No answer was generated by the model, probably because the maximum number of tokens was reached.\")\n```\n\n</details>\n\n### Transformers\n\nYou can also use Ministral 3 14B Reasoning 2512 with `Transformers` !\nMake sure to install `Transformers` from its first v5 release candidate or from \"main\":\n\n```\npip install transformers==5.0.0rc0\n```\n\nTo make the best use of our model with `Transformers`, make sure to have [installed](https://github.com/mistralai/mistral-common) `mistral-common >= 1.8.6` to use our tokenizer.\n\n```bash\npip install mistral-common --upgrade\n```\n\nThen load our tokenizer along with the model and generate:\n\n<details>\n  <summary>Python snippet</summary>\n\n```python\nimport torch\nfrom transformers import Mistral3ForConditionalGeneration, MistralCommonBackend\n\nmodel_id = \"mistralai/Ministral-3-14B-Reasoning-2512\"\n\ntokenizer = MistralCommonBackend.from_pretrained(model_id)\nmodel = Mistral3ForConditionalGeneration.from_pretrained(\n    model_id, torch_dtype=torch.bfloat16, device_map=\"auto\"\n)\n\nimage_url = \"https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438\"\n\nmessages = [\n    {\n        \"role\": \"user\",\n        
\"content\": [\n            {\n                \"type\": \"text\",\n                \"text\": \"What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.\",\n            },\n            {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n        ],\n    },\n]\n\ntokenized = tokenizer.apply_chat_template(messages, return_tensors=\"pt\", return_dict=True)\n\ntokenized[\"input_ids\"] = tokenized[\"input_ids\"].to(device=\"cuda\")\ntokenized[\"pixel_values\"] = tokenized[\"pixel_values\"].to(dtype=torch.bfloat16, device=\"cuda\")\nimage_sizes = [tokenized[\"pixel_values\"].shape[-2:]]\n\noutput = model.generate(\n    **tokenized,\n    image_sizes=image_sizes,\n    max_new_tokens=8092,\n)[0]\n\ndecoded_output = tokenizer.decode(output[len(tokenized[\"input_ids\"][0]):])\nprint(decoded_output)\n```\n\n</details>\n\n## License\n\nThis model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).\n\n*You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.*",
    "related_quantizations": []
  },
  "tags": [
    "vllm",
    "gguf",
    "mistral-common",
    "mistral",
    "unsloth",
    "en",
    "fr",
    "es",
    "de",
    "it",
    "pt",
    "nl",
    "zh",
    "ja",
    "ko",
    "ar",
    "base_model:mistralai/Ministral-3-14B-Reasoning-2512",
    "base_model:quantized:mistralai/Ministral-3-14B-Reasoning-2512",
    "license:apache-2.0",
    "region:us",
    "conversational"
  ],
  "likes": 44,
  "downloads": 20972,
  "gated": false,
  "private": false,
  "last_modified": "2025-12-04T15:22:12.000Z",
  "created_at": "2025-12-02T11:09:17.000Z",
  "pipeline_tag": "",
  "library_name": "vllm"
}
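The `tags` array above mixes plain labels, language codes, and `key:value` tags. A minimal sketch for splitting them follows. It is a heuristic, not a documented Hub schema: it assumes two-letter lowercase tags are ISO 639-1 language codes and that the first colon separates key from value. The helper name `split_tags` is hypothetical.

```python
def split_tags(tags):
    """Group Hub tags into language codes, key:value pairs, and plain labels (heuristic)."""
    languages, pairs, labels = [], {}, []
    for tag in tags:
        if ":" in tag:
            key, value = tag.split(":", 1)  # split on the first colon only
            pairs.setdefault(key, []).append(value)
        elif len(tag) == 2 and tag.isalpha() and tag.islower():
            languages.append(tag)  # assumed to be an ISO 639-1 language code
        else:
            labels.append(tag)
    return languages, pairs, labels


# Copied from the "tags" array in the metadata above.
TAGS = [
    "vllm", "gguf", "mistral-common", "mistral", "unsloth",
    "en", "fr", "es", "de", "it", "pt", "nl", "zh", "ja", "ko", "ar",
    "base_model:mistralai/Ministral-3-14B-Reasoning-2512",
    "base_model:quantized:mistralai/Ministral-3-14B-Reasoning-2512",
    "license:apache-2.0", "region:us", "conversational",
]

languages, pairs, labels = split_tags(TAGS)
print(languages)
print(pairs["base_model"])
```

For this record, the extracted language codes match `card_data.language`, and the two `base_model` values show both the plain base model and the `quantized:` relation.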

Source payload excerpt (from Hugging Face API)

{
  "_id": "692ec8dd645b17fbc34cc271",
  "id": "unsloth/Ministral-3-14B-Reasoning-2512-GGUF",
  "modelId": "unsloth/Ministral-3-14B-Reasoning-2512-GGUF",
  "sha": "2d687ca0523d4d87469835433b53642dfae83152",
  "createdAt": "2025-12-02T11:09:17.000Z",
  "lastModified": "2025-12-04T15:22:12.000Z",
  "author": "unsloth",
  "downloads": 20972,
  "likes": 44,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "vllm",
  "siblings_count": 32
}
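The payload reports `siblings_count: 32`, while the download table lists 29 GGUF files (26 weight files plus 3 `mmproj` files). The difference is presumably non-GGUF files such as the model card. The excerpt does not list them, so that is an assumption. The `mmproj-*` files are the multimodal projector that GGUF runtimes such as llama.cpp load alongside a weights file for image input. A minimal sketch for separating these groups in a file listing follows. The helper name `classify_files` is hypothetical, and the sample listing is illustrative.

```python
def classify_files(filenames):
    """Split a repository file listing into GGUF weights, mmproj projector files, and other files."""
    weights, projectors, other = [], [], []
    for name in filenames:
        if not name.endswith(".gguf"):
            other.append(name)  # e.g. the model card or Git attributes
        elif name.startswith("mmproj-"):
            projectors.append(name)  # vision projector, loaded next to a weights file for image input
        else:
            weights.append(name)
    return weights, projectors, other


# Illustrative listing; README.md and .gitattributes are assumed examples, not taken from the payload.
listing = [
    ".gitattributes",
    "README.md",
    "Ministral-3-14B-Reasoning-2512-Q4_K_M.gguf",
    "Ministral-3-14B-Reasoning-2512-UD-Q4_K_XL.gguf",
    "mmproj-F16.gguf",
]
weights, projectors, other = classify_files(listing)
print(len(weights), len(projectors), len(other))  # -> 2 1 2
```

With `huggingface_hub` installed, `list_repo_files("unsloth/Ministral-3-14B-Reasoning-2512-GGUF")` returns the actual listing to pass in.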