jeney/ministral-3-3b-reasoning-2512-gguf is indexed on GraySoft with repository links, GGUF quant files (including Q4_K_M), and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes. See related models in the same shard below.
jeney/ministral-3-3b-reasoning-2512-gguf overview
The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities. This model is the reasoning post-trained version, trained for reasoning tasks, making it ideal for math, coding, and STEM-related use cases. The Ministral 3 family is designed for edge deployment and can run on a wide range of hardware. Ministral 3 3B can even be deployed locally, fitting in 16GB of VRAM in BF16, and in less than 8GB of RAM/VRAM when quantized.
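As a rough cross-check of those memory figures, the sketch below estimates weight-only memory from the parameter counts given in the upstream model card (a 3.4B language model plus a 0.4B vision encoder). The function name and the use of decimal gigabytes are assumptions made for illustration. The estimate ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
# Weight-only memory estimate; illustrative arithmetic, not a measurement.
LM_PARAMS = 3.4e9      # language model parameters (from the model card)
VISION_PARAMS = 0.4e9  # vision encoder parameters (from the model card)


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Return the size of `params` weights in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


# BF16 stores each weight in 16 bits (2 bytes).
bf16_total = weight_gb(LM_PARAMS + VISION_PARAMS, 16)
print(f"BF16 weights (LM + vision): {bf16_total:.1f} GB")
```

Under these assumptions the BF16 weights alone come to about 7.6 GB, which is consistent with the stated 16GB budget once context and runtime overhead are added.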
Repository Files & Downloads
| File | Type | Quantization | Size | Link |
|---|---|---|---|---|
| Ministral-3-3B-Reasoning-2512-BF16.gguf | GGUF | BF16 | 6.39 GB | Download |
| Ministral-3-3B-Reasoning-2512-IQ4_NL.gguf | GGUF | IQ4_NL | 1.91 GB | Download |
| Ministral-3-3B-Reasoning-2512-IQ4_XS.gguf | GGUF | IQ4_XS | 1.82 GB | Download |
| Ministral-3-3B-Reasoning-2512-Q2_K.gguf | GGUF | Q2_K | 1.36 GB | Download |
| Ministral-3-3B-Reasoning-2512-Q2_K_L.gguf | GGUF | Q2_K_L | 1.36 GB | Download |
| Ministral-3-3B-Reasoning-2512-Q3_K_M.gguf | GGUF | Q3_K_M | 1.67 GB | Download |
| Ministral-3-3B-Reasoning-2512-Q3_K_S.gguf | GGUF | Q3_K_S | 1.53 GB | Download |
| Ministral-3-3B-Reasoning-2512-Q4_0.gguf | GGUF | Q4_0 | 1.91 GB | Download |
| Ministral-3-3B-Reasoning-2512-Q4_1.gguf | GGUF | Q4_1 | 2.08 GB | Download |
| Ministral-3-3B-Reasoning-2512-Q4_K_M.gguf | GGUF | Q4_K_M | 2.00 GB | Download |
| Ministral-3-3B-Reasoning-2512-Q4_K_S.gguf | GGUF | Q4_K_S | 1.91 GB | Download |
| Ministral-3-3B-Reasoning-2512-Q5_K_M.gguf | GGUF | Q5_K_M | 2.30 GB | Download |
| Ministral-3-3B-Reasoning-2512-Q5_K_S.gguf | GGUF | Q5_K_S | 2.25 GB | Download |
| Ministral-3-3B-Reasoning-2512-Q6_K.gguf | GGUF | Q6_K | 2.63 GB | Download |
| Ministral-3-3B-Reasoning-2512-Q8_0.gguf | GGUF | Q8_0 | 3.40 GB | Download |
| Ministral-3-3B-Reasoning-2512-UD-IQ1_M.gguf | GGUF | IQ1_M | 971.98 MB | Download |
| Ministral-3-3B-Reasoning-2512-UD-IQ1_S.gguf | GGUF | IQ1_S | 924.26 MB | Download |
| Ministral-3-3B-Reasoning-2512-UD-IQ2_M.gguf | GGUF | IQ2_M | 1.25 GB | Download |
| Ministral-3-3B-Reasoning-2512-UD-IQ2_XXS.gguf | GGUF | IQ2_XXS | 1.03 GB | Download |
| Ministral-3-3B-Reasoning-2512-UD-IQ3_XXS.gguf | GGUF | IQ3_XXS | 1.36 GB | Download |
| Ministral-3-3B-Reasoning-2512-UD-Q2_K_XL.gguf | GGUF | Q2_K_XL | 1.39 GB | Download |
| Ministral-3-3B-Reasoning-2512-UD-Q3_K_XL.gguf | GGUF | Q3_K_XL | 1.73 GB | Download |
| Ministral-3-3B-Reasoning-2512-UD-Q4_K_XL.gguf | GGUF | Q4_K_XL | 2.04 GB | Download |
| Ministral-3-3B-Reasoning-2512-UD-Q5_K_XL.gguf | GGUF | Q5_K_XL | 2.31 GB | Download |
| Ministral-3-3B-Reasoning-2512-UD-Q6_K_XL.gguf | GGUF | Q6_K_XL | 2.96 GB | Download |
| Ministral-3-3B-Reasoning-2512-UD-Q8_K_XL.gguf | GGUF | Q8_K_XL | 4.19 GB | Download |
| mmproj-BF16.gguf | GGUF | BF16 | 802.52 MB | Download |
| mmproj-F16.gguf | GGUF | F16 | 801.37 MB | Download |
| mmproj-F32.gguf | GGUF | F32 | 1.56 GB | Download |
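
To make the size column easier to act on, here is a minimal sketch that picks the largest quantization fitting a given memory budget. The sizes are copied from a subset of the table above; the function name `largest_fitting_quant` and the `headroom` multiplier (a stand-in for KV cache and runtime overhead) are hypothetical choices, not values published by the repository. If you use vision input, add the size of an mmproj file to the budget as well.

```python
from typing import Optional

# File sizes in GB, copied from the table above (subset; mmproj files excluded).
QUANT_SIZES_GB = {
    "Q2_K": 1.36,
    "Q3_K_M": 1.67,
    "Q4_K_M": 2.00,
    "Q5_K_M": 2.30,
    "Q6_K": 2.63,
    "Q8_0": 3.40,
    "BF16": 6.39,
}


def largest_fitting_quant(budget_gb: float, headroom: float = 1.5) -> Optional[str]:
    """Return the largest quant whose size times `headroom` fits the budget.

    `headroom` is an assumed multiplier covering KV cache and runtime
    overhead; tune it for your context length and runtime.
    """
    fitting = {
        name: size
        for name, size in QUANT_SIZES_GB.items()
        if size * headroom <= budget_gb
    }
    if not fitting:
        return None  # nothing fits within the budget
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (1.0, 4.0, 16.0):
        print(f"{budget:>5.1f} GB budget -> {largest_fitting_quant(budget)}")
```

File size is only a lower bound on memory use, so treat the result as a starting point and confirm it by loading the model in your runtime.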
Model Details
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"library_name": "vllm",
"language": [
"en",
"fr",
"es",
"de",
"it",
"pt",
"nl",
"zh",
"ja",
"ko",
"ar"
],
"license": "apache-2.0",
"inference": false,
"base_model": [
"mistralai/Ministral-3-3B-Reasoning-2512"
],
"extra_gated_description": "If you want to learn more about how we process your personal data, please read our <a href=\"https://mistral.ai/terms/\">Privacy Policy</a>.",
"tags": [
"mistral-common",
"mistral",
"unsloth"
],
"frontmatter": {
"library_name": "vllm",
"language": [
"en",
"fr",
"es",
"de",
"it",
"pt",
"nl",
"zh",
"ja",
"ko",
"ar"
],
"license": "apache-2.0",
"inference": "false",
"base_model": [
"mistralai/Ministral-3-3B-Reasoning-2512"
],
"extra_gated_description": ">-",
"tags": [
"mistral-common",
"mistral",
"unsloth"
]
},
"hero_image_url": "https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png",
"summary": "The smallest model in the Ministral 3 family, **Ministral 3 3B** is a powerful, efficient tiny language model with vision capabilities. This model is the reasoning post-trained version, trained for reasoning tasks, making it ideal for math, coding and stem related use cases. The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 3B can even be deployed locally, fitting in 16GB of VRAM in BF16, and less than 8GB of RAM/VRAM when quantized.",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nlibrary_name: vllm\nlanguage:\n- en\n- fr\n- es\n- de\n- it\n- pt\n- nl\n- zh\n- ja\n- ko\n- ar\nlicense: apache-2.0\ninference: false\nbase_model:\n- mistralai/Ministral-3-3B-Reasoning-2512\nextra_gated_description: >-\n If you want to learn more about how we process your personal data, please read\n our <a href=\"https://mistral.ai/terms/\">Privacy Policy</a>.\ntags:\n- mistral-common\n- mistral\n- unsloth\n---\n<div>\n <p style=\"margin-bottom: 0; margin-top: 0;\">\n <strong>See our <a href=\"https://huggingface.co/collections/unsloth/ministral-3\">Ministral 3 collection</a> for all versions including GGUF, 4-bit & FP8 formats.</strong>\n </p>\n <p style=\"margin-bottom: 0;\">\n <em>Learn to run Ministral correctly - <a href=\"https://docs.unsloth.ai/new/ministral-3\">Read our Guide</a>.</em>\n </p>\n<p style=\"margin-top: 0;margin-bottom: 0;\">\n <em>See <a href=\"https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf\">Unsloth Dynamic 2.0 GGUFs</a> for our quantization benchmarks.</em>\n </p>\n <div style=\"display: flex; gap: 5px; align-items: center; \">\n <a href=\"https://github.com/unslothai/unsloth/\">\n <img src=\"https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png\" width=\"133\">\n </a>\n <a href=\"https://discord.gg/unsloth\">\n <img src=\"https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png\" width=\"173\">\n </a>\n <a href=\"https://docs.unsloth.ai/new/ministral-3\">\n <img src=\"https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png\" width=\"143\">\n </a>\n </div>\n<h1 style=\"margin-top: 0rem;\">✨ Read our Ministral 3 Guide <a href=\"https://docs.unsloth.ai/new/ministral-3\">here</a>!</h1>\n</div>\n\n- Fine-tune Ministral 3 for free using our [Google Colab notebook](https://docs.unsloth.ai/new/ministral-3#fine-tuning)\n- Or train Ministral 3 with reinforcement learning (GSPO) with our [free 
notebook](https://docs.unsloth.ai/new/ministral-3#reinforcement-learning-grpo).\n- View the rest of our notebooks in our [docs here](https://docs.unsloth.ai/get-started/unsloth-notebooks).\n---\n# Ministral 3 3B Reasoning 2512\nThe smallest model in the Ministral 3 family, **Ministral 3 3B** is a powerful, efficient tiny language model with vision capabilities.\n\nThis model is the reasoning post-trained version, trained for reasoning tasks, making it ideal for math, coding and stem related use cases.\n\nThe Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 3B can even be deployed locally, fitting in 16GB of VRAM in BF16, and less than 8GB of RAM/VRAM when quantized.\n\n## Key Features\nMinistral 3 3B consists of two main architectural components:\n- **3.4B Language Model**\n- **0.4B Vision Encoder**\n\nThe Ministral 3 3B Reasoning model offers the following capabilities:\n- **Vision**: Enables the model to analyze images and provide insights based on visual content, in addition to text.\n- **Multilingual**: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.\n- **System Prompt**: Maintains strong adherence and support for system prompts.\n- **Agentic**: Offers best-in-class agentic capabilities with native function calling and JSON outputting.\n- **Reasoning**: Excels at complex, multi-step reasoning and dynamic problem-solving.\n- **Edge-Optimized**: Delivers best-in-class performance at a small scale, deployable anywhere.\n- **Apache 2.0 License**: Open-source license allowing usage and modification for both commercial and non-commercial purposes.\n- **Large Context Window**: Supports a 256k context window.\n\n### Use Cases\nIdeal for lightweight, real-time applications on edge or low-resource devices, such as:\n- Image captioning\n- Text classification\n- Real-time efficient translation\n- Data extraction\n- Short 
content generation\n- Fine-tuning and specialization\n- And more...\n \nBringing advanced AI capabilities to edge and distributed environments for embedded systems.\n\n## Ministral 3 Family\n\n| Model Name | Type | Precision | Link |\n|--------------------------------|--------------------|-----------|------------------------------------------------------------------------------------------|\n| Ministral 3 3B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512) |\n| Ministral 3 3B Instruct 2512 | Instruct post-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512) |\n| **Ministral 3 3B Reasoning 2512** | **Reasoning capable** | **BF16** | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512) |\n| Ministral 3 8B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512) |\n| Ministral 3 8B Instruct 2512 | Instruct post-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) |\n| Ministral 3 8B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512) |\n| Ministral 3 14B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Base-2512) |\n| Ministral 3 14B Instruct 2512 | Instruct post-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512) |\n| Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512) |\n\nOther formats available [here](https://huggingface.co/collections/mistralai/ministral-3-quants).\n\n## Benchmark Results\n\nWe compare Ministral 3 to similar sized models.\n\n### Reasoning\n\n| Model | AIME25 | AIME24 | GPQA Diamond | LiveCodeBench 
|\n|---------------------------|-------------|-------------|--------------|---------------|\n| **Ministral 3 14B** | <u>0.850</u>| <u>0.898</u>| <u>0.712</u> | <u>0.646</u> |\n| Qwen3-14B (Thinking) | 0.737 | 0.837 | 0.663 | 0.593 |\n| | | | | |\n| **Ministral 3 8B** | 0.787 | <u>0.860</u>| 0.668 | <u>0.616</u> |\n| Qwen3-VL-8B-Thinking | <u>0.798</u>| <u>0.860</u>| <u>0.671</u> | 0.580 |\n| | | | | |\n| **Ministral 3 3B** | <u>0.721</u>| <u>0.775</u>| 0.534 | <u>0.548</u> |\n| Qwen3-VL-4B-Thinking | 0.697 | 0.729 | <u>0.601</u> | 0.513 |\n\n### Instruct\n\n| Model | Arena Hard | WildBench | MATH Maj@1 | MM MTBench |\n|---------------------------|-------------|------------|-------------|------------------|\n| **Ministral 3 14B** | <u>0.551</u>| <u>68.5</u>| <u>0.904</u>| <u>8.49</u> |\n| Qwen3 14B (Non-Thinking) | 0.427 | 65.1 | 0.870 | NOT MULTIMODAL |\n| Gemma3-12B-Instruct | 0.436 | 63.2 | 0.854 | 6.70 |\n| | | | | |\n| **Ministral 3 8B** | 0.509 | <u>66.8</u>| 0.876 | <u>8.08</u> |\n| Qwen3-VL-8B-Instruct | <u>0.528</u>| 66.3 | <u>0.946</u>| 8.00 |\n| | | | | |\n| **Ministral 3 3B** | 0.305 | <u>56.8</u>| 0.830 | 7.83 |\n| Qwen3-VL-4B-Instruct | <u>0.438</u>| <u>56.8</u>| <u>0.900</u>| <u>8.01</u> |\n| Qwen3-VL-2B-Instruct | 0.163 | 42.2 | 0.786 | 6.36 |\n| Gemma3-4B-Instruct | 0.318 | 49.1 | 0.759 | 5.23 |\n\n### Base\n\n| Model | Multilingual MMLU | MATH CoT 2-Shot | AGIEval 5-shot | MMLU Redux 5-shot | MMLU 5-shot | TriviaQA 5-shot |\n|---------------------|-------------------|-----------------|----------------|-------------------|-------------|-----------------|\n| **Ministral 3 14B** | 0.742 | <u>0.676</u> | 0.648 | 0.820 | 0.794 | 0.749 |\n| Qwen3 14B Base | <u>0.754</u> | 0.620 | <u>0.661</u> | <u>0.837</u> | <u>0.804</u>| 0.703 |\n| Gemma 3 12B Base | 0.690 | 0.487 | 0.587 | 0.766 | 0.745 | <u>0.788</u> |\n| | | | | | | |\n| **Ministral 3 8B** | <u>0.706</u> | <u>0.626</u> | 0.591 | 0.793 | <u>0.761</u>| <u>0.681</u> |\n| Qwen 3 8B Base | 0.700 | 0.576 
| <u>0.596</u> | <u>0.794</u> | 0.760 | 0.639 |\n| | | | | | | |\n| **Ministral 3 3B** | 0.652 | <u>0.601</u> | 0.511 | 0.735 | 0.707 | 0.592 |\n| Qwen 3 4B Base | <u>0.677</u> | 0.405 | <u>0.570</u> | <u>0.759</u> | <u>0.713</u>| 0.530 |\n| Gemma 3 4B Base | 0.516 | 0.294 | 0.430 | 0.626 | 0.589 | <u>0.640</u> |\n\n## Usage\n\nThe model can be used with the following frameworks;\n- [`vllm`](https://github.com/vllm-project/vllm): See [here](#vllm)\n- [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)\n \n### vLLM\n\nWe recommend using this model with [vLLM](https://github.com/vllm-project/vllm).\n\n#### Installation\n\nMake sure to install [`vLLM >= 0.12.0`](https://github.com/vllm-project/vllm/releases/tag/v0.12.0):\n\n```\npip install vllm --upgrade\n```\n\nDoing so should automatically install [`mistral_common >= 1.8.6`](https://github.com/mistralai/mistral-common/releases/tag/v1.8.6).\n\nTo check:\n```\npython -c \"import mistral_common; print(mistral_common.__version__)\"\n```\n\nYou can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or on the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).\n\n#### Serve\n\nDue to their size, `Ministral-3-3B-Reasoning-2512` and `Ministral-3-8B-Reasoning-2512` can run on a single 1xH200 GPU.\n\nA simple launch command is:\n\n```bash\n\nvllm serve mistralai/Ministral-3-3B-Reasoning-2512-FP8 \\\n --enable-auto-tool-choice --tool-call-parser mistral \\\n --reasoning-parser mistral\n```\n\nKey parameter notes:\n\n* enable-auto-tool-choice: Required when enabling tool usage.\n* tool-call-parser mistral: Required when enabling tool usage.\n* reasoning-parser mistral: Required when enabling reasoning.\n\nAdditional flags:\n\n* You can set `--max-model-len` to preserve memory. 
By default it is set to `262144` which is quite large but not necessary for most scenarios.\n* You can set `--max-num-batched-tokens` to balance throughput and latency, higher means higher throughput but higher latency.\n\n#### Usage of the model\n\nHere we asumme that the model `mistralai/Ministral-3-3B-Reasoning-2512` is served and you can ping it to the domain `localhost` with the port `8000` which is the default for vLLM.\n\n<details>\n <summary>Vision Reasoning</summary>\n\nLet's see if the Ministral 3 model knows when to pick a fight !\n\n```python\nfrom typing import Any\n\nfrom openai import OpenAI\nfrom huggingface_hub import hf_hub_download\n\n# Modify OpenAI's API key and API base to use vLLM's API server.\nopenai_api_key = \"EMPTY\"\nopenai_api_base = \"http://localhost:8000/v1\"\n\nTEMP = 0.7\nTOP_P = 0.95\nMAX_TOK = 262144\nclient = OpenAI(\n api_key=openai_api_key,\n base_url=openai_api_base,\n)\n\nmodels = client.models.list()\nmodel = models.data[0].id\n\n\ndef load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:\n file_path = hf_hub_download(repo_id=repo_id, filename=filename)\n with open(file_path, \"r\") as file:\n system_prompt = file.read()\n\n index_begin_think = system_prompt.find(\"[THINK]\")\n index_end_think = system_prompt.find(\"[/THINK]\")\n\n return {\n \"role\": \"system\",\n \"content\": [\n {\"type\": \"text\", \"text\": system_prompt[:index_begin_think]},\n {\n \"type\": \"thinking\",\n \"thinking\": system_prompt[\n index_begin_think + len(\"[THINK]\") : index_end_think\n ],\n \"closed\": True,\n },\n {\n \"type\": \"text\",\n \"text\": system_prompt[index_end_think + len(\"[/THINK]\") :],\n },\n ],\n }\n\n\nSYSTEM_PROMPT = load_system_prompt(model, \"SYSTEM_PROMPT.txt\")\n\nimage_url = \"https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438\"\n\nmessages = [\n SYSTEM_PROMPT,\n {\n \"role\": \"user\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"What 
action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.\",\n },\n {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n ],\n },\n]\n\n\nstream = client.chat.completions.create(\n model=model,\n messages=messages,\n stream=True,\n temperature=TEMP,\n top_p=TOP_P,\n max_tokens=MAX_TOK,\n)\n\nprint(\"client: Start streaming chat completions...:\\n\")\nprinted_reasoning_content = False\nanswer = []\n\nfor chunk in stream:\n reasoning_content = None\n content = None\n # Check the content is reasoning_content or content\n if hasattr(chunk.choices[0].delta, \"reasoning_content\"):\n reasoning_content = chunk.choices[0].delta.reasoning_content\n if hasattr(chunk.choices[0].delta, \"content\"):\n content = chunk.choices[0].delta.content\n\n if reasoning_content is not None:\n if not printed_reasoning_content:\n printed_reasoning_content = True\n print(\"Start reasoning:\\n\", end=\"\", flush=True)\n print(reasoning_content, end=\"\", flush=True)\n elif content is not None:\n # Extract and print the content\n if not reasoning_content and printed_reasoning_content:\n answer.extend(content)\n print(content, end=\"\", flush=True)\n\nif answer:\n print(\"\\n\\n=============\\nAnswer\\n=============\\n\")\n print(\"\".join(answer))\nelse:\n print(\"\\n\\n=============\\nNo Answer\\n=============\\n\")\n print(\n \"No answer was generated by the model, probably because the maximum number of tokens was reached.\"\n )\n```\n\nNow we'll make it compute some maths !\n\n```python\nfrom typing import Any\n\nfrom openai import OpenAI\nfrom huggingface_hub import hf_hub_download\n\n# Modify OpenAI's API key and API base to use vLLM's API server.\nopenai_api_key = \"EMPTY\"\nopenai_api_base = \"http://localhost:8000/v1\"\n\nTEMP = 0.7\nTOP_P = 0.95\nMAX_TOK = 262144\nclient = OpenAI(\n api_key=openai_api_key,\n base_url=openai_api_base,\n)\n\nmodels = client.models.list()\nmodel = 
models.data[0].id\n\n\ndef load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:\n file_path = hf_hub_download(repo_id=repo_id, filename=filename)\n with open(file_path, \"r\") as file:\n system_prompt = file.read()\n\n index_begin_think = system_prompt.find(\"[THINK]\")\n index_end_think = system_prompt.find(\"[/THINK]\")\n\n return {\n \"role\": \"system\",\n \"content\": [\n {\"type\": \"text\", \"text\": system_prompt[:index_begin_think]},\n {\n \"type\": \"thinking\",\n \"thinking\": system_prompt[\n index_begin_think + len(\"[THINK]\") : index_end_think\n ],\n \"closed\": True,\n },\n {\n \"type\": \"text\",\n \"text\": system_prompt[index_end_think + len(\"[/THINK]\") :],\n },\n ],\n }\n\n\nSYSTEM_PROMPT = load_system_prompt(model, \"SYSTEM_PROMPT.txt\")\n\nimage_url = \"https://i.ytimg.com/vi/5Y3xLHeyKZU/hqdefault.jpg\"\n\nmessages = [\n SYSTEM_PROMPT,\n {\n \"role\": \"user\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"Solve the equations. If they contain only numbers, use your calculator, else only think. 
Answer in the language of the image.\",\n },\n {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n ],\n },\n]\n\nstream = client.chat.completions.create(\n model=model,\n messages=messages,\n stream=True,\n temperature=TEMP,\n top_p=TOP_P,\n max_tokens=MAX_TOK,\n)\n\nprint(\"client: Start streaming chat completions...:\\n\")\nprinted_reasoning_content = False\nanswer = []\n\nfor chunk in stream:\n reasoning_content = None\n content = None\n # Check the content is reasoning_content or content\n if hasattr(chunk.choices[0].delta, \"reasoning_content\"):\n reasoning_content = chunk.choices[0].delta.reasoning_content\n if hasattr(chunk.choices[0].delta, \"content\"):\n content = chunk.choices[0].delta.content\n\n if reasoning_content is not None:\n if not printed_reasoning_content:\n printed_reasoning_content = True\n print(\"Start reasoning:\\n\", end=\"\", flush=True)\n print(reasoning_content, end=\"\", flush=True)\n if content is not None:\n # Extract and print the content\n if not reasoning_content and printed_reasoning_content:\n answer.extend(content)\n print(content, end=\"\", flush=True)\n\nif answer:\n print(\"\\n\\n=============\\nAnswer\\n=============\\n\")\n print(\"\".join(answer))\nelse:\n print(\"\\n\\n=============\\nNo Answer\\n=============\\n\")\n print(\n \"No answer was generated by the model, probably because the maximum number of tokens was reached.\"\n )\n```\n\n</details>\n\n<details>\n <summary>Text-Only Request</summary>\n\nLet's do more maths and leave it up to the model to figure out how to achieve a result.\n\n```python\nfrom typing import Any\nfrom openai import OpenAI\nfrom huggingface_hub import hf_hub_download\n\n# Modify OpenAI's API key and API base to use vLLM's API server.\nopenai_api_key = \"EMPTY\"\nopenai_api_base = \"http://localhost:8000/v1\"\n\nTEMP = 0.7\nTOP_P = 0.95\nMAX_TOK = 262144\nclient = OpenAI(\n api_key=openai_api_key,\n base_url=openai_api_base,\n)\n\nmodels = client.models.list()\nmodel = 
models.data[0].id\n\n\ndef load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:\n file_path = hf_hub_download(repo_id=repo_id, filename=filename)\n with open(file_path, \"r\") as file:\n system_prompt = file.read()\n\n index_begin_think = system_prompt.find(\"[THINK]\")\n index_end_think = system_prompt.find(\"[/THINK]\")\n\n return {\n \"role\": \"system\",\n \"content\": [\n {\"type\": \"text\", \"text\": system_prompt[:index_begin_think]},\n {\n \"type\": \"thinking\",\n \"thinking\": system_prompt[\n index_begin_think + len(\"[THINK]\") : index_end_think\n ],\n \"closed\": True,\n },\n {\n \"type\": \"text\",\n \"text\": system_prompt[index_end_think + len(\"[/THINK]\") :],\n },\n ],\n }\n\n\nSYSTEM_PROMPT = load_system_prompt(model, \"SYSTEM_PROMPT.txt\")\n\nquery = \"Use each number in 2,5,6,3 exactly once, along with any combination of +, -, ×, ÷ (and parentheses for grouping), to make the number 24.\"\n\nmessages = [\n SYSTEM_PROMPT,\n {\"role\": \"user\", \"content\": query}\n]\nstream = client.chat.completions.create(\n model=model,\n messages=messages,\n stream=True,\n temperature=TEMP,\n top_p=TOP_P,\n max_tokens=MAX_TOK,\n)\n\nprint(\"client: Start streaming chat completions...:\\n\")\nprinted_reasoning_content = False\nanswer = []\n\nfor chunk in stream:\n reasoning_content = None\n content = None\n # Check the content is reasoning_content or content\n if hasattr(chunk.choices[0].delta, \"reasoning_content\"):\n reasoning_content = chunk.choices[0].delta.reasoning_content\n if hasattr(chunk.choices[0].delta, \"content\"):\n content = chunk.choices[0].delta.content\n\n if reasoning_content is not None:\n if not printed_reasoning_content:\n printed_reasoning_content = True\n print(\"Start reasoning:\\n\", end=\"\", flush=True)\n print(reasoning_content, end=\"\", flush=True)\n if content is not None:\n # Extract and print the content\n if not reasoning_content and printed_reasoning_content:\n answer.extend(content)\n print(content, 
end=\"\", flush=True)\n\nif answer:\n print(\"\\n\\n=============\\nAnswer\\n=============\\n\")\n print(\"\".join(answer))\nelse:\n print(\"\\n\\n=============\\nNo Answer\\n=============\\n\")\n print(\"No answer was generated by the model, probably because the maximum number of tokens was reached.\")\n```\n\n</details>\n\n### Transformers\n\nYou can also use Ministral 3 3B Reasoning 2512 with `Transformers` !\n\nTo make the best use of our model with `Transformers` make sure to have [installed](https://github.com/mistralai/mistral-common) `mistral-common >= 1.8.6` to use our tokenizer.\n\n```bash\npip install mistral-common --upgrade\n```\n\nThen load our tokenizer along with the model and generate:\n\n<details>\n <summary>Python snippet</summary>\n\n```python\nimport torch\nfrom transformers import Mistral3ForConditionalGeneration, MistralCommonBackend\n\nmodel_id = \"mistralai/Ministral-3-3B-Reasoning-2512\"\n\ntokenizer = MistralCommonBackend.from_pretrained(model_id)\nmodel = Mistral3ForConditionalGeneration.from_pretrained(\n model_id, torch_dtype=torch.bfloat16, device_map=\"auto\"\n)\n\nimage_url = \"https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438\"\n\nmessages = [\n {\n \"role\": \"user\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"What action do you think I should take in this situation? 
List all the possible actions and explain why you think they are good or bad.\",\n },\n {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n ],\n },\n]\n\ntokenized = tokenizer.apply_chat_template(messages, return_tensors=\"pt\", return_dict=True)\n\ntokenized[\"input_ids\"] = tokenized[\"input_ids\"].to(device=\"cuda\")\ntokenized[\"pixel_values\"] = tokenized[\"pixel_values\"].to(dtype=torch.bfloat16, device=\"cuda\")\nimage_sizes = [tokenized[\"pixel_values\"].shape[-2:]]\n\noutput = model.generate(\n **tokenized,\n image_sizes=image_sizes,\n max_new_tokens=8092,\n)[0]\n\ndecoded_output = tokenizer.decode(output[len(tokenized[\"input_ids\"][0]):])\nprint(decoded_output)\n```\n\n</details>\n\n## License\n\nThis model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).\n\n*You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.*",
"related_quantizations": []
},
"tags": [
"vllm",
"gguf",
"mistral-common",
"mistral",
"unsloth",
"en",
"fr",
"es",
"de",
"it",
"pt",
"nl",
"zh",
"ja",
"ko",
"ar",
"base_model:mistralai/Ministral-3-3B-Reasoning-2512",
"base_model:quantized:mistralai/Ministral-3-3B-Reasoning-2512",
"license:apache-2.0",
"region:us",
"conversational"
],
"likes": 0,
"downloads": 2258,
"gated": false,
"private": false,
"last_modified": "2026-03-19T23:51:47.000Z",
"created_at": "2026-03-19T07:05:57.000Z",
"pipeline_tag": "",
"library_name": "vllm"
}
Source payload excerpt (from Hugging Face API)
{
"_id": "69bba0552ede643e79979363",
"id": "Jeney/Ministral-3-3B-Reasoning-2512-GGUF",
"modelId": "Jeney/Ministral-3-3B-Reasoning-2512-GGUF",
"sha": "d6411e815393318c60be51f8c71c152063aa9a00",
"createdAt": "2026-03-19T07:05:57.000Z",
"lastModified": "2026-03-19T23:51:47.000Z",
"author": "Jeney",
"downloads": 2258,
"likes": 0,
"gated": false,
"private": false,
"pipeline_tag": "",
"library_name": "vllm",
"siblings_count": 33
}