jeney/ministral-3-3b-reasoning-2512-gguf is indexed on GraySoft with repository links, GGUF quant files such as Q4_K_M, and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes. See related models in the same shard below.

Model Intelligence Sheet

jeney/ministral-3-3b-reasoning-2512-gguf overview

The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities. This is the reasoning post-trained version, which makes it well suited to math, coding, and STEM-related use cases. The Ministral 3 family is designed for edge deployment and can run on a wide range of hardware. Ministral 3 3B can even be deployed locally, fitting in 16GB of VRAM in BF16 and in less than 8GB of RAM/VRAM when quantized.
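As a rough cross-check of these memory figures, weight storage can be estimated as parameter count times bytes per parameter. The sketch below assumes the 3.4B language-model and 0.4B vision-encoder parameter counts stated in the model card, and an assumed average of about 4.5 bits per weight for a 4-bit quant. It ignores the KV cache, activations, and runtime overhead, so real memory use will be higher. The function name is illustrative only.

```python
def weight_size_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Parameter counts taken from the model card (assumed to be approximate).
LANGUAGE_MODEL_PARAMS = 3.4e9
VISION_ENCODER_PARAMS = 0.4e9

# BF16 stores each weight in 16 bits.
bf16_language = weight_size_gb(LANGUAGE_MODEL_PARAMS, 16)
bf16_vision = weight_size_gb(VISION_ENCODER_PARAMS, 16)

# Assumption: a 4-bit quant averages roughly 4.5 bits per weight.
four_bit_language = weight_size_gb(LANGUAGE_MODEL_PARAMS, 4.5)

print(f"BF16 language model:  ~{bf16_language:.2f} GB")
print(f"BF16 vision encoder:  ~{bf16_vision:.2f} GB")
print(f"4-bit language model: ~{four_bit_language:.2f} GB")
```

These weights-only estimates sit well under the 16GB and 8GB figures quoted above, which leaves room for context and runtime buffers.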

vllm, gguf, mistral-common, mistral, unsloth, en, fr, es, de, it, pt, nl, zh, ja, ko, ar, base_model:mistralai/Ministral-3-3B-Reasoning-2512, base_model:quantized:mistralai/Ministral-3-3B-Reasoning-2512, license:apache-2.0, region:us, conversational

Downloads

2,258

Likes

0

Pipeline

—

Library

vllm

Visibility

Public

Access

Open

Repository Files & Downloads

29 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
Ministral-3-3B-Reasoning-2512-BF16.gguf	GGUF	BF16	6.39 GB	Download
Ministral-3-3B-Reasoning-2512-IQ4_NL.gguf	GGUF	IQ4_NL	1.91 GB	Download
Ministral-3-3B-Reasoning-2512-IQ4_XS.gguf	GGUF	IQ4_XS	1.82 GB	Download
Ministral-3-3B-Reasoning-2512-Q2_K.gguf	GGUF	Q2_K	1.36 GB	Download
Ministral-3-3B-Reasoning-2512-Q2_K_L.gguf	GGUF	Q2_K_L	1.36 GB	Download
Ministral-3-3B-Reasoning-2512-Q3_K_M.gguf	GGUF	Q3_K_M	1.67 GB	Download
Ministral-3-3B-Reasoning-2512-Q3_K_S.gguf	GGUF	Q3_K_S	1.53 GB	Download
Ministral-3-3B-Reasoning-2512-Q4_0.gguf	GGUF	Q4_0	1.91 GB	Download
Ministral-3-3B-Reasoning-2512-Q4_1.gguf	GGUF	Q4_1	2.08 GB	Download
Ministral-3-3B-Reasoning-2512-Q4_K_M.gguf	GGUF	Q4_K_M	2.00 GB	Download
Ministral-3-3B-Reasoning-2512-Q4_K_S.gguf	GGUF	Q4_K_S	1.91 GB	Download
Ministral-3-3B-Reasoning-2512-Q5_K_M.gguf	GGUF	Q5_K_M	2.30 GB	Download
Ministral-3-3B-Reasoning-2512-Q5_K_S.gguf	GGUF	Q5_K_S	2.25 GB	Download
Ministral-3-3B-Reasoning-2512-Q6_K.gguf	GGUF	Q6_K	2.63 GB	Download
Ministral-3-3B-Reasoning-2512-Q8_0.gguf	GGUF	Q8_0	3.40 GB	Download
Ministral-3-3B-Reasoning-2512-UD-IQ1_M.gguf	GGUF	IQ1_M	971.98 MB	Download
Ministral-3-3B-Reasoning-2512-UD-IQ1_S.gguf	GGUF	IQ1_S	924.26 MB	Download
Ministral-3-3B-Reasoning-2512-UD-IQ2_M.gguf	GGUF	IQ2_M	1.25 GB	Download
Ministral-3-3B-Reasoning-2512-UD-IQ2_XXS.gguf	GGUF	IQ2_XXS	1.03 GB	Download
Ministral-3-3B-Reasoning-2512-UD-IQ3_XXS.gguf	GGUF	IQ3_XXS	1.36 GB	Download
Ministral-3-3B-Reasoning-2512-UD-Q2_K_XL.gguf	GGUF	Q2_K_XL	1.39 GB	Download
Ministral-3-3B-Reasoning-2512-UD-Q3_K_XL.gguf	GGUF	Q3_K_XL	1.73 GB	Download
Ministral-3-3B-Reasoning-2512-UD-Q4_K_XL.gguf	GGUF	Q4_K_XL	2.04 GB	Download
Ministral-3-3B-Reasoning-2512-UD-Q5_K_XL.gguf	GGUF	Q5_K_XL	2.31 GB	Download
Ministral-3-3B-Reasoning-2512-UD-Q6_K_XL.gguf	GGUF	Q6_K_XL	2.96 GB	Download
Ministral-3-3B-Reasoning-2512-UD-Q8_K_XL.gguf	GGUF	Q8_K_XL	4.19 GB	Download
mmproj-BF16.gguf	GGUF	BF16	802.52 MB	Download
mmproj-F16.gguf	GGUF	F16	801.37 MB	Download
mmproj-F32.gguf	GGUF	F32	1.56 GB	Download
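
One way to use the table above is to pick the largest file that fits a memory budget. The sketch below is a minimal illustration, not a recommendation: it copies a subset of sizes from the table, treats file size as a proxy for weight memory, and reserves a hypothetical `headroom_gb` for context and runtime buffers. The idea that a larger file generally preserves more quality is an assumption, and `pick_quant` is an illustrative name.

```python
from typing import Optional

# (quantization label, file size in GB) copied from a subset of the table above.
QUANT_SIZES_GB = [
    ("Q2_K", 1.36),
    ("Q3_K_M", 1.67),
    ("Q4_K_M", 2.00),
    ("Q5_K_M", 2.30),
    ("Q6_K", 2.63),
    ("Q8_0", 3.40),
    ("BF16", 6.39),
]


def pick_quant(budget_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest listed quant whose file fits in the budget minus headroom."""
    usable = budget_gb - headroom_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB if size <= usable]
    # max() compares by size first, so it selects the biggest file that fits.
    return max(fitting)[1] if fitting else None


for budget in (2.0, 4.0, 8.0):
    print(f"{budget:.0f} GB budget -> {pick_quant(budget)}")
```

When using the vision capability, one of the `mmproj` files is also loaded, so its size would need to be added to the budget.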

Model Details Live

Model Slug

jeney/ministral-3-3b-reasoning-2512-gguf

Author

Jeney

Pipeline Task

—

Library

vllm

Created

2026-03-19

Last Modified

2026-03-19

Gated

No

Private

No

HF SHA

d6411e815393318c60be51f8c71c152063aa9a00

License

apache-2.0

Language

en, fr, es, de, it, pt, nl, zh, ja, ko, ar

Base Model

mistralai/Ministral-3-3B-Reasoning-2512
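
The slug and HF SHA listed above can be combined to fetch a file at a fixed revision, so a later change to the repository does not alter what is downloaded. This is a sketch under assumptions: it uses the conventional Hugging Face `resolve` URL layout and the `hf_hub_download` function from `huggingface_hub`, and it assumes the Q4_K_M file exists at that commit. The helper names `resolve_url` and `download_pinned` are hypothetical.

```python
from urllib.parse import quote

REPO_ID = "Jeney/Ministral-3-3B-Reasoning-2512-GGUF"
REVISION = "d6411e815393318c60be51f8c71c152063aa9a00"  # HF SHA from this page
FILENAME = "Ministral-3-3B-Reasoning-2512-Q4_K_M.gguf"


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for one file at a given revision."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(filename)}"


def download_pinned(repo_id: str, filename: str, revision: str) -> str:
    """Download one file at a pinned revision and return its local cache path."""
    # Imported lazily so the URL helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename, revision=revision)


print(resolve_url(REPO_ID, FILENAME, REVISION))
# To actually download (about 2 GB), uncomment the next line:
# local_path = download_pinned(REPO_ID, FILENAME, REVISION)
```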

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "library_name": "vllm",
    "language": [
      "en",
      "fr",
      "es",
      "de",
      "it",
      "pt",
      "nl",
      "zh",
      "ja",
      "ko",
      "ar"
    ],
    "license": "apache-2.0",
    "inference": false,
    "base_model": [
      "mistralai/Ministral-3-3B-Reasoning-2512"
    ],
    "extra_gated_description": "If you want to learn more about how we process your personal data, please read our <a href=\"https://mistral.ai/terms/\">Privacy Policy</a>.",
    "tags": [
      "mistral-common",
      "mistral",
      "unsloth"
    ],
    "frontmatter": {
      "library_name": "vllm",
      "language": [
        "en",
        "fr",
        "es",
        "de",
        "it",
        "pt",
        "nl",
        "zh",
        "ja",
        "ko",
        "ar"
      ],
      "license": "apache-2.0",
      "inference": "false",
      "base_model": [
        "mistralai/Ministral-3-3B-Reasoning-2512"
      ],
      "extra_gated_description": ">-",
      "tags": [
        "mistral-common",
        "mistral",
        "unsloth"
      ]
    },
    "hero_image_url": "https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png",
    "summary": "The smallest model in the Ministral 3 family, **Ministral 3 3B** is a powerful, efficient tiny language model with vision capabilities. This model is the reasoning post-trained version, trained for reasoning tasks, making it ideal for math, coding and stem related use cases. The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 3B can even be deployed locally, fitting in 16GB of VRAM in BF16, and less than 8GB of RAM/VRAM when quantized.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlibrary_name: vllm\nlanguage:\n- en\n- fr\n- es\n- de\n- it\n- pt\n- nl\n- zh\n- ja\n- ko\n- ar\nlicense: apache-2.0\ninference: false\nbase_model:\n- mistralai/Ministral-3-3B-Reasoning-2512\nextra_gated_description: >-\n  If you want to learn more about how we process your personal data, please read\n  our <a href=\"https://mistral.ai/terms/\">Privacy Policy</a>.\ntags:\n- mistral-common\n- mistral\n- unsloth\n---\n<div>\n  <p style=\"margin-bottom: 0; margin-top: 0;\">\n    <strong>See our <a href=\"https://huggingface.co/collections/unsloth/ministral-3\">Ministral 3 collection</a> for all versions including GGUF, 4-bit & FP8 formats.</strong>\n  </p>\n  <p style=\"margin-bottom: 0;\">\n    <em>Learn to run Ministral correctly - <a href=\"https://docs.unsloth.ai/new/ministral-3\">Read our Guide</a>.</em>\n  </p>\n<p style=\"margin-top: 0;margin-bottom: 0;\">\n   <em>See <a href=\"https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf\">Unsloth Dynamic 2.0 GGUFs</a> for our quantization benchmarks.</em>\n  </p>\n  <div style=\"display: flex; gap: 5px; align-items: center; \">\n    <a href=\"https://github.com/unslothai/unsloth/\">\n      <img src=\"https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png\" width=\"133\">\n    </a>\n    <a href=\"https://discord.gg/unsloth\">\n      <img src=\"https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png\" width=\"173\">\n    </a>\n    <a href=\"https://docs.unsloth.ai/new/ministral-3\">\n      <img src=\"https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png\" width=\"143\">\n    </a>\n  </div>\n<h1 style=\"margin-top: 0rem;\">✨ Read our Ministral 3 Guide <a href=\"https://docs.unsloth.ai/new/ministral-3\">here</a>!</h1>\n</div>\n\n- Fine-tune Ministral 3 for free using our [Google Colab notebook](https://docs.unsloth.ai/new/ministral-3#fine-tuning)\n- Or train Ministral 3 with reinforcement 
learning (GSPO) with our [free notebook](https://docs.unsloth.ai/new/ministral-3#reinforcement-learning-grpo).\n- View the rest of our notebooks in our [docs here](https://docs.unsloth.ai/get-started/unsloth-notebooks).\n---\n# Ministral 3 3B Reasoning 2512\nThe smallest model in the Ministral 3 family, **Ministral 3 3B** is a powerful, efficient tiny language model with vision capabilities.\n\nThis model is the reasoning post-trained version, trained for reasoning tasks, making it ideal for math, coding and stem related use cases.\n\nThe Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 3B can even be deployed locally, fitting in 16GB of VRAM in BF16, and less than 8GB of RAM/VRAM when quantized.\n\n## Key Features\nMinistral 3 3B consists of two main architectural components:\n- **3.4B Language Model**\n- **0.4B Vision Encoder**\n\nThe Ministral 3 3B Reasoning model offers the following capabilities:\n- **Vision**: Enables the model to analyze images and provide insights based on visual content, in addition to text.\n- **Multilingual**: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.\n- **System Prompt**: Maintains strong adherence and support for system prompts.\n- **Agentic**: Offers best-in-class agentic capabilities with native function calling and JSON outputting.\n- **Reasoning**: Excels at complex, multi-step reasoning and dynamic problem-solving.\n- **Edge-Optimized**: Delivers best-in-class performance at a small scale, deployable anywhere.\n- **Apache 2.0 License**: Open-source license allowing usage and modification for both commercial and non-commercial purposes.\n- **Large Context Window**: Supports a 256k context window.\n\n### Use Cases\nIdeal for lightweight, real-time applications on edge or low-resource devices, such as:\n- Image captioning\n- Text classification\n- Real-time efficient 
translation\n- Data extraction\n- Short content generation\n- Fine-tuning and specialization\n- And more...\n  \nBringing advanced AI capabilities to edge and distributed environments for embedded systems.\n\n## Ministral 3 Family\n\n| Model Name                     | Type               | Precision | Link                                                                                     |\n|--------------------------------|--------------------|-----------|------------------------------------------------------------------------------------------|\n| Ministral 3 3B Base 2512       | Base pre-trained   | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512)                |\n| Ministral 3 3B Instruct 2512   | Instruct post-trained | BF16   | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512)            |\n| **Ministral 3 3B Reasoning 2512**  | **Reasoning capable**  | **BF16**      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512)           |\n| Ministral 3 8B Base 2512       | Base pre-trained   | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512)                |\n| Ministral 3 8B Instruct 2512   | Instruct post-trained | BF16    | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512)            |\n| Ministral 3 8B Reasoning 2512  | Reasoning capable  | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512)           |\n| Ministral 3 14B Base 2512      | Base pre-trained   | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Base-2512)               |\n| Ministral 3 14B Instruct 2512  | Instruct post-trained | BF16    | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512)           |\n| Ministral 3 14B Reasoning 2512 | Reasoning capable  | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512)     
     |\n\nOther formats available [here](https://huggingface.co/collections/mistralai/ministral-3-quants).\n\n## Benchmark Results\n\nWe compare Ministral 3 to similar sized models.\n\n### Reasoning\n\n| Model                     | AIME25      | AIME24      | GPQA Diamond | LiveCodeBench |\n|---------------------------|-------------|-------------|--------------|---------------|\n| **Ministral 3 14B**       | <u>0.850</u>| <u>0.898</u>| <u>0.712</u> | <u>0.646</u>  |\n| Qwen3-14B (Thinking)      | 0.737       | 0.837       | 0.663        | 0.593         |\n|                           |             |             |              |               |\n| **Ministral 3 8B**        | 0.787       | <u>0.860</u>| 0.668        | <u>0.616</u>  |\n| Qwen3-VL-8B-Thinking      | <u>0.798</u>| <u>0.860</u>| <u>0.671</u> | 0.580         |\n|                           |             |             |              |               |\n| **Ministral 3 3B**        | <u>0.721</u>| <u>0.775</u>| 0.534        | <u>0.548</u>  |\n| Qwen3-VL-4B-Thinking      | 0.697       | 0.729       | <u>0.601</u> | 0.513         |\n\n### Instruct\n\n| Model                     | Arena Hard  | WildBench  | MATH Maj@1  | MM MTBench       |\n|---------------------------|-------------|------------|-------------|------------------|\n| **Ministral 3 14B**       | <u>0.551</u>| <u>68.5</u>| <u>0.904</u>| <u>8.49</u>      |\n| Qwen3 14B (Non-Thinking)  | 0.427       | 65.1       | 0.870       | NOT MULTIMODAL   |\n| Gemma3-12B-Instruct       | 0.436       | 63.2       | 0.854       | 6.70             |\n|                           |             |            |             |                  |\n| **Ministral 3 8B**        | 0.509       | <u>66.8</u>| 0.876       | <u>8.08</u>      |\n| Qwen3-VL-8B-Instruct      | <u>0.528</u>| 66.3       | <u>0.946</u>| 8.00             |\n|                           |             |            |             |                  |\n| **Ministral 3 3B**        | 0.305       | <u>56.8</u>| 
0.830       | 7.83             |\n| Qwen3-VL-4B-Instruct      | <u>0.438</u>| <u>56.8</u>| <u>0.900</u>| <u>8.01</u>      |\n| Qwen3-VL-2B-Instruct      | 0.163       | 42.2       | 0.786       | 6.36             |\n| Gemma3-4B-Instruct        | 0.318       | 49.1       | 0.759       | 5.23             |\n\n### Base\n\n| Model               | Multilingual MMLU | MATH CoT 2-Shot | AGIEval 5-shot | MMLU Redux 5-shot | MMLU 5-shot | TriviaQA 5-shot |\n|---------------------|-------------------|-----------------|----------------|-------------------|-------------|-----------------|\n| **Ministral 3 14B** | 0.742             | <u>0.676</u>    | 0.648          | 0.820             | 0.794       | 0.749           |\n| Qwen3 14B Base      | <u>0.754</u>      | 0.620           | <u>0.661</u>   | <u>0.837</u>      | <u>0.804</u>| 0.703           |\n| Gemma 3 12B Base    | 0.690             | 0.487           | 0.587          | 0.766             | 0.745       | <u>0.788</u>    |\n|                     |                   |                 |                |                   |             |                 |\n| **Ministral 3 8B**  | <u>0.706</u>      | <u>0.626</u>    | 0.591          | 0.793             | <u>0.761</u>| <u>0.681</u>    |\n| Qwen 3 8B Base      | 0.700             | 0.576           | <u>0.596</u>   | <u>0.794</u>      | 0.760       | 0.639           |\n|                     |                   |                 |                |                   |             |                 |\n| **Ministral 3 3B**  | 0.652             | <u>0.601</u>    | 0.511          | 0.735             | 0.707       | 0.592           |\n| Qwen 3 4B Base      | <u>0.677</u>      | 0.405           | <u>0.570</u>   | <u>0.759</u>      | <u>0.713</u>| 0.530           |\n| Gemma 3 4B Base     | 0.516             | 0.294           | 0.430          | 0.626             | 0.589       | <u>0.640</u>    |\n\n## Usage\n\nThe model can be used with the following frameworks;\n- 
[`vllm`](https://github.com/vllm-project/vllm): See [here](#vllm)\n- [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)\n  \n### vLLM\n\nWe recommend using this model with [vLLM](https://github.com/vllm-project/vllm).\n\n#### Installation\n\nMake sure to install [`vLLM >= 0.12.0`](https://github.com/vllm-project/vllm/releases/tag/v0.12.0):\n\n```\npip install vllm --upgrade\n```\n\nDoing so should automatically install [`mistral_common >= 1.8.6`](https://github.com/mistralai/mistral-common/releases/tag/v1.8.6).\n\nTo check:\n```\npython -c \"import mistral_common; print(mistral_common.__version__)\"\n```\n\nYou can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or on the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).\n\n#### Serve\n\nDue to their size, `Ministral-3-3B-Reasoning-2512` and `Ministral-3-8B-Reasoning-2512` can run on a single 1xH200 GPU.\n\nA simple launch command is:\n\n```bash\n\nvllm serve mistralai/Ministral-3-3B-Reasoning-2512-FP8 \\\n  --enable-auto-tool-choice --tool-call-parser mistral \\\n  --reasoning-parser mistral\n```\n\nKey parameter notes:\n\n* enable-auto-tool-choice: Required when enabling tool usage.\n* tool-call-parser mistral: Required when enabling tool usage.\n* reasoning-parser mistral: Required when enabling reasoning.\n\nAdditional flags:\n\n* You can set `--max-model-len` to preserve memory. 
By default it is set to `262144` which is quite large but not necessary for most scenarios.\n* You can set `--max-num-batched-tokens` to balance throughput and latency, higher means higher throughput but higher latency.\n\n#### Usage of the model\n\nHere we asumme that the model `mistralai/Ministral-3-3B-Reasoning-2512` is served and you can ping it to the domain `localhost` with the port `8000` which is the default for vLLM.\n\n<details>\n  <summary>Vision Reasoning</summary>\n\nLet's see if the Ministral 3 model knows when to pick a fight !\n\n```python\nfrom typing import Any\n\nfrom openai import OpenAI\nfrom huggingface_hub import hf_hub_download\n\n# Modify OpenAI's API key and API base to use vLLM's API server.\nopenai_api_key = \"EMPTY\"\nopenai_api_base = \"http://localhost:8000/v1\"\n\nTEMP = 0.7\nTOP_P = 0.95\nMAX_TOK = 262144\nclient = OpenAI(\n    api_key=openai_api_key,\n    base_url=openai_api_base,\n)\n\nmodels = client.models.list()\nmodel = models.data[0].id\n\n\ndef load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:\n    file_path = hf_hub_download(repo_id=repo_id, filename=filename)\n    with open(file_path, \"r\") as file:\n        system_prompt = file.read()\n\n    index_begin_think = system_prompt.find(\"[THINK]\")\n    index_end_think = system_prompt.find(\"[/THINK]\")\n\n    return {\n        \"role\": \"system\",\n        \"content\": [\n            {\"type\": \"text\", \"text\": system_prompt[:index_begin_think]},\n            {\n                \"type\": \"thinking\",\n                \"thinking\": system_prompt[\n                    index_begin_think + len(\"[THINK]\") : index_end_think\n                ],\n                \"closed\": True,\n            },\n            {\n                \"type\": \"text\",\n                \"text\": system_prompt[index_end_think + len(\"[/THINK]\") :],\n            },\n        ],\n    }\n\n\nSYSTEM_PROMPT = load_system_prompt(model, \"SYSTEM_PROMPT.txt\")\n\nimage_url = 
\"https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438\"\n\nmessages = [\n    SYSTEM_PROMPT,\n    {\n        \"role\": \"user\",\n        \"content\": [\n            {\n                \"type\": \"text\",\n                \"text\": \"What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.\",\n            },\n            {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n        ],\n    },\n]\n\n\nstream = client.chat.completions.create(\n    model=model,\n    messages=messages,\n    stream=True,\n    temperature=TEMP,\n    top_p=TOP_P,\n    max_tokens=MAX_TOK,\n)\n\nprint(\"client: Start streaming chat completions...:\\n\")\nprinted_reasoning_content = False\nanswer = []\n\nfor chunk in stream:\n    reasoning_content = None\n    content = None\n    # Check the content is reasoning_content or content\n    if hasattr(chunk.choices[0].delta, \"reasoning_content\"):\n        reasoning_content = chunk.choices[0].delta.reasoning_content\n    if hasattr(chunk.choices[0].delta, \"content\"):\n        content = chunk.choices[0].delta.content\n\n    if reasoning_content is not None:\n        if not printed_reasoning_content:\n            printed_reasoning_content = True\n            print(\"Start reasoning:\\n\", end=\"\", flush=True)\n        print(reasoning_content, end=\"\", flush=True)\n    elif content is not None:\n        # Extract and print the content\n        if not reasoning_content and printed_reasoning_content:\n            answer.extend(content)\n        print(content, end=\"\", flush=True)\n\nif answer:\n    print(\"\\n\\n=============\\nAnswer\\n=============\\n\")\n    print(\"\".join(answer))\nelse:\n    print(\"\\n\\n=============\\nNo Answer\\n=============\\n\")\n    print(\n        \"No answer was generated by the model, probably because the maximum number of tokens was reached.\"\n    )\n```\n\nNow we'll 
make it compute some maths !\n\n```python\nfrom typing import Any\n\nfrom openai import OpenAI\nfrom huggingface_hub import hf_hub_download\n\n# Modify OpenAI's API key and API base to use vLLM's API server.\nopenai_api_key = \"EMPTY\"\nopenai_api_base = \"http://localhost:8000/v1\"\n\nTEMP = 0.7\nTOP_P = 0.95\nMAX_TOK = 262144\nclient = OpenAI(\n    api_key=openai_api_key,\n    base_url=openai_api_base,\n)\n\nmodels = client.models.list()\nmodel = models.data[0].id\n\n\ndef load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:\n    file_path = hf_hub_download(repo_id=repo_id, filename=filename)\n    with open(file_path, \"r\") as file:\n        system_prompt = file.read()\n\n    index_begin_think = system_prompt.find(\"[THINK]\")\n    index_end_think = system_prompt.find(\"[/THINK]\")\n\n    return {\n        \"role\": \"system\",\n        \"content\": [\n            {\"type\": \"text\", \"text\": system_prompt[:index_begin_think]},\n            {\n                \"type\": \"thinking\",\n                \"thinking\": system_prompt[\n                    index_begin_think + len(\"[THINK]\") : index_end_think\n                ],\n                \"closed\": True,\n            },\n            {\n                \"type\": \"text\",\n                \"text\": system_prompt[index_end_think + len(\"[/THINK]\") :],\n            },\n        ],\n    }\n\n\nSYSTEM_PROMPT = load_system_prompt(model, \"SYSTEM_PROMPT.txt\")\n\nimage_url = \"https://i.ytimg.com/vi/5Y3xLHeyKZU/hqdefault.jpg\"\n\nmessages = [\n    SYSTEM_PROMPT,\n    {\n        \"role\": \"user\",\n        \"content\": [\n            {\n                \"type\": \"text\",\n                \"text\": \"Solve the equations. If they contain only numbers, use your calculator, else only think. 
Answer in the language of the image.\",\n            },\n            {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n        ],\n    },\n]\n\nstream = client.chat.completions.create(\n    model=model,\n    messages=messages,\n    stream=True,\n    temperature=TEMP,\n    top_p=TOP_P,\n    max_tokens=MAX_TOK,\n)\n\nprint(\"client: Start streaming chat completions...:\\n\")\nprinted_reasoning_content = False\nanswer = []\n\nfor chunk in stream:\n    reasoning_content = None\n    content = None\n    # Check the content is reasoning_content or content\n    if hasattr(chunk.choices[0].delta, \"reasoning_content\"):\n        reasoning_content = chunk.choices[0].delta.reasoning_content\n    if hasattr(chunk.choices[0].delta, \"content\"):\n        content = chunk.choices[0].delta.content\n\n    if reasoning_content is not None:\n        if not printed_reasoning_content:\n            printed_reasoning_content = True\n            print(\"Start reasoning:\\n\", end=\"\", flush=True)\n        print(reasoning_content, end=\"\", flush=True)\n    if content is not None:\n        # Extract and print the content\n        if not reasoning_content and printed_reasoning_content:\n            answer.extend(content)\n        print(content, end=\"\", flush=True)\n\nif answer:\n    print(\"\\n\\n=============\\nAnswer\\n=============\\n\")\n    print(\"\".join(answer))\nelse:\n    print(\"\\n\\n=============\\nNo Answer\\n=============\\n\")\n    print(\n        \"No answer was generated by the model, probably because the maximum number of tokens was reached.\"\n    )\n```\n\n</details>\n\n<details>\n  <summary>Text-Only Request</summary>\n\nLet's do more maths and leave it up to the model to figure out how to achieve a result.\n\n```python\nfrom typing import Any\nfrom openai import OpenAI\nfrom huggingface_hub import hf_hub_download\n\n# Modify OpenAI's API key and API base to use vLLM's API server.\nopenai_api_key = \"EMPTY\"\nopenai_api_base = 
\"http://localhost:8000/v1\"\n\nTEMP = 0.7\nTOP_P = 0.95\nMAX_TOK = 262144\nclient = OpenAI(\n    api_key=openai_api_key,\n    base_url=openai_api_base,\n)\n\nmodels = client.models.list()\nmodel = models.data[0].id\n\n\ndef load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:\n    file_path = hf_hub_download(repo_id=repo_id, filename=filename)\n    with open(file_path, \"r\") as file:\n        system_prompt = file.read()\n\n    index_begin_think = system_prompt.find(\"[THINK]\")\n    index_end_think = system_prompt.find(\"[/THINK]\")\n\n    return {\n        \"role\": \"system\",\n        \"content\": [\n            {\"type\": \"text\", \"text\": system_prompt[:index_begin_think]},\n            {\n                \"type\": \"thinking\",\n                \"thinking\": system_prompt[\n                    index_begin_think + len(\"[THINK]\") : index_end_think\n                ],\n                \"closed\": True,\n            },\n            {\n                \"type\": \"text\",\n                \"text\": system_prompt[index_end_think + len(\"[/THINK]\") :],\n            },\n        ],\n    }\n\n\nSYSTEM_PROMPT = load_system_prompt(model, \"SYSTEM_PROMPT.txt\")\n\nquery = \"Use each number in 2,5,6,3 exactly once, along with any combination of +, -, ×, ÷ (and parentheses for grouping), to make the number 24.\"\n\nmessages = [\n    SYSTEM_PROMPT,\n    {\"role\": \"user\", \"content\": query}\n]\nstream = client.chat.completions.create(\n  model=model,\n  messages=messages,\n  stream=True,\n  temperature=TEMP,\n  top_p=TOP_P,\n  max_tokens=MAX_TOK,\n)\n\nprint(\"client: Start streaming chat completions...:\\n\")\nprinted_reasoning_content = False\nanswer = []\n\nfor chunk in stream:\n    reasoning_content = None\n    content = None\n    # Check the content is reasoning_content or content\n    if hasattr(chunk.choices[0].delta, \"reasoning_content\"):\n        reasoning_content = chunk.choices[0].delta.reasoning_content\n    if 
hasattr(chunk.choices[0].delta, \"content\"):\n        content = chunk.choices[0].delta.content\n\n    if reasoning_content is not None:\n        if not printed_reasoning_content:\n            printed_reasoning_content = True\n            print(\"Start reasoning:\\n\", end=\"\", flush=True)\n        print(reasoning_content, end=\"\", flush=True)\n    if content is not None:\n        # Extract and print the content\n        if not reasoning_content and printed_reasoning_content:\n            answer.extend(content)\n        print(content, end=\"\", flush=True)\n\nif answer:\n    print(\"\\n\\n=============\\nAnswer\\n=============\\n\")\n    print(\"\".join(answer))\nelse:\n    print(\"\\n\\n=============\\nNo Answer\\n=============\\n\")\n    print(\"No answer was generated by the model, probably because the maximum number of tokens was reached.\")\n```\n\n</details>\n\n### Transformers\n\nYou can also use Ministral 3 3B Reasoning 2512 with `Transformers` !\n\nTo make the best use of our model with `Transformers` make sure to have [installed](https://github.com/mistralai/mistral-common) `mistral-common >= 1.8.6` to use our tokenizer.\n\n```bash\npip install mistral-common --upgrade\n```\n\nThen load our tokenizer along with the model and generate:\n\n<details>\n  <summary>Python snippet</summary>\n\n```python\nimport torch\nfrom transformers import Mistral3ForConditionalGeneration, MistralCommonBackend\n\nmodel_id = \"mistralai/Ministral-3-3B-Reasoning-2512\"\n\ntokenizer = MistralCommonBackend.from_pretrained(model_id)\nmodel = Mistral3ForConditionalGeneration.from_pretrained(\n    model_id, torch_dtype=torch.bfloat16, device_map=\"auto\"\n)\n\nimage_url = \"https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438\"\n\nmessages = [\n    {\n        \"role\": \"user\",\n        \"content\": [\n            {\n                \"type\": \"text\",\n                \"text\": \"What action do you think I should take in this 
situation? List all the possible actions and explain why you think they are good or bad.\",\n            },\n            {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n        ],\n    },\n]\n\ntokenized = tokenizer.apply_chat_template(messages, return_tensors=\"pt\", return_dict=True)\n\ntokenized[\"input_ids\"] = tokenized[\"input_ids\"].to(device=\"cuda\")\ntokenized[\"pixel_values\"] = tokenized[\"pixel_values\"].to(dtype=torch.bfloat16, device=\"cuda\")\nimage_sizes = [tokenized[\"pixel_values\"].shape[-2:]]\n\noutput = model.generate(\n    **tokenized,\n    image_sizes=image_sizes,\n    max_new_tokens=8092,\n)[0]\n\ndecoded_output = tokenizer.decode(output[len(tokenized[\"input_ids\"][0]):])\nprint(decoded_output)\n```\n\n</details>\n\n## License\n\nThis model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).\n\n*You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.*",
    "related_quantizations": []
  },
  "tags": [
    "vllm",
    "gguf",
    "mistral-common",
    "mistral",
    "unsloth",
    "en",
    "fr",
    "es",
    "de",
    "it",
    "pt",
    "nl",
    "zh",
    "ja",
    "ko",
    "ar",
    "base_model:mistralai/Ministral-3-3B-Reasoning-2512",
    "base_model:quantized:mistralai/Ministral-3-3B-Reasoning-2512",
    "license:apache-2.0",
    "region:us",
    "conversational"
  ],
  "likes": 0,
  "downloads": 2258,
  "gated": false,
  "private": false,
  "last_modified": "2026-03-19T23:51:47.000Z",
  "created_at": "2026-03-19T07:05:57.000Z",
  "pipeline_tag": "",
  "library_name": "vllm"
}

Source payload excerpt (from Hugging Face API)

{
  "_id": "69bba0552ede643e79979363",
  "id": "Jeney/Ministral-3-3B-Reasoning-2512-GGUF",
  "modelId": "Jeney/Ministral-3-3B-Reasoning-2512-GGUF",
  "sha": "d6411e815393318c60be51f8c71c152063aa9a00",
  "createdAt": "2026-03-19T07:05:57.000Z",
  "lastModified": "2026-03-19T23:51:47.000Z",
  "author": "Jeney",
  "downloads": 2258,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "vllm",
  "siblings_count": 33
}
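
A payload like the excerpt above can be read with the standard library alone. The sketch below is illustrative: `PAYLOAD` is a trimmed copy of the excerpt, `parse_hf_timestamp` is a hypothetical helper, and the timestamp format string is an assumption based on the values shown here.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the payload excerpt shown above.
PAYLOAD = """
{
  "id": "Jeney/Ministral-3-3B-Reasoning-2512-GGUF",
  "sha": "d6411e815393318c60be51f8c71c152063aa9a00",
  "createdAt": "2026-03-19T07:05:57.000Z",
  "lastModified": "2026-03-19T23:51:47.000Z",
  "downloads": 2258,
  "likes": 0,
  "gated": false,
  "private": false
}
"""


def parse_hf_timestamp(value: str) -> datetime:
    """Parse a timestamp such as 2026-03-19T07:05:57.000Z as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


info = json.loads(PAYLOAD)
created = parse_hf_timestamp(info["createdAt"])
modified = parse_hf_timestamp(info["lastModified"])

# A repository is freely downloadable only if it is neither gated nor private.
is_open = not info["gated"] and not info["private"]

print(f"{info['id']} open access: {is_open}")
print(f"Time between creation and last modification: {modified - created}")
```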

More models in this shard