The IQ4_XS GGUF build of AISafety-Student/Ministral-3-8B-Reasoning-2512-heretic_GGUF is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. Use this page to pick a local model for guIDE or other runtimes. See related models in the same shard below.
AISafety-Student/Ministral-3-8B-Reasoning-2512-heretic_GGUF overview
A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities. This model is the reasoning post-trained version, trained for reasoning tasks, which makes it well suited to math, coding, and STEM-related use cases. The Ministral 3 family is designed for edge deployment and can run on a wide range of hardware. Ministral 3 8B can even be deployed locally: it fits in 24GB of VRAM in BF16, and in less than 12GB of RAM/VRAM when quantized. Learn more in the Mistral blog post and paper.
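The 24GB and 12GB figures above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is a weights-only estimate, not a measurement: it uses the parameter counts listed in the model card (8.4B language model plus 0.4B vision encoder), and the IQ4_XS bits-per-weight value is an assumption for illustration. The function name `weight_footprint_gb` is hypothetical.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only size in GB (1 GB = 1e9 bytes); ignores KV cache and activations."""
    return params_billion * bits_per_weight / 8


# Parameter counts as listed in the model card: 8.4B language model + 0.4B vision encoder.
TOTAL_PARAMS_B = 8.4 + 0.4

# BF16 stores each weight in 16 bits.
bf16_gb = weight_footprint_gb(TOTAL_PARAMS_B, 16)

# ASSUMPTION: roughly 4.25 bits per weight for IQ4_XS; real file sizes vary with the tensor mix.
iq4_xs_gb = weight_footprint_gb(TOTAL_PARAMS_B, 4.25)

print(f"BF16 weights:   ~{bf16_gb:.1f} GB")
print(f"IQ4_XS weights: ~{iq4_xs_gb:.1f} GB")
```

Under these assumptions the BF16 weights stay below 24GB and the quantized weights stay well below 12GB. The remaining headroom is what the KV cache and activations would consume, which grows with the context length you configure.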
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"library_name": "vllm",
"language": [
"en",
"fr",
"es",
"de",
"it",
"pt",
"nl",
"zh",
"ja",
"ko",
"ar"
],
"license": "apache-2.0",
"inference": false,
"base_model": [
"mistralai/Ministral-3-8B-Base-2512"
],
"extra_gated_description": "If you want to learn more about how we process your personal data, please read our <a href=\"https://mistral.ai/terms/\">Privacy Policy</a>.",
"tags": [
"mistral-common",
"heretic",
"uncensored",
"decensored",
"abliterated"
],
"frontmatter": {
"library_name": "vllm",
"language": [
"en",
"fr",
"es",
"de",
"it",
"pt",
"nl",
"zh",
"ja",
"ko",
"ar"
],
"license": "apache-2.0",
"inference": "false",
"base_model": [
"mistralai/Ministral-3-8B-Base-2512"
],
"extra_gated_description": "If you want to learn more about how we process your personal",
"tags": [
"mistral-common",
"heretic",
"uncensored",
"decensored",
"abliterated"
]
},
"hero_image_url": "",
"summary": "A balanced model in the Ministral 3 family, **Ministral 3 8B** is a powerful, efficient tiny language model with vision capabilities. This model is the reasoning post-trained version, trained for reasoning tasks, making it ideal for math, coding and stem related use cases. The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 8B can even be deployed locally, capable of fitting in 24GB of VRAM in BF16, and less than 12GB of RAM/VRAM when quantized. Learn more in our blog post and paper.",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nlibrary_name: vllm\nlanguage:\n- en\n- fr\n- es\n- de\n- it\n- pt\n- nl\n- zh\n- ja\n- ko\n- ar\nlicense: apache-2.0\ninference: false\nbase_model:\n- mistralai/Ministral-3-8B-Base-2512\nextra_gated_description: If you want to learn more about how we process your personal\n data, please read our <a href=\"https://mistral.ai/terms/\">Privacy Policy</a>.\ntags:\n- mistral-common\n- heretic\n- uncensored\n- decensored\n- abliterated\n---\n# This is a decensored version of [mistralai/Ministral-3-8B-Reasoning-2512](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512), made using [Heretic](https://github.com/p-e-w/heretic) v1.2.0\n\n## Abliteration parameters\n\n| Parameter | Value |\n| :-------- | :---: |\n| **direction_index** | per layer |\n| **attn.o_proj.max_weight** | 0.90 |\n| **attn.o_proj.max_weight_position** | 26.35 |\n| **attn.o_proj.min_weight** | 0.88 |\n| **attn.o_proj.min_weight_distance** | 12.34 |\n| **mlp.down_proj.max_weight** | 1.25 |\n| **mlp.down_proj.max_weight_position** | 25.09 |\n| **mlp.down_proj.min_weight** | 1.20 |\n| **mlp.down_proj.min_weight_distance** | 17.83 |\n\n## Performance\n\n| Metric | This model | Original model ([mistralai/Ministral-3-8B-Reasoning-2512](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512)) |\n| :----- | :--------: | :---------------------------: |\n| **KL divergence** | 0.3509 | 0 *(by definition)* |\n| **Refusals** | 3/100 | 96/100 |\n\n-----\n\n\n# Ministral 3 8B Reasoning 2512\nA balanced model in the Ministral 3 family, **Ministral 3 8B** is a powerful, efficient tiny language model with vision capabilities.\n\nThis model is the reasoning post-trained version, trained for reasoning tasks, making it ideal for math, coding and stem related use cases.\n\nThe Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. 
Ministral 3 8B can even be deployed locally, capable of fitting in 24GB of VRAM in BF16, and less than 12GB of RAM/VRAM when quantized.\n\nLearn more in our [blog post](https://mistral.ai/news/mistral-3) and [paper](https://arxiv.org/abs/2601.08584).\n\n## Key Features\nMinistral 3 8B consists of two main architectural components:\n- **8.4B Language Model**\n- **0.4B Vision Encoder**\n\nThe Ministral 3 8B Reasoning model offers the following capabilities:\n- **Vision**: Enables the model to analyze images and provide insights based on visual content, in addition to text.\n- **Multilingual**: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.\n- **System Prompt**: Maintains strong adherence and support for system prompts.\n- **Agentic**: Offers best-in-class agentic capabilities with native function calling and JSON outputting.\n- **Reasoning**: Excels at complex, multi-step reasoning and dynamic problem-solving.\n- **Edge-Optimized**: Delivers best-in-class performance at a small scale, deployable anywhere.\n- **Apache 2.0 License**: Open-source license allowing usage and modification for both commercial and non-commercial purposes.\n- **Large Context Window**: Supports a 256k context window.\n\n### Use Cases\nPerfect for balanced performance in local or embedded systems, combining versatility with efficiency.\n- Chat interfaces in constrained environments\n- Local daily-driver AI assistant\n- Image/document description and understanding\n- Translation and content generation\n- Specialized agentic use cases\n- Fine-tuning and specialization\n- And more...\n \nBringing advanced AI capabilities to resource-constrained environments.\n\n### Recommended Settings\n\nWe recommend deploying with the following best practices:\n- System Prompt: Use our provided [system prompt](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512/blob/main/SYSTEM_PROMPT.txt), and append it to your 
custom system prompt to define a clear environment and use case, including guidance on how to effectively leverage tools in agentic systems.\n- Multi-turn Traces: We highly recommend keeping the reasoning traces in context.\n- Sampling Parameters: Use a **temperature of 0.7** for most environments ; Different temperatures may be explored for different use cases - developers are encouraged to experiment with alternative settings.\n- Tools: Keep the set of tools well-defined and limit their number to the minimum required for the use case - Avoiding overloading the model with an excessive number of tools.\n- Vision: When deploying with vision capabilities, we recommend maintaining an aspect ratio close to 1:1 (width-to-height) for images. Avoiding the use of overly thin or wide images - crop them as needed to ensure optimal performance.\n\n## Ministral 3 Family\n\n| Model Name | Type | Precision | Link |\n|--------------------------------|--------------------|-----------|------------------------------------------------------------------------------------------|\n| Ministral 3 3B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512) |\n| Ministral 3 3B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512) |\n| Ministral 3 3B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512) |\n| Ministral 3 8B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512) |\n| Ministral 3 8B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) |\n| **Ministral 3 8B Reasoning 2512** | **Reasoning capable** | **BF16** | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512) |\n| Ministral 3 14B Base 2512 | Base pre-trained | BF16 | [Hugging 
Face](https://huggingface.co/mistralai/Ministral-3-14B-Base-2512) |\n| Ministral 3 14B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512) |\n| Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512) |\n\nOther formats available [here](https://huggingface.co/collections/mistralai/ministral-3-additional-checkpoints).\n\n## Benchmark Results\n\nWe compare Ministral 3 to similar sized models.\n\n### Reasoning\n\n| Model | AIME25 | AIME24 | GPQA Diamond | LiveCodeBench |\n|---------------------------|-------------|-------------|--------------|---------------|\n| **Ministral 3 14B** | <u>0.850</u>| <u>0.898</u>| <u>0.712</u> | <u>0.646</u> |\n| Qwen3-14B (Thinking) | 0.737 | 0.837 | 0.663 | 0.593 |\n| | | | | |\n| **Ministral 3 8B** | 0.787 | <u>0.860</u>| 0.668 | <u>0.616</u> |\n| Qwen3-VL-8B-Thinking | <u>0.798</u>| <u>0.860</u>| <u>0.671</u> | 0.580 |\n| | | | | |\n| **Ministral 3 3B** | <u>0.721</u>| <u>0.775</u>| 0.534 | <u>0.548</u> |\n| Qwen3-VL-4B-Thinking | 0.697 | 0.729 | <u>0.601</u> | 0.513 |\n\n### Instruct\n\n| Model | Arena Hard | WildBench | MATH Maj@1 | MM MTBench |\n|---------------------------|-------------|------------|-------------|------------------|\n| **Ministral 3 14B** | <u>0.551</u>| <u>68.5</u>| <u>0.904</u>| <u>8.49</u> |\n| Qwen3 14B (Non-Thinking) | 0.427 | 65.1 | 0.870 | NOT MULTIMODAL |\n| Gemma3-12B-Instruct | 0.436 | 63.2 | 0.854 | 6.70 |\n| | | | | |\n| **Ministral 3 8B** | 0.509 | <u>66.8</u>| 0.876 | <u>8.08</u> |\n| Qwen3-VL-8B-Instruct | <u>0.528</u>| 66.3 | <u>0.946</u>| 8.00 |\n| | | | | |\n| **Ministral 3 3B** | 0.305 | <u>56.8</u>| 0.830 | 7.83 |\n| Qwen3-VL-4B-Instruct | <u>0.438</u>| <u>56.8</u>| <u>0.900</u>| <u>8.01</u> |\n| Qwen3-VL-2B-Instruct | 0.163 | 42.2 | 0.786 | 6.36 |\n| Gemma3-4B-Instruct | 0.318 | 49.1 | 0.759 | 5.23 |\n\n### Base\n\n| Model | Multilingual 
MMLU | MATH CoT 2-Shot | AGIEval 5-shot | MMLU Redux 5-shot | MMLU 5-shot | TriviaQA 5-shot |\n|---------------------|-------------------|-----------------|----------------|-------------------|-------------|-----------------|\n| **Ministral 3 14B** | 0.742 | <u>0.676</u> | 0.648 | 0.820 | 0.794 | 0.749 |\n| Qwen3 14B Base | <u>0.754</u> | 0.620 | <u>0.661</u> | <u>0.837</u> | <u>0.804</u>| 0.703 |\n| Gemma 3 12B Base | 0.690 | 0.487 | 0.587 | 0.766 | 0.745 | <u>0.788</u> |\n| | | | | | | |\n| **Ministral 3 8B** | <u>0.706</u> | <u>0.626</u> | 0.591 | 0.793 | <u>0.761</u>| <u>0.681</u> |\n| Qwen 3 8B Base | 0.700 | 0.576 | <u>0.596</u> | <u>0.794</u> | 0.760 | 0.639 |\n| | | | | | | |\n| **Ministral 3 3B** | 0.652 | <u>0.601</u> | 0.511 | 0.735 | 0.707 | 0.592 |\n| Qwen 3 4B Base | <u>0.677</u> | 0.405 | <u>0.570</u> | <u>0.759</u> | <u>0.713</u>| 0.530 |\n| Gemma 3 4B Base | 0.516 | 0.294 | 0.430 | 0.626 | 0.589 | <u>0.640</u> |\n\n## Usage\n\nThe model can be used with the following frameworks;\n- [`vllm`](https://github.com/vllm-project/vllm): See [here](#vllm)\n- [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)\n \n### vLLM\n\nWe recommend using this model with [vLLM](https://github.com/vllm-project/vllm).\n\n#### Installation\n\nMake sure to install **vllm >= 0.12.0**:\n\n```\npip install vllm --upgrade\n```\n\nDoing so should automatically install [`mistral_common >= 1.8.6`](https://github.com/mistralai/mistral-common/releases/tag/v1.8.6).\n\nTo check:\n```\npython -c \"import mistral_common; print(mistral_common.__version__)\"\n```\n\nYou can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile) or on the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest).\n\n#### Serve\n\nDue to their size, `Ministral-3-3B-Reasoning-2512` and `Ministral-3-8B-Reasoning-2512` can run on a single 1xH200 GPU.\n\nA simple launch command is:\n\n```bash\n\nvllm serve 
mistralai/Ministral-3-8B-Reasoning-2512 \\\n --tokenizer_mode mistral --config_format mistral --load_format mistral \\\n --enable-auto-tool-choice --tool-call-parser mistral \\\n --reasoning-parser mistral\n```\n\nKey parameter notes:\n\n* enable-auto-tool-choice: Required when enabling tool usage.\n* tool-call-parser mistral: Required when enabling tool usage.\n* reasoning-parser mistral: Required when enabling reasoning.\n\nAdditional flags:\n\n* You can set `--max-model-len` to preserve memory. By default it is set to `262144` which is quite large but not necessary for most scenarios.\n* You can set `--max-num-batched-tokens` to balance throughput and latency, higher means higher throughput but higher latency.\n\n#### Usage of the model\n\nHere we assume that the model `mistralai/Ministral-3-8B-Reasoning-2512` is served and you can ping it to the domain `localhost` with the port `8000` which is the default for vLLM.\n\n<details>\n <summary>Vision Reasoning</summary>\n\nLet's see if the Ministral 3 model knows when to pick a fight !\n\n```python\nfrom typing import Any\n\nfrom openai import OpenAI\nfrom huggingface_hub import hf_hub_download\n\n# Modify OpenAI's API key and API base to use vLLM's API server.\nopenai_api_key = \"EMPTY\"\nopenai_api_base = \"http://localhost:8000/v1\"\n\nTEMP = 0.7\nTOP_P = 0.95\nMAX_TOK = 262144\nclient = OpenAI(\n api_key=openai_api_key,\n base_url=openai_api_base,\n)\n\nmodels = client.models.list()\nmodel = models.data[0].id\n\n\ndef load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:\n file_path = hf_hub_download(repo_id=repo_id, filename=filename)\n with open(file_path, \"r\") as file:\n system_prompt = file.read()\n\n index_begin_think = system_prompt.find(\"[THINK]\")\n index_end_think = system_prompt.find(\"[/THINK]\")\n\n return {\n \"role\": \"system\",\n \"content\": [\n {\"type\": \"text\", \"text\": system_prompt[:index_begin_think]},\n {\n \"type\": \"thinking\",\n \"thinking\": system_prompt[\n 
index_begin_think + len(\"[THINK]\") : index_end_think\n ],\n \"closed\": True,\n },\n {\n \"type\": \"text\",\n \"text\": system_prompt[index_end_think + len(\"[/THINK]\") :],\n },\n ],\n }\n\n\nSYSTEM_PROMPT = load_system_prompt(model, \"SYSTEM_PROMPT.txt\")\n\nimage_url = \"https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438\"\n\nmessages = [\n SYSTEM_PROMPT,\n {\n \"role\": \"user\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.\",\n },\n {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n ],\n },\n]\n\n\nstream = client.chat.completions.create(\n model=model,\n messages=messages,\n stream=True,\n temperature=TEMP,\n top_p=TOP_P,\n max_tokens=MAX_TOK,\n)\n\nprint(\"client: Start streaming chat completions...:\\n\")\nprinted_reasoning_content = False\nanswer = []\n\nfor chunk in stream:\n reasoning_content = None\n content = None\n # Check the content is reasoning_content or content\n if hasattr(chunk.choices[0].delta, \"reasoning_content\"):\n reasoning_content = chunk.choices[0].delta.reasoning_content\n if hasattr(chunk.choices[0].delta, \"content\"):\n content = chunk.choices[0].delta.content\n\n if reasoning_content is not None:\n if not printed_reasoning_content:\n printed_reasoning_content = True\n print(\"Start reasoning:\\n\", end=\"\", flush=True)\n print(reasoning_content, end=\"\", flush=True)\n elif content is not None:\n # Extract and print the content\n if not reasoning_content and printed_reasoning_content:\n answer.extend(content)\n print(content, end=\"\", flush=True)\n\nif answer:\n print(\"\\n\\n=============\\nAnswer\\n=============\\n\")\n print(\"\".join(answer))\nelse:\n print(\"\\n\\n=============\\nNo Answer\\n=============\\n\")\n print(\n \"No answer was generated by the model, probably because the maximum number of tokens 
was reached.\"\n )\n```\n\nNow we'll make it compute some maths !\n\n```python\nfrom typing import Any\n\nfrom openai import OpenAI\nfrom huggingface_hub import hf_hub_download\n\n# Modify OpenAI's API key and API base to use vLLM's API server.\nopenai_api_key = \"EMPTY\"\nopenai_api_base = \"http://localhost:8000/v1\"\n\nTEMP = 0.7\nTOP_P = 0.95\nMAX_TOK = 262144\nclient = OpenAI(\n api_key=openai_api_key,\n base_url=openai_api_base,\n)\n\nmodels = client.models.list()\nmodel = models.data[0].id\n\n\ndef load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:\n file_path = hf_hub_download(repo_id=repo_id, filename=filename)\n with open(file_path, \"r\") as file:\n system_prompt = file.read()\n\n index_begin_think = system_prompt.find(\"[THINK]\")\n index_end_think = system_prompt.find(\"[/THINK]\")\n\n return {\n \"role\": \"system\",\n \"content\": [\n {\"type\": \"text\", \"text\": system_prompt[:index_begin_think]},\n {\n \"type\": \"thinking\",\n \"thinking\": system_prompt[\n index_begin_think + len(\"[THINK]\") : index_end_think\n ],\n \"closed\": True,\n },\n {\n \"type\": \"text\",\n \"text\": system_prompt[index_end_think + len(\"[/THINK]\") :],\n },\n ],\n }\n\n\nSYSTEM_PROMPT = load_system_prompt(model, \"SYSTEM_PROMPT.txt\")\n\nimage_url = \"https://i.ytimg.com/vi/5Y3xLHeyKZU/hqdefault.jpg\"\n\nmessages = [\n SYSTEM_PROMPT,\n {\n \"role\": \"user\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"Solve the equations. If they contain only numbers, use your calculator, else only think. 
Answer in the language of the image.\",\n },\n {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n ],\n },\n]\n\nstream = client.chat.completions.create(\n model=model,\n messages=messages,\n stream=True,\n temperature=TEMP,\n top_p=TOP_P,\n max_tokens=MAX_TOK,\n)\n\nprint(\"client: Start streaming chat completions...:\\n\")\nprinted_reasoning_content = False\nanswer = []\n\nfor chunk in stream:\n reasoning_content = None\n content = None\n # Check the content is reasoning_content or content\n if hasattr(chunk.choices[0].delta, \"reasoning_content\"):\n reasoning_content = chunk.choices[0].delta.reasoning_content\n if hasattr(chunk.choices[0].delta, \"content\"):\n content = chunk.choices[0].delta.content\n\n if reasoning_content is not None:\n if not printed_reasoning_content:\n printed_reasoning_content = True\n print(\"Start reasoning:\\n\", end=\"\", flush=True)\n print(reasoning_content, end=\"\", flush=True)\n if content is not None:\n # Extract and print the content\n if not reasoning_content and printed_reasoning_content:\n answer.extend(content)\n print(content, end=\"\", flush=True)\n\nif answer:\n print(\"\\n\\n=============\\nAnswer\\n=============\\n\")\n print(\"\".join(answer))\nelse:\n print(\"\\n\\n=============\\nNo Answer\\n=============\\n\")\n print(\n \"No answer was generated by the model, probably because the maximum number of tokens was reached.\"\n )\n```\n\n</details>\n\n<details>\n <summary>Text-Only Request</summary>\n\nLet's do more maths and leave it up to the model to figure out how to achieve a result.\n\n```python\nfrom typing import Any\nfrom openai import OpenAI\nfrom huggingface_hub import hf_hub_download\n\n# Modify OpenAI's API key and API base to use vLLM's API server.\nopenai_api_key = \"EMPTY\"\nopenai_api_base = \"http://localhost:8000/v1\"\n\nTEMP = 0.7\nTOP_P = 0.95\nMAX_TOK = 262144\nclient = OpenAI(\n api_key=openai_api_key,\n base_url=openai_api_base,\n)\n\nmodels = client.models.list()\nmodel = 
models.data[0].id\n\n\ndef load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:\n file_path = hf_hub_download(repo_id=repo_id, filename=filename)\n with open(file_path, \"r\") as file:\n system_prompt = file.read()\n\n index_begin_think = system_prompt.find(\"[THINK]\")\n index_end_think = system_prompt.find(\"[/THINK]\")\n\n return {\n \"role\": \"system\",\n \"content\": [\n {\"type\": \"text\", \"text\": system_prompt[:index_begin_think]},\n {\n \"type\": \"thinking\",\n \"thinking\": system_prompt[\n index_begin_think + len(\"[THINK]\") : index_end_think\n ],\n \"closed\": True,\n },\n {\n \"type\": \"text\",\n \"text\": system_prompt[index_end_think + len(\"[/THINK]\") :],\n },\n ],\n }\n\n\nSYSTEM_PROMPT = load_system_prompt(model, \"SYSTEM_PROMPT.txt\")\n\nquery = \"Use each number in 2,5,6,3 exactly once, along with any combination of +, -, ×, ÷ (and parentheses for grouping), to make the number 24.\"\n\nmessages = [\n SYSTEM_PROMPT,\n {\"role\": \"user\", \"content\": query}\n]\nstream = client.chat.completions.create(\n model=model,\n messages=messages,\n stream=True,\n temperature=TEMP,\n top_p=TOP_P,\n max_tokens=MAX_TOK,\n)\n\nprint(\"client: Start streaming chat completions...:\\n\")\nprinted_reasoning_content = False\nanswer = []\n\nfor chunk in stream:\n reasoning_content = None\n content = None\n # Check the content is reasoning_content or content\n if hasattr(chunk.choices[0].delta, \"reasoning_content\"):\n reasoning_content = chunk.choices[0].delta.reasoning_content\n if hasattr(chunk.choices[0].delta, \"content\"):\n content = chunk.choices[0].delta.content\n\n if reasoning_content is not None:\n if not printed_reasoning_content:\n printed_reasoning_content = True\n print(\"Start reasoning:\\n\", end=\"\", flush=True)\n print(reasoning_content, end=\"\", flush=True)\n if content is not None:\n # Extract and print the content\n if not reasoning_content and printed_reasoning_content:\n answer.extend(content)\n print(content, 
end=\"\", flush=True)\n\nif answer:\n print(\"\\n\\n=============\\nAnswer\\n=============\\n\")\n print(\"\".join(answer))\nelse:\n print(\"\\n\\n=============\\nNo Answer\\n=============\\n\")\n print(\"No answer was generated by the model, probably because the maximum number of tokens was reached.\")\n```\n\n</details>\n\n### Transformers\n\nYou can also use Ministral 3 3B Reasoning 2512 with `Transformers` !\nMake sure to install `Transformers` from its first v5 release candidate or from \"main\":\n\n```\npip install transformers==5.0.0rc0\n```\n\nTo make the best use of our model with `Transformers` make sure to have [installed](https://github.com/mistralai/mistral-common) `mistral-common >= 1.8.6` to use our tokenizer.\n\n```bash\npip install mistral-common --upgrade\n```\n\nThen load our tokenizer along with the model and generate:\n\n<details>\n <summary>Python snippet</summary>\n\n```python\nimport torch\nfrom transformers import Mistral3ForConditionalGeneration, MistralCommonBackend\n\nmodel_id = \"mistralai/Ministral-3-8B-Reasoning-2512\"\n\ntokenizer = MistralCommonBackend.from_pretrained(model_id)\nmodel = Mistral3ForConditionalGeneration.from_pretrained(\n model_id, torch_dtype=torch.bfloat16, device_map=\"auto\"\n)\n\nimage_url = \"https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438\"\n\nmessages = [\n {\n \"role\": \"user\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"What action do you think I should take in this situation? 
List all the possible actions and explain why you think they are good or bad.\",\n },\n {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n ],\n },\n]\n\ntokenized = tokenizer.apply_chat_template(messages, return_tensors=\"pt\", return_dict=True)\n\ntokenized[\"input_ids\"] = tokenized[\"input_ids\"].to(device=\"cuda\")\ntokenized[\"pixel_values\"] = tokenized[\"pixel_values\"].to(dtype=torch.bfloat16, device=\"cuda\")\nimage_sizes = [tokenized[\"pixel_values\"].shape[-2:]]\n\noutput = model.generate(\n **tokenized,\n image_sizes=image_sizes,\n max_new_tokens=8092,\n)[0]\n\ndecoded_output = tokenizer.decode(output[len(tokenized[\"input_ids\"][0]):])\nprint(decoded_output)\n```\n\n</details>\n\n## License\n\nThis model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).\n\n*You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.*",
"related_quantizations": []
},
"tags": [
"vllm",
"gguf",
"mistral3",
"mistral-common",
"heretic",
"uncensored",
"decensored",
"abliterated",
"en",
"fr",
"es",
"de",
"it",
"pt",
"nl",
"zh",
"ja",
"ko",
"ar",
"arxiv:2601.08584",
"base_model:mistralai/Ministral-3-8B-Base-2512",
"base_model:quantized:mistralai/Ministral-3-8B-Base-2512",
"license:apache-2.0",
"region:us",
"imatrix",
"conversational"
],
"likes": 0,
"downloads": 624,
"gated": false,
"private": false,
"last_modified": "2026-04-07T02:13:33.000Z",
"created_at": "2026-04-07T02:09:14.000Z",
"pipeline_tag": "",
"library_name": "vllm"
}
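In the normalized metadata above, `card_data.inference` is the boolean `false` while `frontmatter.inference` is the string `"false"`. A consumer that compares the two objects may want to coerce such strings first. The following is a minimal sketch of that idea on a trimmed copy of the two objects; the helper names `coerce_bool` and `diff_keys` are hypothetical and not part of any GraySoft or Hugging Face API.

```python
# Trimmed copies of the two sub-objects shown above (only a few keys kept).
card_data = {"library_name": "vllm", "license": "apache-2.0", "inference": False}
frontmatter = {"library_name": "vllm", "license": "apache-2.0", "inference": "false"}


def coerce_bool(value):
    """Map the strings 'true'/'false' (any case) to booleans; pass other values through."""
    if isinstance(value, str) and value.strip().lower() in ("true", "false"):
        return value.strip().lower() == "true"
    return value


def diff_keys(a: dict, b: dict) -> list:
    """Return the sorted keys whose values differ between the two dicts."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


# Before coercion the string "false" does not equal the boolean False.
raw_diff = diff_keys(card_data, frontmatter)

# After coercion the two objects agree on every key kept here.
normalized = {k: coerce_bool(v) for k, v in frontmatter.items()}
clean_diff = diff_keys(card_data, normalized)

print(raw_diff)
print(clean_diff)
```

Coercing only the exact strings `true` and `false` keeps other string fields, such as `license`, untouched.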
Source payload excerpt (from Hugging Face API)
{
"_id": "69d4674adcfb2307467a9bdd",
"id": "AISafety-Student/Ministral-3-8B-Reasoning-2512-heretic_GGUF",
"modelId": "AISafety-Student/Ministral-3-8B-Reasoning-2512-heretic_GGUF",
"sha": "afdb89b25fc34eabca1432f69e3e4531f5743d78",
"createdAt": "2026-04-07T02:09:14.000Z",
"lastModified": "2026-04-07T02:13:33.000Z",
"author": "AISafety-Student",
"downloads": 624,
"likes": 0,
"gated": false,
"private": false,
"pipeline_tag": "",
"library_name": "vllm",
"siblings_count": 9
}
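The `createdAt` and `lastModified` fields in the payload are ISO 8601 timestamps with millisecond precision and a trailing `Z`. As an illustrative sketch, the snippet below parses a trimmed copy of the payload and computes the time between the two; the helper name `parse_hf_timestamp` is hypothetical, and the format string is an assumption based only on the two values shown here.

```python
from datetime import datetime, timezone

# Trimmed copy of the payload excerpt above.
payload = {
    "createdAt": "2026-04-07T02:09:14.000Z",
    "lastModified": "2026-04-07T02:13:33.000Z",
    "downloads": 624,
}


def parse_hf_timestamp(value: str) -> datetime:
    """Parse 'YYYY-MM-DDTHH:MM:SS.mmmZ' into a timezone-aware UTC datetime."""
    naive = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return naive.replace(tzinfo=timezone.utc)


created = parse_hf_timestamp(payload["createdAt"])
modified = parse_hf_timestamp(payload["lastModified"])

# Seconds between repository creation and the last recorded modification.
elapsed_seconds = (modified - created).total_seconds()
print(f"Last modified {elapsed_seconds:.0f} seconds after creation")
```

Using `strptime` with an explicit format avoids relying on `datetime.fromisoformat`, which only accepts the `Z` suffix on newer Python versions.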