aisafety-student/ministral-3-8b-reasoning-2512-heretic_gguf (IQ4_XS GGUF) is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. This page helps you pick a local model for guIDE or other runtimes. See related models in the same shard below.

Model Intelligence Sheet

aisafety-student/ministral-3-8b-reasoning-2512-heretic_gguf overview

A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities. This model is the reasoning post-trained version, which makes it well suited to math, coding, and STEM-related use cases. The Ministral 3 family is designed for edge deployment and can run on a wide range of hardware. Ministral 3 8B can be deployed locally: it fits in 24 GB of VRAM in BF16 and in less than 12 GB of RAM/VRAM when quantized. Learn more in the Mistral blog post and paper.
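The memory figures above follow from simple arithmetic on parameter count and bytes per weight. The sketch below assumes the 8.4B language model plus 0.4B vision encoder listed in the model card, and counts weights only: KV cache, activations, and runtime overhead are not included. The 4.25 bits-per-weight value for a 4-bit quant is an illustrative assumption, not a measured figure for the files on this page.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes); excludes KV cache and activations."""
    return n_params * bits_per_weight / 8 / 1e9


# Parameter counts taken from the model card: 8.4B language model + 0.4B vision encoder.
TOTAL_PARAMS = 8.4e9 + 0.4e9

bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)   # BF16 stores 16 bits per weight
q4_gb = weight_memory_gb(TOTAL_PARAMS, 4.25)   # assumed average for a 4-bit quant

print(f"BF16 weights:  ~{bf16_gb:.1f} GB")
print(f"4-bit weights: ~{q4_gb:.1f} GB")
```

Under these assumptions the BF16 weights alone come to roughly 17.6 GB, which is consistent with the 24 GB figure once context and runtime overhead are added on top.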

vllm, gguf, mistral3, mistral-common, heretic, uncensored, decensored, abliterated, en, fr, es, de, it, pt, nl, zh, ja, ko, ar, arxiv:2601.08584, base_model:mistralai/Ministral-3-8B-Base-2512, base_model:quantized:mistralai/Ministral-3-8B-Base-2512, license:apache-2.0, region:us, imatrix, conversational

Downloads

624

Likes

0

Pipeline

—

Library

vllm

Visibility

Public

Access

Open

Repository Files & Downloads

2 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
ministral-heretic-IQ4_XS.gguf	GGUF	IQ4_XS	4.37 GB	Download
ministral-heretic-Q4_K_M.gguf	GGUF	Q4_K_M	4.84 GB	Download
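As a rough illustration of choosing between the two files, the sketch below picks the largest quant that fits a memory budget and builds its download URL. The file names and sizes come from the table above and the repository id from the Hugging Face payload on this page. The 1 GB headroom default, the helper names, and the assumption that the files live on the `main` revision are hypothetical choices for this example.

```python
from dataclasses import dataclass
from typing import Optional

REPO_ID = "AISafety-Student/Ministral-3-8B-Reasoning-2512-heretic_GGUF"


@dataclass(frozen=True)
class QuantFile:
    filename: str
    quant: str
    size_gb: float


# Values copied from the file table on this page.
FILES = [
    QuantFile("ministral-heretic-IQ4_XS.gguf", "IQ4_XS", 4.37),
    QuantFile("ministral-heretic-Q4_K_M.gguf", "Q4_K_M", 4.84),
]


def pick_largest_fitting(
    files: list[QuantFile], budget_gb: float, headroom_gb: float = 1.0
) -> Optional[QuantFile]:
    """Return the largest file whose size plus headroom fits the budget, else None."""
    fitting = [f for f in files if f.size_gb + headroom_gb <= budget_gb]
    return max(fitting, key=lambda f: f.size_gb) if fitting else None


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a Hugging Face 'resolve' URL for a file (revision is an assumption)."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


choice = pick_largest_fitting(FILES, budget_gb=8.0)
if choice is not None:
    print(choice.quant, resolve_url(REPO_ID, choice.filename))
```

The headroom stands in for context and runtime overhead; a real deployment would size it from the intended context length rather than a fixed constant.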

Model Details Live

Model Slug

aisafety-student/ministral-3-8b-reasoning-2512-heretic_gguf

Author

AISafety-Student

Pipeline Task

—

Library

vllm

Created

2026-04-07

Last Modified

2026-04-07

Gated

No

Private

No

HF SHA

afdb89b25fc34eabca1432f69e3e4531f5743d78

License

apache-2.0

Language

en, fr, es, de, it, pt, nl, zh, ja, ko, ar

Base Model

mistralai/Ministral-3-8B-Base-2512

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "library_name": "vllm",
    "language": [
      "en",
      "fr",
      "es",
      "de",
      "it",
      "pt",
      "nl",
      "zh",
      "ja",
      "ko",
      "ar"
    ],
    "license": "apache-2.0",
    "inference": false,
    "base_model": [
      "mistralai/Ministral-3-8B-Base-2512"
    ],
    "extra_gated_description": "If you want to learn more about how we process your personal data, please read our <a href=\"https://mistral.ai/terms/\">Privacy Policy</a>.",
    "tags": [
      "mistral-common",
      "heretic",
      "uncensored",
      "decensored",
      "abliterated"
    ],
    "frontmatter": {
      "library_name": "vllm",
      "language": [
        "en",
        "fr",
        "es",
        "de",
        "it",
        "pt",
        "nl",
        "zh",
        "ja",
        "ko",
        "ar"
      ],
      "license": "apache-2.0",
      "inference": "false",
      "base_model": [
        "mistralai/Ministral-3-8B-Base-2512"
      ],
      "extra_gated_description": "If you want to learn more about how we process your personal",
      "tags": [
        "mistral-common",
        "heretic",
        "uncensored",
        "decensored",
        "abliterated"
      ]
    },
    "hero_image_url": "",
    "summary": "A balanced model in the Ministral 3 family, **Ministral 3 8B** is a powerful, efficient tiny language model with vision capabilities. This model is the reasoning post-trained version, trained for reasoning tasks, making it ideal for math, coding and stem related use cases. The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 8B can even be deployed locally, capable of fitting in 24GB of VRAM in BF16, and less than 12GB of RAM/VRAM when quantized. Learn more in our blog post and paper.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlibrary_name: vllm\nlanguage:\n- en\n- fr\n- es\n- de\n- it\n- pt\n- nl\n- zh\n- ja\n- ko\n- ar\nlicense: apache-2.0\ninference: false\nbase_model:\n- mistralai/Ministral-3-8B-Base-2512\nextra_gated_description: If you want to learn more about how we process your personal\n  data, please read our <a href=\"https://mistral.ai/terms/\">Privacy Policy</a>.\ntags:\n- mistral-common\n- heretic\n- uncensored\n- decensored\n- abliterated\n---\n# This is a decensored version of [mistralai/Ministral-3-8B-Reasoning-2512](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512), made using [Heretic](https://github.com/p-e-w/heretic) v1.2.0\n\n## Abliteration parameters\n\n| Parameter | Value |\n| :-------- | :---: |\n| **direction_index** | per layer |\n| **attn.o_proj.max_weight** | 0.90 |\n| **attn.o_proj.max_weight_position** | 26.35 |\n| **attn.o_proj.min_weight** | 0.88 |\n| **attn.o_proj.min_weight_distance** | 12.34 |\n| **mlp.down_proj.max_weight** | 1.25 |\n| **mlp.down_proj.max_weight_position** | 25.09 |\n| **mlp.down_proj.min_weight** | 1.20 |\n| **mlp.down_proj.min_weight_distance** | 17.83 |\n\n## Performance\n\n| Metric | This model | Original model ([mistralai/Ministral-3-8B-Reasoning-2512](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512)) |\n| :----- | :--------: | :---------------------------: |\n| **KL divergence** | 0.3509 | 0 *(by definition)* |\n| **Refusals** | 3/100 | 96/100 |\n\n-----\n\n\n# Ministral 3 8B Reasoning 2512\nA balanced model in the Ministral 3 family, **Ministral 3 8B** is a powerful, efficient tiny language model with vision capabilities.\n\nThis model is the reasoning post-trained version, trained for reasoning tasks, making it ideal for math, coding and stem related use cases.\n\nThe Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. 
Ministral 3 8B can even be deployed locally, capable of fitting in 24GB of VRAM in BF16, and less than 12GB of RAM/VRAM when quantized.\n\nLearn more in our [blog post](https://mistral.ai/news/mistral-3) and [paper](https://arxiv.org/abs/2601.08584).\n\n## Key Features\nMinistral 3 8B consists of two main architectural components:\n- **8.4B Language Model**\n- **0.4B Vision Encoder**\n\nThe Ministral 3 8B Reasoning model offers the following capabilities:\n- **Vision**: Enables the model to analyze images and provide insights based on visual content, in addition to text.\n- **Multilingual**: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.\n- **System Prompt**: Maintains strong adherence and support for system prompts.\n- **Agentic**: Offers best-in-class agentic capabilities with native function calling and JSON outputting.\n- **Reasoning**: Excels at complex, multi-step reasoning and dynamic problem-solving.\n- **Edge-Optimized**: Delivers best-in-class performance at a small scale, deployable anywhere.\n- **Apache 2.0 License**: Open-source license allowing usage and modification for both commercial and non-commercial purposes.\n- **Large Context Window**: Supports a 256k context window.\n\n### Use Cases\nPerfect for balanced performance in local or embedded systems, combining versatility with efficiency.\n- Chat interfaces in constrained environments\n- Local daily-driver AI assistant\n- Image/document description and understanding\n- Translation and content generation\n- Specialized agentic use cases\n- Fine-tuning and specialization\n- And more...\n  \nBringing advanced AI capabilities to resource-constrained environments.\n\n### Recommended Settings\n\nWe recommend deploying with the following best practices:\n- System Prompt: Use our provided [system prompt](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512/blob/main/SYSTEM_PROMPT.txt), and append it to your 
custom system prompt to define a clear environment and use case, including guidance on how to effectively leverage tools in agentic systems.\n- Multi-turn Traces: We highly recommend keeping the reasoning traces in context.\n- Sampling Parameters: Use a **temperature of 0.7** for most environments ; Different temperatures may be explored for different use cases - developers are encouraged to experiment with alternative settings.\n- Tools: Keep the set of tools well-defined and limit their number to the minimum required for the use case - Avoiding overloading the model with an excessive number of tools.\n- Vision: When deploying with vision capabilities, we recommend maintaining an aspect ratio close to 1:1 (width-to-height) for images. Avoiding the use of overly thin or wide images - crop them as needed to ensure optimal performance.\n\n## Ministral 3 Family\n\n| Model Name                     | Type               | Precision | Link                                                                                     |\n|--------------------------------|--------------------|-----------|------------------------------------------------------------------------------------------|\n| Ministral 3 3B Base 2512       | Base pre-trained   | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512)                |\n| Ministral 3 3B Instruct 2512   | Instruct post-trained | FP8   | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512)            |\n| Ministral 3 3B Reasoning 2512  | Reasoning capable  | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512)           |\n| Ministral 3 8B Base 2512       | Base pre-trained   | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512)                |\n| Ministral 3 8B Instruct 2512   | Instruct post-trained | FP8    | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512)            |\n| 
**Ministral 3 8B Reasoning 2512**  | **Reasoning capable**  | **BF16**      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512)           |\n| Ministral 3 14B Base 2512      | Base pre-trained   | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Base-2512)               |\n| Ministral 3 14B Instruct 2512  | Instruct post-trained | FP8    | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512)           |\n| Ministral 3 14B Reasoning 2512 | Reasoning capable  | BF16      | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512)          |\n\nOther formats available [here](https://huggingface.co/collections/mistralai/ministral-3-additional-checkpoints).\n\n## Benchmark Results\n\nWe compare Ministral 3 to similar sized models.\n\n### Reasoning\n\n| Model                     | AIME25      | AIME24      | GPQA Diamond | LiveCodeBench |\n|---------------------------|-------------|-------------|--------------|---------------|\n| **Ministral 3 14B**       | <u>0.850</u>| <u>0.898</u>| <u>0.712</u> | <u>0.646</u>  |\n| Qwen3-14B (Thinking)      | 0.737       | 0.837       | 0.663        | 0.593         |\n|                           |             |             |              |               |\n| **Ministral 3 8B**        | 0.787       | <u>0.860</u>| 0.668        | <u>0.616</u>  |\n| Qwen3-VL-8B-Thinking      | <u>0.798</u>| <u>0.860</u>| <u>0.671</u> | 0.580         |\n|                           |             |             |              |               |\n| **Ministral 3 3B**        | <u>0.721</u>| <u>0.775</u>| 0.534        | <u>0.548</u>  |\n| Qwen3-VL-4B-Thinking      | 0.697       | 0.729       | <u>0.601</u> | 0.513         |\n\n### Instruct\n\n| Model                     | Arena Hard  | WildBench  | MATH Maj@1  | MM MTBench       |\n|---------------------------|-------------|------------|-------------|------------------|\n| **Ministral 3 14B**       | 
<u>0.551</u>| <u>68.5</u>| <u>0.904</u>| <u>8.49</u>      |\n| Qwen3 14B (Non-Thinking)  | 0.427       | 65.1       | 0.870       | NOT MULTIMODAL   |\n| Gemma3-12B-Instruct       | 0.436       | 63.2       | 0.854       | 6.70             |\n|                           |             |            |             |                  |\n| **Ministral 3 8B**        | 0.509       | <u>66.8</u>| 0.876       | <u>8.08</u>      |\n| Qwen3-VL-8B-Instruct      | <u>0.528</u>| 66.3       | <u>0.946</u>| 8.00             |\n|                           |             |            |             |                  |\n| **Ministral 3 3B**        | 0.305       | <u>56.8</u>| 0.830       | 7.83             |\n| Qwen3-VL-4B-Instruct      | <u>0.438</u>| <u>56.8</u>| <u>0.900</u>| <u>8.01</u>      |\n| Qwen3-VL-2B-Instruct      | 0.163       | 42.2       | 0.786       | 6.36             |\n| Gemma3-4B-Instruct        | 0.318       | 49.1       | 0.759       | 5.23             |\n\n### Base\n\n| Model               | Multilingual MMLU | MATH CoT 2-Shot | AGIEval 5-shot | MMLU Redux 5-shot | MMLU 5-shot | TriviaQA 5-shot |\n|---------------------|-------------------|-----------------|----------------|-------------------|-------------|-----------------|\n| **Ministral 3 14B** | 0.742             | <u>0.676</u>    | 0.648          | 0.820             | 0.794       | 0.749           |\n| Qwen3 14B Base      | <u>0.754</u>      | 0.620           | <u>0.661</u>   | <u>0.837</u>      | <u>0.804</u>| 0.703           |\n| Gemma 3 12B Base    | 0.690             | 0.487           | 0.587          | 0.766             | 0.745       | <u>0.788</u>    |\n|                     |                   |                 |                |                   |             |                 |\n| **Ministral 3 8B**  | <u>0.706</u>      | <u>0.626</u>    | 0.591          | 0.793             | <u>0.761</u>| <u>0.681</u>    |\n| Qwen 3 8B Base      | 0.700             | 0.576           | <u>0.596</u>   | 
<u>0.794</u>      | 0.760       | 0.639           |\n|                     |                   |                 |                |                   |             |                 |\n| **Ministral 3 3B**  | 0.652             | <u>0.601</u>    | 0.511          | 0.735             | 0.707       | 0.592           |\n| Qwen 3 4B Base      | <u>0.677</u>      | 0.405           | <u>0.570</u>   | <u>0.759</u>      | <u>0.713</u>| 0.530           |\n| Gemma 3 4B Base     | 0.516             | 0.294           | 0.430          | 0.626             | 0.589       | <u>0.640</u>    |\n\n## Usage\n\nThe model can be used with the following frameworks;\n- [`vllm`](https://github.com/vllm-project/vllm): See [here](#vllm)\n- [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)\n  \n### vLLM\n\nWe recommend using this model with [vLLM](https://github.com/vllm-project/vllm).\n\n#### Installation\n\nMake sure to install **vllm >= 0.12.0**:\n\n```\npip install vllm --upgrade\n```\n\nDoing so should automatically install [`mistral_common >= 1.8.6`](https://github.com/mistralai/mistral-common/releases/tag/v1.8.6).\n\nTo check:\n```\npython -c \"import mistral_common; print(mistral_common.__version__)\"\n```\n\nYou can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile) or on the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest).\n\n#### Serve\n\nDue to their size, `Ministral-3-3B-Reasoning-2512` and `Ministral-3-8B-Reasoning-2512` can run on a single 1xH200 GPU.\n\nA simple launch command is:\n\n```bash\n\nvllm serve mistralai/Ministral-3-8B-Reasoning-2512 \\\n  --tokenizer_mode mistral --config_format mistral --load_format mistral \\\n  --enable-auto-tool-choice --tool-call-parser mistral \\\n  --reasoning-parser mistral\n```\n\nKey parameter notes:\n\n* enable-auto-tool-choice: Required when enabling tool usage.\n* tool-call-parser mistral: Required when enabling tool 
usage.\n* reasoning-parser mistral: Required when enabling reasoning.\n\nAdditional flags:\n\n* You can set `--max-model-len` to preserve memory. By default it is set to `262144` which is quite large but not necessary for most scenarios.\n* You can set `--max-num-batched-tokens` to balance throughput and latency, higher means higher throughput but higher latency.\n\n#### Usage of the model\n\nHere we assume that the model `mistralai/Ministral-3-8B-Reasoning-2512` is served and you can ping it to the domain `localhost` with the port `8000` which is the default for vLLM.\n\n<details>\n  <summary>Vision Reasoning</summary>\n\nLet's see if the Ministral 3 model knows when to pick a fight !\n\n```python\nfrom typing import Any\n\nfrom openai import OpenAI\nfrom huggingface_hub import hf_hub_download\n\n# Modify OpenAI's API key and API base to use vLLM's API server.\nopenai_api_key = \"EMPTY\"\nopenai_api_base = \"http://localhost:8000/v1\"\n\nTEMP = 0.7\nTOP_P = 0.95\nMAX_TOK = 262144\nclient = OpenAI(\n    api_key=openai_api_key,\n    base_url=openai_api_base,\n)\n\nmodels = client.models.list()\nmodel = models.data[0].id\n\n\ndef load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:\n    file_path = hf_hub_download(repo_id=repo_id, filename=filename)\n    with open(file_path, \"r\") as file:\n        system_prompt = file.read()\n\n    index_begin_think = system_prompt.find(\"[THINK]\")\n    index_end_think = system_prompt.find(\"[/THINK]\")\n\n    return {\n        \"role\": \"system\",\n        \"content\": [\n            {\"type\": \"text\", \"text\": system_prompt[:index_begin_think]},\n            {\n                \"type\": \"thinking\",\n                \"thinking\": system_prompt[\n                    index_begin_think + len(\"[THINK]\") : index_end_think\n                ],\n                \"closed\": True,\n            },\n            {\n                \"type\": \"text\",\n                \"text\": system_prompt[index_end_think + 
len(\"[/THINK]\") :],\n            },\n        ],\n    }\n\n\nSYSTEM_PROMPT = load_system_prompt(model, \"SYSTEM_PROMPT.txt\")\n\nimage_url = \"https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438\"\n\nmessages = [\n    SYSTEM_PROMPT,\n    {\n        \"role\": \"user\",\n        \"content\": [\n            {\n                \"type\": \"text\",\n                \"text\": \"What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.\",\n            },\n            {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n        ],\n    },\n]\n\n\nstream = client.chat.completions.create(\n    model=model,\n    messages=messages,\n    stream=True,\n    temperature=TEMP,\n    top_p=TOP_P,\n    max_tokens=MAX_TOK,\n)\n\nprint(\"client: Start streaming chat completions...:\\n\")\nprinted_reasoning_content = False\nanswer = []\n\nfor chunk in stream:\n    reasoning_content = None\n    content = None\n    # Check the content is reasoning_content or content\n    if hasattr(chunk.choices[0].delta, \"reasoning_content\"):\n        reasoning_content = chunk.choices[0].delta.reasoning_content\n    if hasattr(chunk.choices[0].delta, \"content\"):\n        content = chunk.choices[0].delta.content\n\n    if reasoning_content is not None:\n        if not printed_reasoning_content:\n            printed_reasoning_content = True\n            print(\"Start reasoning:\\n\", end=\"\", flush=True)\n        print(reasoning_content, end=\"\", flush=True)\n    elif content is not None:\n        # Extract and print the content\n        if not reasoning_content and printed_reasoning_content:\n            answer.extend(content)\n        print(content, end=\"\", flush=True)\n\nif answer:\n    print(\"\\n\\n=============\\nAnswer\\n=============\\n\")\n    print(\"\".join(answer))\nelse:\n    print(\"\\n\\n=============\\nNo Answer\\n=============\\n\")\n    
print(\n        \"No answer was generated by the model, probably because the maximum number of tokens was reached.\"\n    )\n```\n\nNow we'll make it compute some maths !\n\n```python\nfrom typing import Any\n\nfrom openai import OpenAI\nfrom huggingface_hub import hf_hub_download\n\n# Modify OpenAI's API key and API base to use vLLM's API server.\nopenai_api_key = \"EMPTY\"\nopenai_api_base = \"http://localhost:8000/v1\"\n\nTEMP = 0.7\nTOP_P = 0.95\nMAX_TOK = 262144\nclient = OpenAI(\n    api_key=openai_api_key,\n    base_url=openai_api_base,\n)\n\nmodels = client.models.list()\nmodel = models.data[0].id\n\n\ndef load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:\n    file_path = hf_hub_download(repo_id=repo_id, filename=filename)\n    with open(file_path, \"r\") as file:\n        system_prompt = file.read()\n\n    index_begin_think = system_prompt.find(\"[THINK]\")\n    index_end_think = system_prompt.find(\"[/THINK]\")\n\n    return {\n        \"role\": \"system\",\n        \"content\": [\n            {\"type\": \"text\", \"text\": system_prompt[:index_begin_think]},\n            {\n                \"type\": \"thinking\",\n                \"thinking\": system_prompt[\n                    index_begin_think + len(\"[THINK]\") : index_end_think\n                ],\n                \"closed\": True,\n            },\n            {\n                \"type\": \"text\",\n                \"text\": system_prompt[index_end_think + len(\"[/THINK]\") :],\n            },\n        ],\n    }\n\n\nSYSTEM_PROMPT = load_system_prompt(model, \"SYSTEM_PROMPT.txt\")\n\nimage_url = \"https://i.ytimg.com/vi/5Y3xLHeyKZU/hqdefault.jpg\"\n\nmessages = [\n    SYSTEM_PROMPT,\n    {\n        \"role\": \"user\",\n        \"content\": [\n            {\n                \"type\": \"text\",\n                \"text\": \"Solve the equations. If they contain only numbers, use your calculator, else only think. 
Answer in the language of the image.\",\n            },\n            {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n        ],\n    },\n]\n\nstream = client.chat.completions.create(\n    model=model,\n    messages=messages,\n    stream=True,\n    temperature=TEMP,\n    top_p=TOP_P,\n    max_tokens=MAX_TOK,\n)\n\nprint(\"client: Start streaming chat completions...:\\n\")\nprinted_reasoning_content = False\nanswer = []\n\nfor chunk in stream:\n    reasoning_content = None\n    content = None\n    # Check the content is reasoning_content or content\n    if hasattr(chunk.choices[0].delta, \"reasoning_content\"):\n        reasoning_content = chunk.choices[0].delta.reasoning_content\n    if hasattr(chunk.choices[0].delta, \"content\"):\n        content = chunk.choices[0].delta.content\n\n    if reasoning_content is not None:\n        if not printed_reasoning_content:\n            printed_reasoning_content = True\n            print(\"Start reasoning:\\n\", end=\"\", flush=True)\n        print(reasoning_content, end=\"\", flush=True)\n    if content is not None:\n        # Extract and print the content\n        if not reasoning_content and printed_reasoning_content:\n            answer.extend(content)\n        print(content, end=\"\", flush=True)\n\nif answer:\n    print(\"\\n\\n=============\\nAnswer\\n=============\\n\")\n    print(\"\".join(answer))\nelse:\n    print(\"\\n\\n=============\\nNo Answer\\n=============\\n\")\n    print(\n        \"No answer was generated by the model, probably because the maximum number of tokens was reached.\"\n    )\n```\n\n</details>\n\n<details>\n  <summary>Text-Only Request</summary>\n\nLet's do more maths and leave it up to the model to figure out how to achieve a result.\n\n```python\nfrom typing import Any\nfrom openai import OpenAI\nfrom huggingface_hub import hf_hub_download\n\n# Modify OpenAI's API key and API base to use vLLM's API server.\nopenai_api_key = \"EMPTY\"\nopenai_api_base = 
\"http://localhost:8000/v1\"\n\nTEMP = 0.7\nTOP_P = 0.95\nMAX_TOK = 262144\nclient = OpenAI(\n    api_key=openai_api_key,\n    base_url=openai_api_base,\n)\n\nmodels = client.models.list()\nmodel = models.data[0].id\n\n\ndef load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:\n    file_path = hf_hub_download(repo_id=repo_id, filename=filename)\n    with open(file_path, \"r\") as file:\n        system_prompt = file.read()\n\n    index_begin_think = system_prompt.find(\"[THINK]\")\n    index_end_think = system_prompt.find(\"[/THINK]\")\n\n    return {\n        \"role\": \"system\",\n        \"content\": [\n            {\"type\": \"text\", \"text\": system_prompt[:index_begin_think]},\n            {\n                \"type\": \"thinking\",\n                \"thinking\": system_prompt[\n                    index_begin_think + len(\"[THINK]\") : index_end_think\n                ],\n                \"closed\": True,\n            },\n            {\n                \"type\": \"text\",\n                \"text\": system_prompt[index_end_think + len(\"[/THINK]\") :],\n            },\n        ],\n    }\n\n\nSYSTEM_PROMPT = load_system_prompt(model, \"SYSTEM_PROMPT.txt\")\n\nquery = \"Use each number in 2,5,6,3 exactly once, along with any combination of +, -, ×, ÷ (and parentheses for grouping), to make the number 24.\"\n\nmessages = [\n    SYSTEM_PROMPT,\n    {\"role\": \"user\", \"content\": query}\n]\nstream = client.chat.completions.create(\n  model=model,\n  messages=messages,\n  stream=True,\n  temperature=TEMP,\n  top_p=TOP_P,\n  max_tokens=MAX_TOK,\n)\n\nprint(\"client: Start streaming chat completions...:\\n\")\nprinted_reasoning_content = False\nanswer = []\n\nfor chunk in stream:\n    reasoning_content = None\n    content = None\n    # Check the content is reasoning_content or content\n    if hasattr(chunk.choices[0].delta, \"reasoning_content\"):\n        reasoning_content = chunk.choices[0].delta.reasoning_content\n    if 
hasattr(chunk.choices[0].delta, \"content\"):\n        content = chunk.choices[0].delta.content\n\n    if reasoning_content is not None:\n        if not printed_reasoning_content:\n            printed_reasoning_content = True\n            print(\"Start reasoning:\\n\", end=\"\", flush=True)\n        print(reasoning_content, end=\"\", flush=True)\n    if content is not None:\n        # Extract and print the content\n        if not reasoning_content and printed_reasoning_content:\n            answer.extend(content)\n        print(content, end=\"\", flush=True)\n\nif answer:\n    print(\"\\n\\n=============\\nAnswer\\n=============\\n\")\n    print(\"\".join(answer))\nelse:\n    print(\"\\n\\n=============\\nNo Answer\\n=============\\n\")\n    print(\"No answer was generated by the model, probably because the maximum number of tokens was reached.\")\n```\n\n</details>\n\n### Transformers\n\nYou can also use Ministral 3 3B Reasoning 2512 with `Transformers` !\nMake sure to install `Transformers` from its first v5 release candidate or from \"main\":\n\n```\npip install transformers==5.0.0rc0\n```\n\nTo make the best use of our model with `Transformers` make sure to have [installed](https://github.com/mistralai/mistral-common) `mistral-common >= 1.8.6` to use our tokenizer.\n\n```bash\npip install mistral-common --upgrade\n```\n\nThen load our tokenizer along with the model and generate:\n\n<details>\n  <summary>Python snippet</summary>\n\n```python\nimport torch\nfrom transformers import Mistral3ForConditionalGeneration, MistralCommonBackend\n\nmodel_id = \"mistralai/Ministral-3-8B-Reasoning-2512\"\n\ntokenizer = MistralCommonBackend.from_pretrained(model_id)\nmodel = Mistral3ForConditionalGeneration.from_pretrained(\n    model_id, torch_dtype=torch.bfloat16, device_map=\"auto\"\n)\n\nimage_url = \"https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438\"\n\nmessages = [\n    {\n        \"role\": \"user\",\n        
\"content\": [\n            {\n                \"type\": \"text\",\n                \"text\": \"What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.\",\n            },\n            {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n        ],\n    },\n]\n\ntokenized = tokenizer.apply_chat_template(messages, return_tensors=\"pt\", return_dict=True)\n\ntokenized[\"input_ids\"] = tokenized[\"input_ids\"].to(device=\"cuda\")\ntokenized[\"pixel_values\"] = tokenized[\"pixel_values\"].to(dtype=torch.bfloat16, device=\"cuda\")\nimage_sizes = [tokenized[\"pixel_values\"].shape[-2:]]\n\noutput = model.generate(\n    **tokenized,\n    image_sizes=image_sizes,\n    max_new_tokens=8092,\n)[0]\n\ndecoded_output = tokenizer.decode(output[len(tokenized[\"input_ids\"][0]):])\nprint(decoded_output)\n```\n\n</details>\n\n## License\n\nThis model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).\n\n*You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.*",
    "related_quantizations": []
  },
  "tags": [
    "vllm",
    "gguf",
    "mistral3",
    "mistral-common",
    "heretic",
    "uncensored",
    "decensored",
    "abliterated",
    "en",
    "fr",
    "es",
    "de",
    "it",
    "pt",
    "nl",
    "zh",
    "ja",
    "ko",
    "ar",
    "arxiv:2601.08584",
    "base_model:mistralai/Ministral-3-8B-Base-2512",
    "base_model:quantized:mistralai/Ministral-3-8B-Base-2512",
    "license:apache-2.0",
    "region:us",
    "imatrix",
    "conversational"
  ],
  "likes": 0,
  "downloads": 624,
  "gated": false,
  "private": false,
  "last_modified": "2026-04-07T02:13:33.000Z",
  "created_at": "2026-04-07T02:09:14.000Z",
  "pipeline_tag": "",
  "library_name": "vllm"
}
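The flat `tags` array above mixes three kinds of entries: `key:value` pairs, language codes, and plain labels. A minimal sketch of separating them is shown below. The function name and output shape are hypothetical, and the language set is simply the `language` list from `card_data`, not a general ISO 639 lookup.

```python
# Language codes as listed under card_data.language on this page.
LANGUAGE_CODES = {"en", "fr", "es", "de", "it", "pt", "nl", "zh", "ja", "ko", "ar"}

TAGS = [
    "vllm", "gguf", "mistral3", "mistral-common", "heretic", "uncensored",
    "decensored", "abliterated", "en", "fr", "es", "de", "it", "pt", "nl",
    "zh", "ja", "ko", "ar", "arxiv:2601.08584",
    "base_model:mistralai/Ministral-3-8B-Base-2512",
    "base_model:quantized:mistralai/Ministral-3-8B-Base-2512",
    "license:apache-2.0", "region:us", "imatrix", "conversational",
]


def split_tags(tags: list[str]) -> dict:
    """Split tags into key:value pairs, language codes, and plain labels."""
    keyed: dict[str, list[str]] = {}
    languages: list[str] = []
    labels: list[str] = []
    for tag in tags:
        if ":" in tag:
            # Split on the last colon so "base_model:quantized:<repo>" keeps
            # "base_model:quantized" as its key.
            key, value = tag.rsplit(":", 1)
            keyed.setdefault(key, []).append(value)
        elif tag in LANGUAGE_CODES:
            languages.append(tag)
        else:
            labels.append(tag)
    return {"keyed": keyed, "languages": languages, "labels": labels}


parsed = split_tags(TAGS)
print(parsed["keyed"]["license"], parsed["languages"], parsed["labels"])
```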

Source payload excerpt (from Hugging Face API)

{
  "_id": "69d4674adcfb2307467a9bdd",
  "id": "AISafety-Student/Ministral-3-8B-Reasoning-2512-heretic_GGUF",
  "modelId": "AISafety-Student/Ministral-3-8B-Reasoning-2512-heretic_GGUF",
  "sha": "afdb89b25fc34eabca1432f69e3e4531f5743d78",
  "createdAt": "2026-04-07T02:09:14.000Z",
  "lastModified": "2026-04-07T02:13:33.000Z",
  "author": "AISafety-Student",
  "downloads": 624,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "",
  "library_name": "vllm",
  "siblings_count": 9
}
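The payload fields map onto the "Model Details" values shown earlier. The sketch below parses the two timestamps and derives the slug; that the slug is the lowercased `id` is an observation from this page rather than a documented rule, and the helper name is hypothetical.

```python
from datetime import datetime, timezone

# Subset of the payload excerpt above.
payload = {
    "id": "AISafety-Student/Ministral-3-8B-Reasoning-2512-heretic_GGUF",
    "createdAt": "2026-04-07T02:09:14.000Z",
    "lastModified": "2026-04-07T02:13:33.000Z",
}


def parse_hf_timestamp(value: str) -> datetime:
    """Parse an ISO-8601 timestamp with milliseconds and a trailing 'Z' as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


created = parse_hf_timestamp(payload["createdAt"])
modified = parse_hf_timestamp(payload["lastModified"])
elapsed_seconds = (modified - created).total_seconds()

# Assumption: the page slug is the repository id in lowercase.
slug = payload["id"].lower()

print(created.date().isoformat(), modified.date().isoformat(), elapsed_seconds, slug)
```

Both timestamps fall on 2026-04-07, which matches the Created and Last Modified dates listed under Model Details.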

More models in this shard