alexanderkyng/mistral-small-119b-2603-ik-gguf (IQ4_K_R4 GGUF) is indexed on GraySoft with repository links, GGUF quant files, and Hugging Face metadata. Use this page to pick a local model for guIDE or other runtimes.
Model Intelligence Sheet
alexanderkyng/mistral-small-119b-2603-ik-gguf overview
This repository provides ik_llama optimized GGUF quantizations for the mistralai/Mistral-Small-4-119B-2603 model.
| Field | Value |
|---|---|
| Downloads | 1,599 |
| Likes | 0 |
| Pipeline | text-generation |
| Library | not specified |
| Visibility | Public |
| Access | Open |
Benchmarks
| Model Format | Perplexity (PPL) | Size (GiB) | Bits per Weight (BPW) |
|---|---|---|---|
| BF16 (Unsloth Base) | 5.3035 | 221.64 | 16.00 |
| IQ5_K_R4 (ik_llama) | 5.3431 | 82 | 5.51 |
| UD_Q4_K_S (Unsloth) | 5.3883 | 69.30 | ~ 4.50 |
| IQ4_K_R4 (ik_llama) | 5.4242 | 67.2 | 4.31 |
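The trade-off in the table is easier to read as a change relative to the BF16 baseline. The sketch below is illustrative only: the numbers are copied from the table above, while the names `ROWS`, `compare_to_baseline`, and `SUMMARY` are hypothetical and not part of the repository.

```python
# Rows copied from the benchmark table: (format, perplexity, size in GiB).
ROWS = [
    ("BF16 (Unsloth Base)", 5.3035, 221.64),
    ("IQ5_K_R4 (ik_llama)", 5.3431, 82.0),
    ("UD_Q4_K_S (Unsloth)", 5.3883, 69.30),
    ("IQ4_K_R4 (ik_llama)", 5.4242, 67.2),
]


def compare_to_baseline(rows):
    """Map each non-baseline format to (PPL increase in %, fraction of baseline size).

    The first row is treated as the baseline.
    """
    _, base_ppl, base_size = rows[0]
    return {
        name: (100.0 * (ppl - base_ppl) / base_ppl, size / base_size)
        for name, ppl, size in rows[1:]
    }


SUMMARY = compare_to_baseline(ROWS)

for name, (ppl_increase, size_fraction) in SUMMARY.items():
    print(f"{name}: +{ppl_increase:.2f}% PPL at {size_fraction:.1%} of the BF16 size")
```

Perplexity differences of this size say nothing on their own about downstream task quality, so the output is best read as a rough ranking of the quantizations against each other.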
Model Details
Metadata Inspector
Normalized metadata (stored in metadata_json)
{
"metadata": {},
"card_data": {
"license": "apache-2.0",
"datasets": [
"ggml-org/ci"
],
"metrics": [
"perplexity"
],
"base_model": [
"mistralai/Mistral-Small-4-119B-2603"
],
"pipeline_tag": "text-generation",
"language": [
"ar",
"en",
"fr",
"es",
"de",
"it",
"pt",
"nl",
"ja",
"ko",
"zh"
],
"tags": [
"ik_llama.cpp",
"llama.cpp"
],
"frontmatter": {
"license": "apache-2.0",
"datasets": [
"ggml-org/ci"
],
"metrics": [
"perplexity"
],
"base_model": [
"mistralai/Mistral-Small-4-119B-2603"
],
"pipeline_tag": "text-generation",
"language": [
"ar",
"en",
"fr",
"es",
"de",
"it",
"pt",
"nl",
"ja",
"ko",
"zh"
],
"tags": [
"ik_llama.cpp",
"llama.cpp"
]
},
"hero_image_url": "https://huggingface.co/mistralai/Mistral-Small-4-119B-2603/resolve/main/images/image2.png",
"summary": "This repository provides ik_llama optimized GGUF quantizations for the mistralai/Mistral-Small-4-119B-2603 model.",
"quick_links": [],
"benchmark_table_html": "<table>\n <thead>\n <tr>\n <th>Model Format</th>\n <th>Perplexity (PPL)</th>\n <th>Size (GiB)</th>\n <th>Bits per Weight (BPW)</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>BF16 (Unsloth Base)</td>\n <td>5.3035</td>\n <td>221.64</td>\n <td>16.00</td>\n </tr>\n <tr>\n <td>IQ5_K_R4 (ik_llama)</td>\n <td>5.3431</td>\n <td>82</td>\n <td>5.51</td>\n </tr>\n <tr>\n <td>UD_Q4_K_S (Unsloth)</td>\n <td>5.3883</td>\n <td>69.30</td>\n <td>~ 4.50</td>\n </tr>\n <tr>\n <td>IQ4_K_R4 (ik_llama)</td>\n <td>5.4242</td>\n <td>67.2</td>\n <td>4.31</td>\n </tr>\n </tbody>\n</table>",
"readme_markdown": "---\nlicense: apache-2.0\ndatasets:\n- ggml-org/ci\nmetrics:\n- perplexity\nbase_model:\n- mistralai/Mistral-Small-4-119B-2603\npipeline_tag: text-generation\nlanguage:\n- ar\n- en\n- fr\n- es\n- de\n- it\n- pt\n- nl\n- ja\n- ko\n- zh\ntags:\n- ik_llama.cpp\n- llama.cpp\n---\n\n# Mistral-Small-4-119B-2603 GGUF (ik_llama)\n\nThis repository provides [ik_llama](https://github.com/ikawrakow/ik_llama.cpp) optimized GGUF quantizations for the mistralai/Mistral-Small-4-119B-2603 model.\n\n## Optimization and Importance Matrix\nThese files were generated using the [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp) fork, which offers row interleaved formats (R4) specifically designed to minimize degradation in MoE architectures. I send my biggest thanks to the author [ikawrakow](https://github.com/ikawrakow) for developing and maintaining this tool.\n\nTo maximize precision, an importance matrix (iMatrix) was computed prior to quantization. This calculation was performed on a RunPod infrastructure equipped with 3 NVIDIA A100 80GB accelerators, ingesting the calibration_data_v5_rc.txt dataset provided by [tristandruyen](https://gist.github.com/tristandruyen/9e207a95c7d75ddf37525d353e00659c).\n\n## Perplexity Scores\nAll tests were performed on the [`wikitext-2`](https://huggingface.co/datasets/ggml-org/ci) dataset using a context window of 1024 tokens and a batch size of 512. 
\n\n<table>\n <thead>\n <tr>\n <th>Model Format</th>\n <th>Perplexity (PPL)</th>\n <th>Size (GiB)</th>\n <th>Bits per Weight (BPW)</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>BF16 (Unsloth Base)</td>\n <td>5.3035</td>\n <td>221.64</td>\n <td>16.00</td>\n </tr>\n <tr>\n <td>IQ5_K_R4 (ik_llama)</td>\n <td>5.3431</td>\n <td>82</td>\n <td>5.51</td>\n </tr>\n <tr>\n <td>UD_Q4_K_S (Unsloth)</td>\n <td>5.3883</td>\n <td>69.30</td>\n <td>~ 4.50</td>\n </tr>\n <tr>\n <td>IQ4_K_R4 (ik_llama)</td>\n <td>5.4242</td>\n <td>67.2</td>\n <td>4.31</td>\n </tr>\n </tbody>\n</table>\n\n## Vision and Multimodality Note\nThis model features vision capabilities. You may find the original model's .mmproj file in the [\"Files and versions\"](https://huggingface.co/AlexanderKyng/Mistral-Small-119B-2603-ik-GGUF/tree/main) tab.\n\n### Credits\nOriginal model: [mistralai/Mistral-Small-4-119B-2603](https://huggingface.co/mistralai/Mistral-Small-4-119B-2603)\n\n\n\n\n# ================== Original Description ==================\n\nMistral Small 4 is a powerful hybrid model capable of acting as both a general instruction model and a reasoning model. It unifies the capabilities of three different model families—**Instruct**, **Reasoning** (previously called Magistral), and **Devstral**—into a single, unified model.\n\nWith its multimodal capabilities, efficient architecture, and flexible mode switching, it is a powerful general-purpose model for any task. 
In a latency-optimized setup, Mistral Small 4 achieves a **40% reduction in end-to-end completion time**, and in a throughput-optimized setup, it handles **3x more requests per second** compared to Mistral Small 3.\n\nTo further improve efficiency, you can take advantage of either:\n- Speculative decoding thanks to our trained eagle head [`mistralai/Mistral-Small-4-119B-2603-eagle`](https://huggingface.co/mistralai/Mistral-Small-4-119B-2603-eagle).\n- 4-bit float precision quantization thanks to our NVFP4 checkpoint [`mistralai/Mistral-Small-4-119B-2603-NVFP4`](https://huggingface.co/mistralai/Mistral-Small-4-119B-2603-NVFP4).\n\n## Key Features\n\nMistral Small 4 includes the following architectural choices:\n\n- **MoE**: 128 experts, 4 active.\n- **119B parameters**, with **6.5B activated per token**.\n- **256k context length**.\n- **Multimodal input**: Accepts both text and image input, with text output.\n- **Instruct and Reasoning functionalities** with function calls (reasoning effort configurable per request).\n\nMistral Small 4 offers the following capabilities:\n\n- **Reasoning Mode**: Toggle between fast instant reply mode and reasoning mode, boosting performance with test-time compute when requested.\n- **Vision**: Analyzes images and provides insights based on visual content, in addition to text.\n- **Multilingual**: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic.\n- **System Prompt**: Strong adherence and support for system prompts.\n- **Agentic**: Best-in-class agentic capabilities with native function calling and JSON output.\n- **Speed-Optimized**: Delivers best-in-class performance and speed.\n- **Apache 2.0 License**: Open-source license for both commercial and non-commercial use.\n- **Large Context Window**: Supports a 256k context window.\n\n## Recommended Settings\n\n- **Reasoning Effort**:\n - `'none'` → Do not use reasoning\n - `'high'` → Use reasoning 
(recommended for complex prompts)\n Use `reasoning_effort=\"high\"` for complex tasks\n- **Temperature**: 0.7 for `reasoning_effort=\"high\"`. Temp between 0.0 and 0.7 for `reasoning_effort=\"none\"` depending on task.\n\n## Use Cases\n\nMistral Small 4 is designed for general chat assistants, coding, agentic tasks, and reasoning tasks (with reasoning mode toggled). Its multimodal capabilities also enable document and image understanding for data extraction and analysis.\n\nIts capabilities are ideal for:\n- Developers interested in coding and agentic capabilities for SWE automation and codebase exploration.\n- Enterprises seeking general chat assistants, agents, and document understanding.\n- Researchers leveraging its math and research capabilities.\n\nMistral Small 4 is also well-suited for customization and fine-tuning for more specialized tasks.\n\n### Examples\n- General chat assistant\n- Document parsing and extraction\n- Coding agent\n- Research assistant\n- Customization & fine-tuning\n- And more...\n\n## Benchmarks\n\n### Comparison with internal models\n\nDepending on your tasks you can trigger reasoning thanks to the support of the **per-request** parameter `reasoning_effort`. Set it to:\n- `reasoning_effort=\"none\"`: Fast, lightweight responses for everyday tasks, equivalent to the same chat style of [`mistralai/Mistral-Small-3.2-24B-Instruct-2506`](https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506).\n- `reasoning_effort=\"high\"`: Deep, step-by-step reasoning for complex problems, with equivalent verbosity to previous Magistral models such as [`mistralai/Magistral-Small-2509`](https://huggingface.co/mistralai/Magistral-Small-2509).\n\n\n\n#### Comparing Reasoning Models\n\n\n\n\n### Comparison with other models\n\nMistral Small 4 with reasoning achieves competitive scores, matching or surpassing GPT-OSS 120B across all three benchmarks while generating significantly\nshorter outputs. 
On AA LCR, Mistral Small 4 scores **0.72** with just **1.6K characters**, whereas Qwen models require **3.5-4x more output** (5.8-6.1K)\nfor comparable performance. On LiveCodeBench, Mistral Small 4 outperforms GPT-OSS 120B while producing **20% less output**.\nThis efficiency reduces latency and inference costs and improves the user experience.\n\n\n\n\n\n## Usage\n\nYou can find Mistral Small 4 support in multiple libraries for inference and fine-tuning. We thank all the contributors and maintainers who helped make this happen.\n\n## License\n\nThis model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).\n\n*You must not use this model in a manner that infringes, misappropriates, or violates any third party’s rights, including intellectual property rights.*\n",
"related_quantizations": []
},
"tags": [
"gguf",
"ik_llama.cpp",
"llama.cpp",
"text-generation",
"ar",
"en",
"fr",
"es",
"de",
"it",
"pt",
"nl",
"ja",
"ko",
"zh",
"dataset:ggml-org/ci",
"base_model:mistralai/Mistral-Small-4-119B-2603",
"base_model:quantized:mistralai/Mistral-Small-4-119B-2603",
"license:apache-2.0",
"endpoints_compatible",
"region:us",
"imatrix",
"conversational"
],
"likes": 0,
"downloads": 1599,
"gated": false,
"private": false,
"last_modified": "2026-04-11T16:42:11.000Z",
"created_at": "2026-04-11T14:28:13.000Z",
"pipeline_tag": "text-generation",
"library_name": ""
}
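The top-level `tags` array above mixes plain labels (such as `gguf` or `imatrix`) with `key:value` entries (such as `license:apache-2.0`). A minimal sketch of separating the two kinds is shown here; the helper name `split_tags` is hypothetical, and splitting on the first colon is an assumption about how these tags are meant to be read, not a documented rule.

```python
# Tag values copied from the "tags" array of the normalized metadata.
TAGS = [
    "gguf", "ik_llama.cpp", "llama.cpp", "text-generation",
    "ar", "en", "fr", "es", "de", "it", "pt", "nl", "ja", "ko", "zh",
    "dataset:ggml-org/ci",
    "base_model:mistralai/Mistral-Small-4-119B-2603",
    "base_model:quantized:mistralai/Mistral-Small-4-119B-2603",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us",
    "imatrix",
    "conversational",
]


def split_tags(tags):
    """Split tags into plain labels and a {key: [values]} mapping.

    Assumption: a tag containing ":" is a key/value pair, split at the first colon.
    """
    plain, keyed = [], {}
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep:
            keyed.setdefault(key, []).append(value)
        else:
            plain.append(tag)
    return plain, keyed


PLAIN, KEYED = split_tags(TAGS)
print("license:", KEYED["license"])
print("base_model:", KEYED["base_model"])
```

Because the split happens at the first colon only, the `base_model:quantized:...` entry keeps `quantized:` as part of its value, which preserves the relationship to the base model.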
Source payload excerpt (from Hugging Face API)
{
"_id": "69da5a7d335901ef3e00e864",
"id": "AlexanderKyng/Mistral-Small-119B-2603-ik-GGUF",
"modelId": "AlexanderKyng/Mistral-Small-119B-2603-ik-GGUF",
"sha": "84b068926406196158f6b6a19ea6dcbb2807b4af",
"createdAt": "2026-04-11T14:28:13.000Z",
"lastModified": "2026-04-11T16:42:11.000Z",
"author": "AlexanderKyng",
"downloads": 1599,
"likes": 0,
"gated": false,
"private": false,
"pipeline_tag": "text-generation",
"library_name": "",
"siblings_count": 6
}
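The `createdAt` and `lastModified` fields in the payload are ISO 8601 timestamps with a trailing `Z`. As a small illustration, the sketch below parses both values and computes the time between them; the function name `parse_hf_timestamp` is hypothetical, and replacing `Z` with `+00:00` is a workaround so that `datetime.fromisoformat` also accepts the string on Python versions older than 3.11.

```python
from datetime import datetime

# Values copied from the source payload excerpt.
CREATED_AT = "2026-04-11T14:28:13.000Z"
LAST_MODIFIED = "2026-04-11T16:42:11.000Z"


def parse_hf_timestamp(value):
    """Parse an ISO 8601 timestamp ending in 'Z' into a timezone-aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


CREATED = parse_hf_timestamp(CREATED_AT)
MODIFIED = parse_hf_timestamp(LAST_MODIFIED)
ELAPSED = MODIFIED - CREATED

print(f"Last modified {ELAPSED} after creation")
```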