Model Intelligence Sheet

prithivmlmods/gelato-30b-a3b-f32-aio-gguf overview

Gelato-30B-A3B is a 30B-parameter Qwen3-VL MoE-based grounding model specialized for GUI computer-use tasks. It is trained on the Click-100k dataset to map natural language instructions and screen images to precise click coordinates on user interfaces. It achieves state-of-the-art accuracy on key grounding benchmarks, reaching about 63.88% on ScreenSpot-Pro and 69.15% / 74.65% on OS-World-G / OS-World-G (Refined), outperforming prior dedicated computer grounding models such as GTA1-32B and even larger general-purpose VLMs like Qwen3-VL-235B-A22B-Instruct. The model is released with an open codebase and examples showing how to feed a GUI screenshot plus an instruction and obtain normalized (x, y) coordinates, making it a strong drop-in component for building computer-use agents that can reliably locate UI elements and interact with real software environments.
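Because the model returns normalized coordinates, an agent has to map them back to screen pixels before clicking. The following is a minimal sketch of that step, not the project's own code. The helper name `to_pixels` is hypothetical, and the `scale` default of 1000 is an assumption: this sheet does not state the normalization range, so check the base model's documentation before relying on it.

```python
def to_pixels(x_norm, y_norm, width, height, scale=1000.0):
    """Map a normalized (x, y) prediction to integer pixel coordinates.

    ASSUMPTION: predictions lie in [0, scale] on both axes. The value of
    `scale` is not given in this sheet; 1000 is only a placeholder default.
    """
    if not (0 <= x_norm <= scale and 0 <= y_norm <= scale):
        raise ValueError(f"coordinate ({x_norm}, {y_norm}) outside [0, {scale}]")
    # Clamp to the last valid pixel so a prediction on the far edge stays on screen.
    x_px = min(int(x_norm / scale * width), width - 1)
    y_px = min(int(y_norm / scale * height), height - 1)
    return x_px, y_px


if __name__ == "__main__":
    # A prediction in the horizontal middle and upper quarter of a 1920x1080 screenshot.
    print(to_pixels(500, 250, 1920, 1080))
```

Clamping to `width - 1` and `height - 1` is a design choice for click targets; a caller that needs sub-pixel positions would return floats instead.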

transformers, gguf, qwen3_vl_moe, llama.cpp, text-generation-inference, agent, image-text-to-text, en, base_model:mlfoundations/Gelato-30B-A3B, base_model:quantized:mlfoundations/Gelato-30B-A3B, license:apache-2.0, endpoints_compatible, region:us, conversational

prithivmlmods/gelato-30b-a3b-f32-aio-gguf visual

Downloads

93

Likes

2

Pipeline

image-text-to-text

Library

transformers

Visibility

Public

Access

Open

Repository Files & Downloads

42 files detected

Direct downloads for all repository files

File	Type	Quantization	Size (binary units)	Link
Gelato-30B-A3B-BF16.gguf	GGUF	BF16	56.90 GiB	Download
Gelato-30B-A3B-F16.gguf	GGUF	F16	56.90 GiB	Download
Gelato-30B-A3B-F32.gguf	GGUF	F32	113.75 GiB	Download
Gelato-30B-A3B-mmproj-bf16.gguf	GGUF	BF16	1.01 GiB	Download
Gelato-30B-A3B-mmproj-f16.gguf	GGUF	F16	1.01 GiB	Download
Gelato-30B-A3B-mmproj-f32.gguf	GGUF	F32	2.01 GiB	Download
Gelato-30B-A3B-mmproj-q8_0.gguf	GGUF	Q8_0	679.16 MiB	Download
Gelato-30B-A3B.IQ4_XS.gguf	GGUF	IQ4_XS	15.42 GiB	Download
Gelato-30B-A3B.Q2_K.gguf	GGUF	Q2_K	10.49 GiB	Download
Gelato-30B-A3B.Q3_K_L.gguf	GGUF	Q3_K_L	14.81 GiB	Download
Gelato-30B-A3B.Q3_K_M.gguf	GGUF	Q3_K_M	13.70 GiB	Download
Gelato-30B-A3B.Q3_K_S.gguf	GGUF	Q3_K_S	12.38 GiB	Download
Gelato-30B-A3B.Q4_K_M.gguf	GGUF	Q4_K_M	17.28 GiB	Download
Gelato-30B-A3B.Q4_K_S.gguf	GGUF	Q4_K_S	16.26 GiB	Download
Gelato-30B-A3B.Q5_K_M.gguf	GGUF	Q5_K_M	20.23 GiB	Download
Gelato-30B-A3B.Q5_K_S.gguf	GGUF	Q5_K_S	19.63 GiB	Download
Gelato-30B-A3B.Q6_K.gguf	GGUF	Q6_K	23.37 GiB	Download
Gelato-30B-A3B.Q8_0.gguf	GGUF	Q8_0	30.25 GiB	Download
Gelato-30B-A3B.i1-IQ1_M.gguf	GGUF	IQ1_M	6.59 GiB	Download
Gelato-30B-A3B.i1-IQ1_S.gguf	GGUF	IQ1_S	5.98 GiB	Download
Gelato-30B-A3B.i1-IQ2_M.gguf	GGUF	IQ2_M	9.47 GiB	Download
Gelato-30B-A3B.i1-IQ2_S.gguf	GGUF	IQ2_S	8.65 GiB	Download
Gelato-30B-A3B.i1-IQ2_XS.gguf	GGUF	IQ2_XS	8.45 GiB	Download
Gelato-30B-A3B.i1-IQ2_XXS.gguf	GGUF	IQ2_XXS	7.62 GiB	Download
Gelato-30B-A3B.i1-IQ3_M.gguf	GGUF	IQ3_M	12.59 GiB	Download
Gelato-30B-A3B.i1-IQ3_S.gguf	GGUF	IQ3_S	12.39 GiB	Download
Gelato-30B-A3B.i1-IQ3_XS.gguf	GGUF	IQ3_XS	11.73 GiB	Download
Gelato-30B-A3B.i1-IQ3_XXS.gguf	GGUF	IQ3_XXS	11.04 GiB	Download
Gelato-30B-A3B.i1-IQ4_XS.gguf	GGUF	IQ4_XS	15.24 GiB	Download
Gelato-30B-A3B.i1-Q2_K.gguf	GGUF	Q2_K	10.49 GiB	Download
Gelato-30B-A3B.i1-Q2_K_S.gguf	GGUF	Q2_K_S	9.80 GiB	Download
Gelato-30B-A3B.i1-Q3_K_L.gguf	GGUF	Q3_K_L	14.81 GiB	Download
Gelato-30B-A3B.i1-Q3_K_M.gguf	GGUF	Q3_K_M	13.70 GiB	Download
Gelato-30B-A3B.i1-Q3_K_S.gguf	GGUF	Q3_K_S	12.38 GiB	Download
Gelato-30B-A3B.i1-Q4_0.gguf	GGUF	Q4_0	16.19 GiB	Download
Gelato-30B-A3B.i1-Q4_1.gguf	GGUF	Q4_1	17.87 GiB	Download
Gelato-30B-A3B.i1-Q4_K_M.gguf	GGUF	Q4_K_M	17.28 GiB	Download
Gelato-30B-A3B.i1-Q4_K_S.gguf	GGUF	Q4_K_S	16.26 GiB	Download
Gelato-30B-A3B.i1-Q5_K_M.gguf	GGUF	Q5_K_M	20.23 GiB	Download
Gelato-30B-A3B.i1-Q5_K_S.gguf	GGUF	Q5_K_S	19.63 GiB	Download
Gelato-30B-A3B.i1-Q6_K.gguf	GGUF	Q6_K	23.37 GiB	Download
Gelato-30B-A3B.imatrix.gguf	GGUF	—	116.38 MiB	Download
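The sizes in this file table are about 7% smaller than the sizes quoted in the repository README (for example 56.90 here versus 61.1 there for the BF16 file). That gap is what you would expect if this table reports binary units (GiB, 2^30 bytes) and the README reports decimal units (GB, 10^9 bytes). A small sketch of the conversion, using a hypothetical helper name, makes the two sets of figures comparable:

```python
GIB = 2**30  # bytes in one gibibyte (binary unit)
GB = 10**9   # bytes in one gigabyte (decimal unit)


def gib_to_gb(size_gib: float) -> float:
    """Convert a size in GiB to decimal GB."""
    return size_gib * GIB / GB


if __name__ == "__main__":
    # A few rows from the file table, taken as GiB.
    for name, size_gib in [("BF16", 56.90), ("F32", 113.75), ("Q4_K_M", 17.28)]:
        print(f"{name}: {size_gib:.2f} GiB is about {gib_to_gb(size_gib):.1f} GB")
```

The same factor (about 1.0737) applies to every GiB row; the MiB rows use 2^20 / 10^6 (about 1.0486) instead.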

Model Details

Model Slug

prithivmlmods/gelato-30b-a3b-f32-aio-gguf

Author

prithivMLmods

Pipeline Task

image-text-to-text

Library

transformers

Created

2025-11-16

Last Modified

2025-11-16

Gated

false

Private

false

HF SHA

736357bb1aa41995e60297ef3dde8dabaf8643cd

License

apache-2.0

Language

en

Base Model

mlfoundations/Gelato-30B-A3B

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "license": "apache-2.0",
    "language": [
      "en"
    ],
    "base_model": [
      "mlfoundations/Gelato-30B-A3B"
    ],
    "pipeline_tag": "image-text-to-text",
    "library_name": "transformers",
    "tags": [
      "llama.cpp",
      "text-generation-inference",
      "agent"
    ],
    "frontmatter": {
      "license": "apache-2.0",
      "language": [
        "en"
      ],
      "base_model": [
        "mlfoundations/Gelato-30B-A3B"
      ],
      "pipeline_tag": "image-text-to-text",
      "library_name": "transformers",
      "tags": [
        "llama.cpp",
        "text-generation-inference",
        "agent"
      ]
    },
    "hero_image_url": "https://www.nethype.de/huggingface_embed/quantpplgraph.png",
    "summary": "> Gelato-30B-A3B is a 30B-parameter Qwen3-VL MoE–based grounding model specialized for GUI computer-use tasks, trained on the Click-100k dataset to map natural language instructions and screen images to precise click coordinates on user interfaces. It achieves state-of-the-art accuracy on key grounding benchmarks, reaching about 63.88%63.88% on ScreenSpot-Pro and 69.15%/74.65%69.15%/74.65% on OS-World-G / OS-World-G (Refined), outperforming prior dedicated computer grounding models such as GTA1-32B and even larger general-purpose VLMs like Qwen3-VL-235B-A22B-Instruct. The model is released with an open codebase and examples showing how to feed a GUI screenshot plus an instruction and obtain normalized (x,y)(x,y) coordinates, making it a strong drop-in component for building computer-use agents that can reliably locate UI elements and interact with real software environments.",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlicense: apache-2.0\nlanguage:\n- en\nbase_model:\n- mlfoundations/Gelato-30B-A3B\npipeline_tag: image-text-to-text\nlibrary_name: transformers\ntags:\n- llama.cpp\n- text-generation-inference\n- agent\n---\n\n# **Gelato-30B-A3B-f32-AIO-GGUF**\n\n> [Gelato-30B-A3B](https://huggingface.co/mlfoundations/Gelato-30B-A3B) is a 30B-parameter Qwen3-VL MoE–based grounding model specialized for GUI computer-use tasks, trained on the Click-100k dataset to map natural language instructions and screen images to precise click coordinates on user interfaces. It achieves state-of-the-art accuracy on key grounding benchmarks, reaching about 63.88%63.88% on ScreenSpot-Pro and 69.15%/74.65%69.15%/74.65% on OS-World-G / OS-World-G (Refined), outperforming prior dedicated computer grounding models such as GTA1-32B and even larger general-purpose VLMs like Qwen3-VL-235B-A22B-Instruct. The model is released with an open codebase and examples showing how to feed a GUI screenshot plus an instruction and obtain normalized (x,y)(x,y) coordinates, making it a strong drop-in component for building computer-use agents that can reliably locate UI elements and interact with real software environments.\n\n## Model Files\n\n| File Name | Quant Type | File Size |\n| - | - | - |\n| Gelato-30B-A3B-BF16.gguf | BF16 | 61.1 GB |\n| Gelato-30B-A3B-F16.gguf | F16 | 61.1 GB |\n| Gelato-30B-A3B-F32.gguf | F32 | 122 GB |\n| Gelato-30B-A3B.IQ4_XS.gguf | IQ4_XS | 16.6 GB |\n| Gelato-30B-A3B.Q2_K.gguf | Q2_K | 11.3 GB |\n| Gelato-30B-A3B.Q3_K_L.gguf | Q3_K_L | 15.9 GB |\n| Gelato-30B-A3B.Q3_K_M.gguf | Q3_K_M | 14.7 GB |\n| Gelato-30B-A3B.Q3_K_S.gguf | Q3_K_S | 13.3 GB |\n| Gelato-30B-A3B.Q4_K_M.gguf | Q4_K_M | 18.6 GB |\n| Gelato-30B-A3B.Q4_K_S.gguf | Q4_K_S | 17.5 GB |\n| Gelato-30B-A3B.Q5_K_M.gguf | Q5_K_M | 21.7 GB |\n| Gelato-30B-A3B.Q5_K_S.gguf | Q5_K_S | 21.1 GB |\n| Gelato-30B-A3B.Q6_K.gguf | Q6_K | 25.1 GB |\n| Gelato-30B-A3B.Q8_0.gguf | Q8_0 | 32.5 GB |\n| 
Gelato-30B-A3B.i1-IQ1_M.gguf | i1-IQ1_M | 7.08 GB |\n| Gelato-30B-A3B.i1-IQ1_S.gguf | i1-IQ1_S | 6.42 GB |\n| Gelato-30B-A3B.i1-IQ2_M.gguf | i1-IQ2_M | 10.2 GB |\n| Gelato-30B-A3B.i1-IQ2_S.gguf | i1-IQ2_S | 9.29 GB |\n| Gelato-30B-A3B.i1-IQ2_XS.gguf | i1-IQ2_XS | 9.08 GB |\n| Gelato-30B-A3B.i1-IQ2_XXS.gguf | i1-IQ2_XXS | 8.18 GB |\n| Gelato-30B-A3B.i1-IQ3_M.gguf | i1-IQ3_M | 13.5 GB |\n| Gelato-30B-A3B.i1-IQ3_S.gguf | i1-IQ3_S | 13.3 GB |\n| Gelato-30B-A3B.i1-IQ3_XS.gguf | i1-IQ3_XS | 12.6 GB |\n| Gelato-30B-A3B.i1-IQ3_XXS.gguf | i1-IQ3_XXS | 11.8 GB |\n| Gelato-30B-A3B.i1-IQ4_XS.gguf | i1-IQ4_XS | 16.4 GB |\n| Gelato-30B-A3B.i1-Q2_K.gguf | i1-Q2_K | 11.3 GB |\n| Gelato-30B-A3B.i1-Q2_K_S.gguf | i1-Q2_K_S | 10.5 GB |\n| Gelato-30B-A3B.i1-Q3_K_L.gguf | i1-Q3_K_L | 15.9 GB |\n| Gelato-30B-A3B.i1-Q3_K_M.gguf | i1-Q3_K_M | 14.7 GB |\n| Gelato-30B-A3B.i1-Q3_K_S.gguf | i1-Q3_K_S | 13.3 GB |\n| Gelato-30B-A3B.i1-Q4_0.gguf | i1-Q4_0 | 17.4 GB |\n| Gelato-30B-A3B.i1-Q4_1.gguf | i1-Q4_1 | 19.2 GB |\n| Gelato-30B-A3B.i1-Q4_K_M.gguf | i1-Q4_K_M | 18.6 GB |\n| Gelato-30B-A3B.i1-Q4_K_S.gguf | i1-Q4_K_S | 17.5 GB |\n| Gelato-30B-A3B.i1-Q5_K_M.gguf | i1-Q5_K_M | 21.7 GB |\n| Gelato-30B-A3B.i1-Q5_K_S.gguf | i1-Q5_K_S | 21.1 GB |\n| Gelato-30B-A3B.i1-Q6_K.gguf | i1-Q6_K | 25.1 GB |\n| Gelato-30B-A3B-mmproj-bf16.gguf | mmproj-bf16 | 1.09 GB |\n| Gelato-30B-A3B-mmproj-f16.gguf | mmproj-f16 | 1.08 GB |\n| Gelato-30B-A3B-mmproj-f32.gguf | mmproj-f32 | 2.15 GB |\n| Gelato-30B-A3B-mmproj-q8_0.gguf | mmproj-q8_0 | 712 MB |\n| Gelato-30B-A3B.imatrix.gguf | imatrix | 122 MB |\n\n## Quants Usage \n\n(sorted by size, not necessarily quality. IQ-quants are often preferable over similar sized non-IQ quants)\n\nHere is a handy graph by ikawrakow comparing some lower-quality quant\ntypes (lower is better):\n\n![image.png](https://www.nethype.de/huggingface_embed/quantpplgraph.png)",
    "related_quantizations": []
  },
  "tags": [
    "transformers",
    "gguf",
    "qwen3_vl_moe",
    "llama.cpp",
    "text-generation-inference",
    "agent",
    "image-text-to-text",
    "en",
    "base_model:mlfoundations/Gelato-30B-A3B",
    "base_model:quantized:mlfoundations/Gelato-30B-A3B",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 2,
  "downloads": 93,
  "gated": false,
  "private": false,
  "last_modified": "2025-11-16T10:26:29.000Z",
  "created_at": "2025-11-16T08:53:51.000Z",
  "pipeline_tag": "image-text-to-text",
  "library_name": "transformers"
}
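The `tags` array above mixes plain labels (`gguf`, `agent`) with `key:value` entries (`license:apache-2.0`, `base_model:quantized:...`). The sketch below shows one way to separate the two, assuming the first colon is the delimiter; the function name `split_tags` is hypothetical and is not part of any Hugging Face library.

```python
def split_tags(tags):
    """Split a tag list into plain labels and a key -> [values] mapping.

    ASSUMPTION: the first ':' separates key from value, so
    'base_model:quantized:org/name' yields key 'base_model' and
    value 'quantized:org/name'.
    """
    plain, keyed = [], {}
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep:
            keyed.setdefault(key, []).append(value)
        else:
            plain.append(tag)
    return plain, keyed


# Tag list copied from the normalized metadata above.
tags = [
    "transformers", "gguf", "qwen3_vl_moe", "llama.cpp",
    "text-generation-inference", "agent", "image-text-to-text", "en",
    "base_model:mlfoundations/Gelato-30B-A3B",
    "base_model:quantized:mlfoundations/Gelato-30B-A3B",
    "license:apache-2.0", "endpoints_compatible", "region:us", "conversational",
]

plain, keyed = split_tags(tags)

if __name__ == "__main__":
    print(plain)
    print(keyed)
```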

Source payload excerpt (from Hugging Face API)

{
  "_id": "6919911f118b0217fc1b0cec",
  "id": "prithivMLmods/Gelato-30B-A3B-f32-AIO-GGUF",
  "modelId": "prithivMLmods/Gelato-30B-A3B-f32-AIO-GGUF",
  "sha": "736357bb1aa41995e60297ef3dde8dabaf8643cd",
  "createdAt": "2025-11-16T08:53:51.000Z",
  "lastModified": "2025-11-16T10:26:29.000Z",
  "author": "prithivMLmods",
  "downloads": 93,
  "likes": 2,
  "gated": false,
  "private": false,
  "pipeline_tag": "image-text-to-text",
  "library_name": "transformers",
  "siblings_count": 45
}
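The payload's `createdAt` and `lastModified` fields are UTC timestamps with millisecond precision, which is why the Model Details panel shows the same date for both. A minimal standard-library sketch for parsing them and measuring the gap follows; the helper name `parse_hf_timestamp` is hypothetical, and the format string assumes the exact shape seen in this payload.

```python
from datetime import datetime, timezone


def parse_hf_timestamp(value: str) -> datetime:
    """Parse a timestamp shaped like '2025-11-16T08:53:51.000Z' as UTC.

    ASSUMPTION: the value always ends in a literal 'Z' and carries
    fractional seconds, as in the payload excerpt above.
    """
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


created = parse_hf_timestamp("2025-11-16T08:53:51.000Z")
modified = parse_hf_timestamp("2025-11-16T10:26:29.000Z")
elapsed = modified - created

if __name__ == "__main__":
    print(f"Repository was last modified {elapsed} after creation")
```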