Model Comparison

ggml-org/smolvlm2-500m-video-instruct-gguf vs unsloth/qwen3-vl-4b-instruct-gguf

Side-by-side comparison of ggml-org/smolvlm2-500m-video-instruct-gguf and unsloth/qwen3-vl-4b-instruct-gguf: downloads, license, context length, tasks, and benchmarks.

ggml-org/smolvlm2-500m-video-instruct-gguf

ggml-org · —

Original model: https://huggingface.co/HuggingFaceTB/SmolVLM2-500M-Video-Instruct
For more info, please refer to this PR: https://github.com/ggml-org/llama.cpp/pull/13050

unsloth/qwen3-vl-4b-instruct-gguf

unsloth · image-text-to-text

Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and vid…

Side-by-side Specifications

	ggml-org/smolvlm2-500m-video-instruct-gguf	unsloth/qwen3-vl-4b-instruct-gguf
Author	ggml-org	unsloth
Pipeline Task	—	image-text-to-text
Library	—	transformers
Downloads	24,741	82,451
Likes	17	46
License	apache-2.0 (per tags)	Unknown
Context Length	—	—
Created	2025-04-21	2025-10-30
Last Modified	2025-04-30	2025-10-31
Tags	gguf, base_model:HuggingFaceTB/SmolVLM2-500M-Video-Instruct, base_model:quantized:HuggingFaceTB/SmolVLM2-500M-Video-Instruct, license:apache-2.0, endpoints_compatible, region:us, conversational	transformers, gguf, unsloth, qwen, qwen3, image-text-to-text, arxiv:2505.09388, arxiv:2502.13923, arxiv:2409.12191, arxiv:2308.12966
