Model Intelligence Sheet

tirgun/qwen3-vl-4b-thinking-gguf overview

Comprehensive model page for tirgun/qwen3-vl-4b-thinking-gguf

gguf, llama.cpp, quantized, qwen, qwen3-vl, vision, reasoning, Q6_K, Q5_K_M, Q4_K_M, image-to-text, ru, en, zh, base_model:Qwen/Qwen3-VL-4B-Thinking, base_model:quantized:Qwen/Qwen3-VL-4B-Thinking, license:apache-2.0, endpoints_compatible, region:us, conversational

Downloads

180

Likes

0

Pipeline

image-to-text

Library

—

Visibility

Public

Access

Open

Repository Files & Downloads

5 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
Qwen3-VL-4B-Thinking-Q4_K_M.gguf	GGUF	Q4_K_M	2.33 GB	Download
Qwen3-VL-4B-Thinking-Q5_K_M.gguf	GGUF	Q5_K_M	2.69 GB	Download
Qwen3-VL-4B-Thinking-Q6_K.gguf	GGUF	Q6_K	3.08 GB	Download
mmproj-Qwen3VL-4B-Thinking-F16.gguf	GGUF	F16	797.44 MB	Download
mmproj-Qwen3VL-4B-Thinking-Q8_0.gguf	GGUF	Q8_0	432.94 MB	Download
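
As a rough illustration of how the sizes in the table might guide a download choice, the sketch below picks the largest model and projector pair whose combined file size fits a given budget. The `pick_pair` helper and the budget values are hypothetical, the MB-to-GB conversion assumes decimal units, and file size is only a lower bound on runtime memory (the KV cache and compute buffers are not counted).

```python
from typing import Optional, Tuple

# Model sizes in GB, copied from the file table above.
MODELS = {
    "Qwen3-VL-4B-Thinking-Q4_K_M.gguf": 2.33,
    "Qwen3-VL-4B-Thinking-Q5_K_M.gguf": 2.69,
    "Qwen3-VL-4B-Thinking-Q6_K.gguf": 3.08,
}
# Projector sizes are listed in MB; dividing by 1000 assumes decimal units.
PROJECTORS = {
    "mmproj-Qwen3VL-4B-Thinking-F16.gguf": 797.44 / 1000,
    "mmproj-Qwen3VL-4B-Thinking-Q8_0.gguf": 432.94 / 1000,
}


def by_size_desc(table):
    """Return (name, size) pairs sorted from largest to smallest."""
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


def pick_pair(budget_gb: float) -> Optional[Tuple[str, str]]:
    """Return the largest (model, projector) pair whose files fit the budget.

    Hypothetical helper: it prefers a larger model quant first, then a larger
    projector, and returns None when no pair fits.
    """
    for model, model_gb in by_size_desc(MODELS):
        for projector, projector_gb in by_size_desc(PROJECTORS):
            if model_gb + projector_gb <= budget_gb:
                return model, projector
    return None


print(pick_pair(4.0))
print(pick_pair(3.0))
```

Preferring the model quant over the projector is a design choice made for this sketch only; the source does not say which of the two has the larger effect on output quality.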

Model Details

Model Slug

tirgun/qwen3-vl-4b-thinking-gguf

Author

TirGun

Pipeline Task

image-to-text

Library

—

Created

2026-03-16

Last Modified

2026-03-17

Gated

No

Private

No

HF SHA

f42bd81ed5dc70d83ca6291737db56951c0c1bb4

License

apache-2.0

Language

ru, en, zh

Base Model

Qwen/Qwen3-VL-4B-Thinking

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "license": "apache-2.0",
    "language": [
      "ru",
      "en",
      "zh"
    ],
    "base_model": "Qwen/Qwen3-VL-4B-Thinking",
    "pipeline_tag": "image-to-text",
    "tags": [
      "gguf",
      "llama.cpp",
      "quantized",
      "qwen",
      "qwen3-vl",
      "vision",
      "reasoning",
      "Q6_K",
      "Q5_K_M",
      "Q4_K_M"
    ],
    "frontmatter": {
      "license": "apache-2.0",
      "language": [
        "ru",
        "en",
        "zh"
      ],
      "base_model": "Qwen/Qwen3-VL-4B-Thinking",
      "pipeline_tag": "image-to-text",
      "tags": [
        "gguf",
        "llama.cpp",
        "quantized",
        "qwen",
        "qwen3-vl",
        "vision",
        "reasoning",
        "Q6_K",
        "Q5_K_M",
        "Q4_K_M"
      ]
    },
    "hero_image_url": "my_avatar.jpg",
    "summary": "",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlicense: apache-2.0\nlanguage:\n- ru\n- en\n- zh\nbase_model: Qwen/Qwen3-VL-4B-Thinking\npipeline_tag: image-to-text\ntags:\n- gguf\n- llama.cpp\n- quantized\n- qwen\n- qwen3-vl\n- vision\n- reasoning\n- Q6_K\n- Q5_K_M\n- Q4_K_M\n---\n\n# Qwen3-VL-4B-Thinking GGUF (Vision & Reasoning)\n\n## Description\nThis repository contains GGUF weights for the **Qwen3-VL-4B-Thinking** model. \nThis is a multimodal model capable of:\n1. **Vision**: Analyzing images and videos.\n2. **Reasoning**: Performing complex chain-of-thought (CoT) processing.\n\nThe files were converted from the original F16/BF16 weights to ensure maximum precision during quantization.\n\n## Provided Files\n* **Q6_K**: Maximum quality. Recommended for precise image analysis.\n* **Q5_K_M**: Balanced high quality.\n* **Q4_K_M**: Optimal for most users. Fast, lightweight, and fits easily in 8 GB of VRAM.\n\n---\n\n## Usage (llama.cpp)\n\nTo use the **Vision** capabilities, you must use a compatible version of `llama.cpp`.\n\n### Basic Text & Vision Command:\n```bash\n.\\llama-cli.exe -m Qwen3-VL-4B-Thinking-Q4_K_M.gguf --mmproj mmproj-Qwen3VL-4B-Thinking-F16.gguf -ngl 27 --image \"path/to/image.jpg\" -p \"Describe this picture\"\n\n```\n\n### Advanced Reasoning Control:\n\nSince this is a \"Thinking\" model, it can be very talkative. 
You can control the internal thought process:\n\n* **To see the reasoning:** Use `--reasoning on`\n* **To hide the reasoning (Clean Output):** Use `--reasoning-budget 1` (This is the most effective way to get direct answers).\n\n### PowerShell Optimization Example:\n\n```powershell\n.\\llama-cli.exe -m Qwen3-VL-4B-Thinking-Q4_K_M.gguf -ngl 27 -fa 0 --no-mmap --reasoning-budget 1\n\n```\n---\n\n## Key Parameters for Qwen3-VL\n\n### Reasoning Budget (`--reasoning-budget`)\n\nSpecific to Thinking models.\n\n* `--reasoning-budget 1`: Forces the model to skip lengthy self-talk and give a direct answer.\n* `--reasoning-budget 500`: Allows for deep logical chain-of-thought.\n\n### Flash Attention (`-fa`)\n\n* **`-fa 1`**: Recommended for long dialogues or multiple images.\n* **`-fa 0`**: Use this if you encounter visual artifacts or stability issues on older GPU drivers.\n\n### GPU Layers (`-ngl`)\n\n* For **RX 580 (8GB)**: Recommended `-ngl 25-27` to leave room for the Vision KV-cache.\n* For **GTX 1070 Ti**: Can usually handle `-ngl 32` (all layers).\n---\n\n### ⚠️ Vision Capabilities & Usage\n\nTo use the vision (Image-to-Text) features of this model, you **must** use a vision projector file along with the main GGUF quant. \n\n* **Projector File:** `mmproj-Qwen3VL-4B-Thinking-F16.gguf`\n\n#### Usage with llama.cpp:\nWhen running the model, specify the projector using the `--mmproj` flag. For **Thinking** models, it is highly recommended to set the reasoning budget to 1 to avoid excessive output latency during image analysis.\n\n```bash\n./llama-cli \\\n  -m Qwen3-VL-4B-Thinking-Q4_K_M.gguf \\\n  --mmproj mmproj-Qwen3VL-4B-Thinking-F16.gguf \\\n  --image ./path_to_your_image.jpg \\\n  --reasoning-budget 1 \\\n  -ngl 27 \\\n  -p \"Describe this image concisely.\"\n```\n---\n\n## Technical Details\n\n* **Architecture:** Qwen3-VL (4B)\n* **Quantization:** llama-quantize (latest build)\n* **Source:** Converted from F16 Master GGUF\n\n---\n\n### 🧠 Demo: Qwen3-VL-4B-Thinking (GGUF) — Image Analysis Dialogue\n\n![Demo Portrait](my_avatar.jpg)\n\n#### Input image: *My avatar picture*\n\n> **Model stack**: `Qwen3-VL-4B-Thinking-Q5_K_M.gguf` + `mmproj-Qwen3VL-4B-Thinking-F16.gguf`  \n> **Mode**: Local inference, thinking-enabled, vision-language\n\n---\n\n**User**:  \n> What's in this image?\n\n**AI**:  \n> Hi! I can't process images directly, but if you describe what you see, I'll help interpret it. If this is an artistic portrait, it likely depicts a person with traditional elements—feather adornments, turtle-pattern facial markings, and similar cultural motifs. Just note: I rely on your description for specifics, so feel free to share details! 😊\n\n**User**:  \n> Spot on—it's a portrait of a person with feather ornaments and facial patterns.\n\n**AI**:  \n> Thanks for the clarification! The portrait indeed appears to feature traditional adornments: feathers, intricate facial markings, and other culturally significant elements. If you'd like deeper analysis of the symbolism or artistic intent—just say the word. 😊\n\n**User**:  \n> What do you think is on their head?\n\n**AI**:  \n> Based on typical iconography in such portraits, the figure likely wears **large ceremonial feathers**, possibly accompanied by a **black bird** (e.g., a raven) perched atop the head. In many Indigenous North American traditions, feathers symbolize spiritual connection, social status, or community leadership.\n\n> **On feathers and birds**:  \n> - **Feathers**: Often denote honor, authority, or ancestral ties. In some Nations, specific feather types or arrangements mark rites of passage or achievements.  
\n> - **Bird (likely a raven)**: In several Indigenous cosmologies, ravens embody wisdom, transformation, or mediation between worlds. Context matters—symbolism varies across Nations and artistic interpretations.\n\n> **Key takeaway**: This is likely a **symbolic composition**, where every element—feathers, bird, facial patterns—carries layered meaning. If it's contemporary art, the creator may be bridging heritage and modern expression, or emphasizing cultural preservation.\n\n> Need more nuance or cultural context? I'm here to help. 😊\n\n**User**:  \n> Tell me more about this figure's overall persona.\n\n**AI**:  \n> Glad to. Let's unpack the portrait:\n\n> This figure appears rooted in **Indigenous North American cultural aesthetics**—possibly referencing Nations like the Navajo, Lakota, or others. Key observations:\n\n> 1. **Feather adornments**: Central to many tribal traditions, feathers often signify role, achievement, or harmony with nature. The scale and detail here suggest the subject holds notable standing.\n> \n> 2. **Facial markings**: Far from decorative, these patterns frequently serve protective, spiritual, or identity-affirming functions—e.g., warding negative forces, honoring lineage, or marking ceremonial roles.\n> \n> 3. **Attire & accessories**: Background elements hint at garments crafted from natural materials (leather, fibers), consistent with traditional practices. Beadwork or neckpieces may carry additional symbolic weight.\n> \n> 4. **Artistic style**: The mood is contemplative and dignified. The palette and composition likely echo ancestral visual languages while potentially engaging modern artistic techniques.\n\n> This could be a deliberate fusion of heritage and contemporary expression. Questions? I'm ready to dive deeper. 
😊\n\n**User**:  \n> What's the subject's gender and approximate age?\n\n**AI**:  \n> The portrait suggests a person **past middle age**, based on:  \n> - **Facial features**: Visible lines and texture consistent with mature adulthood.  \n> - **Hair**: Long, with subtle graying or tonal shifts associated with aging.  \n> - **Demeanor**: A composed, grounded expression often linked to lived experience.\n\n> Regarding **gender**: While not definitive from visual cues alone, contextual clues (e.g., ceremonial feather use in certain Nations) make a male identification more probable—though practices vary widely across communities and eras.\n\n> More questions? Happy to refine the analysis. 😊\n---\n\n### How to get these results\n\nTo achieve this level of analysis, ensure you are using the correct launch parameters:\n1. **Model:** `Qwen3-VL-4B-Thinking-Q5_K_M.gguf`\n2. **Projector:** `mmproj-Qwen3VL-4B-Thinking-F16.gguf`\n3. **Flag:** `--reasoning-budget 1` (or higher). \n\n*Note: If the model initially states it cannot analyze images, provide a follow-up prompt about specific details. The reasoning engine will then trigger the visual analysis through the mmproj adapter.*\n\n---\n \n",
    "related_quantizations": []
  },
  "tags": [
    "gguf",
    "llama.cpp",
    "quantized",
    "qwen",
    "qwen3-vl",
    "vision",
    "reasoning",
    "Q6_K",
    "Q5_K_M",
    "Q4_K_M",
    "image-to-text",
    "ru",
    "en",
    "zh",
    "base_model:Qwen/Qwen3-VL-4B-Thinking",
    "base_model:quantized:Qwen/Qwen3-VL-4B-Thinking",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 0,
  "downloads": 180,
  "gated": false,
  "private": false,
  "last_modified": "2026-03-17T13:16:10.000Z",
  "created_at": "2026-03-16T21:46:05.000Z",
  "pipeline_tag": "image-to-text",
  "library_name": ""
}
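
The vision command described in the README above can be assembled programmatically. The following is a minimal sketch, assuming the flags behave as the README describes them (`-m`, `--mmproj`, `--image`, `--reasoning-budget`, `-ngl`, `-p`); they have not been checked here against any particular `llama.cpp` build, and `build_llama_cli_args` is a hypothetical helper name.

```python
import shlex
from typing import List


def build_llama_cli_args(
    model: str,
    mmproj: str,
    image: str,
    prompt: str,
    reasoning_budget: int = 1,
    gpu_layers: int = 27,
    binary: str = "./llama-cli",
) -> List[str]:
    """Assemble an argument list using only the flags named in the README.

    Hypothetical helper: the defaults mirror the README's example values
    and are not recommendations of their own.
    """
    return [
        binary,
        "-m", model,
        "--mmproj", mmproj,
        "--image", image,
        "--reasoning-budget", str(reasoning_budget),
        "-ngl", str(gpu_layers),
        "-p", prompt,
    ]


args = build_llama_cli_args(
    model="Qwen3-VL-4B-Thinking-Q4_K_M.gguf",
    mmproj="mmproj-Qwen3VL-4B-Thinking-F16.gguf",
    image="./path_to_your_image.jpg",
    prompt="Describe this image concisely.",
)
# shlex.join quotes the prompt so the line can be pasted into a POSIX shell.
print(shlex.join(args))
```

Keeping the arguments as a list rather than a single string means they could be passed to `subprocess.run(args)` without shell quoting concerns.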

Source payload excerpt (from Hugging Face API)

{
  "_id": "69b87a1dcd1f36bea21ae915",
  "id": "TirGun/Qwen3-VL-4B-Thinking-GGUF",
  "modelId": "TirGun/Qwen3-VL-4B-Thinking-GGUF",
  "sha": "f42bd81ed5dc70d83ca6291737db56951c0c1bb4",
  "createdAt": "2026-03-16T21:46:05.000Z",
  "lastModified": "2026-03-17T13:16:10.000Z",
  "author": "TirGun",
  "downloads": 180,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "image-to-text",
  "library_name": "",
  "siblings_count": 8
}
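
To show how the payload fields relate to the values displayed earlier on this page, here is a small sketch that parses three fields copied from the excerpt, derives the time between creation and last modification, and lowercases the `id`. The `parse_hf_timestamp` helper is hypothetical, and treating the lowercased `id` as the Model Slug is an observation from this page, not a documented rule.

```python
import json
from datetime import datetime, timezone

# Three fields copied from the payload excerpt above.
payload = json.loads("""
{
  "id": "TirGun/Qwen3-VL-4B-Thinking-GGUF",
  "createdAt": "2026-03-16T21:46:05.000Z",
  "lastModified": "2026-03-17T13:16:10.000Z"
}
""")


def parse_hf_timestamp(value: str) -> datetime:
    """Parse a timestamp shaped like those in the excerpt (UTC, milliseconds)."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


created = parse_hf_timestamp(payload["createdAt"])
modified = parse_hf_timestamp(payload["lastModified"])
elapsed = modified - created

# Lowercasing the id reproduces the slug shown under "Model Slug".
slug = payload["id"].lower()
print(slug, elapsed)
```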