RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf overview
Quantization made by Richard Erkhov. This repository contains GGUF quantizations of the model presented in Visually Descriptive Language Model for Vector Graphics Reasoning (https://huggingface.co/papers/2404.06479).

GitHub: https://github.com/RichardErkhov
Discord: https://discord.gg/pvy7H8DZMG
Request more models: https://github.com/RichardErkhov/quant_request

Project page: https://mikewangwzhl.github.io/VDLM/
Code: https://github.com/MikeWangWZHL/VDLM

PVD-160k-Mistral-7b - GGUF
- Model creator: https://huggingface.co/mikewang/
- Original model: https://huggingface.co/mikewang/PVD-160k-Mistral-7b/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| PVD-160k-Mistral-7b.Q2_K.gguf | Q2_K | 2.53 GB |
| PVD-160k-Mistral-7b.IQ3_XS.gguf | IQ3_XS | 2.81 GB |
| PVD-160k-Mistral-7b.IQ3_S.gguf | IQ3_S | 2.96 GB |
| PVD-160k-Mistral-7b.Q3_K_S.gguf | Q3_K_S | 2.95 GB |
| PVD-160k-Mistral-7b.IQ3_M.gguf | IQ3_M | 3.06 GB |
| PVD-160k-Mistral-7b.Q3_K.gguf | Q3_K | 3.28 GB |
| PVD-160k-Mistral-7b.Q3_K_M.gguf | Q3_K_M | 3.28 GB |
| PVD-160k-Mistral-7b.Q3_K_L.gguf | Q3_K_L | 3.56 GB |
| PVD-160k-Mistral-7b.IQ4_XS.gguf | IQ4_XS | 3.67 GB |
| PVD-160k-Mistral-7b.Q4_0.gguf | Q4_0 | 3.83 GB |
| PVD-160k-Mistral-7b.IQ4_NL.gguf | IQ4_NL | 3.87 GB |
| PVD-160k-Mistral-7b.Q4_K_S.gguf | Q4_K_S | 3.86 GB |
| PVD-160k-Mistral-7b.Q4_K.gguf | Q4_K | 4.07 GB |
| PVD-160k-Mistral-7b.Q4_K_M.gguf | Q4_K_M | 4.07 GB |
| PVD-160k-Mistral-7b.Q4_1.gguf | Q4_1 | 4.24 GB |
| PVD-160k-Mistral-7b.Q5_0.gguf | Q5_0 | 4.65 GB |
| PVD-160k-Mistral-7b.Q5_K_S.gguf | Q5_K_S | 4.65 GB |
| PVD-160k-Mistral-7b.Q5_K.gguf | Q5_K | 4.78 GB |
| PVD-160k-Mistral-7b.Q5_K_M.gguf | Q5_K_M | 1.7 GB |
| PVD-160k-Mistral-7b.Q5_1.gguf | Q5_1 | 5.07 GB |
| PVD-160k-Mistral-7b.Q6_K.gguf | Q6_K | 5.53 GB |
| PVD-160k-Mistral-7b.Q8_0.gguf | Q8_0 | 7.17 GB |

Original model description (license: apache-2.0; dataset: mikewang/PVD-160K):

Text-Based Reasoning About Vector Graphics

🌐 Homepage: https://mikewangwzhl.github.io/VDLM
📃 Paper: https://arxiv.org/abs/2404.06479
🤗 Data (PVD-160k): https://huggingface.co/datasets/mikewang/PVD-160K
🤗 Model (PVD-160k-Mistral-7b): https://huggingface.co/mikewang/PVD-160k-Mistral-7b
💻 Code: https://github.com/MikeWangWZHL/VDLM

We observe that current large multimodal models (LMMs) still struggle with seemingly straightforward reasoning tasks that require precise perception of low-level visual details, such as identifying spatial relations or solving simple mazes. In particular, this failure mode persists in question-answering tasks about vector graphics: images composed purely of 2D objects and shapes.

To address this challenge, we propose Visually Descriptive Language Model (VDLM), a visual reasoning framework that operates on intermediate text-based visual descriptions, namely SVG representations and a learned Primal Visual Description (PVD), which can be integrated directly into existing LLMs and LMMs. We demonstrate that VDLM outperforms state-of-the-art large multimodal models such as GPT-4V across various multimodal reasoning tasks involving vector graphics. See our paper (https://arxiv.org/abs/2404.06479) for more details.
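Each GGUF file can be fetched on its own instead of cloning the whole repository. The sketch below only builds a direct-download URL for one file name; it assumes the usual Hugging Face `https://huggingface.co/<repo_id>/resolve/<revision>/<filename>` layout, and the helper name `gguf_url` is hypothetical. Nothing is downloaded.

```python
from urllib.parse import quote

# Repository id as it appears in the Hugging Face API metadata for this repo.
REPO_ID = "RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf"


def gguf_url(filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for one GGUF file (hypothetical helper).

    Assumes the standard Hugging Face layout:
    https://huggingface.co/<repo_id>/resolve/<revision>/<filename>
    """
    if not filename.endswith(".gguf"):
        raise ValueError(f"expected a .gguf file name, got {filename!r}")
    # quote() leaves letters, digits, '_', '.', '-' untouched, so these names pass through as-is.
    return f"https://huggingface.co/{REPO_ID}/resolve/{quote(revision)}/{quote(filename)}"


if __name__ == "__main__":
    print(gguf_url("PVD-160k-Mistral-7b.Q4_K_M.gguf"))
```

If the `huggingface_hub` package is installed, `hf_hub_download(repo_id=REPO_ID, filename="PVD-160k-Mistral-7b.Q4_K_M.gguf")` downloads and caches the same file and returns its local path, which can then be handed to a GGUF runtime such as llama.cpp.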
Repository Files & Downloads
| File | Type | Quantization | Size |
|---|---|---|---|
| PVD-160k-Mistral-7b.IQ3_M.gguf | GGUF | IQ3_M | 3.06 GB |
| PVD-160k-Mistral-7b.IQ3_S.gguf | GGUF | IQ3_S | 2.96 GB |
| PVD-160k-Mistral-7b.IQ3_XS.gguf | GGUF | IQ3_XS | 2.81 GB |
| PVD-160k-Mistral-7b.IQ4_NL.gguf | GGUF | IQ4_NL | 3.87 GB |
| PVD-160k-Mistral-7b.IQ4_XS.gguf | GGUF | IQ4_XS | 3.67 GB |
| PVD-160k-Mistral-7b.Q2_K.gguf | GGUF | Q2_K | 2.53 GB |
| PVD-160k-Mistral-7b.Q3_K.gguf | GGUF | Q3_K | 3.28 GB |
| PVD-160k-Mistral-7b.Q3_K_L.gguf | GGUF | Q3_K_L | 3.56 GB |
| PVD-160k-Mistral-7b.Q3_K_M.gguf | GGUF | Q3_K_M | 3.28 GB |
| PVD-160k-Mistral-7b.Q3_K_S.gguf | GGUF | Q3_K_S | 2.95 GB |
| PVD-160k-Mistral-7b.Q4_0.gguf | GGUF | Q4_0 | 3.83 GB |
| PVD-160k-Mistral-7b.Q4_1.gguf | GGUF | Q4_1 | 4.24 GB |
| PVD-160k-Mistral-7b.Q4_K.gguf | GGUF | Q4_K | 4.07 GB |
| PVD-160k-Mistral-7b.Q4_K_M.gguf | GGUF | Q4_K_M | 4.07 GB |
| PVD-160k-Mistral-7b.Q4_K_S.gguf | GGUF | Q4_K_S | 3.86 GB |
| PVD-160k-Mistral-7b.Q5_0.gguf | GGUF | Q5_0 | 4.65 GB |
| PVD-160k-Mistral-7b.Q5_1.gguf | GGUF | Q5_1 | 5.07 GB |
| PVD-160k-Mistral-7b.Q5_K.gguf | GGUF | Q5_K | 4.78 GB |
| PVD-160k-Mistral-7b.Q5_K_M.gguf | GGUF | Q5_K_M | 1.70 GB |
| PVD-160k-Mistral-7b.Q5_K_S.gguf | GGUF | Q5_K_S | 4.65 GB |
| PVD-160k-Mistral-7b.Q6_K.gguf | GGUF | Q6_K | 5.53 GB |
| PVD-160k-Mistral-7b.Q8_0.gguf | GGUF | Q8_0 | 7.17 GB |
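To choose among these files, one rough rule of thumb is to take the largest file that still fits in available memory with some room to spare. The sketch below assumes that rule plus a hypothetical 1 GB headroom for the KV cache and runtime overhead; real requirements depend on context length and the runtime used, so treat the result as a starting point. It also reads the quantization label out of each file name, which is how the Quantization column maps to the files. Because Q5_K_M is listed at 1.70 GB, far below Q5_K_S (4.65 GB) and Q5_K (4.78 GB), the sketch skips it by default until that size is verified.

```python
from typing import Optional

# Sizes in GB as listed in the "Repository Files & Downloads" table above.
SIZES_GB = {
    "Q2_K": 2.53, "IQ3_XS": 2.81, "Q3_K_S": 2.95, "IQ3_S": 2.96,
    "IQ3_M": 3.06, "Q3_K": 3.28, "Q3_K_M": 3.28, "Q3_K_L": 3.56,
    "IQ4_XS": 3.67, "Q4_0": 3.83, "Q4_K_S": 3.86, "IQ4_NL": 3.87,
    "Q4_K": 4.07, "Q4_K_M": 4.07, "Q4_1": 4.24, "Q5_0": 4.65,
    "Q5_K_S": 4.65, "Q5_K": 4.78, "Q5_K_M": 1.70, "Q5_1": 5.07,
    "Q6_K": 5.53, "Q8_0": 7.17,
}

# Listed size is inconsistent with the neighboring Q5 files, so skip it by default.
SUSPECT = frozenset({"Q5_K_M"})

PREFIX = "PVD-160k-Mistral-7b"


def quant_of(filename: str) -> str:
    """Extract the quantization label, e.g. 'Q4_K_M' from 'PVD-160k-Mistral-7b.Q4_K_M.gguf'."""
    stem = filename.removesuffix(".gguf")
    return stem.rsplit(".", 1)[1]


def pick_quant(budget_gb: float, headroom_gb: float = 1.0,
               skip: frozenset = SUSPECT) -> Optional[str]:
    """Return the file name of the largest quant that fits in budget_gb - headroom_gb.

    headroom_gb is an assumed allowance for KV cache and runtime overhead.
    Returns None when no file fits.
    """
    usable = budget_gb - headroom_gb
    fitting = {q: s for q, s in SIZES_GB.items() if q not in skip and s <= usable}
    if not fitting:
        return None
    best = max(fitting, key=fitting.get)
    return f"{PREFIX}.{best}.gguf"


if __name__ == "__main__":
    print(pick_quant(8.0))  # → PVD-160k-Mistral-7b.Q6_K.gguf
```

Picking by size alone is a deliberate simplification: files of similar size (for example Q4_K_S and IQ4_NL) can differ in speed and quality on a given runtime, so a quick evaluation on the target task is still worthwhile.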
Model Details
Normalized metadata
{
"metadata": {},
"card_data": {
"license": "apache-2.0",
"library_name": "transformers",
"pipeline_tag": "image-to-text",
"frontmatter": {
"license": "apache-2.0",
"library_name": "transformers",
"pipeline_tag": "image-to-text"
},
"hero_image_url": "https://github.com/MikeWangWZHL/VDLM/blob/main/figures/teaser.png?raw=true",
"quick_links": [],
"benchmark_table_html": "",
"readme_markdown": "---\nlicense: apache-2.0\nlibrary_name: transformers\npipeline_tag: image-to-text\n---\n\nQuantization made by Richard Erkhov.\n\nThis repository contains a quantized version of the model presented in [Visually Descriptive Language Model for Vector Graphics Reasoning](https://huggingface.co/papers/2404.06479).\n\n[Github](https://github.com/RichardErkhov)\n\n[Discord](https://discord.gg/pvy7H8DZMG)\n\n[Request more models](https://github.com/RichardErkhov/quant_request)\n\nProject page: https://mikewangwzhl.github.io/VDLM/\nCode: https://github.com/MikeWangWZHL/VDLM\n\nPVD-160k-Mistral-7b - GGUF\n- Model creator: https://huggingface.co/mikewang/\n- Original model: https://huggingface.co/mikewang/PVD-160k-Mistral-7b/\n\n| Name | Quant method | Size |\n| ---- | ---- | ---- |\n| [PVD-160k-Mistral-7b.Q2_K.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q2_K.gguf) | Q2_K | 2.53GB |\n| [PVD-160k-Mistral-7b.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.IQ3_XS.gguf) | IQ3_XS | 2.81GB |\n| [PVD-160k-Mistral-7b.IQ3_S.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.IQ3_S.gguf) | IQ3_S | 2.96GB |\n| [PVD-160k-Mistral-7b.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q3_K_S.gguf) | Q3_K_S | 2.95GB |\n| [PVD-160k-Mistral-7b.IQ3_M.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.IQ3_M.gguf) | IQ3_M | 3.06GB |\n| [PVD-160k-Mistral-7b.Q3_K.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q3_K.gguf) | Q3_K | 3.28GB |\n| [PVD-160k-Mistral-7b.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q3_K_M.gguf) | Q3_K_M | 3.28GB |\n| 
[PVD-160k-Mistral-7b.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q3_K_L.gguf) | Q3_K_L | 3.56GB |\n| [PVD-160k-Mistral-7b.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.IQ4_XS.gguf) | IQ4_XS | 3.67GB |\n| [PVD-160k-Mistral-7b.Q4_0.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q4_0.gguf) | Q4_0 | 3.83GB |\n| [PVD-160k-Mistral-7b.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.IQ4_NL.gguf) | IQ4_NL | 3.87GB |\n| [PVD-160k-Mistral-7b.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q4_K_S.gguf) | Q4_K_S | 3.86GB |\n| [PVD-160k-Mistral-7b.Q4_K.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q4_K.gguf) | Q4_K | 4.07GB |\n| [PVD-160k-Mistral-7b.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q4_K_M.gguf) | Q4_K_M | 4.07GB |\n| [PVD-160k-Mistral-7b.Q4_1.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q4_1.gguf) | Q4_1 | 4.24GB |\n| [PVD-160k-Mistral-7b.Q5_0.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q5_0.gguf) | Q5_0 | 4.65GB |\n| [PVD-160k-Mistral-7b.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q5_K_S.gguf) | Q5_K_S | 4.65GB |\n| [PVD-160k-Mistral-7b.Q5_K.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q5_K.gguf) | Q5_K | 4.78GB |\n| 
[PVD-160k-Mistral-7b.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q5_K_M.gguf) | Q5_K_M | 1.7GB |\n| [PVD-160k-Mistral-7b.Q5_1.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q5_1.gguf) | Q5_1 | 5.07GB |\n| [PVD-160k-Mistral-7b.Q6_K.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q6_K.gguf) | Q6_K | 5.53GB |\n| [PVD-160k-Mistral-7b.Q8_0.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q8_0.gguf) | Q8_0 | 7.17GB |\n\nOriginal model description:\n---\nlicense: apache-2.0\ndatasets:\n- mikewang/PVD-160K\n---\n\n<h1 align=\"center\"> Text-Based Reasoning About Vector Graphics </h1>\n\n<p align=\"center\">\n<a href=\"https://mikewangwzhl.github.io/VDLM\">🌐 Homepage</a>\n•\n<a href=\"https://arxiv.org/abs/2404.06479\">📃 Paper</a>\n•\n<a href=\"https://huggingface.co/datasets/mikewang/PVD-160K\" >🤗 Data (PVD-160k)</a>\n•\n<a href=\"https://huggingface.co/mikewang/PVD-160k-Mistral-7b\" >🤗 Model (PVD-160k-Mistral-7b)</a>\n•\n<a href=\"https://github.com/MikeWangWZHL/VDLM\" >💻 Code</a>\n\n</p>\n\nWe observe that current *large multimodal models (LMMs)* still struggle with seemingly straightforward reasoning tasks that require precise perception of low-level visual details, such as identifying spatial relations or solving simple mazes. In particular, this failure mode persists in question-answering tasks about vector graphics—images composed purely of 2D objects and shapes.\n\n\n\nTo solve this challenge, we propose **Visually Descriptive Language Model (VDLM)**, a visual reasoning framework that operates with intermediate text-based visual descriptions—SVG representations and learned Primal Visual Description, which can be directly integrated into existing LLMs and LMMs. 
We demonstrate that VDLM outperforms state-of-the-art large multimodal models, such as GPT-4V, across various multimodal reasoning tasks involving vector graphics. See our [paper](https://arxiv.org/abs/2404.06479) for more details.\n",
"related_quantizations": []
},
"tags": [
"transformers",
"gguf",
"image-to-text",
"arxiv:2404.06479",
"license:apache-2.0",
"endpoints_compatible",
"region:us",
"conversational"
],
"likes": 0,
"downloads": 209,
"gated": false,
"private": false,
"last_modified": "2025-06-16T18:43:22.000Z",
"created_at": "2024-10-10T15:52:37.000Z",
"pipeline_tag": "image-to-text",
"library_name": "transformers"
}
Source payload excerpt (from Hugging Face API)
{
"_id": "6707f84550e71469e19d1802",
"id": "RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf",
"modelId": "RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf",
"sha": "ae1e18d957f47a565809bb3461089c1463b177ef",
"createdAt": "2024-10-10T15:52:37.000Z",
"lastModified": "2025-06-16T18:43:22.000Z",
"author": "RichardErkhov",
"downloads": 209,
"likes": 0,
"gated": false,
"private": false,
"pipeline_tag": "image-to-text",
"library_name": "transformers",
"siblings_count": 24
}
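The two JSON blocks above carry overlapping facts under different key styles: the Hugging Face API excerpt uses camelCase (`createdAt`, `lastModified`), while the normalized metadata uses snake_case (`created_at`, `last_modified`). The page does not show how that normalization is done; the sketch below is a hypothetical mapping over the fields visible here, not the site's actual code, and the names `snake_case`, `normalize`, and `KEPT` are illustrative.

```python
import json
import re

# Subset of the "Source payload excerpt" shown above.
PAYLOAD = json.loads("""{
  "id": "RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf",
  "createdAt": "2024-10-10T15:52:37.000Z",
  "lastModified": "2025-06-16T18:43:22.000Z",
  "downloads": 209,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "image-to-text",
  "library_name": "transformers"
}""")

# Top-level fields that appear in both blocks, using the normalized (snake_case) names.
KEPT = {"likes", "downloads", "gated", "private", "last_modified",
        "created_at", "pipeline_tag", "library_name"}


def snake_case(key: str) -> str:
    """Convert camelCase to snake_case; keys already in snake_case pass through unchanged."""
    # Insert '_' before every uppercase letter that is not at the start, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()


def normalize(payload: dict) -> dict:
    """Rename API keys to snake_case and keep only the fields listed in KEPT."""
    renamed = {snake_case(k): v for k, v in payload.items()}
    return {k: v for k, v in renamed.items() if k in KEPT}


if __name__ == "__main__":
    print(json.dumps(normalize(PAYLOAD), indent=2))
```

Comparing `normalize(PAYLOAD)` with the normalized block confirms, for example, that both report 209 downloads and the same creation and modification timestamps.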