Model Intelligence Sheet

richarderkhov/mikewang_-_pvd-160k-mistral-7b-gguf overview

Quantization made by Richard Erkhov.

This repository contains a quantized version of the model presented in Visually Descriptive Language Model for Vector Graphics Reasoning (https://huggingface.co/papers/2404.06479).

Github: https://github.com/RichardErkhov
Discord: https://discord.gg/pvy7H8DZMG
Request more models: https://github.com/RichardErkhov/quant_request

Project page: https://mikewangwzhl.github.io/VDLM/
Code: https://github.com/MikeWangWZHL/VDLM

PVD-160k-Mistral-7b - GGUF
Model creator: https://huggingface.co/mikewang/
Original model: https://huggingface.co/mikewang/PVD-160k-Mistral-7b/

Name	Quant method	Size
PVD-160k-Mistral-7b.Q2_K.gguf	Q2_K	2.53 GB
PVD-160k-Mistral-7b.IQ3_XS.gguf	IQ3_XS	2.81 GB
PVD-160k-Mistral-7b.IQ3_S.gguf	IQ3_S	2.96 GB
PVD-160k-Mistral-7b.Q3_K_S.gguf	Q3_K_S	2.95 GB
PVD-160k-Mistral-7b.IQ3_M.gguf	IQ3_M	3.06 GB
PVD-160k-Mistral-7b.Q3_K.gguf	Q3_K	3.28 GB
PVD-160k-Mistral-7b.Q3_K_M.gguf	Q3_K_M	3.28 GB
PVD-160k-Mistral-7b.Q3_K_L.gguf	Q3_K_L	3.56 GB
PVD-160k-Mistral-7b.IQ4_XS.gguf	IQ4_XS	3.67 GB
PVD-160k-Mistral-7b.Q4_0.gguf	Q4_0	3.83 GB
PVD-160k-Mistral-7b.IQ4_NL.gguf	IQ4_NL	3.87 GB
PVD-160k-Mistral-7b.Q4_K_S.gguf	Q4_K_S	3.86 GB
PVD-160k-Mistral-7b.Q4_K.gguf	Q4_K	4.07 GB
PVD-160k-Mistral-7b.Q4_K_M.gguf	Q4_K_M	4.07 GB
PVD-160k-Mistral-7b.Q4_1.gguf	Q4_1	4.24 GB
PVD-160k-Mistral-7b.Q5_0.gguf	Q5_0	4.65 GB
PVD-160k-Mistral-7b.Q5_K_S.gguf	Q5_K_S	4.65 GB
PVD-160k-Mistral-7b.Q5_K.gguf	Q5_K	4.78 GB
PVD-160k-Mistral-7b.Q5_K_M.gguf	Q5_K_M	1.7 GB
PVD-160k-Mistral-7b.Q5_1.gguf	Q5_1	5.07 GB
PVD-160k-Mistral-7b.Q6_K.gguf	Q6_K	5.53 GB
PVD-160k-Mistral-7b.Q8_0.gguf	Q8_0	7.17 GB

Original model description:

License: apache-2.0
Datasets: mikewang/PVD-160K

Text-Based Reasoning About Vector Graphics

🌐 Homepage: https://mikewangwzhl.github.io/VDLM
📃 Paper: https://arxiv.org/abs/2404.06479
🤗 Data (PVD-160k): https://huggingface.co/datasets/mikewang/PVD-160K
🤗 Model (PVD-160k-Mistral-7b): https://huggingface.co/mikewang/PVD-160k-Mistral-7b
💻 Code: https://github.com/MikeWangWZHL/VDLM

We observe that current large multimodal models (LMMs) still struggle with seemingly straightforward reasoning tasks that require precise perception of low-level visual details, such as identifying spatial relations or solving simple mazes. In particular, this failure mode persists in question-answering tasks about vector graphics, that is, images composed purely of 2D objects and shapes.

[Figure: Teaser]

To solve this challenge, we propose Visually Descriptive Language Model (VDLM), a visual reasoning framework that operates with intermediate text-based visual descriptions (SVG representations and learned Primal Visual Description), which can be directly integrated into existing LLMs and LMMs. We demonstrate that VDLM outperforms state-of-the-art large multimodal models, such as GPT-4V, across various multimodal reasoning tasks involving vector graphics. See our paper for more details.

[Figure: Overview]

Tags: transformers, gguf, image-to-text, arxiv:2404.06479, license:apache-2.0, endpoints_compatible, region:us, conversational

richarderkhov/mikewang_-_pvd-160k-mistral-7b-gguf visual

Downloads

209

Likes

0

Pipeline

image-to-text

Library

transformers

Visibility

Public

Access

Open

Repository Files & Downloads

22 files detected

Direct downloads for all repository files

File	Type	Quantization	Size	Link
PVD-160k-Mistral-7b.IQ3_M.gguf	GGUF	IQ3_M	3.06 GB	Download
PVD-160k-Mistral-7b.IQ3_S.gguf	GGUF	IQ3_S	2.96 GB	Download
PVD-160k-Mistral-7b.IQ3_XS.gguf	GGUF	IQ3_XS	2.81 GB	Download
PVD-160k-Mistral-7b.IQ4_NL.gguf	GGUF	IQ4_NL	3.87 GB	Download
PVD-160k-Mistral-7b.IQ4_XS.gguf	GGUF	IQ4_XS	3.67 GB	Download
PVD-160k-Mistral-7b.Q2_K.gguf	GGUF	Q2_K	2.53 GB	Download
PVD-160k-Mistral-7b.Q3_K.gguf	GGUF	Q3_K	3.28 GB	Download
PVD-160k-Mistral-7b.Q3_K_L.gguf	GGUF	Q3_K_L	3.56 GB	Download
PVD-160k-Mistral-7b.Q3_K_M.gguf	GGUF	Q3_K_M	3.28 GB	Download
PVD-160k-Mistral-7b.Q3_K_S.gguf	GGUF	Q3_K_S	2.95 GB	Download
PVD-160k-Mistral-7b.Q4_0.gguf	GGUF	Q4_0	3.83 GB	Download
PVD-160k-Mistral-7b.Q4_1.gguf	GGUF	Q4_1	4.24 GB	Download
PVD-160k-Mistral-7b.Q4_K.gguf	GGUF	Q4_K	4.07 GB	Download
PVD-160k-Mistral-7b.Q4_K_M.gguf	GGUF	Q4_K_M	4.07 GB	Download
PVD-160k-Mistral-7b.Q4_K_S.gguf	GGUF	Q4_K_S	3.86 GB	Download
PVD-160k-Mistral-7b.Q5_0.gguf	GGUF	Q5_0	4.65 GB	Download
PVD-160k-Mistral-7b.Q5_1.gguf	GGUF	Q5_1	5.07 GB	Download
PVD-160k-Mistral-7b.Q5_K.gguf	GGUF	Q5_K	4.78 GB	Download
PVD-160k-Mistral-7b.Q5_K_M.gguf	GGUF	Q5_K_M	1.70 GB	Download
PVD-160k-Mistral-7b.Q5_K_S.gguf	GGUF	Q5_K_S	4.65 GB	Download
PVD-160k-Mistral-7b.Q6_K.gguf	GGUF	Q6_K	5.53 GB	Download
PVD-160k-Mistral-7b.Q8_0.gguf	GGUF	Q8_0	7.17 GB	Download
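
The rows above can be turned into direct-download links with a few lines of Python. This is a minimal sketch, not the site's own mechanism: it assumes the usual Hugging Face `resolve/<revision>/<filename>` URL pattern and takes the repository id from the source payload. The helper names (`gguf_filename`, `gguf_download_url`, `largest_fitting`) and the `SIZES_GB` subset are hypothetical and exist only for illustration.

```python
from urllib.parse import quote

# Repository id as it appears in the source payload of this sheet.
REPO_ID = "RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf"

# A few (quantization -> file size in GB) rows copied from the table above.
SIZES_GB = {"Q2_K": 2.53, "Q4_K_M": 4.07, "Q6_K": 5.53, "Q8_0": 7.17}


def gguf_filename(quant: str) -> str:
    """Build the file name used in this repository for a quantization label."""
    return f"PVD-160k-Mistral-7b.{quant}.gguf"


def gguf_download_url(quant: str, repo_id: str = REPO_ID, revision: str = "main") -> str:
    """Assumed direct-download URL (resolve/<revision>/<file>) for one file."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(gguf_filename(quant))}"


def largest_fitting(budget_gb: float, sizes: dict = SIZES_GB):
    """Return the largest listed quantization whose file fits the budget, or None."""
    fitting = {q: s for q, s in sizes.items() if s <= budget_gb}
    return max(fitting, key=fitting.get) if fitting else None


if __name__ == "__main__":
    choice = largest_fitting(5.0)
    print(choice, gguf_download_url(choice))
```

File size is only a rough lower bound on the memory needed at inference time, so the budget passed to `largest_fitting` should leave headroom for context and runtime overhead.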

Model Details Live

Model Slug

richarderkhov/mikewang_-_pvd-160k-mistral-7b-gguf

Author

RichardErkhov

Pipeline Task

image-to-text

Library

transformers

Created

2024-10-10

Last Modified

2025-06-16

Gated

No

Private

No

HF SHA

ae1e18d957f47a565809bb3461089c1463b177ef

License

apache-2.0

Language

Unknown

Base Model

Unknown

Metadata Inspector

Normalized metadata (stored in metadata_json)

{
  "metadata": {},
  "card_data": {
    "license": "apache-2.0",
    "library_name": "transformers",
    "pipeline_tag": "image-to-text",
    "frontmatter": {
      "license": "apache-2.0",
      "library_name": "transformers",
      "pipeline_tag": "image-to-text"
    },
    "hero_image_url": "https://github.com/MikeWangWZHL/VDLM/blob/main/figures/teaser.png?raw=true",
    "summary": "Quantization made by Richard Erkhov. This repository contains a quantized version of the model presented in Visually Descriptive Language Model for Vector Graphics Reasoning. Github Discord Request more models Project page: https://mikewangwzhl.github.io/VDLM/ Code: https://github.com/MikeWangWZHL/VDLM PVD-160k-Mistral-7b - GGUF | Name | Quant method | Size | | ---- | ---- | ---- | | PVD-160k-Mistral-7b.Q2_K.gguf | Q2_K | 2.53GB | | PVD-160k-Mistral-7b.IQ3_XS.gguf | IQ3_XS | 2.81GB | | PVD-160k-Mistral-7b.IQ3_S.gguf | IQ3_S | 2.96GB | | PVD-160k-Mistral-7b.Q3_K_S.gguf | Q3_K_S | 2.95GB | | PVD-160k-Mistral-7b.IQ3_M.gguf | IQ3_M | 3.06GB | | PVD-160k-Mistral-7b.Q3_K.gguf | Q3_K | 3.28GB | | PVD-160k-Mistral-7b.Q3_K_M.gguf | Q3_K_M | 3.28GB | | PVD-160k-Mistral-7b.Q3_K_L.gguf | Q3_K_L | 3.56GB | | PVD-160k-Mistral-7b.IQ4_XS.gguf | IQ4_XS | 3.67GB | | PVD-160k-Mistral-7b.Q4_0.gguf | Q4_0 | 3.83GB | | PVD-160k-Mistral-7b.IQ4_NL.gguf | IQ4_NL | 3.87GB | | PVD-160k-Mistral-7b.Q4_K_S.gguf | Q4_K_S | 3.86GB | | PVD-160k-Mistral-7b.Q4_K.gguf | Q4_K | 4.07GB | | PVD-160k-Mistral-7b.Q4_K_M.gguf | Q4_K_M | 4.07GB | | PVD-160k-Mistral-7b.Q4_1.gguf | Q4_1 | 4.24GB | | PVD-160k-Mistral-7b.Q5_0.gguf | Q5_0 | 4.65GB | | PVD-160k-Mistral-7b.Q5_K_S.gguf | Q5_K_S | 4.65GB | | PVD-160k-Mistral-7b.Q5_K.gguf | Q5_K | 4.78GB | | PVD-160k-Mistral-7b.Q5_K_M.gguf | Q5_K_M | 1.7GB | | PVD-160k-Mistral-7b.Q5_1.gguf | Q5_1 | 5.07GB | | PVD-160k-Mistral-7b.Q6_K.gguf | Q6_K | 5.53GB | | PVD-160k-Mistral-7b.Q8_0.gguf | Q8_0 | 7.17GB | Original model description: --- license: apache-2.0 datasets: ---  Text-Based Reasoning About Vector Graphics   🌐 Homepage • 📃 Paper • 🤗 Data (PVD-160k) • 🤗 Model (PVD-160k-Mistral-7b) • 💻 Code  We observe that current *large multimodal models (LMMs)* still struggle with seemingly straightforward reasoning tasks that require precise perception of low-level visual details, such as identifying spatial relations or solving simple mazes. 
In particular, this failure mode persists in question-answering tasks about vector graphics—images composed purely of 2D objects and shapes. !Teaser To solve this challenge, we propose **Visually Descriptive Language Model (VDLM)**, a visual reasoning framework that operates with intermediate text-based visual descriptions—SVG representations and learned Primal Visual Description, which can be directly integrated into existing LLMs and LMMs. We demonstrate that VDLM outperforms state-of-the-art large multimodal models, such as GPT-4V, across various multimodal reasoning tasks involving vector graphics. See our paper for more details. !Overview",
    "quick_links": [],
    "benchmark_table_html": "",
    "readme_markdown": "---\nlicense: apache-2.0\nlibrary_name: transformers\npipeline_tag: image-to-text\n---\n\nQuantization made by Richard Erkhov.\n\nThis repository contains a quantized version of the model presented in [Visually Descriptive Language Model for Vector Graphics Reasoning](https://huggingface.co/papers/2404.06479).\n\n[Github](https://github.com/RichardErkhov)\n\n[Discord](https://discord.gg/pvy7H8DZMG)\n\n[Request more models](https://github.com/RichardErkhov/quant_request)\n\nProject page: https://mikewangwzhl.github.io/VDLM/\nCode: https://github.com/MikeWangWZHL/VDLM\n\nPVD-160k-Mistral-7b - GGUF\n- Model creator: https://huggingface.co/mikewang/\n- Original model: https://huggingface.co/mikewang/PVD-160k-Mistral-7b/\n\n| Name | Quant method | Size |\n| ---- | ---- | ---- |\n| [PVD-160k-Mistral-7b.Q2_K.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q2_K.gguf) | Q2_K | 2.53GB |\n| [PVD-160k-Mistral-7b.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.IQ3_XS.gguf) | IQ3_XS | 2.81GB |\n| [PVD-160k-Mistral-7b.IQ3_S.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.IQ3_S.gguf) | IQ3_S | 2.96GB |\n| [PVD-160k-Mistral-7b.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q3_K_S.gguf) | Q3_K_S | 2.95GB |\n| [PVD-160k-Mistral-7b.IQ3_M.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.IQ3_M.gguf) | IQ3_M | 3.06GB |\n| [PVD-160k-Mistral-7b.Q3_K.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q3_K.gguf) | Q3_K | 3.28GB |\n| [PVD-160k-Mistral-7b.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q3_K_M.gguf) | Q3_K_M | 3.28GB 
|\n| [PVD-160k-Mistral-7b.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q3_K_L.gguf) | Q3_K_L | 3.56GB |\n| [PVD-160k-Mistral-7b.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.IQ4_XS.gguf) | IQ4_XS | 3.67GB |\n| [PVD-160k-Mistral-7b.Q4_0.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q4_0.gguf) | Q4_0 | 3.83GB |\n| [PVD-160k-Mistral-7b.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.IQ4_NL.gguf) | IQ4_NL | 3.87GB |\n| [PVD-160k-Mistral-7b.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q4_K_S.gguf) | Q4_K_S | 3.86GB |\n| [PVD-160k-Mistral-7b.Q4_K.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q4_K.gguf) | Q4_K | 4.07GB |\n| [PVD-160k-Mistral-7b.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q4_K_M.gguf) | Q4_K_M | 4.07GB |\n| [PVD-160k-Mistral-7b.Q4_1.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q4_1.gguf) | Q4_1 | 4.24GB |\n| [PVD-160k-Mistral-7b.Q5_0.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q5_0.gguf) | Q5_0 | 4.65GB |\n| [PVD-160k-Mistral-7b.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q5_K_S.gguf) | Q5_K_S | 4.65GB |\n| [PVD-160k-Mistral-7b.Q5_K.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q5_K.gguf) | Q5_K | 4.78GB |\n| 
[PVD-160k-Mistral-7b.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q5_K_M.gguf) | Q5_K_M | 1.7GB |\n| [PVD-160k-Mistral-7b.Q5_1.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q5_1.gguf) | Q5_1 | 5.07GB |\n| [PVD-160k-Mistral-7b.Q6_K.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q6_K.gguf) | Q6_K | 5.53GB |\n| [PVD-160k-Mistral-7b.Q8_0.gguf](https://huggingface.co/RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf/blob/main/PVD-160k-Mistral-7b.Q8_0.gguf) | Q8_0 | 7.17GB |\n\nOriginal model description:\n---\nlicense: apache-2.0\ndatasets:\n- mikewang/PVD-160K\n---\n\n<h1 align=\"center\"> Text-Based Reasoning About Vector Graphics </h1>\n\n<p align=\"center\">\n<a href=\"https://mikewangwzhl.github.io/VDLM\">🌐 Homepage</a>\n•\n<a href=\"https://arxiv.org/abs/2404.06479\">📃 Paper</a>\n•\n<a href=\"https://huggingface.co/datasets/mikewang/PVD-160K\" >🤗 Data (PVD-160k)</a>\n•\n<a href=\"https://huggingface.co/mikewang/PVD-160k-Mistral-7b\" >🤗 Model (PVD-160k-Mistral-7b)</a>\n•\n<a href=\"https://github.com/MikeWangWZHL/VDLM\" >💻 Code</a>\n\n</p>\n\nWe observe that current *large multimodal models (LMMs)* still struggle with seemingly straightforward reasoning tasks that require precise perception of low-level visual details, such as identifying spatial relations or solving simple mazes. 
In particular, this failure mode persists in question-answering tasks about vector graphics—images composed purely of 2D objects and shapes.\n\n![Teaser](https://github.com/MikeWangWZHL/VDLM/blob/main/figures/teaser.png?raw=true)\n\nTo solve this challenge, we propose **Visually Descriptive Language Model (VDLM)**, a visual reasoning framework that operates with intermediate text-based visual descriptions—SVG representations and learned Primal Visual Description, which can be directly integrated into existing LLMs and LMMs. We demonstrate that VDLM outperforms state-of-the-art large multimodal models, such as GPT-4V, across various multimodal reasoning tasks involving vector graphics. See our [paper](https://arxiv.org/abs/2404.06479) for more details.\n![Overview](https://github.com/MikeWangWZHL/VDLM/blob/main/figures/overview.png?raw=true)",
    "related_quantizations": []
  },
  "tags": [
    "transformers",
    "gguf",
    "image-to-text",
    "arxiv:2404.06479",
    "license:apache-2.0",
    "endpoints_compatible",
    "region:us",
    "conversational"
  ],
  "likes": 0,
  "downloads": 209,
  "gated": false,
  "private": false,
  "last_modified": "2025-06-16T18:43:22.000Z",
  "created_at": "2024-10-10T15:52:37.000Z",
  "pipeline_tag": "image-to-text",
  "library_name": "transformers"
}
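
Several tags in the normalized metadata carry a `key:value` payload (for example `arxiv:2404.06479`), while others are plain labels. A minimal sketch of separating the two follows; the function name `split_tags` is hypothetical, and the inline record is a trimmed copy of the fields shown above rather than the full stored object.

```python
import json

# Trimmed copy of the normalized metadata shown above (tags and counters only).
RECORD_JSON = """
{
  "tags": ["transformers", "gguf", "image-to-text", "arxiv:2404.06479",
           "license:apache-2.0", "endpoints_compatible", "region:us",
           "conversational"],
  "likes": 0,
  "downloads": 209
}
"""


def split_tags(tags):
    """Separate plain tags from key:value tags (split on the first colon)."""
    plain, keyed = [], {}
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep:
            keyed[key] = value
        else:
            plain.append(tag)
    return plain, keyed


record = json.loads(RECORD_JSON)
plain_tags, keyed_tags = split_tags(record["tags"])

if __name__ == "__main__":
    print(plain_tags)
    print(keyed_tags)
```

With this split, `keyed_tags["arxiv"]` gives the paper identifier and `keyed_tags["license"]` the license string, which can be cross-checked against the `card_data` fields.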

Source payload excerpt (from Hugging Face API)

{
  "_id": "6707f84550e71469e19d1802",
  "id": "RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf",
  "modelId": "RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf",
  "sha": "ae1e18d957f47a565809bb3461089c1463b177ef",
  "createdAt": "2024-10-10T15:52:37.000Z",
  "lastModified": "2025-06-16T18:43:22.000Z",
  "author": "RichardErkhov",
  "downloads": 209,
  "likes": 0,
  "gated": false,
  "private": false,
  "pipeline_tag": "image-to-text",
  "library_name": "transformers",
  "siblings_count": 24
}
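
The timestamps in the payload are ISO 8601 strings in UTC with millisecond precision. The sketch below parses them and derives the "Created" and "Last Modified" dates shown under Model Details. The helper name `parse_hf_timestamp` is hypothetical, and the API URL is built on the assumption that the payload came from the public `https://huggingface.co/api/models/<id>` endpoint; no request is made here.

```python
from datetime import datetime, timezone

# Fields copied from the payload excerpt above.
payload = {
    "id": "RichardErkhov/mikewang_-_PVD-160k-Mistral-7b-gguf",
    "createdAt": "2024-10-10T15:52:37.000Z",
    "lastModified": "2025-06-16T18:43:22.000Z",
}


def parse_hf_timestamp(value: str) -> datetime:
    """Parse a 'YYYY-MM-DDTHH:MM:SS.mmmZ' string into a timezone-aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


created = parse_hf_timestamp(payload["createdAt"])
modified = parse_hf_timestamp(payload["lastModified"])

# Whole days between creation and the last modification.
days_between = (modified - created).days

# Assumed endpoint for re-fetching this payload (not called in this sketch).
api_url = f"https://huggingface.co/api/models/{payload['id']}"

if __name__ == "__main__":
    print(created.date(), modified.date(), days_between)
    print(api_url)
```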