What is morikomorizz/Step-3.7-Flash-MTP-GGUF?

--- license: apache-2.0 base_model: - stepfun-ai/Step-3.7-Flash pipeline_tag: image-text-to-text language: - en tags: - vision-language - multimodal - moe --- ## Overview This repository contains the **GGUF** quantized files for **[stepfun-ai/Step-3.7-Flash](https://huggingface.co/stepfun-ai/Step-3.7-Flash)**. - **Original Model:** [stepfun-ai/Step-3.7-Flash](https://huggingface.co/stepfun-ai/Step-3.7-Flash) - **Architecture:** Step-3.7-Flash - **License:** Apache 2.0 - **MTP Support:** Yes - From Base Model | Quant Type | Size | Description | | :--- | :--- | :--- | | **IQ3_S** | 85-91 GB | Mixed Precision for Better Quality | | **IQ3_M** | 96-103 GB | Mixed Precision for Better Quality | | **IQ4_XS** | 109-117 GB | Mixed Precision for Better Quality | ---- **[ModelPage]**: https://static.stepfun.com/blog/step-3.7-flash/ ## 1. Introduction Step 3.7 Flash is a 198B-parameter sparse Mixtu…

What license applies to morikomorizz/Step-3.7-Flash-MTP-GGUF?

License: apache-2.0. Verify terms on Hugging Face before commercial use.

How do I run morikomorizz/Step-3.7-Flash-MTP-GGUF locally?

Download a GGUF file from this page and load it in guIDE or llama.cpp. Pipeline task: image-text-to-text.

Model Intelligence Sheet

morikomorizz/Step-3.7-Flash-MTP-GGUF overview

Overview This repository contains the GGUF quantized files for stepfun ai/Step 3.7 Flash https://huggingface.co/stepfun ai/Step 3.7 Flash . Original Model: ste…

ggufvision-languagemultimodalmoeimage-text-to-textenbase_model:stepfun-ai/Step-3.7-Flashbase_model:quantized:stepfun-ai/Step-3.7-Flashlicense:apache-2.0endpoints_compatibleregion:usimatrixconversational

Runs locally from ~2.56 GB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).

Downloads

Likes

Pipeline

image-text-to-text

Author

morikomorizz

Repository Files & Downloads

5 GGUF files detected

Direct downloads for local inference

File	Type	Quantization	Size	Link
Step-3.7-Flash-IQ3_M.gguf	GGUF	IQ3_M	96.38 GB	Download
Step-3.7-Flash-IQ3_XS.gguf	GGUF	IQ3_XS	85.39 GB	Download
Step-3.7-Flash-IQ4_XS.gguf	GGUF	IQ4_XS	108.88 GB	Download
Step-3.7-Flash-mmproj-BF16.gguf	GGUF	BF16	4.10 GB	Download
Step-3.7-Flash-mmproj-Q8_0.gguf	GGUF	Q8_0	2.56 GB	Download

Model Details

Model ID	morikomorizz/Step-3.7-Flash-MTP-GGUF
Author	morikomorizz
Pipeline	image-text-to-text
License	apache-2.0
Base model	stepfun-ai/Step-3.7-Flash
Last modified	2026-06-23T06:56:04.000Z

Model README

---

license: apache-2.0

base_model:

stepfun-ai/Step-3.7-Flash

pipeline_tag: image-text-to-text

language:

- en

tags:

- vision-language

- multimodal

- moe

---

Overview

This repository contains the GGUF quantized files for stepfun-ai/Step-3.7-Flash.

Original Model: stepfun-ai/Step-3.7-Flash
Architecture: Step-3.7-Flash
License: Apache 2.0
MTP Support: Yes - From Base Model

| Quant Type | Size | Description |

| :--- | :--- | :--- |

| IQ3_S | 85-91 GB | Mixed Precision for Better Quality |

| IQ3_M | 96-103 GB | Mixed Precision for Better Quality |

| IQ4_XS | 109-117 GB | Mixed Precision for Better Quality |

----

[ModelPage]: https://static.stepfun.com/blog/step-3.7-flash/

1. Introduction

Step 3.7 Flash is a 198B-parameter sparse Mixture-of-Experts (MoE) vision-language model that combines a 196B-parameter language backbone with a 1.8B-parameter vision encoder for native image understanding. Engineered for high-frequency production workloads, it activates approximately 11B parameters per token and delivers a throughput of up to 400 tokens per second. Step 3.7 Flash supports a 256k context window and offers three selectable reasoning levels (low, medium, and high) so developers can easily balance speed, cost, and cognitive depth.

We built Step 3.7 Flash for developers who need to scale agentic workflows that combine perception, search, and reasoning. It is designed to handle intensive tasks such as parsing massive financial reports in one pass, running multi-step search loops with cross-source verification, or operating concurrent coding agents in high-throughput pipelines.

2. Capabilities & Performance

Multimodal Perception and Verification

The model delivers top-tier visual intelligence, securing first place on SimpleVQA (Search) with a 79.2 and achieving frontier parity on V* (Python) at 95.3. These metrics reflect strong visual grounding and retrieval-augmented reasoning beyond basic image description. The model accurately processes dense visual interfaces, such as UI wireframes, application GUIs, and data charts, to map them into structured code. When it encounters an incomplete visual asset, it can independently identify missing data and execute lookups to verify context before returning a factually verified conclusion.

Workflow Integrity and Tool Orchestration

Execution reliability is critical for autonomous agents. Step 3.7 Flash leads the ClawEval-1.1 benchmark with a score of 67.1, which significantly outperforms the next closest competitor at 59.8. This performance demonstrates high resistance to adversarial traps and strict adherence to system policies during multi-turn orchestration. Backed by scores of 49.5 on Toolathlon and 48.1 on HLE w. Tool, this profile ensures high trajectory integrity. Step 3.7 Flash reliably interacts with external APIs and executes long-horizon workflows without drifting from instructions or violating system constraints.

Code Engineering and Professional Baselines

Step 3.7 Flash is built for live engineering tasks and secured a definitive second-place finish on SWE-Bench PRO with a score of 56.3. It can independently trace multi-file repositories, isolate bugs from raw issue reports, and generate functional patches that pass automated unit tests. While evaluations like Terminal-Bench 2.1 (59.5) and GDPVal-AA (45.8) show clear areas for future optimization compared to the absolute peak of the cohort, they establish a dependable baseline for system interactions and structured professional deliverables.

!Step 3.7 Flash benchmark results across General Agent, Agentic Coding, and Multimodal evaluations

---

How to Use

These GGUF files are fully compatible with llama.cpp and popular graphical interfaces like LM Studio.

using `llama.cpp` CLI:

./llama-cli -m /path/to/model/Step-3.7-Flash-IQ4_XS.gguf \
  -p "Hello, how are you?" \
  -sys "You are a helpful AI" \
  -n 4096 \
  -c 8192

using `llama-server` :

./llama-cli -m /path/to/model/Step-3.7-Flash-IQ4_XS.gguf \
  --host 0.0.0.0 \
  --port 8080

Run morikomorizz/Step-3.7-Flash-MTP-GGUF with guIDE

Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.

Download guIDE → · Browse 524k+ models · Compare models

Source: Hugging Face · Compare models