GraySoft
Model Intelligence Sheet

forkjoin-ai/qwen2.5-72b-instruct-gguf overview

Qwen2.5 72B Instruct (GGUF, Q4_K_M): production-ready GGUF quantization of Qwen/Qwen2.5-72B-Instruct (https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) for distribu…

Tags: gguf · qwen2 · instruct · affectively · edgework · aether · distributed-inference · edge-deployment · text-generation · en · base_model:Qwen/Qwen2.5-72B-Instruct · base_model:quantized:Qwen/Qwen2.5-72B-Instruct · license:apache-2.0 · endpoints_compatible · region:us · imatrix · conversational

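Runs locally from a ~44.16 GB file on disk (32 GB+ VRAM-class GPUs with llama.cpp / guIDE). Because the file is larger than 32 GB, a single 32 GB card cannot hold every layer at once, so part of the model has to stay in system RAM.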

Downloads: 0
Likes: 0
Pipeline: text-generation

Repository Files & Downloads

1 GGUF file detected
Direct downloads for local inference
| File | Type | Quantization | Size |
|------|------|--------------|------|
| Qwen2.5-72B-Instruct-Q4_K_M.gguf | GGUF | Q4_K_M | 44.16 GB |

Model Details

| Property | Value |
|----------|-------|
| Model ID | forkjoin-ai/qwen2.5-72b-instruct-gguf |
| Author | forkjoin-ai |
| Pipeline | text-generation |
| License | apache-2.0 |
| Base model | Qwen/Qwen2.5-72B-Instruct |
| Last modified | 2026-06-08T21:25:26.000Z |

Model README

---
language:
- en
license: apache-2.0
library_name: gguf
tags:
- gguf
- qwen2
- instruct
- affectively
- edgework
- aether
- distributed-inference
- edge-deployment
base_model: Qwen/Qwen2.5-72B-Instruct
base_model_relation: quantized
pipeline_tag: text-generation
---

Qwen2.5 72B Instruct (GGUF, Q4_K_M)

> Production-ready GGUF quantization of Qwen/Qwen2.5-72B-Instruct for distributed text generation and conversation — powered by the Aether edge inference runtime on Edgework.ai.

Model Details

| Property | Value |
|----------|-------|
| Base model | Qwen/Qwen2.5-72B-Instruct |
| Parameters | 72B |
| Architecture | Qwen2 |
| Quantization | Q4_K_M |
| Format | GGUF |
| Size | ~43 GB |
| License | apache-2.0 |
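
As a rough sanity check on the size and parameter figures above, the average number of bits stored per weight can be estimated from the file size. The sketch below is only back-of-the-envelope arithmetic: it assumes "GB" means 10^9 bytes and "72B" means 72 × 10^9 parameters, and it ignores GGUF metadata overhead. The function name `bits_per_weight` is illustrative and not part of any tool mentioned here.

```python
# Rough effective bits per weight for a quantized checkpoint.
# Assumptions (not from the model card): "GB" means 10**9 bytes and
# "72B" means 72 * 10**9 parameters; GGUF metadata overhead is ignored.

def bits_per_weight(file_size_gb: float, params_billion: float) -> float:
    """Return the average number of bits stored per parameter."""
    total_bits = file_size_gb * 1e9 * 8
    total_params = params_billion * 1e9
    return total_bits / total_params

if __name__ == "__main__":
    # Both sizes quoted on this page: ~43 GB (README) and 44.16 GB (file list).
    for size_gb in (43.0, 44.16):
        print(f"{size_gb} GB -> {bits_per_weight(size_gb, 72):.2f} bits/weight")
```

Under these assumptions both quoted sizes come out a little under 5 bits per weight, which is what one would expect from a mixed-precision 4-bit scheme rather than a flat 4 bits.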

Usage

With llama.cpp

./llama-cli -m Qwen2.5-72B-Instruct-Q4_K_M.gguf -p "Your prompt here" -n 256

With Aether (Distributed Inference)

This model is deployed across the Aether distributed inference network. Weights are layer-sharded and distributed across multiple edge nodes for parallel inference.

Also available: .knot (sovereign format)

This repo ships qwen2.5-72b-instruct.knot — the model weights in the KNOT container that the Aether distributed-inference runtime loads natively (the GGUF, when present, sits right beside it). A KNOT is a single self-describing file with a JSON table-of-contents, so any single tensor is one HTTP Range request — ideal for streaming weights to edge nodes.

| | GGUF | KNOT |
|---|---|---|
| Container | format-specific header | single file, JSON table-of-contents |
| Per-tensor fetch | whole-file oriented | one tensor = one Range request |
| Ecosystem | broad (llama.cpp, …) | Aether / Gnosis runtime |

huggingface-cli download forkjoin-ai/qwen2.5-72b-instruct-gguf qwen2.5-72b-instruct.knot --local-dir ./knots

Full format spec: KNOT_FORMAT.md. Inspect the header with bun run open-source/bitwise/scripts/dump-knot.ts qwen2.5-72b-instruct.knot.
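
To illustrate the "one tensor = one Range request" idea, here is a minimal sketch of turning a table-of-contents entry into an HTTP `Range` header. The TOC layout used here (a `tensors` map with `offset` and `length` fields) and the tensor names are hypothetical and chosen only for illustration; the real schema is whatever KNOT_FORMAT.md defines.

```python
# Sketch only: the TOC layout below is hypothetical, NOT the real KNOT
# schema (see KNOT_FORMAT.md for that). It shows how a byte offset and
# length from a table-of-contents map onto a single HTTP Range header.
import json

def range_header(toc: dict, tensor_name: str) -> dict:
    """Build the HTTP Range header that covers exactly one tensor."""
    entry = toc["tensors"][tensor_name]   # hypothetical key names
    start = entry["offset"]
    end = start + entry["length"] - 1     # Range end positions are inclusive
    return {"Range": f"bytes={start}-{end}"}

if __name__ == "__main__":
    # Made-up TOC with two tensors, for demonstration only.
    toc = json.loads("""
    {"tensors": {
        "blk.0.attn_q.weight": {"offset": 4096, "length": 1024},
        "blk.0.attn_k.weight": {"offset": 5120, "length": 256}
    }}
    """)
    print(range_header(toc, "blk.0.attn_k.weight"))
    # prints {'Range': 'bytes=5120-5375'}
```

The resulting dictionary can be passed as the `headers` argument of any HTTP client (for example `urllib.request.Request(url, headers=...)`); a server that supports range requests answers with `206 Partial Content` and only the requested bytes.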

Deployment Architecture

This model runs on the Aether distributed inference runtime — a custom engine that shards model layers across multiple nodes for parallel execution:

  1. Coordinator receives requests and manages token generation
  2. Layer nodes each hold a subset of model layers (6 nodes for this model)
  3. Hidden states flow between nodes via gRPC
  4. Zero cold start via warm pool scheduling
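
The layer-sharding steps above can be sketched as follows. This is a toy model, not Aether's actual scheduler: plain function calls stand in for the gRPC hops, the layer count of 80 is an assumption made for illustration (only the 6-node figure comes from this page), and `partition_layers` and `run_pipeline` are hypothetical names.

```python
# Sketch of contiguous layer sharding across nodes. Plain function calls
# stand in for the gRPC hops; this is not Aether's actual scheduler.
from typing import Callable, List

def partition_layers(num_layers: int, num_nodes: int) -> List[range]:
    """Split layer indices into num_nodes contiguous, near-equal shards."""
    base, extra = divmod(num_layers, num_nodes)
    shards, start = [], 0
    for node in range(num_nodes):
        size = base + (1 if node < extra else 0)  # first `extra` nodes get one more
        shards.append(range(start, start + size))
        start += size
    return shards

def run_pipeline(hidden: float, shards: List[range],
                 layer_fn: Callable[[int, float], float]) -> float:
    """Pass the hidden state through each node's layers in order."""
    for shard in shards:          # one outer iteration = one node in the chain
        for layer in shard:
            hidden = layer_fn(layer, hidden)
    return hidden

if __name__ == "__main__":
    # 80 layers is an assumption for illustration; 6 nodes is from the card.
    shards = partition_layers(80, 6)
    print([len(s) for s in shards])   # prints [14, 14, 13, 13, 13, 13]
    # Toy "layer" that adds 1, so the result counts the layers applied.
    print(run_pipeline(0.0, shards, lambda i, h: h + 1.0))   # prints 80.0
```

Contiguous shards keep the number of node-to-node transfers per token equal to the number of nodes, since the hidden state crosses each boundary once per forward pass.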

Deployed via Edgework.ai — bringing fast, cheap, and private inference as close to the user as possible.

About

Published by AFFECTIVELY · Managed by @buley

We quantize and publish production-ready models for distributed edge inference via the Aether runtime. Every release is tested for correctness and stability before publication.

Source: Hugging Face