ledgergap/Pollux-4B-Judge-GGUF overview
Pollux 4B Judge GGUF This repository contains GGUF versions of ai forever/Pollux 4B Judge https://huggingface.co/ai forever/Pollux 4B Judge for local inference…
Runs locally from ~3.99 GB disk (4 GB VRAM class GPUs with llama.cpp / guIDE).
Repository Files & Downloads
Model Details
| Model ID | ledgergap/Pollux-4B-Judge-GGUF |
|---|---|
| Author | ledgergap |
| Pipeline | text-generation |
| License | mit |
| Base model | ai-forever/Pollux-4B-Judge |
| Last modified | 2026-06-08T21:01:43.000Z |
Model README
---
license: mit
language:
- ru
base_model: ai-forever/Pollux-4B-Judge
base_model_relation: quantized
library_name: gguf
pipeline_tag: text-generation
tags:
- gguf
- llama.cpp
- lmstudio
- qwen3
- pollux
- russian
- llm-as-a-judge
- quantized
quantized_by: ledgergap
---
Pollux-4B-Judge GGUF
This repository contains GGUF versions of ai-forever/Pollux-4B-Judge for local inference with llama.cpp, LM Studio, and other GGUF-compatible runtimes.
Pollux-4B-Judge is a Russian-oriented LLM-as-a-judge model based on Qwen3-4B. It is intended for evaluating model answers against a specific criterion and scoring rubric.
Files
| File | Type | Quantized | Notes |
|---|---:|---:|---|
| Pollux-4B-Judge.BF16.gguf | BF16 GGUF conversion | No | High-precision reference version |
| Pollux-4B-Judge.Q8_0.gguf | Q8_0 GGUF quantization | Yes | High-quality quantized version |
Which file should I use?
Use Pollux-4B-Judge.BF16.gguf if you want the highest-quality reference version.
Use Pollux-4B-Judge.Q8_0.gguf if you want a practical local version with lower memory usage and minimal expected quality loss.
Recommended inference settings
For judge-style usage, the original model card uses:
| Setting | Value |
|---|---:|
| Temperature | 0.0 |
| Max tokens | 512 |
For local GGUF inference, choose a context length large enough to fit the full evaluation prompt: instruction, reference answer, evaluated answer, criterion, and rubric. A practical starting point is 8192, but this is a local runtime recommendation rather than an official value from the original model card.
The model is intended to evaluate one criterion per request.
Prompt format
Recommended prompt structure:
### Задание для оценки:
{instruction}
### Эталонный ответ:
{reference_answer}
### Ответ для оценки:
{answer}
### Критерий оценки:
{criterion}
### Шкала оценивания по критерию:
{rubric}
Use with llama.cpp
BF16:
llama-server -hf ledgergap/Pollux-4B-Judge-GGUF:BF16 -c 8192 -ngl 99
Q8_0:
llama-server -hf ledgergap/Pollux-4B-Judge-GGUF:Q8_0 -c 8192 -ngl 99
Use with LM Studio
Open LM Studio and paste this repository URL into the model search/download field:
https://huggingface.co/ledgergap/Pollux-4B-Judge-GGUF
Then select either the BF16 or Q8_0 GGUF file.
Original model
Original model: ai-forever/Pollux-4B-Judge
Run ledgergap/Pollux-4B-Judge-GGUF with guIDE
Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.
Source: Hugging Face · Compare models