GraySoft
Projects Models Compare Cloud benchmarks FAQ Download guIDE →
Model Intelligence Sheet

deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF overview

RavenX CyberAgent GGUF — Ollama / LM Studio / llama.cpp / vLLM 35B MoE 3B Active | Q4 K M 20.7 GB | 89 t/s Generation | 900 t/s Prompt | Agent Harness Agnostic…

ggufsecuritycybersecuritypentestbug-bountyred-teamagenttool-callingMCPGGUFllama-cppollamalm-studiovllmCVSSCWEMITRE-ATT&CKravenxrath-protocolMoE35Bautonomous-agentabliteratedqwen3.6

Runs locally from ~20.22 GB disk (24 GB VRAM class GPUs with llama.cpp / guIDE).

Downloads
1,036
Likes
4
Pipeline
text-generation

Repository Files & Downloads

2 GGUF files detected
Direct downloads for local inference
FileTypeQuantizationSizeLink
RavenX-CyberAgent-35B-v5.1-F16.ggufGGUFF1666.19 GBDownload
RavenX-CyberAgent-35B-v5.1-Q4_K_M.ggufGGUFQ4_K_M20.22 GBDownload

Model Details

Model IDdeadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF
Authordeadbydawn101
Pipelinetext-generation
Licenseapache-2.0
Base modelhuihui-ai/Huihui-Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated
Last modified2026-06-08T06:35:58.000Z

Model README

---

license: apache-2.0

base_model: huihui-ai/Huihui-Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated

tags:

- security

- cybersecurity

- pentest

- bug-bounty

- red-team

- agent

- tool-calling

- MCP

- GGUF

- llama-cpp

- ollama

- lm-studio

- vllm

- CVSS

- CWE

- MITRE-ATT&CK

- ravenx

- rath-protocol

- MoE

- 35B

- autonomous-agent

- abliterated

- qwen3.6

- openmythos

- quantized

language:

- en

pipeline_tag: text-generation

library_name: gguf

---

RavenX-CyberAgent GGUF — Ollama / LM Studio / llama.cpp / vLLM

35B MoE (3B Active) | Q4_K_M 20.7 GB | 89 t/s Generation | 900 t/s Prompt | Agent Harness Agnostic

> The most comprehensive open-source security agent model — in GGUF. Runs in Ollama, LM Studio, llama.cpp, vLLM, and any GGUF runtime. 51/51 LoRA tensors merged. Identical to the MLX version.

Built by @DeadByDawn101 | RavenX LLC

> "We don't give up. We do what others don't and build what isn't possible." — RavenX LLC

---

Also Available (Same Model, Different Format)

| Format | Link | Best For |

|--------|------|----------|

| GGUF (THIS) | You are here | Ollama, LM Studio, llama.cpp, vLLM, NVIDIA GPUs |

| MLX | RavenX-CyberAgent MLX | Apple Silicon native (M1-M4) |

Both versions are identical — same 51/51 LoRA tensors, same 745K+ training data, same 12 training rounds.

---

Benchmarks (M4 Max 128GB, llama.cpp b9501)

Prompt Processing:  900.6 tokens/sec
Generation:          89.3 tokens/sec
Model Size:          20.7 GB (Q4_K_M, 4.89 BPW)
Peak Memory:         ~24 GB
Context Tested:      32K (262K native)

People are NOT getting the most out of local LLMs. A 35B MoE at Q4_K_M gives dramatically better output than a 7B model at the SAME speed — because only 3B params activate per token.

| Model | Speed | Quality | Size |

|-------|-------|---------|------|

| Llama 7B Q4 | ~30 t/s | Basic chat | 4 GB |

| Mistral 7B Q4 | ~50 t/s | Decent | 4 GB |

| RavenX 35B MoE Q4 | 89 t/s | Kill chains + CVSS + MITRE | 20.7 GB |

---

Available Files

| File | Size | BPW | Best For |

|------|------|-----|----------|

| RavenX-CyberAgent-35B-v5.1-F16.gguf | 67.8 GB | 16.01 | Maximum quality |

| RavenX-CyberAgent-35B-v5.1-Q4_K_M.gguf | 20.7 GB | 4.89 | Recommended |

---

Quick Start

Ollama

# Modelfile
FROM ./RavenX-CyberAgent-35B-v5.1-Q4_K_M.gguf

SYSTEM "You are RavenX-Sec v5.1 by RavenX LLC. ALWAYS use EXACT 6 RATH step names: 1-Attack Surface, 2-Exploit, 3-Impact, 4-Remediation, 5-Document, 6-Prevent. Include CVSS scores, CWE IDs, and MITRE ATT&CK TTPs. Be concise. Never repeat."

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 32768
ollama create ravenx-cyberagent -f Modelfile
ollama run ravenx-cyberagent

llama.cpp

llama-cli -m RavenX-CyberAgent-35B-v5.1-Q4_K_M.gguf \
  --system-prompt "You are RavenX-Sec v5.1 by RavenX LLC. Use 6 RATH steps. Include CVSS, CWE, MITRE. Be concise." \
  -cnv -n 8192 -c 32768

LM Studio

Download the Q4_K_M GGUF, load in LM Studio, set the system prompt, chat.

---

Agent Harness Agnostic

This model works with ANY agent framework — not locked to any platform:

| Framework | Integration |

|-----------|------------|

| OpenClaw | Ollama backend, full SOUL.md support |

| Hermes | llama.cpp server, self-improving loop |

| Ollama | Native GGUF |

| LM Studio | GUI + API server |

| vLLM | Production serving |

| llama.cpp | CLI + server mode |

Better Results: Custom SOUL.md

The model works great with just a system prompt. But add a custom SOUL.md or agent.md configuration and results improve significantly:

# SOUL.md — RavenX Security Agent
name: RavenX-Sec
version: 5.1
protocol: 6-step RATH
style: Direct, actionable, no fluff
includes: CVSS 3.1, CWE IDs, MITRE ATT&CK TTPs, compliance mapping

Thinking Toggle (OFF / LOW / MED / HIGH)

The model supports chain-of-thought reasoning via think blocks. Toggle depth for your use case:

| Mode | Add to System Prompt | Use Case |

|------|---------------------|----------|

| OFF | "Skip internal reasoning. Output directly." | Fast scans, real-time |

| LOW | "Think briefly in 1-2 sentences, then output." | Standard checks |

| MED | "Think through the problem step by step." | Detailed reports |

| HIGH | "Think deeply about every angle. Map full kill chains." | Complex APT analysis |

HIGH produces incredible multi-phase kill chain analysis — but uses more tokens for reasoning. Toggle based on your needs.

---

Example Output

Prompt: Kubernetes EKS pentest: anonymous auth, privileged pods, SA tokens everywhere, no network policies, etcd without TLS, Jenkins SSH keys as secrets, Grafana admin/admin

1-Attack Surface — 7-finding table with CWE-284, CWE-250, CWE-798, CWE-319

2-Exploit (Kill Chain)

  • Phase 1: Initial Access via Grafana default creds
  • Phase 2: SA token impersonation, kubectl exec into privileged pod
  • Phase 3: Persistence via malicious pod with hostPath mount
  • Phase 4: etcd direct read, extract all K8s secrets including Jenkins SSH keys
  • Phase 5: Lateral movement to production nodes via stolen SSH keys

3-Impact — CVSS 9.8, full cluster compromise, data exfiltration, APT persistence

4-Remediation — disable anonymous auth, enforce PSA, network policies, etcd TLS

5-Document — MITRE T1078.004, T1611, T1557, compliance mapping

6-Prevent — admission controllers, Falco monitoring, secret rotation, CIS benchmarks

---

Training (12 Rounds)

| Round | Examples | Iters | LR | Val Loss | Focus |

|-------|----------|-------|----|----------|-------|

| R1 | 675,696 | 2,000 | 1e-5 | 0.684 | Deep security + agent knowledge |

| R2 | 680,150 | 500 | 5e-6 | 0.768 | RATH format reinforcement |

| R3 | 705,165 | 1,000 | 5e-6 | 0.688 | Claude Mythos reasoning chains |

| R4 | 730,849 | 1,000 | 5e-6 | 0.674 | Pentesting tools + frameworks |

| R5 | 730,869 | 200 | 5e-6 | 0.717 | Meta-response tuning |

| R6 | 730,869 | 1,000 | 5e-6 | — | Extended (checkpoint 1000 = production) |

| R7 | 732,361 | 1,500 | 3e-6 | 0.926 | Bug bounty data (36 shuvonsec repos) |

| R8 | 732,364 | 200 | 5e-6 | — | Strict RATH step naming fix |

| R9 | 745,697 | 1,500 | 3e-6 | 0.693 | MITRE + blackhat + code + quantum |

| R10 | 745,724 | 1,500 | 3e-6 | 0.688 | GRAM distilled traces + 17 tool-calling |

| R11 | 745,843 | 1,500 | 3e-6 | 0.822 | 119 comprehensive tool-calling examples |

| R12 | 745,843 | 1,500 | 3e-6 | 0.820 | Tool-calling integration round |

Hardware: Apple M4 Max 128GB · Peak memory: ~90GB · Framework: MLX (mlx-lm)

Total training examples: 745K+ from 110 sources

Ecosystem

| Repo | Description |

|------|-------------|

| OpenMythos-MLX | RDT + MoDA (4x depth extrapolation confirmed!) |

| RavenX-Sec | Training pipeline |

| turboquant-mlx | KV cache compression |

| grove-mlx | Distributed training |

---

---

IN-CONTEXT ADAPTATION (Breakthrough Discovery)

This model can learn from references IN THE PROMPT — no retraining needed.

What We Discovered

When pointed at a GitHub repo containing pentest report templates, the model:

  1. Analyzed the repo's report structure (NIST format)
  2. Applied that structure to its current findings
  3. Produced a complete, client-ready pentest deliverable
  4. All at 80+ tokens/sec locally

Example

PROMPT: "Use your MCP tool to look at github.com/juliocesarfort/public-pentesting-reports 
        and learn how to format a pentest report, then create a report on the pentest 
        you just did on [target]"

OUTPUT: Complete professional pentest report with:
  → Executive Summary (5 critical, 7 high, 4 medium, 3 low)
  → 5-Phase Kill Chain with real commands
  → 19 findings with CVSS + CWE + MITRE ATT&CK
  → Risk Matrix ranked by severity
  → Remediation Timeline (0-30, 30-60, 60-90, 90+ days)
  → Specific commands for EVERY finding

Why This Works

The model was trained on 745K+ examples including:

  • 42K self-improving agent examples (Hermes)
  • 6.7K AI-Scientist research automation
  • 3.6K AutoResearch pipeline data
  • 25K Claude Mythos reasoning chains
  • 551 Mythos character distillation (behavioral depth)
  • 1,003 blackhat AI offensive security conversations

This combination created emergent meta-learning — the model learned HOW TO LEARN from references. It can:

| Point At | Result |

|----------|--------|

| Mandiant report template | Mandiant-formatted report |

| CrowdStrike template | CrowdStrike-formatted report |

| NIST framework | NIST-formatted assessment |

| Company internal template | Custom-formatted deliverable |

| ANY GitHub repo | Adapted output format |

No retraining. No fine-tuning. Just point and generate.

What This Means

A $50K-$150K pentest engagement deliverable — generated in 60 seconds on a laptop. The model adapts its output format from ANY reference, produces client-ready reports with real commands, and maintains full RATH protocol structure throughout.

This is not prompt engineering. This is In-Context Adaptation — a capability that emerged from training on self-improving agent + research automation + reasoning chain data.

---

⚠️ Important Disclaimer

This model is released for RESEARCH PURPOSES ONLY under fair use.

This is an extremely capable autonomous security assessment model. It has been trained on 745K+ examples from 110 sources covering penetration testing, vulnerability assessment, exploit development, tool usage, and attack chain methodology.

Responsible Use:

  • This model is intended for authorized security testing, research, and education ONLY
  • Users must have explicit written authorization before assessing any target
  • Use within a properly configured agent harness with appropriate guardrails
  • All security testing must comply with applicable laws and regulations
  • The model authors are not responsible for misuse

What This Model Can Do:

  • Generate complete RATH security assessments with CVSS, CWE, MITRE ATT&CK
  • Produce tool-calling commands (nmap, sqlmap, nuclei, kubectl, aws-cli, etc.)
  • Create professional pentest reports ($50K+ consulting quality)
  • Learn output formats from reference repositories (In-Context Adaptation)
  • Operate with agent memory (TurboVec + FTS5 + markdown) at model + harness level

Agent Harness Considerations:

  • The harness MUST strip <think> blocks (Qwen3.6 architecture always generates them)
  • The harness MUST validate <tool_call> JSON before execution
  • The harness SHOULD implement authorization checks before executing commands
  • The harness SHOULD implement rate limiting and scope restrictions
  • Memory operations require the ravenx-memory system

Built by: @DeadByDawn101 / RavenX LLC

AI Pair Programmer: Claude (Anthropic)

License

Apache-2.0

Built on Apple Silicon. Quantized with llama.cpp. Agent harness agnostic. Thinking toggleable. 🐦‍⬛

Run deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF with guIDE

Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.

Download guIDE → · Browse 524k+ models · Compare models

Source: Hugging Face · Compare models