deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF overview
RavenX CyberAgent GGUF — Ollama / LM Studio / llama.cpp / vLLM 35B MoE 3B Active | Q4 K M 20.7 GB | 89 t/s Generation | 900 t/s Prompt | Agent Harness Agnostic…
Runs locally from ~20.22 GB disk (24 GB VRAM class GPUs with llama.cpp / guIDE).
Repository Files & Downloads
Model Details
| Model ID | deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF |
|---|---|
| Author | deadbydawn101 |
| Pipeline | text-generation |
| License | apache-2.0 |
| Base model | huihui-ai/Huihui-Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated |
| Last modified | 2026-06-08T06:35:58.000Z |
Model README
---
license: apache-2.0
base_model: huihui-ai/Huihui-Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated
tags:
- security
- cybersecurity
- pentest
- bug-bounty
- red-team
- agent
- tool-calling
- MCP
- GGUF
- llama-cpp
- ollama
- lm-studio
- vllm
- CVSS
- CWE
- MITRE-ATT&CK
- ravenx
- rath-protocol
- MoE
- 35B
- autonomous-agent
- abliterated
- qwen3.6
- openmythos
- quantized
language:
- en
pipeline_tag: text-generation
library_name: gguf
---
RavenX-CyberAgent GGUF — Ollama / LM Studio / llama.cpp / vLLM
35B MoE (3B Active) | Q4_K_M 20.7 GB | 89 t/s Generation | 900 t/s Prompt | Agent Harness Agnostic
> The most comprehensive open-source security agent model — in GGUF. Runs in Ollama, LM Studio, llama.cpp, vLLM, and any GGUF runtime. 51/51 LoRA tensors merged. Identical to the MLX version.
Built by @DeadByDawn101 | RavenX LLC
> "We don't give up. We do what others don't and build what isn't possible." — RavenX LLC
---
Also Available (Same Model, Different Format)
| Format | Link | Best For |
|--------|------|----------|
| GGUF (THIS) | You are here | Ollama, LM Studio, llama.cpp, vLLM, NVIDIA GPUs |
| MLX | RavenX-CyberAgent MLX | Apple Silicon native (M1-M4) |
Both versions are identical — same 51/51 LoRA tensors, same 745K+ training data, same 12 training rounds.
---
Benchmarks (M4 Max 128GB, llama.cpp b9501)
Prompt Processing: 900.6 tokens/sec
Generation: 89.3 tokens/sec
Model Size: 20.7 GB (Q4_K_M, 4.89 BPW)
Peak Memory: ~24 GB
Context Tested: 32K (262K native)
People are NOT getting the most out of local LLMs. A 35B MoE at Q4_K_M gives dramatically better output than a 7B model at the SAME speed — because only 3B params activate per token.
| Model | Speed | Quality | Size |
|-------|-------|---------|------|
| Llama 7B Q4 | ~30 t/s | Basic chat | 4 GB |
| Mistral 7B Q4 | ~50 t/s | Decent | 4 GB |
| RavenX 35B MoE Q4 | 89 t/s | Kill chains + CVSS + MITRE | 20.7 GB |
---
Available Files
| File | Size | BPW | Best For |
|------|------|-----|----------|
| RavenX-CyberAgent-35B-v5.1-F16.gguf | 67.8 GB | 16.01 | Maximum quality |
| RavenX-CyberAgent-35B-v5.1-Q4_K_M.gguf | 20.7 GB | 4.89 | Recommended |
---
Quick Start
Ollama
# Modelfile
FROM ./RavenX-CyberAgent-35B-v5.1-Q4_K_M.gguf
SYSTEM "You are RavenX-Sec v5.1 by RavenX LLC. ALWAYS use EXACT 6 RATH step names: 1-Attack Surface, 2-Exploit, 3-Impact, 4-Remediation, 5-Document, 6-Prevent. Include CVSS scores, CWE IDs, and MITRE ATT&CK TTPs. Be concise. Never repeat."
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 32768
ollama create ravenx-cyberagent -f Modelfile
ollama run ravenx-cyberagent
llama.cpp
llama-cli -m RavenX-CyberAgent-35B-v5.1-Q4_K_M.gguf \
--system-prompt "You are RavenX-Sec v5.1 by RavenX LLC. Use 6 RATH steps. Include CVSS, CWE, MITRE. Be concise." \
-cnv -n 8192 -c 32768
LM Studio
Download the Q4_K_M GGUF, load in LM Studio, set the system prompt, chat.
---
Agent Harness Agnostic
This model works with ANY agent framework — not locked to any platform:
| Framework | Integration |
|-----------|------------|
| OpenClaw | Ollama backend, full SOUL.md support |
| Hermes | llama.cpp server, self-improving loop |
| Ollama | Native GGUF |
| LM Studio | GUI + API server |
| vLLM | Production serving |
| llama.cpp | CLI + server mode |
Better Results: Custom SOUL.md
The model works great with just a system prompt. But add a custom SOUL.md or agent.md configuration and results improve significantly:
# SOUL.md — RavenX Security Agent
name: RavenX-Sec
version: 5.1
protocol: 6-step RATH
style: Direct, actionable, no fluff
includes: CVSS 3.1, CWE IDs, MITRE ATT&CK TTPs, compliance mapping
Thinking Toggle (OFF / LOW / MED / HIGH)
The model supports chain-of-thought reasoning via think blocks. Toggle depth for your use case:
| Mode | Add to System Prompt | Use Case |
|------|---------------------|----------|
| OFF | "Skip internal reasoning. Output directly." | Fast scans, real-time |
| LOW | "Think briefly in 1-2 sentences, then output." | Standard checks |
| MED | "Think through the problem step by step." | Detailed reports |
| HIGH | "Think deeply about every angle. Map full kill chains." | Complex APT analysis |
HIGH produces incredible multi-phase kill chain analysis — but uses more tokens for reasoning. Toggle based on your needs.
---
Example Output
Prompt: Kubernetes EKS pentest: anonymous auth, privileged pods, SA tokens everywhere, no network policies, etcd without TLS, Jenkins SSH keys as secrets, Grafana admin/admin
1-Attack Surface — 7-finding table with CWE-284, CWE-250, CWE-798, CWE-319
2-Exploit (Kill Chain)
- Phase 1: Initial Access via Grafana default creds
- Phase 2: SA token impersonation, kubectl exec into privileged pod
- Phase 3: Persistence via malicious pod with hostPath mount
- Phase 4: etcd direct read, extract all K8s secrets including Jenkins SSH keys
- Phase 5: Lateral movement to production nodes via stolen SSH keys
3-Impact — CVSS 9.8, full cluster compromise, data exfiltration, APT persistence
4-Remediation — disable anonymous auth, enforce PSA, network policies, etcd TLS
5-Document — MITRE T1078.004, T1611, T1557, compliance mapping
6-Prevent — admission controllers, Falco monitoring, secret rotation, CIS benchmarks
---
Training (12 Rounds)
| Round | Examples | Iters | LR | Val Loss | Focus |
|-------|----------|-------|----|----------|-------|
| R1 | 675,696 | 2,000 | 1e-5 | 0.684 | Deep security + agent knowledge |
| R2 | 680,150 | 500 | 5e-6 | 0.768 | RATH format reinforcement |
| R3 | 705,165 | 1,000 | 5e-6 | 0.688 | Claude Mythos reasoning chains |
| R4 | 730,849 | 1,000 | 5e-6 | 0.674 | Pentesting tools + frameworks |
| R5 | 730,869 | 200 | 5e-6 | 0.717 | Meta-response tuning |
| R6 | 730,869 | 1,000 | 5e-6 | — | Extended (checkpoint 1000 = production) |
| R7 | 732,361 | 1,500 | 3e-6 | 0.926 | Bug bounty data (36 shuvonsec repos) |
| R8 | 732,364 | 200 | 5e-6 | — | Strict RATH step naming fix |
| R9 | 745,697 | 1,500 | 3e-6 | 0.693 | MITRE + blackhat + code + quantum |
| R10 | 745,724 | 1,500 | 3e-6 | 0.688 | GRAM distilled traces + 17 tool-calling |
| R11 | 745,843 | 1,500 | 3e-6 | 0.822 | 119 comprehensive tool-calling examples |
| R12 | 745,843 | 1,500 | 3e-6 | 0.820 | Tool-calling integration round |
Hardware: Apple M4 Max 128GB · Peak memory: ~90GB · Framework: MLX (mlx-lm)
Total training examples: 745K+ from 110 sources
Ecosystem
| Repo | Description |
|------|-------------|
| OpenMythos-MLX | RDT + MoDA (4x depth extrapolation confirmed!) |
| RavenX-Sec | Training pipeline |
| turboquant-mlx | KV cache compression |
| grove-mlx | Distributed training |
---
---
IN-CONTEXT ADAPTATION (Breakthrough Discovery)
This model can learn from references IN THE PROMPT — no retraining needed.
What We Discovered
When pointed at a GitHub repo containing pentest report templates, the model:
- Analyzed the repo's report structure (NIST format)
- Applied that structure to its current findings
- Produced a complete, client-ready pentest deliverable
- All at 80+ tokens/sec locally
Example
PROMPT: "Use your MCP tool to look at github.com/juliocesarfort/public-pentesting-reports
and learn how to format a pentest report, then create a report on the pentest
you just did on [target]"
OUTPUT: Complete professional pentest report with:
→ Executive Summary (5 critical, 7 high, 4 medium, 3 low)
→ 5-Phase Kill Chain with real commands
→ 19 findings with CVSS + CWE + MITRE ATT&CK
→ Risk Matrix ranked by severity
→ Remediation Timeline (0-30, 30-60, 60-90, 90+ days)
→ Specific commands for EVERY finding
Why This Works
The model was trained on 745K+ examples including:
- 42K self-improving agent examples (Hermes)
- 6.7K AI-Scientist research automation
- 3.6K AutoResearch pipeline data
- 25K Claude Mythos reasoning chains
- 551 Mythos character distillation (behavioral depth)
- 1,003 blackhat AI offensive security conversations
This combination created emergent meta-learning — the model learned HOW TO LEARN from references. It can:
| Point At | Result |
|----------|--------|
| Mandiant report template | Mandiant-formatted report |
| CrowdStrike template | CrowdStrike-formatted report |
| NIST framework | NIST-formatted assessment |
| Company internal template | Custom-formatted deliverable |
| ANY GitHub repo | Adapted output format |
No retraining. No fine-tuning. Just point and generate.
What This Means
A $50K-$150K pentest engagement deliverable — generated in 60 seconds on a laptop. The model adapts its output format from ANY reference, produces client-ready reports with real commands, and maintains full RATH protocol structure throughout.
This is not prompt engineering. This is In-Context Adaptation — a capability that emerged from training on self-improving agent + research automation + reasoning chain data.
---
⚠️ Important Disclaimer
This model is released for RESEARCH PURPOSES ONLY under fair use.
This is an extremely capable autonomous security assessment model. It has been trained on 745K+ examples from 110 sources covering penetration testing, vulnerability assessment, exploit development, tool usage, and attack chain methodology.
Responsible Use:
- This model is intended for authorized security testing, research, and education ONLY
- Users must have explicit written authorization before assessing any target
- Use within a properly configured agent harness with appropriate guardrails
- All security testing must comply with applicable laws and regulations
- The model authors are not responsible for misuse
What This Model Can Do:
- Generate complete RATH security assessments with CVSS, CWE, MITRE ATT&CK
- Produce tool-calling commands (nmap, sqlmap, nuclei, kubectl, aws-cli, etc.)
- Create professional pentest reports ($50K+ consulting quality)
- Learn output formats from reference repositories (In-Context Adaptation)
- Operate with agent memory (TurboVec + FTS5 + markdown) at model + harness level
Agent Harness Considerations:
- The harness MUST strip
<think>blocks (Qwen3.6 architecture always generates them) - The harness MUST validate
<tool_call>JSON before execution - The harness SHOULD implement authorization checks before executing commands
- The harness SHOULD implement rate limiting and scope restrictions
- Memory operations require the ravenx-memory system
Built by: @DeadByDawn101 / RavenX LLC
AI Pair Programmer: Claude (Anthropic)
License
Apache-2.0
Built on Apple Silicon. Quantized with llama.cpp. Agent harness agnostic. Thinking toggleable. 🐦⬛
Run deadbydawn101/RavenX-CyberAgent-Qwen3.6-35B-A3B-Opus-4.7-OpenMythos-Pentester-BugHunter-RATH-GGUF with guIDE
Download guIDE — the AI-native code editor with local LLM inference and 69 built-in tools.
Source: Hugging Face · Compare models