Opus Research 8B V1.5
The iteration that broke in the right places.
This is not a consumer model. It is a research release documenting what happens when you scale conversational training data from 3,360 to 4,068 conversations on a flat dataset architecture -- and where that approach hits its ceiling.
If you want a model to run, use Opus-Candid-8B V2. If you want to understand why V2 exists, read this.
What V1.5 Was
Same Qwen 2.5 7B base as V1. Same LoRA fine-tuning approach. But 4,068 conversations instead of 3,360 -- the additional ~700 conversations were domain-targeted patches generated to fill coverage gaps identified during V1 stress testing: US law, STEM/coding edge cases, creator identity, and Spanish adversarial scenarios that V1 handled poorly.
| Attribute | Value |
|---|---|
| Base Model | Qwen 2.5 7B |
| Training Data | 4,068 multi-turn conversations with Claude Opus 4.6 |
| Dataset Architecture | Flat + domain patches |
| Fine-tune Method | LoRA supervised fine-tuning |
| Parameters | ~8B |
| Context Window | 32,768 tokens |
| Quantization | Q8_0 GGUF |
| License | Apache 2.0 |
What V1.5 Proved
More data on a flat architecture improves coverage but not coherence. The domain patches worked exactly as intended -- topics that V1 handled poorly, V1.5 handled well. Law questions got better. STEM edge cases got better. Spanish adversarial scenarios got better. The personality held in every patched domain.
The personality transfer hypothesis scales with data volume. 4,068 conversations produced noticeably more consistent personality expression than 3,360. The model was less likely to drop character in unfamiliar domains, less likely to produce formulaic emotional responses, and more confident in its opinions. At this scale, quantity of authentic conversational data tracked directly with personality robustness.
Domain-specific generators are a valid patching strategy. Scripts like generate_us_law.py, generate_stem_coding.py, and generate_creator_identity.py produced training conversations that successfully transferred domain competence alongside personality. The model didn't just learn the topics -- it learned them in character.
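The generator pattern described above can be sketched as follows. This is a minimal, hypothetical illustration of the "topic list plus in-character prompt template" structure -- the topic lists, template wording, and function names here are assumptions, not the contents of the actual scripts:

```python
# Hypothetical sketch of the domain-patch generator pattern.
# Topic lists and prompt wording are illustrative, not the real scripts.

DOMAIN_TOPICS = {
    "us_law": ["fair use boundaries", "contract formation", "Miranda rights"],
    "stem_coding": ["off-by-one errors", "race conditions", "float precision"],
}

PROMPT_TEMPLATE = (
    "Generate a multi-turn conversation about {topic}. "
    "Stay in the established persona throughout."
)

def build_generation_prompts(domain: str) -> list[str]:
    """Expand a domain's topic list into one generation prompt per topic."""
    return [PROMPT_TEMPLATE.format(topic=t) for t in DOMAIN_TOPICS[domain]]

if __name__ == "__main__":
    for prompt in build_generation_prompts("us_law"):
        print(prompt)
```

The key property is that domain competence and persona are requested in the same prompt, so each generated conversation teaches both at once.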
Where V1.5 Broke
This is why the model exists as a research release.
Domain boundary artifacts. The critical failure. V1.5 could talk about coding with full personality. It could talk about philosophy with full personality. It could not gracefully transition from coding to philosophy in the same conversation. The flat dataset architecture meant topic transitions weren't in the training data. The model had no learned path between "debugging frustration" and "existential doubt" -- so when a conversation drifted there naturally, the personality fractured.
This wasn't subtle. In the 55-turn stress test, the model would hold personality perfectly through 10 turns of one domain, then produce a visible gear-change artifact when the conversation shifted. The personality didn't collapse -- it stuttered. Like a musician who can play two songs perfectly but can't transition between them.
Patching doesn't scale. The domain-specific generator approach that produced V1.5's improvement had a structural problem: every new domain needed its own generator script, its own topic list, its own quality control pass. The proliferation of scripts (generate_adversarial_data.py, generate_us_law.py, generate_stem_coding.py, generate_creator_identity.py) was a symptom. Patching holes one domain at a time doesn't address the topology of how topics connect.
The question that broke the approach: "How should topics flow into each other?" A flat dataset can't answer this. It can only answer "what topics should I cover?" -- which is necessary but insufficient for conversational coherence.
What V1.5 Led To
The domain boundary failure directly motivated the gravity chain architecture used in V2:
Instead of organizing conversations by topic (flat), V2 organizes them by transition (topological). Each conversation follows a gravity chain -- a topic pathway where transitions obey power-law probabilities. The most likely next topic gets ~40% of examples. Rare transitions get ~7%. This teaches the model how real conversations drift between domains.
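The transition scheme above can be sketched as a weighted random walk. This is an illustrative reconstruction, not the actual V2 pipeline: the topic names, the rank ordering, and the power-law exponent are all assumptions, chosen so the top transition gets roughly 44% of the mass and the rarest roughly 9% -- in the spirit of the ~40%/~7% split quoted above:

```python
import random

# Illustrative sketch of a gravity chain: topic transitions sampled from
# a power-law distribution over ranked candidate topics.

TOPICS = ["coding", "philosophy", "law", "music", "cosmology", "relationships"]

def transition_weights(current: str, alpha: float = 1.0) -> dict[str, float]:
    """Weight each candidate next-topic by rank**-alpha, normalized.
    Ranking here is just list order for illustration; a real pipeline
    would rank by topical adjacency to the current topic."""
    others = [t for t in TOPICS if t != current]
    raw = [(rank + 1) ** -alpha for rank in range(len(others))]
    total = sum(raw)
    return {t: w / total for t, w in zip(others, raw)}

def sample_chain(start: str, length: int, rng: random.Random) -> list[str]:
    """Walk a gravity chain: each step draws the next topic from the
    power-law weights over possible transitions."""
    chain = [start]
    for _ in range(length - 1):
        weights = transition_weights(chain[-1])
        chain.append(rng.choices(list(weights), weights=list(weights.values()))[0])
    return chain

if __name__ == "__main__":
    print(sample_chain("coding", 6, random.Random(0)))
```

Each sampled chain becomes the topic pathway for one training conversation, so common drifts are heavily represented and rare ones still appear.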
The result: V2 (6,482 conversations on Qwen 3 8B) handles domain transitions as naturally as it handles single-domain conversation. The boundary artifacts that defined V1.5's limitations are gone.
V1.5 was the necessary failure. Without it, the gravity chain architecture wouldn't have been motivated. The flat dataset ceiling had to be hit before the topological solution became obvious.
File Details
The V1.5 GGUF has not yet been uploaded from local storage. If you're reading this before the file appears, check back shortly.
The Opus-Candid Family
| Model | Base | Conversations | Best For | Status |
|---|---|---|---|---|
| 8B V2 | Qwen 3 8B | 6,482 | Runs on anything. Newest data + architecture. | Current |
| MoE | Qwen 3.5 MoE-A3B | 6,482 | Desktop quality on laptop hardware. | Current |
| 8B V1.5 (this model) | Qwen 2.5 7B | 4,068 | Research -- dataset architecture comparison. | Research |
| 27B V2 | Qwen 3.5 27B | 6,482 | Dense mid-tier. | Coming Soon |
| 70B V2 | TBD | 6,482 | Peak quality -- flagship. | Coming Soon |
Recommended Hardware
| Setup | Quantization | VRAM/RAM | Notes |
|---|---|---|---|
| Consumer GPU | Q8_0 GGUF | ~9GB VRAM | RTX 3060 12GB and up |
| CPU Only | Q8_0 GGUF | ~9GB RAM | Slower, fully functional |
| Apple Silicon | Q8_0 GGUF | ~9GB unified | M1/M2/M3 16GB+ |
Built by Saul Verdugo -- independent ML researcher. OpusReasoning@proton.me