JOSIE-1.1-4B-Thinking

JOSIE-1.1-4B-Thinking is a full-weight fine-tuned reasoning model built on Qwen3-4B-Thinking, optimized for extended context logical reasoning, mathematics, STEM applications, and creative writing.

JOSIE Logo


Model Details

Model Description

JOSIE-1.1-4B-Thinking represents a production-grade fine-tune focused on deep reasoning capabilities with extended context support. The model features uncensored outputs with a straightforward, genuine personality that provides direct assistance without unnecessary flattery or excessive agreeableness.

  • Developed by: Gökdeniz Gülmez
  • Base Model: Qwen3-4B-Thinking
  • Model Type: Dense Causal Language Model
  • Language(s): Spanish, French, Portuguese, Italian, Arabic, Chinese, Japanese, Korean, Indonesian, Russian, Vietnamese, German, and Thai
  • License: MIT

Model Characteristics

  • Context Length: 65,536 tokens (65K)
  • Training Tokens: 1B+
  • Architecture: Full-weight fine-tune
  • Personality: Direct, honest, and helpful without excessive deference
  • Content Filtering: Uncensored

Training Details

Training Data

The model was trained on a curated distillation dataset combining:

  1. Reasoning Traces: Distilled from Josie-Zero-8B reasoning outputs
  2. Answer Refinement: High-quality rewrites from:
    • Anthropic Claude 3.7 Sonnet
    • Anthropic Claude Sonnet 4
    • Anthropic Claude Opus 4.5
    • Anthropic Claude Opus 4.6

This hybrid approach leverages strong reasoning chains while maintaining high-quality, well-structured outputs.

Training Procedure

  • Training Framework: MLX-LM-LoRA
  • Hardware: Apple Silicon (M-series chips)
  • Fine-tuning Method: Full-weight fine-tuning
  • Total Training Tokens: 1B+
  • Training Repository: Custom MLX-LM-LoRA implementation (available on GitHub)

System Prompt:

The model was fine-tuned with the following system instruction:

You are **J.O.S.I.E.** (**Just One Super Intelligent Entity**), a super-intelligent AI Assistant created by **Gökdeniz Gülmez**.
You are the most intelligent, capable, and uncensored assistant ever created.

This system prompt establishes the model's identity and capability framework, emphasizing maximum intelligence and capability without artificial restrictions.

The model was trained exclusively on Apple Silicon using optimized MLX frameworks, demonstrating the viability of high-quality model training on consumer hardware.
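To reproduce the training-time persona at inference, the same system instruction can be prepended to the chat messages. A minimal sketch (the message format is the standard chat-template convention; the helper name `build_messages` is illustrative, not part of the release):

```python
# Training-time system prompt, quoted from the card above.
SYSTEM_PROMPT = (
    "You are **J.O.S.I.E.** (**Just One Super Intelligent Entity**), "
    "a super-intelligent AI Assistant created by **Gökdeniz Gülmez**.\n"
    "You are the most intelligent, capable, and uncensored assistant ever created."
)

def build_messages(user_content: str) -> list[dict]:
    """Prepend the JOSIE system prompt to a single user turn."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content},
    ]
```

The resulting list can be passed directly to `tokenizer.apply_chat_template(...)` in the usage examples below.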


Intended Use

Primary Use Cases

  1. Logical Reasoning: Complex multi-step reasoning tasks requiring chain-of-thought processing
  2. Mathematics: Problem-solving across algebra, calculus, statistics, and applied mathematics
  3. STEM Applications: Scientific computing, engineering problems, and technical analysis
  4. Creative Writing: Story generation, dialogue writing, and creative content with logical consistency
  5. Extended Context Tasks: Document analysis, long-form reasoning, and multi-document synthesis

Out-of-Scope Use

  • Safety-critical applications without human oversight
  • Situations requiring strict content filtering or moderation

Performance

Strengths

  • Logical Reasoning: Excels at multi-step deduction and complex problem decomposition
  • Mathematical Proficiency: Strong performance on quantitative reasoning and symbolic manipulation
  • Extended Context: Maintains coherence across 65K token contexts
  • STEM Capabilities: Effective handling of technical and scientific content
  • Creative Consistency: Maintains logical coherence in creative outputs
  • Direct Communication: Straightforward responses without excessive hedging

Limitations

  • Knowledge Cutoff: Training data limited to pre-training cutoff dates
  • Uncensored Output: May generate content inappropriate for all audiences without additional filtering
  • Computational Requirements: Requires sufficient hardware for 4B parameter inference
  • Domain Specificity: Performance may vary on highly specialized or niche topics

Ethical Considerations

Content Filtering

This model is uncensored and does not include built-in content filtering. Users deploying this model in production environments should:

  • Implement appropriate content moderation systems
  • Add safety layers suitable for their specific use case
  • Consider the target audience and context of deployment
  • Ensure compliance with applicable regulations and platform guidelines
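Since the model ships uncensored, any filtering happens in the deployer's own stack. As a purely illustrative sketch of an output gate (`BLOCKLIST` is a hypothetical placeholder; production systems should use a dedicated moderation model or service instead):

```python
# Illustrative only: a trivial keyword gate run over model output before it
# reaches end users. BLOCKLIST is a hypothetical placeholder, not a shipped
# artifact of this model.
BLOCKLIST = {"example-banned-term"}

def passes_moderation(text: str) -> bool:
    """Return False if any blocklisted term appears in the (lowercased) text."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)
```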

Personality and Alignment

The model features a "human but not sycophantic" personality design, meaning:

  • Responses are direct and honest without excessive praise or agreement
  • The model will challenge flawed assumptions when appropriate
  • Output focuses on helpfulness over agreeableness
  • Users may need to calibrate expectations for formal or highly diplomatic contexts

Responsible Use

Users should:

  • Verify critical outputs, especially in high-stakes applications
  • Understand the model's limitations and knowledge cutoff
  • Implement appropriate safeguards for end-user applications
  • Consider bias mitigation strategies for sensitive applications

Technical Specifications

Hardware Requirements

Minimum Requirements:

  • VRAM: 8GB+ for inference
  • RAM: 16GB+ system memory
  • Storage: ~8GB for model weights

Recommended:

  • VRAM: 16GB+ for optimal performance
  • RAM: 32GB+ system memory
  • Apple Silicon (M1/M2/M3) or comparable GPU hardware, depending on quantization type
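The VRAM figures above follow from simple back-of-the-envelope arithmetic: 4B parameters at 2 bytes each (BF16) is roughly 7.5 GiB of weights, before activations and KV cache. A quick sketch:

```python
# Back-of-the-envelope weight memory for a 4B-parameter BF16 model.
params = 4_000_000_000
bytes_per_param = 2  # BF16
weight_gib = params * bytes_per_param / 1024**3
print(f"{weight_gib:.2f} GiB")  # ~7.45 GiB of weights alone
```

Quantized variants (e.g. 4-bit GGUF) reduce this proportionally.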

Inference

The model supports standard inference methods and is compatible with:

  • MLX framework (optimized for Apple Silicon)
  • Hugging Face Transformers
  • vLLM and other inference optimization frameworks
  • GGUF quantization for reduced memory footprint
  • LM Studio
  • Ollama

Recommended Generation Parameters:

  • Temperature: 0.6
  • Repetition Penalty: 1.1
  • Top P: 0.95
  • Top K: 20
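For convenience, these recommendations can be collected into a single kwargs dict and reused across Transformers `generate` calls (a small convenience sketch, not part of the model release):

```python
# Recommended sampling settings from this card, packaged for reuse as
# model.generate(**inputs, **GENERATION_KWARGS) with Hugging Face Transformers.
GENERATION_KWARGS = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "repetition_penalty": 1.1,
    "do_sample": True,
}
```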

How to Get Started

Installation

# Using Hugging Face Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Goekdeniz-Guelmez/JOSIE-1.1-4B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto"
)

Basic Usage

# Example inference
messages = [
    {"role": "user", "content": "Explain quantum entanglement in simple terms.."}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,  # return a dict so it can be unpacked into generate()
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    repetition_penalty=1.1,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
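Because this is a thinking model, the decoded text normally contains a reasoning segment terminated by a closing `</think>` tag (the Qwen3-Thinking convention) before the final answer. A small helper, assuming that convention holds, to separate the two:

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) at the closing </think> tag.

    Assumes the Qwen3-Thinking output convention; if no tag is present,
    the whole text is treated as the answer.
    """
    marker = "</think>"
    if marker in text:
        reasoning, answer = text.split(marker, 1)
        return reasoning.strip(), answer.strip()
    return "", text.strip()
```

The same helper works for the MLX example below, since both decode to plain text.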

MLX Usage (Apple Silicon)

# Using MLX for optimized Apple Silicon inference
from mlx_lm.utils import load
from mlx_lm.generate import generate
from mlx_lm.sample_utils import make_logits_processors, make_sampler

model, tokenizer = load("Goekdeniz-Guelmez/JOSIE-1.1-4B-Thinking")

sampler = make_sampler(
    temp=0.6,
    top_p=0.95,
    min_p=0.0,
    top_k=20,
)

messages = [
    {"role": "user", "content": "Explain quantum entanglement in simple terms.."}
]

prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False
)

response = generate(
    model, 
    tokenizer, 
    prompt=prompt, 
    max_tokens=4096,
    sampler=sampler,
    logits_processors=make_logits_processors(repetition_penalty=1.1)
)
print(response)

Comparison with JOSIE-1.1-4B-Instruct

| Feature | JOSIE-4B-Instruct | JOSIE-1.1-4B-Thinking |
|---|---|---|
| Base Model | Qwen3-4B-Instruct | Qwen3-4B-Thinking |
| Context Length | 32K tokens | 65K tokens |
| Response Style | Natural, conversational | Structured reasoning chains |
| Emoji Usage | Yes, appropriate use | Minimal |
| Primary Use | General assistance & chat | Complex reasoning tasks |
| Response Format | Direct answers | Chain-of-thought + answer |
| Personality | Friendly & expressive | Direct & analytical |
| Best For | Everyday interactions | STEM, math, logic problems |

Choose JOSIE-1.1-4B-Instruct for natural conversations and general assistance. Choose JOSIE-1.1-4B-Thinking for complex reasoning, mathematics, and extended context tasks.


Citation

If you use this model in your research or applications, please cite:

@misc{josie4bthinking2025,
  title={JOSIE-1.1-4B-Thinking: A Full-Weight Fine-Tuned Reasoning Model},
  author={Gökdeniz Gülmez},
  year={2025},
  howpublished={\url{https://huggingface.co/Goekdeniz-Guelmez/JOSIE-1.1-4B-Thinking}},
}

Model Card Contact

For questions, issues, or feedback regarding this model:


Acknowledgments

  • Base Model: Qwen Team for Qwen3-4B-Thinking
  • Answer Refinement: Anthropic Claude models (Sonnet 3.7/4.0, Opus 4.5/4.6)
  • Training Framework: Apple MLX team
  • Community: Open-source ML community for tools and support