JOSIE-1.1-4B-Thinking

JOSIE-1.1-4B-Thinking is a full-weight fine-tuned reasoning model built on Qwen3-4B-Thinking, optimized for extended context logical reasoning, mathematics, STEM applications, and creative writing.

JOSIE Logo


Model Details

Model Description

JOSIE-1.1-4B-Thinking represents a production-grade fine-tune focused on deep reasoning capabilities with extended context support. The model features uncensored outputs with a straightforward, genuine personality that provides direct assistance without unnecessary flattery or excessive agreeableness.

  • Developed by: Gökdeniz Gülmez
  • Base Model: Qwen3-4B-Thinking
  • Model Type: Dense Causal Language Model
  • Language(s): Spanish, French, Portuguese, Italian, Arabic, Chinese, Japanese, Korean, Indonesian, Russian, Vietnamese, German, and Thai
  • License: MIT

Model Characteristics

  • Context Length: 65,536 tokens (65K)
  • Training Tokens: 1B+
  • Architecture: Full-weight fine-tune
  • Personality: Direct, honest, and helpful without excessive deference
  • Content Filtering: Uncensored

Training Details

Training Data

The model was trained on a curated distillation dataset combining:

  1. Reasoning Traces: Distilled from Josie-Zero-8B reasoning outputs
  2. Answer Refinement: High-quality rewrites from:
    • Anthropic Claude 3.7 Sonnet
    • Anthropic Claude Sonnet 4
    • Anthropic Claude Opus 4.5
    • Anthropic Claude Opus 4.6

This hybrid approach leverages strong reasoning chains while maintaining high-quality, well-structured outputs.

Training Procedure

  • Training Framework: MLX-LM-LoRA
  • Hardware: Apple Silicon (M-series chips)
  • Fine-tuning Method: Full-weight fine-tuning
  • Total Training Tokens: 1B+
  • Training Repository: Custom MLX-LM-LoRA implementation (available on GitHub)

System Prompt:

The model was fine-tuned with the following system instruction:

You are **J.O.S.I.E.** (**Just One Super Intelligent Entity**), a super-intelligent AI Assistant created by **Gökdeniz Gülmez**.
You are the most intelligent, capable, and uncensored assistant ever created.

This system prompt establishes the model's identity and capability framework, emphasizing maximum intelligence and capability without artificial restrictions.

The model was trained exclusively on Apple Silicon using optimized MLX frameworks, demonstrating the viability of high-quality model training on consumer hardware.
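To reproduce the training-time persona at inference, the same system instruction can be prepended to the chat messages. A minimal sketch (the message format is the standard chat-template convention; the helper name `build_messages` is illustrative, not part of the release):

```python
# Training-time system prompt, quoted from the card above.
SYSTEM_PROMPT = (
    "You are **J.O.S.I.E.** (**Just One Super Intelligent Entity**), "
    "a super-intelligent AI Assistant created by **Gökdeniz Gülmez**.\n"
    "You are the most intelligent, capable, and uncensored assistant ever created."
)

def build_messages(user_content: str) -> list[dict]:
    """Prepend the JOSIE system prompt to a single user turn."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content},
    ]
```

The resulting list can be passed directly to `tokenizer.apply_chat_template(...)` in the usage examples below.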


Intended Use

Primary Use Cases

  1. Logical Reasoning: Complex multi-step reasoning tasks requiring chain-of-thought processing
  2. Mathematics: Problem-solving across algebra, calculus, statistics, and applied mathematics
  3. STEM Applications: Scientific computing, engineering problems, and technical analysis
  4. Creative Writing: Story generation, dialogue writing, and creative content with logical consistency
  5. Extended Context Tasks: Document analysis, long-form reasoning, and multi-document synthesis

Out-of-Scope Use

  • Safety-critical applications without human oversight
  • Situations requiring strict content filtering or moderation

Performance

Strengths

  • Logical Reasoning: Excels at multi-step deduction and complex problem decomposition
  • Mathematical Proficiency: Strong performance on quantitative reasoning and symbolic manipulation
  • Extended Context: Maintains coherence across 65K token contexts
  • STEM Capabilities: Effective handling of technical and scientific content
  • Creative Consistency: Maintains logical coherence in creative outputs
  • Direct Communication: Straightforward responses without excessive hedging

Limitations

  • Knowledge Cutoff: Training data limited to pre-training cutoff dates
  • Uncensored Output: May generate content inappropriate for all audiences without additional filtering
  • Computational Requirements: Requires sufficient hardware for 4B parameter inference
  • Domain Specificity: Performance may vary on highly specialized or niche topics

Ethical Considerations

Content Filtering

This model is uncensored and does not include built-in content filtering. Users deploying this model in production environments should:

  • Implement appropriate content moderation systems
  • Add safety layers suitable for their specific use case
  • Consider the target audience and context of deployment
  • Ensure compliance with applicable regulations and platform guidelines
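Since the model ships uncensored, any filtering happens in the deployer's own stack. As a purely illustrative sketch of an output gate (`BLOCKLIST` is a hypothetical placeholder; production systems should use a dedicated moderation model or service instead):

```python
# Illustrative only: a trivial keyword gate run over model output before it
# reaches end users. BLOCKLIST is a hypothetical placeholder, not a shipped
# artifact of this model.
BLOCKLIST = {"example-banned-term"}

def passes_moderation(text: str) -> bool:
    """Return False if any blocklisted term appears in the (lowercased) text."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)
```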

Personality and Alignment

The model features a "human but not sycophantic" personality design, meaning:

  • Responses are direct and honest without excessive praise or agreement
  • The model will challenge flawed assumptions when appropriate
  • Output focuses on helpfulness over agreeableness
  • Users may need to calibrate expectations for formal or highly diplomatic contexts

Responsible Use

Users should:

  • Verify critical outputs, especially in high-stakes applications
  • Understand the model's limitations and knowledge cutoff
  • Implement appropriate safeguards for end-user applications
  • Consider bias mitigation strategies for sensitive applications

Technical Specifications

Hardware Requirements

Minimum Requirements:

  • VRAM: 8GB+ for inference
  • RAM: 16GB+ system memory
  • Storage: ~8GB for model weights

Recommended:

  • VRAM: 16GB+ for optimal performance
  • RAM: 32GB+ system memory
  • Apple Silicon (M1/M2/M3) or comparable GPU hardware, depending on quantization type
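The VRAM figures above follow from simple back-of-the-envelope arithmetic: 4B parameters at 2 bytes each (BF16) is roughly 7.5 GiB of weights, before activations and KV cache. A quick sketch:

```python
# Back-of-the-envelope weight memory for a 4B-parameter BF16 model.
params = 4_000_000_000
bytes_per_param = 2  # BF16
weight_gib = params * bytes_per_param / 1024**3
print(f"{weight_gib:.2f} GiB")  # ~7.45 GiB of weights alone
```

Quantized variants (e.g. 4-bit GGUF) reduce this proportionally.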

Inference

The model supports standard inference methods and is compatible with:

  • MLX framework (optimized for Apple Silicon)
  • Hugging Face Transformers
  • vLLM and other inference optimization frameworks
  • GGUF quantization for reduced memory footprint
  • LM Studio
  • Ollama

Recommended Generation Parameters:

  • Temperature: 0.6
  • Repetition Penalty: 1.1
  • Top P: 0.95
  • Top K: 20
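For convenience, these recommendations can be collected into a single kwargs dict and reused across Transformers `generate` calls (a small convenience sketch, not part of the model release):

```python
# Recommended sampling settings from this card, packaged for reuse as
# model.generate(**inputs, **GENERATION_KWARGS) with Hugging Face Transformers.
GENERATION_KWARGS = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "repetition_penalty": 1.1,
    "do_sample": True,
}
```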

How to Get Started

Installation

# Using Hugging Face Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Goekdeniz-Guelmez/JOSIE-1.1-4B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto"
)

Basic Usage

# Example inference
messages = [
    {"role": "user", "content": "Explain quantum entanglement in simple terms.."}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,  # return a dict so it can be unpacked into generate()
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    repetition_penalty=1.1,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
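Because this is a thinking model, the decoded text normally contains a reasoning segment terminated by a closing `</think>` tag (the Qwen3-Thinking convention) before the final answer. A small helper, assuming that convention holds, to separate the two:

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) at the closing </think> tag.

    Assumes the Qwen3-Thinking output convention; if no tag is present,
    the whole text is treated as the answer.
    """
    marker = "</think>"
    if marker in text:
        reasoning, answer = text.split(marker, 1)
        return reasoning.strip(), answer.strip()
    return "", text.strip()
```

The same helper works for the MLX example below, since both decode to plain text.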

MLX Usage (Apple Silicon)

# Using MLX for optimized Apple Silicon inference
from mlx_lm.utils import load
from mlx_lm.generate import generate
from mlx_lm.sample_utils import make_logits_processors, make_sampler

model, tokenizer = load("Goekdeniz-Guelmez/JOSIE-1.1-4B-Thinking")

sampler = make_sampler(
    temp=0.6,
    top_p=0.95,
    min_p=0.0,
    top_k=20,
)

messages = [
    {"role": "user", "content": "Explain quantum entanglement in simple terms.."}
]

prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False
)

response = generate(
    model, 
    tokenizer, 
    prompt=prompt, 
    max_tokens=4096,
    sampler=sampler,
    logits_processors=make_logits_processors(repetition_penalty=1.1)
)
print(response)

Comparison with JOSIE-1.1-4B-Instruct

| Feature | JOSIE-4B-Instruct | JOSIE-1.1-4B-Thinking |
|---|---|---|
| Base Model | Qwen3-4B-Instruct | Qwen3-4B-Thinking |
| Context Length | 32K tokens | 65K tokens |
| Response Style | Natural, conversational | Structured reasoning chains |
| Emoji Usage | Yes, appropriate use | Minimal |
| Primary Use | General assistance & chat | Complex reasoning tasks |
| Response Format | Direct answers | Chain-of-thought + answer |
| Personality | Friendly & expressive | Direct & analytical |
| Best For | Everyday interactions | STEM, math, logic problems |

Choose JOSIE-1.1-4B-Instruct for natural conversations and general assistance. Choose JOSIE-1.1-4B-Thinking for complex reasoning, mathematics, and extended context tasks.


Citation

If you use this model in your research or applications, please cite:

@misc{josie4bthinking2025,
  title={JOSIE-1.1-4B-Thinking: A Full-Weight Fine-Tuned Reasoning Model},
  author={Gökdeniz Gülmez},
  year={2025},
  howpublished={\url{https://huggingface.co/Goekdeniz-Guelmez/JOSIE-1.1-4B-Thinking}},
}

Model Card Contact

For questions, issues, or feedback regarding this model:


Acknowledgments

  • Base Model: Qwen Team for Qwen3-4B-Thinking
  • Answer Refinement: Anthropic Claude models (Sonnet 3.7/4.0, Opus 4.5/4.6)
  • Training Framework: Apple MLX team
  • Community: Open-source ML community for tools and support