# ULTRATHINK
Production-ready training framework for advanced Large Language Models
Quick Start • Features • Documentation • Benchmarks • Comparisons • Roadmap • Contributing
---

ULTRATHINK provides a complete, modular stack for training custom LLMs with state-of-the-art architectures, distributed training, and comprehensive monitoring.

## Why ULTRATHINK?

**Train state-of-the-art LLMs in 10 lines of code** - from prototype to production in minutes, not days.

```bash
python train_ultrathink.py \
  --dataset c4 --streaming \
  --hidden_size 768 --num_layers 12 \
  --enable_moe --enable_dre \
  --use_amp --gradient_checkpointing
```

### What Makes Us Different

| Feature | ULTRATHINK | Others |
|---------|------------|--------|
| **Setup Time** | ~5 minutes | 30-120 minutes |
| **Lines to Train** | ~10 | 50-100+ |
| **MoE Support** | ✅ Native | ❌ or limited |
| **Dynamic Reasoning** | ✅ Unique | ❌ None |
| **Constitutional AI** | ✅ Built-in | ❌ None |
| **Documentation** | Comprehensive | Varies |

**[See detailed comparison →](docs/COMPARISON.md)**

## Key Features

- **Modern Architecture** - GQA, RoPE, SwiGLU, Flash Attention, RMSNorm
- **Advanced Components** - Mixture-of-Experts, Dynamic Reasoning Engine, Constitutional AI
- **Production Monitoring** - MLflow, W&B, TensorBoard integration
- **Optimized Training** - DeepSpeed ZeRO, FSDP, gradient checkpointing, AMP
- **Fully Tested** - Unit and integration tests with pytest
- **Docker Support** - Ready-to-use containers for training and inference
- **Complete Docs** - Step-by-step guides for all experience levels

**[View benchmarks and performance metrics →](docs/BENCHMARKS.md)**

## Quick Start

### Installation

```bash
# Clone repository
git clone https://github.com/vediyappanm/UltraThinking-LLM-Training.git
cd UltraThinking-LLM-Training/deep

# Install dependencies
pip install -r requirements.txt
```

### Training Examples

**Tiny Model (CPU-friendly, for testing):**

```bash
python train_ultrathink.py \
  --dataset wikitext \
  --hidden_size 256 --num_layers 2 --num_heads 4 \
  --batch_size 2 --max_samples 1000 \
  --num_epochs 1
```

**Small Model (GPU recommended):**

```bash
python train_advanced.py --config configs/train_small.yaml
```

**With Advanced Features:**

```bash
python train_ultrathink.py \
  --dataset c4 --streaming \
  --hidden_size 768 --num_layers 12 --num_heads 12 \
  --enable_moe --enable_dre --enable_constitutional \
  --use_amp --gradient_checkpointing \
  --use_mlflow
```

### Docker

```bash
# Run Gradio web interface
docker compose up

# Or build and run manually
docker build -t ultrathink:latest .
docker run -p 7860:7860 ultrathink:latest
```
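The `--enable_moe` flag in the advanced example above swaps dense feed-forward layers for the Mixture-of-Experts layers highlighted under Key Features. As a rough orientation only (a generic sketch with a placeholder `TinyMoE` class, not the implementation in `src/models/`), top-k expert routing looks roughly like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMoE(nn.Module):
    """Toy Mixture-of-Experts layer: route each token to its top-k experts and mix their outputs."""

    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim) -- a flattened batch of token embeddings
        scores = self.router(x)                             # (num_tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)   # top-k experts per token
        weights = F.softmax(weights, dim=-1)                # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for expert_id, expert in enumerate(self.experts):
                mask = chosen[:, slot] == expert_id          # tokens whose slot-th choice is this expert
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


tokens = torch.randn(8, 768)        # 8 tokens, hidden size 768
print(TinyMoE(768)(tokens).shape)   # torch.Size([8, 768])
```

Only the experts a token is routed to run for that token, which is how MoE models grow parameter count without a proportional increase in per-token compute.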
### Testing

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Quick smoke test
python tests/smoke_test.py
```

## Documentation

### Getting Started

- **[Training Quickstart](docs/TRAINING_QUICKSTART.md)** - Get started in 5 minutes
- **[Advanced Training Guide](ADVANCED_TRAINING_GUIDE.md)** - Deep dive into all features
- **[Troubleshooting](docs/TROUBLESHOOTING.md)** - Common issues and solutions
- **[Google Colab](docs/colab.md)** - Train in the cloud for free

### Performance & Comparisons

- **[Benchmarks](docs/BENCHMARKS.md)** - Performance metrics and results
- **[Framework Comparison](docs/COMPARISON.md)** - vs GPT-NeoX, Megatron-LM, Axolotl
- **[Model Card](docs/MODEL_CARD.md)** - Model specifications

### Architecture & Development

- **[Architecture Overview](ARCHITECTURE_OVERVIEW.md)** - Visual system diagrams
- **[Project Structure](docs/PROJECT_STRUCTURE.md)** - Understanding the codebase
- **[Roadmap](docs/ROADMAP.md)** - Future plans and features

### Training Guides

- [Small Models](docs/training_small.md) - Train on limited hardware
- [DeepSpeed Integration](docs/training_deepspeed.md) - Distributed training setup
- [Dataset Configuration](docs/datasets.md) - Using custom datasets

### Community

- **[Contributing](CONTRIBUTING.md)** - Contribution guidelines
- **[Code of Conduct](CODE_OF_CONDUCT.md)** - Community standards
- **[Changelog](CHANGELOG.md)** - Version history

**[Full Documentation Index](docs/README.md)**

## Project Structure

```
deep/
├── train_ultrathink.py   # Main training script
├── train_advanced.py     # YAML config-based training
├── app_gradio.py         # Web UI for inference
├── src/
│   ├── models/           # UltraThink, MoE, DRE, architecture
│   ├── data/             # Datasets, tokenization, validation
│   ├── training/         # Optimizers, distributed, RLHF
│   ├── monitoring/       # Metrics and system monitoring
│   ├── security/         # Input validation and safety
│   └── evaluation/       # Benchmarks and metrics
├── tests/                # Unit and integration tests
├── configs/              # YAML configuration files
├── scripts/              # Utilities (profiling, inference)
└── docs/                 # Documentation and guides
```

See **[PROJECT_STRUCTURE.md](PROJECT_STRUCTURE.md)** for detailed explanations.
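The architecture components listed under Key Features (GQA, RoPE, SwiGLU, RMSNorm) live under `src/models/` in the tree above. For orientation, the sketch below shows what two of the simpler pieces, RMSNorm and a SwiGLU feed-forward block, typically look like in PyTorch; it is a generic illustration with placeholder class names, not the repository's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root-mean-square norm: scale by 1/RMS of the features; no mean-centering, no bias."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)


class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: SiLU-gated up-projection followed by a down-projection."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


x = torch.randn(2, 16, 768)               # (batch, seq_len, hidden)
y = SwiGLU(768, 2048)(RMSNorm(768)(x))    # pre-norm then feed-forward, as in most modern blocks
print(y.shape)                            # torch.Size([2, 16, 768])
```

RMSNorm drops the mean-centering and bias of LayerNorm, and SwiGLU replaces the usual ReLU/GELU MLP with a SiLU-gated projection; both are standard choices in recent LLM stacks.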
## Training Examples

### Small Dataset Training

```bash
# WikiText-2 (fast iteration)
python train_ultrathink.py \
  --dataset wikitext \
  --hidden_size 512 --num_layers 6 --num_heads 8 \
  --batch_size 4 --num_epochs 3 \
  --use_mlflow
```

### Production Training (C4 Dataset)

```bash
# Streaming C4 with all optimizations
python train_ultrathink.py \
  --dataset c4 --dataset_subset en --streaming \
  --hidden_size 768 --num_layers 12 --num_heads 12 \
  --batch_size 2 --gradient_accumulation_steps 64 \
  --learning_rate 3e-4 --warmup_steps 5000 \
  --use_amp --gradient_checkpointing \
  --max_seq_length 1024 \
  --output_dir ./outputs/c4_production
```

### Using Configuration Files

```bash
# Small model (4-8GB GPU)
python train_advanced.py --config configs/train_small.yaml

# Medium model (16-32GB GPU)
python train_advanced.py --config configs/train_medium.yaml

# Large model (40GB+ GPU)
python train_advanced.py --config configs/train_large.yaml
```

## Docker Usage

**Web Interface (Gradio):**

```bash
docker compose up
# Visit http://localhost:7860
```

**Custom Training:**

```bash
docker run -v $(pwd)/outputs:/app/outputs ultrathink:latest \
  python train_ultrathink.py \
  --dataset wikitext \
  --hidden_size 256 --num_layers 2 \
  --output_dir /app/outputs/my_model
```

**GPU Training:**

```bash
docker run --gpus all \
  -v $(pwd)/outputs:/app/outputs \
  ultrathink:latest \
  python train_ultrathink.py --use_amp
```

## Contributing

We welcome contributions! Please see:

- **[CONTRIBUTING.md](CONTRIBUTING.md)** - Guidelines and setup
- **[CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md)** - Community standards
- **[Roadmap](docs/ROADMAP.md)** - See what we're building next

### Star History

If you find ULTRATHINK useful, please consider giving us a star!

[Star History Chart](https://star-history.com/#vediyappanm/UltraThinking-LLM-Training&Date)

## Model Specifications

| Size | Parameters | Layers | Hidden | Context | Min GPU |
|------|-----------|--------|--------|---------|---------|
| Tiny | 125M | 12 | 768 | 2048 | 6GB |
| Small | 350M | 24 | 1024 | 4096 | 16GB |
| Medium | 760M | 24 | 1536 | 4096 | 24GB |
| Large | 1.3B | 32 | 2048 | 8192 | 40GB |

See **[MODEL_CARD.md](MODEL_CARD.md)** for complete specifications.

## License

MIT License - see [LICENSE](LICENSE) for details.

## Citation

If you use ULTRATHINK in your research or project, please cite:

```bibtex
@software{ultrathink2025,
  title={ULTRATHINK: Advanced LLM Training Framework with Mixture-of-Experts and Dynamic Reasoning},
  author={ULTRATHINK Team},
  year={2025},
  url={https://github.com/vediyappanm/UltraThinking-LLM-Training},
  version={1.0.0}
}
```

## Community & Support

### Get Help

- **[GitHub Discussions](https://github.com/vediyappanm/UltraThinking-LLM-Training/discussions)** - Ask questions, share ideas
- **[Issue Tracker](https://github.com/vediyappanm/UltraThinking-LLM-Training/issues)** - Report bugs, request features
- **[Troubleshooting Guide](docs/TROUBLESHOOTING.md)** - Common issues and solutions
- **[FAQ](docs/faq.md)** - Frequently asked questions

### Share Your Work

Built something cool with ULTRATHINK? We'd love to hear about it!

- Open a discussion to share your project
- Submit a PR to add your model to our showcase
- Tweet about it and tag us

### Stay Updated

- **Star this repo** to get notifications
- **Watch releases** for new features
- **Follow on Twitter** for updates

---

Made with ❤️ by the ULTRATHINK Team