# ULTRATHINK
Production-ready training framework for advanced Large Language Models
Quick Start • Features • Documentation • Benchmarks • Comparisons • Roadmap • Contributing
---

ULTRATHINK provides a complete, modular stack for training custom LLMs with state-of-the-art architectures, distributed training, and comprehensive monitoring.

## Why ULTRATHINK?

**Train state-of-the-art LLMs in 10 lines of code** - from prototype to production in minutes, not days.

```bash
python train_ultrathink.py \
  --dataset c4 --streaming \
  --hidden_size 768 --num_layers 12 \
  --enable_moe --enable_dre \
  --use_amp --gradient_checkpointing
```

### What Makes Us Different

| Feature | ULTRATHINK | Others |
|---------|------------|--------|
| **Setup Time** | ~5 minutes | 30-120 minutes |
| **Lines to Train** | ~10 | 50-100+ |
| **MoE Support** | ✅ Native | ❌ or limited |
| **Dynamic Reasoning** | ✅ Unique | ❌ None |
| **Constitutional AI** | ✅ Built-in | ❌ None |
| **Documentation** | Comprehensive | Varies |

**[See detailed comparison →](docs/COMPARISON.md)**

## Key Features

- **Modern Architecture** - GQA, RoPE, SwiGLU, Flash Attention, RMSNorm
- **Advanced Components** - Mixture-of-Experts, Dynamic Reasoning Engine, Constitutional AI
- **Production Monitoring** - MLflow, W&B, TensorBoard integration
- **Optimized Training** - DeepSpeed ZeRO, FSDP, gradient checkpointing, AMP
- **Fully Tested** - Unit and integration tests with pytest
- **Docker Support** - Ready-to-use containers for training and inference
- **Complete Docs** - Step-by-step guides for all experience levels

**[View benchmarks and performance metrics →](docs/BENCHMARKS.md)**

## Quick Start

### Installation

```bash
# Clone repository
git clone https://github.com/vediyappanm/UltraThinking-LLM-Training.git
cd UltraThinking-LLM-Training/deep

# Install dependencies
pip install -r requirements.txt
```

### Training Examples

**Tiny Model (CPU-friendly, for testing):**

```bash
python train_ultrathink.py \
  --dataset wikitext \
  --hidden_size 256 --num_layers 2 --num_heads 4 \
  --batch_size 2 --max_samples 1000 \
  --num_epochs 1
```

**Small Model (GPU recommended):**

```bash
python train_advanced.py --config configs/train_small.yaml
```

**With Advanced Features:**

```bash
python train_ultrathink.py \
  --dataset c4 --streaming \
  --hidden_size 768 --num_layers 12 --num_heads 12 \
  --enable_moe --enable_dre --enable_constitutional \
  --use_amp --gradient_checkpointing \
  --use_mlflow
```

### Docker

```bash
# Run Gradio web interface
docker compose up

# Or build and run manually
docker build -t ultrathink:latest .
docker run -p 7860:7860 ultrathink:latest
```
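The `--enable_moe` flag in the advanced example above swaps dense feed-forward layers for the Mixture-of-Experts layers highlighted under Key Features. As a rough orientation only (a generic sketch with a placeholder `TinyMoE` class, not the implementation in `src/models/`), top-k expert routing looks roughly like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMoE(nn.Module):
    """Toy Mixture-of-Experts layer: route each token to its top-k experts and mix their outputs."""

    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim) -- a flattened batch of token embeddings
        scores = self.router(x)                             # (num_tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)   # top-k experts per token
        weights = F.softmax(weights, dim=-1)                # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for expert_id, expert in enumerate(self.experts):
                mask = chosen[:, slot] == expert_id          # tokens whose slot-th choice is this expert
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


tokens = torch.randn(8, 768)        # 8 tokens, hidden size 768
print(TinyMoE(768)(tokens).shape)   # torch.Size([8, 768])
```

Only the experts a token is routed to run for that token, which is how MoE models grow parameter count without a proportional increase in per-token compute.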
### Testing

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Quick smoke test
python tests/smoke_test.py
```

## Documentation

### Getting Started

- **[Training Quickstart](docs/TRAINING_QUICKSTART.md)** - Get started in 5 minutes
- **[Advanced Training Guide](ADVANCED_TRAINING_GUIDE.md)** - Deep dive into all features
- **[Troubleshooting](docs/TROUBLESHOOTING.md)** - Common issues and solutions
- **[Google Colab](docs/colab.md)** - Train in the cloud for free

### Performance & Comparisons

- **[Benchmarks](docs/BENCHMARKS.md)** - Performance metrics and results
- **[Framework Comparison](docs/COMPARISON.md)** - vs GPT-NeoX, Megatron-LM, Axolotl
- **[Model Card](docs/MODEL_CARD.md)** - Model specifications

### Architecture & Development

- **[Architecture Overview](ARCHITECTURE_OVERVIEW.md)** - Visual system diagrams
- **[Project Structure](docs/PROJECT_STRUCTURE.md)** - Understanding the codebase
- **[Roadmap](docs/ROADMAP.md)** - Future plans and features

### Training Guides

- [Small Models](docs/training_small.md) - Train on limited hardware
- [DeepSpeed Integration](docs/training_deepspeed.md) - Distributed training setup
- [Dataset Configuration](docs/datasets.md) - Using custom datasets

### Community

- **[Contributing](CONTRIBUTING.md)** - Contribution guidelines
- **[Code of Conduct](CODE_OF_CONDUCT.md)** - Community standards
- **[Changelog](CHANGELOG.md)** - Version history

**[Full Documentation Index](docs/README.md)**

## Project Structure

```
deep/
├── train_ultrathink.py   # Main training script
├── train_advanced.py     # YAML config-based training
├── app_gradio.py         # Web UI for inference
├── src/
│   ├── models/           # UltraThink, MoE, DRE, architecture
│   ├── data/             # Datasets, tokenization, validation
│   ├── training/         # Optimizers, distributed, RLHF
│   ├── monitoring/       # Metrics and system monitoring
│   ├── security/         # Input validation and safety
│   └── evaluation/       # Benchmarks and metrics
├── tests/                # Unit and integration tests
├── configs/              # YAML configuration files
├── scripts/              # Utilities (profiling, inference)
└── docs/                 # Documentation and guides
```

See **[PROJECT_STRUCTURE.md](PROJECT_STRUCTURE.md)** for detailed explanations.
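The architecture components listed under Key Features (GQA, RoPE, SwiGLU, RMSNorm) live under `src/models/` in the tree above. For orientation, the sketch below shows what two of the simpler pieces, RMSNorm and a SwiGLU feed-forward block, typically look like in PyTorch; it is a generic illustration with placeholder class names, not the repository's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root-mean-square norm: scale by 1/RMS of the features; no mean-centering, no bias."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)


class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: SiLU-gated up-projection followed by a down-projection."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


x = torch.randn(2, 16, 768)               # (batch, seq_len, hidden)
y = SwiGLU(768, 2048)(RMSNorm(768)(x))    # pre-norm then feed-forward, as in most modern blocks
print(y.shape)                            # torch.Size([2, 16, 768])
```

RMSNorm drops the mean-centering and bias of LayerNorm, and SwiGLU replaces the usual ReLU/GELU MLP with a SiLU-gated projection; both are standard choices in recent LLM stacks.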
## Training Examples

### Small Dataset Training

```bash
# WikiText-2 (fast iteration)
python train_ultrathink.py \
  --dataset wikitext \
  --hidden_size 512 --num_layers 6 --num_heads 8 \
  --batch_size 4 --num_epochs 3 \
  --use_mlflow
```

### Production Training (C4 Dataset)

```bash
# Streaming C4 with all optimizations
python train_ultrathink.py \
  --dataset c4 --dataset_subset en --streaming \
  --hidden_size 768 --num_layers 12 --num_heads 12 \
  --batch_size 2 --gradient_accumulation_steps 64 \
  --learning_rate 3e-4 --warmup_steps 5000 \
  --use_amp --gradient_checkpointing \
  --max_seq_length 1024 \
  --output_dir ./outputs/c4_production
```

### Using Configuration Files

```bash
# Small model (4-8GB GPU)
python train_advanced.py --config configs/train_small.yaml

# Medium model (16-32GB GPU)
python train_advanced.py --config configs/train_medium.yaml

# Large model (40GB+ GPU)
python train_advanced.py --config configs/train_large.yaml
```

## Docker Usage

**Web Interface (Gradio):**

```bash
docker compose up
# Visit http://localhost:7860
```

**Custom Training:**

```bash
docker run -v $(pwd)/outputs:/app/outputs ultrathink:latest \
  python train_ultrathink.py \
  --dataset wikitext \
  --hidden_size 256 --num_layers 2 \
  --output_dir /app/outputs/my_model
```

**GPU Training:**

```bash
docker run --gpus all \
  -v $(pwd)/outputs:/app/outputs \
  ultrathink:latest \
  python train_ultrathink.py --use_amp
```

## Contributing

We welcome contributions! Please see:

- **[CONTRIBUTING.md](CONTRIBUTING.md)** - Guidelines and setup
- **[CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md)** - Community standards
- **[Roadmap](docs/ROADMAP.md)** - See what we're building next

### Star History

If you find ULTRATHINK useful, please consider giving us a star!

[Star History Chart](https://star-history.com/#vediyappanm/UltraThinking-LLM-Training&Date)

## Model Specifications

| Size | Parameters | Layers | Hidden | Context | Min GPU |
|------|-----------|--------|--------|---------|---------|
| Tiny | 125M | 12 | 768 | 2048 | 6GB |
| Small | 350M | 24 | 1024 | 4096 | 16GB |
| Medium | 760M | 24 | 1536 | 4096 | 24GB |
| Large | 1.3B | 32 | 2048 | 8192 | 40GB |

See **[MODEL_CARD.md](MODEL_CARD.md)** for complete specifications.

## License

MIT License - see [LICENSE](LICENSE) for details.

## Citation

If you use ULTRATHINK in your research or project, please cite:

```bibtex
@software{ultrathink2025,
  title={ULTRATHINK: Advanced LLM Training Framework with Mixture-of-Experts and Dynamic Reasoning},
  author={ULTRATHINK Team},
  year={2025},
  url={https://github.com/vediyappanm/UltraThinking-LLM-Training},
  version={1.0.0}
}
```

## Community & Support

### Get Help

- **[GitHub Discussions](https://github.com/vediyappanm/UltraThinking-LLM-Training/discussions)** - Ask questions, share ideas
- **[Issue Tracker](https://github.com/vediyappanm/UltraThinking-LLM-Training/issues)** - Report bugs, request features
- **[Troubleshooting Guide](docs/TROUBLESHOOTING.md)** - Common issues and solutions
- **[FAQ](docs/faq.md)** - Frequently asked questions

### Share Your Work

Built something cool with ULTRATHINK? We'd love to hear about it!

- Open a discussion to share your project
- Submit a PR to add your model to our showcase
- Tweet about it and tag us

### Stay Updated

- **Star this repo** to get notifications
- **Watch releases** for new features
- **Follow on Twitter** for updates

---

Made with ❤️ by the ULTRATHINK Team