PyTorch Distributed: Experiences on Accelerating Data Parallel Training Paper • 2006.15704 • Published Jun 28, 2020 • 4
PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel Paper • 2304.11277 • Published Apr 21, 2023 • 5
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models Paper • 2403.13372 • Published Mar 20, 2024 • 175
SageAttention2++: A More Efficient Implementation of SageAttention2 Paper • 2505.21136 • Published May 27, 2025 • 45
meta-llama/Llama-3.1-8B-Instruct Text Generation • 8B • Updated Sep 25, 2024 • 10.6M • 5.18k
Model Memory Utility 🚀 Space • Running on CPU Upgrade • Featured • 992 • Calculate vRAM needed for model training and inference