MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU • Paper • 2604.05091 • Published 3 days ago • 34
EXAONE 4.5 • Collection • LG's First Open-Weight Vision-Language Model for Industrial Intelligence • 3 items • Updated about 10 hours ago • 24
DFlash • Collection • Block Diffusion for Flash Speculative Decoding • 13 items • Updated 4 days ago • 46
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? • Paper • 2603.24472 • Published 15 days ago • 51
MolmoWeb • Collection • This is the collection of MolmoWeb artifacts, including model checkpoints and data. • 5 items • Updated 16 days ago • 22