MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU • Paper • 2604.05091 • Published 4 days ago • 37
EXAONE 4.5 • Collection • LG's First Open-Weight Vision-Language Model for Industrial Intelligence • 3 items • Updated about 18 hours ago • 25
DFlash • Collection • Block Diffusion for Flash Speculative Decoding • 13 items • Updated 4 days ago • 47
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? • Paper • 2603.24472 • Published 15 days ago • 51
MolmoWeb • Collection • This is the collection of MolmoWeb artifacts, including model checkpoints and data. • 5 items • Updated 16 days ago • 22
OpenClaw-RL: Train Any Agent Simply by Talking • Paper • 2603.10165 • Published about 1 month ago • 150
Devstral 2 • Collection • A couple of agentic LLMs for software engineering tasks, excelling at using tools to explore codebases, edit multiple files, and power SWE Agents. • 2 items • Updated Mar 2 • 52
MiniCPM4 • Collection • MiniCPM4: Ultra-Efficient LLMs on End Devices • 30 items • Updated 4 days ago • 84