How many graphics cards are needed to train OSS-120B using GRPO?

#158
by wanghongyu1111 - opened

My prompt length is 4096. How many H100s are needed for training? 🙏

I'm testing a distributed cluster that runs this with full weights on consumer cards (pooling 4090s) to get around the VRAM limit. Let me know if you want to run a test job.
