How many graphics cards are needed to train OSS-120B using GRPO?
#158, opened by wanghongyu1111
My prompt length is 4096. How many H100s are needed for training?
I'm testing a distributed cluster that pools consumer cards (4090s) to run full-weight training like this and get around the per-card VRAM limit. Let me know if you want to run a test job.