Model requests
Both of them failed to queue due to invalid config.json:
nico1 ~# llmc add -2000 si https://huggingface.co/Vortex5/MS3.2-24B-Omega-Diamond
submit tokens: ["-2000","static","imatrix","https://huggingface.co/Vortex5/MS3.2-24B-Omega-Diamond"]
https://huggingface.co/Vortex5/MS3.2-24B-Omega-Diamond
https://huggingface.co/Vortex5/MS3.2%2d24B%2dOmega%2dDiamond/resolve/main/config.json 429
nico1 ~# llmc add -2000 si https://huggingface.co/Vortex5/Qwen2.5-14B-Styx
submit tokens: ["-2000","static","imatrix","https://huggingface.co/Vortex5/Qwen2.5-14B-Styx"]
https://huggingface.co/Vortex5/Qwen2.5-14B-Styx
https://huggingface.co/Vortex5/Qwen2.5%2d14B%2dStyx/resolve/main/config.json 429
https://huggingface.co/Vortex5/Qwen2.5%2d14B%2dStyx/resolve/main/config.json 429
https://huggingface.co/Vortex5/Qwen2.5%2d14B%2dStyx/resolve/main/config.json 429
Vortex5/Qwen2.5-14B-Styx: no architectures entry (UnicodeDecodeError('utf-8', b'\x1f\x8b\x08\x00 (...), 1, 2, 'invalid start byte') at /llmjob/share/bin/llmjob line 1656.)
From mradermacher:
if you get something like binary garbage instead of a config.json when submitting models, then that is because some hf frontend servers currently deflate-compress files even when not allowed to (and the huggingface python module does not handle that, unsurprisingly).
So this is apparently an issue on Hugging Face's side, which makes sense, as the status code when downloading the config.json is 429.
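(The b'\x1f\x8b\x08' bytes in the error above are the gzip magic header plus the deflate method byte, which fits that explanation.) As a minimal sketch of a workaround, assuming plain requests rather than the huggingface python module, and with a hypothetical fetch_config helper, a client could detect and unwrap such a response like this:

    import gzip
    import json
    import requests

    def fetch_config(url: str) -> dict:
        # Hypothetical helper for illustration; not the actual llmjob code.
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        body = resp.content
        # gzip streams start with the magic bytes 0x1f 0x8b; the error above
        # shows b'\x1f\x8b\x08...', i.e. gzip with the deflate method byte.
        if body[:2] == b"\x1f\x8b":
            body = gzip.decompress(body)
        return json.loads(body)

    config = fetch_config(
        "https://huggingface.co/Vortex5/Qwen2.5-14B-Styx/resolve/main/config.json"
    )
    print(config.get("architectures"))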
They are now both queued! :D
I bumped the priority of MS3.2-24B-Omega-Diamond to -2000 as I think mradermacher initially queued it using priority 0.
You can check for progress at http://hf.tst.eu/status.html, or regularly check the model summary pages for quants to appear.
as a sidenote, llmc add should now wait out the rate limit (can take 5 minutes though).
Perfect. Let's hope we won't reach it again but if we do this is a much cleaner solution.
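For what it's worth, a sketch of what waiting out a 429 could look like, assuming the server's Retry-After header is in seconds and using a capped exponential backoff (illustrative only, not the actual llmc implementation):

    import time
    import requests

    def get_with_rate_limit(url: str, max_wait: float = 300.0) -> requests.Response:
        # Illustrative only; llmc's real retry logic may differ.
        waited, delay = 0.0, 5.0
        while True:
            resp = requests.get(url, timeout=30)
            if resp.status_code != 429:
                resp.raise_for_status()
                return resp
            # Prefer the server's Retry-After hint (assumed to be seconds),
            # otherwise keep the current backoff delay.
            delay = float(resp.headers.get("Retry-After", delay))
            if waited + delay > max_wait:
                resp.raise_for_status()  # give up after ~5 minutes; raises for the 429
            time.sleep(delay)
            waited += delay
            delay = min(delay * 2, 60.0)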
ah, and the deflate problem seems to have been fixed on the hf server side
Great to hear. That was so stupid of them.
It keeps biting us; all the error 139 ones are unexpected rate-limit blocks. In essence, an api call now has roughly a 1 in 40 chance of failing, or even higher. That is pretty high, especially for non-idempotent requests.
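At that failure rate, blind retries are only safe for idempotent calls, since retrying a mutating request (like a model submit) could duplicate its effect. A sketch of an idempotency-aware wrapper (hypothetical helper, not part of llmc):

    import time
    import requests

    # Methods that are safe to retry blindly: repeating them cannot
    # duplicate a side effect the way a retried POST could.
    IDEMPOTENT = {"GET", "HEAD", "PUT", "DELETE"}

    def request_with_retry(method: str, url: str, retries: int = 3, **kw):
        # Hypothetical wrapper: retry transient failures, but only for
        # idempotent methods; non-idempotent calls get exactly one attempt.
        attempts = retries if method.upper() in IDEMPOTENT else 1
        for i in range(attempts):
            try:
                resp = requests.request(method, url, timeout=30, **kw)
                if resp.status_code == 429 and i + 1 < attempts:
                    time.sleep(float(resp.headers.get("Retry-After", 5)))
                    continue
                resp.raise_for_status()
                return resp
            except requests.ConnectionError:
                if i + 1 == attempts:
                    raise
                time.sleep(2 ** i)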