A 2-bit model without IQ4_KT quants, please!
IQ4_KT quants are disabled on the Metal backend in ik_llama.cpp. Such a model would be perfect for us poor 16-24GB Mac owners.
Got IQ2_KL working with reconfigured memory allocation on an M1 16GB. It also needs at least -ctk q6_0 -ctv q6_0. Using the recipes provided, I will create my own quant, replacing the IQ5_KT quants with the IQ3_KT quants from IQ2_KL. Thanks a lot for your work!
Oh, interesting. I don't pay much attention to Metal on Mac as I don't have the hardware.
Does it support the KS quants? IQ4_KSS / IQ4_KS / IQ5_KS would be the best option if those are supported. I'm not sure where to find a simple list of what is or is not supported on each backend.
Keep us posted if you get it working, tag your Hugging Face repo with ik_llama.cpp, and feel free to link it here so folks can find it! Cheers!