Qwen/Qwen3-Next-80B-A3B-Thinking
As discussed in https://huggingface.co/mradermacher/model_requests/discussions/1449, we are not providing any Qwen3NextForCausalLM-based models until https://github.com/ggml-org/llama.cpp/pull/16095 is merged. Especially with the current devastating stability issues, Qwen3NextForCausalLM GGUFs are useless for any reasonable use case: after 139 tokens even the smaller models completely break down, and the larger ones break down even quicker. There are serious precision issues inside its llama.cpp implementation that need to be addressed first.
As far as I can see, no solution has been found yet for the devastating precision issue. I don't get why everyone wants quants that totally break after generating a sentence. There is also the issue of them still introducing compatibility-breaking changes: just yesterday they landed one that made all previous GGUFs of this model useless. Generally, if users have to compile llama.cpp from source anyway, I assume they are also able to use convert_hf_to_gguf.py and llama-quantize to create their own quants.
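For anyone who wants to try that anyway, a minimal sketch of the workflow might look like the following. The PR ref, paths, and quant type are only examples, and the result will still suffer from the precision issues described above:

```bash
# Sketch only: build llama.cpp from the (still unmerged) Qwen3Next PR branch.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git fetch origin pull/16095/head:qwen3next && git checkout qwen3next
cmake -B build && cmake --build build --config Release -j

# Convert the HF checkpoint to a full-precision GGUF, then quantize it statically.
# /path/to/Qwen3-Next-80B-A3B-Thinking is a placeholder for your local download.
python convert_hf_to_gguf.py /path/to/Qwen3-Next-80B-A3B-Thinking \
  --outfile qwen3-next-80b-f16.gguf --outtype f16
./build/bin/llama-quantize qwen3-next-80b-f16.gguf qwen3-next-80b-Q4_K_M.gguf Q4_K_M
```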
Apparently the model has 5D tensors, but llama-imatrix only supports 2D tensors (plus 3D tensors for MLA), so no idea how things will go there for 5D tensors. So maybe even after official support for this model is merged, we might only see static quants, or imatrix quants with the 5D tensors statically quantized. This might turn out to be a non-issue depending on how the architecture gets implemented in llama.cpp.
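If imatrix quants do become possible, the usual workflow would be the one below; this is again only a sketch, calibration.txt stands in for whatever calibration data is used, and whether llama-imatrix can produce useful data for the 5D tensors is exactly the open question above:

```bash
# Sketch only: standard imatrix workflow, reusing the F16 GGUF from the previous step.
./build/bin/llama-imatrix -m qwen3-next-80b-f16.gguf -f calibration.txt -o imatrix.dat

# Quantize using the importance matrix; tensors the imatrix cannot cover would
# presumably have to be quantized statically.
./build/bin/llama-quantize --imatrix imatrix.dat \
  qwen3-next-80b-f16.gguf qwen3-next-80b-Q4_K_M.gguf Q4_K_M
```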