Qwen/Qwen3-Next-80B-A3B-Thinking
As discussed in https://huggingface.co/mradermacher/model_requests/discussions/1449, we are not providing any Qwen3NextForCausalLM-based models until https://github.com/ggml-org/llama.cpp/pull/16095 is merged. Especially with the current devastating stability issues, Qwen3NextForCausalLM GGUFs are useless for any reasonable use case: after 139 tokens even the smaller models completely break down, and the larger ones break down even quicker. There are serious precision issues inside its llama.cpp implementation that need to be addressed first.
As far as I can see, no solution has been found yet for the devastating precision issue. I don't get why everyone wants quants that totally break after generating a sentence. There is also the issue of them still introducing compatibility-breaking changes: just yesterday they landed one that made all previous GGUFs of this model useless. Generally, if users have to compile llama.cpp from source anyway, I assume they are also able to use convert_hf_to_gguf.py and llama-quantize to create their own quants.
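For anyone who wants to try that anyway, a minimal sketch of the workflow might look like the following. The PR ref, paths, and quant type are only examples, and the result will still suffer from the precision issues described above:

```bash
# Sketch only: build llama.cpp from the (still unmerged) Qwen3Next PR branch.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git fetch origin pull/16095/head:qwen3next && git checkout qwen3next
cmake -B build && cmake --build build --config Release -j

# Convert the HF checkpoint to a full-precision GGUF, then quantize it statically.
# /path/to/Qwen3-Next-80B-A3B-Thinking is a placeholder for your local download.
python convert_hf_to_gguf.py /path/to/Qwen3-Next-80B-A3B-Thinking \
  --outfile qwen3-next-80b-f16.gguf --outtype f16
./build/bin/llama-quantize qwen3-next-80b-f16.gguf qwen3-next-80b-Q4_K_M.gguf Q4_K_M
```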
Apparently the model has 5D tensors, but llama-imatrix only supports 2D tensors (plus 3D tensors for MLA), so no idea how things will go there for 5D tensors. So maybe even after official support for this model is merged, we might only see static quants, or imatrix quants with the 5D tensors statically quantized. This might turn out to be a non-issue depending on how the architecture gets implemented in llama.cpp.
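If imatrix quants do become possible, the usual workflow would be the one below; this is again only a sketch, calibration.txt stands in for whatever calibration data is used, and whether llama-imatrix can produce useful data for the 5D tensors is exactly the open question above:

```bash
# Sketch only: standard imatrix workflow, reusing the F16 GGUF from the previous step.
./build/bin/llama-imatrix -m qwen3-next-80b-f16.gguf -f calibration.txt -o imatrix.dat

# Quantize using the importance matrix; tensors the imatrix cannot cover would
# presumably have to be quantized statically.
./build/bin/llama-quantize --imatrix imatrix.dat \
  qwen3-next-80b-f16.gguf qwen3-next-80b-Q4_K_M.gguf Q4_K_M
```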