Update post UGI results

The UGI results were very interesting: the abliterated version is SMARTER than the original and, even more surprisingly, its world model changed significantly, as measured by UGI's political ideology evaluation.

Usually, left-leaning models score as smarter than models of any other ideology (very likely because that leaning aligns better with the frontier training data that all open models stem from, in one way or another), but NOT in this case.

Moreover, despite the extremely low KL divergence, the abliterated model shows significant innate changes: the world model shifted, intelligence increased, and the writing style changed too.

All of these results are surprising, especially when combined. This is exactly why alignment should be studied more and tested empirically. Papers often contradict real-world results, so it is in the public interest for results to be open and reproducible.

Impish_LLAMA_4B_Abliterated is an abliterated variant of SicariusSicariiStuff/Impish_LLAMA_4B with surgical removal of refusal mechanisms. This model maintains the full capabilities of the original, while eliminating safety guardrails through orthogonalization techniques.

KL divergence: <0.01
Refusals: ~3%

What is KL divergence?

Think of it as a way to measure the difference between the "World Model" of the original model and that of the abliterated one (technically, it compares the probability distributions the two models assign to their outputs); the lower the KL divergence, the closer the two "World Models" are to each other.

If the original model thinks making pineapple pizza is a crime against humanity (it is), then the abliterated model will still hold to this belief, but if asked how to make one (probably after giving you a disclaimer about what an abomination that is), it would still tell you how. In other words, most of the knowledge, quirks, and capabilities are preserved.
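To make the idea concrete, here is a minimal sketch of how a KL divergence between two next-token distributions can be computed. The logits below are made-up illustrative values, not measurements from either model, and the helper names `softmax` and `kl_divergence` are hypothetical. In practice such a number is usually averaged over many prompts and token positions; the exact evaluation setup behind the figure reported here is not described in this card.

```python
import numpy as np

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    z = logits - np.max(logits, axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions."""
    p = np.clip(p, eps, 1.0)  # avoid log(0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

# Hypothetical next-token logits over a 4-token vocabulary (illustration only).
original_logits = np.array([2.0, 1.0, 0.1, -1.0])
abliterated_logits = np.array([2.05, 0.95, 0.1, -1.0])

p = softmax(original_logits)      # distribution from the original model
q = softmax(abliterated_logits)   # distribution from the abliterated model

kl = float(kl_divergence(p, q))
print(f"KL(original || abliterated) = {kl:.6f}")
```

A value near zero means the two distributions are almost identical, and the value grows as they drift apart.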

Technical Specs

Base Model: Impish_LLAMA_4B
Parameters: 4B
Context Length: 128K tokens
Architecture: Llama (decoder-only transformer)
Precision: bf16
Method: Orthogonalization-based abliteration
License: Llama 3.1 Community License

Methodology

Identifies refusal direction vectors in activation space
Orthogonalizes weights to inhibit activation along these directions
Preserves (mostly) all other model behaviors and knowledge
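The steps above can be sketched on toy data as follows. This is an illustration under stated assumptions, not the exact procedure used for this model: it assumes the refusal direction is estimated as a difference of mean activations between two prompt sets (a common approach), uses random arrays in place of real activations and weights, and the function names `refusal_direction` and `orthogonalize` are hypothetical.

```python
import numpy as np

def refusal_direction(refused_acts, complied_acts):
    """Estimate a unit 'refusal direction' as the difference of mean activations.

    Both inputs have shape (num_prompts, d_model).
    """
    diff = refused_acts.mean(axis=0) - complied_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def orthogonalize(W, r):
    """Remove the component of W's output that lies along unit vector r.

    W has shape (d_model, d_in) and writes into the activation space.
    Returns (I - r r^T) W, so the new weights cannot write along r.
    """
    return W - np.outer(r, r @ W)

rng = np.random.default_rng(0)
d_model, d_in = 8, 5

# Toy activations: the "refused" set is shifted along axis 0 (assumption for the demo).
shift = np.zeros(d_model)
shift[0] = 3.0
complied = rng.normal(size=(32, d_model))
refused = rng.normal(size=(32, d_model)) + shift

r = refusal_direction(refused, complied)

# Toy weight matrix standing in for a layer that writes into the activation space.
W = rng.normal(size=(d_model, d_in))
W_abl = orthogonalize(W, r)

# For any input, the output of the edited weights has no component along r.
x = rng.normal(size=d_in)
projection = float(r @ (W_abl @ x))
print(f"projection onto refusal direction: {projection:.2e}")
```

Because only the component along `r` is subtracted, everything the weights write in directions orthogonal to `r` is left unchanged, which is the intuition behind preserving most other behaviors.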

Model Details

Intended use: General Tasks, Roleplay.
Censorship level: Very Low, 7.5 / 10 (10 = completely uncensored)

UGI score:

Citation Information

@misc{Impish_LLAMA_4B_Abliterated,
  author = {SicariusSicariiStuff},
  title = {Impish_LLAMA_4B_Abliterated},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B_Abliterated}
}