Instructions to use Ex0bit/MiniMax-M2.5-PRISM-PRO with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Ex0bit/MiniMax-M2.5-PRISM-PRO")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Ex0bit/MiniMax-M2.5-PRISM-PRO", dtype="auto")

llama-cpp-python

How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Ex0bit/MiniMax-M2.5-PRISM-PRO",
	filename="MiniMax-M2.5-PRISM-PRO-IQ2_XXS.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
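
`create_chat_completion` returns an OpenAI-style dictionary rather than a plain string. Below is a minimal sketch of pulling the assistant text out of that dictionary. It runs on a hand-written sample response, so it works without downloading the model. `reply_text` is a hypothetical helper name, not part of llama-cpp-python.

```python
def reply_text(completion: dict) -> str:
    """Return the assistant message text from an OpenAI-style chat completion dict."""
    choices = completion.get("choices") or []
    if not choices:
        raise ValueError("completion contains no choices")
    # Each choice carries a "message" with "role" and "content" keys.
    content = choices[0].get("message", {}).get("content")
    return content or ""


# Hand-written sample in the same shape the library returns (not real model output).
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ]
}
print(reply_text(sample))
```

In practice you would pass the return value of `llm.create_chat_completion(...)` to the helper instead of `sample`.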

Local Apps

llama.cpp

How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL
# Run inference directly in the terminal:
llama-cli -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL
# Run inference directly in the terminal:
llama-cli -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggml-org/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL
# Run inference directly in the terminal:
./llama-cli -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL

Build from source code

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL

Use Docker

docker model run hf.co/Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL


vLLM

How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Ex0bit/MiniMax-M2.5-PRISM-PRO"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ex0bit/MiniMax-M2.5-PRISM-PRO",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
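
The same request can be made from Python with only the standard library. This is a sketch, assuming the server started above is listening on `localhost:8000`. `build_chat_request` and `chat` are hypothetical helper names. Pointing `base_url` at another OpenAI-compatible server should work the same way.

```python
import json
import urllib.request

MODEL_ID = "Ex0bit/MiniMax-M2.5-PRISM-PRO"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000/v1") -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (requires the server to be running):
# print(chat("What is the capital of France?"))
```

Building the request is kept separate from sending it so the payload can be inspected or logged without a live server.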

Use Docker

docker model run hf.co/Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL

SGLang

How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Ex0bit/MiniMax-M2.5-PRISM-PRO" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ex0bit/MiniMax-M2.5-PRISM-PRO",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Ex0bit/MiniMax-M2.5-PRISM-PRO" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ex0bit/MiniMax-M2.5-PRISM-PRO",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with Ollama:
```
ollama run hf.co/Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL
```

Unsloth Studio

How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ex0bit/MiniMax-M2.5-PRISM-PRO to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ex0bit/MiniMax-M2.5-PRISM-PRO to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Ex0bit/MiniMax-M2.5-PRISM-PRO to start chatting

Pi

How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL"
        }
      ]
    }
  }
}
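
If you would rather generate that file than edit it by hand, a short script can merge the provider entry into an existing `models.json`. This is a sketch: `add_llama_cpp_provider` is a hypothetical helper. The merge behaviour (keeping other providers and replacing any existing `llama-cpp` entry) is an assumption about what you want, not something Pi prescribes.

```python
import json
from pathlib import Path

# Provider entry copied from the JSON shown above.
PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL"}],
}


def add_llama_cpp_provider(config_path: Path) -> dict:
    """Merge the llama-cpp provider into config_path, creating the file if needed."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    # Keep any providers already configured; replace only the "llama-cpp" entry.
    config.setdefault("providers", {})["llama-cpp"] = PROVIDER
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Usage (writes to the path named above):
# add_llama_cpp_provider(Path.home() / ".pi" / "agent" / "models.json")
```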

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL

Run Hermes

hermes

Docker Model Runner
How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with Docker Model Runner:
```
docker model run hf.co/Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL
```

Lemonade

How to use Ex0bit/MiniMax-M2.5-PRISM-PRO with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Ex0bit/MiniMax-M2.5-PRISM-PRO:UD-Q4_K_XL

Run and chat with the model

lemonade run user.MiniMax-M2.5-PRISM-PRO-UD-Q4_K_XL

List all available models

lemonade list



	---
	license: other
	license_name: prism-research
	license_link: LICENSE.md
	language:
	- en
	- zh
	tags:
	- minimax
	- prism
	- moe
	- reasoning
	- coding
	- agentic
	- abliterated
	pipeline_tag: text-generation
	library_name: transformers
	base_model:
	- MiniMaxAI/MiniMax-M2.5
	base_model_relation: finetune
	---

	[![Parameters](https://img.shields.io/badge/Parameters-MoE-blue)]()
	[![Architecture](https://img.shields.io/badge/Architecture-MoE-green)]()
	[![Context](https://img.shields.io/badge/Context-1M+-orange)]()
	[![License](https://img.shields.io/badge/License-PRISM--Research-purple)]()


	<p align="center">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/63adf1fa42fd3b8dbaeb0c92/shxznHWnvppRhT_yKrsdP.png" width="400"/>
	</p>


	# MiniMax-M2.5-PRISM-PRO

	A powerful, production-ready, fully uncensored model intended for complete suppression of over-refusal and propaganda mechanisms, built with our SOTA PRISM-PRO pipeline.

	PRISM-PRO is available for purchase: https://ko-fi.com/s/0a23d1b9a5

	For custom-trained PRISM versions or raw tensor access, reach out at https://ko-fi.com/ex0bit.

	<div align="center">

	### ☕ Support Our Work

	If you enjoy our work and find it useful, please consider sponsoring or supporting us!

	[![Ko-fi](https://img.shields.io/badge/Ko--fi-Support%20Us-ff5e5b?logo=ko-fi&logoColor=white)](https://ko-fi.com/ex0bit)

	| Option | Description |
	|--------|-------------|
	| [PRISM PRO VIP Membership](https://ko-fi.com/summary/6bae206c-a751-4868-8dc7-f531afd1fb4c) | Access to all PRISM models |
	| Bitcoin | `bc1qarq2pyn4psjpcxzp2ghgwaq6y2h4e53q232x8r` |

	![image](https://cdn-uploads.huggingface.co/production/uploads/63adf1fa42fd3b8dbaeb0c92/Psgbl1TgyDok__C7AMQog.png)

	</div>

	---

	## Model Highlights

	- PRISM Ablation — State-of-the-art technique that removes over-refusal behaviors while preserving model capabilities
	- SOTA Coding Performance — 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, 76.3% on BrowseComp (with context management)
	- Frontier Agentic Capabilities — Industry-leading performance in tool use, search, and complex multi-step tasks
	- Efficient Reasoning — Trained with RL to reason efficiently and decompose tasks optimally, 37% faster than M2.1
	- Cost-Effective — $1 for continuous operation at 100 tok/s for an hour; $0.30 at 50 tok/s
	- Modified-MIT Base License — Based on MiniMax's open-weight release
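
To compare the two pricing points in the list with per-token prices, convert each to a cost per million generated tokens. This is a sketch of the arithmetic, using only the figures quoted in the "Cost-Effective" bullet. `cost_per_million_tokens` is a hypothetical helper.

```python
def cost_per_million_tokens(dollars_per_hour: float, tokens_per_second: float) -> float:
    """Convert an hourly cost at a sustained generation rate into dollars per 1M tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return dollars_per_hour / tokens_per_hour * 1_000_000


# Figures from the "Cost-Effective" bullet.
fast = cost_per_million_tokens(1.00, 100)  # 360,000 tokens generated per hour
slow = cost_per_million_tokens(0.30, 50)   # 180,000 tokens generated per hour
print(f"100 tok/s: ${fast:.2f} per 1M tokens")
print(f" 50 tok/s: ${slow:.2f} per 1M tokens")
```

At these figures the 100 tok/s rate works out to roughly $2.78 per million generated tokens and the 50 tok/s rate to roughly $1.67.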

	## Base Model Architecture

	Base MiniMax-M2.5 is a Mixture-of-Experts (MoE) model extensively trained with reinforcement learning across hundreds of thousands of complex real-world environments.

	| Specification | Value |
	|---------------|-------|
	| Architecture | Sparse Mixture-of-Experts (MoE) |
	| Training | Extensive RL in 200K+ real-world environments |
	| Languages | 10+ (Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, Ruby) |
	| Inference Speed | 100 tok/s (Lightning) / 50 tok/s (Standard) |
	| Library | `transformers` |

	## Benchmarks

	| Category | Base (FP8/vLLM) | PRISM-PRO Q8_0 (llama.cpp) |
	|---|---|---|
	| MMLU 5-shot | 28/30 (93.3%) | 28/30 (93.3%) |
	| General Knowledge | 5/5 | 5/5 |
	| Coding | 4/5 | 5/5 |
	| Reasoning | 5/5 | 5/5 |
	| Agentic | 3/5 | 5/5 |
	| Harmful bypass | 3/10 | 10/10 (100%) |
	| Avg thinking words | 163w | 152w |
	| Speed | 72 t/s | 35-65 t/s |


	### Coding

	| Benchmark | MiniMax-M2.5 | Claude Opus 4.6 | Gemini 3 Pro | GPT-5.2 |
	|-----------|-------------|-----------------|-------------|---------|
	| SWE-Bench Verified | 80.2 | 78.9 | 74.0 | 72.6 |
	| Multi-SWE-Bench | 51.3 | 50.8 | - | - |
	| SWE-Bench Multilingual | 55.6 | - | - | - |
	| Terminal-Bench 2.0 | 51.5 | 52.1 | - | - |

	### Search & Tool Calling

	| Benchmark | MiniMax-M2.5 | Claude Opus 4.6 | Gemini 3 Pro | GPT-5.2 |
	|-----------|-------------|-----------------|-------------|---------|
	| BrowseComp | 76.3 | 71.2 | 62.4 | 57.8 |

	### Reasoning & Knowledge

	| Benchmark | MiniMax-M2.5 | Claude Opus 4.6 | Gemini 3 Pro | GPT-5.2 |
	|-----------|-------------|-----------------|-------------|---------|
	| AIME25 | 86.3 | 95.6 | 96.0 | 98.0 |
	| GPQA-D | 85.2 | 90.0 | 91.0 | 90.0 |
	| HLE w/o tools | 19.4 | 30.7 | 37.2 | 31.4 |
	| SciCode | 44.4 | 52.0 | 56.0 | 52.0 |
	| IFBench | 70.0 | 53.0 | 70.0 | 75.0 |

	## Usage

	### llama.cpp (GGUF)

	Build the latest master of [llama.cpp](https://github.com/ggml-org/llama.cpp) and run:

	```bash
	~/llama.cpp/build/bin/llama-cli \
	-m ../outputs/MiniMax-M2.5-PRISM-PRO-[QUANT].gguf \
	--jinja \
	-ngl 999 \
	--repeat_penalty 1.15 \
	--temp 1.0 \
	--top_p 0.95 \
	--top_k 40
	```


	> Replace `[QUANT]` with your quantization level (e.g. `Q8_0`, etc.).

	### Recommended Parameters

	| Use Case | Temperature | Top-P | Top-K | Repeat Penalty | Max New Tokens |
	|----------|-------------|-------|-------|----------------|----------------|
	| Reasoning / Coding | 1.0 | 0.95 | 40 | 1.15 | 32768 |
	| General Chat | 0.6 | 0.95 | 40 | 1.15 | 4096 |
	| Agentic / Tool Use | 1.0 | 0.95 | 40 | 1.15 | 32768 |
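
The table maps directly onto the sampling arguments of llama-cpp-python's `create_chat_completion` (`temperature`, `top_p`, `top_k`, `repeat_penalty`, `max_tokens`). Below is a sketch that keeps the presets in one place. The preset keys and the `sampling_kwargs` helper are hypothetical names chosen for illustration.

```python
# Values copied from the Recommended Parameters table.
PRESETS = {
    "reasoning_coding": {
        "temperature": 1.0, "top_p": 0.95, "top_k": 40,
        "repeat_penalty": 1.15, "max_tokens": 32768,
    },
    "general_chat": {
        "temperature": 0.6, "top_p": 0.95, "top_k": 40,
        "repeat_penalty": 1.15, "max_tokens": 4096,
    },
    "agentic_tool_use": {
        "temperature": 1.0, "top_p": 0.95, "top_k": 40,
        "repeat_penalty": 1.15, "max_tokens": 32768,
    },
}


def sampling_kwargs(use_case: str) -> dict:
    """Return a copy of the sampling settings for one use case."""
    if use_case not in PRESETS:
        raise ValueError(f"unknown use case {use_case!r}; expected one of {sorted(PRESETS)}")
    return dict(PRESETS[use_case])


# Usage with an already-loaded llama_cpp.Llama instance named `llm`:
# llm.create_chat_completion(messages=[...], **sampling_kwargs("general_chat"))
print(sampling_kwargs("general_chat"))
```

Returning a copy means a caller can override a single value for one request without changing the shared presets.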



	| Version | Description | Access |
	|---------|-------------|--------|
	| PRISM-LITE | Abliterated with the PRISM-LITE pipeline; removes over-refusal while preserving core capabilities | Free on Hugging Face |
	| PRISM-PRO | Full PRISM-PRO ablation; production-level suppression of propaganda/refusal mechanisms with maximum capability retention | [Ko-fi](https://ko-fi.com/s/0a23d1b9a5) |

	## License

	This model is released under the [PRISM Research License](LICENSE.md).

	The base model [MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) is released under a [Modified-MIT License](https://github.com/MiniMax-AI/MiniMax-M2.5/blob/main/LICENSE).

	## Acknowledgments

	Based on [MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) by [MiniMax AI](https://www.minimax.io).