Instructions for using recursal/EagleX_1-7T_HF with libraries, inference providers, notebooks, and local apps.

Libraries

How to use recursal/EagleX_1-7T_HF with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="recursal/EagleX_1-7T_HF", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("recursal/EagleX_1-7T_HF", trust_remote_code=True, dtype="auto")
```
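The pipeline example passes a chat-style `messages` list, which Transformers turns into text using the tokenizer's chat template. Because this is a base model rather than an instruct tune (see the note further down), it may not ship such a template. A hypothetical helper, `messages_to_prompt`, that instead flattens the messages into the `User:` / `Assistant:` turn layout used by the GPU example on this card; rendering other roles (e.g. `system`) as a capitalized role name is an assumption, not an official template:

```python
def messages_to_prompt(messages: list[dict]) -> str:
    """Hypothetical helper: flatten chat messages into 'Role: text' turns
    separated by blank lines, ending with an open 'Assistant:' turn."""
    turns = []
    for msg in messages:
        role = msg["role"].strip().capitalize()  # "user" -> "User"
        # Same normalization as generate_prompt: unify line endings and
        # collapse blank lines, since "\n\n" separates turns in this format
        content = msg["content"].strip().replace("\r\n", "\n").replace("\n\n", "\n")
        turns.append(f"{role}: {content}")
    turns.append("Assistant:")  # leave the assistant turn open for the model
    return "\n\n".join(turns)
```

The returned string can be passed as `pipe(prompt)`; a plain string prompt does not go through the chat template.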

Local Apps

vLLM

How to use recursal/EagleX_1-7T_HF with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (runs in the foreground):
vllm serve "recursal/EagleX_1-7T_HF"
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "recursal/EagleX_1-7T_HF",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
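The same request can be sent from Python without extra dependencies. A minimal sketch using only the standard library; `build_chat_request` and `send_chat` are hypothetical names, the URL and model ID are copied from the curl call, and the response parsing assumes the standard OpenAI chat-completion shape (`choices[0].message.content`):

```python
import json
import urllib.request

# Host, port, and model name copied from the curl example above
VLLM_URL = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "recursal/EagleX_1-7T_HF"


def build_chat_request(prompt: str, url: str = VLLM_URL, model: str = MODEL_ID) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    # Assumed OpenAI-compatible response shape: choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

Once the server is running, `print(send_chat("What is the capital of France?"))` performs the same call as the curl command. Keeping request construction separate from sending makes the payload easy to inspect without a live server.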

Use Docker

```bash
docker model run hf.co/recursal/EagleX_1-7T_HF
```

SGLang

How to use recursal/EagleX_1-7T_HF with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (runs in the foreground):
python3 -m sglang.launch_server \
    --model-path "recursal/EagleX_1-7T_HF" \
    --host 0.0.0.0 \
    --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "recursal/EagleX_1-7T_HF",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
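To print tokens as they arrive instead of waiting for the full reply, a request like the curl call above can add `"stream": true` to its JSON body. Assuming the server follows the OpenAI streaming format (server-sent events of the form `data: {...chunk...}`, ending with `data: [DONE]`), a minimal parser sketch; `iter_stream_text` is a hypothetical name:

```python
import json
from typing import Iterable, Iterator, Union


def iter_stream_text(lines: Iterable[Union[bytes, str]]) -> Iterator[str]:
    """Yield text fragments from an OpenAI-style server-sent-event stream."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The first chunk often carries only the role, with no content
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                yield piece
```

Iterating an open `urllib.request.urlopen(...)` response yields raw byte lines, so `for piece in iter_stream_text(resp): print(piece, end="", flush=True)` would stream the reply to the terminal.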

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "recursal/EagleX_1-7T_HF" \
        --host 0.0.0.0 \
        --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "recursal/EagleX_1-7T_HF",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
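The server inside the container has to download and load the weights before it answers, so an early curl call may be refused. A minimal polling sketch, assuming the server exposes the OpenAI-compatible `GET /v1/models` route on port 30000; `wait_until_ready` and its defaults are hypothetical:

```python
import http.client
import time
import urllib.request


def wait_until_ready(base_url: str = "http://localhost:30000",
                     timeout: float = 600.0,
                     interval: float = 5.0) -> bool:
    """Poll GET {base_url}/v1/models until it returns HTTP 200.

    Returns True once the server answers, or False after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Refused/reset connections, timeouts, and HTTP errors
            # (URLError and HTTPError subclass OSError) mean "not ready yet"
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

Polling a lightweight read-only route keeps the readiness check free of side effects; a caller can bail out with `if not wait_until_ready(): raise SystemExit("server did not start")`.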


Hugging Face EagleX 1.7T Model - via HF Transformers Library

! Important Note !

The following is the HF Transformers implementation of the EagleX 7B 1.7T model. It is meant to be used with the Hugging Face transformers library.

For the standalone model weights, for use with other RWKV libraries, see here

This is not an instruction-tuned model! (soon...)

For full details on this experimental model, see: https://substack.recursal.ai/p/eaglex-17t-soaring-past-llama-7b

Running on GPU via HF transformers

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_prompt(instruction, input=""):
    # Unify line endings and collapse blank lines, since "\n\n" separates
    # the turns in both prompt formats below
    instruction = instruction.strip().replace('\r\n','\n').replace('\n\n','\n')
    input = input.strip().replace('\r\n','\n').replace('\n\n','\n')
    if input:
        # Instruction / Input / Response format, used when extra context is given
        return f"""Instruction: {instruction}

Input: {input}

Response:"""
    else:
        # Chat format with a fixed one-turn preamble, ending on an open Assistant turn
        return f"""User: hi

A: Hi. I am your assistant and I will provide expert full response in full details. Please feel free to ask any question and I will always answer it.

User: {instruction}

A:"""


# Load the weights in float16 and move them to the first GPU (device index 0)
model = AutoModelForCausalLM.from_pretrained("recursal/EagleX_1-7T_HF", trust_remote_code=True, torch_dtype=torch.float16).to(0)
tokenizer = AutoTokenizer.from_pretrained("recursal/EagleX_1-7T_HF", trust_remote_code=True)

text = "Tell me a fun fact"
prompt = generate_prompt(text)

inputs = tokenizer(prompt, return_tensors="pt").to(0)
# Sample up to 128 new tokens; top_p=0.3 restricts sampling to the most likely
# tokens, and top_k=0 turns the top-k filter off
output = model.generate(inputs["input_ids"], max_new_tokens=128, do_sample=True, temperature=1.0, top_p=0.3, top_k=0)
print(tokenizer.decode(output[0].tolist(), skip_special_tokens=True))
```
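The `generate` call above samples with `top_p=0.3` and `top_k=0`; in Transformers, `top_k=0` leaves the top-k filter off, so nucleus (top-p) filtering alone narrows the candidates. A plain-Python illustration of what nucleus filtering does to one next-token distribution; the toy probabilities are made up, and this is not the Transformers implementation, which operates on logit tensors:

```python
def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Nucleus filtering: keep the smallest set of most-likely tokens whose
    cumulative probability reaches top_p, then renormalize them to sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # the nucleus is complete
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Toy next-token distribution (made-up numbers for illustration)
toy = {"one": 0.50, "the": 0.25, "a": 0.15, "an": 0.10}
print(top_p_filter(toy, 0.3))  # only "one" survives
print(top_p_filter(toy, 0.6))  # "one" and "the" survive, renormalized
```

With `top_p=0.3`, a single dominant token can fill the whole nucleus, so sampling behaves close to greedy on confident steps even at `temperature=1.0`.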

Output:

```
User: hi

A: Hi. I am your assistant and I will provide expert full response in full details. Please feel free to ask any question and I will always answer it.

User: Tell me a fun fact

A: Did you know that the human brain has 100 billion neurons?
```
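As the output above shows, the decoded text repeats the whole prompt, because `output[0]` holds the input ids followed by the newly generated tokens. A base model may also keep going past its answer and start a new `User:` turn on its own. A hypothetical post-processing helper, `extract_reply`, that keeps only the assistant's answer; it assumes the decoded text starts with the prompt verbatim and treats `"\n\nUser:"` as the end of the answer:

```python
def extract_reply(decoded: str, prompt: str, stop: str = "\n\nUser:") -> str:
    """Return only the assistant text generated after `prompt`, cut at `stop`."""
    if not decoded.startswith(prompt):
        # Tokenize/decode round trips can alter whitespace; in that case,
        # decode only the new tokens instead of relying on string matching
        raise ValueError("decoded text does not start with the prompt")
    reply = decoded[len(prompt):]
    cut = reply.find(stop)
    if cut != -1:
        reply = reply[:cut]  # drop any turn the model started on its own
    return reply.strip()
```

An alternative that avoids string matching is to decode only the new tokens, `tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)`, and then apply the same stop-marker cut.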
