Hugging Face EagleX 1.7T Model - via HF Transformers Library
! Important Note !
This is the HF transformers implementation of the EagleX 7B 1.7T model. It is meant to be used with the Hugging Face transformers library.
For the standalone model weights, for use with other RWKV libraries, see the pth model weights link in the list below.
This is not an instruct-tuned model! (soon...)
For full details on this experimental model, see: https://substack.recursal.ai/p/eaglex-17t-soaring-past-llama-7b
- Our cloud platform - the best place to host, finetune, and do inference for RWKV
- HF Demo
- Our wiki
- pth model weights
Running on GPU via HF transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_prompt(instruction, input=""):
    # Normalize Windows line endings and collapse double newlines
    instruction = instruction.strip().replace('\r\n','\n').replace('\n\n','\n')
    input = input.strip().replace('\r\n','\n').replace('\n\n','\n')
    if input:
        # Instruction plus extra input
        return f"""Instruction: {instruction}
Input: {input}
Response:"""
    else:
        # Chat-style prompt with a fixed opening exchange
        return f"""User: hi
Assistant: Hi. I am your assistant and I will provide expert full response in full details. Please feel free to ask any question and I will always answer it.
User: {instruction}
Assistant:"""

# Load the model in float16 on GPU 0, along with its tokenizer
model = AutoModelForCausalLM.from_pretrained("recursal/EagleX_1-7T_HF", trust_remote_code=True, torch_dtype=torch.float16).to(0)
tokenizer = AutoTokenizer.from_pretrained("recursal/EagleX_1-7T_HF", trust_remote_code=True)

text = "Tell me a fun fact"
prompt = generate_prompt(text)

# Tokenize the prompt, sample up to 128 new tokens, and print the decoded text
inputs = tokenizer(prompt, return_tensors="pt").to(0)
output = model.generate(inputs["input_ids"], max_new_tokens=128, do_sample=True, temperature=1.0, top_p=0.3, top_k=0)
print(tokenizer.decode(output[0].tolist(), skip_special_tokens=True))
output:
User: hi
Assistant: Hi. I am your assistant and I will provide expert full response in full details. Please feel free to ask any question and I will always answer it.
User: Tell me a fun fact
Assistant: Did you know that the human brain has 100 billion neurons?
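As the sample output shows, the decoded text contains the prompt followed by the model's reply. If only the reply is wanted, one option is to strip the echoed prompt after decoding. The sketch below is a minimal illustration, not part of the model's API: the helper name `extract_reply` is hypothetical, it assumes the decoded text starts with the exact prompt string, and treating a following "User:" line as a stop marker is an assumption that suits the chat-style prompt above.

```python
def extract_reply(decoded: str, prompt: str) -> str:
    """Return only the newly generated text from a decoded generation.

    Hypothetical helper: assumes `decoded` begins with `prompt`,
    as in the sample output above.
    """
    # Drop the echoed prompt when present; otherwise keep the text unchanged.
    reply = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    # A non-instruct model may keep writing further "User:" turns,
    # so cut at the first one (this stop marker is an assumption).
    reply = reply.split("\nUser:", 1)[0]
    return reply.strip()


# Stand-in strings so the example runs without loading the model.
prompt = "User: Tell me a fun fact\nAssistant:"
decoded = prompt + " Did you know that the human brain has 100 billion neurons?\nUser: wow"
print(extract_reply(decoded, prompt))
```

If the tokenizer does not reproduce the prompt text exactly when decoding, a token-level alternative is to decode only the tokens after the prompt length, for example by slicing `output[0]` from `inputs["input_ids"].shape[1]` onward before calling `tokenizer.decode`.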
