---
license: apache-2.0
base_model: meta-llama/Llama-4-Scout-17B-16E
tags:
- llama4
- checkpoint
- fine-tuned
- step-12136
language:
- en
pipeline_tag: text-generation
---

# tonyzhao123/dummy_llama4

This checkpoint was saved at step 12136 of a custom Llama 4 training run.

## Model Details

- **Base Model**: meta-llama/Llama-4-Scout-17B-16E
- **Model Type**: llama4
- **Architecture**: Llama4ForConditionalGeneration
- **Training Step**: 12136
- **Source Checkpoint**: `checkpoint-12136`

## Model Configuration

- **Hidden Size**: 768
- **Number of Layers**: 8
- **Number of Experts (MoE)**: 4
- **Vocabulary Size**: 202048
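
As a rough sanity check on the values above, the size of the token embedding table follows directly from the vocabulary size and the hidden size. The sketch below assumes a single embedding matrix of shape vocabulary size × hidden size stored at 2 bytes per parameter (bfloat16). The function names are illustrative only, and the estimate says nothing about the remaining layers or about whether the output head shares these weights.

```python
# Values copied from the Model Configuration list above.
VOCAB_SIZE = 202048
HIDDEN_SIZE = 768
BYTES_PER_PARAM = 2  # assumption: weights stored in bfloat16


def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Number of parameters in a (vocab_size x hidden_size) embedding matrix."""
    return vocab_size * hidden_size


def size_in_mib(num_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Storage needed for num_params parameters, in MiB."""
    return num_params * bytes_per_param / (1024 ** 2)


params = embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
print(f"embedding parameters: {params:,}")
print(f"approx. size in bf16: {size_in_mib(params):.1f} MiB")
```

With a hidden size of 768 and 8 layers, the embedding table is likely a large share of the total parameter count, so the vocabulary size has an outsized effect on the size of `model.safetensors`.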

## Usage

```python
from transformers import AutoTokenizer, AutoModelForImageTextToText
import torch

model_name = "tonyzhao123/dummy_llama4"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForImageTextToText.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Example usage
text = "Hello, how are you today?"
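# Move the input tensors to the same device as the model
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,  # passes both input_ids and attention_mask
        max_new_tokens=100,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id
    )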

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Training Information

This checkpoint was extracted from training step 12136. The model was trained using custom scripts with on-the-fly tokenization on the WikiText-103 dataset.

## Files Included

- `config.json` - Model configuration
- `model.safetensors` - Model weights (single file, no sharding)
- `tokenizer.json` - Fast tokenizer
- `tokenizer_config.json` - Tokenizer configuration
- `special_tokens_map.json` - Special tokens mapping
- `generation_config.json` - Generation parameters (if available)
- `chat_template.jinja` - Chat template (if available)
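
The list above can be turned into a quick completeness check for a downloaded copy of the repository. This is a minimal sketch: `check_files` and `example_listing` are hypothetical names, the listing is made-up sample data rather than the actual repository contents, and the two files marked "if available" are treated as optional.

```python
# Files the list above describes without qualification.
REQUIRED_FILES = {
    "config.json",
    "model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
}
# Files the list above marks "(if available)".
OPTIONAL_FILES = {"generation_config.json", "chat_template.jinja"}


def check_files(present):
    """Report which expected files are absent from an iterable of file names."""
    present = set(present)
    return {
        "missing_required": sorted(REQUIRED_FILES - present),
        "missing_optional": sorted(OPTIONAL_FILES - present),
    }


# Hypothetical listing for illustration. A real one could come from
# os.listdir() on a local copy or huggingface_hub.list_repo_files(repo_id).
example_listing = [
    "README.md",
    "config.json",
    "model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
]

report = check_files(example_listing)
print(report)
```

An empty `missing_required` list means the checkpoint has everything `from_pretrained` needs according to this card. Missing optional files only mean that library defaults apply for generation settings or chat formatting.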

## Limitations

- This is an intermediate checkpoint and may not represent the final trained model
- Quality varies between checkpoints, so results at step 12136 may differ from those at earlier or later steps
- Always evaluate the model on your specific use case

## Citation

```bibtex
@misc{tonyzhao123_dummy_llama4_checkpoint_12136,
  title={tonyzhao123/dummy_llama4 - Checkpoint 12136},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/tonyzhao123/dummy_llama4}
}
```